How to Upload a bigWig File to UCSC

HOMER

Software for motif discovery and next-gen sequencing analysis

Visualizing Experiments with a Genome Browser

The UCSC Genome Browser is quite possibly one of the best computational tools ever developed. Not only does it contain an incredible amount of data in a single application, it allows users to upload custom data, such as data from their ChIP-Seq experiments, so that they can be easily visualized and compared to other data. There are also other genome browsers available, and each has a different strength:

UCSC Genome Browser

Truly a unique resource, with lots of preloaded data and annotations.

WashU Epigenome Browser

Capable of visualizing long-range interactions (great for data sets like Hi-C), and also has a lot of preloaded data.

IGV

The Integrative Genomics Viewer (IGV) is great for looking at reads locally instead of needing to load them to a server/cloud based solution. Great for directly looking at sorted bam/bai files to examine mutations in reads.

Many others...

Most of the tools that are part of HOMER cater to the strengths of the UCSC Genome Browser. However, the bedGraph and other files generated by HOMER can usually be used in the other genome browsers as well.

Making Genome Browser Files

The basic strategy HOMER uses is to create a bedGraph formatted file that can then be uploaded as a custom track to the genome browser. This is accomplished using the makeUCSCfile program. To make a UCSC visualization file, type the following:

makeUCSCfile <tag directory> -o auto

i.e. makeUCSCfile PU.1-ChIP-Seq/ -o auto
(output file will be in the PU.1-ChIP-Seq/ folder named PU.1-ChIP-Seq.ucsc.bedGraph.gz)

The "-o auto" will make the program automatically generate an output file name (i.e. TagDirectory.ucsc.bedGraph.gz) and place it in the tag directory, which helps with the organization of all these files. The output file can be named differently by specifying "-o outputfilename" or by simply omitting "-o", which will send the output of the program to stdout (i.e. add " > outputfile" to capture it in the file outputfile). It is recommended that you zip the file using gzip and directly upload the zipped file when loading custom tracks at UCSC.

To visualize the experiment in the UCSC Genome Browser, go to the Genome Browser page and select the appropriate genome (i.e. the genome that the sequencing tags were mapped to). Then click on the "add custom tracks" button (this will read "manage custom tracks" once at least one custom track is loaded). Enter the file created earlier in the "Paste URLs or data" section and click "Submit".

Issues Loading UCSC Files

The most common problem encountered while loading UCSC files is to see "position exceeds chromosome length" or something to that effect. This is usually caused by one of three issues:

1. You are trying to load the file to the wrong genome assembly. Make sure the assembly is correct!

2. Did you align the reads to a UCSC version of the genome? Chromosome names must match exactly: chr1 != Chr1 != 1

3. Some of your tags are mapping outside the reference chromosome - this can be caused by mapping to non-standard assemblies or by some alignment programs. To remove all reads outside of the UCSC chromosome lengths, you can run the program removeOutOfBoundsReads.pl.

removeOutOfBoundsReads.pl <tag directory> <genome>
i.e. removeOutOfBoundsReads.pl PU.1-ChIP-Seq/ mm9

After running the program, you can rerun makeUCSCfile.
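Before uploading, it can help to check a bedGraph against the chromosome sizes of the target assembly. The following is only a minimal sketch in Python, not part of HOMER: the function name, the sizes, and the sample records are hypothetical, and it simply flags records whose chromosome name is unknown or whose coordinates fall outside the chromosome.

```python
def find_out_of_bounds(bedgraph_lines, chrom_sizes):
    """Return (line_number, reason) for bedGraph records UCSC would likely reject."""
    problems = []
    for number, line in enumerate(bedgraph_lines, start=1):
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines, header lines and comments
        chrom, start, end = line.split()[:3]
        if chrom not in chrom_sizes:
            # e.g. "1" or "Chr1" when the assembly uses "chr1"
            problems.append((number, f"unknown chromosome {chrom}"))
        elif int(start) < 0 or int(end) > chrom_sizes[chrom]:
            problems.append((number, f"position exceeds length of {chrom}"))
    return problems


# Hypothetical chromosome sizes and records, for illustration only.
sizes = {"chr1": 1000}
records = [
    "track type=bedGraph name=example",
    "chr1\t0\t100\t1.5",
    "chr1\t950\t1100\t2.0",
    "1\t10\t20\t1.0",
]
for number, reason in find_out_of_bounds(records, sizes):
    print(number, reason)
```

An "unknown chromosome" report usually points to issue 1 or 2, while "position exceeds length" points to issue 3.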

What does makeUCSCfile do?

The program works by approximating the ChIP-fragment density at each position in the genome. This is done by starting with each tag and extending it by the estimated fragment length (determined by autocorrelation, or it can be manually specified using "-fragLength <#>"). The ChIP-fragment density is then defined as the total number of overlapping fragments at each position in the genome. Below is a diagram that depicts how this works:
ucsc diagram
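The same idea can be sketched in a few lines of Python. This is an illustration of the concept only, not HOMER's implementation: the function name and the tag positions are hypothetical, and tags are assumed to be given as (5' position, strand) pairs.

```python
from collections import Counter


def fragment_density(tags, frag_length):
    """Count overlapping fragments per position.

    tags: iterable of (position, strand), where position is the 5' end of the tag.
    Each tag is extended by frag_length in its 3' direction.
    """
    density = Counter()
    for pos, strand in tags:
        if strand == "+":
            covered = range(pos, pos + frag_length)
        else:
            covered = range(pos - frag_length + 1, pos + 1)
        for p in covered:
            density[p] += 1
    return density


# Hypothetical tags: two on the + strand, one on the - strand.
tags = [(100, "+"), (104, "+"), (111, "-")]
density = fragment_density(tags, frag_length=10)
for position in (99, 103, 105, 112):
    print(position, density[position])
```

Positions covered by all three extended fragments get a density of 3, and positions outside every fragment get 0.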
As great as the UCSC Genome Browser is, the large size of recent ChIP-Seq experiments results in custom track files that are very large. In addition to taking a long time to upload, the genome browser has trouble loading excessively large files. To help cope with this, makeUCSCfile lets you specify a target file size when zipped (i.e. "-fsize 50e6" for 50MB). In order to meet the specified target file size, makeUCSCfile merges adjacent regions of tag density levels by their weighted average to reduce the total number of lines in the final bedGraph file. If you have trouble getting your file to load, try reducing the size of the file using the "-fsize <#>" option (i.e. "-fsize 2e7"). To force the creation of larger files, use a very large file size (i.e. "-fsize 1e50") - this will create a file that does not merge any regions and displays a "native" view of the data.
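As a rough illustration of merging by weighted average, the sketch below combines two adjacent bedGraph regions into one. It assumes the average is weighted by region length, which the text above does not spell out; the function name and the numbers are hypothetical.

```python
def merge_regions(a, b):
    """Merge two adjacent (start, end, value) regions using a length-weighted average."""
    start_a, end_a, value_a = a
    start_b, end_b, value_b = b
    if end_a != start_b:
        raise ValueError("regions must be adjacent")
    len_a = end_a - start_a
    len_b = end_b - start_b
    # Assumption: each region contributes in proportion to its length in bp.
    merged_value = (value_a * len_a + value_b * len_b) / (len_a + len_b)
    return (start_a, end_b, merged_value)


# A 10 bp region at density 2.0 followed by a 30 bp region at density 4.0.
print(merge_regions((100, 110, 2.0), (110, 140, 4.0)))
```

Each merge removes one line from the output, which is how the file shrinks toward the target size at the cost of detail.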

Tags can be visualized separately for each strand using the "-strand separate" option.

Changing the Resolution

By default, makeUCSCfile uses the "-fsize <#>" option to determine how many reads to essentially "skip" when making the output file. You can also manually set the resolution.

In an attempt to reduce the size of large UCSC files, one attractive option is to reduce the overall resolution of the file. By default, makeUCSCfile will make full resolution (i.e. 1 bp) files, but this can be changed by specifying the "-res <#>" option. For example, "-res 10" will cause changes in ChIP-fragment density to be reported only every 10 bp.
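The effect of a coarser resolution can be sketched as binning a per-bp density profile. This is an illustrative Python sketch with a hypothetical function name and made-up values, not HOMER's code. It reports the maximum per bin by default and the average on request, mirroring the description of the "-avg" option in the command line summary.

```python
def reduce_resolution(values, res, use_average=False):
    """Summarize a per-bp density profile in bins of `res` bp."""
    bins = []
    for i in range(0, len(values), res):
        chunk = values[i:i + res]
        if use_average:
            bins.append(sum(chunk) / len(chunk))
        else:
            bins.append(max(chunk))
    return bins


# Hypothetical 20 bp profile reduced to two 10 bp bins.
profile = [1, 1, 2, 3, 3, 2, 1, 0, 0, 0, 0, 4, 4, 4, 0, 0, 0, 0, 0, 0]
print(reduce_resolution(profile, 10))
print(reduce_resolution(profile, 10, use_average=True))
```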

Normalization of UCSC files: two types of normalization

There are two important parameters to consider during normalization of data. First, the total read depth of the experiment is important, which is obvious. The second factor to consider is the length of the reads (this is new to v4.4). The problem is that an experiment with longer fragment lengths will generate more coverage than an experiment with shorter fragment lengths. In order to make sure the total area under the curve is the same for each experiment, experiments are normalized to a fixed number of reads as well as a 100 bp fragment length. If reads are longer than 100 bp, they are 'down-normalized' a fractional amount such that they produce the same relative coverage as a 100 bp fragment length. Experiments with shorter fragment lengths are 'up-normalized' a proportional amount (maximum of 4x or 25 bp). This allows experiments with different fragment lengths to be comparable in the genome browser. The behavior of each normalization is controlled with the following parameters:

-norm <#> : Normalize the total number of reads to this number, default 1e7. This means that tags from an experiment with only 5 million mapped tags will count for 2 tags apiece.
-normLength <#> : Set the standard length for normalization, default 100.

"-normLength 0" will disable the length normalization altogether, useful when visualizing single nucleotide data.

-noadj or -raw : who needs normalization? Just give me the raw coverage numbers...
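To make the arithmetic concrete, here is a small Python sketch of how the two normalizations could combine into a per-tag weight. It is an interpretation of the description above, not HOMER's code: the function name is hypothetical, and the 4x cap is modeled by treating fragments shorter than a quarter of the normalization length (25 bp by default) as if they were that long.

```python
def tag_weight(total_tags, frag_length, norm=1e7, norm_length=100):
    """Weight applied to each tag after depth and length normalization."""
    depth_factor = norm / total_tags
    if norm_length == 0:
        return depth_factor  # "-normLength 0" disables length normalization
    # Assumption: up-normalization is capped at 4x (i.e. at norm_length / 4 bp).
    effective_length = max(frag_length, norm_length / 4)
    return depth_factor * norm_length / effective_length


print(tag_weight(5e6, 100))   # 5 million tags, 100 bp fragments
print(tag_weight(1e7, 200))   # longer fragments are down-normalized
print(tag_weight(1e7, 10))    # very short fragments hit the 4x cap
```

With the defaults, the first call reproduces the "2 tags apiece" example given for -norm.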

Normalizing files to Input

The paragraph above specifies how to normalize read densities based on the total number of reads. For some applications, particularly if studying organisms with small genomes, it is better to visualize the read density as a ratio relative to Input or IgG. Normally I would NOT recommend visualizing reads this way if the Input/IgG read coverage is sparse, as this will cause problems when calculating ratios. To normalize the experiment to a second tag directory, use the "-i <input tag directory>" option:

makeUCSCfile ExpTagDirectory/ -i InputTagDirectory -o auto

Additional parameters to control the normalized output:

-pseudo <#> : To avoid fluctuations in the ratio due to low coverage input, a pseudo count is added to the numerator and denominator when calculating the ratio (default: 5)
-log : report as a log ratio (default is a simple ratio)
-inputtbp <#> : set the maximum tags per bp considered in the input experiment
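The role of the pseudo count is easiest to see with numbers. The sketch below is an illustration in Python with a hypothetical function name. It assumes the coverage values have already been normalized, and it uses a base 2 logarithm as stated for "-log" in the command line summary.

```python
import math


def input_ratio(experiment, control, pseudo=5.0, log=False):
    """Ratio of experiment to input coverage with a pseudo count on both sides."""
    ratio = (experiment + pseudo) / (control + pseudo)
    return math.log2(ratio) if log else ratio


print(input_ratio(15, 5))            # well covered position
print(input_ratio(15, 5, log=True))  # same position as a log2 ratio
print(input_ratio(1, 0))             # sparse input: the ratio stays near 1
print(input_ratio(1, 0, pseudo=0.1)) # tiny pseudo count: the ratio inflates
```

Without the pseudo count, a position with no input coverage would give a division by zero, and positions with very low coverage would produce large, noisy ratios.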

Separating data from different strands / RNA-Seq

You can specify that HOMER separate the data based on the strand by using the "-strand <...>" option. This is useful when looking at strand-specific RNA-Seq/GRO-Seq experiments. The following options are available:

-strand both : default behavior, for ChIP-Seq/MNase-Seq etc.
-strand separate : separate data by strand, for RNA-Seq/GRO-Seq
-strand + : only show the positive strand (i.e. Watson strand) data
-strand - : only show the negative strand (i.e. Crick strand) data

RNA-Seq and Splicing:

HOMER does not fully support the visualization of spliced RNA-Seq reads. However, if you specify the "-fragLength given" option, HOMER will only visualize the reads from the 5' end of the read until the first splice site (or the end of the read). This will help make the read densities look nice and crisp over exons, but will not visualize parts of the read that are 3' from the first splice found in the read.

Modifying Read Coverage

You can manually set the fragment lengths that are visualized and shift their positions, both of which can be useful:

-fragLength <# | given> : sets the fragment length, default: uses fragmentLengthEstimate in the tagInfo.txt file of the tag directory. If you want to visualize how the signal changes over large regions, it can be useful to set the fragment length to a very large value (i.e. 10000). If you want to visualize the exact length of the reads, use "-fragLength given".
-adjust <#> : adjust the position of the read by this amount from the 5' end. For example, -adjust -10 would start the coverage 10 bp upstream. This is useful when the 5' end of the read represents a localized signal, i.e. a DNase nicking site, as opposed to a ChIP-Seq fragment, which implies the factor binds downstream from the 5' end.

-tbp <#> : limit the number of reads considered per position, default: no limit. i.e. "-tbp 1" only counts one read per position.

-inputFragLength <#>, -inputAdjust <#>, -inputtbp <#> work the same for input directories if calculating a ratio.

Special Visualization Styles

To help streamline the visualization of different data types, you can use the "-style <option>" option (i.e. "-style rnaseq"). This will adjust parameters for each type:

chipseq : standard, default
rnaseq : strand specific, will only extend fragments by their given length to help visualize exon edges.
tss : strand specific, and only shows the 5' nucleotide of the read (single base precision)
dnase : for nicking-style DNase data only, centers the read fragment over the 5' end of the read.
methylated : reports cytosine methylation percent at single bp resolution.
unmethylated : reports the percent of unmethylated cytosines at single bp resolution.
damid : reports large coverage fragments (2kb) centered on 5' end of the reads

Creating bigWig files with HOMER

Some data sets are very large, but you still want to see all of the details from your sequencing in the UCSC Genome Browser. HOMER can produce bigWig files by running the conversion program for you (bedGraphToBigWig). The only catch is that you must have access to a webserver where you can post the resulting bigWig file - this is because instead of uploading the whole file to UCSC, the browser actually looks for the data file on YOUR webserver and grabs only the parts it needs. Slick, eh. Chuck uses this all the time for big experiments.

Before even trying to make bigWigs, you must download the bedGraphToBigWig program from UCSC and place it somewhere in your executable path (i.e. the /path-to-homer/bin/ folder). This is called directly by HOMER to create the bigWig files.

Using the makeBigWig.pl Script

To make bigWig files easier to generate, HOMER includes a program creatively named "makeBigWig.pl" that automates all of the steps below.

makeBigWig.pl <tag directory> <genome> [special options] [makeUCSC file options] -webDir /path-to-web-fold/ -url http://webserverURL/bigwigFold/
i.e. makeBigWig.pl PU.1-ChIP-Seq/ mm9 -webDir /var/www/bigWigs/ -url http://ChuckNorrisU.edu/bigWigs/

If you are visualizing strand specific data (i.e. RNA-Seq), specify "-strand". The -url and -webDir options give the web URL directory and the file system directory where the bigWigs will be stored, respectively. Recent changes to UCSC require that the chromosome sizes be specified exactly. If you are having trouble, the current version of HOMER has the option "-chromSizes <filename>" so that you can specify the sizes explicitly.

Other makeBigWig.pl options:

-normal (default, similar to "-style chipseq" for makeUCSCfile).
-strand (for RNA-Seq, will create two bigWigs separately for each strand).
-dnase (will use "-style dnase")
-cage (combines -strand with -style cage)
-cpg (creates both methylated and unmethylated bigWigs)

-update (will overwrite default bigWig for that tag directory name. Otherwise, if the same file name exists, a random number will be added to the end)
-chromSizes <chrom.size file> (specify the chromosome sizes, default: automated)
-url <URL> (URL directory -no filename- to tell UCSC where to look)
-webdir <directory> (name of directory to place resulting bigWig file)

Making bigWigs from scratch

This is a quick description of what HOMER is doing. To make a bigWig, add the "-bigWig <chrom.sizes file> -fsize 1e20" parameters to your makeUCSCfile command. When making a bigWig, you usually want to see all of the tag information, so make sure the "-fsize" option is large. You also need to specify an output file using "-o <bigwigfilename>" and also capture the stdout stream using "> trackfileoutput.txt". You can also use "-o auto". The "trackfileoutput.txt" will contain the header information that is uploaded as a custom track to UCSC. Recently, changes to UCSC require that HOMER know the exact size of the chromosomes when making the file - these should be placed in a file (<chrom.sizes> file). makeBigWig.pl and makeMultiWigHub.pl will generate these files automatically by analyzing the sequences in the genome directory.
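A chrom.sizes file is a two-column, tab-separated list of chromosome names and lengths. If you need to build one yourself, the sketch below shows one way to derive it from FASTA sequence. It is an illustrative Python example with hypothetical names and toy sequences, not the routine HOMER uses.

```python
def chrom_sizes_from_fasta(lines):
    """Return {chromosome name: sequence length} from FASTA lines."""
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # The name is the first word after ">" on the header line.
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


# Toy FASTA content, for illustration only.
fasta = [">chr1 example", "ACGTACGT", "ACGT", ">chr2", "NNNN"]
sizes = chrom_sizes_from_fasta(fasta)
chrom_sizes_text = "".join(f"{chrom}\t{size}\n" for chrom, size in sizes.items())
print(chrom_sizes_text, end="")
```

For a real genome you would pass an open file handle instead of a list and write `chrom_sizes_text` to the file given to "-bigWig".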

After running the makeUCSCfile program with the bigWig options, you need to do the following:

Copy the *.bigWig file to your webserver location and make sure it is viewable over the internet.
Edit the "trackfileoutput.txt" file and enter the URL of your bigWig file (... bigDataUrl=http://server/path/bigWigFilename ...)
Upload the "trackfileoutput.txt" file to UCSC as a custom track to view your data.

makeUCSCfile <tag directory> -o auto -bigWig <chrom.sizes file> -fsize 1e20 > trackInfo.txt

i.e. makeUCSCfile PU.1-ChIP-Seq/ -o auto -bigWig chrom.sizes -fsize 1e20 > PU.1-bigWig.trackInfo.txt
cp PU.1-ChIP-Seq/PU.1-ChIP-Seq.ucsc.bigWig /Web/Server/Root/Path/
** edit PU.1-bigWig.trackInfo.txt to have the correct URL **
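The manual edit of the track file can also be scripted. The following Python sketch rewrites the bigDataUrl setting in a track line. The function name is hypothetical, and the sample track line is an assumption about what such a line looks like, since the exact text HOMER writes is not shown here.

```python
import re


def set_big_data_url(track_line, url):
    """Replace (or append) the bigDataUrl setting in a UCSC track line."""
    pattern = r"bigDataUrl=\S*"
    if re.search(pattern, track_line):
        # A function is used as the replacement so the URL is inserted literally.
        return re.sub(pattern, lambda match: "bigDataUrl=" + url, track_line)
    return track_line.rstrip() + " bigDataUrl=" + url


# Hypothetical track line and URL, for illustration only.
line = 'track type=bigWig name="PU.1-ChIP-Seq" bigDataUrl=http://server/path/bigWigFilename'
new_url = "http://ChuckNorrisU.edu/bigWigs/PU.1-ChIP-Seq.ucsc.bigWig"
print(set_big_data_url(line, new_url))
```

The URL must point at the copy of the bigWig file on your webserver, not at a path on the local file system.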

Note: As of now, a bigWig file can only be composed of a single track - if you want to separate the data by strands, do the following:

makeUCSCfile PU.1-ChIP-Seq/ -o PU.1.positiveStrand.bigWig -bigWig chrom.sizes -fsize 1e20 -strand + > PU.1-bigWig.trackInfo.positiveStrand.txt
makeUCSCfile PU.1-ChIP-Seq/ -o PU.1.negativeStrand.bigWig -bigWig chrom.sizes -fsize 1e20 -strand - > PU.1-bigWig.trackInfo.negativeStrand.txt
cp PU.1.positiveStrand.bigWig PU.1.negativeStrand.bigWig /Web/Server/Root/Path/
cat PU.1-bigWig.trackInfo.positiveStrand.txt PU.1-bigWig.trackInfo.negativeStrand.txt > PU.1-bigWig.trackInfo.both.txt
** edit PU.1-bigWig.trackInfo.both.txt to have the right URLs for both the negative and positive strands **

Creating Multi-Experiment Overlay Tracks

UCSC has recently added the option to create overlay tracks, where several bigWig files can be viewed in the same space with the help of transparent colors. The first example of this was the ENCODE Regulation Track, which showed H3K4me1/3 data from several cell types at the same time. This is very useful for large-scale data sets with many different experiments. In these cases it is just about impossible to get them on the screen together.

To make a "multi-wig hub", as we will refer to them, you need to make sure you have the bedGraphToBigWig program from UCSC, and a working webserver to host your files. If you can handle bigWigs in the section above, you can make multi-wig hubs.

The HOMER program to handle multi-wig hubs is called makeMultiWigHub.pl. It works essentially the same way as the makeBigWig.pl script; however, the syntax is a little different. The basic usage is:

makeMultiWigHub.pl <hub name> <genome> [options] -d <tag directory1> <tag directory2> ...
i.e. makeMultiWigHub.pl ES-Factors mm9 -d mES-Oct4/ mES-Sox2/ mES-Nanog/ mES-Klf4/ mES-Esrrb/ mES-cMyc/ mES-Stat3/

Note: make sure you use the UCSC genome (e.g. mm9) and not the masked, bastardized HOMER version (mm9r).

The above example will produce a hub called "ES-Factors", composed of configuration files and bigWig files, and place it on your server in the directory specified by "-webDir <directory>". It will also provide you with a URL to the hub (dependent on the value of "-url <base url>"). To load the hub, click on "Track Hubs" on the UCSC browser (next to the custom tracks button), and paste the URL into the dialog box. The example above will look something like this:

HOMER UCSC HUB example

To figure out which factors correspond to which colors, click on the blue heading for the hub in the settings area below the UCSC picture. Something like this should pop up:

UCSC Hub settings HOMER

Unfortunately, as of now editing hub information can only be done by directly modifying the hub files on the server. For example, to edit the colors, you must edit the "/webserver/directory/hubName/genome/trackDB.txt" file.

Because hubs are so cool, HOMER will also do +/- strand RNA data right. Unfortunately, for now you can't mix stranded and non-stranded data in the same hub with the makeMultiWigHub.pl program. To visualize stranded data, add "-strand". Below is an example:

HOMER UCSC Hub RNA example

Other makeMultiWigHub.pl options are essentially identical to makeBigWig.pl.

Examples of UCSC bedGraph files

The following shows what the same data set looks like when changing the options for file size (-fsize) and resolution (-res). Usually it's best to use one or the other.

-fsize 5e7 -res 1
-fsize 1e7 -res 1
-fsize 5e7 -res 10
-fsize 1e7 -res 10

UCSC examples

Command line options for makeUCSCfile

Usage: makeUCSCfile <tag directory> [options]

Creates a bedgraph file for visualization using the UCSC Genome Browser

General Options:
-fsize <#> (Size of file, when gzipped, default: 1e10, i.e. no reduction)
-strand <both|separate|+|-> (control if reads are separated by strand, default: both)
-fragLength <# | auto | given> (Estimate fragment length, default: auto)
-adjust <#> (Adjust edge of tag 3' by # bp, negative for 5', default: none [good for dnase])
-tbp <#> (Maximum tags per bp to count, default: no limit)
-mintbp <#> (Minimum tags per bp to count, default: no limit)
-res <#> (Resolution, in bp, of file, default: 1)
-avg (report average coverage if resolution is larger than 1bp, default: max is reported)
-lastTag (To keep ucsc happy, last mapped tag is NOT extended by default
Using this option will allow extending of data past the last tag position)
-norm <#> (Total number of tags to normalize experiment to, default: 1e7)
-normLength <#> (Expected length of fragment to normalize to [0=off], default: 100)
-noadj (Do not normalize tag counts)
-neg (plot negative values, i.e. for - strand transcription)
-CpG (Show unmethylated CpG ratios)
-color <(0-255),(0-255),(0-255)> (no spaces, rgb color for UCSC track, default: random)
-i <input tag directory> (normalize bedGraph to input data)
-pseudo <#> (Number of pseudo counts used to smooth out low coverage areas, default: 5)
-log (report log2 ratio instead of linear ratio)
-inputtbp <#>, -inputFragLength <#>, -inputAdjust <#> can also be set
-bigWig <chrom.size file> (creates a full resolution bigWig file and track line file)
This requires bedGraphToBigWig to be available in your executable path
Also, because of how bigWig files work, use "-strand -" and "-strand +"
in separate runs to make strand specific files: "-strand separate" will not work
Consider using makeBigWig.pl and makeMultiWigHub.pl if interested in bigWigs
-o <filename|auto> (send output to this file - will be gzipped, default: prints to stdout)
auto: this will place an appropriately named file in the tag directory
-name <...> (Name of UCSC track, default: auto generated)
-style <option> (See options below:)
chipseq (standard, default)
rnaseq (strand specific, if unstranded add '-strand both' to end of command)
tss (strand specific, single bp fragment length)
dnase (fragments centered on tag position instead of downstream)
methylated (single bp resolution of cytosine methylation)
unmethylated (single bp resolution of unmethylated cytosines)
damid (2kb fragments centered on 5' end of reads)
-circos <chrN:XXX-YYY|genome> (output only a specific region for circos [no header])

Command line options for makeBigWig.pl

Script for automating the process of creating bigWigs

Usage: makeBigWig.pl <tag directory> <genome> [special options] [options]

Special Options for bigWigs [cull 1, don't combine]:
-normal (ChIP-Seq style, default)
-strand (Strand specific, for RNA-Seq and GRO-Seq)
-dnase (Special options for Crawford-lab style DNase-Seq)
-cage (Special options for CAGE/TSS-Seq)
-cpg (Special options for mCpG/CpG)

Other options:
Any options you want to pass to makeUCSCfile
!!Warning!!: do not try to specify "-strand separate" - use the special option above.

File options:
-fsize <#> (Use to limit the size of the bigwig files)
-url <URL> (URL directory -no filename- to tell UCSC where to look)
-webdir <directory> (name of directory to place resulting bigWig file)
-update (overwrite bigwigs in the webDir directory, otherwise random numbers are
added to make the file unique)

Current url target (-url): http://homer.salk.edu/bigWig/
Current web directory (-webDir): /data/www/bigWig/

You're going to want to modify the $wwwDir and $httpDir variables at the top of
the makeBigWig.pl program file to accommodate your system so you don't have to
specify -url and -webdir all the time.

Command line options for makeMultiWigHub.pl

Script for automating the process of creating multiWig tracks

Usage: makeMultiWigHub.pl <hubname> <genome> [options] -d <tag directory1> [tag directory2]...

Special Options for bigWigs [choose one, don't combine]:
-normal (ChIP-Seq style, default)
-strand (Strand specific, for RNA-Seq and GRO-Seq)
-dnase (Special options for Crawford-lab style DNase-Seq)
-cage (Special options for CAGE/TSS-Seq)
-cpg (Special options for mCpG/CpG)

Other options:
Any options you want to pass to makeUCSCfile
!!Warning!!: do not try to specify "-strand separate" - use the special option above.
Also, for the genome, do NOT use repeat version (mm9r) - use mm9 instead

File options:
-force (overwrite existing hub)
-fsize <#> (limit the file size of the bigwig files to this value)
-url <URL> (URL directory -no filename- to tell UCSC where to look)
-webdir <directory> (name of directory to place resulting hub directory)

Current url target (-url): http://biowhat.ucsd.edu/hubs/
Current web directory (-webDir): /data/www/hubs/

You're going to want to modify the $wwwDir and $httpDir variables at the top of
the makeMultiWigHub.pl program file to accommodate your system so you don't have to
specify -url and -webdir all the time.

Can't figure something out? Questions, comments, concerns, or other feedback:
cbenner@salk.edu

Source: http://homer.ucsd.edu/homer/ngs/ucsc.html