TiMAT2 Usage

This is an outline of how to use TiMAT2 (aka T2) to process your tiling microarray data.

The applications are designed to be user friendly and require only moderate familiarity with command line programs. When in doubt, type 'java -jar pathTo/T2/Apps/ApplicationName' to pull up a menu of options. See the readme file for installation instructions.

General Recommendations:

  1. I think it is important to have three replicas for every experiment. These don't necessarily have to be biological replicas, e.g. three different isolates of a fibroblast cell line processed in parallel. Technical replicas are preferred in most cases, e.g. pool multiple plates of fibroblasts, split the pool after fixing/lysing/preparing chromatin, and process the aliquots in parallel. Technical replicas are extremely informative. They provide a measure of the sample prep/microarray processing consistency without any confounding biological variability.
  2. If you choose to go with only two replicas, be prepared for lots of qPCR validation. The symmetric null method of confidence estimation works reasonably well for non complex organisms but rarely for human or mouse. Three IPs and three input replicas are the minimum needed for a random label permutation confidence estimation.
  3. It is rather critical to perform all of these preps using the same reagents, same equipment, at the same time, etc. Be prepared to throw out all of the data generated during the pilot/optimization phase of the experiment and repeat it alongside the other samples. If you do plan to make use of the pilot data, prep/IP/amplify the other technical replicas in parallel and freeze them while deciding whether the pilot worked. If it did, run the matched samples. One good way to kill a tiling experiment is to prep samples on different days using different reagents and expect some sort of magic to happen upon hybridization and analysis. Technical inconsistency is the primary cause of tiling microarray failure.
  4. For two color arrays, there is no need to have a matching input control on every chip. You can treat the different colors independently and think of them as a way of getting two readouts from the same chip. For example, if I performed a ChIP-chip experiment with 4 antibodies on the same fibroblast cells, I would use 9 chips: (4 antibodies + 1 input + 1 mock IP) x 3 technical replicas = 18 labeled samples, two per chip. For each antibody, I would prepare three technical replicas, label them with Cy3 or Cy5, and place them on different chips. I would do likewise with one input sample split into three technical replicas. Finally, I would perform mock IPs, again as three technical replicas, using IgG or a non specific antibody, e.g. anti-GST or anti-FLAG. See below.
  5. For your experiments involving an IP, consider performing a mock IP using IgG or an antibody that doesn't bind anything within your sample, e.g. anti-GST or anti-FLAG. It is best to use a type matched antibody. TiMAT2 is set up to generate empirical FDRs based on the mock IP, which is a great way to estimate confidence in your real IP data without having to perform exhaustive qPCR. One consideration, though: the mock IP must be performed in parallel with the real IPs. Subtle differences in washing and amplification really affect the output.
  6. Don't over wash your beads during the IP. Use the minimum number of washes (3x 5min?) that gives good fold enrichment for known targets by qPCR. Over washing can lead to a PCR bottleneck effect, where a small number of regions get overly amplified and look exactly like real regions on the microarray even though nothing specific was pulled down by the IP.
  7. You might want to prepare twice the amount of chromatin you will need to be sure to not run out if and when parts of the experiment need to be repeated.
  8. Save enough pooled pre amplification real IP and input material for ~50 qPCR rxns each.

For ChIP-chip experiments:

  1. Perform:
  2. For Affy data, skip mismatch transformation for ChIP-chip experiments. Just median scale intensity values to 100 and quantile normalize across all chips.
  3. Use the Correlate app to check the correlation coefficient between your replicas and the appropriate clustering of like chips. For IPs, the r^2 values between replicas should be > 0.5 (or 50 for 100 * r^2); typically they should be > 0.8. If they are lower than these values, your IP/PCR/labeling protocols must be optimized. It cannot be emphasized enough: do not proceed with the full experiment until these values have been achieved. Otherwise, considerable time and money will be wasted. Garbage in = garbage out. Speckles and spots on the chip can throw off the correlation coefficient. If this is the case, use full quantile normalization and the robust pseudo median window scanning statistic.
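
The scaling, normalization, and correlation check described above can be illustrated with a short Python sketch. This is not TiMAT2 code and does not reproduce what its apps do internally. The function names (median_scale, quantile_normalize, r_squared) are hypothetical, the intensities in the usage note are invented, and ties are broken by position for simplicity.

```python
from statistics import mean, median


def median_scale(intensities, target=100.0):
    """Scale one chip so that its median intensity equals `target`."""
    factor = target / median(intensities)
    return [value * factor for value in intensities]


def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    `chips` is a list of equal-length lists, one list per chip.
    The value at each rank is replaced by the mean across chips
    of the values at that rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [mean(column) for column in zip(*sorted_chips)]
    normalized = []
    for chip in chips:
        # Probe indexes ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two chips."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)
```

A replica pair could then be checked with something like r_squared(*quantile_normalize([median_scale(rep1), median_scale(rep2)])) and compared against the 0.5 and 0.8 guidelines above, where rep1 and rep2 are lists of raw intensities in the same probe order.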

Required Resources:

  1. Java version 1.4 or greater.
  2. These command line applications have been tested on MacOSX and Linux. They have not been tested on a Windows machine. All the source code, with extensive documentation, is included in the TiMAT2 package.
  3. A computer with > 2 gigabytes of RAM. One can use a machine with less memory, but some programs will run rather slowly, taking hours instead of minutes.

Processing Protocol:

Tiling analysis follows five basic steps:

  1. Obtain or make map file(s), specific to a genome release, to associate microarray intensities with genomic coordinates
  2. Normalize raw microarray intensities
  3. Map normalized intensities to the genome
  4. Use window smoothing to create an array of window summary scores
  5. Identify enriched (and/or reduced) regions by picking a threshold summary score and merging overlapping windows that exceed the threshold
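
Steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration of the general idea, not the algorithm used by the TiMAT2 apps. The function names, the choice of a plain median as the window summary score, and the use of probe start positions as window boundaries are all assumptions.

```python
from statistics import median


def window_scores(positions, values, window_size):
    """Return one (start, end, score) window anchored at each probe.

    `positions` are sorted probe start coordinates on one chromosome and
    `values` are the matching normalized scores. Each window covers the
    probes starting in [start, start + window_size) and is scored with
    their median. `end` is the start coordinate of the last probe in
    the window.
    """
    windows = []
    n = len(positions)
    right = 0
    for left in range(n):
        start = positions[left]
        while right < n and positions[right] < start + window_size:
            right += 1
        score = median(values[left:right])
        windows.append((start, positions[right - 1], score))
    return windows


def merge_windows(windows, threshold):
    """Merge overlapping windows whose score meets `threshold`.

    Returns (start, end, best_score) intervals.
    """
    intervals = []
    for start, end, score in windows:
        if score < threshold:
            continue
        if intervals and start <= intervals[-1][1]:
            prev_start, prev_end, best = intervals[-1]
            intervals[-1] = (prev_start, max(prev_end, end), max(best, score))
        else:
            intervals.append((start, end, score))
    return intervals
```

For example, with probes at 0, 10, 20, 30, 100, and 110 carrying scores 1, 5, 6, 5, 1, and 1, a window size of 25 and a threshold of 4 collapse the four passing windows into the single interval (0, 30, 5.5).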

Obtaining Map files

  1. TiMAT2 uses the Affymetrix tpmap file format. This is a text version of the binary probe map file, the bpmap. A bpmap can be converted to a tpmap using the bpmap2tpmap command line utility. Since this tool is not officially released, ask your Affy rep for the tpmap version of the officially released bpmaps.
  2. TPMaps can be generated from the NimbleGen NDF files using the ConvertNimblegenNDF2TPMap app.
  3. TPMaps might be generated from the Agilent txt results file, provided sequences are present, using the ConvertAgilentData app.
  4. You can create your own. It's a simple tab delimited txt file containing:
    1. Two comment lines: '#seq_group_name' and '#version'
    2. Oligo sequence (5'->3', sense strand)
    3. t or f (boolean describing whether the real oligo on the microarray will hybridize with the sense target strand)
    4. Chromosome
    5. Start position of the oligo, zero base coordinates
    6. Perfect match X microarray coordinate
    7. Perfect match Y microarray coordinate
    8. (Optional) Mismatch X microarray coordinate
    9. (Optional) Mismatch Y microarray coordinate
    10. (Optional) Mystery number 1 (for backward compatibility?)
  5. Example of a Sense, PM only tpmap
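
The column layout listed above can be checked with a short Python parser. This is a sketch under the assumptions stated in this guide, not TiMAT2 code: the names TpmapRow and parse_tpmap are hypothetical, the trailing optional "mystery number" column is ignored, and the rows in SAMPLE are invented for illustration rather than taken from a real tpmap.

```python
from typing import NamedTuple, Optional


class TpmapRow(NamedTuple):
    sequence: str               # oligo sequence, 5'->3', sense strand
    hybridizes_to_sense: bool   # the t / f column
    chromosome: str
    start: int                  # zero based start position
    pm_x: int
    pm_y: int
    mm_x: Optional[int] = None  # absent in PM only files
    mm_y: Optional[int] = None


def parse_tpmap(lines):
    """Parse tab delimited tpmap lines, skipping '#' comment lines."""
    rows = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"line {number}: expected at least 6 columns")
        if fields[1] not in ("t", "f"):
            raise ValueError(f"line {number}: strand flag must be t or f")
        has_mm = len(fields) >= 8
        rows.append(TpmapRow(
            sequence=fields[0],
            hybridizes_to_sense=(fields[1] == "t"),
            chromosome=fields[2],
            start=int(fields[3]),
            pm_x=int(fields[4]),
            pm_y=int(fields[5]),
            mm_x=int(fields[6]) if has_mm else None,
            mm_y=int(fields[7]) if has_mm else None,
        ))
    return rows


# Invented PM only rows, for illustration only.
SAMPLE = (
    "#seq_group_name\texample_genome\n"
    "#version\t1\n"
    "ACGTACGTACGTACGTACGTACGTA\tt\tchr1\t100\t5\t7\n"
    "TTGACCTTGACCTTGACCTTGACCT\tt\tchr1\t135\t6\t7\n"
)

rows = parse_tpmap(SAMPLE.splitlines())
```

Raising an error on malformed lines, rather than skipping them, is a deliberate choice here: a silently dropped probe in a map file is hard to notice later.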

Building Map Files

Building map files is a very complex process fraught with many pitfalls. It should only be performed by tiling microarray professionals.
  1. Obtain fasta files for the genome you wish to map. Complex organisms should be RepeatMasked with X's or N's, not lower case.
  2. Convert the fasta headers into the convention used by UCSC and IGB (e.g. change '> chromosome 1 : NCBI build 35.1' to '>chr1').
  3. (Optional) Obtain fasta files of control sequences (bacterial, Arabidopsis, intergenic regions...) and include these with the genomic sequences. You may also like to create one particular fasta file and call it chrCtrls.fasta. This can be used in median scaling during array normalization.
  4. Create or obtain a xxx.1lq file. This can be obtained directly from your Affy rep. Do not use a rotated 7G specific xxx.1lq file. For NimbleGen and Agilent, you will need to create it. The PrintSelectColumns app may be helpful: you'll need to copy and paste entire columns, and most spreadsheet applications cannot handle map files this large. You may also need to transform the array coordinates into zero based coordinates. It is a tab delimited txt file containing:
    1. A variety of header lines, of which only one is required: 'X Y Seq Destype'
    2. X coordinate on the array (zero based)
    3. Y coordinate on the array (zero based)
    4. Oligo sequence (the original 1lq format specified 3'-5', you can leave it 5'-3' and throw the appropriate flag in MummerMapper)
    5. Destype (the original 1lq format specified a number used to describe the class of oligo, PM, MM, S, AS, Bright Control, Dark Control, etc. You can put anything here and use a regular expression in MummerMapper to select appropriate probes to map.)
  5. Example of a custom xxx.1lq file that can be read by MummerMapper:
    X       Y       Seq     Destype
    0       3       TATATTTCTGCATATACATCATAGTCTCTAAAAACTGTACTAGGTCATACTCCATAGAGG    Spo_P101594-R|Spombe|complement_chr2:8636..8695
    0       4       AGTTATGACGAGCTTTGGAACGTTTATGACGACTGTTAGGATCGCAGTTTCCTGAAGTCT    Spo_P110716-R|Spombe|complement_chr2:511391..511450
    0       6       GACAATAATTTGGCGGCGCTGCTACCAATATAGCTGGAAGCAGTAAAGTATTAATCGTGC    Spo_P043768-R|Spombe|complement_chr1:2408231..2408290
    0       7       TAGGCTAAGCATCGAGTTCATGGCTAAATTAATAAAAGACTTTAAAAGTTTGTTAAAATG    Spo_P000762-R|Spombe|complement_chr1:42901..42960
    0       8       CCACGACCTTTTCCGTGAGTCCGGGGGTTGTAAGTCTTCTTTATACCGTAGCGGGTAGAA    Spo_P041016-R|Spombe|complement_chr1:2256871..2256930
    0       9       CATCAGCTATGAACTTTCACTTTTGTCTACAACTTTTTTAATCAGCTACAAATTTACAAA    Spo_P042235-R|Spombe|complement_chr1:2323916..2323975
  6. Run the MummerMapper app. Questions to address:
  7. Test, test, and test the finished tpmap using known datasets. One mistake may create hundreds of hours of analysis headaches later on. Building tpmaps is a lot like taking pictures for your friend's wedding. Sounds easy but mess it up and you'll be mistrusted for life.
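
The fasta header conversion in step 2 can be scripted. The Python sketch below handles only the header style shown in that step ('> chromosome 1 : NCBI build 35.1'); the function names and the regular expression are assumptions, and headers from other sources will need their own pattern. Headers that do not match are left unchanged.

```python
import re

# Matches headers such as "> chromosome 1 : NCBI build 35.1".
HEADER_PATTERN = re.compile(r"^>\s*chromosome\s+(\w+)", re.IGNORECASE)


def convert_header(line):
    """Rewrite one fasta header to the '>chrN' convention if it matches."""
    match = HEADER_PATTERN.match(line)
    if match:
        return ">chr" + match.group(1)
    return line


def convert_fasta(lines):
    """Convert the header lines of a fasta file; sequence lines pass through.

    Trailing newlines are stripped, so add them back when writing out.
    """
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        converted.append(convert_header(line) if line.startswith(">") else line)
    return converted
```

After converting, list the resulting headers and confirm by eye that every chromosome name is what the downstream tools expect.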

Convert Raw Microarray Data Into xxx.cela Files

To use T2, you must convert your raw microarray intensity data into a format that TiMAT can read: serialized binary Java float[][] files. These are compressed, load quickly, and are used by multiple TiMAT2 applications.

Option 1: Process your tiling data using the T2 application (Standard Analysis)

The T2 application is a wrapper for many TiMAT2 applications and can work with a cluster to farm out big jobs for rapid analysis.
  1. Convert your raw intensity data to binary xxx.cela or text xxx.cel files, see above.
  2. Create processed TPMapFiles by running the TPMapProcessor app on your tpmap file(s).
  3. Complete the t2ParamaterFile and launch the T2 app.

Option 2a: Process your single chip set tiling data using individual TiMAT applications (Custom Analysis)

This approach can be used to customize your analysis for non standard tiling experiments. Many options exist here that are not available in the T2 app.
  1. Convert your raw intensity data to binary xxx.cela files, see above.
  2. Create processed TPMapFiles by running the TPMapProcessor app on your tpmap file(s). Pay close attention to the window size option.
  3. (Optional) Run the VirtualCel application on a directory of converted cel files (xxx.cela) to create image files for visual inspection. Any large spots should be masked using the CelMasker app.
  4. Run the CelProcessor app on sets of xxx.cela files to scale, normalize, and map the data. This application generates xxx.celp files.
  5. (Optional) Use the ScatterPlot app to draw a simple scatter plot and calculate a Pearson correlation coefficient on processed cel files. For example, compare the different treatment chips to one another. There should be a good internal correlation (>= 0.8).
  6. (Optional) Use HierarchicalClustering to cluster your xxx.celp files. Any file that fails to cluster with an r^2*100 value of > 50 should be removed from the analysis.
  7. Run ScanChip
  8. (Optional) Run ScanGenes to perform a basic expression array analysis using your tiling data.
  9. If you have performed mock IPs, run the FDRWindowConverter app to calculate an empirical FDR for each window.
  10. Merge high scoring, overlapping Windows into an array of Intervals with the IntervalMaker application. The biggest difficulty is deciding where to set the threshold for merging windows. Two confidence estimations are provided by TiMAT2: an empirical FDR based on a mock IP and a statistical FDR based on Richard Bourgon's non-parametric symmetric p-test. Set these generously and manually filter. If you have multiple replicas, or have used different antibodies, you can merge the different Window arrays using the MultiWindowIntervalMaker app.
  11. (Optional) The SetNumberIntervalMaker can also be used to make multiple interval arrays, each containing a specific number of intervals. This is useful, and recommended, for analysis independent of thresholds.
  12. Load the interval array(s) with oligo information using the LoadIntervalOligoInfo app.
  13. For ChIP-chip data, find the best average intensity difference sub window within each Interval, as well as enrichment peaks, using FindSubBindingRegions.
  14. (Optional) Score Intervals for the presence of a transcription factor using ScoreIntervals. You can also score whole chromosomes or a set of FASTA sequences using ScoreChromosomes and ScoreSequences respectively.
  15. (Optional) Filter Intervals with a variety of parameters using IntervalFilter. Sorts intervals into pass and fail.
  16. (Optional) Compare and split different Interval sets based on their overlap with the OverlapCounter. Use the IntersectRegions app for a more robust and sophisticated analysis.
  17. Print results:

Option 2b: Process your multi chip set tiling data using individual TiMAT applications (Custom Analysis)

Multi-chip set analysis is very similar to single-chip set analysis and is the pipeline wrapped by the T2 app. The key differences are noted below.
  1. Set the -r flag when running the CelProcessor app to break the mapped intensities down by chromosome. The results in this case will be a directory instead of a single xxx.celp file for each array.
  2. Run the MakeChromosomeSets app to merge the chromosome specific normalized data directories together.
  3. Run ScanChromosomes instead of ScanChip.
  4. Run MergeWindowArrays if you processed individual chromosomes in ScanChromosomes.
  5. Use the LoadChipSetIntervalOligoInfo instead of the LoadIntervalOligoInfo.

Useful higher level analysis and utility applications

Be sure to check out the General Analysis and Utility applications. They might save you days of coding.