Examples of Cluster and Treeview software on Yeast cDNA and Human Affymetrix data
June 19, 2004
Brian Davis
Download Software (Cluster and TreeView) from Mike Eisen's site: http://rana.lbl.gov/EisenSoftware.htm
Unzip all.
I tested the latest versions (i.e., Cluster v2.2 and TreeView v1.6) and was NOT satisfied with the results (the tree was not made and the data appeared not to be clustered). Therefore, I used the oldest available version of Cluster (v1.4) and was satisfied with the results. The software was tested on a PC running Windows XP with approx. 750 MB RAM (I do not know if RAM is critical).
The software is easy to install: simply follow the instructions in the manual exactly (the manual can be downloaded from the same site).
Although the yeast data (for testing the software) can be downloaded from the same site, I went back to the original PNAS paper (Eisen, 1998) and downloaded the data from the PNAS site (the supplemental data referred to in the PNAS paper). This is the data referred to in the manual.
ONE BIG ERROR IN THE MANUAL: DO NOT TRANSFORM THE (YEAST) DATA!!! It has already been normalized and log transformed. For the yeast data provided, simply load the data and run the clustering. DO NOT LOG TRANSFORM.
I then downloaded a large dataset from the NCI (National Cancer Institute) site:
http://gedp.nci.nih.gov/dc/servlet/manager
Search the experiments with the search form (look for human microarrays) or simply search on experiment ID 196. We used this data (from the paper Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP, Lander ES, Loda M, Kantoff PW, Golub TR, Sellers WR. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002 Mar;1(2):203-9.)
Download the data and unzip all. It appears that the raw data is in the file "Prostate_TN_final0701_allmeanScale.res"
However, this data (unlike the yeast data) needs to be formatted before it can be used with Cluster. The alternating rows with "A" and "P" and "M" need to be removed (easily done in Excel). The data can then be loaded into Cluster and visualized in TreeView. I have done this: Cluster runs (in about 30-45 minutes) and the results can be displayed in TreeView. However, this form of the data is unlikely to uncover differences between cancer and normal tissue. The data needs to be transformed first.
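The formatting step described above (removing the "A", "P" and "M" calls) can also be scripted instead of done by hand. The following is a minimal Python sketch only. It ASSUMES the file is tab-delimited, that each line starts with two label fields, and that every call sits in its own field next to the expression value it belongs to. I have not checked this layout against the NCI file, and the function name strip_calls is made up for illustration. If the calls are laid out differently, the filter will need adjusting.

```python
# Affymetrix detection calls: Absent, Present, Marginal
CALL_FLAGS = {"A", "P", "M"}


def strip_calls(line, n_label_cols=2):
    """Return one tab-delimited line with the A/P/M call fields removed.

    ASSUMPTION: the first n_label_cols fields are labels (left untouched)
    and every call is a separate field holding exactly "A", "P" or "M".
    """
    fields = line.rstrip("\n").split("\t")
    labels = fields[:n_label_cols]
    # Keep every remaining field that is not a bare call flag.
    values = [f for f in fields[n_label_cols:] if f.strip() not in CALL_FLAGS]
    return "\t".join(labels + values)


# Made-up example line, not taken from the real file.
example = "some gene\tAFFX-1\t-12\tA\t305\tP\t44\tM\n"
print(strip_calls(example))
```

Header lines would still need to be checked by hand, since their layout may differ from the data lines.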
Affymetrix data is typically examined in a series of steps. According to Kam Dahlquist, the typical transformation of "older" Affymetrix data is:
For Affy values from MAS 3.1 or 4.0, NOT 5.0
1. Start with Affymetrix “Average Difference” values from GeneChip Analysis software (these are apparently the values provided)
2. “Truncate” values: round values <20 up to 20. Equation =+IF(A2<20,20,A2)
3. Log (“truncated”) values: take log base 2 of individual (“truncated”) values. Equation =LOG(B2,2)
4. Log mean: compute the mean of the truncated/logged values within a group. Equation =AVERAGE(A2:B2)
5. Geometric mean: raise 2 to the power of the log mean. Equation =POWER(2,C2)
6. Log ratio: subtract the log mean of the control group from the log mean of the experimental group. Equation =[log mean exp group]-[log mean control group]
I HAVE NOT TESTED THIS!!!!
These steps can be performed in Excel but are tedious. Scripting with Perl would probably be faster.
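As a rough illustration of what such a script might look like, here is a minimal sketch of steps 2 through 6 for a single gene. It is written in Python rather than Perl purely for illustration. The function names and the example values are made up, and like the steps themselves this has NOT been run against the real prostate data.

```python
import math

FLOOR = 20.0  # step 2: values below 20 are rounded up to 20


def log2_truncated(value, floor=FLOOR):
    """Steps 2-3: truncate at the floor, then take log base 2."""
    return math.log2(max(value, floor))


def log_mean(values, floor=FLOOR):
    """Step 4: mean of the truncated, logged values within one group."""
    logged = [log2_truncated(v, floor) for v in values]
    return sum(logged) / len(logged)


def geometric_mean(values, floor=FLOOR):
    """Step 5: raise 2 to the power of the log mean."""
    return 2.0 ** log_mean(values, floor)


def log_ratio(experimental, control, floor=FLOOR):
    """Step 6: log mean of experimental group minus log mean of control group."""
    return log_mean(experimental, floor) - log_mean(control, floor)


# Made-up "Average Difference" values for one gene (not real data).
tumor = [80.0, 320.0]
normal = [10.0, 80.0]  # the 10 is raised to 20 before logging

print("geometric mean, tumor: ", geometric_mean(tumor))
print("geometric mean, normal:", geometric_mean(normal))
print("log ratio:", log_ratio(tumor, normal))
```

With these made-up numbers the log ratio comes out at about 2, i.e. roughly a fourfold difference between the two groups. For a whole file, the same functions would be applied to each row after splitting the columns into experimental and control groups.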