R / Bioconductor - related material for microarray data analysis

I am currently updating this page...

Affymetrix summarization techniques

Gene expression data generated by Affymetrix microarrays require several pre-processing steps: background correction, normalization, and summarization. The summarization step takes the measurements of the multiple probes in each probe set and combines them into a single expression value, typically by some form of averaging. This step has been studied extensively and is also one of my interests.
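To make the idea of combining probes into one value per probe set concrete, here is a minimal sketch of one robust summarization scheme, Tukey's median polish, which is the scheme RMA uses for this step. It is an illustration only, not the code of any particular package: the function name `median_polish_summary`, its parameters, and the toy data are assumptions made for this example, and the input is assumed to be already background-corrected, normalized, and log2-transformed.

```python
from statistics import median


def median_polish_summary(log_probes, max_iter=10, tol=1e-6):
    """Summarize one probe set with Tukey's median polish.

    log_probes: list of rows, one row per probe, one column per array,
                holding log2 intensities (hypothetical input layout).
    Returns one summarized log2 expression value per array.
    """
    n_probes = len(log_probes)
    n_arrays = len(log_probes[0])
    resid = [row[:] for row in log_probes]   # working copy of the residuals
    overall = 0.0                            # overall level of the probe set
    probe_eff = [0.0] * n_probes             # per-probe affinity effects
    array_eff = [0.0] * n_arrays             # per-array expression effects

    for _ in range(max_iter):
        # Sweep the median of each probe (row) into the probe effects.
        row_med = [median(row) for row in resid]
        for i in range(n_probes):
            resid[i] = [v - row_med[i] for v in resid[i]]
            probe_eff[i] += row_med[i]
        shift = median(array_eff)
        array_eff = [a - shift for a in array_eff]
        overall += shift

        # Sweep the median of each array (column) into the array effects.
        col_med = [median(resid[i][j] for i in range(n_probes))
                   for j in range(n_arrays)]
        for i in range(n_probes):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_arrays)]
        array_eff = [a + c for a, c in zip(array_eff, col_med)]
        shift = median(probe_eff)
        probe_eff = [p - shift for p in probe_eff]
        overall += shift

        # Stop once a full sweep no longer changes anything noticeably.
        if max(abs(v) for v in row_med + col_med) < tol:
            break

    return [overall + a for a in array_eff]


# Toy probe set: 5 probes on 2 arrays; the last probe has an outlier on array 2.
toy_probe_set = [
    [8.0, 10.0],
    [9.0, 11.0],
    [7.0, 9.0],
    [8.0, 10.0],
    [8.0, 20.0],
]
summary = median_polish_summary(toy_probe_set)
```

Because medians are used instead of means, a single misbehaving probe has little influence on the summarized value, which is one reason robust schemes are popular for this step.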

FARMS and I/NI-calls

Besides using the FARMS algorithm for summarization, we have further developed one characteristic of the approach to derive a qualitative value that helps the scientist decide whether a given gene was informative for an experiment or not (I/NI-calls).

Read more about FARMS and I/NI-calls on this page.

Small sample RMA

Sometimes one would like to generate gene expression data from samples that contain only very few cells. As Affymetrix microarrays require a certain minimal amount of messenger RNA, one option is to amplify the small amount of starting material in a linear way, so that the relative abundance of the different transcripts is preserved. One standard protocol often used for this process introduces a bias into the data, which was addressed with a modified version of RMA.

R code as well as the data used for the analysis presented in the Biotechniques paper which discusses this RMA variant can be found on the sRMA page of Rafael Irizarry.

Influence of the summarization technique on gene set analysis

Does it really matter which summarization technique is used when one does not do a gene-by-gene analysis but rather looks at differences between two treatment groups on the level of sets of genes?

The following paper looks into this and has been published in Bioinformatics:

Quality control: Spectral map analysis

Spectral map analysis is a visualization tool we routinely use to assess the quality of a microarray experiment. In short, it visualizes samples as squares and genes as circles in a 2-dimensional plot. The two dimensions (x-axis and y-axis) capture the largest sources of variability in the dataset. As spectral map analysis uses a data preprocessing step that removes the information about the absolute signal intensity, the size of the symbols reintroduces this information by visualizing the average across samples (for the size of the circle representing a given gene) or the average across all genes (for the size of the square representing a given sample). If an experiment worked well and the gene changes induced by the treatments are larger than other technical sources of variability, the replicate samples of a given treatment should be positioned in close proximity. Similarly, the farther apart two samples are, the more they differ with respect to the variability explained by the x-axis and the y-axis.

Spectral Map
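The preprocessing and the symbol sizes described above can be sketched in a few lines. This is a rough illustration, not the code of the spm/MPM package: it assumes the preprocessing is a log2 transform followed by double-centering (subtracting gene and sample means), which is how spectral map analysis is commonly described, and it stops before the projection onto the two plot axes. The function name `spectral_map_preprocess` and the toy data are made up for this example.

```python
import math


def spectral_map_preprocess(intensities):
    """Log-transform and double-center a genes x samples intensity matrix.

    intensities: list of rows, one row per gene, one column per sample,
                 holding positive signal intensities (hypothetical layout).
    Returns (centered, gene_means, sample_means):
      centered     - matrix with absolute intensity information removed
      gene_means   - average log2 signal per gene (could size the circles)
      sample_means - average log2 signal per sample (could size the squares)
    """
    logged = [[math.log2(v) for v in row] for row in intensities]
    n_genes = len(logged)
    n_samples = len(logged[0])

    gene_means = [sum(row) / n_samples for row in logged]
    sample_means = [sum(logged[i][j] for i in range(n_genes)) / n_genes
                    for j in range(n_samples)]
    grand_mean = sum(gene_means) / n_genes

    # Double-centering: remove gene and sample averages, add back the grand mean,
    # so every row and every column of the result averages to zero.
    centered = [[logged[i][j] - gene_means[i] - sample_means[j] + grand_mean
                 for j in range(n_samples)]
                for i in range(n_genes)]
    return centered, gene_means, sample_means


# Toy data: 3 genes measured on 3 samples.
toy_intensities = [
    [100.0, 200.0, 400.0],
    [50.0, 50.0, 100.0],
    [800.0, 400.0, 200.0],
]
centered, gene_sizes, sample_sizes = spectral_map_preprocess(toy_intensities)
```

After this step only contrasts between genes and between samples remain, so a gene that is uniformly brighter on all arrays looks the same as a dimmer one with the same pattern. That is why the average intensities are kept separately and reintroduced through the symbol sizes.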

The GPL-licensed R-package can be found on the SPM page of Luc Wouters.

We are currently also setting up a project on R-Forge for the maintenance and future development of the spm package. The package has now been renamed "Multivariate Projection Methods" (MPM), and new versions are expected to appear on CRAN.

http://mpm.r-forge.r-project.org/

More details on using spectral map analysis for gene expression data can be found in the following publications: