Schloss Dagstuhl

Friday, February 27th, 2009 by hinrich

Dagstuhl Seminar

Last week I attended the seminar "Similarity-based learning on structures". Even though there were many links to biological problems, it was quite a challenge to follow all the presentations and discussions, as the field of machine learning is quite distant from my molecular biology education. Still, the discussions, the new contacts and especially the venue Schloss Dagstuhl made this one of the best seminars I have attended.

While it would lead too far to mention everything I learned, here are some of the remarks made during the seminar - they might be common sense to some people, but I find them important to keep in mind:

Even though some participants came from non-academic institutions and industry, the clear majority came from academia. It is unfortunate that scientific exchange between academic and non-academic scientists is not more common, as both sides could really benefit from such scientific brainstorming exercises.

Posted in Science

Data integration

Friday, February 06th, 2009 by hinrich

Data Integration

More and more molecular biology technologies generate multiple measurements / data points per sample. While we were able to "handle" technologies that multiplex or generate a few measurements, the situation is changing: Often we are now looking at hundreds or even thousands of measurements per sample for a given technology. Or better (worse?): equal amounts of data from multiple experiments (mRNA / miRNA / SNPs / CNVs / methylation / proteomics / metabolomics / ...) for a single sample.

The solution? Data integration. Or at least, that is what a lot of people currently believe. This way one would not only be able to look at, e.g., the significantly affected mRNAs in a given experiment, but would also immediately see the copy number state, known SNPs, ... , you name it.

I really would like to be convinced that this will indeed "solve" the way we deal with the increase in multivariate data that modern molecular biology tools deliver. How realistic is it that "normal" biologists (and I do not mean a few, but the majority) will be aware of multivariate data analysis techniques and their associated issues? The multiple testing problem and overfitting come to mind immediately...
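To make the multiple testing problem concrete, here is a minimal Python sketch (my own illustration, not from any tool mentioned above, assuming numpy and scipy are available): testing thousands of purely random "features" against two sample groups still yields hundreds of nominally significant hits at p < 0.05 unless a correction is applied.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_features, n_samples = 10_000, 20

# Pure noise: no feature is truly associated with the two groups,
# mimicking e.g. 10,000 mRNA measurements across 20 samples.
data = rng.normal(size=(n_features, n_samples))
groups = np.array([0] * 10 + [1] * 10)

# One t-test per feature between the two groups.
_, pvals = stats.ttest_ind(data[:, groups == 0],
                           data[:, groups == 1], axis=1)

# Naive thresholding: roughly 5% of 10,000 null features "pass".
naive_hits = int(np.sum(pvals < 0.05))

# Bonferroni correction: almost always zero false positives remain.
bonferroni_hits = int(np.sum(pvals < 0.05 / n_features))

print(naive_hits, bonferroni_hits)
```

The point is not the specific correction (Bonferroni is conservative; false-discovery-rate methods are common in profiling studies) but that without any correction at all, a naive analysis of pure noise produces a long list of "findings".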

So, what else could be done? I personally believe that the short- to mid-term solution is investment in statistically educated data analysis experts. Besides needing a basic understanding of the biology they will deal with (e.g. oncology / immunology / metabolic disorders / etc.), they need strong communication skills to explain and jointly interpret significant findings in the data together with the scientist who did the biological experiment.

While everyone is currently searching for opportunities to save money by outsourcing defined tasks - having someone code a "data integration" software package could be one... - I would favor hiring such people / investing in internal FTEs. Otherwise we will drown in data, or chase up findings that did not pass statistical/mathematical checks simply because the scientist was not aware of the pitfalls. And I would not be surprised if the technology were blamed later on, rather than potential shortcomings in the data interpretation.

Posted in Molecular Profiling