Science | Hinrich Goehlmann | http://goehlmann.info
Tue, 23 Dec 2014 11:59:13 GMT

Annotated data lake

During a discussion with colleagues on the need to make our various data sources easily accessible for integrated analyses, we of course considered recent trends in IT ("Big Data", "The Cloud", "High Performance Computing", ...). It still strikes me how difficult it is to see the whole picture. Technological concepts such as the "Data Lake" make perfect sense in this context: conceptually, they enable analyses that were impossible in the past because the various data types were simply not available for a joint analysis. Still, data analysts frequently spend a lot of time chasing down the correct annotation of their data: Which exact chemical structure was tested? In which well of a microtiter plate was which sample tested? Where are the controls located? Did this experiment test exactly the same compound, or do I only have a generic name and therefore cannot know which chemical structure was profiled?

Spending time to answer these questions is a necessary evil to ensure that analysis results are likely to be useful. However, this effort is typically neither considered nor even recognized when people think about data analysis.

So I coined a new term on the spot to reflect what we really need: an "Annotated Data Lake". (A quick Google search did not give any hits, so I am curious whether that will change, as a number of my colleagues seemed to like the term...) We need to make a substantial effort today to enable big data analytics and integrated data analysis tomorrow.

Science, Tue, 23 Dec 2014 11:59:13 GMT

Innovative technologies

I am continually confronted with the following balancing act: easy to describe, but difficult to judge, decide and act upon...

One side looks for new technologies (also called innovation) that can provide us with new scientific insights into the compounds we would like to develop into new medicines. Here, elements such as cutting-edge technology, competitive advantage, novel types of insight, automation and throughput carry a lot of weight.

The other side (which you could simply summarize as the more conservative aspects) looks at applicability. What kind of data are produced? What do they mean? Do we fully understand how to translate the novel data and information into knowledge and understanding about our chemical structures?

One example is technologies or approaches that provide us with information about the polypharmacology of compounds, something we increasingly obtain with modern high-dimensional biology technologies. What do measurements mean that, e.g., implicate an effect at the molecular level in a context we cannot directly link to the desired effect? Will they end up harming further development by raising concerns that may not be justified?

Such new approaches are often positioned at early stages of the discovery pipeline and accordingly use simple model systems (e.g., cell lines). This frequently leaves us with the question of how such findings will translate to the human situation. Furthermore, even if undesired effects are correctly indicated, it is usually impossible to predict at what concentration the desired therapeutic effect will be seen in humans. And at a concentration sufficient to induce the desired therapeutic effect, the detected undesired effects might not yet be induced to an extent that would cause concern.

However, while such considerations are important, it is also necessary to keep exploring new ways of obtaining important information on compounds early, to help prioritize the structures with the highest chances of success. Tricky...

Science, Thu, 05 Jun 2014 12:46:24 GMT

Scientific peer review of research concepts

I regularly hear people complain about meetings. They can be inefficient, especially when one or more participants wear a black hat, i.e., basically just criticize without contributing to solving the issue.

Having co-authored a number of scientific articles and currently working on another one, I just realized that, like many other authors of such articles, I was thinking about the peer review process. Who will read the article? What will they criticize?

Just as Edward de Bono attempted to improve brainstorming with his concept of the six thinking hats, I wonder whether we are wasting a lot of money and time by doing things in the wrong order in the peer review of scientific research.

Why don't we have a scientist come up with an idea or concept and have that peer-reviewed first? Once such a review has established which experiments should be done to test a certain hypothesis, the next step would only be to review the results and their execution. This would avoid the reviewer lottery, where you never know whether a reviewer will agree with your point of view or whether you will be assigned a reviewer who will only criticize your work.

And rather than replacing the old system with such a concept, introduce it as an option. Allow people to continue with the old system, but give researchers the opportunity to be sure of getting their work published by providing them with a peer-reviewed-concept token that tells the journal editor that the approach is solid. Editors can then still decide whether the content and/or results are suitable for a given journal.

Because criticisms can always be found in scientific research...

Science, Thu, 01 Aug 2013 16:02:33 GMT

Dagstuhl seminar on early drug design

Having attended a Dagstuhl seminar in 2009, I am happy to be back at this great location, this time as co-organizer of the seminar "Computational Methods Aiding Early-Stage Drug Design". We have already had two days of intense discussions, and I am grateful to the scientific council of Dagstuhl for accepting this seminar, even though its content is somewhat removed from the kind of seminars typically organized at this well-known meeting place for computational scientists.

Even though the list of participants is not long (there are 20+ of us), the attendees are very diverse and the discussions have already been very stimulating. Just a few remarks that stuck in my mind:

• The relevance of making datasets available to the scientific community that better reflect the situation at pharmaceutical companies, where roughly 90% of the compounds from a screen are inactive in a given primary assay.
• The difficulty of sampling chemical space...
• How to define biological activity and especially novel biological activity?

Science, Tue, 21 May 2013 20:58:48 GMT

Paul J Lewi has died

I have just learned from a colleague that Dr. Paul J. Lewi died on August 28th. I knew Paul through his introduction to partial least squares, numerous discussions on finding new applications for his spectral map approach (especially for gene expression data), and various other scientific discussions.

While I did not know Paul very well personally, I always valued his insights and his broad overview of statistical concepts, something that is very important to me as a non-statistician who relies heavily on statistical techniques to analyze and interpret the biological data we generate.

Spectral mapping (you can find more information on a website that he created together with Luc Wouters) has become a fundamental tool in my work with high-dimensional data. Having started out as a tool for QC, it quickly moved on to providing us with insights into biological effects within high-dimensional data sets. Nowadays we are exploring the use of spectral mapping for defining metagenes. In other words, we continue to rely heavily on his work, which is maybe the best way to remember Paul.

Science, Mon, 03 Sep 2012 18:00:22 GMT