Software for data interpretation and knowledge creation

Tuesday, April 08th, 2008 by hinrich

Technology advances lead to the generation of ever-increasing amounts of data. First-wave software merely operates the instrumentation and/or captures the data generated by a given experiment. Second-wave software helps with data preprocessing and provides tools to identify statistically significant data points in the whole data set.

What is still largely lacking are third-wave programs that are very simple to use for people with deep scientific domain knowledge who are not necessarily computer-minded. Because second-wave software often requires some level of programming skill (a good example is the increasingly popular open source software R) or a higher-than-scientist-normal level of data analysis skill (traditionally called "bioinformatics"), the flow from data analysis to data interpretation is interrupted. Third-wave programs are needed to capture the scientific interpretation.

Ideally, such third-wave programs should be built so that they are aware of fourth-wave software: the tools that ultimately generate knowledge and provide scientists with new hypotheses. For those programs to work effectively and require as little human input as possible, third-wave tools have to control and ensure that scientific conclusions are captured in a computer-understandable way. Many mechanisms come to mind: a controlled vocabulary, enforced use of official gene symbols (instead of the very many synonyms), enforced spell checking, and perhaps even showing the scientist a view of how a computer sees his/her input.

This is a completely different approach from what many software companies nowadays proclaim as the next big thing: data integration. If the scientific interpretation is lacking, merely integrating data will be of limited use.
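To make the controlled-vocabulary idea concrete, here is a minimal sketch of how a third-wave tool might normalize scientist-entered gene names to official symbols. The symbol set, synonym table, and function name are illustrative assumptions invented for this example, not part of any real tool or nomenclature database:

```python
# Illustrative sketch: enforce official gene symbols at input time.
# OFFICIAL_SYMBOLS and SYNONYMS are tiny made-up samples; a real tool
# would load them from an authoritative nomenclature resource.
OFFICIAL_SYMBOLS = {"TP53", "EGFR", "BRCA1"}

SYNONYMS = {
    "p53": "TP53",      # common literature alias
    "HER1": "EGFR",     # historical receptor name
    "ERBB1": "EGFR",
}

def normalize_gene(name: str) -> str:
    """Return the official symbol for an entered gene name, or raise
    ValueError so the tool can ask the scientist for a correction."""
    candidate = name.strip()
    if candidate in OFFICIAL_SYMBOLS:
        return candidate
    if candidate in SYNONYMS:
        return SYNONYMS[candidate]
    raise ValueError(f"Unknown gene name: {name!r}; please use an official symbol")
```

The point is that an interpretation like "p53 is upregulated" would be stored as a statement about TP53, so that downstream fourth-wave software can reason over it without guessing which synonym the scientist meant.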

Posted in Science