- Centering, scaling and transformations: improving the biological information content of metabolomics data.
- BMC Genomics
- Faculty of Science (FNWI)
- Swammerdam Institute for Life Sciences (SILS)
Background: Extracting relevant biological information from large data sets is a major challenge
in functional genomics research. Different aspects of the data hamper their biological
interpretation. For instance, 5000-fold differences in concentration for different metabolites are
present in a metabolomics data set, while these differences are not proportional to the biological
relevance of these metabolites. However, data analysis methods are not able to make this
distinction. Data pretreatment methods can correct for aspects that hinder the biological
interpretation of metabolomics data sets by emphasizing the biological information in the data set and thus improving their biological interpretability.
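As a minimal sketch of the problem described above, consider two metabolites that differ 5000-fold in concentration but fluctuate by the same relative amount. The numbers are hypothetical and serve only as illustration; they are not taken from the study. On the raw data, nearly all of the variance belongs to the abundant metabolite, so a variance-based method such as PCA would focus on it regardless of biological relevance.

```python
from statistics import variance

# Hypothetical concentrations (arbitrary units) in four samples.
# Both metabolites vary by +/-10 % around their mean, but the
# abundant one is 5000 times more concentrated than the scarce one.
abundant = [4500.0, 5500.0, 4500.0, 5500.0]
scarce = [0.9, 1.1, 0.9, 1.1]

var_abundant = variance(abundant)  # sample variance (n - 1 denominator)
var_scarce = variance(scarce)

# Fraction of the total variance carried by the abundant metabolite.
share = var_abundant / (var_abundant + var_scarce)

print(f"variance ratio: {var_abundant / var_scarce:.3g}")
print(f"share of total variance (abundant): {share:.8f}")
```

The variance ratio equals the square of the concentration ratio, which is why untreated data let the most abundant metabolites dominate the analysis.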
Results: Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range
scaling, vast scaling, log transformation, and power transformation, were tested on a real-life
metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the rank of the metabolites that are most important from a biological point of view. Furthermore, the
stability of the rank, the influence of technical errors on data analysis, and the preference of data
analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis.
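The pretreatment methods listed in the Results can be sketched as follows. The abstract does not spell out the formulas, so the definitions below are the commonly used ones and should be read as assumptions: each function acts on the values of a single metabolite across samples, the sample standard deviation (n - 1 denominator) is used, and the log (base 10) and power (square root) transformations are followed by centering. All function names are illustrative.

```python
import math
from statistics import mean, stdev

def center(x):
    """Subtract the metabolite mean so values fluctuate around zero."""
    m = mean(x)
    return [v - m for v in x]

def autoscale(x):
    """Center, then divide by the standard deviation (unit variance)."""
    s = stdev(x)
    return [v / s for v in center(x)]

def pareto_scale(x):
    """Center, then divide by the square root of the standard deviation."""
    root_s = math.sqrt(stdev(x))
    return [v / root_s for v in center(x)]

def range_scale(x):
    """Center, then divide by the range (max - min) of the metabolite."""
    span = max(x) - min(x)
    return [v / span for v in center(x)]

def vast_scale(x):
    """Autoscale, then weight by mean / standard deviation."""
    m, s = mean(x), stdev(x)
    return [v * (m / s) for v in autoscale(x)]

def log_transform(x):
    """Take log10 of each (strictly positive) value, then center."""
    return center([math.log10(v) for v in x])

def power_transform(x):
    """Take the square root of each (non-negative) value, then center."""
    return center([math.sqrt(v) for v in x])

# Hypothetical data: same relative variation, 5000-fold abundance difference.
abundant = [4500.0, 5500.0, 4500.0, 5500.0]
scarce = [0.9, 1.1, 0.9, 1.1]

for name, values in (("abundant", abundant), ("scarce", scarce)):
    print(name, [round(v, 3) for v in autoscale(values)])
```

After autoscaling, both hypothetical metabolites have unit standard deviation, so neither dominates on abundance alone. Centering by itself leaves the 5000-fold difference in spread untouched.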
Conclusion: Different pretreatment methods emphasize different aspects of the data and each
pretreatment method has its own merits and drawbacks. The choice of a pretreatment method
depends on the biological question to be answered, the properties of the data set and the data
analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. That is, range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis).
In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects which metabolites are identified as the most important.
© 2006 van den Berg et al; licensee BioMed Central Ltd.