A classification model for the Leiden proteomics competition

Open Access
Publication date 2008
Journal Statistical Applications in Genetics and Molecular Biology
Volume | Issue number 7 | 2
Pages (from-to) 8
Number of pages 10
Organisations
  • Faculty of Science (FNWI) - Swammerdam Institute for Life Sciences (SILS)
Abstract
A strategy is presented for building a discrimination model in proteomics studies. The model is built using cross-validation, and this cross-validation step is easily combined with a variable selection method called rank products. The strategy is especially suited to the undersampled case, in which the ratio of samples to variables is low, as is often encountered in proteomics and metabolomics studies. Principal Component Discriminant Analysis is used as the classification method; however, the methodology can be used with any classifier. A data set containing serum samples from breast cancer patients and healthy controls is analysed. Double cross-validation shows that the sensitivity of the model is 82% and the specificity is 86%. Putative biomarkers are identified using the variable selection method. In each cross-validation loop a classification model is built, and the final classification is made by majority voting over the resulting ensemble of classifiers.
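The workflow described in the abstract (one model per cross-validation loop, variables ranked in each loop and combined by rank products, final class by majority vote) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: a nearest-centroid classifier stands in for Principal Component Discriminant Analysis, variables are scored by the absolute difference between class means, folds are simple interleaved splits, the data are synthetic, and every function name is hypothetical.

```python
import math
import random

def ranks_by_score(scores):
    """Rank variables by descending score; the best variable gets rank 1."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for pos, j in enumerate(order, start=1):
        ranks[j] = pos
    return ranks

def rank_product(rank_lists):
    """Geometric mean of each variable's rank over all cross-validation loops."""
    k = len(rank_lists)
    n_vars = len(rank_lists[0])
    return [math.exp(sum(math.log(r[j]) for r in rank_lists) / k)
            for j in range(n_vars)]

def centroid(rows):
    """Column-wise mean of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_centroids(X, y):
    """Stand-in classifier (nearest centroid); NOT the article's PCDA."""
    c0 = centroid([x for x, lab in zip(X, y) if lab == 0])
    c1 = centroid([x for x, lab in zip(X, y) if lab == 1])
    return c0, c1

def predict(model, x):
    """Assign x to the class whose centroid is closer (squared Euclidean)."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 1 if d1 < d0 else 0

def build_ensemble(X, y, n_folds=5):
    """Fit one model per loop and collect per-loop variable ranks."""
    models, rank_lists = [], []
    for f in range(n_folds):
        # Assumed fold scheme: sample i is left out when i % n_folds == f.
        train = [i for i in range(len(X)) if i % n_folds != f]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        models.append(model)
        # Assumed variable score: absolute difference between class means.
        scores = [abs(a - b) for a, b in zip(*model)]
        rank_lists.append(ranks_by_score(scores))
    return models, rank_product(rank_lists)

def majority_vote(models, x):
    """Final class is the one predicted by more than half of the models."""
    votes = sum(predict(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0

# Synthetic toy data: 20 samples, 6 variables; only variable 0 separates
# the classes (shifted by 5 in class 1).
rng = random.Random(0)
y = [i % 2 for i in range(20)]
X = [[rng.gauss(0, 1) + (5.0 if (j == 0 and y[i] == 1) else 0.0)
      for j in range(6)] for i in range(20)]

models, rp = build_ensemble(X, y)
best_variable = min(range(len(rp)), key=lambda j: rp[j])
label_a = majority_vote(models, [5.0, 0.0, 0.0, 0.0, 0.0, 0.0])
label_b = majority_vote(models, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

A low rank product marks a variable that ranked near the top in every loop, which is what makes it a candidate biomarker in this scheme. Estimating sensitivity and specificity as the abstract reports them would need a further outer cross-validation loop around this whole procedure, which the sketch omits.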
Document type Article
Published at http://www.bepress.com/sagmb/vol7/iss2/art8