Assessing the statistical validity of proteomics based biomarkers

S. Smit; M.J. van Breemen; H.C.J. Hoefsloot; A.K. Smilde; J.M.F.G. Aerts; C.G. de Koster

doi:https://doi.org/10.1016/j.aca.2007.04.043

Authors	S. Smit; M.J. van Breemen; H.C.J. Hoefsloot; A.K. Smilde; J.M.F.G. Aerts; C.G. de Koster
Publication date	2007
Journal	Analytica Chimica Acta
Volume | Issue number	592 | 2
Pages (from-to)	210-217
Organisations	Faculty of Science (FNWI) - Swammerdam Institute for Life Sciences (SILS)
Abstract	A strategy is presented for the statistical validation of discrimination models in proteomics studies. Several existing tools are combined to form a solid statistical basis for biomarker discovery that should precede a biochemical validation of any biomarker. These tools consist of permutation tests, single and double cross-validation. The cross-validation steps can simply be combined with a new variable selection method, called rank products. The strategy is especially suited for the low-samples-to-variables-ratio (undersampling) case, as is often encountered in proteomics and metabolomics studies. As a classification method, principal component discriminant analysis is used; however, the methodology can be used with any classifier. A dataset containing serum samples from Gaucher patients and healthy controls serves as a test case. Double cross-validation shows that the sensitivity of the model is 89% and the specificity 90%. Potential putative biomarkers are identified using the novel variable selection method. Results from permutation tests support the choice of double cross-validation as the tool for determining error rates when the modelling procedure involves a tuneable parameter. This shows that even cross-validation does not guarantee unbiased results. The validation of discrimination models with a combination of permutation tests and double cross-validation helps to avoid erroneous results which may result from the undersampling.
Document type	Article
Language	English
Published at	https://doi.org/10.1016/j.aca.2007.04.043 (Final published version)
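
The validation strategy summarised in the abstract (double cross-validation to estimate error rates when a parameter is tuned, followed by a permutation test) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic, the classifier is a simple nearest-centroid rule standing in for principal component discriminant analysis, the tuneable parameter is the number of variables kept, and the variable ranking is a plain class-mean difference rather than rank products. The function names (`double_cv`, `permutation_p_value`, and so on) are hypothetical.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def stratified_folds(indices, y, k, rng):
    """Split indices into k folds, dealing each class out round-robin."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [i for i in indices if y[i] == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

def fit(X, y, train, n_vars):
    """Stand-in classifier (assumption, not PCDA): nearest centroid on the
    n_vars variables with the largest class-mean difference in the training set."""
    groups = {c: [X[i] for i in train if y[i] == c] for c in (0, 1)}
    p = len(X[0])
    means = {c: [avg(r[j] for r in groups[c]) for j in range(p)] for c in (0, 1)}
    ranked = sorted(range(p), key=lambda j: -abs(means[1][j] - means[0][j]))
    return ranked[:n_vars], means

def n_wrong(X, y, train, test, n_vars):
    """Count misclassified test samples for a model fitted on train only."""
    selected, means = fit(X, y, train, n_vars)
    wrong = 0
    for i in test:
        dist = {c: sum((X[i][j] - means[c][j]) ** 2 for j in selected) for c in (0, 1)}
        wrong += min(dist, key=dist.get) != y[i]
    return wrong

def double_cv(X, y, grid=(1, 2, 5), outer=4, inner=3, seed=0):
    """Outer loop estimates the error rate; the inner loop tunes n_vars
    using only samples that are not in the current outer test fold."""
    rng = random.Random(seed)
    n = len(y)
    errors = 0
    for test in stratified_folds(range(n), y, outer, rng):
        held_out = set(test)
        rest = [i for i in range(n) if i not in held_out]
        inner_folds = stratified_folds(rest, y, inner, rng)

        def inner_errors(n_vars):
            return sum(
                n_wrong(X, y, [i for i in rest if i not in val], val, n_vars)
                for val in inner_folds
            )

        best = min(grid, key=inner_errors)  # ties go to the first grid value
        errors += n_wrong(X, y, rest, test, best)
    return errors / n

def permutation_p_value(X, y, n_perm=20, seed=1):
    """Repeat the whole double cross-validation on label-permuted data and
    report how often the permuted error is at least as low as the observed one."""
    rng = random.Random(seed)
    observed = double_cv(X, y)
    as_good = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        as_good += double_cv(X, shuffled) <= observed
    return observed, (1 + as_good) / (1 + n_perm)

# Synthetic undersampled data (assumption): 40 samples, 30 variables,
# only variable 0 differs between the two classes.
data_rng = random.Random(42)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(0, 1) + (10 if j == 0 and label == 1 else 0) for j in range(30)]
     for label in y]

error_rate, p_value = permutation_p_value(X, y)
print(error_rate, p_value)
```

The point of the nesting is that the outer test samples play no part in choosing the tuneable parameter, so the reported error rate is not flattered by the tuning step. Rerunning the same procedure on permuted labels gives a reference distribution for what that error rate looks like when there is no real class difference.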

UvA-DARE

Digital Academic Repository
