Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data

Open Access
Authors
Publication date 24-02-2023
Journal Journal of Cheminformatics
Article number 28
Volume | Issue number 15 | 1
Number of pages 12
Organisations
  • Faculty of Science (FNWI) - Van 't Hoff Institute for Molecular Sciences (HIMS)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Non-target analysis combined with liquid chromatography high resolution mass spectrometry is considered one of the most comprehensive strategies for the detection and identification of known and unknown chemicals in complex samples. However, many compounds remain unidentified due to data complexity and the limited number of structures in chemical databases. In this work, we have developed and validated a novel machine learning algorithm to predict the retention index (ri) values for structurally (un)known chemicals based on their measured fragmentation pattern. The developed model, for the first time, enabled the prediction of ri values without the need for the exact structure of the chemicals, with an R² of 0.91 and 0.77 and root mean squared error (RMSE) of 47 and 67 ri units for the NORMAN (n = 3131) and amide (n = 604) test sets, respectively. This fragment-based model showed accuracy in ri prediction comparable to that of conventional descriptor-based models that rely on known chemical structure, which obtained an R² of 0.85 with an RMSE of 67.
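The two figures of merit quoted in the abstract can be illustrated with a minimal sketch. It assumes R² denotes the coefficient of determination (the abstract does not say whether this or a squared correlation coefficient was used), and the observed and predicted ri values are hypothetical numbers chosen for illustration, not data from the article.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs (here ri units)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical retention indices, for illustration only (not from the article).
observed = [300.0, 450.0, 600.0, 750.0]
predicted = [320.0, 440.0, 590.0, 770.0]

print(f"RMSE = {rmse(observed, predicted):.1f} ri units")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

Because RMSE is expressed in ri units, it reads directly as a typical prediction error on the retention index scale, whereas R² is unitless and depends on the spread of ri values in the test set.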
Document type Article
Language English
Published at https://doi.org/10.1186/s13321-023-00699-8