Hydration free energies from kernel-based machine learning: Compound-database bias
| Authors | |
|---|---|
| Publication date | 07-07-2020 |
| Journal | Journal of Chemical Physics |
| Article number | 014101 |
| Volume | 153 |
| Issue number | 1 |
| Number of pages | 9 |
| Organisations | |
| Abstract | We consider the prediction of a basic thermodynamic property - hydration free energies - across a large subset of the chemical space of small organic molecules. Our in silico study is based on computer simulations at the atomistic level with implicit solvent. We report on a kernel-based machine learning approach that is inspired by recent work in learning electronic properties but differs in key aspects: The representation is averaged over several conformers to account for the statistical ensemble. We also include an atomic-decomposition ansatz, which offers significant added transferability compared to molecular learning. Finally, we explore the existence of severe biases from databases of experimental compounds. By performing a combination of dimensionality reduction and cross-learning models, we show that the rate of learning depends significantly on the breadth and variety of the training dataset. Our study highlights the dangers of fitting machine-learning models to databases of a narrow chemical range. |
| Document type | Article |
| Note | With supplementary files |
| Language | English |
| Published at | https://doi.org/10.1063/5.0012230 |
| Other links | https://www.scopus.com/pages/publications/85087589076 |
| Downloads | 5.0012230 (Final published version) |
| Supplementary materials | |
