Abstract
Tackling challenging problems with machine learning has recently become a hot topic in many
scientific disciplines. To develop rigorous machine-learning models for problems of interest
in molecular sciences, translating molecular structures into quantitative representations suitable as
machine-learning inputs plays a central role. Many molecular representations, including the state-of-the-art
ones, are efficient for studying numerous molecular features but remain sub-optimal in many
challenging cases, as discussed in this work. The main aim of the present study is
to introduce the Implicitly Perturbed Hamiltonian (ImPerHam), a class of versatile representations
for more efficient machine learning of challenging problems in molecular sciences. ImPerHam
representations are defined as energy attributes of the molecular Hamiltonian implicitly perturbed by a
number of arbitrary real or hypothetical solvents described by continuum solvation models. We demonstrate
outstanding performance of machine-learning models based on ImPerHam representations in three
diverse and challenging cases: predicting inhibition of the CYP450 enzyme, evaluating conformational
energies of molecular systems with high precision and transferability, and accurately reproducing
solvation free energies for large benchmark sets.