Abstract
Next-generation sequencing technologies enable genomic databases to double in size every 2.5 years. The collected sequences represent a rich source of novel biocatalysts. However, sequence data accumulate faster than functional studies can be carried out, calling for the acceleration and miniaturization of biochemical assays. Here, we present an integrated platform combining bioinformatics, microanalytics, and microfluidics, and apply it to the exploration of unmapped sequence space using haloalkane dehalogenases (HLDs) as model enzymes. First, we used bioinformatic analysis to identify 2,905 putative dehalogenases and to rationally select 45 representative enzymes. Second, we expressed and experimentally characterized the 24 enzymes that showed sufficient solubility for microanalytical and microfluidic testing. Miniaturization increased the throughput to 20,000 reactions per day, with 1,000-fold lower protein consumption than conventional assays. A single run of the platform doubled the dehalogenation toolbox of family members characterized over three decades. Importantly, the dehalogenase activities of nearly one-third of these novel biocatalysts far exceed those of most published HLDs. Two enzymes showed unusually narrow substrate specificity, never before reported for this enzyme family. The strategy is generally applicable to other enzyme families, paving the way towards faster identification of novel biocatalysts for industrial applications as well as the collection of homogeneous data for machine learning. The automated in silico workflow has been released as a user-friendly web tool, EnzymeMiner: https://loschmidt.chemi.muni.cz/enzymeminer/.