Data-driven models are being developed to predict battery lifetime because of their ability to capture complex aging phenomena. In this perspective, we demonstrate that it is critical to consider the intended use case when developing prediction models. Specifically, model features need to be classified according to whether or not they encode cycling conditions, which are sometimes used to artificially increase the diversity in battery lifetime. Many use cases, such as production quality control, require predicting the cell-to-cell variability between identically cycled cells. Developing models for such prediction tasks thus requires features that are blind to cycling conditions. Using the dataset published by Severson et al. in 2019 as an example, we show that features encoding cycling conditions boost model accuracy because they predict the protocol-to-protocol variability. However, models based on these features are less transferable when deployed on identically cycled cells. Our analysis underscores the concept of using the right features for the right prediction task. We encourage researchers to consider the usage scenarios they are developing models for, and whether to blind their models to information on cycling conditions in order to avoid information leakage. Equally important, model performance should be benchmarked only against models developed for the same use case.
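The distinction between protocol-to-protocol and cell-to-cell variability can be illustrated with a minimal sketch on synthetic data. Everything below is an assumption made for illustration: the lifetime model (a protocol offset plus a cell-level deviation), the effect sizes, and the names `unblinded`, `blinded`, `r_squared`, and `demean_by_group` are hypothetical. It does not use the Severson et al. dataset and is not the analysis performed in the article. The sketch compares each feature's pooled R^2 with its R^2 after protocol means are removed, which mimics evaluation on identically cycled cells.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: 8 cycling protocols, 20 cells per protocol.
n_protocols, cells_per_protocol = 8, 20
protocol_id = np.repeat(np.arange(n_protocols), cells_per_protocol)
n_cells = protocol_id.size

# Hypothetical lifetime model: protocol-level offset + cell-level deviation.
protocol_effect = np.linspace(-400.0, 400.0, n_protocols)[protocol_id]
cell_effect = rng.normal(0.0, 50.0, n_cells)
lifetime = 1000.0 + protocol_effect + cell_effect

# "Unblinded" feature: encodes the cycling condition (plus small measurement noise).
unblinded = protocol_effect + rng.normal(0.0, 5.0, n_cells)
# "Blinded" feature: tracks only the cell-level deviation (plus measurement noise).
blinded = cell_effect + rng.normal(0.0, 10.0, n_cells)


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-feature linear fit."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def demean_by_group(values, groups):
    """Subtract each group's mean so only within-group variation remains."""
    group_means = np.bincount(groups, weights=values) / np.bincount(groups)
    return values - group_means[groups]


# Pooled score: all cells together, protocol-to-protocol variability included.
pooled_unblinded = r_squared(unblinded, lifetime)
pooled_blinded = r_squared(blinded, lifetime)

# Within-protocol score: only cell-to-cell variability among identically cycled cells.
lifetime_within = demean_by_group(lifetime, protocol_id)
within_unblinded = r_squared(demean_by_group(unblinded, protocol_id), lifetime_within)
within_blinded = r_squared(demean_by_group(blinded, protocol_id), lifetime_within)

print(f"unblinded feature: pooled R^2 = {pooled_unblinded:.2f}, within-protocol R^2 = {within_unblinded:.2f}")
print(f"blinded feature:   pooled R^2 = {pooled_blinded:.2f}, within-protocol R^2 = {within_blinded:.2f}")
```

On this toy data, the unblinded feature should score high on the pooled metric but close to zero within protocols, while the blinded feature should show the reverse. A pooled benchmark would therefore favor the feature that is least useful for a quality-control use case, which is the sense in which the two kinds of model should not be compared on the same leaderboard.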
Supplemental material for the perspective article "Battery lifetime predictions: information leakage from unblinded training"