Abstract
Active learning through interactive exploration significantly enhances student engagement and understanding in chemical education. This educational activity leverages Principal Component Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLS-DA), two foundational machine learning techniques widely applied in contemporary research. Interactive Python-based Jupyter notebooks offer an accessible platform for students to explore chemical data and require no prior programming experience. These notebooks allow learners to engage actively in feature exploration and dimensionality reduction, applied to clustering and classifying binary AB equiatomic solid-state compounds. Students can select and modify chemical and physical features, observing in real time how these choices affect the effectiveness of PCA clustering and PLS-DA classification. Initially, PCA enables unsupervised visualization of natural clustering and correlations among compounds without prior labeling. Subsequently, using PLS-DA, students develop supervised models capable of predicting crystal structures, explicitly illustrating the difference between supervised and unsupervised learning. The proposed activity highlights the importance of explainability in machine learning models, rather than treating the models as a "black box". Beyond learning fundamental concepts, the activity encourages students to participate in genuine exploratory processes, mirroring the investigative approaches that researchers have historically used and still practice today. By experimenting freely with datasets and computational methods, students experience firsthand the iterative nature of scientific discovery, gaining deeper insight into both chemical informatics and broader research methodology.
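The unsupervised-then-supervised workflow described above can be illustrated with a minimal sketch. It is not the notebooks' actual code: the data are synthetic, the two features and two structure classes are hypothetical placeholders, and both methods are written out in plain NumPy (PCA via singular value decomposition, PLS-DA as a single-response PLS regression on a 0/1 class label) so that each step stays visible rather than hidden behind a library call.

```python
import numpy as np

# Hypothetical toy data: each row is an AB compound, each column a made-up
# feature (for example, an electronegativity difference and a radius ratio).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.5, 0.6], scale=0.05, size=(20, 2)),  # structure class 0
    rng.normal(loc=[1.5, 0.9], scale=0.05, size=(20, 2)),  # structure class 1
])
y = np.array([0] * 20 + [1] * 20)  # labels, used only by PLS-DA


def pca_scores(X, n_components=2):
    """Unsupervised step: project standardized features onto principal axes."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]


def plsda_fit(X, y, n_components=1):
    """Supervised step: PLS regression of a 0/1 label on standardized features."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xk = (X - mean) / std
    y_mean = y.mean()
    yk = y.astype(float) - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # latent scores
        tt = t @ t
        p = Xk.T @ t / tt            # feature loadings
        c = yk @ t / tt              # label loading
        Xk = Xk - np.outer(t, p)     # deflate before the next component
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return mean, std, y_mean, coef


def plsda_predict(model, X):
    """Assign class 1 when the predicted label value exceeds 0.5."""
    mean, std, y_mean, coef = model
    y_hat = ((X - mean) / std) @ coef + y_mean
    return (y_hat > 0.5).astype(int)


scores, explained = pca_scores(X)
model = plsda_fit(X, y)
accuracy = np.mean(plsda_predict(model, X) == y)
print("Variance explained by each principal component:", explained)
print("PLS-DA training accuracy:", accuracy)
```

Changing which columns of `X` are passed to the two functions mimics the feature-selection exercise: the PCA score plot and the PLS-DA accuracy both respond to the features chosen, and the `coef` vector shows how strongly each feature contributes to the predicted class.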