Abstract
Genomic sequencing and other big biological data are of paramount value; however, success in recruiting highly skilled individuals with diverse backgrounds has been limited. One main reason for this shortfall may be the lack of educational resources and early exposure to the field. With the steady increase in big biological data over the past decade [1], we need not only to increase the number of skilled researchers in the field but also to equip the next generation of students with data analysis skills that apply to a variety of career trajectories [2,3,4]. Here, we share a successful example of integrating Python-based interactive digital notebooks into a large-enrollment undergraduate chemistry course with more than 400 participants across various degree programs. The goal of this manuscript is to detail the teaching pedagogy, supply the teaching materials, and evaluate the outcomes of integrating coding into such a course. Integrating coding exercises into large-enrollment undergraduate classes gives students earlier exposure to data science. Skills in big data analysis will be an asset to any chemist, biologist, physician, or scientist, regardless of career path or academic trajectory.