I. Introduction
The nearly 200,000 structural models in the Protein Data Bank (PDB) constitute a treasure trove for biochemistry. This resource will only get richer as the pace of structural biology accelerates due to advances in cryo-electron microscopy (1) and computational structure prediction (2, 3), which have provided structural models for whole proteomes (4). This implies that a structural understanding of biomolecules is no longer a specialized topic but rather an integral part of biochemistry. Consequently, all scientists working in the molecular biosciences will need to know how to analyze and evaluate structural models at some level. To accomplish this, an undergraduate degree program in the molecular biosciences should not only provide students with a theoretical understanding of the structural basis of biochemistry but also equip them with the tools necessary for structural analysis.
A. Scientific and pedagogical background
Structural analysis of biomolecules relies on molecular graphics software. Various programs, such as RasMOL (5), Swiss PDB Viewer (6), Jmol (7), and UCSF Chimera (8), have historically allowed users to visualize atomic coordinates in various representations and create stunning illustrations (9). These programs allow the user to navigate structures and focus on features of functional relevance. Advanced programs have built-in analysis tools used for interpretation of structural models, including measurement of interatomic distances, identification of hydrogen bonds and salt bridges, mapping of contacts between molecules, structural alignment of different models, and mapping of physical properties, such as surface charges, onto the structure. In the popular program PyMOL (10), most common functions are available via the graphical user interface (GUI). Advanced functions, however, require text commands that can be combined into elaborate analysis scripts using procedural programming (11, 12). The use of molecular graphics software is a key skill set that allows structural interpretation of molecular mechanisms and formulation of new hypotheses.
The constructivist theory of learning posits that knowledge cannot be passively transmitted from educator to learner but must be constructed anew in the mind of the learner. This has led to a range of “active learning” approaches, where the learners are guided through activities that allow them to rediscover the material for themselves (13). A large body of evidence shows that students learn better when they are active rather than passive (14), although students tend to underestimate the benefits due to the greater mental effort required for active learning (15). However, experimental structural biology is time consuming and requires expensive equipment and is thus difficult to implement in undergraduate classes. In contrast, molecular graphics software is run on standard laptops. Moreover, PyMOL is available at no cost for educators and students as an “Educational-use-only” version (https://pymol.org/edu). Therefore, structural analysis and visualization can be taught actively even in undergraduate classes with hundreds of students, using the students' own hardware. Moreover, exercises can be designed to simultaneously build structural understanding and acquire useful research skills. Such skills can readily be put to use in individual research projects by advanced undergraduate students. For example, PyMOL was recently used in an “Undergraduate Research Laboratory Experience” where students assigned functions to targets from the Protein Structure Initiative (16).
There are several published examples of using PyMOL to analyze protein structures in a university study setting. Lineback and Jansma (17) used PyMOL to investigate the structures and homology between myoglobin and hemoglobin. Rigsby and Parker (18) also used hemoglobin as an example to explore ligand binding, while Allred et al. (19) used PyMOL to investigate the pH dependence of the interaction between immunoglobulin G and protein A. Finally, Simmons et al. (20) used PyMOL to investigate the impact of disease-associated mutations in 4 proteins. All these published teaching materials are intended to be used as single sessions, which resembles the way we originally taught the use of molecular graphics software. However, we observed several problems with this approach that we attribute to the single-session format. First, modern molecular graphics software has so many functions that students are easily overwhelmed by information if everything must be given in a single session. This left us with the unenviable choice of either omitting essential elements or moving at a pace that many students cannot follow. Second, introducing structural analysis in a single session typically also means limiting it to a single scientific context. It is rarely possible to use all the needed tools in a scientifically relevant way in a single session. Third, consolidation in long-term memory is greatly enhanced by spaced repetitions, where material is actively recalled at a later date (21). This is not feasible in stand-alone sessions.
To improve the pedagogical design, we adapted our existing teaching of practical PyMOL skills to second-year biochemistry students from a single session to a semester-long learning path (Fig 1). The learning path progresses from basic to advanced competences in structural analysis and introduces many tools currently used in research. In our experience, many students resist moving beyond the GUI, which prevents them from using advanced tools as well as automating repeated tasks using procedural scripts. For these reasons, we aimed to familiarize the students with the command line early on. We think PyMOL offers an excellent opportunity for teaching scripting to biology-focused science students; these are broadly applicable skills in the molecular biosciences and bioinformatics. Furthermore, we aim to support the development of a “computational thinking” mind-set. Computational thinking involves analyzing a problem and expressing its solution in machine-executable form, typically in the form of an algorithm that breaks the process down into simple sequential steps. Computational thinking is thus a skill set that transcends any specific scripting language. While the importance of scripting may be apparent to the instructors, it is not necessarily apparent for an undergraduate specializing in biology. A key pedagogical challenge is thus to design exercises where the students quickly become able to solve problems that are meaningful to them. A PyMOL script comprising a few lines of code can deliver stunning visualizations and allow the students to experience early successes. We thus think PyMOL offers a great opportunity for introducing the students to a computational mind-set that is transferable to other scientific contexts.
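To make the idea of "a few lines of code" concrete, the following is a minimal sketch, not taken from the course materials, of how a short PyMOL script can be expressed as a sequence of simple steps. It is written in Python and only assembles the script as text, so it runs without PyMOL installed. The PDB identifier (1mbn, a myoglobin entry), the object name, and the choice of representations are assumptions made for illustration; the commands themselves (fetch, hide, show, spectrum, orient) are standard PyMOL commands.

```python
def build_intro_script(pdb_id="1mbn", obj="protein"):
    """Return a short PyMOL command script as text (illustrative only).

    pdb_id and obj are hypothetical defaults chosen for this example.
    """
    steps = [
        f"fetch {pdb_id}, {obj}",                       # download the model and name the object
        "hide everything",                              # start from a blank view
        f"show cartoon, {obj}",                         # draw the fold as a cartoon
        f"spectrum count, rainbow, {obj} and name CA",  # color from N- to C-terminus
        f"show sticks, {obj} and hetatm",               # show non-polymer atoms as sticks
        f"orient {obj}",                                # center and align the view
    ]
    return "\n".join(steps) + "\n"


script = build_intro_script()
print(script)
```

The printed text could be saved to a file with the .pml extension and run from within PyMOL. Breaking the task into one command per step mirrors the sequential decomposition that computational thinking asks for.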
B. Teaching materials
The complete learning path consists of problem sets within 19 topics, each containing between 1 and 6 exercises, resulting in 53 exercises in total. Additionally, there are 6 instructional videos and a test used to evaluate the students' acquired PyMOL competences. Figure 1 provides an overview of the teaching materials, including when new tools are introduced. The materials were originally in Danish but have been made available in English as a course pack via Figshare (https://figshare.com/s/0a55edec9b8e441b2813) or as supplementary information. The videos support different learning styles and illustrate concepts that are difficult to describe in writing, such as the use of the GUI. A set of advanced videos illustrates how to use scripts and implement custom-designed functions using the application programming interface. These videos are deliberately kept below 10 minutes.
The 53 exercises that require PyMOL make up the core of the learning path. They cover a wide range of relevant biological topics and mechanisms and will fit into most introductory courses in biomolecular structure and function. The course starts by introducing the basic structure of proteins and nucleic acids, then progresses through important classes of biomolecules, such as enzymes, pumps, and channels. Additionally, the course deals with important regulatory concepts, such as posttranslational modifications, trafficking, and structural dynamics, as well as essential structural techniques, including X-ray crystallography, cryogenic electron microscopy, nuclear magnetic resonance spectroscopy, and fluorescence. The students are also assigned reading materials describing the theoretical background of the topic, typically a chapter in a textbook, and solve other problems that do not require PyMOL. If this is material that is already covered, our materials could be used as a stand-alone PyMOL learning path. The materials are organized to follow the progression of scientific topics (Fig 1), and each topic is thus accompanied by between 1 and 6 exercises containing up to 13 questions. Questions get progressively more difficult within an exercise, which allows students at different academic levels to contribute.
PyMOL functions are introduced in a progressive manner such that the program is initially controlled through the GUI, after which the students are gradually encouraged to switch to the command line and subsequently scripts. Initially, the course focuses on visualization, followed by analysis, and finally touches on manipulation of structures during model building. Beyond the first sessions, only a few new tools are introduced in each session as relevant to the scientific context. For example, the structural alignment tool is introduced in the session on protein evolution. Key functions are repeatedly used throughout the learning path to reinforce long-term memory in students. New tools are explained in a short paragraph immediately before they are needed. Many of the exercises also include premade PyMOL scripts that demonstrate the power of scripting and allow the students to learn by example through the copying of relevant sections of the code. The students are also directed to use Internet resources, such as the RCSB Protein Data Bank (https://www.rcsb.org), the Protein Structure Classification Database, CATH (https://www.cathdb.info), UniProt (https://www.uniprot.org), and ProtParam (https://www.expasy.org/resources/protparam), and are instructed to find further guidance on PyMOL usage on PyMOLWiki (https://pymolwiki.org).
II. Results
A. Educational setting
The PyMOL learning path was designed for a course on biomolecular structure and function, which is a mandatory part of the third semester of the BS programs in molecular biology, molecular medicine, and medicinal chemistry at Aarhus University, Aarhus, Denmark. Overall, the course consists of lectures, classroom teaching, online learning tasks, and an experimental lab exercise. The course uses mainly the textbook Biochemistry by Berg et al. (22), supplemented with material from other sources. The learning path was implemented as part of 2-hour classroom sessions conducted twice a week, meeting in groups of 20–30 students led by a teaching assistant. The students were asked to attempt to solve the problem sets before arriving in class but were instructed that the difficulty was such that they were not expected to be able to finish all the problems themselves. In the classroom, each exercise was assigned to a group of 3 or 4 students who were given half an hour to prepare, aided by the teaching assistant, before presenting their solution to the rest of the class. This format allows for peer-to-peer learning while the instructor circulates to help groups that are stuck. The supervised preparation in small groups also serves to support struggling or insecure students by allowing them to vet their answers in a safe setting before presenting to the whole class. As a principle, all students must present part of a solved problem in every session. The instructor listens to the presentations and corrects errors or misunderstandings. Typically, each session would contain 1 exercise from the PyMOL learning path, while the remaining exercises focus on other aspects of the scientific topic that do not require structural analysis.
B. Evaluation of the students' perception
In the end-of-course survey, the students were asked 3 questions about the PyMOL learning path; 115 students returned the questionnaire out of 146 students participating in the final exam. Some students register for the course without attending classes or taking the exam. We think the number of students taking the final exam is the best proxy for the active student population and thus use this number to calculate response rates. In principle, it is possible that some students filled in the survey without taking the exam. However, we think this is unlikely to change the results by much. Our questionnaire thus had an estimated response rate of 79% for active students. The high response rate suggests that we capture the opinions of a representative cross section of the students. First, the students were asked to assess how much time they had spent preparing for the PyMOL learning path per week (Fig 2A). The results show that students typically spent 2–4 hours per week on the learning path, which translates into 1–2 hours per exercise and suggests that most students made a serious attempt to solve the exercises. A small fraction of students reported spending more than 8 hours per week on the PyMOL problems, which would be excessive. We think that these students misreported the total preparation time spent on the course rather than just the time spent on the PyMOL exercises. Nevertheless, in future iterations of the course, it might be useful to instruct the students not to get bogged down by single exercises.
Next, the students were asked to rate the degree to which the PyMOL learning path supported their ability to perform structural analysis (Fig 2B). Fifty percent of the students responded “very large” or “large,” with most other students replying “some.” Overall, most of the students felt that the exercises helped them analyze biomolecular structures, which was the central aim of the learning path. Finally, the students were asked to rate the degree to which the progression and the difficulty of the learning path were appropriate (Fig 2C). Thirty-three percent responded “very large” or “large,” whereas most responded “some.” This question did not distinguish between students who thought it was too easy or too hard. However, the questionnaire also allowed free text comments where several students indicated that they found the exercises too difficult or too time consuming. None stated that the exercises were too easy. In summary, this suggests that the exercises were challenging but not to an extent that led the students to give up. The student-reported learning outcomes should be taken with a grain of salt, as the increased effort required for active learning leads students to prefer passive approaches and to overestimate the learning outcomes of these relative to active approaches (15).
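The arithmetic behind the response rate and the per-exercise preparation time reported above can be reproduced with a short calculation. This is a minimal sketch in Python. The counts are those given in the text, and the conversion to hours per exercise assumes 2 sessions per week with typically 1 PyMOL exercise each, as described for the educational setting. The function name is hypothetical.

```python
def response_rate(returned, population):
    """Fraction of the reference population that responded."""
    return returned / population


# 115 returned questionnaires out of 146 students taking the final exam.
survey_rate = response_rate(115, 146)

# Assumption: 2 classroom sessions per week, 1 PyMOL exercise per session.
sessions_per_week = 2
hours_per_week = (2, 4)
hours_per_exercise = tuple(h / sessions_per_week for h in hours_per_week)

print(f"Survey response rate: {survey_rate:.1%}")
print(f"Hours per exercise: {hours_per_exercise[0]:g} to {hours_per_exercise[1]:g}")
```

Rounded to whole percent, the computed rate matches the 79% stated above, and the 2–4 hours per week correspond to 1–2 hours per exercise under the stated assumption.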
C. Evaluation of learning outcomes
We also asked the students to participate in an invigilated test that would evaluate their practical PyMOL skills after completing the course. Out of the 146 students who took the final exam, 123 students participated in this test, and again the high participation rate (84%) suggests that we tested a representative sample. The students were asked to solve 8 tasks individually within 45 minutes and provide their answers as a PyMOL script (Fig 3). This test format requires that the students produce an operational script and thus directly tests their computational skills and mind-set. During the test, the students had access to written resources and the Internet but were not allowed to communicate with one another. This format allows students to look up command syntax, but with an average of less than 6 minutes per question, there was little chance to learn to solve problems from scratch. The questions prompt the students to analyze 2 different crystal structures of the amino acid transporter LeuT (23, 24). As a starting point, the students were given a script that would load the required PDB files and create 8 empty scenes, 1 for each of the answers. The questions follow the progression of the learning path and were meant to become increasingly difficult (Fig 3A). The first 3 questions requested illustration of structural features such as the fold of the protein (Q1), nonprotein components (Q2), and selected residues (Q3). The next 3 questions requested common structural analyses, such as measuring the shortest distance between 2 residues (Q4), highlighting residues within 6 Å from some metal ions (Q5), and a structural alignment of 2 structures (Q6). The final 2 questions requested information to be mapped onto the structures via a color gradient, such as the B-factor of the crystal structure (Q7) and vacuum electrostatics (Q8). 
The answers were evaluated based on visual inspection of the resulting 8 scenes generated by the submitted scripts, and in cases of doubt, the raw scripts were inspected. Minor deviations from perfection, such as reversed color gradients or scripts that were nonfunctional due to wrong file extensions or omission of elements from the provided template, were tolerated.
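For readers unfamiliar with PyMOL scripting, the sketch below indicates what an answer script for the first 7 tasks might look like. It is not the template or a model answer from the course. The object names, residue numbers, ion selection, and scene keys are hypothetical placeholders, and the code only assembles standard PyMOL commands (hide, show, distance, select, align, spectrum, scene) as text so that it runs without PyMOL. Q8 is left out because vacuum electrostatics is not tied to a specific command in the text, and none is assumed here.

```python
def build_test_script(obj_a="leut_a", obj_b="leut_b", res_1=10, res_2=20,
                      ion_sel="resn NA", cutoff=6):
    """Return PyMOL command lines for Q1-Q7, one stored scene per question.

    All default values are hypothetical placeholders, not the values
    used in the actual test.
    """
    pair = f"{obj_a} and resi {res_1}+{res_2}"
    atom_1 = f"{obj_a} and resi {res_1} and name CA"
    atom_2 = f"{obj_a} and resi {res_2} and name CA"
    return [
        # Q1: fold of the protein
        "hide everything",
        f"show cartoon, {obj_a}",
        "scene Q1, store",
        # Q2: nonprotein components
        f"show spheres, {obj_a} and hetatm",
        "scene Q2, store",
        # Q3: selected residues
        f"show sticks, {pair}",
        "scene Q3, store",
        # Q4: distance measurement between two atoms
        f"distance d_q4, {atom_1}, {atom_2}",
        "scene Q4, store",
        # Q5: whole residues with any atom within the cutoff of the ions
        f"select near_ions, byres ({obj_a} within {cutoff} of ({obj_a} and {ion_sel}))",
        "show sticks, near_ions",
        "scene Q5, store",
        # Q6: structural alignment of the second model onto the first
        f"align {obj_b}, {obj_a}",
        "scene Q6, store",
        # Q7: B-factor as a color gradient on the backbone
        f"spectrum b, blue_white_red, {obj_a} and name CA",
        "scene Q7, store",
    ]


print("\n".join(build_test_script()))
```

Storing one scene per question is what makes grading by visual inspection practical: each scene can be recalled independently of the others.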
The results show that practically all students (98%) managed to visualize the fold of the proteins and highlight nonprotein components after having completed the learning path. A large majority (86%) was also able to highlight specific residues. The test was conducted about 3 months after these functions were introduced in the course, which suggests that the students remember how to use basic functions under realistic working conditions due to regular repetition. Furthermore, this shows that almost all students have a basic understanding of procedural scripting as implemented in PyMOL. The success rate dropped to 76% in Q4, which asked the students to measure the distance between 2 specific atoms. The syntax for this selection is more complex, as it involves both a residue number and a specific atom. The lowest success rate (68%) was found for Q5, which asked the students to select and highlight residues within a custom distance of 6 Å. In the analysis section, 90% managed to perform a full-length structural alignment of 2 different structures (Q6). The last 2 questions requested structural properties to be displayed as color gradients (Q7: B factor visualized on the protein backbone; Q8: vacuum electrostatics plotted on a surface); this was managed by 74% and 73% of the students, respectively. In total, 44% of students answered all questions correctly, and 80% had at least 6 correct answers, suggesting that they managed to solve at least 1 advanced task. As many students solved all tasks, we may underestimate the level of the most accomplished students' learning. Anecdotally, we noticed that some of the students were highly motivated by the PyMOL scripting and moved beyond the curriculum using online resources.
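The two tasks with the lowest success rates both depend on being precise about atoms versus residues. The following pure-Python sketch illustrates the underlying geometry with made-up coordinates. It is not PyMOL code and does not reproduce the test data. It measures a distance between two atoms identified by residue number and atom name, and then collects every residue that has at least one atom within a cutoff of a point, which is conceptually what a residue-expanded distance selection returns.

```python
from math import dist

# Toy atoms: (residue number, atom name, (x, y, z) in angstroms).
# Coordinates are invented for illustration only.
atoms = [
    (10, "CA", (0.0, 0.0, 0.0)),
    (10, "OD1", (3.0, 0.0, 0.0)),
    (11, "CA", (9.0, 0.0, 0.0)),
    (12, "CA", (5.0, 12.0, 0.0)),
]
ion = (8.0, 0.0, 0.0)  # hypothetical ion position


def atom_distance(atoms, res_a, name_a, res_b, name_b):
    """Distance between two atoms, each identified by residue and atom name."""
    lookup = {(resi, name): xyz for resi, name, xyz in atoms}
    return dist(lookup[(res_a, name_a)], lookup[(res_b, name_b)])


def residues_within(atoms, center, cutoff=6.0):
    """Residues with at least one atom at or within the cutoff of center."""
    return sorted({resi for resi, _name, xyz in atoms if dist(xyz, center) <= cutoff})


print(atom_distance(atoms, 10, "CA", 11, "CA"))
print(residues_within(atoms, ion))
```

In this toy example, residue 10 is included even though its CA atom lies 8 Å from the ion, because its OD1 atom lies 5 Å away. Selecting by residue rather than by atom therefore changes which residues are highlighted.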
III. Discussion
Computer-based structural analysis is a key practical skill in modern biochemistry. The lack of requirements for specialized equipment makes computer-based analysis an attractive platform for active learning approaches in large classes. Here, we have redesigned our teaching of the molecular graphics software PyMOL to a semester-long learning path to integrate practical skills into relevant scientific contents. By distributing the exercises throughout the semester, it was possible to divide the learning into manageable chunks and introduce spaced repetition. We covered many more functions than in previous years, so our quantitative evaluation cannot be immediately compared to a control group. Our evaluation indicates that after completing the course, most students mastered many advanced functions, including analysis functions typically used in functional interpretation in research labs. The gradual learning path thus likely allowed the students to achieve a higher level of proficiency than was possible in a single-session format.
A semester-long PyMOL path requires a great deal of time and may therefore be difficult to incorporate in some educational settings. However, as the teaching of the computer analysis is tightly integrated with the learning goals of the overall course, it may not require much more instructional time but rather a change in the nature of the teaching activities to emphasize student-active learning and allow independent exploration of structural biology. While the sessions form part of a learning path, they can also work as independent modules. As specialized functions are introduced when they are needed, it is to some extent possible to excise individual exercises to adapt the path to a different curriculum. We thus share the materials of the learning path as supplementary material in the hope that they will be useful to others.
Students enter the course with a wide range of skills and attitudes to computers. From the outset, some students already know how to code and may even be more computer savvy than the instructors. Others are only superficially familiar with computing and may have previous negative experiences with coding. The preexisting skills of the students thus vary more than usual in molecular biosciences, which could explain the wide range of reported preparation times (Fig 2A) and perceptions of the difficulty (Fig 2C). This poses a challenge for course design: How do you motivate advanced students without losing the students who struggle? We thus think it is important to teach the use of the GUI in parallel to scripting. The GUI lowers the barrier for accessing basic tools but does not allow for all advanced functions. Furthermore, a self-paced learning path supports struggling students with gradual progression and manageable information chunks while allowing more experienced students to skim basic exercises before independently moving beyond the confines of the course. The course also serves as an intuitive way of introducing computational thinking in undergraduate classes. Practically all students managed to produce a PyMOL script and thus have at least a rudimentary understanding of procedural scripting. The learning path thus serves as a primer to scripting, which is used broadly in the molecular biosciences in, for example, automation and data analysis as well as bioinformatics.