I. INTRODUCTION
Physical processes unfold over time. Our minds grasp physical mechanisms largely by narrative, so it is not surprising that some of the most vivid physics demonstrations also play out over time. Simulations of physics that unfold over time are similarly powerful; simulations created by the student can be best of all. This view is gaining ground in introductory courses (1), but the benefits of animated simulation extend farther than this. Here we wish to show that the behavior of strongly nonequilibrium statistical systems can be illustrated via stochastic simulations that are simple enough to serve as undergraduate projects. Recently developed, free, open source programming resources sidestep some of the laborious coding chores that were once required for such work. In particular, we believe that the error correction mechanism known as kinetic proofreading can be more clearly understood when a student views typical stochastic temporal sequences, as opposed to solving a set of deterministic rate equations (or even more complicated approaches involving first-passage times, etc.). Coding this and other simple processes also opens the door for the student to study systems that are too complex for the rate-equation approach to yield insight.
II. SCIENTIFIC AND PEDAGOGICAL BACKGROUND
A. Double-well hopping
Students are often told that a simple chemical reaction, for example isomerization of a macromolecule, can be regarded as a barrier-passing process. A micrometer-size bead in a double optical trap serves as a mesoscopic model system with this character (2), and it is worthwhile for students to watch it undergo a few dozen sharp transitions in between episodes of constrained Brownian motion near its two stable positions (see Video S1 in the Supplemental Material). A simple model for this behavior states that the hopping transitions occur at random times drawn from an exponential distribution; that is, many rapid transitions are interspersed with a few long pauses. Section III.A below discusses a simple simulation of this process.
B. Birth–death process
Section III.B generalizes from situations with essentially only one kind of transition (or two symmetric kinds) to the more interesting case in which several inequivalent choices are possible, and where the relevant transition probabilities depend on the current state. This general situation can describe a chemical reaction that requires, and depletes, molecules of some substrate.
Most science students know that living cells synthesize each of their messenger RNAs (mRNAs) from a single copy (or a small fixed number of copies) of the corresponding gene. Some genes are constitutive (unregulated), and we model these with a constant rate of mRNA synthesis. Once an mRNA transcript has formed, it has a limited lifetime until it is degraded by other cellular machinery. We assume that this process, too, relies on chance encounters with degradation enzymes. The many species of mRNA must all share the attentions of a limited number of degradation enzymes, so each mRNA copy has a fixed probability per unit time to be removed from the system.
The physical hypotheses in the preceding paragraph amount to a model called the birth–death process, which has many other applications in physics and elsewhere. As in a one-dimensional (1D) random walk, we characterize the system's state by an integer, in this case the population size of the mRNA of interest. Synthesis is a transition that increases this number, with a fixed probability per unit time k_s (called the mean rate of synthesis). Degradation is a transition that decreases it, with a probability per unit time that is the current population n times another constant k_d (the rate constant for degradation). Section IV.B describes the insights students can get from simulating this model and extends them to describe the stochastic phenomenon of gene bursting.
C. Proofreading
Ask a student, “What is the big secret of life?” and the answer will probably be “DNA,” or perhaps “evolution by natural selection.” Indeed, DNA's high, but not perfect, degree of stability underlies life's ability to replicate with occasional random modifications. However, it is less well appreciated that the stability of a molecule of DNA does not guarantee the accuracy of its replication and transcription. There is another big secret here, just as essential to life as the well known ones. In fact, a wide range of molecular recognition events must have extremely high accuracy for cells and their organisms to function. Think of our immune cells, which must ignore the vast majority of antigens they encounter (from “self”), yet reliably attack a tiny subpopulation of foreign antigens differing only slightly from the self.
Translation of mRNA into proteins is an emblematic example of such a puzzle. It is true that artificial machines now exist that can read the sequence of mRNA. Then, another artificial machine can take the resulting sequence of base triplets, decode it, and synthesize a corresponding polymer of amino acids (a polypeptide), which in some cases will then fold into a functional protein without further help. But the cells in our bodies, and even bacteria, do these jobs reliably without those huge and expensive machines, despite the incessant nanoscale thermal motion!
Merely intoning that a wonderful molecular machine called the ribosome accomplishes this feat doesn't get us over the fundamental problem: At each step in translation, the triplet codon at the ribosome's active site fits a few of the 41 distinct Escherichia coli transfer RNA (tRNA) isoacceptors somewhat better than it fits the others. But the binding energy difference, which quantifies “somewhat better,” only amounts to two or three hydrogen bonds. This translates into a fraction of time spent bound to the wrong tRNAs that is about 1/100 times as great as the corresponding quantity for the correct amino acid (3). If the fraction of incorrect amino acids incorporated into a polypeptide chain were that high, then every protein copy longer than a few hundred amino acids would be defective!
In fact, the error fraction of amino acid incorporation is more like 10^(−4). The fact that this figure is so much smaller than the one seemingly demanded by thermodynamics remained puzzling for decades. After all, the ribosome is rather complicated, but it is still a nanoscale machine. Which of its features could confer this vast improvement in accuracy?
Hopfield and Ninio proposed an elegant physical mechanism for the ribosome's surprising accuracy (3–5). To explore it, we begin by paraphrasing a metaphor from Alon (6). Imagine that you run an art museum and want to find a mechanism that picks out Picasso lovers from among all your museum's visitors. You could open a door from the main hallway into a room with a Picasso painting. Visitors would wander in at random, but those who do not love Picasso would not remain for as long as those who do. Thus, the concentration of Picasso lovers in the room would arrive at a steady value (with fluctuations) that is enriched for the desired subpopulation.
To improve the enrichment factor further, you could hire an employee who occasionally closes the door to the main hallway, stopping the dilution of your enriched group by random visitors, then open a new exit doorway onto an empty corridor. Some of the trapped visitors will gratefully escape, but die-hard Picasso lovers will remain, leading to a second level of enrichment. After an appropriate time has elapsed, you can then reward everyone still in the room with, say, tickets to visit the Picasso museum in Paris.
In the ribosome, the initial, reversible binding of a tRNA is followed by a transformation analogous to closing the door in the preceding metaphor. This transformation involves hydrolysis of a guanosine triphosphate (GTP) molecule complexed with the tRNA; hence, it is nearly irreversible because of the highly nonequilibrium concentration of GTP compared with the hydrolysis products guanosine diphosphate (GDP) and Pi (inorganic phosphate). Such hydrolysis reactions are well known to supply the free energy needed to drive otherwise unfavorable reactions in cells, but here their role is more subtle.
Hopfield (4) knew that after hydrolysis, incorporation of the amino acid was delayed and could still be preempted by unbinding of the tRNA complex. The existence of this pathway was previously known but had seemed wasteful: an energy-rich GTP had been spent without anything useful (protein synthesis) being done. On the contrary, however, this second step implements the mechanism in the art museum metaphor, giving the ribosome an independent second chance to dismiss a wrong tRNA that accidentally stayed bound long enough to progress to this stage. Spending some GTPs may be a modest price to pay compared with creating and then having to detect and recycle an entire defective protein.
Hopfield coined the name kinetic proofreading for this mechanism, but we will refer to it as the classic Hopfield–Ninio (HN) mechanism because the original term is somewhat misleading. In chemical reaction contexts, a kinetic mechanism generally implies bias toward a product with lower activation barrier, even if it is less stable than another product with a higher barrier. This preference is most pronounced at high, far-from-equilibrium catalytic rates (7). In contrast, the classic HN proofreading mechanism involves two sequential thermodynamic (quasi-equilibrium) discriminations. Moreover, these discriminations take place before reading even the very next codon, in contrast to editorial proofreading, which generally happens after an entire manuscript is written. (Our choice of term also distinguishes the classic scheme from later models that are sometimes also called kinetic proofreading.)
In a nutshell, we will outline an exploration of Hopfield's conclusion (4) that an effectively irreversible step, or at least a step far from equilibrium, can give rise to enhanced accuracy. The free energy of GTP hydrolysis is the price paid for this accuracy.
III. METHODS
A. Double-well hopping
With the physical motivation from section II.A, students can explore how to generate simulated waiting times. Any computer math system has a pseudorandom number generator that generates floating-point numbers uniformly distributed between 0 and 1. Many students are surprised (and some are intrigued) to learn that applying a nonlinear function to samples from a random variable yields samples with a different probability density function, and in particular that y = –τ ln x is exponentially distributed, with mean τ, if x is uniform on (0,1] (8).
Starting from that insight, it takes just one line of code to generate a list of simulated waiting times for transitions in a symmetric double well; finding the cumulative sums of that list gives the actual transition times (see Computer Code S1 in the Supplemental Material and Kinder and Nelson (9)).
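In Python with NumPy, the recipe can be sketched as follows (an illustration of the idea, not the actual Computer Code S1; the values of tau and n_hops are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

tau = 1.0     # illustrative mean waiting time between hops
n_hops = 50   # illustrative number of transitions to simulate

# x = 1 - rng.random() is uniform on (0, 1], so y = -tau*ln(x) is
# exponentially distributed with mean tau (and log(0) cannot occur).
waits = -tau * np.log(1.0 - rng.random(n_hops))

# Cumulative sums convert waiting times into absolute transition times.
t_hops = np.cumsum(waits)
```

A quick sanity check: `waits.mean()` approaches tau as n_hops grows.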
The freely accessible VPython programming system (or its web-based version GlowScript) makes it very easy to create an animation of an object whose spatial position is supplied as a function of time (10). The only challenging part is to pass from a list of irregularly spaced transition times t_m to particle positions x at each of many (regularly spaced) video frames. Computer Code S1 accomplishes this transformation in a few lines, summarized here:
1. Divide the desired total movie duration into real time steps according to the desired frame rate. Initialize the real time to the end of the first video frame. Initialize the transition number to m = 0.
2. Step forward in transition number m until the time t_m of the most recent transition exceeds the current real time. Switch states for each transition encountered (if any) and save the state at the end of the video frame.
3. Update the real time to the end of the next video frame and repeat step 2 until the whole animation has been generated.
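The steps above can be sketched as follows (an illustration under assumed variable names, not the actual Computer Code S1; here the state simply flips between x = ±1 at each transition):

```python
import numpy as np

rng = np.random.default_rng()

# Transition times from the cumulative-sum recipe (illustrative values).
tau, n_hops = 1.0, 200
t_hops = np.cumsum(-tau * np.log(1.0 - rng.random(n_hops)))

fps = 30.0                        # desired video frame rate
n_frames = int(t_hops[-1] * fps)  # enough frames to cover the trajectory

state = 1                         # start in the x = +1 well
m = 0                             # index of the next unused transition
frames = np.empty(n_frames, dtype=int)

for f in range(n_frames):
    t_frame = (f + 1) / fps       # real time at the end of this frame
    # Consume every transition that occurred before this frame ended,
    # switching states once per transition encountered.
    while m < n_hops and t_hops[m] <= t_frame:
        state = -state
        m += 1
    frames[f] = state             # save the state at the end of the frame
```

Each entry of `frames` can then be handed to the animation loop as the particle's position for that video frame.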
The payoff is immediate: visually, the simulated trajectories have a very similar character to the actual hopping of a bead in a double trap (Video S2).
B. The stochastic simulation algorithm
Gillespie extended, validated, and popularized a simple but powerful method, the stochastic simulation algorithm, which generalizes the preceding idea to describe systems that make transitions between multiple discrete states (11). For any allowed state i, we are given the probability per unit time k_{j←i} to transition to every other allowed state j. The algorithm repeatedly executes the following steps:
1. The system begins in some initial state i. It waits for a certain time, then transitions to a new state. To find the waiting time and the new state:
   a. Determine the probability per unit time k_tot for any of the allowed transitions to occur by summing all the mean rates k_{j←i} over j.
   b. Draw a waiting time from the exponential distribution with mean 1/k_tot by the method in section III.A: Δt = −(k_tot)^(−1) ln x.
   c. Determine which of the M allowed processes happens at that transition time by drawing from a nonuniform discrete distribution on M objects, with probabilities for the choices j proportional to the corresponding k_{j←i}.
2. Update the current state according to that decision and advance the current time by the waiting time Δt. Return to step 1 and repeat until the required total elapsed time is reached.
The beauty of this algorithm, besides its correctness (12), is that no computation is wasted on time steps at which nothing happened: by construction, there is a state transition at every chosen time.
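A generic step of the algorithm can be packaged in a few lines (a sketch, with `rates` standing for the list of mean rates k_{j←i} out of the current state; the function name is ours, not Gillespie's):

```python
import numpy as np

rng = np.random.default_rng()

def ssa_step(rates, rng):
    """One stochastic simulation step.  rates[j] is the probability per
    unit time of the j-th allowed transition out of the current state.
    Returns the waiting time and the index of the transition that fired."""
    rates = np.asarray(rates, dtype=float)
    k_tot = rates.sum()                          # step 1a: total rate
    x = 1.0 - rng.random()                       # uniform on (0, 1]
    dt = -np.log(x) / k_tot                      # step 1b: waiting time
    j = rng.choice(len(rates), p=rates / k_tot)  # step 1c: which transition
    return dt, j
```

For the symmetric two-state hopping of section III.A there is only one allowed transition per state, so the discrete draw is trivial and the algorithm reduces to the earlier one.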
Two-state hopping is a particularly simple example of this algorithm: here, the state variable is position x = ±1. The transition rate depends on the system's state only in the trivial sense that each state can hop only to the other one; the numerical value of the rate constant is always the same.
C. Birth–death process
Computer Code S2 implements a stochastic simulation of the birth–death process introduced in section II.B. Here, the state is specified by an integer n, the population of the molecular species of interest. In each transition, n can either increase or decrease by one.
The probability per unit time to make a transition is not a constant; rather, k_tot = k_s + n k_d, where k_s is the probability per unit time for synthesis and k_d is the rate constant for degradation. To decide which reaction occurs, we make a Bernoulli trial with probability p = k_s/k_tot to increase the population n by one, and 1 − p to decrease it.
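Concretely, a single birth–death step might look like this (a sketch with illustrative rate values, not the actual Computer Code S2):

```python
import numpy as np

rng = np.random.default_rng()

k_s = 10.0   # illustrative synthesis rate (probability per unit time)
k_d = 1.0    # illustrative degradation rate constant (per molecule)

def birth_death_step(n, rng):
    """Advance the population n by one stochastic event."""
    k_tot = k_s + n * k_d                       # total transition rate
    dt = -np.log(1.0 - rng.random()) / k_tot    # exponential waiting time
    # Bernoulli trial: synthesis with probability k_s/k_tot, otherwise
    # degradation (which can never fire when n = 0, since then p = 1).
    if rng.random() < k_s / k_tot:
        n += 1
    else:
        n -= 1
    return dt, n
```

Iterating this step from any initial population produces trajectories that relax toward n* = k_s/k_d.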
D. Proofreading
1. Simulation goals
The qualitative word model given at the start of section II.C may seem promising, but the corresponding kinetic equations make for difficult reading and understanding. Better intuition could emerge from a presentation that stays closer to the concrete ideas of discrete actors randomly arriving, binding, unbinding, and so on, visibly implementing the ideas behind the museum metaphor. The following sections argue that stochastic simulation can realize that goal.
Besides confirming the predicted accuracy payoff, we will explore the claim that the enhancement of accuracy depends on GTP, GDP, and Pi being held far from chemical equilibrium, so that the hydrolysis step is nearly irreversible (the “door shuts tightly” in the museum metaphor). In fact, the model predicts no enhancement of accuracy compared with the one-step model when this chemical driving force is low (4). Far from equilibrium, however, we will confirm Hopfield's observation that the predicted error fraction can be as low as the square of the equilibrium value (or even a higher power if multiple rounds of sequential testing are employed).
In addition to checking Hopfield's prediction, section IV.C explores speed–accuracy tradeoffs in different minimal models of translation. The classic HN mechanism relies only on off-rate differences between right and wrong tRNA. However, differences in on-rates have also been observed between right and wrong tRNA (13–16), so we also explore a model relying solely on on-rate differences. Finally, we explore a model in which all kinetic parameters are based on in vitro translation experiments, which includes both on-rate and off-rate differences. The speed and error rate of translation obtained in the three different models will be compared with the speed and error rate measured in vitro, which are, respectively, about 0.25–8 amino acids/s (17, 18), depending on the in vitro system used, and 1.6 × 10^(−3) for Zaher and Green's (13) in vitro system. Translation rates in vitro are typically slower than in vivo E. coli translation rates of 10–20 amino acids/s (18). Zaher and Green's in vitro study did not directly measure translation rates, but under the assumption that peptide bond formation is the rate-limiting step, the upper limit for translation rates in their system should be 7 amino acids/s. The actual translation rate is likely lower because of tRNA rejections at the proofreading steps.
This exercise will provide insight into the contribution of off-rate and on-rate differences to the speed and accuracy of translation. All these variations are readily implemented by changing a handful of constants in a simulation code.
2. A single ribosome in a bath of precursors
This section's goal is to formulate the word model of section II.C in the context of mRNA translation, then set up a stochastic simulation [section III.B; see also Zuckerman (19)]. Later sections will show how students can explore the expectations raised at the end of the preceding section.
We will assume that a single ribosome is complexed with a single mRNA and has arrived at a particular codon. This complex sits in a bath containing the following free, dissolved species at fixed concentrations (Fig 1a):
- C denotes correct tRNA (i.e., the species that matches the codon currently being read), loaded with its appropriate amino acid. We neglect the possibility of a tRNA being incorrectly loaded; accurate loading is the concern of a separate proofreading mechanism that we are not studying now (21, 22).
- W is similar to C, but refers to the wrong tRNA for the codon under study.
- Other reactions supply complexes of tRNA with guanosine phosphates: C·GTP, C·GDP, W·GTP, and W·GDP. (For simplicity, we suppress any mention of elongation factors, one of which, EF-Tu (elongation factor thermo unstable), is also included in these complexes but is only implicit in the classic HN mechanism.)
Computer Code S3 implements a stochastic simulation on the five states shown in Figure 1b. The figure represents the ribosome–mRNA complex by ribosome R. In state 0, this complex is not bound to any tRNA. (More precisely, no tRNA is bound at the A site of the ribosome; a previously bound tRNA, together with the nascent polypeptide chain, is bound at another site [labeled P in Fig 1a], which we do not explicitly note.) Surrounding this state, Figure 1b shows four other states 1–4 in which the ribosome is bound to the complexes introduced earlier. The upper part of the figure describes wrong tRNA binding and possible incorporation; the lower part corresponds to the correct tRNA. Horizontal arrows at the top and bottom denote hydrolysis of GTP, which is coupled to a transformation of the ribosome into an activated state, R*.
Although any chemical reaction is fundamentally reversible, under cellular conditions the concentration ratio [Pi][C·GDP]/[C·GTP] is far below the equilibrium value, so that the reactions in Figure 1b are predominantly in the direction shown by the pale purple arrows, which was one of the conditions in Hopfield's original proposal and corresponds to the usual biochemical assumption of irreversibility. (Section IV.C.5 will explore relaxing it.)
Again, we are assuming that a single ribosome bounces around this state diagram in the presence of fixed concentrations of feedstocks either imposed in vitro by the experimenter or supplied by a cellular milieu. After hydrolysis, the ribosome can reject its tRNA–GDP complex with probability per unit time ℓ. In the metaphor of section II.C, this option corresponds to “exiting the museum exhibit by the second door.” Or, with probability per unit time k add, the ribosome complex can add its amino acid to the nascent polypeptide and translocate the tRNA to the P binding site, ejecting any tRNA already bound there. Either way, the A binding site becomes vacant and, for the purposes of this state diagram, the ribosome returns to state 0.
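The five-state diagram can be encoded as a table of allowed moves, each with its own probability per unit time. The following sketch uses illustrative placeholder rates and labels, not the parameters of Table 1:

```python
import numpy as np

rng = np.random.default_rng()

# States: 0 = empty A site; 1 = R·(W·GTP); 2 = R*·(W·GDP);
#         3 = R·(C·GTP);    4 = R*·(C·GDP).
# Each entry maps a state to its allowed (next_state, rate, label) moves;
# incorporation and rejection both return the ribosome to state 0.
moves = {
    0: [(1, 1.0, "bind W"), (3, 1.0, "bind C")],
    1: [(0, 94.0, "unbind W"), (2, 1.0, "hydrolyze GTP")],
    2: [(0, 7.9, "reject W"), (0, 0.1, "incorporate W")],
    3: [(0, 1.0, "unbind C"), (4, 1.0, "hydrolyze GTP")],
    4: [(0, 1.0, "reject C"), (0, 0.1, "incorporate C")],
}

def step(state, rng):
    """One stochastic move of the single ribosome on the state diagram."""
    targets, rates, labels = zip(*moves[state])
    rates = np.array(rates)
    k_tot = rates.sum()
    dt = -np.log(1.0 - rng.random()) / k_tot
    i = rng.choice(len(rates), p=rates / k_tot)
    return dt, targets[i], labels[i]
```

Tallying the "incorporate" labels over a long run yields the error fraction directly.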
IV. RESULTS AND DISCUSSION
A. Multiwell hopping
Section III.A described a simulation of double-well hopping. A small modification of the algorithm described there gives an interesting extension: Instead of hopping between two wells (reversing direction on every step), consider one-dimensional diffusion on a symmetric many-well potential, for example, one of the form U(x) = sin(x). In such a potential, for each transition the system must also make a random decision whether to increase or decrease a position coordinate x by 2π. The resulting random walk will display the same long-time scaling behavior as any unbiased 1D walk, but with trajectories that hop at random times, not periodic steps as in the simplest realization (8).
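A sketch of this extension (with illustrative parameter values): the waiting times are generated exactly as before, and each hop now also carries a random ±2π displacement.

```python
import numpy as np

rng = np.random.default_rng()

tau = 1.0      # illustrative mean waiting time between barrier crossings
n_hops = 1000

# Hopping times: exponential waits, as in the double-well case.
t_hops = np.cumsum(-tau * np.log(1.0 - rng.random(n_hops)))

# New ingredient: each hop moves the coordinate by +2*pi or -2*pi,
# chosen with equal probability (an unbiased walk between wells).
steps = rng.choice([-2 * np.pi, 2 * np.pi], size=n_hops)
x = np.cumsum(steps)   # well-to-well position after each hop
```

On time scales much longer than tau, the variance of x grows linearly in time, the hallmark of any unbiased 1D walk.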
B. Birth–death model
1. Convergence to the continuous, deterministic approximation
Students will probably find it reasonable that, when the mRNA population size n is sufficiently large, we may neglect its discrete character. Students who have been exposed to probability ideas may also find it reasonable that, in this case, the relative fluctuations of n from one realization to the next will be small, and so n effectively behaves as a continuous, deterministic variable, subject to the differential equation dn/dt = −k_d n + k_s. That equation predicts exponential relaxation from an initial value n_0 to the steady value n* = k_s/k_d with e-folding time 1/k_d:

n(t) = n* + (n_0 − n*) e^(−k_d t).    (1)

The simulation bears out this expectation (Fig 2a, b).
However, mRNA populations in living cells are often not large. Nevertheless, although individual realizations of n(t) may differ significantly, the ensemble average of many such trajectories does follow the prediction of the continuous/deterministic idealization (Fig 2c). Within individual cells, however, there will be significant deviation around that mean behavior (Fig 2c). Specifically, for this case (representing a simple constitutive promoter), the steady state will have fluctuations of mRNA count n that follow a Poisson distribution (Fig 2d). That key result is more memorable for students when they discover it empirically in a simulation than it would be if they just watched the instructor prove it with abstract mathematics (by solving a master equation) (8).
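Students can make that discovery numerically by sampling a long steady-state trajectory and computing the Fano factor (variance/mean), which equals 1 for a Poisson distribution. A minimal sketch, with illustrative rates and a fixed seed:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility

k_s, k_d = 10.0, 1.0             # illustrative rates; n* = k_s/k_d = 10

n, t = 0, 0.0
next_sample, dt_sample, t_end = 10.0, 0.5, 5000.0  # skip the transient
samples = []

while t < t_end:
    k_tot = k_s + n * k_d
    t_next = t - np.log(1.0 - rng.random()) / k_tot
    # The population is constant between events, so record n at every
    # sampling time that falls inside the current waiting interval.
    while next_sample < t_next and next_sample < t_end:
        samples.append(n)
        next_sample += dt_sample
    n += 1 if rng.random() < k_s / k_tot else -1
    t = t_next

samples = np.array(samples)
fano = samples.var() / samples.mean()   # Poisson predicts Fano factor = 1
```

With these parameters the sampled mean sits near n* = 10 and the Fano factor near 1, the empirical signature of the Poisson steady state.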
State fluctuations of the sort just mentioned may suffice to pop a more complex system out of one steady state and into a very different one. Indeed, even the simplest living cells do make sudden, random state transitions of this sort. Such unpredictable behavior, not seen in the differential equation approach, can potentially be useful to bacteria, implementing a population-level bet-hedging strategy (23–26).
A real bacterium is not simply a beaker of reagents. Bacteria periodically divide, partitioning a randomly chosen subset of each mRNA species into each daughter cell. That extra level of realism is difficult to introduce into an analytical model, but straightforward in a simulation. The results are similar to those just described, with a larger effective value of the clearance rate constant (8).
2. Upgrade to cover bursting processes
Bacteria are supposedly simple organisms. The birth–death process is simple, too, and it fits with the cartoons we see in textbooks, so it is interesting for students to follow the recent discovery that the model makes quantitative predictions for mRNA production that in some cases were experimentally disproven (8, 27, 28).
Indeed, most bacterial genes are not constitutively expressed; for example, many are regulated by transcription factors, whose binding/unbinding can introduce additional variation in mRNA copy numbers. Recent advances in single-molecule imaging permit the direct measurement of n(t) in individual cells; the results disagree with the birth–death model's prediction that the distribution of n in the steady state should be Poisson: typically, the ratio of variance to mean was found to be 5 or more (not 1 as in a Poisson process). Researchers found, however, that a simple modification of the birth–death model could accommodate this and other discrepant data. The required extension amounts to assuming that mRNA transcripts are generated in bursts, that the bursts themselves are initiated with a fixed probability per unit time, and that, once initiated, a burst is also terminated with fixed probability per unit time. For example, a burst could begin with unbinding of a repressor and end when it rebinds. Many other regulatory architectures are probably realized in cells, but the model just outlined is simple and tractable and, hence, serves as a phenomenological representative for what lies beyond constitutive gene expression.
We can readily upgrade the birth–death simulation by supplementing the state variable n with a binary on/off variable; see the second code of Computer Code S2. Although the new model has two additional parameters compared with the original birth–death model, nevertheless it was overconstrained by the experimental data, so its success was a nontrivial test (Fig 3) (8, 28). Later work confirmed its predictions for how mRNA statistics should change when the level of a transcription factor was changed (29, 30). More broadly, it is instructive to point out to students that had the original authors been content with a reasonable fit to averaged data (Fig 3a), they might have accepted the birth–death model. Only when fluctuation information is included do we see that that model cannot explain the experiment (Fig 3b). Eventually, analytic results were also obtained for this model (31–35), but the original approach of Golding et al. (27) was via numerical simulations, like the one described here.
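The upgrade is a one-variable change: alongside n, carry a binary gene state whose toggling turns synthesis on and off. A sketch with hypothetical parameter values (not those fitted in the references):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: slow burst initiation, faster termination,
# rapid synthesis while "on", degradation per copy.
k_on, k_off = 0.1, 0.5
k_s, k_d = 20.0, 1.0

n, gene_on = 0, False
t, t_end = 0.0, 200.0
trajectory = []                  # (time, population) after each event

while t < t_end:
    # Propensities depend on both state variables (n, gene_on):
    r_toggle = k_off if gene_on else k_on
    r_birth = k_s if gene_on else 0.0
    r_death = n * k_d
    k_tot = r_toggle + r_birth + r_death

    t += -np.log(1.0 - rng.random()) / k_tot
    u = rng.random() * k_tot     # pick a reaction in proportion to its rate
    if u < r_toggle:
        gene_on = not gene_on
    elif u < r_toggle + r_birth:
        n += 1
    else:
        n -= 1
    trajectory.append((t, n))
```

With burst-like parameter choices such as these, long runs give variance/mean ratios well above 1, in contrast to the constitutive model.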
C. Proofreading
We now return to the process introduced in section III.D, described by models of the sort shown in Figure 1.
1. Visualization of the simulation results
To keep the project (stochastic simulation and visualization of various proofreading models) modular, we constructed simulation code that writes its state trajectory to a file (Computer Code S3). A second code then reads that file and creates a visual output (Computer Code S4). The first of these codes operates similarly to section III.C, but with a four-way choice of what transition to make after each waiting interval (Fig 1b). The second code can be almost as simple as the one described in section III.A. However, students with more time (perhaps in a capstone project) can make a more informative display with a reasonable additional effort, as follows.
The supplemental videos not only show the state that is current at the end of each video frame, they also animate the pending arrivals of new complexes that are about to bind and the departures of old ones that have unbound without incorporation. By this means, the videos give a rough sense of the narrative in the trajectory being shown (Fig 4). These improvements are not difficult to add once the basic code is working. Alternatively, students can construct the basic version, then be shown these videos.
The exponential distribution of waiting times implies that there will be episodes with several events happening rapidly, interspersed with long pauses. For this reason, it is useful to view the simulation in two ways: Once with a shorter time step that resolves most individual events but covers only a limited time interval (Video S3), and then with a coarser time step to see the entire synthesis trajectory (Video S4).
We also found it useful (solely for visualization purposes) to alter the distribution of waiting times in a simple way that relieves visual congestion without, we think, too much damage to the realism of the simulation. Our modification, shown in the supplemental videos, was simply to add a small fixed delay, for example one half of one video frame, to every transition waiting time (Fig 5).
2. Classic Hopfield–Ninio mechanism
Following Hopfield (4), we initially assume that the rate constant for incorporation k_add is the same regardless of whether the tRNA is correct or incorrect. We also suppose that the binding rates k′_c = k′_w and ℓ′_c = ℓ′_w have this property; for example, they may all be diffusion-limited (3). Only the unbinding rates differ in the classic HN mechanism:

k_w = φ_{−1} k_c  and  ℓ_w = φ_3 ℓ_c.    (2)

Here, φ_{−1} = 94 and φ_3 = 7.9 are the preference factors for unbinding the wrong tRNA relative to the correct one before and after GTP hydrolysis, respectively. These values were taken from in vitro measurements (13). Again following Hopfield, in this section we also take the hydrolysis rate constants to be equal (m_w = m_c). The remaining factor, φ_2 for condensation, was set by kinetic consistency (see Data S1.1).
As for the rate constants themselves, each is either a constant probability per unit time (unbinding and hydrolysis) or a probability per unit time with the substrate concentration already lumped in (binding and condensation). The values we chose were appropriate for the concentrations of reactants present in Zaher and Green's (13) in vitro stopped-flow experiments. (See Table 1 and Data S1.1 in the Supplemental Material for a list of parameters and an explanation of the rate constants chosen.)
From the preference factors and rate constants mentioned above (summarized in Table 1), the classic HN ribosome had a simulated error of 2/1000 = 0.002 wrong incorporations, close to the in vitro measured error rate of 0.0016. This result also roughly matches the analytic predicted fraction (φ_{−1} φ_3)^(−1)/(1 + (φ_{−1} φ_3)^(−1)) = 0.0013, where φ_{−1} φ_3 = (94)(7.9) ≈ 740 is the predicted preference factor on the basis of Hopfield's analysis. However, the simulated speed was ≈10^(−4) amino acids/s, much lower than the in vitro speeds of ≈0.25–8 amino acids/s. The slow translation rate is a result of the two sequential quasi-equilibrium proofreading steps. When the speed of translation is increased, the error fraction also increases as the quasi-equilibrium is broken (Fig 6).
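The analytic estimate just quoted can be verified in a few lines (using the two preference factors given above; the variable names are ours):

```python
phi_1, phi_3 = 94.0, 7.9           # preference factors before/after hydrolysis

phi = phi_1 * phi_3                # combined preference factor, about 740
error = (1 / phi) / (1 + 1 / phi)  # predicted error fraction, about 0.0013
```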
For visualizing the HN model in an animation, we raised the probability of incorrect choices so that wrong incorporations could be seen within a reasonable time frame. To do this, the preference ratios φ_{−1} and φ_3 were both lowered to 5. Videos S3 and S4 show the resulting behavior. Perhaps the most important impression we get from viewing these animations is that the cell is a busy place. The riot of activity, the constant binding events that end with no progress (and often not even GTP hydrolysis), are hallmarks of chemical dynamics that are hard to appreciate in textbook discussions, yet vividly apparent in the simulation. This is especially clear in Video S4, which shows a typical run of 25 amino acid incorporations. Because the simulation shows many unproductive binding and unbinding events, not every event is shown in detail in Video S4. Focusing on just the GDP–tRNA rejections shows that more correct tRNAs than incorrect tRNAs make it past GTP hydrolysis, and that the few incorrect tRNAs that do make it past are mostly rejected in the second proofreading step. However, we also see many correct complexes bind and get rejected, before or after GTP hydrolysis. This is the price paid for accuracy in the classic HN mechanism. In the instance shown, only one incorrect amino acid was incorporated out of 25 incorporations, much lower than the expected error fraction of φ^(−1)/(1 + φ^(−1)) ≈ 0.17 from single-step equilibrium binding (where φ = 5 is the preference factor) and fortuitously close to Hopfield's predicted value φ^(−2)/(1 + φ^(−2)) ≈ 0.038.
Video S3 provides a more detailed look at this process. The videos also show clearly the jerky, nonuniform progress of synthesis, with some amino acid incorporations happening after much longer delays than others. That feature is by now well documented by single-molecule experiments.
3. Forward-rate discrimination model
Much has been learned about ribosome dynamics (37, 38) after the original insights of Hopfield (4) and Ninio (5). We now know that each step in our model consists of substeps. For example, GTP hydrolysis is subdivided into GTPase activation followed by actual hydrolysis—the latter step probably depends on a rearrangement of monitoring bases in the ribosomal RNA—and so on (39).
The simulation described in section IV.C.2 was designed to show the HN mechanism in its classic, or pure, form and how it can enhance fidelity even without help from the effects just described. For example, we assumed that the only dependence on right versus wrong tRNA was via unbinding rates. Indeed, such dependence was later seen at the single-molecule level (40), but it now appears that some of the forward rates also depend on the identity of the tRNA (13, 41, 42), an effect we will call forward-rate discrimination. This forward-rate discrimination (FRD) is in part driven by an induced-fit mechanism, whereby correct codon–anticodon pairing causes conformational changes that accelerate EF-Tu GTP hydrolysis and tRNA accommodation in site A (14–16, 43). To quantify its contribution to the error fraction of the ribosome, we simulated a ribosome with all forward rate ratios the same as those determined in vitro, but with backward rate ratios set to 1 (see Table 1, FRD ribosome, and Data S1.4). With these rates, the error was 182/1000 = 0.182, much higher than the HN ribosome's 0.002, and the speed was 0.32 amino acids/s, much higher than the HN ribosome's ≈10⁻⁴ amino acids/s and closer to the range of in vitro translation rates (0.25–8 amino acids/s). Figure 5 shows that, like the classic HN ribosome, the FRD-only ribosome achieves lower error at slower translation rates, but at every speed the FRD ribosome has higher error than the HN ribosome. The FRD-only ribosome must also trade away a significant amount of accuracy to achieve in vitro translation speeds, with the fraction of wrong incorporations exceeding 0.3 at 1–10 amino acids/s.
Overall, comparing the FRD-only and HN-only models reveals that off-rate differences account for almost all of the translation accuracy, whereas on-rate differences increase the efficiency and speed of translation.
4. More realistic model
The realistic ribosome, with all rates and ratios set by Zaher and Green's (13) in vitro measurements, combines the off-rate and forward-rate discrimination described above (Table 1, fifth column; Data S1.2). The realistic model demonstrates that a combination of both strategies allows the ribosome to achieve both low error and high speed. For a simulation of 10 000 amino acids, the error was 18/10 000 = 0.0018, similar to the in vitro ribosome error, but the speed was 2.89 amino acids/s, faster than both the HN and FRD ribosomes and within the range of in vitro translation rates. Figure 6 shows that the realistic ribosome has neither the speed limits of the HN ribosome nor the costly tradeoff of accuracy for speed of the FRD ribosome.
Video S5 shows an animation of a simulation with the realistic ribosome rates. We see both a bias for correct tRNA binding/hydrolysis and a bias for rejection of wrong tRNAs before GTP hydrolysis. Of the 26 correct tRNA binding events in this run, 25 resulted in successful incorporation, far more efficient than the fraction 24/10 245 = 0.002 of productive correct tRNA binding events in the classic HN ribosome simulation of Videos S3 and S4. Additionally, of the 30 incorrect tRNA binding events on the realistic ribosome, all 30 resulted in rejection.
The real ribosome uses both the classic HN mechanism (quasi-equilibrium, energetic proofreading) and forward-rate discrimination; our results show that both are required to optimize speed, efficiency, and accuracy. Despite this, simulating the classic HN mechanism is still a valuable exercise for students. Most of the discrimination power is accounted for by the HN-only ribosome (Figure 6), and by visualizing discrimination via only a difference in unbinding rates, students see the minimal components necessary to attain high accuracy in a broad class of biological reactions. The classic HN mechanism illustrates an essential part of biological proofreading that fundamentally relies on nonequilibrium physics. Recent evidence also points to two kinetic proofreading steps—that is, two sequential, nearly irreversible steps, each of which can be followed by unbinding of tRNA (44, 45). Our simulation could be extended to include such effects, whereas analytic methods would quickly become intractable.
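The logic of the classic HN cycle can be sketched in very few lines of Python. The following toy (illustrative rates, not those of Table 1 or Data S1) follows only which tRNA survives both discrimination steps; because only the branch taken at each step matters for the error fraction, it uses the embedded jump chain of the reaction scheme rather than explicit waiting times:

```python
import random

# Toy simulation of the classic HN proofreading cycle (illustrative rates).
phi = 5.0        # preference factor: wrong tRNA unbinds phi times faster
k_off = 1.0      # unbinding rate of a correct tRNA (both bound states)
k_h = 1.0        # GTP hydrolysis rate (the nearly irreversible step)
k_add = 0.1      # incorporation rate from the proofreading state

def incorporate_one(rng):
    """Repeat binding attempts until one amino acid is added; True if wrong."""
    while True:
        wrong = rng.random() < 0.5                # equal tRNA concentrations
        koff = k_off * (phi if wrong else 1.0)
        if rng.random() < koff / (koff + k_h):    # unbinds before hydrolysis
            continue
        if rng.random() < koff / (koff + k_add):  # proofread: rejected after GTP
            continue
        return wrong

rng = random.Random(0)
n = 20000
errors = sum(incorporate_one(rng) for _ in range(n))
print(f"error fraction over {n} incorporations: {errors / n:.3f}")
```

With these rates the simulated error lands between the single-step value φ⁻¹/(1 + φ⁻¹) ≈ 0.17 and Hopfield's ideal φ⁻²/(1 + φ⁻²) ≈ 0.038, because k_h and k_add are not small enough for true quasi-equilibrium; shrinking them pushes the error toward the ideal, at the cost of speed—the tradeoff shown in Figure 6.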
5. Role of thermodynamic driving force
Next, we return to the classic HN mechanism, this time operating at nearly equilibrium concentrations of GTP, GDP, and Pi to demonstrate the importance of the one-way door (the nonequilibrium GTP hydrolysis step). Table 1 summarizes the rates we chose for this undriven model (see Data S1.3 for more detailed description). Before a discussion, students can be given these values and asked what behavior they find.
With these rates, the reaction still creates a peptide chain, because we assumed a fixed probability per unit time to add an amino acid irreversibly whenever the ribosome visits its activated state. This time, however, the simulation gave an error of 1683/15 000 = 0.1122 and a speed of ≈4 × 10⁻⁵ amino acids/s. The error fraction of 0.1122 arises from GDP-tRNA binding the ribosome directly and is consistent with the expected error fraction φ⁻¹/(1 + φ⁻¹) = 0.1124, where φ = 7.9. This large error illustrates the significance of the thermodynamic driving force: the ribosome near equilibrium is both slow and error-prone.
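The expected single-step error quoted above can be checked in one line (again a side calculation, not part of the supplemental code):

```python
# Expected error for the undriven (near-equilibrium) ribosome, where
# discrimination collapses to a single equilibrium binding step (phi = 7.9).
phi = 7.9
error = phi**-1 / (1 + phi**-1)   # algebraically equal to 1 / (1 + phi)
print(f"expected error fraction ~ {error:.4f}")  # ~0.1124
```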
To gain more insight into the role of the irreversible GTP hydrolysis step, some students may wish to rerun the simulation with different incorporation and hydrolysis rates. For example, a run with = 25 and k_add = 4.14 produces many tRNAs flipping between their GDP and GTP states, another way in which the two discrimination steps effectively merge into one.
V. CONCLUSION
The models described here illustrate fairly elementary physical principles that lie at the heart of cell biology. Specifically, gene expression and kinetic proofreading are two fundamental topics that are within reach of undergraduates.
A module that introduces stochastic simulation need not dominate a semester course: one class week is enough for the first exposure. Indeed, the entire simulation plus visualization in Computer Code S1 consists of just seven short lines of code, and yet it creates a valuable educational experience not available in a static textbook. Moreover, the framework is not specifically biological in character; it can serve as a stepping stone to more complex simulations relevant for a variety of courses.
Finally, the stochastic simulations used here allowed direct comparison of different models for error correction; comparing minimal models is an important skill for biophysics students.