
During the spring of 2020, labs around the world suddenly closed to help slow the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the deadly COVID-19 pandemic. Among the many effects on science and education, the lab closures resulted in undergraduates losing the opportunity to work on research projects during that spring and summer and throughout the 2020–2021 academic year. Participating directly in a research project is important for undergraduate students to gain research experience and with it the mentoring and training needed to prepare them for graduate school or professional school and a future career in science. To address this need during the pandemic, I organized an online, remote, collaborative project for a team of undergraduates at the University of Illinois at Chicago (UIC) that grew to include additional undergraduates from other universities as well as several high school students and their teachers. My experience in organizing this project could serve as a model for organizing online student research projects in the future.

Research theme

The project was organized around the topic of protein structure and function using my lab's work on moonlighting proteins and the MoonProt database we constructed (http://moonlightingproteins.org) (1). The MoonProt database is an online open-access database storing expert-curated annotations for moonlighting proteins. Moonlighting proteins are defined as proteins that have more than one physiologically relevant biophysical or biochemical function within a single polypeptide chain (2). Proteins with multiple functions play integral roles in coordinating cellular pathways, sensing and responding to changes within the cell and in its environment, interactions of pathogens with host organisms, and many other vital processes in health and disease. The database is an important tool not only for researchers who specialize in the cellular processes in which the moonlighting proteins are involved, but also for those interested in understanding the relationship between protein structures and functions in general and for developing improved methods of predicting protein functions from sequences or structures.

How we organized the project

I started the project with three UIC undergraduates located in the Chicago area and added two Summer Research Opportunity Program (SROP) students in Puerto Rico (from Universidad de Puerto Rico en Cayey). After I spoke at two virtual international conferences about my lab's research, others approached me asking to join the project. I therefore added two more undergraduate students, from Minot University in North Dakota and the V. N. Karazin Kharkiv National University in Ukraine, and four high school students and their teachers from Cold Spring Harbor High School and Northport High School in New York state.

I met with a few students at a time in subgroups via Skype. This arrangement enabled students to meet each other and work collaboratively on a topic with their subgroup. The high school teachers organized additional meetings to mentor and assist the high school students.

The project included two phases: preparing annotations of proteins to be added to the MoonProt database, and analyzing the sequences or structures, or both, of the proteins in the database (Fig 1). I selected all of the proteins for the database, with the requirement that peer-reviewed published biochemical, biophysical, mutagenic, or other data supported the presence of multiple physiologically relevant functions for each protein. I provided the students with selected references about the proteins as well as their Universal Protein Resource Knowledgebase (UniProtKB) IDs (3) so that the students were working with the exact protein described in the published reports (homologous proteins do not always have both functions). The student volunteers read the journal articles and helped prepare manual annotations using published papers and online resources. For example, students used amino acid sequences and the Basic Local Alignment Search Tool (BLAST; http://blast.ncbi.nlm.nih.gov/Blast.cgi) (4) to search the Protein Data Bank (rcsb.org) (5) for structures corresponding to the protein's amino acid sequence.
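As a minimal sketch of the structure search described above, the following Python snippet filters BLAST hits against Protein Data Bank sequences, assuming the results were saved in BLAST's standard 12-column tabular format. The identity and coverage cutoffs, the function names, and the sample rows are illustrative assumptions and are not values or data from the project.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject_id: str   # identifier of the PDB sequence that was hit
    identity: float   # percent identity over the aligned region
    coverage: float   # fraction of the query covered by the alignment


def parse_tabular(text: str, query_length: int) -> list:
    """Parse BLAST tabular output with the default columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        q_start, q_end = int(fields[6]), int(fields[7])
        coverage = (q_end - q_start + 1) / query_length
        hits.append(Hit(fields[1], float(fields[2]), coverage))
    return hits


def likely_same_protein(hits, min_identity=95.0, min_coverage=0.9):
    """Keep hits that probably correspond to the query protein itself
    rather than a homolog. Both cutoffs are hypothetical."""
    return [
        h for h in hits
        if h.identity >= min_identity and h.coverage >= min_coverage
    ]


# Made-up rows for a hypothetical 300-residue query sequence.
SAMPLE = (
    "query\tpdb_A\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t620\n"
    "query\tpdb_B\t42.5\t280\t150\t4\t10\t289\t5\t284\t1e-60\t210\n"
)

matches = likely_same_protein(parse_tabular(SAMPLE, query_length=300))
print([h.subject_id for h in matches])  # ['pdb_A']
```

A strict identity cutoff matters here because, as noted above, homologous proteins do not always share both functions, so a structure of a homolog is not necessarily a structure of the moonlighting protein.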

Fig 1 Schematic diagram of steps in the project. The diagram summarizes the steps of training, annotation, and sequence and structural analysis. The graduate students expanded and updated the architecture of the database to provide sections for the new types of information. The professor checked all of the work and uploaded it into the online database with the help of the graduate students.

Citation: The Biophysicist 2, 2; 10.35459/tbp.2021.000190

As part of the training, all the students started by reading review articles about moonlighting proteins (2, 6) and a paper about the human angiotensin-converting enzyme 2 (ACE2) protein, and then prepared a practice annotation for that protein that we could go over together. ACE2 is a moonlighting protein that is (a) an enzyme that cleaves angiotensin to yield bioactive peptides and (b) a transmembrane chaperone that helps in folding and membrane targeting of the BoAT amino acid transporter [reviewed in Gheblawi et al. (7)]. ACE2 was also a timely training example because it is the receptor that SARS-CoV-2 uses to attach to host cells. During the annotation stage of the project, subgroups of students were given subsets of moonlighting proteins for annotation with a theme: for example, enzymes that are also cell surface receptors, enzymes that also bind RNA or DNA (8), or transcription and splicing factors that have a second function in mitosis (9).

In the second stage, groups of students worked collaboratively, with the guidance of the principal investigator (PI), using online tools to analyze the sequences, the structures, or both for the whole group of over 500 moonlighting proteins in the MoonProt database. For example, some students used TMHMM (a membrane protein topology prediction method based on a hidden Markov model) to predict the locations of transmembrane helices (10), used the protein disorder prediction system (PrDOS) and IUPred (11, 12) to identify amino acid sequences of low complexity, or retrieved information about the human proteins' connections to disease from the Online Mendelian Inheritance in Man (OMIM) database of human genes and genetic disorders (13).
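To give a feel for what a sequence-based transmembrane prediction does, here is a much simpler stand-in for TMHMM: a sliding-window hydropathy scan using the Kyte-Doolittle scale. It is not the hidden Markov model used by TMHMM and is far less accurate. The 19-residue window and the 1.6 cutoff are values commonly used with this scale. The function names and the toy sequence are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(sequence, window=19, cutoff=1.6):
    """Return 0-based start positions of windows whose mean hydropathy
    exceeds the cutoff. Nonstandard residue letters score 0."""
    seq = sequence.upper()
    starts = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean > cutoff:
            starts.append(i)
    return starts


def merge_windows(starts, window=19):
    """Merge overlapping or adjacent windows into (start, end) segments,
    with the end position exclusive."""
    segments = []
    for s in starts:
        if segments and s <= segments[-1][1]:
            segments[-1] = (segments[-1][0], s + window)
        else:
            segments.append((s, s + window))
    return segments


# Toy sequence: charged flanks around a 22-residue hydrophobic stretch.
toy = "KKDDEERRKK" + "LLIVALLIVALLIVALLIVALL" + "KKDDEERRKK"
segments = merge_windows(hydrophobic_windows(toy))
print(segments)
```

Running the same function over every sequence in a dataset and writing the segments to one table mirrors the kind of whole-database analysis described above, with the caveat that a published predictor should be used for real annotations.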

Students prepared the annotations and other information in Excel worksheets; after the worksheets were checked by the PI, three current or former graduate students from the Jeffery Lab at the University of Chicago, who also helped in updating the database architecture, uploaded the new annotations and other information.
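Some of the checking step could be automated before the PI reviews the content. The sketch below assumes a worksheet has been exported from Excel as CSV. The column names are hypothetical, and the two sample rows are illustrative only. The accession pattern is the one given in the UniProt help pages, and Q9BYF1 is the UniProtKB accession for human ACE2.

```python
import csv
import io
import re

# UniProtKB accession number pattern, as given in the UniProt help pages.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Hypothetical column names; a real worksheet may use different ones.
REQUIRED = ("uniprot_id", "function_1", "function_2", "reference")


def check_rows(csv_text):
    """Return a list of (row_number, problem) tuples for a CSV export."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                problems.append((number, f"missing {column}"))
        accession = (row.get("uniprot_id") or "").strip()
        if accession and not ACCESSION.fullmatch(accession):
            problems.append((number, f"malformed accession {accession!r}"))
    return problems


# Illustrative rows: the second uses a gene name where an accession belongs
# and leaves the second function blank.
SAMPLE = (
    "uniprot_id,function_1,function_2,reference\n"
    "Q9BYF1,peptidase,chaperone for an amino acid transporter,ref 7\n"
    "ACE2,peptidase,,ref 7\n"
)

for row_number, problem in check_rows(SAMPLE):
    print(row_number, problem)
```

A check like this catches only formatting problems. Whether the published evidence supports both functions still requires expert review.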

Methods and tips

The main communication methods were Skype meetings, which were held twice a week for students who were working on a short-term project (SROP), and once a week for students who were working on the project longer term (the other undergraduate students). I interacted directly with the high school students and their teachers less often, but the high school teachers mentored their students between our meetings.

Most of the files were exchanged by posting in a shared Box (www.box.com) site. In this way, PDF files about the research theme of moonlighting proteins and about the individual proteins were available to everyone without needing to find email messages with individual files. In many cases, the PI wrote notes on a paper about a protein and then scanned it before posting to help the students interpret the information in the paper. The students posted their filled-out annotations (in Excel files) to Box, which was helpful for organizing a large number of files. Additional files in which data for all the proteins in the dataset were added (e.g., prediction of transmembrane helices) were also in Excel format and posted to Box.

Students were also encouraged to fill out a sample notebook page in a Word document with spaces for the student's name, date, hypothesis, methods, results, and conclusions. Even though this was a project involving annotation and analysis of sequences and structures, all these concepts in organizing an experiment were important; for example, a hypothesis might be that some protein folds would be more common than others in moonlighting proteins. Methods would include everything another person would need to reproduce the annotation or analysis, including the following: (a) information about tools and resources, such as the versions of programs, servers, databases, and so on; (b) what data were used as input; (c) what parameters or cutoffs were selected for running the programs; (d) how the results were interpreted; and (e) references for those computational tools and resources.
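The notebook page described above can also be represented as a small data structure, which makes it easy to see which parts of a page are still empty. This is only a sketch: the class and field names are assumptions that follow the items listed in the text, and the example values are placeholders, not records from the project.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ToolRecord:
    name: str         # program, server, or database
    version: str      # version or release used
    input_data: str   # what data were used as input
    parameters: str   # parameters or cutoffs selected
    reference: str    # citation for the tool or resource


@dataclass
class NotebookPage:
    student: str
    date: str
    hypothesis: str
    methods: list = field(default_factory=list)  # list of ToolRecord
    interpretation: str = ""                      # how results were interpreted
    results: str = ""
    conclusions: str = ""

    def missing(self):
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


page = NotebookPage(
    student="Student name",
    date="YYYY-MM-DD",  # placeholder
    hypothesis="Some protein folds are more common than others "
               "in moonlighting proteins",
)
page.methods.append(
    ToolRecord(
        name="TMHMM",
        version="(record the server version used)",
        input_data="(describe the input sequences)",
        parameters="(record any non-default settings)",
        reference="Sonnhammer et al. 1998 (10)",
    )
)
print(page.missing())  # ['interpretation', 'results', 'conclusions']
```

Keeping the tool version, input, and parameters together in one record is what allows another person to reproduce an annotation or analysis later.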

Because of the suddenness of the move to remote learning and the development and growth of this online project, there were a few bottlenecks. The slowest part was the selection and preparation of papers about proteins for annotation because this was done by one person. Another bottleneck was uploading the annotations because the professor was also in charge of quality control. Selecting programs for sequence and structure analysis was less of a bottleneck because most of them are routinely used in the lab. In future projects with more time for planning and preparation, proteins and papers could be selected ahead of time, enabling more timely checking and uploading of information to the online database as it becomes available. Additionally, step-by-step directions and reference lists for use of the programs, databases, and other online tools could be prepared ahead of time.

Outcome

The online remote project resulted in both an update to an important scientific tool and a valuable learning experience for undergraduate and high school students.

The updated MoonProt database includes more examples of moonlighting proteins from humans, plants, and archaea; more proteins involved in disease; and proteins with different combinations of functions, as well as more information about the individual proteins. This resource provides information for studies of moonlighting proteins that could lead to identifying characteristics that correlate with multiple functions and more broadly can be used for aiding in the interpretation and use of the rapidly increasing number of genome sequences. MoonProt can also be used to develop improved algorithms for predicting protein functions from sequence or structure, which can help increase the accuracy of annotating sequence and structural databases. The structures of proteins that have multiple functions can be used for models or scaffolds to inform protein design and engineering.

The project also served as a proof of concept of a framework for enabling participation of a diverse group of students in research at a critical time when other opportunities are limited or absent. The project provided online research opportunities to learn about protein sequences, structures, and functions for three UIC undergraduate students, two SROP undergraduate students in Puerto Rico, two additional undergraduate volunteers in the United States and internationally, and four high school students with two high school teachers. All of these students and teachers, plus three graduate students, are authors of the paper in Nucleic Acids Research (1). Additionally, the two SROP students gave presentations and wrote papers for their summer program. The three UIC students continued in the lab into the academic year, with one student writing a report and receiving course credit for the fall semester and another student continuing on a related project with moonlighting proteins to fulfill the requirements of an Honors Capstone thesis. Several of the other undergraduate and high school students are continuing in collaborations with the Jeffery Lab related to moonlighting proteins. One of the graduate students included his work on the project in his PhD thesis.

This unique training experience enabled biology students to work on a collaborative and interdisciplinary computer-based analysis that used concepts from biology, physics, and chemistry. This project can also serve as a new framework enabling online undergraduate and high school research experiences beyond the time of COVID social distancing, and it could be expanded in the future to provide an opportunity for a larger number of students. An online remote project like this could also potentially provide opportunities to participate in a research project for students who have disabilities that make it challenging to do a project in a lab or students who are at community colleges or other colleges without large research programs.

ACKNOWLEDGMENTS

I thank all of the students and their teachers who helped with this project during these challenging times: Chang Chen, Haipeng Liu, Shadi Zabad, Nina Rivera, Emily Rowin, Maheen Hassan, Stephanie M. Gomez De Jesus, Paola S. Llinás Santos, Karyna Kravchenko, Mariia Mikhova, Sophia Ketterer, Annabel Shen, Sophia Shen, Erin Navas, Bryan Horan, and Jaak Raudsepp. I also thank Horan and Raudsepp for mentoring and guiding the high school students and give a special thanks to Liu, Chen, and Zabad for helping during several long days just before the deadline for submitting the Nucleic Acids Research manuscript when we had some computer difficulties and had to troubleshoot remotely.

REFERENCES

  • 1. Chen, C., H. Liu, S. Zabad, N. Rivera, E. Rowin, M. Hassan, S. M. Gomez De Jesus, P. S. Llinás Santos, K. Kravchenko, M. Mikhova, S. Ketterer, A. Shen, S. Shen, E. Navas, B. Horan, J. Raudsepp, and C. Jeffery. 2021. MoonProt 3.0: an update of the moonlighting proteins database. Nucleic Acids Res 49(D1):D368–D372.

  • 2. Jeffery, C. J. 1999. Moonlighting proteins. Trends Biochem Sci 24(1):8–11.

  • 3. The UniProt Consortium. 2017. UniProt: the universal protein knowledgebase. Nucleic Acids Res 45(D1):D158–D169.

  • 4. McGinnis, S., and T. L. Madden. 2004. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32(Suppl 2):W20–W25.

  • 5. Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. 2000. The Protein Data Bank. Nucleic Acids Res 28(1):235–242.

  • 6. Jeffery, C. J. 2017. Moonlighting proteins—nature's Swiss army knives. Sci Prog 100(4):363–373.

  • 7. Gheblawi, M., K. Wang, A. Viveiros, Q. Nguyen, J. Zhong, A. Turner, M. Raizada, M. Grant, and G. Oudit. 2020. Angiotensin-converting enzyme 2: SARS-CoV-2 receptor and regulator of the renin-angiotensin system: celebrating the 20th anniversary of the discovery of ACE2. Circ Res 126(10):1456–1474.

  • 8. Commichau, F. M., and J. Stülke. 2008. Trigger enzymes: bifunctional proteins active in metabolism and in controlling gene expression. Mol Microbiol 67(4):692–702.

  • 9. Somma, M. P., E. N. Andreyeva, G. A. Pavlova, C. Pellacani, E. Bucciarelli, J. V. Popova, S. Bonaccorsi, A. V. Pindyurin, and M. Gatti. 2020. Moonlighting in mitosis: analysis of the mitotic functions of transcription and splicing factors. Cells 9(6):1554.

  • 10. Sonnhammer, E. L., G. von Heijne, and A. Krogh. 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. In Proceedings of the 6th International Conference on Intelligent Systems for Molecular Biology (ISMB '98). J. I. Glasgow, T. G. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen, editors. Montréal, Québec, Canada. The AAAI Press, Menlo Park, CA, pp. 175–182.

  • 11. Ishida, T., and K. Kinoshita. 2007. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res 35:W460–W464.

  • 12. Dosztányi, Z., V. Csizmok, P. Tompa, and I. Simon. 2005. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21(16):3433–3434.

  • 13. Amberger, J. S., C. A. Bocchini, F. Schiettecatte, A. F. Scott, and A. Hamosh. 2015. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res 43(D1):D789–D798.

Copyright: © 2021 Biophysical Society.