Web-based Molecular Biology Tools

Molecular biology is the study of biological macromolecules, particularly DNA and proteins, at the structural and functional level. Many free resources on the Internet support the study of these macromolecules. The following list covers some of these web-based tools, each with a brief description that borrows some wording from the tool's own site. The list is not comprehensive, but it is meant to provide a good starting point for researchers. Some resources appear in more than one category.

General Sites

  • BYU DNA Sequencing Center Resources
    The DNA Sequencing Center (DNASC) at Brigham Young University maintains its own online page of additional resources.
  • DBGET
    DBGET is a simple database retrieval system for finding and obtaining specific entries of diverse databases. Here a database is simply considered a sequential collection of entries, which may be stored in a single file or multiple files. Because each entry of a database is given a unique identifier, molecular biology databases in the world can be retrieved uniformly by the combination of the database name and the identifier.
  • European Bioinformatics Institute
    European Bioinformatics Institute (EBI) is a center for research and services in bioinformatics. The Institute manages databases of biological data including nucleic acid, protein sequences and macromolecular structures.
  • Expasy
    Molecular server dedicated to the analysis of protein and nucleic acid sequences. Protein identification and characterization tools:

    • Identification and characterization with peptide mass fingerprinting data
    • Identification and characterization with MS/MS data
    • Identification with isoelectric point, molecular weight and/or amino acid composition
    • Other prediction or characterization tools, MS data (visualization, quantitation, analysis, etc.), and 2-DE data (image analysis, data publishing, etc.).
  • Java based Molecular Biologist’s Workbench
    This site contains a workbench of tools for DNA and protein analysis: Data entry, data manipulation, data analysis, genetical and functional site mapping, and primer design.
  • National Center for Biotechnology Information
    NCBI’s mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. The site links to the GenBank database and to data-mining tools including BLAST, COGs, MapViewer, LocusLink, UniGene, ORF Finder, Electronic PCR, VAST search, CCAP, Human-Mouse Homology maps, VecScreen, and the Cancer Genome Anatomy Project. It also provides access to Entrez, a retrieval system for searching several linked databases, including PubMed, the nucleotide sequence database, the protein sequence database, structure, genome, population data sets, Online Mendelian Inheritance in Man, taxonomy, 3D domains, ProbeSet, and online books.
  • National Center for Genome Resources
    The National Center for Genome Resources (NCGR) contains information and links to various genome related projects.
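
The DBGET entry above describes retrieving any entry uniformly from a database name plus an identifier. A minimal Python sketch of that idea follows. It is not DBGET's actual interface: the `database:identifier` string convention, the function names, and the placeholder databases and entries are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# An entry is addressed by the pair (database name, entry identifier).
EntryKey = Tuple[str, str]


def make_key(database: str, identifier: str) -> EntryKey:
    """Normalize the database name so lookups are case-insensitive on it."""
    return (database.strip().lower(), identifier.strip())


def parse_key(text: str) -> EntryKey:
    """Split a 'database:identifier' string (an illustrative convention)."""
    database, sep, identifier = text.partition(":")
    if not sep or not database.strip() or not identifier.strip():
        raise ValueError(f"expected 'database:identifier', got {text!r}")
    return make_key(database, identifier)


def fetch(store: Dict[EntryKey, str], text: str) -> Optional[str]:
    """Return the entry for 'database:identifier', or None if it is absent."""
    return store.get(parse_key(text))


# Hypothetical in-memory stand-in for several databases (placeholder data).
store: Dict[EntryKey, str] = {
    make_key("exampledb", "ID0001"): "first placeholder entry",
    make_key("otherdb", "ID0001"): "same identifier, different database",
}

print(fetch(store, "exampledb:ID0001"))
print(fetch(store, "otherdb:ID0001"))
```

The same identifier can exist in two databases without ambiguity because the database name is part of the key.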

Nucleic Acid Sequencing Tools

  • Biosyn Gizmo Tools
    Bundle of databases (siRNA, protein, peptide antigen) and tools, including a Bioinformatic Glossary, Genetic Code Table, Nucleic Acids and Protein Calculations, and an Oligo Properties Calculator.
  • BLAST
    Searches for sequence homology between your sequence and those in the databases. BLASTN searches DNA sequences; BLASTX translates your sequence in all six reading frames and searches protein sequences.
  • Codon Usage Database
    Provides a query box for looking up the codon usage table of an organism. You can search by the organism's Latin name or by any substring of it. Useful for designing primers and probes.
  • Sequence Manipulation Suite (SMS)
    The Sequence Manipulation Suite in BioSyn’s Gizmo Tools is a collection of JavaScript programs for generating, formatting, and analyzing short DNA and protein sequences. It is commonly used by molecular biologists, for teaching, and for program and algorithm testing.
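
The BLASTX description above mentions translating a DNA sequence in all six reading frames. The sketch below shows what that step means, assuming the standard genetic code. It is a plain-Python illustration, not BLAST's implementation; the function names and the frame labels (`+1` to `-3`) are choices made for this example.

```python
from itertools import product

# Standard genetic code, with codons ordered by first, second, third base over TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translation(dna: str) -> dict:
    """Translate three frames on the given strand and three on its reverse complement."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

Stop codons are shown as `*`, so an open reading frame appears as a run of letters between asterisks.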

Genomic Resources

  • GenomeNet
    GenomeNet is a Japanese network of database and computational services for genome research and related research areas in molecular and cellular biology. GenomeNet was established in September 1991 under the Human Genome Program (HGP) of the Ministry of Education, Science, Sports and Culture (MESSC).
  • National Center for Genome Resources
    National Center for Genome Resources (NCGR) contains information and links to various genome related projects.
  • SoftBerry
    Softberry, Inc. is a leading developer of software tools for genomic research. Their primary areas of interest and expertise are genome annotation, functional site identification in DNA and proteins, sequence database management, genome comparison, expression data analysis, protein structure prediction, and protein compartment (destination) prediction.
  • UCSC Genome Browser
    The University of California, Santa Cruz (UCSC) Genome Browser website contains the reference sequence and working draft assemblies for a large collection of genomes.
  • dbGaP (NCBI)
    The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.
  • Ensembl
    The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.

Protein Sequence Analysis Tools

  • Expasy
    Molecular server dedicated to the analysis of protein and nucleic acid sequences. Protein identification and characterization tools:

    • Identification and characterization with peptide mass fingerprinting data
    • Identification and characterization with MS/MS data
    • Identification with isoelectric point, molecular weight and/or amino acid composition
    • Other prediction or characterization tools, MS data (visualization, quantitation, analysis, etc.), and 2-DE data (image analysis, data publishing, etc.).
  • FramePlot
    Predicts protein-coding regions in bacterial DNA.
  • MPEx
    Membrane Protein Explorer (MPEx) is a tool for exploring the topology and other features of membrane proteins by means of hydropathy plots based upon thermodynamic principles.
  • PredictProtein
    PredictProtein is an Internet service for sequence analysis and the prediction of protein structure and function. Users submit protein sequences or alignments; PredictProtein returns multiple sequence alignments, PROSITE sequence motifs, low-complexity regions (SEG), nuclear localization signals, regions lacking regular structure (NORS) and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions, disulfide-bonds, sub-cellular localization, and functional annotations. Upon request fold recognition by prediction-based threading, CHOP domain assignments, predictions of transmembrane strands and inter-residue contacts are also available.
  • ProDom
    ProDom is a protein domain family database constructed automatically by clustering homologous segments. The ProDom building procedure MKDOM2 is based on recursive PSI-BLAST searches [ALTS2]. The source protein sequences are non-fragmentary sequences derived from SWISS-PROT and TrEMBL databases.
  • ProtScale
    ProtScale allows you to compute and represent the profile produced by any amino acid scale on a selected protein. An amino acid scale is defined by a numerical value assigned to each type of amino acid. The most frequently used scales are the hydrophobicity or hydrophilicity scales and the secondary structure conformational parameters scales, but many other scales exist which are based on different chemical and physical properties of the amino acids. This program provides 57 predefined scales entered from the literature.
  • Sequence Manipulation Suite (SMS)
    The Sequence Manipulation Suite in BioSyn’s Gizmo Tools is a collection of JavaScript programs for generating, formatting, and analyzing short DNA and protein sequences. It is commonly used by molecular biologists, for teaching, and for program and algorithm testing.
  • Worldwide Protein Data Bank (wwPDB)
    The wwPDB maintains a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community.
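
The ProtScale entry above describes computing a profile from an amino acid scale, that is, one numerical value per residue type. The sketch below illustrates the idea with the Kyte-Doolittle hydropathy scale and a simple unweighted sliding-window average. The window size, the function name, and the toy sequence are assumptions for this example, and the tool itself may weight or normalize the window differently.

```python
# Kyte-Doolittle hydropathy values, one number per amino acid type.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale_profile(sequence: str, scale: dict, window: int = 9) -> list:
    """Return (1-based center position, mean scale value) for each full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    values = [scale[aa] for aa in sequence.upper()]  # KeyError on unknown residues
    half = window // 2
    profile = []
    for center in range(half, len(values) - half):
        chunk = values[center - half:center + half + 1]
        profile.append((center + 1, sum(chunk) / window))
    return profile


# Toy sequence invented for illustration: a hydrophobic stretch between charged ends.
toy_sequence = "KKDEILVVALLIAGVKRDE"
for position, score in scale_profile(toy_sequence, KYTE_DOOLITTLE, window=5):
    print(position, round(score, 2))
```

Swapping in a different dictionary of per-residue values yields a profile for any other scale, which is the same principle behind the hydropathy plots mentioned in the MPEx entry.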

3D Macromolecular Structure Tools

  • Cn3D
    Cn3D is a helper application for web browsers that allows you to view 3-dimensional structures from NCBI’s Entrez retrieval service. Cn3D runs on Windows, Mac, and Unix. Cn3D simultaneously displays structure, sequence, and alignment, and now has powerful annotation and alignment editing features.
  • DeepView
    Swiss-PdbViewer (aka DeepView) is an application with a user-friendly interface that lets you analyze several proteins at the same time. The proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts. Amino acid mutations, H-bonds, angles, and distances between atoms are easy to obtain thanks to the intuitive graphic and menu interface.
  • Povray
    POV-Ray (the Persistence of Vision Raytracer) is a ray-tracing program. When it is used with Swiss-PdbViewer, the rendered output image appears much sharper and the colors are more vivid.
  • RasMol
    Protein Explorer, a RasMol-derivative, is the easiest-to-use and most powerful software for looking at macromolecular structure and its relation to function. It runs on Windows or Mac computers. RasMol users will find its menus very familiar, and it understands RasMol commands. It is very fast: rotating a protein or DNA molecule shows its 3D structure.
  • RCSB Protein Data Bank
    The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease. This site offers tools for browsing, searching, and reporting that utilize the data resulting from ongoing efforts to create a more consistent and comprehensive archive. The Research Collaboratory for Structural Bioinformatics (RCSB) is a non-profit consortium dedicated to improving our understanding of the function of biological systems through the study of the 3-D structure of biological macromolecules.

Phylogeny Tools

  • PHYLIP
    PHYLIP is a free package of programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of executables.
  • TreeView
    TreeView is a simple program for displaying phylogenies on Apple Macintosh and Windows PCs. It can be used to view PHYLIP generated phylogeny trees.


re3data.org: Registry of Research Data Repositories

Re3data launched at the tail end of 2012 with the goal of registering all research data repositories. These research data repositories are collections of datasets usually associated with a particular discipline or a particular geographic region. Because data repositories have cropped up on an as-needed basis over the past 50 years, they are myriad, and navigating the options in any academic field takes specialized knowledge.

Research data represents the lion’s share of effort for universities. The value of research data within universities is without peer; however, this data is often vulnerable to loss due to poor preservation practices. Data repositories provide long-term storage and potentially enable access to datasets, while also promoting reproducibility of research. Although this storage and access provide a clear benefit to the researcher, the funding agencies who support research can be the stimulus for researchers to use a data repository. For example, the National Science Foundation requires dissemination and sharing of research results:

Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.

Dissemination and Sharing of Research Results – National Science Foundation

Certain publishers also stipulate the use of data repositories, such as this example for Scientific Data, a Nature Publishing Group journal:

Scientific Data mandates the release of datasets accompanying our Data Descriptors, but we do not ourselves host data. Instead, we ask authors to submit datasets to an appropriate public data repository. Data should be submitted to discipline-specific, community-recognized repositories where possible, or to generalist repositories if no suitable community resource is available.

Recommended Data Repositories – Nature

For librarians, the benefits of data repositories are fairly clear. Repositories manage, organize, preserve, enable discovery of, and usually provide a persistent identifier for data. Re3data allows librarians to point researchers in the right direction regarding repositories. Re3data provides a basic search feature equipped with 27 facets to narrow or refine a search. Each repository record is tagged with icons to let users know whether the repository provides additional information about its service; whether it is open, restricted, or closed access; and what persistent identifier is used (e.g., DOI, URN, ARK, handle, PURL, or other).

View of Re3data’s search interface

Users can also browse by country, subject, or content type (for example, raw data, audiovisual data, or source code, to name a few). The subject browse function is particularly attractive:

View of Re3data’s browse wheel

Users can select a discipline, and the wheel responds with a rotating animation and narrows the search.

Also, librarians who play a role in their own institutional repository can suggest that it be included in Re3data. Data repositories considered for inclusion must be run by a legal entity, clarify access conditions, and have a focus on research data (see: http://www.re3data.org/suggest).

Look to the Stars: History of Astronomy Collections from Adler Planetarium

It’s been a few months now since the ALA Annual Meeting in Chicago, and I am still thinking about the Adler Planetarium! A group of librarians from the Science & Technology Section were lucky enough to get a tour of the Webster Institute for the History of Astronomy, which manages the Adler’s collections. The collections include rare books, historic photographs and scientific instruments, and much more. It was amazing to see some of these beautiful and fascinating materials up close.

STEM Preprint Repositories: Where Are They Now?

In light of the one-year anniversary of engrXiv, and the recent creation of AgriXiv and PsyArXiv, we wanted to highlight the availability of preprint repositories for STEM disciplines. Preprint services provide free, open access to research articles. The goals of these sites include disseminating “knowledge quickly and efficiently” (1), “providing a free, open access outlet for new findings” (2), and making “research outputs … immediately available to all the stakeholders for understanding and finding suitable solutions” (3).

Researchers often add their preprints to these repositories, so they are called preprint servers, but postprints and published versions might be included as well. Awareness of the open access movement is spreading, and more researchers have a desire to make their research articles open.  Reasons for publishing work open access include funding agency mandates to make research results publicly available, and researcher desire to make their work accessible for increased visibility and public good.  We wanted to highlight STEM disciplinary repositories so science librarians can help patrons both find and share open access scientific research.

arXiv was founded in 1991 as an electronic archive for research articles from physics, math, computer science, nonlinear science, quantitative biology, quantitative finance, and statistics.  arXiv is operated by Cornell University Library, and there are over 1 million submissions to arXiv.  

Life science researchers can use bioRxiv, a free online archive and distribution service for unpublished preprints. It is operated by Cold Spring Harbor Laboratory, and was launched in 2013.  It has about 13,500 content items.  

AgriXiv, engrXiv, and PsyArXiv were all founded within the last year or so. All three use Open Science Framework’s preprint service. AgriXiv, a preprint service for agriculture and allied sciences, was founded earlier in 2017 and does not have a lot of content yet. AgriXiv stresses the “importance of agricultural research to meet the demands for food production and … livelihood promotion” and the “growing need for dedicated research sharing and dissemination” to “facilitate the sharing of interim research for public good” (3). Learn more at the AgriXiv blog. engrXiv was founded in 2016 and is dedicated to the “dissemination of engineering knowledge quickly and efficiently” (1). In addition to the preprint server, engrXiv has a blog to share news related to the site. Currently, there are about 130 posts on engrXiv. PsyArXiv, an open-access preprint service for psychological sciences, was founded late in 2016 but already has a large amount of content, about 700 posts. You can learn more at the PsyArXiv blog.

ChemRxiv is an open preprint server for chemistry, still under development by the American Chemical Society (ACS).  ChemRxiv is intended to be a collaborative undertaking to facilitate the discoverability of scientific research.  Interested users can sign up for alerts to get news and updates about ChemRxiv.  

Finally, a note about the Center for Open Science and the Open Science Framework, as these may be helpful open access resources for science librarians and their patrons.  The Center for Open Science is a nonprofit company that aims “to increase openness, integrity, and reproducibility of research” (4).  Open Science Framework is their free and open source tool for research project management across the entire research lifecycle. Researchers can collaborate with their groups, make their projects accessible, and store and archive research data, protocols, and materials.


  1. About engrXiv. (2016, July). Retrieved August 12, 2017, from http://blog.engrxiv.org/about/
  2. Introducing PsyArXiv: Psychology’s dedicated open access digital archive. (2016, December). Retrieved August 12, 2017, from http://blog.psyarxiv.com/
  3. AgriXiv. (2017, February). Retrieved August 12, 2017, from https://agrixiv.wordpress.com/
  4. Nosek, B. (n.d.). A Brief History of COS. Retrieved August 12, 2017, from https://cos.io/about/brief-history-cos-2013-2017/


Emily Gari, Science & Engineering Librarian, University of Colorado Boulder

The Encyclopedia of Life is 10 years old!

The Encyclopedia of Life is 10 years old! It is freely available on the web. According to their statistics, as of May 11, 2017, they have 5.5 million pages. Responsibilities are shared by interested groups and individuals. “The founding partners of the project include the Field Museum of Natural History, Harvard University, the Marine Biological Laboratory, the Smithsonian Institution, and the Biodiversity Heritage Library. The Missouri Botanical Garden later joined, and negotiations are ongoing with the Atlas of Living Australia. Other partners are the American Museum of Natural History (New York), Natural History Museum (London), New York Botanical Garden, and the Royal Botanic Gardens (Kew).”

The Surgeon General’s Office in the United States Army started an index of all holdings in its library in 1880. The various volumes were printed until 1961. Because the Army Medical Library became the largest medical library in the world in the late 1890s, the Index-Catalogue of the Library of the Surgeon-General’s Office, 1880-1961, can be considered an almost complete compilation of the medical literature.

NCBI Bioinformatics Tools: Protein, BLAST, COBALT, and Cn3D Structure Viewer

This tutorial is a step-by-step guide to searching for motifs in the SET domain, which I have taught to epigenetics students.

“For example, a protein called Clr4 from S. pombe contains the SET domain. How could you find mammalian homologous of Clr4? Let’s assume that you find 8 proteins in human database containing SET domain. How close are they? Can we draw a tree out of it? Can we align all these protein sequences together and compare their similarity, and find the most conserved motif (like GXGNA) shared with all these proteins? If I would like to know where this motif located in 3D structure, can we look at it on the published protein structure database?”
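
The question quoted above asks how to find a conserved motif such as GXGNA across several protein sequences. As a rough illustration of what a motif with a wildcard position means, the sketch below turns the motif into a regular expression (treating X as "any residue") and reports 1-based match positions. The sequences are invented toy strings, not real SET-domain proteins, and the function names are assumptions for this example. The tutorial itself uses the NCBI tools rather than code like this.

```python
import re


def motif_to_pattern(motif: str) -> str:
    """Convert a motif such as 'GXGNA' to a regex string; 'X' matches any residue."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in motif.upper())


def find_motif(sequences: dict, motif: str) -> dict:
    """Map each sequence name to a list of (1-based position, matched text)."""
    # A lookahead with a capture group also reports overlapping matches.
    finder = re.compile(f"(?=({motif_to_pattern(motif)}))")
    return {
        name: [(m.start() + 1, m.group(1)) for m in finder.finditer(seq.upper())]
        for name, seq in sequences.items()
    }


# Toy sequences invented for illustration only.
toy_sequences = {
    "toy_protein_1": "MKGAGNAL",
    "toy_protein_2": "GSGNAGTGNA",
    "toy_protein_3": "MKLV",
}

for name, hits in find_motif(toy_sequences, "GXGNA").items():
    print(name, hits)
```

A sequence with an empty hit list, like the third toy protein here, does not contain the motif.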

Data in the Time of Cholerics: Where to Find Preserved Federal Data

During the recent change in federal government, researchers and librarians were concerned about loss of access to federal data, particularly in the area of environmental science where the new administration’s policies appeared to contradict scientific consensus. Early indications suggested that federal datasets and scientific information would be removed from the web entirely, or at least restricted in access.

Alleviating the high cost of science textbooks with Open Educational Resources

OER Global Logo by Jonathas Mello is licensed under a Creative Commons Attribution Unported 3.0 License

Academic institutions are searching for ways to alleviate the financial burden that the increasing cost of textbooks places on their students. The average student spends $1200 annually for books and supplies, according to the Open Textbook Network.  Science textbooks are especially expensive, but OERs, or Open Educational Resources, are gaining acceptance as alternatives to traditional textbooks (Open Textbook Network, 2017).

Apps: What are engineering students using?

A survey conducted by the Pew Research Center in the fall of 2016 reports that 80% of US adults with some college and 89% of US college graduates own a smartphone (Pew Research Center, January 11, 2017). This is not surprising, as the smartphone is the go-to for recent news, connecting with friends and family, and learning new things. We know that smartphones are popular with the college population, but how are college students, in particular engineering students, using them in support of their studies?

