This is a summary of notes taken during the presentation by guest speaker Dr Kumaran Baskaran, author of the RBMRB package.
Note: the interactive plots are only visible on the HTML version of this document. PDF and Docx versions will not show them.
RBMRB is an R package to access data from the Biological Magnetic Resonance Data Bank (BMRB), a repository for data from NMR spectroscopy on proteins, peptides, nucleic acids, and other biomolecules.
This short document is a set of notes from a “hands-on” session; more complete “how to” guides and descriptions can be found in the RBMRB documentation¹.
The RBMRB package can be installed from the R command line with:
install.packages('RBMRB')
or from the Packages tab within **[RStudio](https://www.rstudio.com/products/rstudio/download/)**.
Note: There are many dependencies that will be installed automatically at the same time.
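If a script may run on machines where the package is already present, it can be convenient to install only when needed. The following is a minimal sketch built on base R's `requireNamespace()`; the helper name `ensure_package` is hypothetical and not part of RBMRB.

```r
# Hypothetical helper (not part of RBMRB): install a package only if missing.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # required dependencies are installed as well
  }
  # TRUE if the package can now be loaded, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (needs an internet connection the first time):
# ensure_package("RBMRB")
```

The helper returns `TRUE` or `FALSE`, so a script can stop early if the installation did not succeed.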
Before it can be used the package needs to be loaded within R with the command:
library(RBMRB)
The RBMRB package contains commands to access data directly from the online database behind the Biological Magnetic Resonance Data Bank (BMRB) web page.
At the top of the web page is a search field for looking up the accession codes of compounds of interest.
The database contains only chemical shift data, while 3D coordinates are available from the Protein Data Bank[^2].
Therefore, with the RBMRB package we will only be interested in chemical shift metadata. These can be fetched by accession code (see above). For example, here we get data for entry 15060 (*NMR structure of the murine DLC2 (deleted in liver cancer-2) SAM (sterile alpha motif) domain*).
We’ll save the chemical shift data in the R object cs:
cs <- fetch_entry_chemical_shifts(15060)
You can explore the data table with these commands:
To see a spreadsheet-like format:
View(cs)
To see the first 6 lines of data:
head(cs)
You will see that the data contains 24 variables (columns). If you explore further you may also find that there are 974 observations (rows). This can be confirmed by checking the dimensions:
dim(cs)
[1] 974 24
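The same checks can be tried without a network connection on a small stand-in table. The sketch below builds a hypothetical three-row data frame with made-up values and only four columns. The column names other than `Val` are assumptions about how the fetched table is labelled, so compare them with `names(cs)` before relying on them.

```r
# Stand-in for cs: made-up values; column names other than Val are assumed.
toy_cs <- data.frame(
  Comp_index_ID = c(1, 1, 2),
  Comp_ID       = c("MET", "MET", "ALA"),
  Atom_ID       = c("H", "N", "H"),
  Val           = c(8.21, 120.5, 8.05),
  stringsAsFactors = FALSE
)

dim(toy_cs)    # number of rows, then number of columns
nrow(toy_cs)   # observations (rows)
ncol(toy_cs)   # variables (columns)
names(toy_cs)  # column names
str(toy_cs)    # type and first values of each column
```

Replacing `toy_cs` with `cs` gives the same overview for the real entry.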
We are mostly interested in the column Val containing the value of the chemical shift. This is the 11th column in the data. We could access this column simply by using the command:
cs$Val
which would display all values within that column on the screen.
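Printing a whole column is rarely practical for hundreds of rows. Here is a short sketch of some base R alternatives, using a hypothetical stand-in table with made-up values. The `Atom_ID` column name is an assumption (only `Val` is named in these notes), and the `as.numeric()` call is a precaution in case the values arrive as text.

```r
# Stand-in table: made-up values, Atom_ID is an assumed column name.
toy_cs <- data.frame(
  Atom_ID = c("H", "N", "H", "N"),
  Val     = c("8.21", "120.5", "8.05", "118.9"),
  stringsAsFactors = FALSE
)

val <- as.numeric(toy_cs$Val)      # same column as toy_cs[["Val"]]
head(val, 3)                       # first three values only
summary(val)                       # minimum, quartiles, mean, maximum
range(val[toy_cs$Atom_ID == "H"])  # spread of the shifts for atoms labelled H
```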
The following command is meant to simplify the data by keeping only the relevant columns. The command convert_cs_to_n15hsqc is part of the RBMRB package. See ?convert_cs_to_n15hsqc for help.
spect_data <- convert_cs_to_n15hsqc(cs)
The usual “base R” plotting commands are also available once we choose the columns we need. For example:
plot(spect_data$H, spect_data$N)
which could also be written as
with(spect_data, plot(H,N))
The package contains built-in graphical functions that take advantage of modern plotting packages installed as dependencies, such as ggplot2 and plotly, to provide interactive plots. This is one reason why a plot is first assigned to an R object, here plt.
plt <- HSQC_15N(15060)
plt
See the Wikipedia definition of TOCSY².
This plot connects all hydrogen atoms of one residue, which helps to identify amino acids.
plt2 <- TOCSY(15060)
plt2
However, this plot is very busy and not so useful here.
In BMRB, assignments have already been done, so entries can be compared with one another.
plt3 <- HSQC_15N(c(11505, 11547))
plt3
Each entry will have a different color.
The plot can be modified by adding a connecting line between related residues:
plt3 <- HSQC_15N(c(11505, 11547), 'line')
plt3
Important note: residues whose peaks are far apart on the plot differ in frequency, NOT in spatial distance! One example of where this can happen is closeness to water.
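The connecting lines can be read as per-residue shift differences between the two entries. Below is a minimal base R sketch of that idea, using two hypothetical tables with made-up values and assumed columns `res` (residue number), `H` and `N` (shifts in ppm). It illustrates the comparison only and is not how RBMRB draws the lines internally.

```r
# Two made-up tables standing in for the H/N shifts of two entries.
entry_a <- data.frame(res = c(10, 11, 12),
                      H = c(8.20, 7.95, 8.60),
                      N = c(120.1, 118.4, 123.0))
entry_b <- data.frame(res = c(10, 11, 12),
                      H = c(8.22, 8.40, 8.61),
                      N = c(120.0, 121.9, 123.1))

# Pair rows by residue number; the suffixes mark which entry a column came from.
both <- merge(entry_a, entry_b, by = "res", suffixes = c("_a", "_b"))

# Differences in ppm: a large value means the peak moved in frequency,
# not that the residue is far away in space.
both$dH <- both$H_b - both$H_a
both$dN <- both$N_b - both$N_a

# Residues with the largest proton shift change first.
both[order(-abs(both$dH)), ]
```

In this toy data only residue 11 changes noticeably, which corresponds to the one long connecting line such a plot would show.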
plt4 <- HSQC_15N(18857)
plt4
Modify the call to connect by line to see the changes:
plt4 <- HSQC_15N(18857, 'line')
plt4