This tutorial is based on the Templated RNA-Seq Workflow[1] on the DNASTAR Tutorial web page[2].
The tutorial is called “templated” because the sequence reads are aligned to a genome template.
Note: A separate tutorial exists for cases where there is no template genome.
Templated RNA-Seq uses next-generation sequencing to reveal which RNAs are present at a particular moment. RNA can be identified and quantified by alignment to the genome.
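Quantifying RNA by alignment usually means counting the reads that map to each gene and normalizing that count. The quick gene set filter used later in this tutorial works on an RPKM signal value (reads per kilobase of transcript per million mapped reads). The sketch below shows the standard RPKM formula only; it is not DNASTAR's implementation, and the function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads / (length in kb * mapped reads in millions)
         = reads * 1e9 / (length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 1,000 bp gene in a library
# of 10 million mapped reads.
print(rpkm(500, 1_000, 10_000_000))  # 50.0
```

Normalizing by both gene length and library size lets signals be compared between genes of different lengths and between samples sequenced to different depths.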
This tutorial is meant to help you become familiar with the DNASTAR software for next-generation sequencing. In this tutorial we will use two DNASTAR programs:
- SeqMan NGen will be used to align the RNA-Seq data to the genome.
- ArrayStar will be used to analyze the completed RNA-Seq alignment assembly.

Note: While SeqMan NGen exists for both Mac and Windows, ArrayStar is Windows-only software (it is based on the Microsoft .NET framework).
In this tutorial, we will compare stationary phase RNA from wild-type Listeria monocytogenes cells with that from mutant cells that do not express sigma B, a major transcriptional regulator (see Oliver et al. (2009) and Appendix below.)
The tutorial is split into two parts, one for each program:
- Part A: SeqMan NGen
- Part B: ArrayStar

You can choose which operating system to work in. Note, however, that Part B can only be run under Windows, so the file(s) from Part A would need to be transferred to the Windows side unless a directory can be shared between the two systems.
For this tutorial it is therefore advised to run both Parts A and B under Windows.
For this tutorial we need access to the DNASTAR software which is installed on the class iMacs.
The data for the workshop can be found on the DNASTAR Tutorial web page as T3_Templated_RNA-Seq.zip[5].
“Finding T3_Templated_RNA-Seq.zip on DNASTAR Tutorial web page”
\(\bigodot\) TASK: Download the data to your desktop and unzip it.
The resulting directory will contain 5 files:
Listeria monocytogenes 4b F2365.NC_002973.6.gbk
sigB_1.fastq
sigB_2.fastq
wt_1.fastq
wt_2.fastq
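If you prefer to script this step, the sketch below unzips the archive and checks that the five files listed above are present. It uses only Python's standard `zipfile` and `pathlib` modules; the archive and destination paths are placeholders to adjust, and the helper name `unzip_and_check` is hypothetical. Because it is not known whether the archive contains a top-level folder, it searches the extracted tree recursively.

```python
import zipfile
from pathlib import Path

# The five files the tutorial archive should contain.
EXPECTED = {
    "Listeria monocytogenes 4b F2365.NC_002973.6.gbk",
    "sigB_1.fastq",
    "sigB_2.fastq",
    "wt_1.fastq",
    "wt_2.fastq",
}


def unzip_and_check(archive: Path, dest: Path) -> set[str]:
    """Extract `archive` into `dest` and return the expected file names
    that were NOT found (an empty set means everything is there)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # creates `dest` and any sub-folders as needed
    # Search recursively, since the archive may contain a top-level folder.
    found = {p.name for p in dest.rglob("*") if p.is_file()}
    return EXPECTED - found


# Placeholder paths: adjust to where the archive was downloaded.
# missing = unzip_and_check(Path.home() / "Desktop" / "T3_Templated_RNA-Seq.zip",
#                           Path.home() / "Desktop" / "T3_Templated_RNA-Seq")
# print("missing:", missing or "none")
```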
The .gbk file is a GenBank sequence file for the complete genome of the bacterium Listeria monocytogenes.
The .fastq files contain the sequencing reads. There are two replicates each for the wild type (wt) and for an isogenic strain lacking sigma B (sigB). (See also Appendix A below.)
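Each FASTQ record spans four lines: an `@` header, the read sequence, a `+` separator, and a quality string as long as the sequence. The sketch below, which uses only the standard library, counts the reads in one of the tutorial files and reports the mean read length. It assumes uncompressed FASTQ with one sequence line per record, and the function name `fastq_stats` is hypothetical.

```python
from collections.abc import Iterable


def fastq_stats(lines: Iterable[str]) -> tuple[int, float]:
    """Return (number of reads, mean read length) for 4-line FASTQ records."""
    n_reads = 0
    total_len = 0
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        try:
            # The next three lines belong to the same record.
            seq, plus, qual = next(it).strip(), next(it), next(it).strip()
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header.strip()}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header.strip()}")
        n_reads += 1
        total_len += len(seq)
    return n_reads, (total_len / n_reads if n_reads else 0.0)


# Usage, from inside the unzipped tutorial folder:
# with open("wt_1.fastq") as fh:
#     print(fastq_stats(fh))
```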
On both Mac and Windows, you can launch each DNASTAR program by finding it on the hard drive or, under Windows, in the “Start” menu.
However, the DNASTAR Navigator consolidates all DNASTAR programs in one place and may make it easier to launch any of them.
SeqMan NGen is located under Genomics within the Navigator.
Click on SeqMan NGen to launch it.
On a Mac, perhaps the easiest way is to use Spotlight Search (the magnifying-glass icon at the top right) and start typing the name of the program. SeqMan NGen should appear after the first few letters are typed.
However, for this tutorial it is recommended to use Windows, since the second part of the tutorial requires Windows-only software.
Click on the “Start” button (bottom left; it looks like four white squares) and scroll down to the letter D, where DNASTAR should be listed.
Click on the downward-pointing arrow on the right-hand side of the name and find the program you need, e.g. DNASTAR Navigator 14 or SeqMan NGen directly.
The first screen after the launch offers 3 choices:
\(\bigodot\) TASK: Click Assemble on local computer.
“Three choices on first screen.”
On the next screen, “Choose Assembly Workflow,” select Transcriptome / RNA-Seq and press Next.
“select Transcriptome / RNA-Seq”
In the “Choose Assembly Type” screen, select Reference based assembly and click Next.
“select Reference based assembly”
“Press Add and select reference sequence.”
“Select file and click Open.”
Note: If a reference sequence had not been provided with the tutorial data, you could have downloaded an L. monocytogenes genome here using the Download NCBI Genomes button.
In the Input Sequence Files and Define Experiments or Individual Replicates screen: (See illustration below.)
For each of the four .fastq files, enter the replicate name sigB_1, sigB_2, wt_1 or wt_2, as appropriate for that row.
“Follow all steps to set-up files and define experimental details.”
In the Group Individual Replicates into Replicate Sets screen:
- Group the two sigB replicates into a set named sigB and click OK.
- Do the same for the two wt replicates, naming the set “wt.”
“Group replicates to define experiment.”
In the Set Up Experiments screen, check the Is Control box to the right of wt. Then click Next.
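The replicate-grouping steps above follow a simple naming pattern: each file is named `<condition>_<replicate>.fastq`, and wt is flagged as the control. The sketch below reproduces that bookkeeping outside SeqMan NGen, assuming that naming convention holds; `group_replicates` is a hypothetical helper and not part of the DNASTAR software.

```python
from collections import defaultdict
from pathlib import Path


def group_replicates(files, control="wt"):
    """Group '<condition>_<replicate>.fastq' file names into replicate sets.

    Returns (sets, control): `sets` maps each condition to its sorted
    replicate names; `control` is the set flagged as the control.
    """
    sets = defaultdict(list)
    for f in files:
        stem = Path(f).stem                  # "sigB_1.fastq" -> "sigB_1"
        condition = stem.rpartition("_")[0]  # "sigB_1" -> "sigB"
        sets[condition].append(stem)
    if control not in sets:
        raise ValueError(f"control set {control!r} not found")
    return {cond: sorted(reps) for cond, reps in sets.items()}, control


sets, control = group_replicates(
    ["sigB_1.fastq", "sigB_2.fastq", "wt_1.fastq", "wt_2.fastq"])
print(sets)     # {'sigB': ['sigB_1', 'sigB_2'], 'wt': ['wt_1', 'wt_2']}
print(control)  # wt
```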
“Group replicates to define experiment.”
In the Assembly Options screen, check Haploid (since this is a bacterial genome).
There is nothing else to change on that screen.
Then click Next.
“Assembly options: check Haploid.”
In the Assembly Output screen:
“Assembly options: check Haploid.”
The “Your assembly is ready to begin” screen shows the script created by our previous clicks. All you have to do is press Start Assembly to begin the assembly.
“Start Assembly.”
The assembly should be complete within about 5 minutes, depending on the hardware configuration.
Wait until you are informed that the assembly has finished, then click Next.
“Finish Assembly.”
If you are on a Windows system, the ArrayStar software will launch. See Part B to continue the analysis.
If you are on a Macintosh, the transfer will not work and a warning message will appear:
“Finish Assembly.”
Note: The file Templated RNA-Seq.astar can be transferred to the Windows side to continue the analysis.
In Part B, we will analyze the results of the RNA-Seq assembly in ArrayStar by using a “quick gene set” to locate a potential operon structure.
An “operon” is a group of one or more genes that are transcribed as a single RNA unit.
In this section of the tutorial, we will create a “quick gene set,” then use the Gene Table to search for potential operon structures.
To launch ArrayStar, either use the DNASTAR Navigator opened earlier (ArrayStar is listed under the Genomics category) or find ArrayStar in the Windows “Start” menu at the bottom left (see the beginning of the tutorial above).
“Use the ‘Start’ button to launch ArrayStar.”
When we ran SeqMan NGen, we saved a file called Templated RNA-Seq.astar, which will serve as the starting point for the analysis. This file is compiled as a “project,” and therefore:
Within the first panel in ArrayStar, under Get Started, choose Open a project… and navigate to where the file was saved (probably within a directory called “Templated RNA-Seq_RNA-Seq”).
Note: In order to be [allegedly] “helpful,” Windows hides known filename extensions. Your file will therefore appear without the .astar extension, which can be confusing.
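Because Explorer may hide the extension, one way to confirm the full file name and location is to search for it from Python. This is a minimal sketch, assuming the project was saved somewhere below the starting folder you pass in (the Desktop in the commented usage); `find_projects` is a hypothetical helper built on the standard `pathlib` module.

```python
from pathlib import Path


def find_projects(root: Path, pattern: str = "*.astar") -> list[Path]:
    """Return every path below `root` whose name matches `pattern`, sorted."""
    return sorted(root.rglob(pattern))


# Usage (adjust the starting folder as needed):
# for p in find_projects(Path.home() / "Desktop"):
#     print(p)  # full path, including the .astar extension
```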
“Open project file Templated RNA-Seq.astar.”
Click Open.
Note: It will take about 30 seconds to 1 minute to load and display the data under the “Scatter Plot” tab.
Before continuing any further, click on the “Experiment List” tab.
“Click on the ‘Experiment List’ tab.”
Depending on how the data was loaded into ArrayStar, you will see either an RNA-Seq folder or both an RNA-Seq and a Variant folder. In the latter case, select the Variant folder, then right-click on it and choose Delete. When prompted, press the Delete button.
“Right-click on the ‘Variant’ folder to delete it.”
You will be warned with: “Are you sure you wish to delete 4 experiments? You will not be able to undo this deletion.”
The variant analysis is used in another DNASTAR tutorial, so it is safe to remove these experiments for our purpose: click the Delete button.
To access the Quick Gene Set Creation dialog, use the menu command Graphs > Venn Diagrams and then press the Quick gene set creation button.
“Menu: Graphs > Venn Diagrams then click Quick gene set creation.”
This opens the “Step 1” window of the comparison workflow; in the next section we will choose one of its options.
The window panel offers three different methods to compare the experimental material:
In the center section of Step 1, Compare Experiments to a Baseline, select a Baseline Experiment of wt, and then click the Select button just below.
“Choose wt as the baseline.”
Keep all other settings at their defaults and click the Move to Step 3 (Comparisons) button.
In Step 3, keep the Signal Threshold and Fold Change boxes checked, but uncheck the P value box.
Also remove the checkmark by the Up box, to the right of Fold Change.
The filter is now set up to find genes in the sigB mutant samples that have a ≥ 2-fold downward change compared to the wild type, and an RPKM signal value ≥ 10.
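The Step 3 filter can be written as a simple rule. The sketch below approximates what the quick gene set selects and is not ArrayStar's code. It assumes that the fold change is the ratio of the wild-type signal to the sigB signal, and that the signal threshold is met when at least one of the two conditions reaches it (the Gene Table rows listed later in this tutorial include sigB values below 10, which suggests the threshold is not required in both). The function name `passes_filter` is hypothetical; the sample values are the moeA and moaC rows from that table.

```python
def passes_filter(sigb: float, wt: float, min_fold: float = 2.0,
                  min_signal: float = 10.0) -> bool:
    """Approximate quick-gene-set rule: at least `min_fold` DOWN in sigB
    relative to wt, with at least one condition >= `min_signal` (assumed)."""
    if sigb <= 0:
        # No sigB signal: any wt signal counts as "down"; apply the threshold.
        return wt >= min_signal
    fold_down = wt / sigb  # > 1 means lower expression in the sigB mutant
    return fold_down >= min_fold and max(sigb, wt) >= min_signal


# (sigB, wt) values from the Gene Table rows for moeA and moaC:
print(passes_filter(9.190, 21.931))   # True  (moeA, about 2.39-fold down)
print(passes_filter(11.041, 25.622))  # True  (moaC, about 2.32-fold down)
```

Because the P value box was unchecked, the rule uses no statistical test; it only combines the fold-change and signal cutoffs.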
Press Finish.
“Uncheck P value and Up as marked.”
Open the Set List by using the menu cascade Data > Show Set List. Note that the newly-created quick gene set is already selected and called sigBxwt, 2 fold down, signal>=10.
“Menu cascade Data > Show Set List.”
DNASTAR software makes heavy use of icons that may not have equivalent menu items.
This is the case in the region of the ArrayStar panel called the “Actions section” (see illustration below).
In the Actions section, click the link Select and show the table of this set’s Genes (the second icon from the left, as illustrated below).
“Use the second button on the Actions section.”
While only three columns appear initially, the Gene Table can display a variety of gene name and annotation fields, notes, expression levels, and statistical calculations.
“Resulting Gene Table is first shown with only 3 columns.”
We will add some columns in the next section.
In the Actions section, click the Add/Manage Columns tool to open the Manage Columns dialog.
“Add/Manage Columns. Step to add Log2 values.”
On the same row of icons, click the Add Fold Change tool.
Specify a Control of wt and an Experiment of sigB, then press OK.
“Specify control and experiment samples”
The Gene Table should now contain seven columns.
Due to the choice made above, all fold changes for the genes in the quick gene set show a down direction.
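The added columns appear to be derived directly from the two expression values: each log2 column is log2 of a signal, and the fold change is the larger signal divided by the smaller, labeled with the direction of change in the experiment (sigB) relative to the control (wt). The sketch below checks this interpretation against the moeA row listed later in the Gene Table; it is inferred from the displayed numbers rather than from DNASTAR documentation, and `fold_change_column` is a hypothetical name.

```python
import math


def fold_change_column(control: float, experiment: float) -> str:
    """Format a fold change as '<ratio> up' or '<ratio> down'.

    Assumed display convention: the ratio is always >= 1 and the word gives
    the direction of change in the experiment relative to the control.
    """
    if control <= 0 or experiment <= 0:
        raise ValueError("signals must be positive")
    if experiment >= control:
        return f"{experiment / control:.3f} up"
    return f"{control / experiment:.3f} down"


# moeA row: sigB = 9.190, wt = 21.931 (wt is the control)
sigb, wt = 9.190, 21.931
print(round(math.log2(sigb), 3), round(math.log2(wt), 3))  # 3.2 4.455
print(fold_change_column(control=wt, experiment=sigb))      # 2.386 down
```

Small differences in the last digit (for example moaC) are expected, because the table values are themselves rounded.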
Click once on the column header for Target Range to sort all of the genes in the project in ascending order of appearance on the assembly.
Scroll down, noting that the genes within the “quick gene set” remain selected in blue and are interspersed along the whole table as illustrated below.
“Genes within the ‘quick gene set’ remain selected in blue (arrows).”
To remove genes that are not in the “quick gene set” from the table, click on the Choose Quick-Filter tool and select Show Only Selected Genes.
“Show only the ‘quick gene set’ in the table.”
To identify possible operon structures, scroll down the Gene Table, noting sections where consecutive, or nearly consecutive, genes show similar trends in expression levels and fold changes. One candidate for an operon would be the four overlapping (or adjacent, in one case) genes starting with LMOf2365_0912 and ending with LMOf2365_0915.
This also happens to be the location of the sigB gene (arrow):
“Potential operon structures.”
Check the list; you may find more, for example:
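Gene | sigB signal | wt signal | Target Range | log2 sigB | log2 wt | Fold change |
---|---|---|---|---|---|---|
moeA | 9.190 | 21.931 | 1072071..1073294 | 3.20002 | 4.45488 | 2.386 down |
mobB | 6.003 | 24.406 | 1073273..1073758 | 2.58574 | 4.60917 | 4.065 down |
moaE | 5.199 | 22.291 | 1073755..1074177 | 2.37825 | 4.47840 | 4.287 down |
moaC | 11.041 | 25.622 | 1074422..1074904 | 3.46479 | 4.67930 | 2.320 down |
moaA | 4.094 | 23.153 | 1074933..1075934 | 2.03364 | 4.53315 | 5.654 down |
moaB | 10.216 | 26.778 | 1076457..1075969 | 3.35276 | 4.74299 | 2.621 down |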
“Figure 8 from Dworkin M., Falkow S., Rosenberg E., Schleifer K.-H., Stackebrandt E. (2006). Molybdo-cofactor biosynthesis genes.”
From Dworkin M., Falkow S., Rosenberg E., Schleifer K.-H., Stackebrandt E. (2006): Molybdopterin Cofactor Biosynthesis.
In S. carnosus, nine genes were identified (Fig. 8), all of which appear to be involved in molybdenum cofactor biosynthesis.
(Note: As of this writing, the book PDF can be downloaded from the Springer web site http://link.springer.com/referencework/10.1007%2F0-387-30744-3 )
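Gene | sigB signal | wt signal | Target Range | log2 sigB | log2 wt | Fold change |
---|---|---|---|---|---|---|
opuCD | 9.620 | 28.024 | 1437146..1436475 | 3.26596 | 4.80862 | 2.913 down |
opuCC | 4.612 | 17.954 | 1438087..1437161 | 2.20553 | 4.16620 | 3.892 down |
opuCB | 7.037 | 15.689 | 1438745..1438089 | 2.81504 | 3.97171 | 2.229 down |
opuCA | 7.067 | 19.397 | 1439942..1438749 | 2.82103 | 4.27777 | 2.744 down |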
The operon, designated opuC, consists of four genes which are predicted to encode an ATP binding protein (OpuCA), an extracellular substrate binding protein (OpuCC), and two membrane-associated proteins presumed to form the permease (OpuCB and OpuCD). The operon is preceded by a potential SigB-dependent promoter. (Fraser et al. 2000)
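The manual scan for candidate operons (consecutive genes that are close together and share the same expression trend) can be sketched in a few lines. This is a simplified heuristic, not the method used by ArrayStar or by the cited studies. Since every gene in the quick gene set is already at least 2-fold down, the sketch only groups genes by position. It assumes that a Target Range written high..low (e.g. 1437146..1436475) denotes a gene on the reverse strand, and the 300 bp maximum intergenic gap is an arbitrary, hypothetical cutoff. The coordinates are those of the moa and opuC rows listed above; the names `Gene`, `parse_gene` and `candidate_operons` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    low: int     # smaller coordinate of the Target Range
    high: int    # larger coordinate of the Target Range
    strand: str  # "+" or "-" (assumed from the order of the coordinates)


def parse_gene(name: str, target_range: str) -> Gene:
    """Parse a 'start..end' Target Range; high..low is taken as reverse strand."""
    start, end = (int(x) for x in target_range.split(".."))
    strand = "+" if start <= end else "-"
    return Gene(name, min(start, end), max(start, end), strand)


def candidate_operons(genes, max_gap=300):
    """Return runs of >= 2 position-sorted genes on the same strand whose
    intergenic gap (negative when genes overlap) is at most max_gap bp."""
    groups = []
    for gene in sorted(genes, key=lambda g: g.low):
        prev = groups[-1][-1] if groups else None
        if (prev is not None and gene.strand == prev.strand
                and gene.low - prev.high - 1 <= max_gap):
            groups[-1].append(gene)
        else:
            groups.append([gene])
    return [[g.name for g in grp] for grp in groups if len(grp) > 1]


# Target Ranges copied from the moa and opuC rows of the Gene Table above.
rows = [("moeA", "1072071..1073294"), ("mobB", "1073273..1073758"),
        ("moaE", "1073755..1074177"), ("moaC", "1074422..1074904"),
        ("moaA", "1074933..1075934"), ("moaB", "1076457..1075969"),
        ("opuCD", "1437146..1436475"), ("opuCC", "1438087..1437161"),
        ("opuCB", "1438745..1438089"), ("opuCA", "1439942..1438749")]
for group in candidate_operons(parse_gene(n, r) for n, r in rows):
    print(group)
# ['moeA', 'mobB', 'moaE', 'moaC', 'moaA']
# ['opuCD', 'opuCC', 'opuCB', 'opuCA']
```

Under the strand assumption, moaB (whose range is written in reverse) is not joined to the other moa genes, and the opuC genes are printed in genomic order, the reverse of their transcription order starting at opuCA. A smaller `max_gap` (for example 50) would also split moaC and moaA from moeA, mobB and moaE.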
The data used in the tutorial were published by Oliver et al. (2009) and are available for download from the Gene Expression Omnibus (GEO) under accession number GSE15651[6].
Oliver et al. (2009) info:
The four sample files have been renamed on the DNASTAR web site. On the GEO site they are labeled as follows:
File Name | Replicate name |
---|---|
GSM391674 | 10403S_replicate1 |
GSM391675 | DsigB_replicate1 |
GSM391676 | 10403S_replicate2 |
GSM391677 | DsigB_replicate2 |
In L. monocytogenes, 168 genes were positively regulated by sigmaB; 145 of these genes were preceded by a putative sigmaB consensus promoter (Raengpradub, Wiedmann, and Boor 2008.)
The genes positively regulated by sigmaB were classified into nine functional categories:
L. monocytogenes is a non-spore-forming facultative intracellular pathogen that causes listeriosis, a serious invasive disease in both animals and humans. To establish a food-borne bacterial infection, L. monocytogenes must have the ability to survive under a variety of stress conditions, including those encountered in a wide range of nonhost environments and food matrices, as well as under rapidly changing conditions encountered during gastrointestinal passage (exposure to organic acids, bile salts, and osmotic gradients) and subsequent stages of infection (e.g., in the intracellular environment). L. monocytogenes sigmaB is activated following exposure to a number of environmental stress conditions […] and contributes to bacterial survival under acid and oxidative stresses and during carbon starvation […] (Raengpradub, Wiedmann, and Boor 2008.)
A survey of best practices for RNA-seq data analysis[7] (Conesa et al. 2016a, 2016b)
Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011), The ENCODE Consortium[10]
Conesa, A., P. Madrigal, S. Tarazona, D. Gomez-Cabrero, A. Cervera, A. McPherson, M. W. Szcześniak, et al. 2016a. “A survey of best practices for RNA-seq data analysis.” Genome Biol. 17 (January): 13. https://www.ncbi.nlm.nih.gov/pubmed/26813401.
———. 2016b. “Erratum to: A survey of best practices for RNA-seq data analysis.” Genome Biol. 17 (1): 181. https://www.ncbi.nlm.nih.gov/pubmed/27565134.
Dworkin, M., S. Falkow, E. Rosenberg, K.-H. Schleifer, and E. Stackebrandt, eds. 2006. The Prokaryotes: A Handbook on the Biology of Bacteria. 3rd ed. Vol. 4, Bacteria: Firmicutes, Cyanobacteria. New York, NY: Springer. http://link.springer.com/referencework/10.1007%2F0-387-30744-3.
Fraser, K. R., D. Harvie, P. J. Coote, and C. P. O’Byrne. 2000. “Identification and characterization of an ATP binding cassette L-carnitine transporter in Listeria monocytogenes.” Appl. Environ. Microbiol. 66 (11): 4696–4704. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC92368/.
Oliver, H. F., R. H. Orsi, L. Ponnala, U. Keich, W. Wang, Q. Sun, S. W. Cartinhour, M. J. Filiatrault, M. Wiedmann, and K. J. Boor. 2009. “Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs.” BMC Genomics 10 (December): 641. http://www.ncbi.nlm.nih.gov/pubmed/11581570.
Raengpradub, S., M. Wiedmann, and K. J. Boor. 2008. “Comparative analysis of the sigma B-dependent stress responses in Listeria monocytogenes and Listeria innocua strains exposed to selected stress conditions.” Appl. Environ. Microbiol. 74 (1): 158–71. https://www.ncbi.nlm.nih.gov/pubmed/18024685.
Schaik, W. van, and T. Abee. 2005. “The role of sigmaB in the stress response of Gram-positive bacteria – targets for food preservation and safety.” Curr. Opin. Biotechnol. 16 (2): 218–24. https://www.ncbi.nlm.nih.gov/pubmed/15831390.
http://www.dnastar.com/t-support-tutorials.aspx#/seqman_ngen_tutorials/#!Documents/tutorial3templatedrnaseqworkflow.htm
https://s3.amazonaws.com/star-deploy/SupportingDataFiles/LG14/SeqMan+NGen+Tutorial+Data/T3_Templated_RNA-Seq.zip
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15651
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8
https://genome.ucsc.edu/ENCODE/protocols/dataStandards/ENCODE_RNAseq_Standards_V1.0.pdf
http://www.epigenesys.eu/en/protocols/bio-informatics/1283-guidelines-for-rna-seq-data-analysis