1 Analysis of yeast growth data

Based on the webinar "Beginners Introduction to R Statistical Software" by Dr. Jeremy Chacon [1].

The mock yeast experiment table used in the webinar (yeast_example.txt) can be obtained from this short link: https://go.wisc.edu/mc5d52

1.1 Set working directory

It is always good practice to keep each project within a separate directory.

Change to the project directory on the desktop with setwd() and verify with getwd(). These commands assume that the directory already exists. Create it on your computer first if necessary, and download yeast_example.txt (see above) into it.

setwd("~/Desktop/R_intro_2018/Yeast_demo")
getwd()

Note: On a Windows computer it would be something like this: C:/Users/etc/etc/etc/ (using the forward slash /)
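
The commands above assume the directory already exists. As a minimal sketch, the directory can also be created from within R before switching to it. The path below is a hypothetical placeholder built under tempdir() so that the example runs on any computer; replace it with your own project path.

```r
# Hypothetical project folder in a temporary location (an assumption for
# illustration); replace with your own path, e.g. "~/Desktop/R_intro_2018/Yeast_demo"
project_dir <- file.path(tempdir(), "R_intro_2018", "Yeast_demo")

# Create the folder (and any missing parent folders) only if it is absent
if (!dir.exists(project_dir)) {
  dir.create(project_dir, recursive = TRUE)
}

# setwd() invisibly returns the previous working directory, so keep it
old_dir <- setwd(project_dir)
getwd()

# Return to where we started
setwd(old_dir)
```

Saving the previous directory makes it easy to switch back once the work in the project folder is done.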

1.2 List all files in directory

list.files()
 [1] "Demo_yeast_files"         "Demo_yeast.docx"         
 [3] "Demo_yeast.html"          "Demo_yeast.md"           
 [5] "Demo_yeast.pdf"           "demo_yeast.R"            
 [7] "Demo_yeast.Rmd"           "mystyles.docx"           
 [9] "RStudio_yeast_demo.Rproj" "yeast_example.md"        
[11] "yeast_example.txt"        "yeast_example.xlsx"      

Note: the command dir() would give the same result.

dir()

1.3 List “txt” files and read data

List *.txt files within the directory with either list.files() or dir() specifying the pattern searched:

dir(pattern = "\\.txt$")  # pattern is a regular expression: escape the dot, anchor the end
[1] "yeast_example.txt"
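
The pattern argument of dir() and list.files() is interpreted as a regular expression, not as a shell wildcard, so an unescaped dot matches any character. The sketch below illustrates the difference with grepl() on a few hypothetical file names (they are made up for this example and are not part of the demo directory).

```r
# Hypothetical file names, used only to illustrate pattern matching
files <- c("yeast_example.txt", "my_txt_notes.md", "yeast_example.xlsx")

# An unescaped "." matches any single character, so "_txt" also matches
loose <- grepl(".txt", files)

# Escaping the dot and anchoring with "$" keeps only names ending in ".txt"
strict <- grepl("\\.txt$", files)

# glob2rx() converts a shell-style wildcard into an equivalent regular expression
glob <- grepl(glob2rx("*.txt"), files)

files[loose]
files[strict]
files[glob]
```

The same regular expressions can be passed directly to the pattern argument of dir() or list.files().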

Read the data into a variable named yeast_eg, specifying that the first line is a header:

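# Before R 4.0.0, strings were converted to factors by default:
# yeast_eg = read.table('yeast_example.txt', header = TRUE)
# Since R 4.0.0 the default is stringsAsFactors = FALSE, so request factors explicitly:
yeast_eg = read.table('yeast_example.txt', header = TRUE, stringsAsFactors = TRUE)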
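
To see what the stringsAsFactors argument changes, the following sketch writes a tiny two-row table to a temporary file and reads it back both ways. The file and its contents are assumptions made for illustration; only the column names are borrowed from yeast_example.txt.

```r
# Write a small whitespace-separated table to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("genotype drug OD_change",
             "WT none 3.2",
             "mad2_del nocodazole 2.2"), tmp)

# Text columns stay as character vectors
as_chr <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)

# Text columns are converted to factors (categorical variables)
as_fct <- read.table(tmp, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$genotype)
class(as_fct$genotype)
levels(as_fct$genotype)
```

Factors matter later on: functions such as summary() and plot() treat a factor as a set of groups, whereas a character column is treated as plain text.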

2 Examine data

The first 6 lines of the data look like this:

head(yeast_eg)
  genotype       drug     treatment OD_change
1       WT       none    WT_no_drug       3.2
2       WT       none    WT_no_drug       2.8
3       WT       none    WT_no_drug       3.1
4       WT       none    WT_no_drug       3.3
5       WT       none    WT_no_drug       2.6
6       WT nocodazole WT_nocodazole       1.2

During an interactive session the following command will open a spreadsheet-like tab or window showing all the data in tabular format.

View(yeast_eg)

The structure and summary of the data look like this:

str(yeast_eg)
'data.frame':   20 obs. of  4 variables:
 $ genotype : Factor w/ 2 levels "mad2_del","WT": 2 2 2 2 2 2 2 2 2 2 ...
 $ drug     : Factor w/ 2 levels "nocodazole","none": 2 2 2 2 2 1 1 1 1 1 ...
 $ treatment: Factor w/ 4 levels "mad2_del_no_drug",..: 3 3 3 3 3 4 4 4 4 4 ...
 $ OD_change: num  3.2 2.8 3.1 3.3 2.6 1.2 1.5 1.3 1.9 0.7 ...
summary(yeast_eg)
     genotype          drug                  treatment   OD_change    
 mad2_del:10   nocodazole:10   mad2_del_no_drug   :5   Min.   :0.700  
 WT      :10   none      :10   mad2_del_nocodazole:5   1st Qu.:2.125  
                               WT_no_drug         :5   Median :2.650  
                               WT_nocodazole      :5   Mean   :2.425  
                                                       3rd Qu.:2.925  
                                                       Max.   :3.300  

Optionally, we can also create a nice-looking table with an additional command. (This may require loading additional R packages, so if it does not work now, that's OK.) Here is the complete dataset within the table:

library(knitr)
kable(yeast_eg)
|genotype |drug       |treatment           | OD_change|
|:--------|:----------|:-------------------|---------:|
|WT       |none       |WT_no_drug          |       3.2|
|WT       |none       |WT_no_drug          |       2.8|
|WT       |none       |WT_no_drug          |       3.1|
|WT       |none       |WT_no_drug          |       3.3|
|WT       |none       |WT_no_drug          |       2.6|
|WT       |nocodazole |WT_nocodazole       |       1.2|
|WT       |nocodazole |WT_nocodazole       |       1.5|
|WT       |nocodazole |WT_nocodazole       |       1.3|
|WT       |nocodazole |WT_nocodazole       |       1.9|
|WT       |nocodazole |WT_nocodazole       |       0.7|
|mad2_del |none       |mad2_del_no_drug    |       2.7|
|mad2_del |none       |mad2_del_no_drug    |       2.9|
|mad2_del |none       |mad2_del_no_drug    |       3.0|
|mad2_del |none       |mad2_del_no_drug    |       2.5|
|mad2_del |none       |mad2_del_no_drug    |       3.1|
|mad2_del |nocodazole |mad2_del_nocodazole |       2.2|
|mad2_del |nocodazole |mad2_del_nocodazole |       2.4|
|mad2_del |nocodazole |mad2_del_nocodazole |       2.9|
|mad2_del |nocodazole |mad2_del_nocodazole |       2.5|
|mad2_del |nocodazole |mad2_del_nocodazole |       2.7|

3 Data exploration

3.1 Accessing column data

Specific columns in the data table can be accessed in two ways:

  • Using the $ sign between the name of the dataset and the name of the column. For example: yeast_eg$genotype

  • Using the with() function, which allows more elegant code. The first argument is the dataset, here yeast_eg. The second argument is typically a function call in which the column is referred to by name alone. For example: with(yeast_eg, summary(genotype)).

with(yeast_eg,summary(genotype))
mad2_del       WT 
      10       10 
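
As a small sketch of the two access styles side by side, the code below rebuilds the genotype, drug and OD_change columns from the values listed in the table above, so that it runs without the data file. Rebuilding the data frame by hand, and fixing the factor level order explicitly, are choices made for this example only.

```r
# Rebuild the dataset from the values shown in the table (no file needed)
yeast_eg <- data.frame(
  genotype  = factor(rep(c("WT", "mad2_del"), each = 10),
                     levels = c("mad2_del", "WT")),
  drug      = factor(rep(rep(c("none", "nocodazole"), each = 5), times = 2),
                     levels = c("nocodazole", "none")),
  OD_change = c(3.2, 2.8, 3.1, 3.3, 2.6, 1.2, 1.5, 1.3, 1.9, 0.7,
                2.7, 2.9, 3.0, 2.5, 3.1, 2.2, 2.4, 2.9, 2.5, 2.7)
)

# Both styles reach the same column and give the same answer
mean_dollar <- mean(yeast_eg$OD_change)
mean_with   <- with(yeast_eg, mean(OD_change))
identical(mean_dollar, mean_with)

# with() is convenient when several columns appear in one expression
counts <- with(yeast_eg, table(genotype, drug))
counts
```

The cross-tabulation also confirms how many measurements exist for each combination of genotype and drug.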

3.1.1 Exploratory plots

The following command will plot the genotype on the horizontal x axis and the OD change on the vertical y axis:

with(yeast_eg, plot(genotype, OD_change))

Note: Using the $ notation would create the exact same plot: plot(yeast_eg$genotype, yeast_eg$OD_change).

We can observe that the median OD change is higher for mad2_del than for WT, as indicated by the thick line within each box, which represents the median.
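
The medians drawn as thick lines can also be read off numerically. The sketch below rebuilds the genotype and OD_change columns from the table values (an assumption made so the example runs without the data file) and uses the formula interface of boxplot(), which draws the same kind of plot and returns the statistics behind each box.

```r
# Rebuild the two columns needed here from the values shown in the table
yeast_eg <- data.frame(
  genotype  = factor(rep(c("WT", "mad2_del"), each = 10),
                     levels = c("mad2_del", "WT")),
  OD_change = c(3.2, 2.8, 3.1, 3.3, 2.6, 1.2, 1.5, 1.3, 1.9, 0.7,
                2.7, 2.9, 3.0, 2.5, 3.1, 2.2, 2.4, 2.9, 2.5, 2.7)
)

# boxplot() draws the plot and returns the numbers used to draw it
bp <- boxplot(OD_change ~ genotype, data = yeast_eg, ylab = "OD change")

# Row 3 of bp$stats holds the median (the thick line) of each box
medians <- setNames(bp$stats[3, ], bp$names)
medians

# The same medians computed directly, one per genotype
with(yeast_eg, tapply(OD_change, genotype, median))
```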

Thus, for now, it appears that the growth rate is greater in mad2_del, even when we add the drug nocodazole, which should stop the cells from growing.

But to confirm this hypothesis, we need to look at the data in a few more ways.

We can now look at the effect of the drug on the OD change.

with(yeast_eg, plot(drug, OD_change))
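
One further way to look at the data is to split the OD change by genotype and drug at the same time. This is sketched here as one possible next step, not as the webinar's own method. The data frame is rebuilt from the table values so that the example runs without the data file.

```r
# Rebuild the dataset from the values shown in the table (no file needed)
yeast_eg <- data.frame(
  genotype  = factor(rep(c("WT", "mad2_del"), each = 10),
                     levels = c("mad2_del", "WT")),
  drug      = factor(rep(rep(c("none", "nocodazole"), each = 5), times = 2),
                     levels = c("nocodazole", "none")),
  OD_change = c(3.2, 2.8, 3.1, 3.3, 2.6, 1.2, 1.5, 1.3, 1.9, 0.7,
                2.7, 2.9, 3.0, 2.5, 3.1, 2.2, 2.4, 2.9, 2.5, 2.7)
)

# Mean OD change for every genotype-by-drug combination (rows: genotype)
group_means <- with(yeast_eg, tapply(OD_change, list(genotype, drug), mean))
group_means

# One box per combination of drug and genotype
boxplot(OD_change ~ drug + genotype, data = yeast_eg, ylab = "OD change")
```

Looking at the four groups separately helps distinguish an effect of the genotype from an effect of the drug, which the two single-factor plots cannot do on their own.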