Based on the webinar by Dr. Jeremy Chacon, Beginners Introduction to R Statistical Software.
The mock yeast experiment table used in the webinar (yeast_example.txt) can be obtained from this short link: https://go.wisc.edu/mc5d52
It is always good practice to keep each project within its own directory.
Change to the project directory on the desktop with setwd() and verify with getwd(). These commands assume that the directory already exists. Create it on your computer first if necessary, and download yeast_example.txt (see above) into it.
setwd("~/Desktop/R_intro_2018/Yeast_demo")
getwd()
Note: On a Windows computer it would be something like this: C:/Users/etc/etc/etc/ (using the forward slash /)
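As a minimal sketch of the "create it first if necessary" step, the directory can also be created from within R. The folder name below is hypothetical and is placed under tempdir() so that the example is harmless to run anywhere; replace it with your own project path.

```r
# Hypothetical project folder (an assumption for this sketch, not the webinar's path)
project_dir <- file.path(tempdir(), "R_intro_demo", "Yeast_demo")

# Create the folder, including any missing parent folders, only if it is absent
if (!dir.exists(project_dir)) {
  dir.create(project_dir, recursive = TRUE)
}

old_dir <- setwd(project_dir)  # setwd() invisibly returns the previous directory
print(getwd())                 # confirm that the change took effect
setwd(old_dir)                 # return to the starting directory
```

Checking with dir.exists() first avoids the warning that dir.create() gives when the folder is already there.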
list.files()
[1] "Demo_yeast_files" "Demo_yeast.docx"
[3] "Demo_yeast.html" "Demo_yeast.md"
[5] "Demo_yeast.pdf" "demo_yeast.R"
[7] "Demo_yeast.Rmd" "mystyles.docx"
[9] "RStudio_yeast_demo.Rproj" "yeast_example.md"
[11] "yeast_example.txt" "yeast_example.xlsx"
Note: the command dir() would give the same result.
dir()
List the *.txt files within the directory with either list.files() or dir(), specifying the pattern to search for:
dir(pattern = "\\.txt$")  # the pattern is a regular expression: \\. is a literal dot, $ anchors the end of the name
[1] "yeast_example.txt"
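Because the pattern argument is interpreted as a regular expression, an unescaped dot matches any character. The sketch below illustrates the difference on two empty files created in a temporary folder; the folder and the file notes_txt.md are made up for this illustration.

```r
# Temporary demo folder with two empty files (hypothetical names, for illustration only)
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("yeast_example.txt", "notes_txt.md")))

# Unescaped dot: "." matches any character, so "_txt" in notes_txt.md also matches
loose <- list.files(demo_dir, pattern = ".txt")

# Escaped dot plus end anchor: only names that end in ".txt"
strict <- list.files(demo_dir, pattern = "\\.txt$")

# glob2rx() converts a shell-style wildcard into the equivalent regular expression
from_glob <- list.files(demo_dir, pattern = utils::glob2rx("*.txt"))

print(loose)
print(strict)
print(from_glob)
```

In a folder that holds only one text file, both forms return the same result, which is why the simpler pattern often appears to work.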
Read the data into a variable named yeast_eg, specifying that the first line is a header:
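# Before R 4.0.0, strings were converted to factors by default:
# yeast_eg = read.table('yeast_example.txt', header = TRUE)
# Since R 4.0.0 the default is stringsAsFactors = FALSE, so request factors explicitly:
yeast_eg = read.table('yeast_example.txt', header = TRUE, stringsAsFactors = TRUE)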
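To see what the stringsAsFactors argument changes, here is a small sketch that reads four rows copied from the table shown later on this page. It uses the text argument of read.table() in place of the file, which is an assumption made only so that the example runs without downloading anything.

```r
# Four rows copied from the yeast table, supplied as text instead of a file
txt <- "genotype drug treatment OD_change
WT none WT_no_drug 3.2
WT nocodazole WT_nocodazole 1.2
mad2_del none mad2_del_no_drug 2.7
mad2_del nocodazole mad2_del_nocodazole 2.2"

# Strings kept as plain character vectors (the default since R 4.0.0)
as_char <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Strings converted to factors (the default before R 4.0.0)
as_fact <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

print(class(as_char$genotype))
print(class(as_fact$genotype))
print(levels(as_fact$genotype))
```

Factors matter for the rest of this walkthrough: summary() counts the rows per level, and plot() draws boxplots, only when the grouping columns are factors.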
The first 6 lines of the data look like this:
head(yeast_eg)
genotype drug treatment OD_change
1 WT none WT_no_drug 3.2
2 WT none WT_no_drug 2.8
3 WT none WT_no_drug 3.1
4 WT none WT_no_drug 3.3
5 WT none WT_no_drug 2.6
6 WT nocodazole WT_nocodazole 1.2
During an interactive session, the following command will open a spreadsheet-like tab or window showing all the data in tabular format.
View(yeast_eg)
The structure and summary of the data look like this:
str(yeast_eg)
'data.frame': 20 obs. of 4 variables:
$ genotype : Factor w/ 2 levels "mad2_del","WT": 2 2 2 2 2 2 2 2 2 2 ...
$ drug : Factor w/ 2 levels "nocodazole","none": 2 2 2 2 2 1 1 1 1 1 ...
$ treatment: Factor w/ 4 levels "mad2_del_no_drug",..: 3 3 3 3 3 4 4 4 4 4 ...
$ OD_change: num 3.2 2.8 3.1 3.3 2.6 1.2 1.5 1.3 1.9 0.7 ...
summary(yeast_eg)
genotype drug treatment OD_change
mad2_del:10 nocodazole:10 mad2_del_no_drug :5 Min. :0.700
WT :10 none :10 mad2_del_nocodazole:5 1st Qu.:2.125
WT_no_drug :5 Median :2.650
WT_nocodazole :5 Mean :2.425
3rd Qu.:2.925
Max. :3.300
Optionally, we can also create a nicer-looking table with an additional command (this may require loading additional R packages, so if it does not work now, that’s OK). Here is the complete dataset within the table:
library(knitr)
kable(yeast_eg)
genotype | drug | treatment | OD_change |
---|---|---|---|
WT | none | WT_no_drug | 3.2 |
WT | none | WT_no_drug | 2.8 |
WT | none | WT_no_drug | 3.1 |
WT | none | WT_no_drug | 3.3 |
WT | none | WT_no_drug | 2.6 |
WT | nocodazole | WT_nocodazole | 1.2 |
WT | nocodazole | WT_nocodazole | 1.5 |
WT | nocodazole | WT_nocodazole | 1.3 |
WT | nocodazole | WT_nocodazole | 1.9 |
WT | nocodazole | WT_nocodazole | 0.7 |
mad2_del | none | mad2_del_no_drug | 2.7 |
mad2_del | none | mad2_del_no_drug | 2.9 |
mad2_del | none | mad2_del_no_drug | 3.0 |
mad2_del | none | mad2_del_no_drug | 2.5 |
mad2_del | none | mad2_del_no_drug | 3.1 |
mad2_del | nocodazole | mad2_del_nocodazole | 2.2 |
mad2_del | nocodazole | mad2_del_nocodazole | 2.4 |
mad2_del | nocodazole | mad2_del_nocodazole | 2.9 |
mad2_del | nocodazole | mad2_del_nocodazole | 2.5 |
mad2_del | nocodazole | mad2_del_nocodazole | 2.7 |
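If you are unsure whether the knitr package is installed, one possible approach is to test for it first and fall back to plain printing. This is only a sketch: the two-row data frame is a stand-in built from values in the table above, and the variable names are hypothetical.

```r
# Two rows taken from the table above, used as a stand-in for yeast_eg
yeast_head <- data.frame(
  genotype  = c("WT", "WT"),
  drug      = c("none", "nocodazole"),
  OD_change = c(3.2, 1.2)
)

# requireNamespace() returns TRUE or FALSE instead of stopping with an error
has_knitr <- requireNamespace("knitr", quietly = TRUE)

if (has_knitr) {
  print(knitr::kable(yeast_head))  # formatted table
} else {
  print(yeast_head)                # plain console output as a fallback
}
```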
Accessing specific columns in the data table can be done in 2 ways:
Using the $ sign between the name of the dataset and the name of the column. For example: yeast_eg$genotype
Using the with() function, which allows more elegant code. The first argument is the dataset, here yeast_eg. The second argument is typically a function call that names the column to use. For example: with(yeast_eg, summary(genotype)).
with(yeast_eg,summary(genotype))
mad2_del WT
10 10
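The sketch below checks that both access styles return the same column. The data frame is rebuilt by hand from the values in the table above, which is an assumption that stands in for reading yeast_example.txt. The double-bracket form on the last access line is an additional base R option that is not part of the webinar.

```r
# Rebuild the dataset from the values printed in the table above
yeast_eg <- data.frame(
  genotype  = rep(c("WT", "mad2_del"), each = 10),
  drug      = rep(rep(c("none", "nocodazole"), each = 5), times = 2),
  OD_change = c(3.2, 2.8, 3.1, 3.3, 2.6, 1.2, 1.5, 1.3, 1.9, 0.7,
                2.7, 2.9, 3.0, 2.5, 3.1, 2.2, 2.4, 2.9, 2.5, 2.7),
  stringsAsFactors = TRUE
)

by_dollar  <- yeast_eg$genotype          # $ notation
by_with    <- with(yeast_eg, genotype)   # with() evaluates the name inside the data frame
by_bracket <- yeast_eg[["genotype"]]     # double brackets with the column name as a string

print(identical(by_dollar, by_with))
print(identical(by_dollar, by_bracket))

# with() saves typing when several columns appear in one expression
print(with(yeast_eg, mean(OD_change[genotype == "WT"])))
```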
The following command will plot the genotype on the horizontal x axis and the OD change on the vertical y axis:
with(yeast_eg, plot(genotype, OD_change))
Note: Using the $ notation would create the exact same plot: plot(yeast_eg$genotype, yeast_eg$OD_change).
We can observe that the OD change is higher, on average, for mad2_del, as indicated by the thick line within each box, which represents the median.
Thus, for now, it appears that the growth rate is greater in mad2_del even when we add the drug nocodazole, which should stop the cells from growing.
But to confirm this hypothesis we need to look at the data in a few more ways.
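The medians read off the boxplot can also be obtained as numbers. The sketch below is one way to do this, not the webinar's method: it rebuilds the dataset from the table above (an assumption that stands in for the file) and asks boxplot() for its statistics without drawing anything.

```r
# Rebuild the dataset from the values printed in the table above
yeast_eg <- data.frame(
  genotype  = rep(c("WT", "mad2_del"), each = 10),
  drug      = rep(rep(c("none", "nocodazole"), each = 5), times = 2),
  OD_change = c(3.2, 2.8, 3.1, 3.3, 2.6, 1.2, 1.5, 1.3, 1.9, 0.7,
                2.7, 2.9, 3.0, 2.5, 3.1, 2.2, 2.4, 2.9, 2.5, 2.7),
  stringsAsFactors = TRUE
)

# plot = FALSE returns the numbers behind the boxes instead of drawing them
bp <- boxplot(OD_change ~ genotype, data = yeast_eg, plot = FALSE)

# Row 3 of bp$stats holds the median of each group (the thick line in each box)
box_medians <- setNames(bp$stats[3, ], bp$names)
print(box_medians)

# Cross-check with tapply(), which applies median() to each genotype
print(with(yeast_eg, tapply(OD_change, genotype, median)))
```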
We can now look at the effect of the drug on the OD change.
with(yeast_eg, plot(drug, OD_change))
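Plotting one factor at a time pools the other factor, so a table of group means split by both genotype and drug may be a useful complement. The following is a sketch under the assumption that the dataset matches the table printed earlier; it is rebuilt by hand here so that the example runs on its own.

```r
# Rebuild the dataset from the values printed in the table above
yeast_eg <- data.frame(
  genotype  = rep(c("WT", "mad2_del"), each = 10),
  drug      = rep(rep(c("none", "nocodazole"), each = 5), times = 2),
  OD_change = c(3.2, 2.8, 3.1, 3.3, 2.6, 1.2, 1.5, 1.3, 1.9, 0.7,
                2.7, 2.9, 3.0, 2.5, 3.1, 2.2, 2.4, 2.9, 2.5, 2.7),
  stringsAsFactors = TRUE
)

# Mean OD change for every genotype x drug combination (rows: genotype, columns: drug)
group_means <- with(yeast_eg, tapply(OD_change, list(genotype, drug), mean))
print(group_means)

# The same numbers in long format, one row per combination
print(aggregate(OD_change ~ genotype + drug, data = yeast_eg, FUN = mean))
```

Indexing by name, for example group_means["WT", "nocodazole"], retrieves a single cell without relying on the order of the factor levels.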