RStudio (RStudio Team 2015) is an integrated graphical interface layer over the R
program (R Core Team 2017).
When RStudio is launched, an R
session is started and shown within the bottom quadrant of the RStdusio window.
Typically an RStudio session splits the screen in 4 main quadrants. Some quadrants may be split into Tabs:
R
consoleLeft | Right | |
---|---|---|
Top | Scripts | Environment | History |
Bottom | R console | R Markdown |
Files | Plots | Packages | Help | Viewer |
In the the Script area different file types can be open, the most common would be an R Script
for recording commands that can be passed onto the R
console easily.
This is the R
console that is “attached” to this RStudio session and that will execute the scripts with send.
The commands can be written within a script file (see below) or typed directly within the R
console.
You can start a new R Script
file with the following menu cascade:
File > New File > R Script
Within the script area we can type commands, one per line.
For example type the following within the `R Script: area:
print('Hello World')
Then, while the cursor is within that line, press together control
and return
(on a Mac command
return
also works.)
This action will transfer the command to R
which will run it. Therefore within the R
console you will see:
> print('Hello World')
[1] "Hello World"
i.e. the command and its output
An alternate method to activate the command (and passing it on to R
) is to click the Run
button at the top right of the Script quadrant.
In both cases, to activates multiple lines simply select them with the mouse first before pressing control
and return
or clicking on Run
.
This type of script will be seen in a whole section later on.
Now that we have issued at least one command, it will be recorded in the history
list. You can see the list of issued command by pressing the History
tab at the Top Right quadrant.
The environment variable will record and update all the R objects
that we create along with their type, size and number of records; in other word the object structure.
For example, let’s create a vector x
containing a series of 100 ramdom numbers:
x <- rnorm(100)
Note: The function rnorm()
samples random numbers from the Normal distribution
with mean zero (\(\mu\)=0) and standard deviation of one ( \(\sigma=1\)) if not otherwise specified.
If you now click on the Environment
Tab at the top right you will see information about x
that would be identical or similar to that obtained within the console with the str
command to show the structure of an object, for example with the command:
str(x)
num [1:100] 0.1256 0.0246 -0.3991 0.3895 0.1933 ...
The tabs are: Files | Plots | Packages | Help | Viewer
The Files
tab displays the content of the current working directory. If we have not yet specified an area to save file, the typical default would be the default “Home” directory such as /Users/name
on a Mac or Linux system or C:\Users\yourname
on Windows.
The list of files and folders shown within the window could also be obtained with the command:
dir()
The Plots
tab should be blank as we have not made a single plot yet.
We can create a simple plot from the 100 random numbers stored within the x
object created above with the plot()
function:
# This plot not shown
plot(x)
We can add a little fantasy by coloring the plotted points and by adding a horizontal line at zero (since the random numbers are ditributed around zero,) thickened with the linewidth command lwd=2
and colored.
plot(x, col=2)
abline(h=0, col=3, lwd=2)
If you have not clicked the Plots
tab yet do so now and you’ll see the plot.
This plot will remain visible until another plot command is issued.
Note: Here the color was given by a number where col=2
made red points and col=3
made a green line. There are only 8
colors in this default which can be listed with:
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow"
[8] "gray"
R
does have a longer list of 657 colors which can be listed exhaustively with:
colors()
The first 8 colors in the list are:
[1] "white" "aliceblue" "antiquewhite" "antiquewhite1"
[5] "antiquewhite2" "antiquewhite3" "antiquewhite4" "aquamarine"
The last 8 colors in the list are:
[1] "wheat4" "whitesmoke" "yellow" "yellow1" "yellow2"
[6] "yellow3" "yellow4" "yellowgreen"
The RStudio tab makes installing new packages very easy.
R
packages are hosted on the “The Comprehensive R Archive Network”or CRAN
web site and its mirrors. Packages provide new functionalities to R
and can “depend” on other packages which will be installed together.
We will now install a package that we’ll use a little later, and we may add new packages as we go along.
There are currently about 7,000 packages available for R
which is great but can also create challenges.
We will install the knitr
(Xie 2015) package:
Packages
tab.Install
button just belowknitr
within the “Packages” text entryInstall
button.The equivallent R
command:
install.packages("knitr")
will be issued within the R
console and the package, as well as all its dependencies will be installed.
We will make use of this functionbality of this package in another section.
The Help
tab will present R
help when information on commands are requeted. For example:
help(print)
# which can also be written as:
?print
The help pages are presented from the HTML formatted help pages.
This tab is used to present special output in web format from more advanced RStudio features.
We will not use this tab today.
Data is generically often avaiable as a table. For example a spreadsheet table.
Dataframes are useful R
tables that can contain mixed data (numbers, words, vectors, etc.)
Dataframes are a “higher order” structure “above” a simple vector or a matrix. However, it is possible to “coerce” these “lower” structures into a data frame for output purposes.
This will be useful later on…
Let’s start with a simple, small vector example: a vector of 5 random numbers
# a vector of random numbers
x2 <- rnorm(5)
There are default oputput presentations to print
these objects onto the screen:
# print x2
x2
[1] 0.2594850 -0.3755963 1.1162778 0.5345244 1.7199597
We can use the function as.data.frame()
to “coerce” vector x2
(later we’ll apply this to matrix z
.)
# x2 as dataframe:
as.data.frame(x2)
x2
1 0.2594850
2 -0.3755963
3 1.1162778
4 0.5345244
5 1.7199597
The numbers 1
throuh 5
on the left columns are the rownames
or index
of the values of x2
, i.e. the order in which they appear in x2
:
rownames(as.data.frame(x2))
[1] "1" "2" "3" "4" "5"
Let’s say that now we want to create a table where we show the value of x
in one column and the value of x * 100
and x * 1000
in adjacent columns. We can use the cbind()
function that “binds columns” to accomplish that:
as.data.frame(cbind(x2, x2*100, x2*1000))
x2 V2 V3
1 0.2594850 25.94850 259.4850
2 -0.3755963 -37.55963 -375.5963
3 1.1162778 111.62778 1116.2778
4 0.5345244 53.45244 534.5244
5 1.7199597 171.99597 1719.9597
However, note that the column names is correct for x2
but for the other 2 columns R
simply give a generic vector name: V2
and V3
in this case.
Here is a simple method to remedy that: create a new R
object Y
containing the new data frame and then change the name of the column names:
Y <- as.data.frame(cbind(x2, x2*100, x2*1000))
Then change the value of the column names:
colnames(Y) <- c("x2","x2 times 100", "x2 times 1000")
Now print-out Y
with its new column names:
Y
x2 x2 times 100 x2 times 1000
1 0.2594850 25.94850 259.4850
2 -0.3755963 -37.55963 -375.5963
3 1.1162778 111.62778 1116.2778
4 0.5345244 53.45244 534.5244
5 1.7199597 171.99597 1719.9597
A similar method can be applied to a matrix:
# a matrix of numbers with 4 rows and 5 colums
# containing numbers 1 through 20
z <- matrix(1:20, 4,5)
# print z
z
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
# z as dataframe:
as.data.frame(z)
V1 V2 V3 V4 V5
1 1 5 9 13 17
2 2 6 10 14 18
3 3 7 11 15 19
4 4 8 12 16 20
We note that the column names again have the generic names of “V1” to “V5”.
With the same method this could be changed with our own column and row names. We’ll first convert matrix z
into data frame Z
and then change the row and column names:
# Create an object to contain the dataframe
Z <- as.data.frame(z)
# change the column names:
colnames(Z) <- c("col1", "col2", "col3", "col4", "col5")
# change the row names:
row.names(Z) <- c("row1", "row2", "row3","row4")
# Print the "new Z" with its new column and row names:
print(Z)
col1 col2 col3 col4 col5
row1 1 5 9 13 17
row2 2 6 10 14 18
row3 3 7 11 15 19
row4 4 8 12 16 20
Note: we created the column names as a vector with the combine c()
function “on the fly” but the vectors could be created first as an R
object separately and then assigned to Z
.
Note also that the command is different for rows and columns, one of them has a .
in the middle:
colnames()
row.names()
This is part of the “fun difficulties” of working with R
!
Earlier we installed package knitr
so now we can use a function of knitr
called kable()
to format tabular data into a nice output:
# First we load the knitr package
library(knitr)
Z
as a tableNow we can write a table example with e.g. the matrix z
transformed earlier in dataframe Z
with updated row and column names:
kable(Z, padding = 0)
col1 | col2 | col3 | col4 | col5 | |
---|---|---|---|---|---|
row1 | 1 | 5 | 9 | 13 | 17 |
row2 | 2 | 6 | 10 | 14 | 18 |
row3 | 3 | 7 | 11 | 15 | 19 |
row4 | 4 | 8 | 12 | 16 | 20 |
View options for kable()
with the help command ?kable
which would explain the following, different output from command:
Note that the output will only make sense for the type of format requested for final output. The HTML format will appear correct only on this document saved as HTML and will appear bady formatted on PDF or DOCX versions of this document.
kable(Z, format="html", caption = "Table 1: This table was a matrix.", padding = 10)
col1 | col2 | col3 | col4 | col5 | |
---|---|---|---|---|---|
row1 | 1 | 5 | 9 | 13 | 17 |
row2 | 2 | 6 | 10 | 14 | 18 |
row3 | 3 | 7 | 11 | 15 | 19 |
row4 | 4 | 8 | 12 | 16 | 20 |
Below the “latex” format will appear only on PDF output and will remain blank on other outputs:
kable(Z, format="latex")
Y
as a table:Y
is a 3-column dataframe representing x2
, 100
times x2
and 1000
times x2
that we created earlier. We can limit the number of decimal digits shown with the extra command digit=2
as an option:
kable(Y, digit=2)
x2 | x2 times 100 | x2 times 1000 |
---|---|---|
0.26 | 25.95 | 259.49 |
-0.38 | -37.56 | -375.60 |
1.12 | 111.63 | 1116.28 |
0.53 | 53.45 | 534.52 |
1.72 | 172.00 | 1719.96 |
You can try adding padding=0
or padding=1
to see the effect on the output.
In the next section we will refine our use of RStudio by creating an “dynamic” document.
R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org.
RStudio Team. 2015. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, Inc. http://www.rstudio.com/.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. http://yihui.name/knitr/.
3.2 Comments
When writing code it is useful to document what the commands mean or what we are trying to do. A “gift” of information to your “future self” or colleagues.
The symbol
#
can be used to write comment lines that are ignored byR
but that might prove useful in the future. For example: