Chapter 1 Introduction
In this class we'll use the software called R inside another piece of software called RStudio, which provides an intuitive graphical user interface. R is the name of the software itself, but also the name of the programming language that is used within the software.
In this chapter:
- Software installation
- Installing R packages
- NHANES datasets
Just as a cooking recipe is a series of tasks for preparing a dish from specific ingredients, a program is simply a list of instructions to be performed, and its ingredients are the data provided in a dataset. The instructions are written in a programming language, in our case R.
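As a minimal sketch of the recipe analogy, the short R program below treats a made-up set of values as its ingredients and three lines of code as its instructions. The variable names and the numbers are hypothetical and were chosen only for illustration.

```r
# Ingredients: the data (hypothetical heights in centimeters)
heights_cm <- c(160, 172, 181, 168)

# Instructions: each line is one step of the "recipe"
heights_m <- heights_cm / 100      # step 1: convert to meters
mean_height_m <- mean(heights_m)   # step 2: compute the average
print(round(mean_height_m, 2))     # step 3: display the rounded result
```

Changing the ingredients (the values in heights_cm) changes the result, while the instructions stay the same.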
This 5-minute video, R - Coding - 4.13, explains how R is useful in data science.
1.1 Software installation
Students should install the following two pieces of software on their computer. Both R and RStudio have versions for the three main operating systems (Windows, macOS and Linux). Once installed, working within the software is the same on all computer platforms.
TASK: Install the software on your computer.
- R: from The Comprehensive R Archive Network at cran.r-project.org
- RStudio: from rstudio.com
Choose the version for your computer and follow installation instructions.
The installation process is rather intuitive. If you need more guidance, the following step-by-step videos may be useful:
- Installing R and RStudio on Windows 10 (March 20, 2020 - 3min 23sec)
- Installing R and Rstudio on MacOS (Mar 22, 2020 - 4min)
1.2 Installing R packages
Packages are modular additions to the R software that add functionality in the form of new functions, included datasets, documentation, etc. The standard repository of R packages, the “Comprehensive R Archive Network” (CRAN), will likely be the one used most for environmental health.
One “suite” of packages that we’ll use is called Tidyverse, and it should preferably be installed before classes start. Although Tidyverse is a suite of multiple packages, it can be installed just like a single package, using the single name tidyverse (package names in R are case-sensitive).
The method to add a package is rather simple:
- Copy the following command into the R console: install.packages("tidyverse")
- Alternatively, use the Packages pane in RStudio to do the installation with the graphical interface
See also section 3 to get oriented in RStudio.
To install in RStudio, follow the video Installing Packages in R Studio (Nov 20, 2012 - 2.52 min), using the package name tidyverse (all lowercase) in place of the one shown in the video.
To install in R, follow the demonstration in the video How to Install Packages in R (Aug 9, 2013 - 6:24min).
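The console method described in this section can also be wrapped so that a package is downloaded only when it is not already present. The following is a minimal sketch, not a required step: missing_packages() and install_if_missing() are hypothetical helper names introduced here for illustration, while requireNamespace() and install.packages() are standard R functions.

```r
# Return the names of packages that are not yet installed.
# requireNamespace() checks for a package without attaching it.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the packages that are missing.
# Installation requires an internet connection.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Typing install_if_missing("tidyverse") in the R console would then install the suite only if needed, after which library(tidyverse) loads it for the current session.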
1.3 Datasets: NHANES
Exercises in this book will use data from the National Health and Nutrition Examination Survey (NHANES), a survey research program conducted by the National Center for Health Statistics (NCHS) to assess the health and nutritional status of adults and children in the United States, and to track changes over time.
An article in FAQS.org (Beals (2008)) details the history of NHANES and how the collected data is used.
1.3.1 NHANES 2015-2016
NHANES data is collected in datasets, and we’ll use the datasets from the two-year collection cycle 2015-2016.
IMPORTANT NOTE:
NHANES datasets are complex. In some cases the data cannot be used as is and requires careful consideration before any conclusion is reached, and attention should be given to the existence of sub-groups. In other cases, comparisons need to include the sub-group weights that are provided within the dataset.
See chapter 12.
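To give a rough idea of why weights matter, here is a minimal sketch that uses made-up numbers rather than NHANES data. The column names value and weight are hypothetical, and the example only illustrates the general idea of weighting; it is not a substitute for the methods referred to in chapter 12.

```r
# Hypothetical toy data: three participants, each standing in for a
# different number of people in the population (the weight)
toy <- data.frame(
  value  = c(10, 20, 30),
  weight = c(1000, 3000, 6000)
)

# The unweighted mean treats every participant equally
unweighted <- mean(toy$value)

# The weighted mean counts each participant in proportion to the weight
weighted <- weighted.mean(toy$value, w = toy$weight)

print(c(unweighted = unweighted, weighted = weighted))
```

In this toy case the two averages differ because the participant with the largest value also carries the largest weight.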
NHANES file names:
The NHANES data files have succinct names, for example DEMO for demographics, with an appended suffix that is specific to the series. For example, the 2015-2016 series has the suffix _I, so the actual file has the root name DEMO_I, while the demographics files for other series are named differently. For example, the 2017-2018 series has the suffix _J and the very first series, in 1999-2000, had the suffix _A. The pattern is therefore to go to the next letter each time a new series is published.
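The letter pattern described above can be sketched in a few lines of R. The helpers nhanes_suffix() and nhanes_file_root() are hypothetical names introduced here and are not part of any package. The sketch encodes only the pattern as stated in this section, so actual file names should still be confirmed on the NHANES website.

```r
# Suffix for an NHANES series, following the letter pattern described
# in the text: 1999-2000 -> "_A", 2001-2002 -> "_B", and so on.
# start_year is the first year of the two-year series.
nhanes_suffix <- function(start_year) {
  stopifnot(start_year >= 1999, (start_year - 1999) %% 2 == 0)
  index <- (start_year - 1999) / 2 + 1   # position of the letter in the alphabet
  paste0("_", LETTERS[index])
}

# Build a file root name from a base name and a series start year
nhanes_file_root <- function(base, start_year) {
  paste0(base, nhanes_suffix(start_year))
}

print(nhanes_file_root("DEMO", 2015))   # "DEMO_I"
print(nhanes_file_root("DEMO", 2017))   # "DEMO_J"
```

The start year must be odd and no earlier than 1999, since each series in this scheme covers two years beginning in 1999.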
Video: How the data are collected, from the participant’s point of view: NHANES Participants (English), 2:22min
See also: The Latest Data Release and Reports from the National Health and Nutrition Examination Survey (May 21, 2020 - 57:29 min)