Chapter 1 Introduction

In this class we’ll use the software R from within RStudio, a separate program that provides an intuitive graphical user interface for R.

R is the name of the software itself, but also the name of the programming language that is used within the software.

In this chapter:

  • Software installation
  • Installing R packages
  • NHANES datasets

Just like a cooking recipe is a series of tasks to prepare a dish starting with specific ingredients, a program is simply a list of instructions to be performed and the ingredients are the data that are provided within a dataset. The instructions are written with a programming language, in our case R.
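As a minimal sketch of the recipe analogy, the few lines of R below start from a tiny "ingredient" dataset and apply three instructions in order. The values and the variable names (heights_cm, heights_m, average_height_m) are made up for illustration and do not come from any course dataset.

```r
# Ingredients: a small, made-up dataset (hypothetical values).
heights_cm <- c(162, 175, 168, 181)

# Instructions: each line is one step, executed from top to bottom.
heights_m <- heights_cm / 100          # step 1: convert centimeters to meters
average_height_m <- mean(heights_m)    # step 2: compute the average
print(average_height_m)                # step 3: show the result
```

Changing the ingredients (the numbers) changes the result, while the instructions stay the same.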

This 5-minute video, R - Coding - 4.13, explains how R is useful in data science.

1.1 Software installation

Students should install two software programs on their computer: R and RStudio. Both have versions for the three main types of computers (Windows, macOS, and Linux). Once installed, the software works the same way on all computer platforms.

TASK: Install the software on your computer.

Choose the version for your computer and follow installation instructions.

The installation process is rather intuitive. If you need more guidance the following step-by-step videos would be useful:

1.2 Installing R packages

Packages are modular additions to the R software that add functionality in the form of new functions, datasets, documentation, etc. The standard repository of R packages, the “Comprehensive R Archive Network” (CRAN), will likely be the most used for environmental health.

Figure 1.1: Adding packages is like adding gears for a more powerful engine.

We’ll use a “suite” of packages called Tidyverse, which should preferably be installed before classes start. Although Tidyverse comprises multiple packages, it can be installed just like a single package, using the single name tidyverse (in lowercase).

The method to add a package is rather simple:

  • Copy the following command into the R console and press Enter: install.packages("tidyverse")
  • Alternatively, use the Packages pane in RStudio to do the installation with the graphical interface
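The console method can also be written so that the package is downloaded only when it is not already present. The sketch below is one possible way to do this; install_if_missing is a hypothetical helper name chosen for illustration, while requireNamespace(), install.packages(), and library() are standard R functions. The interactive() check is an added assumption that keeps the download from running in automated scripts.

```r
# Install a package from CRAN only if it is not already available.
# `install_if_missing` is a hypothetical helper name used for illustration.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is now available.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Run the installation only in an interactive session such as the RStudio console.
if (interactive()) {
  install_if_missing("tidyverse")  # downloads from CRAN the first time only
  library(tidyverse)               # load the suite for the current session
}
```

Installation is needed only once per computer, whereas library(tidyverse) has to be run in each new R session that uses the suite.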

See also section 3 to get oriented in RStudio.

To install in RStudio, follow the video Installing Packages in R Studio (Nov 20, 2012 - 2.52 min) and use the package name tidyverse instead.

To install in R, follow the demonstration in the video How to Install Packages in R (Aug 9, 2013 - 6:24 min).

1.3 Datasets: NHANES

Exercises in this book will use data from the National Health and Nutrition Examination Survey (NHANES), a survey research program conducted by the National Center for Health Statistics (NCHS) to assess the health and nutritional status of adults and children in the United States, and to track changes over time.

An article on Faqs.org (Beals 2008) details the history of NHANES and how the collected data are used.

Figure 1.2: NHANES logo.

1.3.1 NHANES 2015-2016

NHANES data are released as datasets covering two-year collection cycles, and we’ll use datasets from the 2015-2016 cycle.

IMPORTANT NOTE:

NHANES datasets are complex. In some cases the data cannot be used as is and require careful consideration before any conclusion is reached. Attention should be given to the existence of sub-groups, and some comparisons need to include the sub-group weights that are provided within the dataset.

See chapter 12.
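To illustrate why weights matter, the sketch below compares a plain average with a weighted one using base R’s mean() and weighted.mean(). The three rows are hypothetical toy values, not NHANES records, and the column names systolic_bp and sample_weight are made up for this example. A real analysis would also have to account for the survey design, which this sketch ignores.

```r
# Hypothetical toy data: three participants, each with a measurement and a
# weight giving the number of people in the population that they represent.
toy <- data.frame(
  systolic_bp   = c(110, 120, 140),
  sample_weight = c(5000, 3000, 2000)
)

# Plain average: every participant counts equally.
unweighted_mean <- mean(toy$systolic_bp)

# Weighted average: participants who represent more people count more.
weighted_mean <- weighted.mean(toy$systolic_bp, w = toy$sample_weight)

print(unweighted_mean)
print(weighted_mean)
```

The two averages differ because the participant with the lowest value carries the largest weight, so ignoring the weights would overstate the population average in this toy case.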

NHANES file names:

The NHANES data files have succinct names, for example DEMO for demographics, with an appended suffix that is specific to the series. For example, files in the 2015-2016 series have the suffix _I, so the demographics file has the root name DEMO_I, while the demographics files for other series have different suffixes: the 2017-2018 series has suffix _J. The very first series, 1999-2000, corresponds to the letter A, although its files were published without a suffix. The pattern is therefore to go to the next letter each time a new series is published.
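The letter pattern can be sketched as a small helper that converts the first year of a two-year series into its letter. The function names nhanes_suffix and nhanes_file_name are hypothetical and exist only for this illustration. The helper reproduces the letter pattern described above and nothing more, so actual file names should still be checked on the NHANES website.

```r
# Letter suffix for a two-year NHANES series, given its first year.
# Assumes series start in odd years from 1999 onward: 1999 -> A, 2001 -> B, ...
nhanes_suffix <- function(start_year) {
  stopifnot(start_year >= 1999, (start_year - 1999) %% 2 == 0)
  index <- (start_year - 1999) / 2 + 1   # position of the letter in the alphabet
  stopifnot(index <= length(LETTERS))
  paste0("_", LETTERS[index])
}

# Root file name: succinct name followed by the series suffix.
nhanes_file_name <- function(root, start_year) {
  paste0(root, nhanes_suffix(start_year))
}

nhanes_file_name("DEMO", 2015)   # the 2015-2016 demographics file
nhanes_file_name("DEMO", 2017)   # the 2017-2018 demographics file
```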

Video: How the data are collected from the participant’s point of view, NHANES Participants (English), 2:22 min

See also: The Latest Data Release and Reports from the National Health and Nutrition Examination Survey, May 21, 2020 - 57:29 min

1.4 Datasets: included in R

A number of small datasets are included with R during installation. We might make use of one or more.

There is no further installation required to access the included datasets.
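As a quick sketch of how these datasets can be reached, the lines below list the contents of the datasets package that ships with R and then look at one of them. The choice of mtcars is only an example and is not necessarily a dataset used later in the book.

```r
# Names of the datasets in the "datasets" package that is installed with R.
included <- data(package = "datasets")$results[, "Item"]
head(included)

# mtcars is one of them (chosen here only as an example).
head(mtcars)   # first six rows
str(mtcars)    # column names and types
```

Typing the name of an included dataset, such as mtcars, is enough to use it. No download or import step is required.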

References

Beals, Katherine A. 2008. “Nutrition and Well-Being A to Z.” Faqs.org. http://www.faqs.org/nutrition/Met-Obe/National-Health-and-Nutrition-Examination-Survey-NHANES.html.