2024-06-10

Introduction

In this class we’ll use the software R inside RStudio, a separate application that provides an intuitive graphical user interface.

R is the name of the software itself, but also the name of the programming language that is used within the software.

In this chapter:

  • Software installation.
  • Installing R packages.
  • NHANES datasets.

Introduction - programming

Just as a cooking recipe is a series of tasks for preparing a dish from specific ingredients, a program is simply a list of instructions to be performed; here the ingredients are the data provided in a dataset.

The instructions are written with a programming language, in our case R.
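To make the recipe analogy concrete, here is a minimal sketch of a short R program. It assumes only base R: the "ingredient" is the built-in `airquality` dataset that ships with R, and the "instructions" are two steps applied to its ozone column. The variable names (`ozone`, `n_missing`, `mean_ozone`) are illustrative choices, not anything required by R.

```r
# A tiny "recipe": the ingredients are data, the instructions are R commands.
# airquality is a dataset included with R (package "datasets").
ozone <- airquality$Ozone                 # ingredient: daily ozone readings (ppb)

n_missing  <- sum(is.na(ozone))           # step 1: count the missing readings
mean_ozone <- mean(ozone, na.rm = TRUE)   # step 2: average the non-missing readings

cat("Missing readings:", n_missing, "\n")
cat("Mean ozone (ppb):", round(mean_ozone, 1), "\n")
```

Each line is one instruction, run from top to bottom, just like the steps of a recipe.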

Introduction video - R - Coding - 4.1

This 5-minute video, R - Coding - 4.1 (https://youtu.be/xp1l7utYFGs), explains how R is useful in data science.

Software installation

Students should install the following two programs on their computer: R and RStudio.

Both R and RStudio have versions for the three main operating systems: Windows, macOS, and Linux.

Once installed, working within the software is the same on all computer platforms.
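After installation, a quick way to confirm that R works is to type a few commands in the console; they behave the same on every platform. This is a minimal sketch; the check for RStudio assumes that RStudio sets the environment variable `RSTUDIO` to `"1"` in the sessions it starts, which is its usual behavior but is not guaranteed by R itself.

```r
# Print the installed R version (e.g. "R version 4.x.y (...)").
cat(R.version.string, "\n")

# "unix" on macOS and Linux, "windows" on Windows.
os_type <- .Platform$OS.type
cat("Operating system type:", os_type, "\n")

# Assumption: RStudio sets RSTUDIO=1 in sessions it launches.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
cat("Running inside RStudio:", in_rstudio, "\n")
```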

Software installation - Task

Software installation on macOS

Installing R packages

Packages are modular additions to the R software that add functionality in the form of new functions, included datasets, documentation, etc.

The standard repository of R packages, the Comprehensive R Archive Network (CRAN), is likely to be the most used source for environmental health work.
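The commands below are a small sketch of how to inspect the package setup from the R console using only base R functions. The package `stats` is used simply because it ships with every R installation; any other package name can be substituted.

```r
# Which repository will install.packages() use? Usually a CRAN mirror URL,
# or the placeholder "@CRAN@" if no mirror has been chosen yet.
repos <- getOption("repos")
print(repos)

# Is a given package already installed?
has_stats <- "stats" %in% rownames(installed.packages())
print(has_stats)

# Which version of that package is installed?
stats_version <- packageVersion("stats")
print(stats_version)

# A few of the functions the package provides (sorted alphabetically).
stats_funs <- head(sort(getNamespaceExports("stats")))
print(stats_funs)
```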

Adding packages is like adding gears for a more powerful engine.

Installing R packages: Tidyverse

One “suite” of packages that we’ll use is called Tidyverse, and it should preferably be installed before classes start.

Although Tidyverse is a suite of multiple packages, it can be installed just like a single package, using that single name.

Adding a package is simple: copy the following command into the R console and press Enter:

install.packages("tidyverse")
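Running `install.packages()` again reinstalls a package even if it is already present. A common pattern is to install only what is missing. The sketch below wraps that pattern in a helper; `install_if_missing()` is a hypothetical name written for this example, not a function provided by R.

```r
# Hypothetical helper: install only the packages that are not yet available.
install_if_missing <- function(pkgs) {
  # requireNamespace() quietly returns FALSE when a package is not installed
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing)  # downloads from the configured repository (usually CRAN)
  }
  invisible(missing)           # returns the names that had to be installed
}

# Usage (uncomment to run; it downloads Tidyverse only if it is absent):
# install_if_missing("tidyverse")
```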

Installing R packages - Tidyverse

Alternatively, use the Packages pane in RStudio to do the installation through the graphical interface. To install in RStudio, follow the video Installing Packages in R Studio (Nov 20, 2012, 2:52 min) and use the package name tidyverse instead.

Datasets: NHANES

Exercises in this book will use data from the National Health and Nutrition Examination Survey (NHANES) [1], a survey research program conducted by the National Center for Health Statistics (NCHS) [2] to assess the health and nutritional status of adults and children in the United States and to track changes over time.

An article on FAQS.org (Beals 2008) details the history of NHANES and how the collected data are used.

NHANES logo.

NHANES 2015-2016

NHANES data are released as datasets, and we’ll use datasets from the two-year collection cycle covering 2015-2016.

IMPORTANT NOTE:

  • NHANES datasets are complex: in some cases the data cannot be used as is and require careful consideration before any conclusion is reached.
  • Attention should be given to the existence of sub-groups.
  • In other cases, comparisons need to include the sub-group weights that are provided within the dataset.

See the chapter on using NHANES weights.
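To see why weights can change a result, here is a minimal sketch of the arithmetic of a weighted mean, $\bar{x}_w = \sum_i w_i x_i / \sum_i w_i$, using base R's `weighted.mean()`. The numbers are hypothetical toy values, not NHANES data, and the weighting scheme is an assumption for illustration only; real NHANES analyses should follow the weighting guidance in the chapter referenced above.

```r
# Hypothetical toy data (NOT real NHANES values).
# Group B is over-sampled: it is half of the sample but a smaller share of the
# population, so each of its participants receives a smaller weight.
value  <- c(10, 12, 11, 20, 22, 21)   # some measurement per participant
group  <- c("A", "A", "A", "B", "B", "B")
weight <- c(3, 3, 3, 1, 1, 1)         # hypothetical sample weights

group_means <- tapply(value, group, mean)        # A = 11, B = 21
unweighted  <- mean(value)                       # every participant counts once: 16
weighted    <- weighted.mean(value, w = weight)  # sum(w * x) / sum(w) = 162 / 12 = 13.5

print(group_means)
cat("Unweighted mean:", unweighted, "\n")
cat("Weighted mean:  ", weighted, "\n")
```

Ignoring the weights here would overstate the population average because the over-sampled group has the higher values.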

NHANES - File Names

NHANES file names:

The NHANES data files have succinct names, for example DEMO for demographics, with an appended suffix that is specific to the series.

Example: the 2015-2016 series has the suffix _I, so its demographics file has the name DEMO_I, while the demographics files for other series have different suffixes.

Example: the 2017-2018 series has suffix _J and the very first series in 1999-2000 had suffix _A.

The pattern is therefore to go to the next letter each time a new series is published.
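The lettering pattern can be turned into a small calculation: each two-year series starting in 1999 moves one letter further along the alphabet. The helper below is a hypothetical sketch, not an official NHANES tool. It assumes every series is a regular two-year cycle starting in an odd year, exactly as in the examples above, and has no special handling for any series that breaks this pattern.

```r
# Hypothetical helper: build an NHANES file name from a root name (e.g. "DEMO")
# and the first year of a two-year series, following the lettering pattern
# 1999-2000 -> A, 2001-2002 -> B, ..., 2015-2016 -> I, 2017-2018 -> J.
nhanes_file_name <- function(root, start_year) {
  stopifnot(start_year >= 1999, (start_year - 1999) %% 2 == 0)
  index <- (start_year - 1999) %/% 2 + 1   # 1 for 1999, 2 for 2001, ...
  paste0(root, "_", LETTERS[index])
}

nhanes_file_name("DEMO", 2015)  # "DEMO_I"
nhanes_file_name("DEMO", 2017)  # "DEMO_J"
```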

NHANES 2015-2016

Datasets: included in R