Tabular data analysis with R and Tidyverse: Environmental Health
Last updated: 07 July, 2022
Preamble
This course book is based on a tutorial course for the 2020 “Summer Research Opportunities Program” (SROP) for “Underrepresented Racial Minority” (URM) at the University of Wisconsin-Madison (Vice Provost (2013), and Archived).
The main objective of this course is to learn how to analyze tabular datasets of environmental health data using the software R within the RStudio interface.
This course is also a preparation for reproducible research, using dynamic documents to analyze environmental health data from the “National Health and Nutrition Examination Survey” (NHANES) datasets in the “Centers for Disease Control and Prevention” (CDC) “National Center for Health Statistics” (NCHS) repository. This type of large tabular data is typical of the field and will provide a number of useful examples.
A special distinction between “classic R” and “Tidyverse” nomenclature will be highlighted.
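As a rough illustration of that distinction, the sketch below performs the same row and column selection twice: once with classic (base) R bracket indexing and once with Tidyverse verbs from the dplyr package. The data frame `df` and its values are invented for this example, and the code assumes the dplyr package is already installed.

```r
# Assumes dplyr is installed; `df` and its values are made up for illustration
library(dplyr)

df <- data.frame(id = 1:4, age = c(25, 61, 47, 70))

# Classic R: logical indexing on rows, column names in brackets
older_base <- df[df$age > 50, c("id", "age")]

# Tidyverse: one verb per step, chained with the pipe operator %>%
older_tidy <- df %>%
  filter(age > 50) %>%
  select(id, age)

print(older_base)
print(older_tidy)
```

Both versions keep the same participants. The Tidyverse form reads as a sequence of named steps, while the classic form packs the logic into a single indexing expression.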
This course book is available online in two formats at the link shown below as a shortened URL:
HTML is the primary format for easier Copy/Paste interaction. PDF is easier to print or download and contains a useful Index.
“Environmental Health is the field of science that studies how the environment influences human health and disease.”
National Institute of Environmental Health Sciences (NIEHS)
Data and observations are usually collected in the form of numbers and gathered into tables representing the data in columns and rows.
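A minimal sketch of such a table in R, using a base R data frame. The site names and measurement values are hypothetical and serve only to show how observations sit in rows and variables in columns.

```r
# Hypothetical measurements, invented for illustration only
air <- data.frame(
  site = c("A", "B", "C"),     # one row per observation
  pm25 = c(8.1, 12.4, 9.7)     # one column per variable
)

print(air)
print(nrow(air))    # number of rows (observations)
print(ncol(air))    # number of columns (variables)
print(names(air))   # column names
```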
Learning goals
During this course we’ll acquire new skills:
- Use the R and RStudio software with additional packages
- Understand programming concepts such as variables, conditional statements, data stream, and pipelines
- Examine, compare and contrast data
- Illustrate analyses with graphics and plots
- Compose reproducible reports that can be automated
At the end of the course you’ll have acquired sufficient proficiency and independence to use the software R within the RStudio graphical interface to analyze complex environmental datasets in tabular form and create useful and reproducible reports with annotated graphics.
Software used during this tutorial
- R, from The Comprehensive R Archive Network at cran.r-project.org
- RStudio, from rstudio.com
We’ll also install additional “modules” within R called “packages” to add functionality and make analysis easier.
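Installing a package could look like the sketch below. The helper names `missing_packages` and `install_if_missing` are hypothetical, and the choice of the tidyverse package as the one to install is an assumption based on the course title. Only base R functions (`installed.packages()`, `install.packages()`) are used.

```r
# Return the names in `pkgs` that are not yet installed on this computer
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing; returns the names it attempted to install
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)  # needs an internet connection
  }
  invisible(to_install)
}

# Run once per computer, then load the package in each new R session:
# install_if_missing("tidyverse")
# library(tidyverse)
```

Installation happens once, while `library()` must be called again in every new session that uses the package.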
References
Biochemistry Dept., jsgro@wisc.edu
Population Health Sciences, kmalecki@wisc.edu