Chapter 9 Intermission: data wrangling
NHANES datasets are “curated”: they are created following standard practice, which results in tabular data formatted in a way well suited to R.
This section is an “intermission” in the form of a lecture by Garrett Grolemund, Data Scientist and Master Instructor at RStudio, split into four YouTube videos. All four parts are listed here, but the most important for working with NHANES data is Part 3, about the dplyr Tidyverse package. Part 1 reviews what was learned in the previous chapter (8), and Part 2 covers the tidyr package, which helps reformat data: a very useful tool, but not really necessary for NHANES data.
Description of the RStudio videos:
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
Title | Link | Time |
---|---|---|
Part 1: What is data wrangling? Intro, Motivation, Outline, Setup | https://youtu.be/jOd65mR1zfw | 8:26 |
Part 2: Tidy Data and tidyr | https://youtu.be/1ELALQlO-yM | 17:36 |
Part 3: Data manipulation tools: dplyr | https://youtu.be/Zc_ufg4uW4U | 19:34 |
Part 4: Working with Two Datasets: Binds, Set Operations, and Joins | https://youtu.be/AuBgYDCg1Cg | 7:23 |
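
As a rough illustration of what Part 4 covers, the sketch below binds and joins two small data frames with dplyr. The data are made up for illustration; the key column is called SEQN on the assumption that, as in NHANES files, a respondent sequence number identifies each participant, and the other column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy tables sharing a key column (values are invented)
demo <- data.frame(SEQN = c(1, 2, 3), age = c(34, 51, 27))
exam <- data.frame(SEQN = c(2, 3, 4), weight = c(82, 64, 75))

# Join: keep every row of demo, filling weight with NA where exam has no match
left <- left_join(demo, exam, by = "SEQN")

# Join: keep only participants present in both tables
both <- inner_join(demo, exam, by = "SEQN")

# Bind: stack rows of two tables that have the same columns
more_demo <- data.frame(SEQN = 4, age = 45)
stacked <- bind_rows(demo, more_demo)

left
both
stacked
```

Which join to use depends on whether unmatched participants should be kept (left_join) or dropped (inner_join).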
9.1 Part 3 here
The HTML version has Part 3 embedded here:
Pt. 3: Data manipulation tools: dplyr
https://youtu.be/Zc_ufg4uW4U
02:00 - dplyr::select
03:40 - dplyr::filter
05:05 - dplyr::mutate
07:05 - dplyr::summarise
08:30 - dplyr::arrange
09:55 - Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
11:45 - dplyr::group_by
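
The verbs listed above can be chained with the pipe, as in the minimal sketch below. It is not taken from the video: the data frame and its column names are hypothetical toy values, not real NHANES variables.

```r
library(dplyr)

# Hypothetical toy data (invented values; not real NHANES variables)
toy <- data.frame(
  id     = 1:6,
  sex    = c("F", "M", "F", "M", "F", "M"),
  age    = c(25, 40, 17, 63, 35, 52),
  weight = c(60, 85, 55, 90, 70, 80),              # kg
  height = c(1.65, 1.80, 1.60, 1.75, 1.70, 1.78)   # m
)

bmi_by_sex <- toy %>%
  select(sex, age, weight, height) %>%           # keep only the columns needed
  filter(age >= 18) %>%                          # keep rows for adults
  mutate(bmi = weight / height^2) %>%            # add a derived column
  group_by(sex) %>%                              # following verbs work per group
  summarise(n = n(), mean_bmi = mean(bmi)) %>%   # collapse to one row per group
  arrange(desc(mean_bmi))                        # sort by descending mean BMI

bmi_by_sex
```

Each step takes a data frame and returns a data frame, which is what lets the pipe pass the result of one verb straight into the next.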