flowchart TB
subgraph Manual["❌ Manual Way"]
direction TB
M1["Run R analysis"] --> M2["Copy to Excel/Word"]
M2 --> M3["Format manually"]
M3 --> M4["Paste into paper"]
M4 -.->|"Data changes?"| M1
M5["🔄 Repeat everything"]
end
subgraph Code["✅ Code Way"]
direction TB
C1["Write code once"] --> C2["Tables & figures<br/>embedded in document"]
C2 --> C3["Render to paper"]
C3 -.->|"Data changes?"| C4["Just re-render"]
C4 --> C3
end
Part 5: Tables & Figures
Code-generated output that updates automatically
Learning Goals
By the end of this section, you will:
- Create publication-ready tables with gtsummary
- Create figures with ggplot2
- Use tidy() to extract and compare model coefficients
- Run subgroup analysis using nest() + map() (Part 4 skills!)
- Experience the power of Git (the “surprise” exercise!)
Why Code-Generated Tables?
The Code Way: Write code once → Embedded results update automatically → No copy-paste errors
The Manual Way: Run analysis → Copy to Word → Format manually → Data changes → repeat everything
Setup
If you’re continuing from Part 4, data_clean is already loaded. If not, add this setup chunk:
analysis.qmd
#| label: setup
#| message: false
library(tidyverse)
library(gtsummary)
library(gt)
library(here)
# Load cleaned data from Part 3
data_clean <- readRDS(here("results", "data_clean.rds"))

Hands-on: Demographics Table
Step 1: Create a basic table
Add this code chunk to your analysis.qmd:
analysis.qmd
#| label: tbl-demographics
#| tbl-cap: "Participant characteristics by education level"
data_clean |>
select(Education, Age, Gender, Race1, BPSysAve) |>
tbl_summary(by = Education)

Step 2: Customize the table
analysis.qmd
#| label: tbl-demographics-custom
#| tbl-cap: "Participant characteristics by education level"
data_clean |>
mutate(Gender = str_to_title(Gender)) |>
select(Education, Age, Gender, Race1, BPSysAve) |>
tbl_summary(
by = Education,
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
label = list(
Age ~ "Age (years)",
Gender ~ "Sex",
Race1 ~ "Race/Ethnicity",
BPSysAve ~ "Systolic BP (mmHg)"
),
missing = "no"
) |>
add_overall() |>
bold_labels()

Tip: Ask AI “How do I add confidence intervals to gtsummary?” for customization help.
Hands-on: Create a Figure
Step 1: Box plot by education
analysis.qmd
#| label: fig-bp-education
#| fig-cap: "Distribution of systolic blood pressure by education level"
#| fig-width: 7
#| fig-height: 5
data_clean |>
ggplot(aes(x = Education, y = BPSysAve, fill = Education)) +
geom_boxplot(alpha = 0.7) +
stat_summary(
fun = mean,
geom = "point",
shape = 18,
size = 3,
color = "darkred"
) +
labs(
x = "Education Level",
y = "Systolic Blood Pressure (mmHg)",
title = "Blood Pressure Distribution by Education Level"
) +
theme_minimal(base_size = 12) +
theme(
legend.position = "none",
axis.text.x = element_text(angle = 45, hjust = 1)
)

Step 2: Reference in text
In your text, you can write:
As shown in @tbl-demographics, participants varied by education level.
Blood pressure distribution is visualized in @fig-bp-education.

Hands-on: Regression Table
In epidemiology, we almost always report regression results. Let’s create a publication-ready regression table.
Step 1: Fit a linear regression model
analysis.qmd
#| label: fit-model
# Model: Blood pressure ~ Education + covariates
model_fit <- lm(BPSysAve ~ Education + Gender + Race1, data = data_clean)
# View the results
summary(model_fit)

This model estimates the association between education level and blood pressure, adjusting for sex and race/ethnicity.
Step 2: Create a publication-ready table
analysis.qmd
#| label: tbl-regression
#| tbl-cap: "Association between education level and systolic blood pressure"
model_fit |>
tbl_regression(
label = list(
Education ~ "Education level",
Gender ~ "Sex",
Race1 ~ "Race/ethnicity"
)
) |>
bold_labels()

What gtsummary does: Converts your model into a formatted table with coefficients, confidence intervals, and p-values — no manual formatting needed!
Step 3: Reference in text
Results of the linear regression are shown in @tbl-regression.

Extracting Results with tidy()
gtsummary is great for publication-ready tables, but sometimes you need more control:
- Extract specific coefficients for inline text
- Compare estimates across multiple models
- Create custom tables with specific formatting
The broom package provides tidy() to convert model output into a plain data frame.
Step 1: Convert model to data frame
analysis.qmd
#| label: tidy-basics
library(broom)
# tidy() converts model output to a data frame
model_fit |> tidy(conf.int = TRUE)

Now you have a tibble with columns: term, estimate, std.error, statistic, p.value, conf.low, conf.high.
Step 2: Extract specific coefficients
analysis.qmd
#| label: tidy-extract
# Extract just the College Grad coefficient
model_fit |>
tidy(conf.int = TRUE) |>
filter(term == "EducationCollege Grad") |>
select(term, estimate, conf.low, conf.high, p.value)

Why this matters: You can now use this value in inline code, combine it with other models, or format it however you want.
Comparing Models with tidy()
A common epidemiology task: comparing crude and adjusted estimates to assess confounding.
Step 1: Fit two models
analysis.qmd
#| label: fit-two-models
# Model 1: Without age adjustment
model_crude <- lm(BPSysAve ~ Education + Gender + Race1, data = data_clean)
# Model 2: With age adjustment
model_adjusted <- lm(BPSysAve ~ Education + Gender + Race1 + Age, data = data_clean)

Step 2: Extract and compare coefficients
analysis.qmd
#| label: compare-models
# Extract College Grad coefficient from both models
crude_result <- model_crude |>
tidy(conf.int = TRUE) |>
filter(term == "EducationCollege Grad") |>
mutate(model = "Crude (no age)")
adjusted_result <- model_adjusted |>
tidy(conf.int = TRUE) |>
filter(term == "EducationCollege Grad") |>
mutate(model = "Age-adjusted")
# Combine into comparison table
comparison <- bind_rows(crude_result, adjusted_result) |>
select(model, estimate, conf.low, conf.high, p.value)
comparison

Interpretation: If the coefficient changes substantially between models, age is a confounder of the education-BP relationship.
Step 3: Visualize the comparison
analysis.qmd
#| label: fig-comparison
#| fig-cap: "Comparison of crude and age-adjusted estimates for College Grad vs 8th Grade"
#| fig-width: 6
#| fig-height: 3
comparison |>
ggplot(aes(x = estimate, y = model)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
geom_point(size = 3, color = "steelblue") +
labs(
x = "Difference in systolic BP (mmHg)\nCollege Grad vs. 8th Grade",
y = NULL
) +
theme_minimal(base_size = 12)

Step 4: Create a publication-ready table
analysis.qmd
#| label: tbl-comparison
#| tbl-cap: "Comparison of crude and age-adjusted estimates for education-BP association"
comparison |>
gt() |>
fmt_number(columns = c(estimate, conf.low, conf.high), decimals = 1) |>
fmt_number(columns = p.value, decimals = 3) |>
cols_merge(
columns = c(conf.low, conf.high),
pattern = "({1}, {2})"
) |>
cols_label(
model = "Model",
estimate = "β",
conf.low = "95% CI",
p.value = "P-value"
)

You can output your comparison as a figure (Step 3) or as a formatted table — choose whichever fits your needs.
Subgroup Analysis: Combining Part 4 Skills
Now let’s put Part 4 skills to work. We’ll create a publication-ready table showing results for Overall and by Gender, with both Crude and Age-adjusted models as columns.
This combines everything you’ve learned:

- Part 4: nest() + map() for iterating over subgroups
- New skill: pivot_wider() for reshaping results into a matrix layout
Expected Output
| Subgroup | N | Crude β (95% CI) | Age-adjusted β (95% CI) |
|---|---|---|---|
| Overall | 4,621 | -7.2 (-9.1, -5.3) | -3.1 (-5.0, -1.2) |
| Male | 2,198 | -6.8 (-9.5, -4.1) | -2.5 (-5.2, 0.2) |
| Female | 2,423 | -7.5 (-10.1, -4.9) | -3.6 (-6.2, -1.0) |
Step 1: Define a function that fits BOTH models
This function returns results from both Crude and Age-adjusted models:
analysis.qmd
#| label: define-fit-models
# Function to fit Crude + Adjusted models and return both results
fit_models <- function(data) {
# Skip if sample size is too small
if (nrow(data) < 30) {
return(tibble(
model = c("Crude", "Adjusted"),
estimate = NA_real_,
conf.low = NA_real_,
conf.high = NA_real_,
n = nrow(data)
))
}
# Crude model (no age adjustment)
crude <- lm(BPSysAve ~ Education, data = data) |>
tidy(conf.int = TRUE) |>
filter(term == "EducationCollege Grad") |>
mutate(model = "Crude")
# Age-adjusted model
adjusted <- lm(BPSysAve ~ Education + Age, data = data) |>
tidy(conf.int = TRUE) |>
filter(term == "EducationCollege Grad") |>
mutate(model = "Adjusted")
# Return both results with sample size
bind_rows(crude, adjusted) |>
mutate(n = nrow(data))
}

Step 2: Run for Overall + Gender subgroups
analysis.qmd
#| label: run-subgroup-analysis
# Overall results
overall_results <- data_clean |>
fit_models() |>
mutate(subgroup = "Overall")
# Results by Gender using nest() + map()
gender_results <- data_clean |>
group_by(Gender) |>
nest() |>
mutate(results = map(data, fit_models)) |>
unnest(results) |>
mutate(subgroup = str_to_title(Gender)) |>
select(-data, -Gender)
# Combine all results
all_results <- bind_rows(overall_results, gender_results)
all_results

What happened:

- fit_models() ran on the full dataset → 2 rows (Crude + Adjusted)
- nest() + map() ran fit_models() on Male and Female → 4 rows
- bind_rows() combined everything → 6 rows total (3 subgroups × 2 models)
Step 3: Reshape with pivot_wider()
Now we transform from long (one row per model) to wide (one column per model):
analysis.qmd
#| label: pivot-results
table_wide <- all_results |>
mutate(
# Format estimate with 95% CI
estimate_ci = sprintf("%.1f (%.1f, %.1f)", estimate, conf.low, conf.high)
) |>
select(subgroup, n, model, estimate_ci) |>
distinct() |> # Remove any duplicates
pivot_wider(
names_from = model,
values_from = estimate_ci
)
table_wide

Step 4: Create publication-ready table with gt()
analysis.qmd
#| label: tbl-subgroup
#| tbl-cap: "Association between college education and systolic blood pressure: Overall and by sex"
table_wide |>
# Order: Overall first, then alphabetical
mutate(subgroup = factor(subgroup, levels = c("Overall", "Female", "Male"))) |>
arrange(subgroup) |>
gt() |>
tab_spanner(
label = "Crude",
columns = Crude,
id = "spanner_crude"
) |>
tab_spanner(
label = "Age-adjusted",
columns = Adjusted,
id = "spanner_adjusted"
) |>
cols_label(
subgroup = "",
n = "N",
Crude = "β (95% CI)",
Adjusted = "β (95% CI)"
) |>
cols_align(align = "center", columns = c(n, Crude, Adjusted)) |>
tab_footnote(
footnote = "Reference: ≤8th Grade education"
)

Result: A clean matrix table showing crude vs. adjusted estimates across subgroups — exactly what journals expect!
Step 5: Visualize as a forest plot
analysis.qmd
#| label: fig-subgroup
#| fig-cap: "Association between college education and systolic blood pressure by subgroup"
#| fig-width: 7
#| fig-height: 4
all_results |>
filter(!is.na(estimate)) |>
mutate(
subgroup = factor(subgroup, levels = c("Male", "Female", "Overall")),
model = factor(model, levels = c("Crude", "Adjusted"))
) |>
ggplot(aes(x = estimate, y = subgroup, color = model, shape = model)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
geom_errorbarh(
aes(xmin = conf.low, xmax = conf.high),
height = 0.2,
position = position_dodge(width = 0.4)
) +
geom_point(size = 3, position = position_dodge(width = 0.4)) +
scale_color_manual(values = c("Crude" = "steelblue", "Adjusted" = "darkorange")) +
labs(
x = "Difference in systolic BP (mmHg)\nCollege Grad vs. ≤8th Grade",
y = NULL,
color = "Model",
shape = "Model"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom")

| Part 4 Skill | Part 5 Application |
|---|---|
| Write a function | fit_models() returns both Crude + Adjusted results |
| nest() + map() | Apply to Gender subgroups automatically |
| bind_rows() | Combine Overall + subgroup results |
| New: pivot_wider() | Reshape for matrix table layout |
The payoff: 6 regression models (3 subgroups × 2 model types), reshaped and formatted, in ~30 lines of code.
Save Your Results
Before the surprise exercise, save your results for use in the manuscript later:
Save the model and comparison
analysis.qmd
#| label: save-results
# Save the age-adjusted model (for Part 6)
saveRDS(model_adjusted, here("results", "model_fit.rds"))
# Save the comparison table (for Part 6)
saveRDS(comparison, here("results", "model_comparison.rds"))

Why save these? In Part 6, you’ll load them in your manuscript to extract specific statistics using inline R code.
Commit your progress
- Go to Source Control panel
- Stage analysis.qmd
- Write a commit message describing your changes
- Click Commit
Self-Study: Git Restore Experience
This section is for self-paced learning after the workshop. It demonstrates Git’s powerful “restore” feature through a realistic scenario. Try it when you have 15-20 minutes to practice.
Everything looks great! We created tables, figures, a model comparison, and committed to Git. Now let’s experience why Git is your research safety net.
Plot Twist #1
“The crude vs adjusted comparison is nice, but I think showing both estimates might confuse reviewers. Let’s just keep the age-adjusted model. Can you simplify the analysis?”
No problem! Let’s simplify by removing the crude model and comparison.
Make the changes
- Open analysis.qmd
- Delete the model comparison code (model_crude, crude_result, comparison, and the comparison figure)
- Keep only the age-adjusted model:
analysis.qmd
#| label: fit-model
# Final model: Age-adjusted
model_fit <- lm(BPSysAve ~ Education + Gender + Race1 + Age, data = data_clean)

- Update the save code to only save model_fit
- Re-render and verify results
- Commit your changes with a descriptive message
Plot Twist #2
“Reviewer 2 is asking to see the crude estimates to assess confounding. Can we add back the comparison table we had before? Sorry!”
Sound familiar?
Without Git…
- “Wait, what exactly did I delete?”
- “I removed like 20 lines of code…”
- “I don’t remember the exact tidy() and bind_rows() syntax…”
With Git: Check the Diff
- Open Source Control panel
- Click on your recent commit
- See the diff — exactly which lines were deleted (shown in red)
The comparison code you carefully wrote is all there in the diff!
With Git: Restore the Previous Version
Now let’s use Git to actually restore the file to its previous state—no manual editing required!
Hands-on: Use git restore
Open the terminal in Positron (Terminal → New Terminal).
Terminal commands like git need to run inside your project folder.
Good news: When you open Terminal in Positron, it automatically starts in your project directory — no extra steps needed!
If using an external terminal (e.g., macOS Terminal, Windows PowerShell), first navigate to your project:
cd /path/to/your/project

Now run:
# First, see your commit history
git log --oneline

You’ll see something like:
abc1234 (your commit about simplifying the model)
def5678 (your commit about adding tables/figures)
ghi9012 Initial commit
Now restore the file to the previous commit:
# Restore analysis.qmd to the state before the last commit
git restore --source=HEAD~1 analysis.qmd

- HEAD~1 means “one commit before the current one”
- You can also use the commit hash: git restore --source=def5678 analysis.qmd
Git replaced your file with the exact contents from that commit. All your comparison code is back — no retyping needed!
Verify and commit
- Open analysis.qmd — confirm the comparison code is restored
- Re-render to verify the comparison table and figure appear
- Stage and commit with a descriptive message
- Run git log --oneline again — you now have 3 commits showing the full journey!
Other Ways to Undo Changes
git restore (what we just did):
- Restores specific file(s) to a previous state
- HEAD stays where it is
- Best when you know which file to restore
git checkout <commit> + new branch:
- Moves your entire project to a past state
- Create a new branch from there to continue work
- Best when you’re not sure which files changed, or want to explore an old state
git revert <commit>:
- Creates a new commit that undoes a previous commit
- Preserves full history (good for shared repositories)
- Best when you’ve already pushed to GitHub
For this workshop, git restore is the simplest approach!
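If you want to see `git revert` in action without touching your project, here is a self-contained sketch you can run in a throwaway repository (the file contents and commit messages are purely illustrative; `git init -b` requires Git 2.28+):

```shell
# Self-contained demo of `git revert` in a throwaway repo -- safe to run anywhere
cd "$(mktemp -d)"
git init -q -b main .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "model comparison code" > analysis.qmd
git add analysis.qmd
git commit -q -m "Add comparison"

echo "age-adjusted model only" > analysis.qmd   # "simplify" the analysis
git add analysis.qmd
git commit -q -m "Simplify: drop comparison"

# Undo the simplification with a NEW commit -- history stays intact
git revert --no-edit HEAD

cat analysis.qmd        # the comparison code is back
git log --oneline       # three commits: add, simplify, revert
```

Unlike `git restore`, the revert leaves both the deletion and its undo visible in the history, which is why it is the safe choice once a commit has been pushed.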
GUI option: Discard uncommitted changes
If you haven’t committed yet and want to undo changes:
- Open Source Control panel
- Find the modified file
- Right-click → Discard Changes
This only works for changes that haven’t been committed yet.
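The command-line equivalent of Discard Changes is `git restore <file>` with no `--source`, which resets the file to the last commit. A minimal sketch, again in a throwaway repo with illustrative contents:

```shell
# CLI equivalent of "Discard Changes" for uncommitted edits
cd "$(mktemp -d)"
git init -q -b main .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "good version" > analysis.qmd
git add analysis.qmd
git commit -q -m "Initial commit"

echo "accidental edit" >> analysis.qmd   # uncommitted change
git restore analysis.qmd                 # discard it, back to the last commit
cat analysis.qmd                         # shows only "good version"
```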
Take a moment to think:
- Could you rewrite the tidy() + bind_rows() comparison code from memory?
- How long would it take to manually recreate the deleted code?
- What if you had made similar deletions across 5 files?
Git lets you experiment fearlessly because every version is saved.
The Real Lesson: Git as Your Research Safety Net
The advisor feedback loop you just experienced is real:
- “Add this analysis” → “Remove it” → “Actually, put it back”
- Reviewer requests months later for code you deleted
Without Git: Hope you commented it out, or rewrite from memory. With Git: One command restores any previous version.
Beyond Restore: Branches for Experimentation
What if you want to try a different approach without risking your working code? Git branches let you experiment in a parallel version of your project.
gitGraph
commit id: "Initial analysis"
commit id: "Add demographics table"
commit id: "Age-adjusted model"
branch exploratory-analysis
checkout exploratory-analysis
commit id: "Try alternative model"
commit id: "Compare specifications"
checkout main
commit id: "Continue main work"
merge exploratory-analysis id: "Merge results"
Think of main as your safe version. A branch is where you experiment — if it works, merge it back; if not, delete it.
Branches require more Git knowledge than we can cover here. The goal is awareness — knowing this exists changes how you think about experimentation.
| Scenario | Why a Branch Helps |
|---|---|
| Exploratory analysis | Experiment without affecting main results |
| Different statistical approach | Compare methods side-by-side |
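The branch-and-merge flow in the diagram above can be sketched with a handful of commands, here in a throwaway repo with empty commits standing in for real work (branch names are illustrative; `git switch` needs Git 2.23+ and `git init -b` needs 2.28+):

```shell
# Branch workflow sketch: experiment on a branch, merge back if it works
cd "$(mktemp -d)"
git init -q -b main .
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "Age-adjusted model"

git switch -q -c exploratory-analysis            # create and move to a branch
git commit -q --allow-empty -m "Try alternative model"

git switch -q main                               # back to the safe version
git merge -q --no-edit exploratory-analysis      # keep the experiment
git log --oneline
```

If the experiment had failed instead, `git branch -D exploratory-analysis` would throw it away with main untouched.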
- Part 2 introduced the basic Git workflow: edit → stage → commit
- Part 5 (this section) showed why that workflow matters: restoration and safety
- Part 7 will show how Git enables collaboration through Word documents with track changes
- Part 8 will introduce GitHub for sharing and targets for pipeline automation
Going Further: Sensitivity Analysis
The nest() + map() pattern works for any repeated analysis. For example, testing how results change across different thresholds:
analysis.qmd
#| label: sensitivity-teaser
#| eval: false
# Test hypertension at different BP thresholds
thresholds <- c(120, 130, 140, 150)
map(thresholds, \(t) {
data_clean |>
mutate(hypertension = BPSysAve >= t) |>
glm(hypertension ~ Education + Age + Gender, data = _, family = binomial) |>
tidy(exponentiate = TRUE, conf.int = TRUE) |>
filter(term == "EducationCollege Grad") |>
mutate(threshold = t)
}) |>
list_rbind()

The example files (examples/nhanes-manuscript/notebooks/analysis.qmd) show complete sensitivity analyses using this pattern.