Reproducible Analysis and Reporting for QBA
Koichiro Shiba, Department of Epidemiology, Boston University School of Public Health
2025-01-02

Overview

This document demonstrates principles of reproducible and efficient data analysis, including using functions, automating repeated analyses via purrr, generating tables and figures with scripts, and using Quarto to produce reproducible reports. We will focus on summary-level exposure misclassification adjustment as a motivating example.
Performing PBA with a single set of bias parameters
PBA is done via Monte Carlo simulation, where estimates are corrected using bias parameters drawn from probability distributions. If you know the underlying theory, you could implement this from scratch; however, it would require hundreds of lines of code every time you perform PBA.
Fortunately, you don't have to: you will use the pba.summary.exp.rr function that has already been created and made available for this course. This function takes the following inputs.
Observed data
\(a\) = number of exposed cases
\(b\) = number of unexposed cases
\(c\) = number of exposed controls
\(d\) = number of unexposed controls
Bias parameters for sensitivity (beta distribution)
\(se1.a\) = alpha parameter for sens among cases
\(se1.b\) = beta parameter for sens among cases
\(se0.a\) = alpha parameter for sens among controls
\(se0.b\) = beta parameter for sens among controls
Bias parameters for specificity (trapezoidal)
Case Specificity
\(sp1.min\) = lower bound of trapezoidal dist’n
\(sp1.mod1\) = first mode of trapezoidal
\(sp1.mod2\) = second mode of trapezoidal
\(sp1.max\) = upper bound of trapezoidal dist’n
Control Specificity
\(sp0.min\) = lower bound of trapezoidal dist’n
\(sp0.mod1\) = first mode of trapezoidal
\(sp0.mod2\) = second mode of trapezoidal
\(sp0.max\) = upper bound of trapezoidal dist’n
\(SIMS\) = number of simulations
The source code for the pba.summary.exp.rr function is embedded below.
Code
# Load packages
library(MASS)
library(trapezoid)
library(tidyverse)
library(sandwich)
select <- dplyr::select # overwrite MASS::select with dplyr::select

# Macro for summary-level exposure misclassification bias adjustment
# Inputs for this macro are as follows:
# Data
#   a = number of exposed cases
#   b = number of unexposed cases
#   c = number of exposed controls
#   d = number of unexposed controls
# Bias parameters for sensitivity (beta distribution)
#   se1.a = alpha parameter for sens among cases
#   se1.b = beta parameter for sens among cases
#   se0.a = alpha parameter for sens among controls
#   se0.b = beta parameter for sens among controls
# Bias parameters for specificity (trapezoidal)
#   Case specificity
#     sp1.min = lower bound of trapezoidal dist'n
#     sp1.mod1 = first mode of trapezoidal
#     sp1.mod2 = second mode of trapezoidal
#     sp1.max = upper bound of trapezoidal dist'n
#   Control specificity
#     sp0.min = lower bound of trapezoidal dist'n
#     sp0.mod1 = first mode of trapezoidal
#     sp0.mod2 = second mode of trapezoidal
#     sp0.max = upper bound of trapezoidal dist'n
# SIMS = number of simulations
pba.summary.exp.rr <- function(a, b, c, d,
                               se1.a, se1.b, se0.a, se0.b,
                               sp1.min, sp1.mod1, sp1.mod2, sp1.max,
                               sp0.min, sp0.mod1, sp0.mod2, sp0.max,
                               SIMS){
  n_case  <- a + b
  n_ctrl  <- c + d
  n_exp   <- a + c
  n_unexp <- b + d

  # set some important parameters
  niter <- SIMS # number of iterations

  # draw correlated sensitivities, correlation = rho1
  rho1 <- .80
  V <- matrix(c(1, rho1, rho1, 1), ncol = 2)
  Z <- mvrnorm(niter, c(0, 0), V)
  U <- pnorm(Z)
  # NOTE: for non-differential misclassification, sample se1 ~ beta(se1.a, se1.b) and set se0 <- se1
  se1 <- qbeta(U[, 1], se1.a, se1.b)
  se0 <- qbeta(U[, 2], se0.a, se0.b)

  # draw correlated specificities, correlation = rho2
  rho2 <- .80
  V <- matrix(c(1, rho2, rho2, 1), ncol = 2)
  Z <- mvrnorm(niter, c(0, 0), V)
  U <- pnorm(Z)
  # NOTE: similarly, for non-differential misclassification, sample sp1 and set sp0 <- sp1
  sp1 <- qtrapezoid(U[, 1], sp1.min, sp1.mod1, sp1.mod2, sp1.max)
  sp0 <- qtrapezoid(U[, 2], sp0.min, sp0.mod1, sp0.mod2, sp0.max)

  # calculate bias-adjusted cell frequencies
  ac <- (a - n_case*(1 - sp1))/(se1 - (1 - sp1)) # bias-adjusted cases, exposed
  bc <- n_case - ac                              # bias-adjusted cases, unexposed
  cc <- (c - n_ctrl*(1 - sp0))/(se0 - (1 - sp0)) # bias-adjusted controls, exposed
  dc <- n_ctrl - cc                              # bias-adjusted controls, unexposed
  flag <- (ac <= 0) | (bc <= 0) | (cc <= 0) | (dc <= 0)

  # calculate prevalence of exposure in cases and controls, accounting for sampling error
  PrevE_cases    <- rbeta(niter, ac, bc)
  PrevE_controls <- rbeta(niter, cc, dc)

  # calculate PPV and NPV of exposure classification in cases and controls
  PPV_case    <- (se1*PrevE_cases)/((se1*PrevE_cases) + (1 - sp1)*(1 - PrevE_cases))
  PPV_control <- (se0*PrevE_controls)/((se0*PrevE_controls) + (1 - sp0)*(1 - PrevE_controls))
  NPV_case    <- (sp1*(1 - PrevE_cases))/((1 - se1)*PrevE_cases + sp1*(1 - PrevE_cases))
  NPV_control <- (sp0*(1 - PrevE_controls))/((1 - se0)*PrevE_controls + sp0*(1 - PrevE_controls))

  # calculate the expected number of exposed cases and controls
  ab <- rbinom(niter, a, PPV_case) + rbinom(niter, b, 1 - NPV_case)
  bb <- n_case - ab
  cb <- rbinom(niter, c, PPV_control) + rbinom(niter, d, 1 - NPV_control)
  db <- n_ctrl - cb

  # calculate bias-adjusted RR with second source of uncertainty
  rr_bb <- (ab/(ab + cb))/(bb/(bb + db))

  # third source of uncertainty: bias-adjusted standard error
  se_bb <- sqrt(1/ab + 1/bb - 1/(ab + cb) - 1/(bb + db))
  z <- rnorm(niter)
  rr_bb_cb <- exp(log(rr_bb) - (z*se_bb))

  # bind vectors as a data frame
  bound <- data.frame(ac, bc, cc, dc, ab, bb, cb, db, rr_bb_cb)

  # housekeeping - retain rows with positive corrected cell frequencies
  summary_pba <- bound %>%
    filter(ac > 0, bc > 0, cc > 0, dc > 0,
           ab > 0, bb > 0, cb > 0, db > 0)

  q3 <- c(quantile(summary_pba$rr_bb_cb, c(0.025, 0.5, 0.975)),
          niter - length(summary_pba$rr_bb_cb))
  names(q3) <- c("2.5th %tile", "50th %tile", "97.5th %tile", "Impossible values")
  return(q3)
}
The function outputs the 2.5th, 50th (median), and 97.5th percentiles of the bias-adjusted risk ratios, along with the number of impossible values (simulations yielding zero or negative adjusted cell counts). The sample scenario in the distributed code used the following parameter values:
Specified parameter values in the example
Parameter
Value
\(a\)
40
\(b\)
20
\(c\)
60
\(d\)
80
\(se1.a\)
25
\(se1.b\)
3
\(se0.a\)
45
\(se0.b\)
7
\(sp1.min\)
0.9
\(sp1.mod1\)
0.93
\(sp1.mod2\)
0.97
\(sp1.max\)
1
\(sp0.min\)
0.8
\(sp0.mod1\)
0.83
\(sp0.mod2\)
0.87
\(sp0.max\)
0.9
\(SIMS\)
10^6
The function should produce output similar to what is shown below. Due to the probabilistic nature of Monte Carlo simulation, you may get slightly different numbers each time you run the code.
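For example, using the parameter values above:

```r
pba.summary.exp.rr(
  a = 40, b = 20, c = 60, d = 80,
  se1.a = 25, se1.b = 3, se0.a = 45, se0.b = 7,
  sp1.min = .9, sp1.mod1 = .93, sp1.mod2 = .97, sp1.max = 1,
  sp0.min = .8, sp0.mod1 = .83, sp0.mod2 = .87, sp0.max = .9,
  SIMS = 10^6
) |>
  round(2)
```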
The output is a numeric vector in R and does not look polished when rendered in an HTML Quarto report (this document). There are many packages you can use to make the table prettier in a Quarto document. Here, I show an example using knitr::kable() and kableExtra.
Code
library(kableExtra)
library(knitr)

output.single <- pba.summary.exp.rr(
  a = 40, b = 20, c = 60, d = 80,
  se1.a = 25, se1.b = 3, se0.a = 45, se0.b = 7,
  sp1.min = .9, sp1.mod1 = .93, sp1.mod2 = .97, sp1.max = 1,
  sp0.min = .8, sp0.mod1 = .83, sp0.mod2 = .87, sp0.max = .9,
  SIMS = 10^6
) |>
  round(2)

output.single |>
  broom::tidy() |>                                        # transform the output into a tibble
  pivot_wider(names_from = "names", values_from = "x") |> # wider format
  kable() |>                                              # transform the tibble into an HTML table
  kable_paper("hover", full_width = F)                    # kableExtra formatting
2.5th %tile
50th %tile
97.5th %tile
Impossible values
1.54
2.84
7.8
3307
Repeating PBA with changing parameter values
You now know how to use the pba.summary.exp.rr function to perform PBA with a single set of pre-specified parameter values (e.g., alphas and betas for sensitivity). These values are determined based on results from a validation study and subject-matter knowledge. But what if you want to repeat the PBA with different parameter values?
You may be tempted to copy and paste the pba.summary.exp.rr call, modify the parameter values, and save the outputs, repeating this process for every desired combination of parameter values. Then, you manually create a summary table (entering each QBA result and its corresponding parameter values into a spreadsheet).
DO NOT DO THIS. This approach is very time consuming and introduces inevitable human errors (and hence a lack of reproducibility), such as:
Failing to change parameter values or incorrectly typing parameter values
Failing to rename output objects
Using incorrect objects or entering wrong numbers when creating the summary table
Instead, do the following:
Write a generic function that does the following:
Uses a list of parameter value combinations as an input
Applies each combination of parameter values to the pba.summary.exp.rr
Automatically repeats this process for all combinations
Create a list of desired parameter value combinations
Apply the function to the list you created
Use scripts to generate tables and figures summarizing the sensitivity analysis
Example: Changing beta parameters
Below is an example where PBA is repeated with varying beta parameters for sensitivity among cases and non-cases. That is, you will only change \(se1.b\) and \(se0.b\) while fixing all other parameters to the same values as the ones in the original example above.
1. Write a function that uses \(se1.b\) and \(se0.b\) as an input
The code below creates an original function named perform.PBA.with.changingBetas. This function takes user-specified values of \(se1.b\) and \(se0.b\) as inputs and applies them to the pba.summary.exp.rr function while holding all other parameter values constant.
Note that the function uses set.seed() to specify a seed value to be used in the PBA. This ensures the full reproducibility of the result.
2. Create a list of desired parameter values
We will specify a range of the beta parameter values by fixing the alpha values while changing the mean sensitivity values. In the original example, the mean sensitivities among cases and non-cases from the beta distributions were approximately:
\[E[se1] = \frac{se1.a}{se1.a + se1.b} = \frac{25}{25+3} \approx 0.89\]
\[E[se0] = \frac{se0.a}{se0.a + se0.b} = \frac{45}{45+7} \approx 0.87\]
Let's consider mean sensitivities among cases and non-cases each ranging from 0.85 to 1.00 in increments of 0.05. These values can be specified via seq(0.85, 1, 0.05) in R. You can back-calculate the beta values from the mean sensitivity and the fixed alpha value:
\[se1.b = \frac{se1.a}{E[se1]} - se1.a = \frac{25}{E[se1]} - 25\]
\[se0.b = \frac{se0.a}{E[se0]} - se0.a = \frac{45}{E[se0]} - 45\]
This gives us the following sixteen scenarios with corresponding beta values.
Code
library(purrr)
library(flextable)
library(officer)

# Function to perform the PBA with specified betas
perform.PBA.with.changingBetas <- function(se1b, se0b){
  set.seed(123)
  rr.out <- pba.summary.exp.rr(
    a = 40, b = 20, c = 60, d = 80,
    se1.a = 25, se1.b = se1b, se0.a = 45, se0.b = se0b,
    sp1.min = .9, sp1.mod1 = .93, sp1.mod2 = .97, sp1.max = 1,
    sp0.min = .8, sp0.mod1 = .83, sp0.mod2 = .87, sp0.max = .9,
    SIMS = 10^6
  )
  scenario.specific.output <- round(rr.out, 2) |>
    broom::tidy() |>
    pivot_wider(names_from = "names", values_from = "x")
  return(scenario.specific.output)
}

# Back-calculate betas from the sensitivity values for cases and non-cases
parms.betas <- expand_grid(
  mean.se1s = seq(0.85, 1, 0.05),
  mean.se0s = seq(0.85, 1, 0.05)
) |>
  as_tibble(rownames = "scenario") |>
  mutate(
    se1.b = 25/mean.se1s - 25,
    se0.b = 45/mean.se0s - 45
  )

parms.betas |>
  kable() |>
  kable_paper("hover", full_width = F)
scenario
mean.se1s
mean.se0s
se1.b
se0.b
1
0.85
0.85
4.411765
7.941177
2
0.85
0.90
4.411765
5.000000
3
0.85
0.95
4.411765
2.368421
4
0.85
1.00
4.411765
0.000000
5
0.90
0.85
2.777778
7.941177
6
0.90
0.90
2.777778
5.000000
7
0.90
0.95
2.777778
2.368421
8
0.90
1.00
2.777778
0.000000
9
0.95
0.85
1.315789
7.941177
10
0.95
0.90
1.315789
5.000000
11
0.95
0.95
1.315789
2.368421
12
0.95
1.00
1.315789
0.000000
13
1.00
0.85
0.000000
7.941177
14
1.00
0.90
0.000000
5.000000
15
1.00
0.95
0.000000
2.368421
16
1.00
1.00
0.000000
0.000000
These beta values are now saved as a tibble object (i.e., as a data table), so you do not need to type them manually.
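You can preview the first rows of the tibble with head():

```r
parms.betas |> head()
```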
3. Apply the function to the specified parameter values
Instead of copying and pasting the perform.PBA.with.changingBetas code and manually updating the beta value inputs, we will automate this repeated process using the R package purrr. The map function of this package applies a specified function to each element of a vector, repeating the call automatically. With purrr::map2, you can supply two vectors as inputs - in this case, the columns parms.betas$se1.b and parms.betas$se0.b. The code below adds a new column, rr.outputs, containing the output of perform.PBA.with.changingBetas for the beta values in each row; unnest() then expands the rr.outputs column into regular columns.
The output would look like this:
Code
# Apply the new function to the grid of betas
outputs.sensitivity <- parms.betas |>
  mutate(rr.outputs = map2(se1.b, se0.b, perform.PBA.with.changingBetas)) |>
  unnest(rr.outputs) |>
  mutate(across(c(se1.b, se0.b), \(x) round(x, 2)))

outputs.sensitivity |> head()
You can now use scripts to generate a table based on the output tibble. You can group some rows (see what I did for the second column), rename columns for better readability, etc. Then use the flextable package to transform the tibble into a new table object. You can specify fonts, layout, etc.
Code
# Helper: blank out repeated values so grouped rows display only once
collapse_rows_df <- function(df, variable){
  group_var <- enquo(variable)
  df %>%
    group_by(!!group_var) %>%
    mutate(groupRow = 1:n()) %>%
    ungroup() %>%
    mutate(!!quo_name(group_var) := ifelse(groupRow == 1,
                                           as.character(!!group_var), "")) %>%
    select(-c(groupRow))
}

# Create a publishable table
tbl.output.betas <- outputs.sensitivity |>
  collapse_rows_df(mean.se1s) |> # apply the function above to group some rows
  rename(                        # rename variables for readability
    Scenario = scenario,
    `E[se1]` = mean.se1s,
    `E[se0]` = mean.se0s
  ) |>
  flextable() |>                 # transform the tibble into a flextable
  set_caption("Table. QBA with different betas", # add a title
              align_with_table = F,
              style = fp_text(   # specify formatting rules
                font.family = "Times",
                font.size = 12,
                bold = FALSE,
                italic = FALSE
              )) |>
  fontsize(size = 12, part = "all") |>
  font(fontname = "Times", part = "all") |>
  set_table_properties(layout = "autofit")

tbl.output.betas
Scenario
E[se1]
E[se0]
se1.b
se0.b
2.5th %tile
50th %tile
97.5th %tile
Impossible values
1
0.85
0.85
4.41
7.94
1.65
3.26
13.42
15,292
2
0.90
4.41
5.00
1.77
3.51
14.80
15,334
3
0.95
4.41
2.37
1.89
3.74
16.38
15,320
4
1.00
4.41
0.00
1.97
3.95
18.31
15,183
5
0.9
0.85
2.78
7.94
1.48
2.71
6.97
2,382
6
0.90
2.78
5.00
1.60
2.92
7.68
2,439
7
0.95
2.78
2.37
1.71
3.11
8.35
2,383
8
1.00
2.78
0.00
1.79
3.28
9.29
2,387
9
0.95
0.85
1.32
7.94
1.34
2.33
4.48
178
10
0.90
1.32
5.00
1.46
2.52
4.86
170
11
0.95
1.32
2.37
1.57
2.69
5.25
175
12
1.00
1.32
0.00
1.66
2.84
5.75
174
13
1
0.85
0.00
7.94
1.19
2.06
3.50
0
14
0.90
0.00
5.00
1.31
2.22
3.72
0
15
0.95
0.00
2.37
1.43
2.37
3.93
0
16
1.00
0.00
0.00
1.55
2.52
4.14
0
The flextable object can be exported as an MS Word document using the officer package.
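For example, the chunk below (adapted from this document's source so that it runs with the native pipe) writes the table to reproducible_table_QBA.docx:

```r
read_docx() |>                                  # create an empty Word document
  body_add_flextable(tbl.output.betas) |>       # add the flextable
  print(target = "reproducible_table_QBA.docx") # write the .docx file
```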
One of the benefits of storing the output as a tibble is that you can flexibly create figures for visualization using the ggplot2 package. Here is an example of a point-range plot of the median bias-adjusted estimates, with bars for the 2.5th and 97.5th percentiles, according to the specified mean sensitivity values among cases and non-cases.
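The figure can be produced with the plotting code below, reformatted from this document's source (it uses the ggsci package for the color scale):

```r
## as a figure
pd <- position_dodge(width = -0.5) # set desired dodge width

outputs.sensitivity |>
  mutate(mean.se0s = factor(mean.se0s),
         mean.se1s = fct_rev(factor(mean.se1s))) |>
  ggplot(aes(x = mean.se1s, y = `50th %tile`,
             ymin = `2.5th %tile`, ymax = `97.5th %tile`,
             colour = mean.se0s)) +
  geom_point(position = pd) +
  geom_line(position = pd) +
  geom_errorbar(width = 0.01, linewidth = 0.5, position = pd) +
  theme_minimal() +
  coord_flip() +
  scale_y_continuous(
    limits = c(1, max(outputs.sensitivity$`97.5th %tile`)),
    breaks = seq(1, max(outputs.sensitivity$`97.5th %tile`), 1)
  ) +
  ggsci::scale_color_npg(
    name = str_wrap("Sensitivity Among Non-cases (Se0)", width = 25)
  ) +
  xlab("Sensitivity Among Cases (Se1)") +
  ylab("50th Percentile Bias-adjusted RRs") +
  theme(text = element_text(size = 12),
        axis.text = element_text(size = 12),
        legend.text = element_text(size = 12),
        legend.box.background = element_rect(fill = "white", color = "black"))
```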
This is just one example of how you might do reproducible data analysis and reporting. You could extend this example to:
Use other visualization approaches (modify Step 4)
Allow a wider range of beta parameter values (modify Step 2)
Create a new function that lets you vary other parameter values (e.g., the specificity parameters; modify Step 1)
The big-picture idea is that you should always write original functions to automate repetition and scripts for generating tables and figures. Avoid copying, pasting, and manually typing numbers as much as possible.