Output a compact summary table — rm_compactsum • reportRmd

Outputs a table of summary statistics formatted for PDF, Word or HTML output

rm_compactsum(
  data,
  xvars,
  grp,
  use_mean,
  caption = NULL,
  tableOnly = FALSE,
  covTitle = "",
  digits = 1,
  digits.cat = 0,
  nicenames = TRUE,
  iqr = TRUE,
  all.stats = FALSE,
  pvalue = TRUE,
  effSize = FALSE,
  p.adjust = "none",
  unformattedp = FALSE,
  show.sumstats = FALSE,
  show.tests = FALSE,
  full = TRUE,
  percentage = "col"
)

Arguments

data: dataframe containing data
xvars: character vector with the names of covariates to include in table
grp: character with the name of the grouping variable
use_mean: logical indicating whether the mean and standard deviation are returned for continuous variables instead of the median. Alternatively, a character vector naming the covariates to summarise with mean and sd; the remaining covariates are summarised with the median. If use_mean is not supplied, all covariates will have median summaries. See examples.
caption: character containing table caption (default is no caption)
tableOnly: logical, if TRUE then a dataframe is returned, otherwise a formatted printed object is returned (default is FALSE)
covTitle: character with the name of the covariate (predictor) column. The default is to leave this empty for output or, for table only output to use the column name 'Covariate'
digits: numeric specifying the number of digits for summarizing continuous data (mean or median summaries). Digits can be specified for individual variables using a named vector in the format digits=c("var1"=2,"var2"=3). If a variable is not in the vector the default will be used for it (default is 1). See examples
digits.cat: numeric specifying the number of digits for the proportions when summarizing categorical data (default is 0)
nicenames: logical indicating if you want to replace . and _ in strings with a space
iqr: logical indicating if you want to display the interquartile range (Q1-Q3) as opposed to (min-max) in the summary for continuous variables
all.stats: logical indicating if all summary statistics (Q1, Q3 + min, max on a separate line) should be displayed. Overrides iqr
pvalue: logical indicating if you want p-values included in the table
effSize: logical indicating if you want effect sizes and their 95% confidence intervals included in the table. Effect sizes calculated include Cramer's V for categorical variables, and Cohen's d, Wilcoxon r, Epsilon-squared, or Omega-squared for numeric/continuous variables
p.adjust: character naming the p-value adjustment to be performed (default is "none", i.e. no adjustment)
unformattedp: logical indicating if you would like the p-value to be returned unformatted (i.e. not rounded or prefixed with '<'). Best used with tableOnly = TRUE and the outTable function. See examples
show.sumstats: logical indicating if the type of statistical summary (mean, median, etc) used should be shown.
show.tests: logical indicating if the type of statistical test and effect size (if effSize = TRUE) used should be shown in a column beside the p-values.
full: logical indicating if you want the full sample included in the table, ignored if grp is not specified
percentage: choice of how percentages are presented, either by column ("col", the default) or by row
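
The p.adjust argument is only described briefly above. As an illustration of what a p-value adjustment does, the sketch below uses base R's stats::p.adjust with the Holm method. That rm_compactsum accepts the same method names is an assumption here, not something this page confirms, and the p-values are made up for the example.

```r
# Illustrative raw p-values (invented for this sketch, not from any data set)
raw_p <- c(0.01, 0.04, 0.03)

# Holm step-down adjustment from base R; "none" would return raw_p unchanged
adj_holm <- p.adjust(raw_p, method = "holm")
adj_none <- p.adjust(raw_p, method = "none")

print(adj_holm)
print(adj_none)
```

Adjusted p-values are never smaller than the raw ones, so a covariate can lose significance after adjustment but cannot gain it.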

Value

A character vector of the table source code, unless tableOnly = TRUE in which case a data frame is returned. The output has the following attribute:

"description", which describes what is included in the output table and the type of statistical summary for each covariate. When applicable, the types of statistical tests used will be included. If effSize = TRUE, the effect sizes for each covariate will also be mentioned.

Details

Comparisons for categorical variables default to chi-square tests, but if there are counts of <5 then Fisher's exact test will be used. For grouping variables with two levels, either t-tests (mean) or Wilcoxon tests (median) will be used for numerical variables. For grouping variables with more than two levels, ANOVA (mean) or Kruskal-Wallis tests (median) will be used. The statistical test used can be displayed by specifying show.tests = TRUE. Statistical tests and effect sizes for grp and/or xvars with fewer than 2 counts in any level will not be shown.
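
The selection rules above can be sketched in base R. This is a minimal illustration, not the package's implementation: choose_cat_test and choose_num_test are hypothetical helpers, "counts of <5" is read as any observed cell count below 5, and the options passed to each base R test (for example the default Welch t-test) are assumptions.

```r
# Hypothetical helpers for illustration; they are not part of reportRmd.

# Categorical covariate: chi-square unless a small count triggers Fisher's exact test.
choose_cat_test <- function(x, g) {
  tab <- table(x, g)
  # Assumption: "counts of <5" is read here as any observed cell count below 5.
  if (any(tab < 5)) {
    list(test = "Fisher Exact", p = fisher.test(tab)$p.value)
  } else {
    list(test = "ChiSq", p = chisq.test(tab)$p.value)
  }
}

# Numeric covariate: the test depends on the summary (mean or median)
# and on the number of groups.
choose_num_test <- function(x, g, use_mean = FALSE) {
  g <- factor(g)
  two_groups <- nlevels(g) == 2
  if (use_mean && two_groups) {
    list(test = "t-test", p = t.test(x ~ g)$p.value)
  } else if (use_mean) {
    list(test = "ANOVA", p = summary(aov(x ~ g))[[1]][["Pr(>F)"]][1])
  } else if (two_groups) {
    list(test = "Wilcoxon Rank Sum", p = wilcox.test(x ~ g, exact = FALSE)$p.value)
  } else {
    list(test = "Kruskal-Wallis", p = kruskal.test(x ~ g)$p.value)
  }
}

# Usage with the built-in mtcars data
cat_small <- choose_cat_test(mtcars$cyl, mtcars$am)   # some cells below 5
cat_large <- choose_cat_test(mtcars$vs, mtcars$am)    # all cells at least 5
num_two_mean <- choose_num_test(mtcars$mpg, mtcars$am, use_mean = TRUE)
num_many_median <- choose_num_test(mtcars$mpg, mtcars$cyl)
```

In rm_compactsum itself, the test that was chosen for each covariate can be checked by setting show.tests = TRUE.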

Effect sizes are calculated as Cohen's d for between-group differences if the variable is summarised with the mean, otherwise Wilcoxon r if summarised with a median. Cramer's V is used for categorical variables, omega-squared is used for differences in means among more than two groups and epsilon-squared for differences in medians among more than two groups. Confidence intervals are calculated using bootstrapping.
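
As a rough illustration of two of these effect sizes and of a bootstrap confidence interval, the sketch below uses the textbook formulas in base R. The helpers cohens_d, cramers_v and boot_ci are hypothetical, and the estimators and bootstrap scheme the package actually uses (pooled SD, percentile intervals, number of resamples) are assumptions that may differ.

```r
# Hypothetical helpers for illustration; they are not part of reportRmd.

# Cohen's d with a pooled standard deviation (two groups only)
cohens_d <- function(x, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  x1 <- x[g == levels(g)[1]]
  x2 <- x[g == levels(g)[2]]
  n1 <- length(x1)
  n2 <- length(x2)
  s_pooled <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / s_pooled
}

# Cramer's V from the uncorrected chi-square statistic
cramers_v <- function(x, g) {
  tab <- table(x, g)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Percentile bootstrap interval: resample rows, recompute the statistic
boot_ci <- function(x, g, stat, n_boot = 500, level = 0.95) {
  est <- replicate(n_boot, {
    i <- sample(seq_along(x), replace = TRUE)
    stat(x[i], g[i])
  })
  alpha <- (1 - level) / 2
  quantile(est, c(alpha, 1 - alpha), na.rm = TRUE)
}

set.seed(1)
d <- cohens_d(mtcars$mpg, mtcars$am)              # mean difference in SD units
d_ci <- boot_ci(mtcars$mpg, mtcars$am, cohens_d)  # 95% percentile interval
v <- cramers_v(mtcars$vs, mtcars$am)              # association between factors
```

The sign of d depends on the order of the group levels, so only its magnitude should be compared across tables.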

tidyselect can only be used for xvars and grp arguments. Additional arguments (digits, use_mean) must be passed in using characters if variable names are used.
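
A call that mixes the two styles might look like the sketch below: bare names for xvars and grp, quoted names for use_mean and digits. It assumes reportRmd is installed and uses the pembrolizumab data from the examples; the guard makes the snippet a no-op otherwise.

```r
# Guarded so the snippet does nothing when reportRmd is not installed.
tab <- NULL
if (requireNamespace("reportRmd", quietly = TRUE)) {
  data("pembrolizumab", package = "reportRmd")
  tab <- reportRmd::rm_compactsum(
    data = pembrolizumab,
    xvars = c(age, l_size, pdl1),          # tidyselect: bare names
    grp = sex,                             # tidyselect: bare name
    use_mean = "age",                      # must be a character vector
    digits = c("age" = 2, "l_size" = 3),   # names must be quoted strings
    tableOnly = TRUE                       # return a data frame
  )
}
```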

References

Smithson, M. (2002). Noncentral Confidence Intervals for Standardized Effect Sizes. (07/140 ed., Vol. 140). SAGE Publications. doi:10.4135/9781412983761.n4

Steiger, J. H. (2004). Beyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis. Psychological Methods, 9(2), 164–182. doi:10.1037/1082-989X.9.2.164

Kelley, T. L. (1935). An Unbiased Correlation Ratio Measure. Proceedings of the National Academy of Sciences - PNAS, 21(9), 554–559. doi:10.1073/pnas.21.9.554

Okada, K. (2013). Is Omega Squared Less Biased? A Comparison of Three Major Effect Size Indices in One-Way ANOVA. Behaviormetrika, 40(2), 129–147.

Breslow, N. (1970). A Generalized Kruskal-Wallis Test for Comparing K Samples Subject to Unequal Patterns of Censorship. Biometrika, 57(3), 579–594.

Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect Size Estimates: Current Use, Calculations, and Interpretation. Journal of Experimental Psychology: General, 141(1), 2–18. doi:10.1037/a0024338

Examples

data("pembrolizumab")
rm_compactsum(data = pembrolizumab, xvars = c("age",
"change_ctdna_group", "l_size", "pdl1"), grp = "sex", use_mean = "age",
digits = c("age" = 2, "l_size" = 3), digits.cat = 1, iqr = TRUE,
show.tests = TRUE)
#> <table class="table table" style="margin-left: auto; margin-right: auto; margin-left: auto; margin-right: auto;">
#>  <thead>
#>   <tr>
#>    <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;">  </th>
#>    <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Full Sample (n=94) </th>
#>    <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Female (n=58) </th>
#>    <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Male (n=36) </th>
#>    <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> p-value </th>
#>    <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Missing </th>
#>    <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> pTest </th>
#>   </tr>
#>  </thead>
#> <tbody>
#>   <tr>
#>    <td style="text-align:left;"> <span style="font-weight: bold;">Age at study entry</span> </td>
#>    <td style="text-align:right;"> 57.86 (12.75) </td>
#>    <td style="text-align:right;"> 56.95 (12.59) </td>
#>    <td style="text-align:right;"> 59.32 (13.05) </td>
#>    <td style="text-align:right;"> 0.39 </td>
#>    <td style="text-align:right;"> 0 </td>
#>    <td style="text-align:right;"> t-test </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> <span style="font-weight: bold;">Did ctDNA increase or decrease from baseline to cycle 3 - Increase from baseline</span> </td>
#>    <td style="text-align:right;"> 40 (54.8%) </td>
#>    <td style="text-align:right;"> 21 (52.5%) </td>
#>    <td style="text-align:right;"> 19 (57.6%) </td>
#>    <td style="text-align:right;"> 0.84 </td>
#>    <td style="text-align:right;"> 21 </td>
#>    <td style="text-align:right;"> ChiSq </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> <span style="font-weight: bold;">Target lesion size at baseline</span> </td>
#>    <td style="text-align:right;"> 73.500 (49.250-108.750) </td>
#>    <td style="text-align:right;"> 68.000 (44.250-97.750) </td>
#>    <td style="text-align:right;"> 93.000 (65.500-121.000) </td>
#>    <td style="text-align:right;"> 0.066 </td>
#>    <td style="text-align:right;"> 0 </td>
#>    <td style="text-align:right;"> Wilcoxon Rank Sum </td>
#>   </tr>
#>   <tr>
#>    <td style="text-align:left;"> <span style="font-weight: bold;">PD L1 percent</span> </td>
#>    <td style="text-align:right;"> 0.0 (0.0-10.0) </td>
#>    <td style="text-align:right;"> 0.5 (0.0-13.8) </td>
#>    <td style="text-align:right;"> 0.0 (0.0-4.5) </td>
#>    <td style="text-align:right;"> 0.76 </td>
#>    <td style="text-align:right;"> 1 </td>
#>    <td style="text-align:right;"> Wilcoxon Rank Sum </td>
#>   </tr>
#> </tbody>
#> </table>

# Other Examples (not run)
## Include the summary statistic in the variable column
#rm_compactsum(data = pembrolizumab, xvars = c("age",
#"change_ctdna_group"), grp = "sex", use_mean = "age", show.sumstats=TRUE)

## To show effect sizes
#rm_compactsum(data = pembrolizumab, xvars = c("age",
#"change_ctdna_group"), grp = "sex", use_mean = "age", digits = 2,
#effSize = TRUE, show.tests = TRUE)

## To return unformatted p-values
#rm_compactsum(data = pembrolizumab, xvars = c("l_size",
#"change_ctdna_group"), grp = "cohort", effSize = TRUE, unformattedp = TRUE)

## Using tidyselect
#pembrolizumab |> rm_compactsum(xvars = c(age, sex, pdl1), grp = cohort,
#effSize = TRUE)