crosstable provides a high level of customization. The
available options may not be intuitive at first, but they
allow fine control over how summaries and effects are computed and
displayed.
Before exploring these options, we start by loading the package and setting a few convenient defaults.
library(crosstable)
crosstable_options(compact=TRUE, keep_id=TRUE)

Customization in crosstable mainly happens at three levels:
- how numerical variables are summarized (funs)
- how group effects are computed (effect_args)
- how statistical tests are computed (test_args)
Summary functions (funs) describe each group separately,
whereas effect_args and test_args control how
groups are compared. effect_args controls the estimated
effect size and confidence interval, whereas test_args
controls the hypothesis test and p-value.
Numeric variables: the funs argument
Numeric variables are summarized using a set of summary functions. By
default, crosstable reports: min/max,
median/IQR, mean/sd and
number of observations/missing. These summaries are
generated by the internal function cross_summary().
These summaries can be customized depending on how you want numeric variables to be reported.
In practice, you will often want all numeric variables to be
summarized in the same way.
For this reason, it is convenient to define funs globally
with crosstable_options(), although you can also pass it
directly to crosstable().
The first possibility is to use a named list of functions. If a
function returns multiple values (as with quantile()), the
names of the returned statistics are automatically combined.
crosstable_options(funs=c("mean"=mean, "std dev"=sd, qtl=~quantile(.x, probs=c(0.25, 0.75))))
crosstable(mtcars2, mpg) %>% as_flextable()

| | value |
|---|---|
| Miles/(US) gallon (mpg) | |
| mean | 20.1 |
| std dev | 6.0 |
| qtl 25% | 15.4 |
| qtl 75% | 22.8 |
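To see where the `qtl 25%` and `qtl 75%` labels come from, the short sketch below looks at what `quantile()` returns in base R. It uses the base `mtcars` dataset as a stand-in for `mtcars2` (an assumption, so that it runs without crosstable), and the `paste()` step only imitates the naming rule described above; it is not crosstable's internal code.

```r
# Base R only; mtcars stands in for mtcars2 (assumed to hold the same mpg values)
q = quantile(mtcars$mpg, probs = c(0.25, 0.75))

# quantile() returns a named vector: one name per requested probability
names(q)

# Imitation of the label-combining rule: list name + name of each statistic
# (illustrative only, not crosstable internals)
labels = paste("qtl", names(q))
labels
```

Any function that returns a named vector should be labelled along the same lines, which is why naming the elements of a custom summary function matters.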
Another option is to provide a custom summary function that returns
several statistics at once. By default, the name of the function is
prepended to each label (f Mean (SD) in the first table below). To use
the function's internal labels directly, give it a blank name (" ") in
funs, as in the second call.
f = function(x) c("Mean (SD)"=meansd(x), "Med [IQR]"=mediqr(x))
crosstable(mtcars2, wt, funs=f) %>% as_flextable()

| | value |
|---|---|
| Weight (1000 lbs) (wt) | |
| f Mean (SD) | 3.22 (0.98) |
| f Med [IQR] | 3.33 [2.58;3.61] |

crosstable(mtcars2, wt, funs=c(" "=f)) %>% as_flextable()

| | value |
|---|---|
| Weight (1000 lbs) (wt) | |
| Mean (SD) | 3.22 (0.98) |
| Med [IQR] | 3.33 [2.58;3.61] |
To help write such functions, crosstable exports formatting helpers:
meansd(), meanCI(), mediqr(),
minmax(), and nna().
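As a further illustration, the sketch below combines two of these helpers into a second summary function. The name `f_range` and the labels `"Min / Max"` and `"N (NA)"` are made up for this example; only the helpers themselves and the blank-name trick come from the text above.

```r
library(crosstable)

# Hypothetical summary function built from two of the exported helpers
f_range = function(x) c("Min / Max" = minmax(x), "N (NA)" = nna(x))

# A blank name (" ") keeps the labels defined inside f_range
ct = crosstable(mtcars2, wt, funs = c(" " = f_range))
ct
```

The result should contain one row per element returned by `f_range`, in the order in which they are defined.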
Customizing effects
When effect = TRUE, crosstable computes an
effect comparing the levels of the by variable.
Effect calculation is controlled by the effect_args
argument, which defaults to the result of
crosstable_effect_args().
The function used for actual calculation depends on the type of variable being analyzed:
- effect_summarize for numeric variables
- effect_tabular for categorical variables
- effect_survival for survival outcomes
By default, effect_tabular is set to
effect_odds_ratio(), which computes an odds ratio for
categorical variables.
mtcars2 %>%
  crosstable(am, by=vs, effect=TRUE) %>%
  as_flextable()

| | Engine | | effect |
|---|---|---|---|
| | straight | vshaped | |
| Transmission (am) | | | Odds ratio [95% Wald CI], ref='vshaped vs straight' |
| auto | 7 (36.84%) | 12 (63.16%) | |
| manual | 7 (53.85%) | 6 (46.15%) | |
Suppose that instead of an odds ratio, you want to compute a difference in proportions.
To define a custom categorical effect, you need to write a function that takes:
- x: the variable being summarized
- by: the grouping variable
- conf.level: the confidence level
and returns a list with the following elements:
- summary: a data frame containing the effect label, estimate, and confidence interval. It can contain several rows if x has more than two levels.
- effect.type: the name of the effect being computed
- ref: the reference level or comparison label
The following example computes a difference in proportions and uses
prop.test() to derive the confidence interval.
ct_effect_prop_diff = function(x, by, conf.level){
  tb = table(x, by)
  test = prop.test(tb, conf.level=conf.level)
  nms = dimnames(tb)[["x"]]
  # prop.test() reports the CI for estimate[1] - estimate[2],
  # so the point estimate must use the same orientation
  effect = unname(test$estimate[1] - test$estimate[2])
  effect.type = "Difference of proportions"
  reference = glue::glue(", {nms[1]} vs {nms[2]}")
  summary = data.frame(name = "Proportion difference", effect,
                       ci_inf = test$conf.int[1],
                       ci_sup = test$conf.int[2])
  list(summary = summary, ref = reference, effect.type = effect.type)
}
my_effect_args = crosstable_effect_args(effect_tabular=ct_effect_prop_diff)
# crosstable_options(effect_args=my_effect_args) #set globally if desired
mtcars2 %>%
  crosstable(am, by=vs, effect=TRUE, effect_args=my_effect_args) %>%
  as_flextable()

| | Engine | | effect |
|---|---|---|---|
| | straight | vshaped | |
| Transmission (am) | | | Difference of proportions, auto vs manual |
| auto | 7 (36.84%) | 12 (63.16%) | |
| manual | 7 (53.85%) | 6 (46.15%) | |
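When a custom effect relies on `prop.test()`, it is worth checking which difference its confidence interval refers to. The base R sketch below uses the unlabelled `mtcars` dataset as a stand-in for `mtcars2` (an assumption, so the 0/1 codes replace the factor labels) and checks that the reported interval brackets the first proportion minus the second.

```r
# 2x2 table from base mtcars: am (0/1) in rows, vs (0/1) in columns
tb = table(x = mtcars$am, by = mtcars$vs)
test = prop.test(tb, conf.level = 0.95)

# Each estimate is the proportion of the first column within one row
p = unname(test$estimate)
d = p[1] - p[2]   # first row minus second row

# The interval returned by prop.test() is centred on p[1] - p[2]
c(estimate = d, lower = test$conf.int[1], upper = test$conf.int[2])
```

Keeping the point estimate and the interval in the same orientation avoids reporting an estimate with the opposite sign to its interval.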
The same general approach can be used to define custom effects for numeric and survival variables.
Several alternative effect functions are already implemented in
crosstable.
See ?effect_summary, ?effect_tabular, and
?effect_survival for available options.
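As a minimal sketch of what a custom numeric effect might look like, the function below computes a difference in means with a Welch t-test confidence interval. The name `ct_effect_mean_diff`, the choice of `t.test()`, and the argument name `conf_level` are assumptions made for this illustration; the return structure simply mirrors the one described for categorical effects above. The function is exercised on its own with base `mtcars`, so it runs without crosstable.

```r
# Hypothetical custom effect for numeric variables (base R only):
# difference in means between the two levels of `by`, with a Welch CI
ct_effect_mean_diff = function(x, by, conf_level = 0.95) {
  by = droplevels(as.factor(by))
  stopifnot(nlevels(by) == 2)   # this sketch only handles two groups
  lv = levels(by)
  test = t.test(x[by == lv[1]], x[by == lv[2]], conf.level = conf_level)
  summary = data.frame(
    name   = "Mean difference",
    # same orientation as the CI: first group minus second group
    effect = unname(test$estimate[1] - test$estimate[2]),
    ci_inf = test$conf.int[1],
    ci_sup = test$conf.int[2]
  )
  list(summary = summary,
       ref = paste0(", ", lv[1], " vs ", lv[2]),
       effect.type = "Difference in means")
}

# Standalone check on base mtcars (vs is coded 0/1 there)
res = ct_effect_mean_diff(mtcars$mpg, mtcars$vs)
res$summary
```

If the output looks right, such a function could presumably be supplied through `crosstable_effect_args(effect_summarize=ct_effect_mean_diff)`, in the same way as the categorical example.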
Customizing tests
Customizing statistical tests is even simpler: pass the test function to crosstable_test_args(), using the argument that matches the variable type.
For example, the following code forces all tests for categorical variables to use Fisher’s exact test:
my_test_args = crosstable_test_args(test_tabular=fisher.test)
mtcars2 %>%
  crosstable(am, by=vs, test=TRUE, test_args=my_test_args) %>%
  as_flextable()
#> Warning in crosstable(., am, by = vs, test = TRUE, test_args = my_test_args): Be aware that automatic global testing should only be done in an exploratory
#> context, as it would cause extensive alpha inflation otherwise.
#> This warning is displayed once every 8 hours.

| | Engine | | test |
|---|---|---|---|
| | straight | vshaped | |
| Transmission (am) | | | p value: 0.4727 |
| auto | 7 (36.84%) | 12 (63.16%) | |
| manual | 7 (53.85%) | 6 (46.15%) | |
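A base R function such as `fisher.test()` can be passed as is because it accepts two vectors and returns an object that carries both a `p.value` and a `method` component. The quick check below uses base `mtcars` as a stand-in for `mtcars2` (an assumption; the counts are the same, only the labels differ).

```r
# Base mtcars stands in for mtcars2 (same counts, unlabelled 0/1 codes)
res = fisher.test(mtcars$am, mtcars$vs)

class(res)    # "htest"
res$p.value   # the p-value of the exact test
res$method    # the label describing the test
```

Other base R tests that return an `htest` object expose the same two components, although whether they accept `(x, by)` directly has to be checked case by case.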
You can also provide your own custom test function. Such a function
must return a list with two elements:
- p.value: the p-value
- method: the label displayed for the test
For example, the function below performs a test for numeric variables
using a linear model. In a two-group setting, this is conceptually close
to a standard group comparison test, but it mainly serves here as an
illustration of how custom testing logic can be integrated into
crosstable.
ct_test_lm = function(x, by){
  fit = lm(x ~ by)
  pval = anova(fit)$`Pr(>F)`[1]
  list(p.value = pval, method = "Linear model ANOVA")
}
my_test_args = crosstable_test_args(test_summarize=ct_test_lm)
mtcars2 %>%
  crosstable(mpg, by=vs, test=TRUE, test_args=my_test_args) %>%
  as_flextable()

| | Engine | | test |
|---|---|---|---|
| | straight | vshaped | |
| Miles/(US) gallon (mpg) | | | p value: <0.0001 |
| mean | 24.6 | 16.6 | |
| std dev | 5.4 | 3.9 | |
| qtl 25% | 21.4 | 14.8 | |
| qtl 75% | 29.6 | 19.1 | |
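To make the link with a standard group comparison concrete: with two groups, the F test of a one-factor linear model gives the same p-value as Student's t-test with equal variances. The sketch below checks this in base R on `mtcars`, used here as a stand-in for `mtcars2` (an assumption, so that it runs without crosstable).

```r
# F test from a linear model with a two-level factor
fit = lm(mpg ~ factor(vs), data = mtcars)
p_lm = anova(fit)$`Pr(>F)`[1]

# Student's t-test assuming equal variances in both groups
p_t = t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)$p.value

# Both p-values agree up to numerical precision
all.equal(p_lm, p_t)
```

Note that the default `t.test()` applies the Welch correction (`var.equal = FALSE`), so its p-value would differ slightly from the linear model one.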
