crosstable provides a high level of customization. Although the available options may not be intuitive at first, they give fine control over how summaries, effects, and tests are computed and displayed.

Before exploring these options, we start by loading the package and setting a few convenient defaults.

library(crosstable)
crosstable_options(compact=TRUE, keep_id=TRUE)

Customization in crosstable mainly happens at three levels:

  • how numerical variables are summarized (funs)
  • how group effects are computed (effect_args)
  • how statistical tests are computed (test_args)

Summary functions (funs) describe each group separately, whereas effect_args and test_args control how groups are compared: effect_args governs the estimated effect size and its confidence interval, and test_args governs the hypothesis test and its p-value.

Numeric variables: the funs argument

Numeric variables are summarized using a set of summary functions. By default, crosstable reports: min/max, median/IQR, mean/sd and number of observations/missing. These summaries are generated by the internal function cross_summary().

You can replace these defaults with your own summary functions.

In practice, you will usually want all numeric variables to be summarized in the same way. For this reason, it is convenient to define funs globally with crosstable_options(), although you can also pass it directly to crosstable().

The first possibility is to pass a named list of functions (or lambda formulas, as with qtl below). If a function returns several named values (as quantile() does), each returned name is appended to the name of the list element.

crosstable_options(funs=c("mean"=mean, "std dev"=sd, qtl=~quantile(.x, probs=c(0.25, 0.75))))
crosstable(mtcars2, mpg) %>% as_flextable()

                           value
Miles/(US) gallon (mpg)
  mean                     20.1
  std dev                  6.0
  qtl 25%                  15.4
  qtl 75%                  22.8
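
The combined labels come from the names of the vector that each function returns. A minimal base R sketch shows the names that quantile() contributes. It uses the built-in mtcars data rather than mtcars2, an assumption made so that the snippet runs without crosstable:

```r
# quantile() returns a named numeric vector; these names are what get
# appended to the "qtl" label in the table above.
q <- quantile(mtcars$mpg, probs = c(0.25, 0.75))
names(q)

# A single-valued function such as mean() returns an unnamed value,
# so only the list name ("mean") is displayed.
names(mean(mtcars$mpg))
```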

Another option is to provide a single custom function that returns several named statistics at once. By default, the name of the function is prepended to each label ("f Mean (SD)" in the first table below); give the function a blank name (" ") so that its internal labels are used directly.

f = function(x) c("Mean (SD)"=meansd(x), "Med [IQR]"=mediqr(x))
crosstable(mtcars2, wt, funs=f) %>% as_flextable()

                           value
Weight (1000 lbs) (wt)
  f Mean (SD)              3.22 (0.98)
  f Med [IQR]              3.33 [2.58;3.61]

crosstable(mtcars2, wt, funs=c(" "=f)) %>% as_flextable()

                           value
Weight (1000 lbs) (wt)
  Mean (SD)                3.22 (0.98)
  Med [IQR]                3.33 [2.58;3.61]

To this end, crosstable exports convenience functions: meansd(), meanCI(), mediqr(), minmax(), and nna().
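
If none of these helpers fits, a custom function only needs to return a named vector. The sketch below is a hypothetical example written in base R (range_cv is not part of crosstable), run on the built-in mtcars data:

```r
# Hypothetical summary function (not part of crosstable): returns two
# preformatted statistics as a named character vector.
range_cv <- function(x) {
  x <- x[!is.na(x)]
  c(
    "Range" = sprintf("%.1f to %.1f", min(x), max(x)),
    "CV (%)" = sprintf("%.1f", 100 * sd(x) / mean(x))
  )
}

# Calling the function directly shows the labels it would contribute
res <- range_cv(mtcars$wt)
names(res)
```

Following the pattern shown earlier, such a function could then be passed as funs=c(" "=range_cv).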

Customizing effects

When effect = TRUE, crosstable computes an effect comparing the levels of the by variable.

Effect calculation is controlled by the effect_args argument, which defaults to the result of crosstable_effect_args().

The function used for actual calculation depends on the type of variable being analyzed:

  • effect_summarize for numeric variables
  • effect_tabular for categorical variables
  • effect_survival for survival outcomes

By default, effect_tabular is set to effect_odds_ratio(), which computes an odds ratio for categorical variables.

mtcars2 %>% 
  crosstable(am, by=vs, effect=TRUE) %>% 
  as_flextable()

                     Engine
                     straight      vshaped       effect
Transmission (am)                                Odds ratio [95% Wald CI], ref='vshaped vs straight'
                                                 manual vs auto: 0.50 [0.11 to 2.08]
  auto               7 (36.84%)    12 (63.16%)
  manual             7 (53.85%)    6 (46.15%)

Suppose that instead of an odds ratio, you want to compute a difference in proportions.

To define a custom categorical effect, you need to write a function that takes:

  • x: the variable being summarized
  • by: the grouping variable
  • conf.level: the confidence level

and returns a list with the following elements:

  • summary: a data frame containing the effect label, estimate, and confidence interval.
    It can contain several rows if x has more than two levels.
  • effect.type: the name of the effect being computed
  • ref: the reference level or comparison label

The following example computes a difference in proportions and uses prop.test() to derive the confidence interval.

ct_effect_prop_diff = function(x, by, conf.level){
  tb = table(x, by) 
  # prop.test() compares, between the rows of tb (levels of x),
  # the proportion of observations in the first level of `by`
  test = prop.test(tb, conf.level=conf.level)
  nms = dimnames(tb)[["x"]] 
  # first level minus second level, the same direction as test$conf.int
  effect = unname(test$estimate[1] - test$estimate[2])
  effect.type = "Difference of proportions"
  reference = glue::glue(", {nms[1]} vs {nms[2]}")
  summary = data.frame(name = "Proportion difference", effect, 
                       ci_inf = test$conf.int[1], 
                       ci_sup = test$conf.int[2])
  list(summary = summary, ref = reference, effect.type = effect.type)
}
my_effect_args = crosstable_effect_args(effect_tabular=ct_effect_prop_diff)
  
# crosstable_options(effect_args=my_effect_args) #set globally if desired
mtcars2 %>% 
  crosstable(am, by=vs, effect=TRUE, effect_args=my_effect_args) %>% 
  as_flextable()

                     Engine
                     straight      vshaped       effect
Transmission (am)                                Difference of proportions, auto vs manual
                                                 Proportion difference: -0.17 [-0.58 to 0.24]
  auto               7 (36.84%)    12 (63.16%)
  manual             7 (53.85%)    6 (46.15%)

The same general approach can be used to define custom effects for numeric and survival variables.
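
As an illustration of the numeric case, here is a minimal sketch of a difference in means based on t.test(). It assumes that effect_summarize expects the same arguments and return structure as described above for effect_tabular. ct_effect_mean_diff is a hypothetical name, and the function only handles two groups.

```r
# Hypothetical numeric effect (assumes the same interface as the
# categorical example: arguments x, by, conf.level; returns a list).
ct_effect_mean_diff <- function(x, by, conf.level = 0.95) {
  by <- factor(by)
  stopifnot(nlevels(by) == 2)
  lv <- levels(by)
  # Welch t-test; its confidence interval is for mean(group 1) - mean(group 2)
  test <- t.test(x[by == lv[1]], x[by == lv[2]], conf.level = conf.level)
  # Same direction as the confidence interval
  effect <- unname(test$estimate[1] - test$estimate[2])
  summary <- data.frame(
    name = "Mean difference",
    effect = effect,
    ci_inf = test$conf.int[1],
    ci_sup = test$conf.int[2]
  )
  list(
    summary = summary,
    ref = paste0(", ", lv[1], " vs ", lv[2]),
    effect.type = "Difference in means (Welch t-test CI)"
  )
}

# Calling the function directly is a quick way to check its output
res <- ct_effect_mean_diff(mtcars$mpg, mtcars$vs)
res$summary
```

Under the stated assumption, it could then be supplied through crosstable_effect_args(effect_summarize=ct_effect_mean_diff).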

Several alternative effect functions are already implemented in crosstable.
See ?effect_summary, ?effect_tabular, and ?effect_survival for available options.

Customizing tests

Customizing statistical tests is even simpler: you specify the test function in crosstable_test_args(), using the appropriate argument depending on the variable type.

For example, the following code forces all tests for categorical variables to use Fisher’s exact test:

my_test_args = crosstable_test_args(test_tabular=fisher.test)
mtcars2 %>% 
  crosstable(am, by=vs, test=TRUE, test_args=my_test_args) %>% 
  as_flextable()
#> Warning in crosstable(., am, by = vs, test = TRUE, test_args = my_test_args): Be aware that automatic global testing should only be done in an exploratory
#> context, as it would cause extensive alpha inflation otherwise.
#> This warning is displayed once every 8 hours.

                     Engine
                     straight      vshaped       test
Transmission (am)                                p value: 0.4727
                                                 (Fisher's Exact Test for Count Data)
  auto               7 (36.84%)    12 (63.16%)
  manual             7 (53.85%)    6 (46.15%)

You can also provide your own custom test function. Such a function must return a list with two elements:

  • p.value: the p-value
  • method: the label displayed for the test
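
Base R tests return objects of class htest, which already contain these two elements; this is presumably why fisher.test can be passed directly in the example above. A quick check, using the built-in mtcars data instead of mtcars2 so that the snippet does not depend on crosstable:

```r
# fisher.test() returns an "htest" list that includes p.value and method
res <- fisher.test(factor(mtcars$am), factor(mtcars$vs))
c("p.value", "method") %in% names(res)
round(res$p.value, 4)
res$method
```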

For example, the function below performs a test for numeric variables using a linear model. In a two-group setting, this is conceptually close to a standard group comparison test, but it mainly serves here as an illustration of how custom testing logic can be integrated into crosstable.

ct_test_lm = function(x, by){
  fit = lm(x ~ by)
  pval = anova(fit)$`Pr(>F)`[1]
  list(p.value = pval, method = "Linear model ANOVA")
}

my_test_args = crosstable_test_args(test_summarize=ct_test_lm)
mtcars2 %>% 
  crosstable(mpg, by=vs, test=TRUE, test_args=my_test_args) %>% 
  as_flextable()

                           Engine
                           straight    vshaped    test
Miles/(US) gallon (mpg)                           p value: <0.0001
                                                  (Linear model ANOVA)
  mean                     24.6        16.6
  std dev                  5.4         3.9
  qtl 25%                  21.4        14.8
  qtl 75%                  29.6        19.1
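
To make the "conceptually close" remark concrete: with two groups, the F-test from the linear model gives the same p-value as a Student's t-test with equal variances. The sketch below repeats ct_test_lm so that it runs on its own, and uses the built-in mtcars data as a stand-in for mtcars2.

```r
# Same custom test as above, repeated so the snippet is self-contained
ct_test_lm <- function(x, by) {
  fit <- lm(x ~ by)
  pval <- anova(fit)$`Pr(>F)`[1]
  list(p.value = pval, method = "Linear model ANOVA")
}

by <- factor(mtcars$vs)
res_lm <- ct_test_lm(mtcars$mpg, by)

# Student's t-test with pooled variance (not the Welch default)
res_t <- t.test(mtcars$mpg ~ by, var.equal = TRUE)

# The two p-values agree up to numerical tolerance
all.equal(res_lm$p.value, res_t$p.value)
```

Calling a custom test function directly like this is also a convenient way to check its return value before passing it to crosstable_test_args().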