Crosstable is a package centered on a single function, crosstable, which easily computes descriptive statistics on datasets.

It can use the tidyselect syntax for selecting variables (and more) and integrates with the officer package to create automated reports.

Installation

# Install latest version on CRAN (v0.2.1)
install.packages("crosstable")

# Install latest version on GitHub (fewer bugs, more features)
devtools::install_github("DanChaltiel/crosstable", build_vignettes=TRUE)

However, for reproducibility, you might want to install a specific version in your projects (using renv would be even better):

devtools::install_github("DanChaltiel/crosstable@v0.2.1-CRAN", build_vignettes=TRUE) #last version
devtools::install_github("DanChaltiel/crosstable@e7ca7ab", build_vignettes=TRUE) #last commit
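
For the renv route mentioned above, a minimal sketch could look like the following. It assumes the renv package is already installed and that the project has not been initialised yet; the tag is the same one used in the commands above.

```r
# Sketch only: pin crosstable in a project-local library with renv.
# Assumes renv is installed and the working directory is the project root.
renv::init()                                         # create the project library and lockfile
renv::install("DanChaltiel/crosstable@v0.2.1-CRAN")  # install the tagged GitHub version
renv::snapshot()                                     # record the exact versions in renv.lock
```

On another machine, calling renv::restore() in the same project reinstalls the versions recorded in the lockfile.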

If you run into any installation problem, try the solutions proposed in this article or file an issue.

Getting help

See the package vignettes for detailed documentation.
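
The vignettes can also be reached from within R. The sketch below uses only the standard utils helpers and assumes crosstable is installed with its vignettes built (for a GitHub install, that means build_vignettes=TRUE as in the commands above).

```r
# List the vignettes shipped with the installed package
vignette(package = "crosstable")

# Open the vignette index in a web browser
browseVignettes(package = "crosstable")

# Help page of the main function
help("crosstable", package = "crosstable")
```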


Base usage

With no argument other than the dataset, the function summarises all numeric variables with four sets of statistics (min/max, mean/sd, median/IQR, N/NA) and all categorical variables with counts and percentages.

library(crosstable)
library(dplyr) #for the pipe

crosstable(iris)
#>             .id        label   variable         value
#> 1  Sepal.Length Sepal.Length  Min / Max     4.3 / 7.9
#> 2  Sepal.Length Sepal.Length  Med [IQR] 5.8 [5.1;6.4]
#> 3  Sepal.Length Sepal.Length Mean (std)     5.8 (0.8)
#> 4  Sepal.Length Sepal.Length     N (NA)       150 (0)
#> 5   Sepal.Width  Sepal.Width  Min / Max     2.0 / 4.4
#> 6   Sepal.Width  Sepal.Width  Med [IQR] 3.0 [2.8;3.3]
#> 7   Sepal.Width  Sepal.Width Mean (std)     3.1 (0.4)
#> 8   Sepal.Width  Sepal.Width     N (NA)       150 (0)
#> 9  Petal.Length Petal.Length  Min / Max     1.0 / 6.9
#> 10 Petal.Length Petal.Length  Med [IQR] 4.3 [1.6;5.1]
#> 11 Petal.Length Petal.Length Mean (std)     3.8 (1.8)
#> 12 Petal.Length Petal.Length     N (NA)       150 (0)
#> 13  Petal.Width  Petal.Width  Min / Max     0.1 / 2.5
#> 14  Petal.Width  Petal.Width  Med [IQR] 1.3 [0.3;1.8]
#> 15  Petal.Width  Petal.Width Mean (std)     1.2 (0.8)
#> 16  Petal.Width  Petal.Width     N (NA)       150 (0)
#> 17      Species      Species     setosa   50 (33.33%)
#> 18      Species      Species versicolor   50 (33.33%)
#> 19      Species      Species  virginica   50 (33.33%)
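
To make the default statistics concrete, here is a base-R sketch that recomputes them for one numeric column and one categorical column of iris. It is not crosstable's internal code, and treating N as the number of non-missing values is an assumption (iris has no missing values, so the result is the same either way).

```r
# Base-R sketch of the default summaries; not crosstable's implementation.
x <- iris$Sepal.Length

min_max <- range(x, na.rm = TRUE)                          # Min / Max
med_iqr <- quantile(x, c(0.5, 0.25, 0.75), na.rm = TRUE)   # Med [IQR]
mean_sd <- c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))   # Mean (std)
n_na    <- c(sum(!is.na(x)), sum(is.na(x)))                # N (NA), assuming N = non-missing

round(min_max, 1)          # 4.3 7.9
round(unname(med_iqr), 1)  # 5.8 5.1 6.4
round(mean_sd, 1)          # 5.8 0.8
n_na                       # 150 0

# Counts and percentages for a categorical column
counts <- table(iris$Species)
pct <- round(100 * prop.table(counts), 2)
pct                        # 33.33 for each species
```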

Column specification, grouping and labels

You can select specific columns using names and helper functions, and request specific summary statistics using funs and funs_arg. The by argument lets you specify a grouping variable; the following example groups by the column vs, which is the “Engine” type. Note that, as the mtcars2 dataset has labels, they are also included in the crosstable.

The as_flextable function outputs an HTML table that can be customized at will (see the flextable package) and embedded in a Word document (see the officer package).

ct1 = crosstable(mtcars2, c(qsec, ends_with("t"), starts_with("c")), by=vs,
                 funs=c(mean, quantile), funs_arg=list(probs=c(.25,.75), digits=3))
ct1 %>% as_flextable(keep_id=TRUE)
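
Embedding the table in a Word document could be sketched as follows, using the general-purpose officer and flextable functions. The file name is a placeholder, the smaller crosstable is built only to keep the snippet self-contained, and this is one possible route rather than the only one the package supports.

```r
# Sketch: write a crosstable to a Word file through flextable and officer.
# "crosstable_report.docx" is a placeholder file name.
library(crosstable)
library(flextable)
library(officer)

ct <- crosstable(mtcars2, c(disp, qsec), by = vs)  # small table for illustration
ft <- as_flextable(ct)                             # convert to a flextable

doc <- read_docx()                                 # empty document from officer's default template
doc <- body_add_flextable(doc, value = ft)         # append the table to the document body
print(doc, target = "crosstable_report.docx")      # write the file to disk
```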

Margins and totals

The margin argument changes how percentages are calculated, while the total argument adds total rows or columns.

#margin and totals
ct2 = crosstable(mtcars2, c(disp, vs), by=am, margin=c("row", "col"), total="both")
ct2 %>% as_flextable
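
The difference between row and column percentages can be illustrated in base R on the stock mtcars data. This is only a sketch of the underlying arithmetic, not of how crosstable computes or formats its cells.

```r
# Base-R sketch of row vs column percentages and totals.
tab <- table(vs = mtcars$vs, am = mtcars$am)

row_pct <- round(100 * prop.table(tab, margin = 1), 1)  # each row sums to 100
col_pct <- round(100 * prop.table(tab, margin = 2), 1)  # each column sums to 100
with_totals <- addmargins(tab)                          # adds a "Sum" row and a "Sum" column

row_pct
col_pct
with_totals
```

The same count therefore yields two different percentages depending on the margin: the 12 cars with vs=0 and am=0 are 66.7% of their row (18 cars) but 63.2% of their column (19 cars).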

Predicate functions, automatic testing

For variable selection, you can use predicate functions. It is good practice to wrap these in where. If the grouping variable is numeric, correlation coefficients will be calculated.

Using the test argument, you can test each variable against the grouping variable. Beware: automatic testing should only be done in an exploratory context, as it would otherwise cause extensive alpha inflation.

ct3 = crosstable(mtcars2, where(is.numeric), by=hp, test=TRUE)
ct3 %>% as_flextable
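
The size of the alpha inflation can be estimated with a short calculation. The sketch below assumes k independent tests, each run at level alpha, which is a simplification since variables from the same dataset are usually correlated.

```r
# Family-wise error rate for k independent tests at level alpha:
# P(at least one false positive) = 1 - (1 - alpha)^k
alpha <- 0.05
k <- c(1, 5, 10, 20)
fwer <- 1 - (1 - alpha)^k
round(fwer, 3)
# 0.050 0.226 0.401 0.642
```

Under these assumptions, ten automatic tests already carry about a 40% chance of at least one spurious "significant" result.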

Lambda functions, effect size calculation

The predicate function can be a lambda function, using .x as the variable name.

Using the effect argument, you can calculate effect sizes for all numeric variables and for categorical variables with exactly 2 levels.

ct4 = crosstable(mtcars2, where(~is.numeric(.x) && mean(.x)>50), by=vs, effect=TRUE)
ct4 %>% as_flextable
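
To see which columns such a predicate keeps, the same rule can be applied by hand in base R. The sketch runs on the stock mtcars data rather than mtcars2, so that it needs no package.

```r
# Base-R sketch of the selection rule: numeric columns whose mean exceeds 50.
# && short-circuits, so mean() is never called on a non-numeric column.
keep <- vapply(mtcars, function(x) is.numeric(x) && mean(x) > 50, logical(1))
names(mtcars)[keep]
# "disp" "hp"
```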

Formula syntax, survival variables

Finally, you can describe survival data using the Surv object from the survival package. The times and followup arguments allow for more control.

This is only possible using the formula syntax of variable selection, which allows more complex selection and is written as var1 + var2 ~ group.

library(survival) #for Surv and the aml dataset
ct5 = crosstable(aml, Surv(time, status) ~ x, times=c(0,15,30,150), followup=TRUE)
ct5 %>% as_flextable
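
For comparison, survival probabilities at fixed time points can be obtained with the survival package directly. This sketch is not crosstable's internal code and only assumes that the survival package is installed.

```r
# Sketch with the survival package alone.
library(survival)

# Kaplan-Meier estimate per level of x in the aml dataset
fit <- survfit(Surv(time, status) ~ x, data = aml)

# Survival at the requested time points; extend = TRUE keeps
# time points that lie beyond a group's last observation.
summary(fit, times = c(0, 15, 30, 150), extend = TRUE)
```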

Acknowledgement

crosstable is a rewrite of the awesome biostat2 package written by David Hajage. The user interface is quite different but the concept is the same.

Thanks David!