Using `percent_pattern`

When working with categorical variables, crosstable() offers very flexible output through the percent_pattern argument.

This vignette will review the many things you can do using percent_pattern.

Initialization

First, let’s add some missing values to the mtcars2 dataset and tweak some options:

library(crosstable)
mtcars3 = mtcars2
mtcars3$cyl[1:5] = NA
mtcars3$vs[5:12] = NA

crosstable_options(
  percent_digits=0
)

Default behaviour

By default, crosstable() uses percent_pattern="{n} ({p_row})", so each cell shows the count n followed by the row percentage p_row:

crosstable(mtcars3, cyl, by=vs) %>% as_flextable()

label	variable	Engine
label	variable	straight	vshaped	NA
Number of cylinders	4	7 (88%)	1 (12%)	2
	6	0 (0%)	1 (100%)	3
	8	0 (0%)	11 (100%)	2
	NA	2	2	1

Here, we will see how we can tweak percent_pattern in order to display other figures.

NOTE: Missing values are always described with n alone. If you want them described like non-missing values, you have to convert them into an explicit level first, most likely using forcats::fct_explicit_na().
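As an illustration of the note above, here is a minimal sketch that turns the missing values of vs into a regular level before calling crosstable(). The level name "(Missing)" is an arbitrary choice, and rebuilding the column this way may drop its label. In recent forcats versions, fct_explicit_na() is deprecated in favour of fct_na_value_to_level(), so you may see a warning.

```r
library(crosstable)

mtcars3 = mtcars2
mtcars3$cyl[1:5] = NA
mtcars3$vs[5:12] = NA

# Convert NA into a regular level so that it is described with the full
# pattern instead of n alone. "(Missing)" is an arbitrary level name.
mtcars3$vs = forcats::fct_explicit_na(as.character(mtcars3$vs),
                                      na_level="(Missing)")

ct = crosstable(mtcars3, cyl, by=vs, percent_digits=0)
ct
```

The missing values of cyl are left untouched here, so that row is still described with n alone.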

Allowed variables

First, here is the list of all the internal variables you can use:

n, n_row, n_col, and n_tot: respectively the size of the cell, the row, the column, and the whole table.
p_row, p_col, and p_tot: respectively the proportion relative to the row, the column, and the whole table.
p_tot_inf, p_tot_sup, p_row_inf, p_row_sup, p_col_inf, p_col_sup: the confidence interval (calculated using Wilson score) for each of the proportions above.
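For reference, the Wilson score interval can be sketched in base R as follows. This is only an illustration, assuming a 95% confidence level and no continuity correction; wilson_ci() is a hypothetical helper, not a crosstable function, and the package's internal computation may differ in its details.

```r
# Hypothetical helper (not part of crosstable): Wilson score interval
# for x successes out of n trials, without continuity correction.
wilson_ci = function(x, n, conf_level=0.95) {
  z = qnorm(1 - (1 - conf_level) / 2)
  p = x / n
  denom = 1 + z^2 / n
  center = (p + z^2 / (2 * n)) / denom
  half_width = z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  c(inf = center - half_width, sup = center + half_width)
}

# 7 straight engines among the 8 four-cylinder cars with a known engine type
ci = wilson_ci(7, 8)
round(100 * ci)
```

Under these assumptions the bounds round to 53% and 98%, which agrees with the row interval reported for that cell in the last section.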

Should you ever need it, you can also use any variable defined outside of crosstable() in the pattern.

Here is a simple example:

crosstable(mtcars3, cyl, by=vs, 
           percent_pattern="N={n}/{n_row} -> p={p_row}") %>% 
  as_flextable()

label	variable	Engine
label	variable	straight	vshaped	NA
Number of cylinders	4	N=7/8 -> p=88%	N=1/8 -> p=12%	2
	6	N=0/1 -> p=0%	N=1/1 -> p=100%	3
	8	N=0/11 -> p=0%	N=11/11 -> p=100%	2
	NA	2	2	1
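External variables can be used in the same way. The sketch below assumes a variable named my_unit (a made-up name) defined at the top level of the script; a distinctive name is safer, since a name that clashes with an internal variable such as n would not refer to your own value.

```r
library(crosstable)

mtcars3 = mtcars2
mtcars3$cyl[1:5] = NA
mtcars3$vs[5:12] = NA

# `my_unit` is a hypothetical variable defined outside of crosstable()
my_unit = "cars"

ct = crosstable(mtcars3, cyl, by=vs, percent_digits=0,
                percent_pattern="{n} {my_unit} ({p_row})")
ct
```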

Missing values

As you can see, these internal variables do not account for missing values (except for n, obviously).

This should make sense in most cases, but if it doesn’t, you can use the following variables to account for NA explicitly:

n_row_na, n_col_na, n_tot_na
p_tot_na, p_row_na, p_col_na
p_tot_na_inf, p_tot_na_sup, p_row_na_inf, p_row_na_sup, p_col_na_inf, p_col_na_sup

(See the last section for an example)

Note that if you use showNA="no", there will be no difference between the standard variables and the _na variables.
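To make the difference concrete, here is a small sketch that prints the row denominator and percentage both without and with missing values. The wording of the pattern is arbitrary; only the variable names come from the lists above.

```r
library(crosstable)

mtcars3 = mtcars2
mtcars3$cyl[1:5] = NA
mtcars3$vs[5:12] = NA

# Standard variables ignore NA; the `_na` variants include them
# in the denominator.
pattern = "{n}/{n_row} ({p_row}) | with NA: {n}/{n_row_na} ({p_row_na})"

ct = crosstable(mtcars3, cyl, by=vs, percent_digits=0,
                percent_pattern=pattern)
ct
```

For four-cylinder cars with a straight engine, the denominators should be 8 without NA and 10 with NA, in line with the figures shown in the last section.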

Proportions in totals

As you may have noticed, totals are considered separately:

crosstable(mtcars3, cyl, by=vs, total=TRUE, 
           percent_pattern="N={n}, p={p_row} ({n}/{n_row})") %>% 
  as_flextable()

label	variable	Engine			Total
label	variable	straight	vshaped	NA	Total
Number of cylinders	4	N=7, p=88% (7/8)	N=1, p=12% (1/8)	2	10 (37%)
	6	N=0, p=0% (0/1)	N=1, p=100% (1/1)	3	4 (15%)
	8	N=0, p=0% (0/11)	N=11, p=100% (11/11)	2	13 (48%)
	NA	2	2	1	5
	Total	9 (38%)	15 (62%)	8	32 (100%)

Indeed, the body pattern cannot be reused as-is for totals. For instance, a row percentage is meaningless in the Total column, which already holds the whole row.

To get control over the percent_pattern in totals, you have to pass a list with names body, total_row, total_col, and total_all:

pp = list(body="N={n}, p={p_tot} ({n}/{n_tot})", 
          total_row="N={n} p=({p_col})", 
          total_col="{n}", total_all="Total={n}")
crosstable(mtcars3, cyl, by=vs, total=TRUE, 
           percent_pattern=pp) %>% 
  as_flextable()

label	variable	Engine			Total
label	variable	straight	vshaped	NA	Total
Number of cylinders	4	N=7, p=35% (7/20)	N=1, p=5% (1/20)	2	10
	6	N=0, p=0% (0/20)	N=1, p=5% (1/20)	3	4
	8	N=0, p=0% (0/20)	N=11, p=55% (11/20)	2	13
	NA	2	2	1	5
	Total	N=9 p=(38%)	N=15 p=(62%)	8	Total=32

`get_percent_pattern()`

To easily get a percent_pattern list, you can use the get_percent_pattern() helper:

get_percent_pattern("all")
#> $body
#> [1] "{n} ({p_tot} / {p_row} / {p_col})"
#> 
#> $total_row
#> [1] "{n} ({p_col})"
#> 
#> $total_col
#> [1] "{n} ({p_row})"
#> 
#> $total_all
#> [1] "{n} ({p_tot})"
get_percent_pattern("col", na=TRUE)
#> $body
#> [1] "{n} ({p_col_na})"
#> 
#> $total_row
#> [1] "{n} ({p_col_na})"
#> 
#> $total_col
#> [1] "{n} ({p_row_na})"
#> 
#> $total_all
#> [1] "{n} ({p_tot_na})"

You can also set the result to a variable and modify its members at will. See ?get_percent_pattern for more information.
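As a minimal sketch of that workflow, the example below starts from the column-wise pattern and overrides a single member. The replacement text "Total={n}" is just an illustration.

```r
library(crosstable)

mtcars3 = mtcars2
mtcars3$cyl[1:5] = NA
mtcars3$vs[5:12] = NA

# Start from a ready-made list, then change only the grand total
pp = get_percent_pattern("col")
pp$total_all = "Total={n}"

ct = crosstable(mtcars3, cyl, by=vs, total=TRUE, percent_digits=0,
                percent_pattern=pp)
ct
```

Starting from the helper keeps the four list members consistent with each other, so only the pieces you want to customise need to be written by hand.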

Ultimate example

Here is the ultimate example for percent_pattern. Take a close look at all the possible values and you will surely find the one you need.

ULTIMATE_PATTERN=list(
  body="N={n}
        Cell: p = {p_tot} ({n}/{n_tot}) [{p_tot_inf}; {p_tot_sup}]
        Col: p = {p_col} ({n}/{n_col}) [{p_col_inf}; {p_col_sup}]
        Row: p = {p_row} ({n}/{n_row}) [{p_row_inf}; {p_row_sup}]
        
        Cell (NA): p = {p_tot_na} ({n}/{n_tot_na}) [{p_tot_na_inf}; {p_tot_na_sup}]
        Col (NA): p = {p_col_na} ({n}/{n_col_na}) [{p_col_na_inf}; {p_col_na_sup}]
        Row (NA): p = {p_row_na} ({n}/{n_row_na}) [{p_row_na_inf}; {p_row_na_sup}]",
  total_row="N={n}
             Row: p = {p_row} ({n}/{n_row}) [{p_row_inf}; {p_row_sup}]
             Row (NA): p = {p_row_na} ({n}/{n_row_na}) [{p_row_na_inf}; {p_row_na_sup}]",
  total_col="N={n}
             Col: p = {p_col} ({n}/{n_col}) [{p_col_inf}; {p_col_sup}]
             Col (NA): p = {p_col_na} ({n}/{n_col_na}) [{p_col_na_inf}; {p_col_na_sup}]",
  total_all="N={n}
             P: {p_col} [{p_col_inf}; {p_col_sup}]
             P (NA): {p_col_na} [{p_col_na_inf}; {p_col_na_sup}]"
)

crosstable(mtcars3, cyl, by=vs,
           percent_digits=0, total=TRUE, showNA="always",
           percent_pattern=ULTIMATE_PATTERN) %>% 
  as_flextable() %>% 
  flextable::theme_box()

label	variable	Engine			Total
label	variable	straight	vshaped	NA	Total
Number of cylinders	4	N=7 Cell: p = 35% (7/20) [2e+01%; 57%] Col: p = 100% (7/7) [65%; 100%] Row: p = 88% (7/8) [53%; 98%] Cell (NA): p = 22% (7/32) [11%; 39%] Col (NA): p = 78% (7/9) [45%; 94%] Row (NA): p = 70% (7/10) [40%; 89%]	N=1 Cell: p = 5% (1/20) [9e-01%; 24%] Col: p = 8% (1/13) [1%; 33%] Row: p = 12% (1/8) [2%; 47%] Cell (NA): p = 3% (1/32) [1%; 16%] Col (NA): p = 7% (1/15) [1%; 30%] Row (NA): p = 10% (1/10) [2%; 40%]	2	N=10 Col: p = 37% (10/27) [22%; 56%] Col (NA): p = 31% (10/32) [18%; 49%]
	6	N=0 Cell: p = 0% (0/20) [1e-15%; 16%] Col: p = 0% (0/7) [0%; 35%] Row: p = 0% (0/1) [0%; 79%] Cell (NA): p = 0% (0/32) [0%; 11%] Col (NA): p = 0% (0/9) [0%; 30%] Row (NA): p = 0% (0/4) [0%; 49%]	N=1 Cell: p = 5% (1/20) [9e-01%; 24%] Col: p = 8% (1/13) [1%; 33%] Row: p = 100% (1/1) [21%; 100%] Cell (NA): p = 3% (1/32) [1%; 16%] Col (NA): p = 7% (1/15) [1%; 30%] Row (NA): p = 25% (1/4) [5%; 70%]	3	N=4 Col: p = 15% (4/27) [6%; 32%] Col (NA): p = 12% (4/32) [5%; 28%]
	8	N=0 Cell: p = 0% (0/20) [1e-15%; 16%] Col: p = 0% (0/7) [0%; 35%] Row: p = 0% (0/11) [0%; 26%] Cell (NA): p = 0% (0/32) [0%; 11%] Col (NA): p = 0% (0/9) [0%; 30%] Row (NA): p = 0% (0/13) [0%; 23%]	N=11 Cell: p = 55% (11/20) [3e+01%; 74%] Col: p = 85% (11/13) [58%; 96%] Row: p = 100% (11/11) [74%; 100%] Cell (NA): p = 34% (11/32) [20%; 52%] Col (NA): p = 73% (11/15) [48%; 89%] Row (NA): p = 85% (11/13) [58%; 96%]	2	N=13 Col: p = 48% (13/27) [31%; 66%] Col (NA): p = 41% (13/32) [26%; 58%]
	NA	2	2	1	5
	Total	N=9 Row: p = 38% (9/24) [21%; 57%] Row (NA): p = 28% (9/32) [16%; 45%]	N=15 Row: p = 62% (15/24) [43%; 79%] Row (NA): p = 47% (15/32) [31%; 64%]	8	N=32 P: 100% [89%; 100%] P (NA): 100% [89%; 100%]