Read all .csv
files in a directory, with labels if specified.
Usage
read_all_csv(
path,
...,
labels_from = NULL,
format_file = NULL,
subdirectories = FALSE,
read_fun = "guess",
datetime_extraction = "guess",
verbose = getOption("edc_read_verbose", 1),
clean_names_fun = NULL
)
Arguments
- path
[
character(1)
]
path to the directory containing.csv
files.- ...
unused
- labels_from
[
character(1)
]
path to the file containing the labels. See section "Labels file" below.- format_file
[
character(1)
]
the path to the file that should be used to apply formats. See section "Format file" below. UseNULL
to not apply formats.- subdirectories
[
logical(1)
]
whether to read subdirectories- read_fun
[
function
]
if "guess" doesn't work properly, a function to read the files in path, e.g.read.csv
,read.csv2
,...- datetime_extraction
[
POSIXt(1)
]
the datetime of the data extraction. Default to the most common date of last modification inpath
.- verbose
[
numeric(1)
]
one ofc(0, 1, 2)
. The higher, the more information will be printed.- clean_names_fun
use
edc_clean_names()
instead.
Value
a list containing one dataframe for each .csv
file in the folder, the extraction date (datetime_extraction
), and a summary of all imported tables (.lookup
).
Labels file
labels_from
should contain the information about column labels. It should be a data file (.csv
) containing 2 columns: one for the column name and the other for its associated label. Use options(edc_col_name="xxx", edc_col_label="xxx")
to specify the names of the columns.
Format file
format_file
should contain the information about SAS formats. It can be either:
a
procformat.sas
file, containing the whole PROC FORMATor a data file (.csv or .sas7bdat) containing 3 columns:
FMTNAME
the SAS format name (repeated)START
the variable levelLABEL
the label associated to the level
You can get this datafile from SAS using
PROC FORMAT
with optionCNTLOUT
. Otherwise, you can useoptions(edc_var_format_name="xxx", edc_var_level="xxx", edc_var_label="xxx")
to specify different column names.
See also
Other EDCimport reading functions:
read_all_sas()
,
read_all_xpt()
,
read_trialmaster()
Examples
# Create a directory with multiple csv files and a label lookup.
path = paste0(tempdir(), "/read_all_csv")
dir.create(paste0(path, "/subdir"), recursive=TRUE)
write.csv(iris, paste0(path, "/iris.csv"))
write.csv(mtcars, paste0(path, "/mtcars.csv"))
write.csv(mtcars, paste0(path, "/subdir/mtcars.csv"))
write.csv(airquality, paste0(path, "/airquality.csv"))
labs = c(iris, mtcars, airquality) %>% names()
write.csv(data.frame(name=labs, label=toupper(labs)), paste0(path, "/labels.csv"))
db = read_all_csv(path, labels_from="labels.csv", subdirectories=TRUE) %>%
set_project_name("My great project")
#> Warning: Option "edc_lookup" has been overwritten.
db
#> ── EDCimport database ──────────────────────────────────────────────────────────
#> Contains 4 tables: `airquality`, `iris`, `mtcars`, and `subdir_mtcars`
#> ℹ Use `EDCimport::load_database(db)` to load the tables in the global
#> environment.
#> ℹ Use `EDCimport::edc_lookup()` to see the summary table.
edc_lookup()
#> ── Lookup table - My great project (extraction of 2025-02-19) - EDCimport v0.5.2
#> dataset nrow ncol n_id rows_per_id crfname
#> <chr> <dbl> <dbl> <int> <dbl> <chr>
#> 1 airquality 153 7 0 NA NA
#> 2 iris 150 6 0 NA NA
#> 3 mtcars 32 12 0 NA NA
#> 4 subdir_mtcars 32 12 0 NA NA