These functions ingest data from a file. In many cases, they return immediately because only the metadata is read up front; the data itself is read later, when it is first processed.
read_parquet_duckdb() reads a Parquet file using DuckDB's read_parquet() table function.
read_csv_duckdb() reads a CSV file using DuckDB's read_csv_auto() table function.
read_json_duckdb() reads a JSON file using DuckDB's read_json() table function.
read_file_duckdb() uses arbitrary readers to read data.
See https://duckdb.org/docs/data/overview for documentation of the available functions and their options.
To read multiple files with the same schema, pass a wildcard or a character vector of paths to the path argument.
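As a quick sketch of the multi-file behavior (not part of the shipped examples; the temporary directory and file names are made up for illustration), two small CSV files with the same schema can be combined via a wildcard or a character vector of paths:
# Sketch: write two CSV files with the same schema to a temporary directory
dir <- tempfile("duckplyr_multi_")
dir.create(dir)
write.csv(data.frame(a = 1:2, b = c("x", "y")), file.path(dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:4, b = c("z", "w")), file.path(dir, "part2.csv"), row.names = FALSE)
# Read both files as a single duckplyr frame via a wildcard ...
read_csv_duckdb(file.path(dir, "*.csv"))
# ... or via an explicit character vector of paths
read_csv_duckdb(file.path(dir, c("part1.csv", "part2.csv")))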
Usage
read_parquet_duckdb(
path,
...,
prudence = c("thrifty", "lavish", "stingy"),
options = list()
)
read_csv_duckdb(
path,
...,
prudence = c("thrifty", "lavish", "stingy"),
options = list()
)
read_json_duckdb(
path,
...,
prudence = c("thrifty", "lavish", "stingy"),
options = list()
)
read_file_duckdb(
path,
table_function,
...,
prudence = c("thrifty", "lavish", "stingy"),
options = list()
)
Arguments
- path
Path to files. Glob patterns * and ? are supported.
- ...
These dots are for future extensions and must be empty.
- prudence
Memory protection: controls whether DuckDB may convert intermediate results in DuckDB-managed memory to data frames in R memory.
"thrifty": up to a maximum size of 1 million cells,
"lavish": regardless of size,
"stingy": never.
The default is "thrifty" for the ingestion functions, and may be different for other functions. See vignette("prudence") for more information.
- options
Arguments to the DuckDB function indicated by table_function.
- table_function
The name of a table-valued DuckDB function such as "read_parquet", "read_csv", "read_csv_auto" or "read_json".
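A minimal sketch of calling a reader by name with read_file_duckdb() (assuming a small throwaway CSV file like the one created in the Examples below; the option name follows DuckDB's read_csv documentation):
# Sketch: pass the DuckDB table function name explicitly
path <- tempfile("duckplyr_tf_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)
read_file_duckdb(path, "read_csv_auto")
# Options are forwarded to the table function
read_file_duckdb(path, "read_csv", options = list(delim = ","))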
Value
A duckplyr frame; see as_duckdb_tibble() for details.
Fine-tuning prudence
The prudence argument can also be a named numeric vector with at least one of cells or rows, limiting the cells (values) and rows in the resulting data frame after automatic materialization. If both limits are specified, both are enforced. The equivalent of "thrifty" is c(cells = 1e6).
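A sketch of a fine-tuned limit (the numbers below are arbitrary illustration values, and the file is a throwaway copy of the CSV used in the Examples):
# Sketch: allow automatic materialization only up to 100 rows and 1000 cells
path <- tempfile("duckplyr_prudence_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)
df <- read_csv_duckdb(path, prudence = c(rows = 100, cells = 1000))
# The result has 3 rows and 6 cells, well within both limits,
# so accessing a column materializes it automatically
df$a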
Examples
# Create simple CSV file
path <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)
# Reading is immediate
df <- read_csv_duckdb(path)
# Names are always available
names(df)
#> [1] "a" "b"
# With the default prudence ("thrifty"), small results materialize upon access
df$a
#> [1] 1 2 3
# Materialize explicitly
collect(df)$a
#> [1] 1 2 3
# Automatic materialization with prudence = "lavish"
df <- read_csv_duckdb(path, prudence = "lavish")
df$a
#> [1] 1 2 3
# Specify column types
read_csv_duckdb(
path,
options = list(delim = ",", types = list(c("DOUBLE", "VARCHAR")))
)
#> # A duckplyr data frame: 2 variables
#> a b
#> <dbl> <chr>
#> 1 1 d
#> 2 2 e
#> 3 3 f
# Create and read a simple JSON file
path <- tempfile("duckplyr_test_", fileext = ".json")
writeLines('[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]', path)
# Reading needs the json extension
db_exec("INSTALL json")
db_exec("LOAD json")
read_json_duckdb(path)
#> # A duckplyr data frame: 2 variables
#> a b
#> <dbl> <chr>
#> 1 1 x
#> 2 2 y