
This is a method for the dplyr::filter() generic. See the "Fallbacks" section for differences in implementation. The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA, the row is dropped, unlike base subsetting with [.
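The NA handling described above can be illustrated with a minimal sketch. The column names and values are made up for illustration, and the base R comparison uses a plain data.frame:

```r
library(duckplyr)

# Illustrative data: the second row has a missing value in `x`
df <- duckdb_tibble(x = c(1L, NA, 3L), y = c("a", "b", "c"))

# filter() drops the row where `x > 1` evaluates to NA
kept <- filter(df, x > 1)
nrow(kept)
#> [1] 1

# Base subsetting with `[` returns an extra all-NA row for the NA condition
base_df <- data.frame(x = c(1L, NA, 3L), y = c("a", "b", "c"))
base_kept <- base_df[base_df$x > 1, ]
nrow(base_kept)
#> [1] 2
```

To keep rows with a missing value, the condition has to say so explicitly, for example `is.na(x) | x > 1`.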

Usage

# S3 method for class 'duckplyr_df'
filter(.data, ..., .by = NULL, .preserve = FALSE)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Expressions that return a logical value, and are defined in terms of the variables in .data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.
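As a small sketch of how multiple expressions are combined, the two calls below should select the same rows. The data and the names `a` and `b` are illustrative only:

```r
library(duckplyr)

# Illustrative data
df <- duckdb_tibble(x = 1:5, y = 5:1)

# Two conditions passed as separate arguments ...
a <- filter(df, x >= 2, y >= 2)

# ... are equivalent to a single condition joined with &
b <- filter(df, x >= 2 & y >= 2)

# Both results keep the rows where x is 2, 3 or 4
nrow(a)
nrow(b)
```

Passing conditions as separate arguments is usually easier to read and to comment out one at a time than a long `&` chain.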

.by

[Experimental]

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.

.preserve

Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is.

Fallbacks

There is no DuckDB translation in filter.duckplyr_df()

  • with no filter conditions,

  • nor for a grouped operation (if .by is set).

These features fall back to dplyr::filter(); see vignette("fallback") for details.
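The grouped case listed above might look like the following sketch. The data are illustrative. According to the list above, this call is evaluated by dplyr::filter() rather than translated to DuckDB; whether a fallback message is shown depends on your configuration and is not demonstrated here:

```r
library(duckplyr)

# Illustrative data with a grouping column `g`
df <- duckdb_tibble(g = c("a", "a", "b"), x = c(1, 5, 3))

# Setting .by makes this a grouped operation, so it falls back to dplyr::filter()
out <- filter(df, x == max(x), .by = g)

# One row per group is kept: the row holding the group maximum of `x`
nrow(out)
#> [1] 2
```

The result is the same as with plain dplyr; only the execution path differs.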

Examples

library(duckplyr)

df <- duckdb_tibble(x = 1:3, y = 3:1)
filter(df, x >= 2)
#> # A duckplyr data frame: 2 variables
#>       x     y
#>   <int> <int>
#> 1     2     2
#> 2     3     1