
Compares two data frames/tibbles (or two objects coercible to tibbles, such as matrices), optionally ignoring row and column ordering, and returns TRUE if both are equal or FALSE otherwise. In the latter case, if quiet = FALSE, information about the detected differences is printed to the console.

Usage

is_equal_df(
  x,
  y,
  ignore_col_order = FALSE,
  ignore_row_order = FALSE,
  ignore_col_types = FALSE,
  tolerance = NULL,
  quiet = TRUE,
  max_diffs = 10L,
  return_waldo_compare = FALSE
)

Arguments

x

The data frame / tibble to check for changes.

y

The data frame / tibble that x should be checked against, i.e. the reference, so messages describe how x differs from y.

ignore_col_order

Whether or not to ignore the order of columns.

ignore_row_order

Whether or not to ignore the order of rows.
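One way to picture what the two ordering flags do is to normalize both inputs before comparing them. The sketch below is an assumption, not pal's actual implementation: the hypothetical helper `normalize_order()` sorts columns by name and rows by the values of all columns, and then drops row names (so, for example, the row names of `mtcars` would be lost).

```r
# Hypothetical helper (not part of pal): bring a data frame into a canonical
# column and/or row order so that ordering differences no longer matter.
normalize_order <- function(df, ignore_col_order = FALSE, ignore_row_order = FALSE) {
  if (ignore_col_order) {
    # put columns in alphabetical order of their names
    df <- df[, sort(colnames(df)), drop = FALSE]
  }
  if (ignore_row_order) {
    # sort rows by all columns (first column first, ties broken by the next);
    # unname() prevents column names from matching order()'s own arguments
    df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
    # reset row names, which would otherwise record the old positions
    rownames(df) <- NULL
  }
  df
}

a <- data.frame(x = 1:3, y = c("a", "b", "c"))
b <- a[c(3, 1, 2), c("y", "x")]
identical(normalize_order(a, TRUE, TRUE), normalize_order(b, TRUE, TRUE))
```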

ignore_col_types

Whether or not to ignore differences between similar column types. Currently, if set to TRUE, factors are converted to characters and integers to doubles before the comparison.
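A minimal sketch of the conversion described for ignore_col_types = TRUE. The helper name `harmonize_col_types()` is hypothetical, and pal may implement this step differently.

```r
# Hypothetical helper (not part of pal): coerce factors to character and
# integers to double, leaving all other column types untouched.
harmonize_col_types <- function(df) {
  # df[] <- ... replaces the columns while keeping the data frame's attributes
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) {
      as.character(col)
    } else if (is.integer(col)) {
      as.double(col)
    } else {
      col
    }
  })
  df
}

df <- data.frame(a = factor("x"), b = 1L, c = TRUE)
str(harmonize_col_types(df))
```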

tolerance

If non-NULL, used as a threshold for ignoring small floating point differences when comparing numeric vectors. Using any non-NULL value will cause integer and double vectors to be compared based on their values, not their types, and will ignore the difference between NaN and NA_real_.

It uses the same algorithm as all.equal(), i.e., first x_diff and y_diff are generated by subsetting x and y to look only at locations with differences. Then it checks that mean(abs(x_diff - y_diff)) / mean(abs(y_diff)) (or just mean(abs(x_diff - y_diff)) if y_diff is small) is less than tolerance.
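The tolerance rule can be written out as a small R function. This is a sketch modelled on the description above (and on all.equal()), assuming numeric inputs of equal length without missing values; `within_tolerance()` is a hypothetical name, not part of pal or waldo.

```r
# Hypothetical helper (not part of pal or waldo): tolerance check in the
# style of all.equal(), assuming no NA/NaN values in x or y.
within_tolerance <- function(x, y, tolerance) {
  # only look at locations where x and y actually differ
  diff_idx <- which(x != y)
  if (length(diff_idx) == 0L) return(TRUE)
  x_diff <- x[diff_idx]
  y_diff <- y[diff_idx]
  avg_diff <- mean(abs(x_diff - y_diff))
  avg_y <- mean(abs(y_diff))
  # use the relative difference unless y_diff is small, in which case the
  # absolute difference is compared against tolerance instead
  if (is.finite(avg_y) && avg_y > tolerance) avg_diff <- avg_diff / avg_y
  avg_diff < tolerance
}

within_tolerance(c(1, 2, 3), c(1, 2, 3 + 1e-10), tolerance = 1e-8)
within_tolerance(c(1, 2, 3), c(1, 2, 4), tolerance = 0.1)
```

Here the second call returns FALSE, because the relative mean difference over the single differing location is 1 / 4 = 0.25, which exceeds 0.1.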

quiet

Whether or not to suppress printing the detected differences between x and y to the console.

max_diffs

Maximum number of differences shown. Only relevant if quiet = FALSE or return_waldo_compare = TRUE. Set max_diffs = Inf to see all differences.

return_waldo_compare

Whether to return a character vector of class waldo_compare describing the differences between x and y instead of TRUE or FALSE.

Value

If return_waldo_compare = FALSE, a logical scalar indicating the result of the comparison. Otherwise a character vector of class waldo_compare describing the differences between x and y.

Details

Under the hood, this function relies on waldo::compare().
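Because waldo::compare() returns a character vector of class waldo_compare that is empty when no differences are found, the TRUE/FALSE result can be thought of as a length check on that vector. The sketch below calls waldo directly with arguments that mirror those of is_equal_df() (tolerance, max_diffs); how pal maps its own arguments onto waldo's is an assumption here.

```r
x <- data.frame(a = c(1, 2), b = c("u", "v"))
y <- data.frame(a = c(1, 3), b = c("u", "v"))

# x_arg / y_arg label the two sides in the printed differences;
# y acts as the reference, as in is_equal_df()
diffs <- waldo::compare(x, y, x_arg = "x", y_arg = "y",
                        tolerance = NULL, max_diffs = 10)

# an empty comparison means "equal"; printing diffs shows what differs
is_equal <- length(diffs) == 0L
print(is_equal)
print(diffs)
```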

See also

Other data frame / tibble functions: assert_cols(), reduce_df_list()

Examples

scramble <- function(x) x[sample(nrow(x)), sample(ncol(x))]

# by default, ordering of rows and columns matters...
pal::is_equal_df(x = mtcars,
                 y = scramble(mtcars))
#> [1] FALSE

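# ...but each can be ignored if desired (since `scramble()` shuffles both rows
# and columns, ignoring only one of them still yields FALSE here)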
pal::is_equal_df(x = mtcars,
                 y = scramble(mtcars),
                 ignore_col_order = TRUE)
#> [1] FALSE
pal::is_equal_df(x = mtcars,
                 y = scramble(mtcars),
                 ignore_row_order = TRUE)
#> [1] FALSE

# by default, `is_equal_df()` is sensitive to column type differences...
df1 <- data.frame(x = "a",
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = factor("a"))
pal::is_equal_df(df1, df2)
#> [1] FALSE

# ...but you can ask it not to distinguish between similar types
pal::is_equal_df(df1, df2,
                 ignore_col_types = TRUE)
#> [1] TRUE