Read A CSV Or Other Delimited File With Arrow
These functions use the Arrow C++ CSV reader to read into a data.frame. Arrow C++ options have been mapped to argument names that follow those of readr::read_delim(), and col_select was inspired by vroom::vroom().
read_delim_arrow(
  file,
  delim = ",",
  quote = "\"",
  escape_double = TRUE,
  escape_backslash = FALSE,
  col_names = TRUE,
  col_select = NULL,
  na = c("", "NA"),
  quoted_na = TRUE,
  skip_empty_rows = TRUE,
  skip = 0L,
  parse_options = NULL,
  convert_options = NULL,
  read_options = NULL,
  as_data_frame = TRUE
)

read_csv_arrow(
  file,
  quote = "\"",
  escape_double = TRUE,
  escape_backslash = FALSE,
  col_names = TRUE,
  col_select = NULL,
  na = c("", "NA"),
  quoted_na = TRUE,
  skip_empty_rows = TRUE,
  skip = 0L,
  parse_options = NULL,
  convert_options = NULL,
  read_options = NULL,
  as_data_frame = TRUE
)

read_tsv_arrow(
  file,
  quote = "\"",
  escape_double = TRUE,
  escape_backslash = FALSE,
  col_names = TRUE,
  col_select = NULL,
  na = c("", "NA"),
  quoted_na = TRUE,
  skip_empty_rows = TRUE,
  skip = 0L,
  parse_options = NULL,
  convert_options = NULL,
  read_options = NULL,
  as_data_frame = TRUE
)

Arguments
| Argument | Description |
|---|---|
| file | A character file name, raw vector, or an Arrow input stream. If a file name, a memory-mapped Arrow InputStream will be opened and closed when finished; compression will be detected from the file extension and handled automatically. If an input stream is provided, it will be left open. |
| delim | Single character used to separate fields within a record. |
| quote | Single character used to quote strings. |
| escape_double | Does the file escape quotes by doubling them? That is, if this option is TRUE, the value """" represents a single quote character, \". |
| escape_backslash | Does the file use backslashes to escape special characters? This is more general than escape_double as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \\n. |
| col_names | If TRUE, the first row of the input will be used as the column names and will not be included in the data frame. If FALSE, column names will be generated by Arrow, starting with "f0", "f1", ..., "fN". Alternatively, you can specify a character vector of column names. |
| col_select | A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select(). |
| na | A character vector of strings to interpret as missing values. |
| quoted_na | Should missing values inside quotes be treated as missing values (the default) or as strings? (Note that this is different from the Arrow C++ default for the corresponding convert option, strings_can_be_null.) |
| skip_empty_rows | Should blank rows be ignored altogether? If TRUE, blank rows will not be represented at all. If FALSE, they will be filled with missing values. |
| skip | Number of lines to skip before reading data. |
| parse_options | See file reader options. If given, this overrides any parsing options provided in other arguments (e.g. delim, quote, etc.). |
| convert_options | See file reader options. |
| read_options | See file reader options. |
| as_data_frame | Should the function return a data.frame (default) or an Arrow Table? |
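As an illustration of how several of these arguments combine, here is a minimal sketch. The pipe-delimited sample data, the replacement column names, and the use of "-" as a missing-value marker are all hypothetical choices made for the example, not part of the package.

```r
library(arrow)

# Hypothetical sample file: pipe-delimited, with "-" marking a missing value.
tf <- tempfile()
writeLines(c("id|score|label", "1|10|a", "2|-|b", "3|30|c"), tf)

df <- read_delim_arrow(
  tf,
  delim = "|",
  col_names = c("id", "score", "label"),  # supply our own column names ...
  skip = 1L,                               # ... and skip the file's header line
  na = c("", "NA", "-"),                   # also treat "-" as missing
  col_select = c("id", "score")            # keep only these two columns
)

unlink(tf)
```

Because col_names is given as a character vector, the first line of the file would otherwise be read as data, which is why skip = 1L is used here.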
Value
A data.frame, or a Table if as_data_frame = FALSE.
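The two return types can be compared with a short sketch like the one below. It reuses the mtcars data from the Examples section; the variable names are only illustrative.

```r
library(arrow)

tf <- tempfile()
write.csv(mtcars, file = tf)

# Default: the result is converted to an R data.frame.
df <- read_csv_arrow(tf)

# With as_data_frame = FALSE the data stays in an Arrow Table.
tab <- read_csv_arrow(tf, as_data_frame = FALSE)

# A Table can be converted to a data.frame later if needed.
df_from_tab <- as.data.frame(tab)

unlink(tf)
```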
Details
read_csv_arrow() and read_tsv_arrow() are wrappers around read_delim_arrow() that specify a delimiter.
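A small sketch of what this means in practice: assuming a tab-separated file, read_tsv_arrow() and read_delim_arrow() with delim = "\t" should give the same result. The sample file written here is only for illustration.

```r
library(arrow)

tf <- tempfile()
write.table(head(mtcars), tf, sep = "\t", row.names = FALSE, quote = FALSE)

# The wrapper fixes the delimiter to a tab ...
via_wrapper <- read_tsv_arrow(tf)

# ... which should match calling read_delim_arrow() with delim = "\t".
via_delim <- read_delim_arrow(tf, delim = "\t")

unlink(tf)
```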
Note that not all readr options are currently implemented here. Please file an issue if you encounter one that arrow should support.
If you need to control Arrow-specific reader parameters that don't have an equivalent in readr::read_csv(), you can either provide them in the parse_options, convert_options, or read_options arguments, or you can use CsvTableReader directly for lower-level access.
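As a hedged sketch of the first route, the example below passes a parse options object instead of the readr-style arguments. It assumes the installed arrow version provides CsvParseOptions$create() with a delimiter argument; check the file reader options documentation for your version. The semicolon-delimited sample data is hypothetical.

```r
library(arrow)

# Hypothetical sample file using ";" as the field separator.
tf <- tempfile()
writeLines(c("a;b", "1;2", "3;4"), tf)

# Assumption: CsvParseOptions$create() accepts `delimiter`.
# When parse_options is given, it overrides parsing arguments such as quote.
opts <- CsvParseOptions$create(delimiter = ";")
df <- read_csv_arrow(tf, parse_options = opts)

unlink(tf)
```

Passing an options object keeps the high-level function while still exposing Arrow-specific settings; CsvTableReader remains the option when more control over the reading process is needed.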
Examples
tf <- tempfile()
on.exit(unlink(tf))
write.csv(mtcars, file = tf)
df <- read_csv_arrow(tf)
dim(df)
#> [1] 32 12
# Can select columns
df <- read_csv_arrow(tf, col_select = starts_with("d"))