10 vroom: Fast reading of delimited files | R for Data Science
vroom (Hester and Wickham 2019) reads delimited files such as CSV into tibbles and is designed for speed. Package website: https://vroom.r-lib.org/
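Here is a minimal, self-contained sketch of the basic call. It reads a few lines of comma-separated text instead of a file. The data and the names `csv_text` and `scores` are made up for illustration. Wrapping the string in `I()` tells vroom to treat it as literal data rather than a file path, which is assumed here to be the behaviour of vroom 1.5.0 or later.

```r
library(vroom)

# Hypothetical inline data; I() marks the string as literal data, not a file path
csv_text <- I("name,score\nalice,90\nbob,85\n")

# delim is given explicitly; if omitted, vroom tries to guess it.
# show_col_types = FALSE suppresses the column-specification message.
scores <- vroom(csv_text, delim = ",", show_col_types = FALSE)
scores
```

The result is a tibble with a character column `name` and a double column `score`.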
`vroom_example()` returns the path to an example file bundled with vroom. Reading it with `vroom()` guesses the type of each column:

```r
library(vroom)

# Path to the mtcars.csv example file shipped with vroom
file_path <- vroom_example("mtcars.csv")
vroom(file_path)
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows
```

`spec()` shows the column specification that vroom used:

```r
spec(vroom(file_path))
#> cols(
#>   model = col_character(),
#>   mpg = col_double(),
#>   cyl = col_double(),
#>   disp = col_double(),
#>   hp = col_double(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_double(),
#>   am = col_double(),
#>   gear = col_double(),
#>   carb = col_double()
#> )
```

Compressed files are decompressed automatically, so the zipped copy of the same file gives the same result:

```r
compressed <- vroom_example("mtcars.csv.zip")
vroom(compressed)
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows
```

`col_select` keeps only the named columns:

```r
vroom(compressed, col_select = c(model, cyl, gear))
#> # A tibble: 32 x 3
#>   model               cyl  gear
#>   <chr>             <dbl> <dbl>
#> 1 Mazda RX4             6     4
#> 2 Mazda RX4 Wag         6     4
#> 3 Datsun 710            4     4
#> 4 Hornet 4 Drive        6     3
#> 5 Hornet Sportabout     8     3
#> 6 Valiant               6     3
#> # ... with 26 more rows
```

For comparison, the built-in `mtcars` data frame stores the car models as row names, whereas the CSV version holds them in a `model` column:

```r
mtcars
#>                      mpg cyl  disp  hp drat   wt qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.62 16.5  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.88 17.0  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.32 18.6  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.21 19.4  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.44 17.0  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.46 20.2  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.57 15.8  0  0    3    4
#> Merc 240D           24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.25 18.0  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.42 17.8  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.34 17.4  0  0    3    4
#> Fiat 128            32.4   4  78.7  66 4.08 2.20 19.5  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.61 18.5  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.83 19.9  1  1    4    1
#> Toyota Corona       21.5   4 120.1  97 3.70 2.46 20.0  1  0    3    1
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.52 16.9  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.44 17.3  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.84 15.4  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.85 17.1  0  0    3    2
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.94 18.9  1  1    4    1
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.51 16.9  1  1    5    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.17 14.5  0  1    5    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
#> Maserati Bora       15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8
#> Volvo 142E          21.4   4 121.0 109 4.11 2.78 18.6  1  1    4    2
```
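The `spec()` output is the column specification that vroom guessed. The following sketch assumes the same example file. It first passes a specification back through `col_types`, so the types are declared rather than guessed. It then uses a richer `col_select` that renames `model` to `car` (a name chosen only for illustration) and picks columns with the tidyselect helper `starts_with()`. The object names `cars` and `cars_small` are also hypothetical.

```r
library(vroom)

file_path <- vroom_example("mtcars.csv")

# Declare the column types up front instead of letting vroom guess them:
# `model` is character and every other column defaults to double.
cars <- vroom(
  file_path,
  col_types = cols(model = col_character(), .default = col_double())
)

# col_select uses the dplyr::select() mini-language: `car = model` renames
# a column and starts_with("c") keeps the columns whose names begin with "c".
cars_small <- vroom(
  file_path,
  col_select = c(car = model, starts_with("c")),
  show_col_types = FALSE
)
cars_small
```

Assuming the column order shown above, `cars_small` should contain the columns `car`, `cyl` and `carb`, in that order, for all 32 rows.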