10 Vroom: Fast Reading Of Delimited Files | R For Data Science

R for data science: tidyverse and beyond

10 vroom: Fast reading of delimited files

vroom (Hester and Wickham 2019) is an R package for fast reading of delimited files.

https://vroom.r-lib.org/

    library(vroom)

    # Path to an example file bundled with the package
    file_path <- vroom_example("mtcars.csv")

    # The delimiter and column types are guessed from the file
    vroom(file_path)
    #> # A tibble: 32 x 12
    #>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
    #> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
    #> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
    #> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
    #> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
    #> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
    #> # ... with 26 more rows

    # Inspect the column specification that was guessed
    spec(vroom(file_path))
    #> cols(
    #>   model = col_character(),
    #>   mpg = col_double(),
    #>   cyl = col_double(),
    #>   disp = col_double(),
    #>   hp = col_double(),
    #>   drat = col_double(),
    #>   wt = col_double(),
    #>   qsec = col_double(),
    #>   vs = col_double(),
    #>   am = col_double(),
    #>   gear = col_double(),
    #>   carb = col_double()
    #> )

    # Compressed files can be read directly
    compressed <- vroom_example("mtcars.csv.zip")
    vroom(compressed)
    #> # A tibble: 32 x 12
    #>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
    #> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
    #> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
    #> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
    #> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
    #> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
    #> # ... with 26 more rows

    # Read only the columns that are needed
    vroom(compressed, col_select = c(model, cyl, gear))
    #> # A tibble: 32 x 3
    #>   model               cyl  gear
    #>   <chr>             <dbl> <dbl>
    #> 1 Mazda RX4             6     4
    #> 2 Mazda RX4 Wag         6     4
    #> 3 Datsun 710            4     4
    #> 4 Hornet 4 Drive        6     3
    #> 5 Hornet Sportabout     8     3
    #> 6 Valiant               6     3
    #> # ... with 26 more rows

    # For comparison, the built-in mtcars data frame
    mtcars
    #>                      mpg cyl  disp  hp drat   wt qsec vs am gear carb
    #> Mazda RX4           21.0   6 160.0 110 3.90 2.62 16.5  0  1    4    4
    #> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.88 17.0  0  1    4    4
    #> Datsun 710          22.8   4 108.0  93 3.85 2.32 18.6  1  1    4    1
    #> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.21 19.4  1  0    3    1
    #> Hornet Sportabout   18.7   8 360.0 175 3.15 3.44 17.0  0  0    3    2
    #> Valiant             18.1   6 225.0 105 2.76 3.46 20.2  1  0    3    1
    #> Duster 360          14.3   8 360.0 245 3.21 3.57 15.8  0  0    3    4
    #> Merc 240D           24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
    #> Merc 230            22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
    #> Merc 280            19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
    #> Merc 280C           17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
    #> Merc 450SE          16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
    #> Merc 450SL          17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
    #> Merc 450SLC         15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
    #> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.25 18.0  0  0    3    4
    #> Lincoln Continental 10.4   8 460.0 215 3.00 5.42 17.8  0  0    3    4
    #> Chrysler Imperial   14.7   8 440.0 230 3.23 5.34 17.4  0  0    3    4
    #> Fiat 128            32.4   4  78.7  66 4.08 2.20 19.5  1  1    4    1
    #> Honda Civic         30.4   4  75.7  52 4.93 1.61 18.5  1  1    4    2
    #> Toyota Corolla      33.9   4  71.1  65 4.22 1.83 19.9  1  1    4    1
    #> Toyota Corona       21.5   4 120.1  97 3.70 2.46 20.0  1  0    3    1
    #> Dodge Challenger    15.5   8 318.0 150 2.76 3.52 16.9  0  0    3    2
    #> AMC Javelin         15.2   8 304.0 150 3.15 3.44 17.3  0  0    3    2
    #> Camaro Z28          13.3   8 350.0 245 3.73 3.84 15.4  0  0    3    4
    #> Pontiac Firebird    19.2   8 400.0 175 3.08 3.85 17.1  0  0    3    2
    #> Fiat X1-9           27.3   4  79.0  66 4.08 1.94 18.9  1  1    4    1
    #> Porsche 914-2       26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
    #> Lotus Europa        30.4   4  95.1 113 3.77 1.51 16.9  1  1    5    2
    #> Ford Pantera L      15.8   8 351.0 264 4.22 3.17 14.5  0  1    5    4
    #> Ferrari Dino        19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
    #> Maserati Bora       15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8
    #> Volvo 142E          21.4   4 121.0 109 4.11 2.78 18.6  1  1    4    2
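
The calls above rely on vroom's guesses for the delimiter and the column types. As a minimal sketch of how those guesses might be overridden, the example below combines `col_select` with an explicit `delim` and a `col_types` specification. The object name `cars_subset` and the particular columns and types chosen are assumptions made for illustration; they do not come from the original text.

```r
library(vroom)

# Path to the example file that ships with vroom
file_path <- vroom_example("mtcars.csv")

# Keep `model` plus every column whose name starts with "c",
# and read `cyl` as an integer instead of the guessed double.
# The choice of columns and types here is illustrative only.
cars_subset <- vroom(
  file_path,
  delim = ",",
  col_select = c(model, starts_with("c")),
  col_types = cols(cyl = col_integer())
)

# Check which columns were kept and how they were parsed
names(cars_subset)
spec(cars_subset)
```

Selecting columns at read time, rather than dropping them afterwards, keeps the resulting tibble small, and stating `col_types` explicitly makes the import reproducible even if the guessed types would change with different data.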
