10 Vroom: Fast Reading Of Delimited Files | R For Data Science

10 vroom: Fast reading of delimited files

vroom (Hester and Wickham 2019) is an R package for fast reading of delimited files.

https://vroom.r-lib.org/

library(vroom)

# Path to an example CSV file bundled with vroom
file_path <- vroom_example("mtcars.csv")
vroom(file_path)
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows

# Inspect the column specification that vroom guessed
spec(vroom(file_path))
#> cols(
#>   model = col_character(),
#>   mpg = col_double(),
#>   cyl = col_double(),
#>   disp = col_double(),
#>   hp = col_double(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_double(),
#>   am = col_double(),
#>   gear = col_double(),
#>   carb = col_double()
#> )

# Compressed files are read directly
compressed <- vroom_example("mtcars.csv.zip")
vroom(compressed)
#> # A tibble: 32 x 12
#>   model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2 Mazda RX4 W~  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4 Hornet 4 Dr~  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5 Hornet Spor~  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
#> # ... with 26 more rows

# Read only the columns named in col_select
vroom(compressed, col_select = c(model, cyl, gear))
#> # A tibble: 32 x 3
#>   model               cyl  gear
#>   <chr>             <dbl> <dbl>
#> 1 Mazda RX4             6     4
#> 2 Mazda RX4 Wag         6     4
#> 3 Datsun 710            4     4
#> 4 Hornet 4 Drive        6     3
#> 5 Hornet Sportabout     8     3
#> 6 Valiant               6     3
#> # ... with 26 more rows

# The built-in mtcars data frame, for comparison
mtcars
#>                      mpg cyl  disp  hp drat   wt qsec vs am gear carb
#> Mazda RX4           21.0   6 160.0 110 3.90 2.62 16.5  0  1    4    4
#> Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.88 17.0  0  1    4    4
#> Datsun 710          22.8   4 108.0  93 3.85 2.32 18.6  1  1    4    1
#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.21 19.4  1  0    3    1
#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.44 17.0  0  0    3    2
#> Valiant             18.1   6 225.0 105 2.76 3.46 20.2  1  0    3    1
#> Duster 360          14.3   8 360.0 245 3.21 3.57 15.8  0  0    3    4
#> Merc 240D           24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
#> Merc 230            22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
#> Merc 280            19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
#> Merc 280C           17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
#> Merc 450SE          16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
#> Merc 450SL          17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
#> Merc 450SLC         15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.25 18.0  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.42 17.8  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.34 17.4  0  0    3    4
#> Fiat 128            32.4   4  78.7  66 4.08 2.20 19.5  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.61 18.5  1  1    4    2
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.83 19.9  1  1    4    1
#> Toyota Corona       21.5   4 120.1  97 3.70 2.46 20.0  1  0    3    1
#> Dodge Challenger    15.5   8 318.0 150 2.76 3.52 16.9  0  0    3    2
#> AMC Javelin         15.2   8 304.0 150 3.15 3.44 17.3  0  0    3    2
#> Camaro Z28          13.3   8 350.0 245 3.73 3.84 15.4  0  0    3    4
#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.85 17.1  0  0    3    2
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.94 18.9  1  1    4    1
#> Porsche 914-2       26.0   4 120.3  91 4.43 2.14 16.7  0  1    5    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.51 16.9  1  1    5    2
#> Ford Pantera L      15.8   8 351.0 264 4.22 3.17 14.5  0  1    5    4
#> Ferrari Dino        19.7   6 145.0 175 3.62 2.77 15.5  0  1    5    6
#> Maserati Bora       15.0   8 301.0 335 3.54 3.57 14.6  0  1    5    8
#> Volvo 142E          21.4   4 121.0 109 4.11 2.78 18.6  1  1    4    2
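The column specification printed by spec() can also be supplied up front instead of being guessed. The following is a minimal sketch, assuming the same bundled mtcars.csv example file; the object names typed and d_cols, and the choice of col_integer() for cyl and gear, are illustrative and not taken from the original chapter.

```r
library(vroom)

# Path to the example file bundled with vroom
file_path <- vroom_example("mtcars.csv")

# Declare column types explicitly rather than relying on guessing.
# cyl and gear hold whole numbers, so they are read as integers here
# (an illustrative choice; vroom guesses double for them by default).
typed <- vroom(
  file_path,
  col_select = c(model, cyl, gear),
  col_types = cols(
    model = col_character(),
    cyl   = col_integer(),
    gear  = col_integer()
  )
)

# col_select also accepts tidyselect helpers such as starts_with()
d_cols <- vroom(file_path, col_select = starts_with("d"))
names(d_cols)
```

Stating the types explicitly makes the import reproducible if the file's contents later change in a way that would alter the guess, and combining it with col_select keeps the result limited to the columns of interest.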
