R for HR: An Introduction to Human Resource Analytics Using R

Chapter 16 Arranging (Sorting) Data

In this chapter, we will learn how to arrange (sort) data within a data frame object, which can be useful for identifying high or low numeric values or for alphabetizing character values.

16.1 Conceptual Overview

Arranging (sorting) data refers to the process of ordering rows numerically or alphabetically in a data frame or table by the values of one or more variables. Sorting can make it easier to visually scan raw data, such as for the purposes of identifying extreme or outlier values. Sorting can also facilitate decision making, for example, when rank ordering applicants’ scores on different selection tools.
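To make the idea of sorting by more than one variable concrete, here is a minimal sketch. The applicant names and scores are hypothetical (they do not come from this book's data files), and the sketch assumes the dplyr package is installed; the arrange and desc functions it uses are the ones introduced in this chapter's tutorial.

```r
# Access dplyr package (assumes it is already installed)
library(dplyr)

# Hypothetical applicant scores, made up for illustration only
applicants <- data.frame(
  applicant = c("A", "B", "C", "D"),
  interview = c(4, 5, 4, 3),
  test      = c(82, 75, 90, 88),
  stringsAsFactors = FALSE
)

# Sort by interview score (highest first);
# break ties using test score (highest first)
ranked <- arrange(applicants, desc(interview), desc(test))

# Print the rank-ordered applicants
ranked
```

Applicants A and C tie on the interview variable, so the second sorting variable (test) determines which of the two appears first.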

16.2 Tutorial

This chapter’s tutorial demonstrates how to arrange (sort) data in R.

16.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below. Both versions of the tutorial will show you how to arrange (sort) data with or without the pipe (%>%) operator. If you’re unfamiliar with the pipe operator, no need to worry: I provide a brief explanation and demonstration of its purpose in both versions of the tutorial.

Link to video tutorial: https://youtu.be/wVwJQsLNbmw

16.2.2 Functions & Packages Introduced

Function	Package
arrange	dplyr
desc	dplyr

16.2.3 Initial Steps

Please note that any function that appears in the Initial Steps section has been covered in a previous chapter. If you need a refresher, please view the relevant chapter. In addition, a previous chapter may show you how to perform the same action using different functions or packages.

If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different from the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can manually set your working directory folder in your drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.

# Set your working directory
setwd("H:/RWorkshop")

Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it; in general, I would recommend updating a package installation once every 1-3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")

# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id"        "lastname"  "firstname" "startdate" "gender"

# Print data frame (tibble) object
personaldata
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   154 McDonald   Ronald    1/9/2016  male
## 3   155 Smith      John      1/9/2016  male
## 4   165 Doe        Jane      1/4/2016  female
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   111 Newton     Isaac     1/9/2016  male
## 7   198 Morales    Linda     1/7/2016  female
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

As you can see from the output generated in your console, the personaldata data frame object contains basic employee demographic information. The variable names include: id, lastname, firstname, startdate, and gender. Technically, the read_csv function reads in what is called a “tibble” object (as opposed to a data frame object), but for our purposes a tibble will behave similarly to a data frame. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.

16.2.4 Arrange (Sort) Data

There are different functions we could use to arrange (sort) the data in the data frame, and in this chapter, we will focus on the arrange function from the dplyr package (Wickham et al. 2023). Please note that there are other functions we could use to sort data, and if you’re interested, in the Arranging (Sorting) Data: Chapter Supplement, I demonstrate how to use the order function from base R to carry out the same operations we will cover below.

The arrange function comes from the dplyr package, which is part of the tidyverse of R packages (Wickham 2023; Wickham et al. 2019). If you haven’t already, install and access the dplyr package using the install.packages and library functions, respectively.

# Install dplyr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("dplyr")
# Access dplyr package
library(dplyr)

Before diving into arranging the data, please note that I will demonstrate two techniques for arranging (sorting) data using the arrange function.

The first technique uses a “pipe,” which in R is represented by the %>% operator. The pipe operator comes from a package called magrittr (Bache and Wickham 2022), on which the dplyr package is partially dependent. In short, a pipe allows a person to write code more efficiently and to improve the readability of the code and overall script. Specifically, a pipe forwards the result or value of one object or expression to a subsequent function. In doing so, one can avoid writing functions in which other functions are nested parenthetically. For more information on the pipe operator, check out Wickham and Grolemund’s (2017) chapter on pipes: https://r4ds.had.co.nz/pipes.html.
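As a minimal sketch of the difference between nesting and piping, the code below writes the same two steps both ways. The toy data frame and the object names (toy, nested, piped) are hypothetical and used only for illustration; they are not part of the PersData.csv example from this chapter.

```r
# Access dplyr, which makes the %>% operator available
library(dplyr)

# Hypothetical toy data (an assumption for illustration only)
toy <- data.frame(
  id = c(3, 1, 2),
  startdate = c("1/9/2016", "1/1/2016", "1/4/2016")
)

# Nested form: the functions are read from the inside out
nested <- head(arrange(toy, startdate), 2)

# Piped form: the same steps are read from left to right
piped <- toy %>%
  arrange(startdate) %>%
  head(2)

# Check whether both forms return the same rows
identical(nested, piped)
```

Both versions sort the rows first and then keep the first two; the piped version simply lists the steps in the order in which they are carried out.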

This brings us to the second technique for arranging (sorting) data using the arrange function. The second technique uses a more traditional approach that some may argue lacks the efficiency and readability of the pipe. Conversely, others may argue against the use of pipes altogether. I’m not here to settle any “pipes versus no pipes” debate, and you’re welcome to use either technique. If you don’t want to learn how to use pipes (or would like to learn how to use them at a later date), feel free to skip to the section below called Without Pipe.

16.2.4.1 With Pipe

To use the “with pipe” technique, first, type the name of our data frame object, which we previously named personaldata, followed by the pipe (%>%) operator. This will “pipe” our data frame into the subsequent function. Second, either on the same line or on the next line, type the name of the arrange function, and within the parentheses, enter the variable name startdate as the argument to indicate that we want to arrange (sort) the data by the start date of the employees. The default operation of the arrange function is to arrange (sort) the data in ascending order. If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated previously in this chapter in the Initial Steps section.

# Arrange (sort) data by variable in ascending order (single line) (with pipe)
personaldata %>% arrange(startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

Alternatively, we can write this script over two lines and achieve the same output in our Console.

# Arrange (sort) data by variable in ascending order (two lines) (with pipe)
personaldata %>%
  arrange(startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

Please note that the operations we have performed thus far have not changed anything in the personaldata data frame object itself; rather, the output in the Console simply shows what it looks like if the data are sorted by the variable in question. We can verify this by viewing the first six rows of data in our data frame object using the head function. As you can see below, nothing changed in the data frame itself.

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr>
## 1   153 Sanchez  Alejandro 1/1/2016  male
## 2   154 McDonald Ronald    1/9/2016  male
## 3   155 Smith    John      1/9/2016  male
## 4   165 Doe      Jane      1/4/2016  female
## 5   125 Franklin Benjamin  1/5/2016  male
## 6   111 Newton   Isaac     1/9/2016  male

To change the ordering of data in the personaldata data frame object itself, we will need to (re)name the data frame object using the <- variable assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.

# Arrange (sort) data by variable in ascending order and
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>% arrange(startdate)
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr>
## 1   153 Sanchez  Alejandro 1/1/2016  male
## 2   165 Doe      Jane      1/4/2016  female
## 3   125 Franklin Benjamin  1/5/2016  male
## 4   198 Morales  Linda     1/7/2016  female
## 5   154 McDonald Ronald    1/9/2016  male
## 6   155 Smith    John      1/9/2016  male

As you can see in the Console output, now the personaldata data frame object has been changed such that the data are arranged (sorted) by the startdate variable.
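If you would rather keep the original row order available, one option is to assign the sorted result to a new object name instead of overwriting the existing object. The sketch below uses a hypothetical toy data frame and a hypothetical object name (toy_sorted); neither comes from this chapter's data.

```r
# Access dplyr package
library(dplyr)

# Hypothetical toy data (an assumption for illustration only)
toy <- data.frame(
  id = c(3, 1, 2),
  startdate = c("1/9/2016", "1/1/2016", "1/4/2016")
)

# Without assignment, arrange only prints the sorted rows
toy %>% arrange(startdate)

# Assign the sorted result to a new object name
toy_sorted <- toy %>% arrange(startdate)

# The original object keeps its original row order
toy$id

# The new object holds the sorted rows
toy_sorted$id
```

Using a new name costs a little extra memory but makes it easy to compare the sorted and unsorted versions side by side.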

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.

# Arrange (sort) data by variable in descending order and
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>% arrange(desc(startdate))
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male
## 6   198 Morales    Linda     1/7/2016  female

To arrange (sort) data by values/levels of two variables, we simply enter the names of two variables as consecutive arguments. Let’s enter the gender variable first, followed by the startdate variable. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed. As shown below, startdate is sorted within the sorted levels of the gender variable. As a reminder, the default operation of the arrange function is to arrange (sort) the data in ascending order. Remember, we use commas to separate arguments used in a function (if there is more than one argument).

# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(gender, startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   165 Doe        Jane      1/4/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   201 Providence Cindy     1/9/2016  female
## 4   153 Sanchez    Alejandro 1/1/2016  male
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   154 McDonald   Ronald    1/9/2016  male
## 7   155 Smith      John      1/9/2016  male
## 8   111 Newton     Isaac     1/9/2016  male
## 9   282 Legend     John      1/9/2016  male

Watch what happens when we switch the order of the two variables we are using to sort the data.

# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(startdate, gender)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   201 Providence Cindy     1/9/2016  female
## 6   154 McDonald   Ronald    1/9/2016  male
## 7   155 Smith      John      1/9/2016  male
## 8   111 Newton     Isaac     1/9/2016  male
## 9   282 Legend     John      1/9/2016  male

As you can see, the order of the two sorting variables matters.

To arrange the data in descending order, just use the desc function from dplyr within the arrange function.

# Arrange (sort) data by two variables in descending order (with pipe)
personaldata %>% arrange(desc(gender), desc(startdate))
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   282 Legend     John      1/9/2016  male
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   153 Sanchez    Alejandro 1/1/2016  male
## 7   201 Providence Cindy     1/9/2016  female
## 8   198 Morales    Linda     1/7/2016  female
## 9   165 Doe        Jane      1/4/2016  female

Or, we can sort one variable in the default ascending order and the other in descending order.

# Arrange (sort) data by two variables in ascending & descending order (with pipe)
personaldata %>% arrange(gender, desc(startdate))
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   201 Providence Cindy     1/9/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   165 Doe        Jane      1/4/2016  female
## 4   154 McDonald   Ronald    1/9/2016  male
## 5   155 Smith      John      1/9/2016  male
## 6   111 Newton     Isaac     1/9/2016  male
## 7   282 Legend     John      1/9/2016  male
## 8   125 Franklin   Benjamin  1/5/2016  male
## 9   153 Sanchez    Alejandro 1/1/2016  male

16.2.4.2 Without Pipe

We can achieve the same output without using the pipe (%>%) operator as with the pipe operator; again, your choice of using or not using the pipe operator is up to you.

To use the arrange function without the pipe operator, type the name of the arrange function, and within the parentheses, as the first argument, type the name of the personaldata data frame object, and as the second argument, type the startdate variable, where the latter indicates that we want to arrange (sort) the data frame object by the start date of the employees. The default operation of the arrange function is to arrange (sort) the data in ascending order. Remember, we use commas to separate arguments used in a function (if there is more than one argument). If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated previously in this chapter in the Initial Steps section.

# Arrange (sort) data by variable in ascending order without pipe
arrange(personaldata, startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

# Arrange (sort) data by variable in ascending order and
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, startdate)
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr>
## 1   153 Sanchez  Alejandro 1/1/2016  male
## 2   165 Doe      Jane      1/4/2016  female
## 3   125 Franklin Benjamin  1/5/2016  male
## 4   198 Morales  Linda     1/7/2016  female
## 5   154 McDonald Ronald    1/9/2016  male
## 6   155 Smith    John      1/9/2016  male

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.

# Arrange (sort) data by variable in descending order and
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, desc(startdate))
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male
## 6   198 Morales    Linda     1/7/2016  female

To arrange (sort) data by values/levels of two variables, we simply enter the names of two variables as consecutive arguments (after the name of the data frame, which is the first argument). Let’s enter the gender variable first, followed by the startdate variable. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed.

# Arrange (sort) data by two variables in ascending order without pipe
personaldata <- arrange(personaldata, gender, startdate)

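After running this code, startdate is sorted within the sorted levels of the gender variable in the personaldata data frame object, which you can confirm by printing the object (e.g., using the head function). This also reflects the default operation of the arrange function, which is to arrange (sort) the data in ascending order.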

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below. You can use the desc function on one or both sorting variables.

# Arrange (sort) data by one variable in ascending order and
# the other in descending order without pipe
personaldata <- arrange(personaldata, gender, desc(startdate))

Or we can apply the desc function to both variables.

# Arrange (sort) data by both variables in descending order without pipe
personaldata <- arrange(personaldata, desc(gender), desc(startdate))
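Because the three calls above assign their results without printing them, here is a small sketch that prints each variation instead. It assumes a hypothetical four-row toy data frame (toy) rather than the chapter's PersData.csv data, and the object names asc_asc, asc_desc, and desc_desc are likewise made up for illustration.

```r
# Access dplyr package
library(dplyr)

# Hypothetical toy data (an assumption for illustration only)
toy <- data.frame(
  id = c(1, 2, 3, 4),
  gender = c("male", "female", "male", "female"),
  startdate = c("1/1/2016", "1/4/2016", "1/9/2016", "1/7/2016")
)

# Both variables in ascending order
asc_asc <- arrange(toy, gender, startdate)
print(asc_asc)

# gender ascending, startdate descending within each gender
asc_desc <- arrange(toy, gender, desc(startdate))
print(asc_desc)

# Both variables in descending order
desc_desc <- arrange(toy, desc(gender), desc(startdate))
print(desc_desc)
```

In each printed result, the rows are grouped by gender first, and the direction of startdate changes only within those groups.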

16.2.5 Summary

In this chapter, we learned how to arrange (sort) data by one or more variables using the arrange and desc functions from the dplyr package. This chapter also introduced the pipe (%>%) operator, which can help make code easier to read in some contexts.

16.3 Chapter Supplement

In addition to the arrange function from the dplyr package covered above, we can use the order function from base R to arrange (sort) data by values for one or more variables. Because this function comes from base R, we do not need to install and access an additional package like we do with the arrange function, which some may find advantageous.

16.3.1 Functions & Packages Introduced

Function	Package
order	base R
c	base R

16.3.2 Initial Steps

If required, please refer to the Initial Steps section from this chapter for more information on these initial steps.

# Set your working directory
setwd("H:/RWorkshop")
# Access readr package
library(readr)
# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

16.3.3 order Function from Base R

To sort a data frame object in ascending order based on a single variable, we will use the order function from base R to do the following:

Type the name of the data frame object that you wish to arrange (sort) (personaldata).
Insert brackets ([ ]), which allow us to reference rows or columns depending on how we format the brackets. If we type a function or value before the comma, we are indicating that we wish to apply operations to row(s), and if we type a function or value after the comma, we are indicating that we wish to apply operations to column(s).
To sort the data frame into ascending rows by the startdate variable, type the name of the order function before the comma in the brackets. As the sole parenthetical argument of the order function, type the name of the personaldata data frame object, followed by the $ operator and the name of the variable by which we wish to sort the data frame, which to reiterate is the startdate variable. The $ operator signals to R that a variable belongs to a particular data frame object. By default, the order function sorts in ascending order.

# Arrange (sort) data by variable in ascending order
personaldata[order(personaldata$startdate),]
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male

# Arrange (sort) data by variable in ascending order
# and overwrite existing data frame object
personaldata <- personaldata[order(personaldata$startdate),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr>
## 1   153 Sanchez  Alejandro 1/1/2016  male
## 2   165 Doe      Jane      1/4/2016  female
## 3   125 Franklin Benjamin  1/5/2016  male
## 4   198 Morales  Linda     1/7/2016  female
## 5   154 McDonald Ronald    1/9/2016  male
## 6   155 Smith    John      1/9/2016  male
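To see why placing the order function before the comma in the brackets sorts the rows, it can help to look at what order returns on its own: a vector of row positions, not the sorted values themselves. The sketch below uses a hypothetical three-row data frame (toy) and hypothetical object names (idx, toy_sorted) for illustration only.

```r
# Hypothetical toy data (an assumption for illustration only)
toy <- data.frame(
  id = c(3, 1, 2),
  startdate = c("1/9/2016", "1/1/2016", "1/4/2016")
)

# order returns the positions of the rows in sorted order
idx <- order(toy$startdate)
idx

# Using those positions before the comma rearranges the rows
toy_sorted <- toy[idx, ]
toy_sorted
```

Here the smallest startdate value sits in the second row, so the first element of idx is 2; indexing the data frame with idx then pulls the rows out in that order.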

To sort in descending order, add the argument decreasing=TRUE within the order function parentheses. Remember, we use commas to separate arguments used in a function (if there are two or more arguments).

# Arrange (sort) data by variable in descending order
personaldata <- personaldata[order(personaldata$startdate, decreasing=TRUE),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male
## 6   198 Morales    Linda     1/7/2016  female

If we wish to sort a data frame object by two variables, as the second argument in the order function parentheses, simply add the name of the data frame object, followed by the $ operator and the name of the second variable. We will sort the data frame by gender and startdate. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed. As shown below, startdate is sorted within the sorted levels of the gender variable. The default operation of the order function is to arrange (sort) the data in ascending order.

# Arrange (sort) data by two variables in ascending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   165 Doe        Jane      1/4/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   201 Providence Cindy     1/9/2016  female
## 4   153 Sanchez    Alejandro 1/1/2016  male
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   154 McDonald   Ronald    1/9/2016  male

To sort by one of the variables in descending order and the other variable in the default ascending order, we need to add the decreasing= argument, but because we have two variables, we need to provide a vector containing logical values (TRUE, FALSE) to indicate to which variable we wish to apply a descending order. If the logical value is TRUE for the decreasing= argument, then we sort that variable in descending order. Using the c (combine) function from base R, we create a vector of two logical values whose order corresponds to the order in which we listed the two variables in the order function. For example, if the argument is decreasing=c(FALSE, TRUE), then we sort the first variable in the default ascending order and the second variable in descending order, which is what we do below. Just be sure to add the following argument to the order function when providing a vector of logical values to sort by two or more variables: method="radix".

# Arrange (sort) data by gender in ascending order and
# startdate in descending order
personaldata <- personaldata[order(personaldata$gender,
                                   personaldata$startdate,
                                   decreasing=c(FALSE, TRUE),
                                   method="radix"),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   201 Providence Cindy     1/9/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   165 Doe        Jane      1/4/2016  female
## 4   154 McDonald   Ronald    1/9/2016  male
## 5   155 Smith      John      1/9/2016  male
## 6   111 Newton     Isaac     1/9/2016  male

Or, you could sort by both variables in descending order by changing the argument to decreasing=c(TRUE, TRUE).

# Arrange (sort) data by gender and startdate variables in descending order
personaldata <- personaldata[order(personaldata$gender,
                                   personaldata$startdate,
                                   decreasing=c(TRUE, TRUE),
                                   method="radix"),]
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname firstname startdate gender
##   <dbl> <chr>    <chr>     <chr>     <chr>
## 1   154 McDonald Ronald    1/9/2016  male
## 2   155 Smith    John      1/9/2016  male
## 3   111 Newton   Isaac     1/9/2016  male
## 4   282 Legend   John      1/9/2016  male
## 5   125 Franklin Benjamin  1/5/2016  male
## 6   153 Sanchez  Alejandro 1/1/2016  male
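As a rough cross-check, the sketch below compares the base R approach with the arrange function from dplyr on the same input. It assumes a hypothetical toy data frame (toy) with no tied rows, and the object names base_sorted and dplyr_sorted are made up for illustration. Because bracket indexing on a regular data frame keeps the original row names, the comparison is made on the id column rather than on the whole object.

```r
# Access dplyr package
library(dplyr)

# Hypothetical toy data with no tied rows (an assumption for illustration only)
toy <- data.frame(
  id = 1:4,
  gender = c("male", "female", "male", "female"),
  startdate = c("1/1/2016", "1/4/2016", "1/9/2016", "1/7/2016")
)

# Base R: gender ascending, startdate descending
base_sorted <- toy[order(toy$gender,
                         toy$startdate,
                         decreasing = c(FALSE, TRUE),
                         method = "radix"), ]

# dplyr: the same sort using arrange and desc
dplyr_sorted <- arrange(toy, gender, desc(startdate))

# Compare the resulting row order via the id column
identical(base_sorted$id, dplyr_sorted$id)
```

For simple data like this, the two approaches should give the same row order, so the choice between them is largely a matter of preference.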

References

Bache, Stefan Milton, and Hadley Wickham. 2022. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Visualize, Model, Transform, Tidy, and Import Data. Sebastopol, California: O’Reilly Media, Inc. https://r4ds.had.co.nz/n.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
