Chapter 16 Arranging (Sorting) Data | R For HR: An Introduction To ...

R for HR: An Introduction to Human Resource Analytics Using R

Chapter 16 Arranging (Sorting) Data

In this chapter, we will learn how to arrange (sort) data within a data frame object, which can be useful for identifying high or low numeric values or for alphabetizing character values.

16.1 Conceptual Overview

Arranging (sorting) data refers to the process of ordering the rows of a data frame or table numerically or alphabetically by the values of one or more variables. Sorting can make it easier to visually scan raw data, for example to identify extreme or outlier values. Sorting can also facilitate decision making, for example when rank-ordering applicants’ scores on different selection tools.
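To make the rank-ordering idea concrete, here is a minimal base R sketch that uses the order function (which this chapter's supplement covers). The applicants data frame, its column names, and its scores are hypothetical and exist only for illustration. Rows are sorted by interview score from highest to lowest, and ties are broken by work-sample score.

```r
# Hypothetical applicant scores on two selection tools (illustration only)
applicants <- data.frame(
  applicant   = c("A", "B", "C", "D"),
  interview   = c(82, 95, 82, 70),
  work_sample = c(88, 79, 91, 85),
  stringsAsFactors = FALSE
)

# order() returns the row positions that would sort the data;
# negating a numeric column sorts it in descending order.
# interview is the primary sort key; work_sample breaks ties.
ranked <- applicants[order(-applicants$interview, -applicants$work_sample), ]

# Print the rank-ordered applicants
print(ranked)

# After sorting, the first and last rows hold the highest and lowest
# interview scores, which makes extreme values easy to spot
head(ranked, 1)
tail(ranked, 1)
```

Applicants B and D end up at the top and bottom, and the tie at 82 between A and C is resolved by the second sort key.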

16.2 Tutorial

This chapter’s tutorial demonstrates how to arrange (sort) data in R.

16.2.1 Video Tutorial

As usual, you have the choice to follow along with the written tutorial in this chapter or to watch the video tutorial below. Both versions of the tutorial will show you how to arrange (sort) data with or without the pipe (%>%) operator. If you’re unfamiliar with the pipe operator, no need to worry: I provide a brief explanation and demonstration of its purpose in both versions of the tutorial.
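As a quick illustration of what the pipe does, the sketch below computes the same result twice: once with nested function calls and once with %>%, which passes the object on its left as the first argument to the function on its right. It assumes the dplyr package is installed (dplyr re-exports %>% from the magrittr package); the scores vector is hypothetical.

```r
library(dplyr)  # provides the %>% pipe (re-exported from magrittr)

# Hypothetical vector of test scores
scores <- c(72, 95, 88, 61)

# Nested version: read from the inside out
top_two_nested <- head(sort(scores, decreasing = TRUE), 2)

# Piped version: read from left to right; each result becomes
# the first argument of the next function
top_two_piped <- scores %>% sort(decreasing = TRUE) %>% head(2)

top_two_nested
top_two_piped
```

Both objects contain the two highest scores; the piped version simply lists the steps in the order they happen.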

Link to video tutorial: https://youtu.be/wVwJQsLNbmw

16.2.2 Functions & Packages Introduced

Function Package
arrange dplyr
desc dplyr
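The sketch below shows how these two functions work together, assuming dplyr is installed. The arrange function sorts rows in ascending order by default, and wrapping a variable in desc reverses that order. The small ratings data frame is hypothetical. One detail worth knowing: for ordinary in-memory data, arrange places missing values (NA) last, even when desc is used.

```r
library(dplyr)

# Hypothetical data with one missing rating
ratings <- data.frame(
  name   = c("Kim", "Ali", "Lee", "Bo"),
  rating = c(2, NA, 3, 1),
  stringsAsFactors = FALSE
)

# Ascending sort (the default): 1, 2, 3, then NA
asc_sorted <- arrange(ratings, rating)

# Descending sort with desc(): 3, 2, 1, then NA (NA still goes last)
desc_sorted <- arrange(ratings, desc(rating))

# desc() also works on character variables (reverse alphabetical order)
rev_alpha <- arrange(ratings, desc(name))

asc_sorted
desc_sorted
rev_alpha
```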

16.2.3 Initial Steps

Please note that any function appearing in the Initial Steps section has been covered in a previous chapter. If you need a refresher, please view the relevant chapter. In addition, a previous chapter may show you how to perform the same action using different functions or packages.

If you haven’t already, save the file called “PersData.csv” into a folder that you will subsequently set as your working directory. Your working directory will likely be different from the one shown below (i.e., "H:/RWorkshop"). As a reminder, you can access all of the data files referenced in this book by downloading them as a compressed (zipped) folder from my GitHub site: https://github.com/davidcaughlin/R-Tutorial-Data-Files; once you’ve followed the link to GitHub, just click “Code” (or “Download”) followed by “Download ZIP”, which will download all of the data files referenced in this book. For the sake of parsimony, I recommend downloading all of the data files into the same folder on your computer, which will allow you to set that same folder as your working directory for each of the chapters in this book.

Next, using the setwd function, set your working directory to the folder in which you saved the data file for this chapter. Alternatively, you can set your working directory manually through the drop-down menus by going to Session > Set Working Directory > Choose Directory…. Be sure to create a new R script file (.R) or update an existing R script file so that you can save your script and annotations. If you need refreshers on how to set your working directory and how to create and save an R script, please refer to Setting a Working Directory and Creating & Saving an R Script.
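Because setwd invisibly returns the previous working directory, you can confirm the change and switch back later if needed. The sketch below uses R's temporary directory (tempdir()) as a stand-in for a folder such as "H:/RWorkshop", so it runs on any computer; in practice you would substitute the path to the folder that holds PersData.csv.

```r
# Remember the current working directory; setwd() returns it invisibly
old_wd <- setwd(tempdir())

# Confirm that the working directory changed
new_wd <- getwd()
print(new_wd)

# Check whether the data file is in the new working directory;
# this returns TRUE only if PersData.csv has been saved in that folder
file.exists("PersData.csv")

# Return to the original working directory
setwd(old_wd)
```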

```r
# Set your working directory
setwd("H:/RWorkshop")
```

Next, read in the .csv data file called “PersData.csv” using your choice of read function. In this example, I use the read_csv function from the readr package (Wickham, Hester, and Bryan 2024). If you choose to use the read_csv function, be sure that you have installed and accessed the readr package using the install.packages and library functions. Note: You don’t need to install a package every time you wish to access it. In general, I recommend updating a package installation once every 1 to 3 months. For refreshers on installing packages and reading data into R, please refer to Packages and Reading Data into R.

```r
# Install readr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("readr")

# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Print the names of the variables in the data frame (tibble) object
names(personaldata)
## [1] "id"        "lastname"  "firstname" "startdate" "gender"

# Print data frame (tibble) object
personaldata
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   154 McDonald   Ronald    1/9/2016  male
## 3   155 Smith      John      1/9/2016  male
## 4   165 Doe        Jane      1/4/2016  female
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   111 Newton     Isaac     1/9/2016  male
## 7   198 Morales    Linda     1/7/2016  female
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
```

As the output in your Console shows, the personaldata data frame object contains basic employee demographic information. The variable names are id, lastname, firstname, startdate, and gender. The column specification message also shows that startdate was read in as a character (chr) variable rather than as a date. Technically, the read_csv function reads in what is called a “tibble” object rather than a standard data frame object. A tibble is a modern variant of the data frame, and for our purposes it behaves much like one. For more information on tibbles, check out Wickham and Grolemund’s (2017) chapter on tibbles: http://r4ds.had.co.nz/tibbles.html.

16.2.4 Arrange (Sort) Data

There are different functions we could use to arrange (sort) the data in the data frame, and in this chapter, we will focus on the arrange function from the dplyr package (Wickham et al. 2023). Please note that there are other functions we could use to sort data, and if you’re interested, in the Arranging (Sorting) Data: Chapter Supplement, I demonstrate how to use the order function from base R to carry out the same operations we will cover below.

The arrange function comes from the dplyr package, which is part of the tidyverse of R packages (Wickham 2023; Wickham et al. 2019). If you haven’t already, install and access the dplyr package using the install.packages and library functions, respectively.

```r
# Install dplyr package if you haven't already
# [Note: You don't need to install a package every
# time you wish to access it]
install.packages("dplyr")

# Access dplyr package
library(dplyr)
```

Before we dive into arranging the data, note that I will demonstrate two techniques for arranging (sorting) data using the arrange function: one with the pipe operator and one without it.

The first technique uses a “pipe”, which in R is represented by the %>% operator. The pipe operator comes from a package called magrittr (Bache and Wickham 2022), on which the dplyr package partially depends. In short, a pipe lets you write code more efficiently and can improve the readability of your code and overall script. Specifically, a pipe forwards the result or value of one object or expression to a subsequent function as that function's first argument. This lets you avoid writing functions that are nested parenthetically inside other functions. For more information on the pipe operator, check out Wickham and Grolemund’s (2017) chapter on pipes: https://r4ds.had.co.nz/pipes.html.

This brings us to the second technique for arranging (sorting) data using the arrange function. The second technique uses a more traditional approach that some may argue lacks the efficiency and readability of the pipe. Conversely, others may argue against the use of pipes altogether. I’m not here to settle any “pipes versus no pipes” debate, and you’re welcome to use either technique. If you don’t want to learn how to use pipes (or would like to learn how to use them at a later date), feel free to skip to the section below called Without Pipe.
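The sketch below makes the difference between the two techniques concrete. It sorts by startdate and keeps the first three rows, once with nested functions and once with the pipe. It assumes dplyr is installed. So that it runs without PersData.csv, it rebuilds the id, startdate, and gender columns from the output printed in the Initial Steps section. The object names nested and piped are illustrative only.

```r
# Access dplyr package (it also makes the %>% pipe available)
library(dplyr)

# id, startdate, and gender columns copied from the personaldata
# output printed earlier (a stand-in for reading PersData.csv)
personaldata <- data.frame(
  id        = c(153, 154, 155, 165, 125, 111, 198, 201, 282),
  startdate = c("1/1/2016", "1/9/2016", "1/9/2016", "1/4/2016", "1/5/2016",
                "1/9/2016", "1/7/2016", "1/9/2016", "1/9/2016"),
  gender    = c("male", "male", "male", "female", "male",
                "male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# Without pipe: arrange() is nested inside head(), so the code
# is read from the inside out
nested <- head(arrange(personaldata, startdate), 3)

# With pipe: personaldata is forwarded as the first argument of
# arrange(), whose result is forwarded to head(); read left to right
piped <- personaldata %>%
  arrange(startdate) %>%
  head(3)

# Both techniques return the same three rows
identical(nested, piped)
## [1] TRUE
```

The two versions do the same work. The only difference is the order in which you read the function calls.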

16.2.4.1 With Pipe

To use the “with pipe” technique, first, type the name of our data frame object, which we previously named personaldata, followed by the pipe (%>%) operator. This will “pipe” our data frame into the subsequent function. Second, either on the same line or on the next line, type the name of the arrange function, and within the parentheses, enter the variable name startdate as the argument to indicate that we want to arrange (sort) the data by the start date of the employees. The default operation of the arrange function is to arrange (sort) the data in ascending order. If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated previously in this chapter in the Initial Steps section.

```r
# Arrange (sort) data by variable in ascending order (single line) (with pipe)
personaldata %>% arrange(startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
```

Alternatively, we can write this script over two lines and achieve the same output in our Console.

```r
# Arrange (sort) data by variable in ascending order (two lines) (with pipe)
personaldata %>%
  arrange(startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
```

Please note that the operations we have performed thus far have not changed anything in the personaldata data frame object itself; rather, the output in the Console simply shows what it looks like if the data are sorted by the variable in question. We can verify this by viewing the first six rows of data in our data frame object using the head function. As you can see below, nothing changed in the data frame itself.

```r
# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   154 McDonald   Ronald    1/9/2016  male
## 3   155 Smith      John      1/9/2016  male
## 4   165 Doe        Jane      1/4/2016  female
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   111 Newton     Isaac     1/9/2016  male
```

To change the ordering of data in the personaldata data frame object itself, we will need to (re)name the data frame object using the <- variable assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.

```r
# Arrange (sort) data by variable in ascending order and
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>%
  arrange(startdate)

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
```

As you can see in the Console output, now the personaldata data frame object has been changed such that the data are arranged (sorted) by the startdate variable.
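One caution about sorting by startdate: the read_csv column specification shows that startdate was read in as a character (chr) variable. As a result, arrange sorts it as text, character by character, rather than as calendar dates. With this chapter's data, text order happens to match chronological order, because every start date falls in January 2016 and has a single-digit day. The sketch below uses three hypothetical rows to show where text sorting breaks down. One of them has an invented date, 1/10/2016, that is not in PersData.csv. The sketch then shows one possible fix, assuming dplyr is installed: convert the variable with base R's as.Date function and a month/day/year format string before sorting. The mutate function used here comes from dplyr but is not otherwise covered in this chapter.

```r
library(dplyr)

# Hypothetical start dates stored as text, like startdate in PersData.csv;
# "1/10/2016" is an invented value used only for illustration
hires <- data.frame(
  id        = c(1, 2, 3),
  startdate = c("1/9/2016", "1/10/2016", "1/1/2016"),
  stringsAsFactors = FALSE
)

# Sorting the character variable compares text, so "1/10/2016"
# is placed before "1/9/2016" even though it is the later date
as_text <- arrange(hires, startdate)
as_text$startdate

# Converting to the Date class (month/day/year) before sorting
# gives chronological order
as_dates <- hires %>%
  mutate(startdate = as.Date(startdate, format = "%m/%d/%Y")) %>%
  arrange(startdate)
as_dates$startdate
## [1] "2016-01-01" "2016-01-09" "2016-01-10"
```

The mutate step replaces the text values with Date values. If you want to keep the original text, store the converted dates in a new variable instead.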

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.

```r
# Arrange (sort) data by variable in descending order and
# overwrite existing data frame object (with pipe)
personaldata <- personaldata %>%
  arrange(desc(startdate))

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male
## 6   198 Morales    Linda     1/7/2016  female
```
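The desc function is not limited to character variables such as startdate. The minimal sketch below sorts employees from the highest to the lowest id. It assumes dplyr is installed and uses only the id column copied from the output printed earlier. For a numeric variable, a minus sign in front of the variable name is a common shorthand that gives the same order. The minus sign does not work for character variables, which is where desc is needed.

```r
library(dplyr)

# id values copied from the personaldata output printed earlier
ids <- data.frame(id = c(153, 154, 155, 165, 125, 111, 198, 201, 282))

# Sort from highest to lowest id with desc()
by_desc <- arrange(ids, desc(id))
by_desc$id
## [1] 282 201 198 165 155 154 153 125 111

# For numeric variables only, negating the variable gives the same order
by_minus <- arrange(ids, -id)
identical(by_desc$id, by_minus$id)
## [1] TRUE
```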

To arrange (sort) data by the values/levels of two variables, we simply enter the names of two variables as consecutive arguments. Let’s enter the gender variable first, followed by the startdate variable. The order of the two variables matters. The function sorts first by the values/levels of the first variable, and then sorts by the values/levels of the second variable within each value/level of the first. As shown below, startdate is sorted within the sorted levels of the gender variable. As a reminder, the arrange function sorts data in ascending order by default. Remember, we use commas to separate the arguments of a function (if there is more than one argument).

```r
# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(gender, startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   165 Doe        Jane      1/4/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   201 Providence Cindy     1/9/2016  female
## 4   153 Sanchez    Alejandro 1/1/2016  male
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   154 McDonald   Ronald    1/9/2016  male
## 7   155 Smith      John      1/9/2016  male
## 8   111 Newton     Isaac     1/9/2016  male
## 9   282 Legend     John      1/9/2016  male
```

Watch what happens when we switch the order of the two variables we are using to sort the data.

```r
# Arrange (sort) data by two variables in ascending order (with pipe)
personaldata %>% arrange(startdate, gender)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   201 Providence Cindy     1/9/2016  female
## 6   154 McDonald   Ronald    1/9/2016  male
## 7   155 Smith      John      1/9/2016  male
## 8   111 Newton     Isaac     1/9/2016  male
## 9   282 Legend     John      1/9/2016  male
```

As you can see, the order of the two sorting variables matters.
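You can check what "sorted within the levels of the first variable" means by comparing the two orderings directly. This sketch assumes dplyr is installed and rebuilds the id, startdate, and gender columns from the output printed earlier. With gender first, all female employees come before all male employees. With startdate first, the earliest start date comes first regardless of gender, and gender only breaks ties among employees who share a start date.

```r
library(dplyr)

# id, startdate, and gender columns copied from the personaldata
# output printed earlier
personaldata <- data.frame(
  id        = c(153, 154, 155, 165, 125, 111, 198, 201, 282),
  startdate = c("1/1/2016", "1/9/2016", "1/9/2016", "1/4/2016", "1/5/2016",
                "1/9/2016", "1/7/2016", "1/9/2016", "1/9/2016"),
  gender    = c("male", "male", "male", "female", "male",
                "male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# gender first: all "female" rows come before all "male" rows,
# and startdate is sorted within each gender
gender_first <- arrange(personaldata, gender, startdate)
gender_first$id

# startdate first: gender only breaks ties among rows with the same date
date_first <- arrange(personaldata, startdate, gender)
date_first$id

# The first rows differ: Doe (165) versus Sanchez (153)
gender_first$id[1]
date_first$id[1]
```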

To arrange the data by both variables in descending order, wrap each variable name in the desc function from dplyr within the arrange function.

```r
# Arrange (sort) data by two variables in descending order (with pipe)
personaldata %>% arrange(desc(gender), desc(startdate))
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   282 Legend     John      1/9/2016  male
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   153 Sanchez    Alejandro 1/1/2016  male
## 7   201 Providence Cindy     1/9/2016  female
## 8   198 Morales    Linda     1/7/2016  female
## 9   165 Doe        Jane      1/4/2016  female
```

Or, we can sort one variable in the default ascending order and the other in descending order.

```r
# Arrange (sort) data by two variables in ascending & descending order (with pipe)
personaldata %>% arrange(gender, desc(startdate))
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   201 Providence Cindy     1/9/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   165 Doe        Jane      1/4/2016  female
## 4   154 McDonald   Ronald    1/9/2016  male
## 5   155 Smith      John      1/9/2016  male
## 6   111 Newton     Isaac     1/9/2016  male
## 7   282 Legend     John      1/9/2016  male
## 8   125 Franklin   Benjamin  1/5/2016  male
## 9   153 Sanchez    Alejandro 1/1/2016  male
```

16.2.4.2 Without Pipe

We can achieve the same output without the pipe (%>%) operator. Again, whether you use the pipe operator is up to you.

To use the arrange function without the pipe operator, type the name of the arrange function. Within the parentheses, type the name of the personaldata data frame object as the first argument and the startdate variable as the second argument; the latter indicates that we want to arrange (sort) the data frame object by the start date of the employees. The arrange function sorts data in ascending order by default. Remember, we use commas to separate the arguments of a function (if there is more than one argument). If you’re wondering where I found the exact names of the variables in the data frame, revisit the use of the names function, which I demonstrated in the Initial Steps section of this chapter.

```r
# Arrange (sort) data by variable in ascending order without pipe
arrange(personaldata, startdate)
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
```

To change the ordering of data in the personaldata data frame object itself, we will need to (re)name the data frame object using the <- variable assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.

```r
# Arrange (sort) data by variable in ascending order and
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, startdate)

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
```

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below.

```r
# Arrange (sort) data by variable in descending order and
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, desc(startdate))

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male
## 6   198 Morales    Linda     1/7/2016  female
```

To arrange (sort) data by values/levels of two variables, we simply enter the names of two variables as consecutive arguments (after the name of the data frame, which is the first argument). Let’s enter the gender variable first, followed by the startdate variable. The ordering of the two variables matters; the function sorts initially by the values/levels of the first variable listed and sorts subsequently by the values/levels of the second variable listed, but does so within the values/levels of the first variable listed.

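```r
# Arrange (sort) data by two variables in ascending order and
# overwrite existing data frame object without pipe
personaldata <- arrange(personaldata, gender, startdate)

# Print just the first 6 rows of the data frame in Console
head(personaldata)
```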

If you print the updated data frame object (e.g., using the head function), you will see that startdate is sorted within the sorted levels of the gender variable. This also confirms that the arrange function sorts data in ascending order by default.

To arrange the data in descending order, just use the desc function from dplyr within the arrange function as shown below. You can use the desc function on one or both sorting variables.

```r
# Arrange (sort) data by one variable in ascending order and
# the other in descending order without pipe
personaldata <- arrange(personaldata, gender, desc(startdate))
```

Or we can apply the desc function to both variables.

```r
# Arrange (sort) data by both variables in descending order without pipe
personaldata <- arrange(personaldata, desc(gender), desc(startdate))
```

16.2.5 Summary

In this chapter, we learned how to arrange (sort) data by one or more variables using the arrange and desc functions from the dplyr package. This chapter also introduced the pipe (%>%) operator, which can help make code easier to read in some contexts.

16.3 Chapter Supplement

In addition to the arrange function from the dplyr package covered above, we can use the order function from base R to arrange (sort) data by the values of one or more variables. Because this function comes from base R, we do not need to install and access an additional package as we do for the arrange function, which some may find advantageous.

16.3.1 Functions & Packages Introduced

Function Package
order base R
c base R

16.3.2 Initial Steps

If required, please refer to the Initial Steps section from this chapter for more information on these initial steps.

```r
# Set your working directory
setwd("H:/RWorkshop")

# Access readr package
library(readr)

# Read data and name data frame (tibble) object
personaldata <- read_csv("PersData.csv")
## Rows: 9 Columns: 5
## ── Column specification ────────────────────────────────────────────────
## Delimiter: ","
## chr (4): lastname, firstname, startdate, gender
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

16.3.3 order Function from Base R

To sort a data frame object in ascending order based on a single variable, we will use the order function from base R to do the following:

  1. Type the name of the data frame object that you wish to arrange (sort) (personaldata).
  2. Insert brackets ([ ]), which allow us to reference rows or columns depending on how we format the brackets. If we type a function or value before the comma, we are indicating that we wish to apply operations to row(s), and if we type a function or value after the comma, we are indicating that we wish to apply operations to column(s).
  3. To sort the rows of the data frame in ascending order by the startdate variable, type the name of the order function before the comma in the brackets. As the sole argument of the order function, type the name of the personaldata data frame object, followed by the $ operator and the name of the variable by which we wish to sort the data frame (again, the startdate variable). The $ operator signals to R that a variable belongs to a particular data frame object. The order function returns the row positions in sorted order, and by default it sorts in ascending order.
```r
# Arrange (sort) data by variable in ascending order
personaldata[order(personaldata$startdate), ]
## # A tibble: 9 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
## 7   111 Newton     Isaac     1/9/2016  male
## 8   201 Providence Cindy     1/9/2016  female
## 9   282 Legend     John      1/9/2016  male
```
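Because the order function returns row positions rather than sorted values, it can help to look at those positions directly. This minimal base R sketch uses a hypothetical three-row data frame; the id values 10, 20, and 30 are invented for illustration. It shows what order returns and how placing that result before the comma in the brackets reorders the rows.

```r
# Hypothetical start dates stored as text (not from PersData.csv)
dates <- c("1/9/2016", "1/1/2016", "1/5/2016")

# order() returns the positions of the values in ascending order:
# the 2nd value is smallest, then the 3rd, then the 1st
positions <- order(dates)
positions
## [1] 2 3 1

# Indexing the vector with those positions returns the sorted values
dates[positions]
## [1] "1/1/2016" "1/5/2016" "1/9/2016"

# In a data frame, the same positions placed before the comma
# reorder the rows; leaving the slot after the comma empty keeps all columns
hires <- data.frame(id = c(10, 20, 30), startdate = dates,
                    stringsAsFactors = FALSE)
hires[order(hires$startdate), ]
```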

To change the ordering of data in the personaldata data frame object itself, we will need to (re)name the data frame object using the <- variable assignment operator. In this example, I will demonstrate how to overwrite the existing data frame object, and thus I give the data frame object the exact same name as it had originally (i.e., personaldata). To do so, to the left of the <- operator, type what you would like to name the new (updated) sorted data frame object (personaldata). Next, to the right of the <- operator, copy and paste the same code we wrote above. Finally, use the head function from base R to view the first six rows of the new data frame object.

```r
# Arrange (sort) data by variable in ascending order
# and overwrite existing data frame object
personaldata <- personaldata[order(personaldata$startdate), ]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   153 Sanchez    Alejandro 1/1/2016  male
## 2   165 Doe        Jane      1/4/2016  female
## 3   125 Franklin   Benjamin  1/5/2016  male
## 4   198 Morales    Linda     1/7/2016  female
## 5   154 McDonald   Ronald    1/9/2016  male
## 6   155 Smith      John      1/9/2016  male
```

To sort in descending order, add the argument decreasing=TRUE within the order function parentheses. Remember, we use commas to separate arguments used in a function (if there are two or more arguments).

```r
# Arrange (sort) data by variable in descending order
personaldata <- personaldata[order(personaldata$startdate, decreasing = TRUE), ]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   201 Providence Cindy     1/9/2016  female
## 5   282 Legend     John      1/9/2016  male
## 6   198 Morales    Linda     1/7/2016  female
```

To sort a data frame object by two variables, add a second argument to the order function: the name of the data frame object, followed by the $ operator and the name of the second variable. We will sort the data frame by gender and then by startdate. The order of the two variables matters. The function sorts first by the values/levels of the first variable, and then sorts by the values/levels of the second variable within each value/level of the first. As shown below, startdate is sorted within the sorted levels of the gender variable. By default, the order function sorts data in ascending order.

```r
# Arrange (sort) data by two variables in ascending order
personaldata <- personaldata[order(personaldata$gender, personaldata$startdate), ]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   165 Doe        Jane      1/4/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   201 Providence Cindy     1/9/2016  female
## 4   153 Sanchez    Alejandro 1/1/2016  male
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   154 McDonald   Ronald    1/9/2016  male
```

To sort one variable in descending order and the other in the default ascending order, we need to add the decreasing= argument. Because we have two variables, we need to supply a vector of logical values (TRUE, FALSE) that indicates which variable should be sorted in descending order. If the logical value for a variable is TRUE, that variable is sorted in descending order. Using the c (combine) function from base R, we create a vector of two logical values whose order corresponds to the order in which we listed the two variables in the order function. For example, the argument decreasing=c(FALSE, TRUE) sorts the first variable in the default ascending order and the second variable in descending order, which is what we do below. When you supply more than one logical value to decreasing=, be sure to also add the argument method="radix" to the order function. Only the radix method accepts a separate decreasing value for each sorting variable.

```r
# Arrange (sort) data by gender in ascending order and
# startdate in descending order
personaldata <- personaldata[order(personaldata$gender,
                                   personaldata$startdate,
                                   decreasing = c(FALSE, TRUE),
                                   method = "radix"), ]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   201 Providence Cindy     1/9/2016  female
## 2   198 Morales    Linda     1/7/2016  female
## 3   165 Doe        Jane      1/4/2016  female
## 4   154 McDonald   Ronald    1/9/2016  male
## 5   155 Smith      John      1/9/2016  male
## 6   111 Newton     Isaac     1/9/2016  male
```

Or, you could sort by both variables in descending order by changing the argument to decreasing=c(TRUE, TRUE).

```r
# Arrange (sort) data by gender and startdate variables in descending order
personaldata <- personaldata[order(personaldata$gender,
                                   personaldata$startdate,
                                   decreasing = c(TRUE, TRUE),
                                   method = "radix"), ]

# Print just the first 6 rows of the data frame in Console
head(personaldata)
## # A tibble: 6 × 5
##      id lastname   firstname startdate gender
##   <dbl> <chr>      <chr>     <chr>     <chr>
## 1   154 McDonald   Ronald    1/9/2016  male
## 2   155 Smith      John      1/9/2016  male
## 3   111 Newton     Isaac     1/9/2016  male
## 4   282 Legend     John      1/9/2016  male
## 5   125 Franklin   Benjamin  1/5/2016  male
## 6   153 Sanchez    Alejandro 1/1/2016  male
```
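If you would rather not rely on method="radix", one commonly used base R alternative for mixing sort directions is to negate a numeric ranking of the variable you want in descending order. The xtfrm function from base R provides such a ranking for character variables like startdate. The sketch below rebuilds the id, startdate, and gender columns from the output printed earlier. It checks that this alternative matches the radix approach for gender ascending and startdate descending. Treat it as an option, not as the approach used in this chapter.

```r
# id, startdate, and gender columns copied from the personaldata
# output printed earlier
personaldata <- data.frame(
  id        = c(153, 154, 155, 165, 125, 111, 198, 201, 282),
  startdate = c("1/1/2016", "1/9/2016", "1/9/2016", "1/4/2016", "1/5/2016",
                "1/9/2016", "1/7/2016", "1/9/2016", "1/9/2016"),
  gender    = c("male", "male", "male", "female", "male",
                "male", "female", "female", "male"),
  stringsAsFactors = FALSE
)

# Approach from this supplement: one decreasing value per variable,
# which requires method = "radix"
radix_way <- personaldata[order(personaldata$gender,
                                personaldata$startdate,
                                decreasing = c(FALSE, TRUE),
                                method = "radix"), ]

# Alternative: xtfrm() turns startdate into numeric ranks and the
# minus sign reverses them, so a single ascending sort does the job
xtfrm_way <- personaldata[order(personaldata$gender,
                                -xtfrm(personaldata$startdate)), ]

identical(radix_way$id, xtfrm_way$id)
## [1] TRUE
```

The ranks that xtfrm produces for character data follow your system's collation, whereas the radix method sorts character data in the C locale. With this chapter's data the two agree, but they could differ for text containing mixed case or accented characters.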

References

Bache, Stefan Milton, and Hadley Wickham. 2022. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.

Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Visualize, Model, Transform, Tidy, and Import Data. Sebastopol, California: O’Reilly Media, Inc. https://r4ds.had.co.nz/.

Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2024. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
