Chapter 10 Stem And Leaf Plot | Basic R Guide For NSC Statistics


Let us look at a dataset built into R called rivers. To see a description of this dataset, type ?rivers. The description will appear on the 4th panel under the Help tab.

To view the whole dataset, use the command View(rivers). A column of observations will appear on the Source panel, under the tab called rivers. You should see 1 column with 141 entries.

Let us look at the first 6 entries of rivers.

head(rivers)
## [1] 735 320 325 392 524 450

This dataset is a numeric vector rather than a data frame, since it has only 1 column of entries. That is why the output of head(rivers) is printed as a row of entries.
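The structure of rivers can also be confirmed from the console. The lines below are a small sketch using only base R functions; the values in the comments follow from the description above (a single numeric column with 141 entries).

```r
# rivers ships with base R (datasets package), so nothing needs to be loaded
is.numeric(rivers)      # TRUE: the entries are numbers
is.data.frame(rivers)   # FALSE: it is a vector, not a data frame
length(rivers)          # 141 entries
```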

10.1 Making a Stem and Leaf Plot

To make a stemplot, use the function stem(quantitative_variable).

stem(rivers)
##
##   The decimal point is 2 digit(s) to the right of the |
##
##    0 | 4
##    2 | 011223334555566667778888899900001111223333344455555666688888999
##    4 | 111222333445566779001233344567
##    6 | 000112233578012234468
##    8 | 045790018
##   10 | 04507
##   12 | 1471
##   14 | 56
##   16 | 7
##   18 | 9
##   20 |
##   22 | 25
##   24 | 3
##   26 |
##   28 |
##   30 |
##   32 |
##   34 |
##   36 | 1

Notice that the stems are automatically incremented by 2, so each row holds the leaves for two consecutive stems (0 and 1, 2 and 3, and so on). R chooses the increment for you unless you specify otherwise.

Be sure to read where R places the decimal point in the output. For this result, the decimal point is 2 digits to the right of the vertical bar. In other words, it is 1 digit to the right of the leaf. Since each leaf is a single digit, you need to add one 0 after each leaf. For example, the first entry has a stem of 0 and a leaf of 4, which reads as 40 miles. The next entry has a stem of 2 and a leaf of 0, so that river is about 200 miles long. The third entry has a stem of 2 and a leaf of 1, so that river is about 210 miles long.

Note: The shortest river is actually 135 miles and not 40 miles. Because the stems are incremented by 2, each row combines two stems, and the plot alone does not tell you whether the stem for the shortest river is 0 or 1. In this case it is 1: with rounding, 135 becomes 140, so the shortest river should read 140 miles and not 40 miles. The same applies to the other rows. In the row labeled 2, the leaves run from 0 up to 9 and then start again at 0; the leaves after the restart belong to stem 3.
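The reading rule can be checked against the raw data. The sketch below is not how stem() is implemented internally; it assumes that each length is rounded to the nearest ten and then split into a stem (hundreds) and a leaf (tens), which is consistent with the output above. The names x, tens, stem_part and leaf_part are made up for this illustration.

```r
x <- sort(rivers)[1:3]        # the three shortest rivers

# Round to the nearest ten (assumed rule), counted in units of 10 miles
tens <- floor(x / 10 + 0.5)

stem_part <- tens %/% 10      # hundreds: the stem
leaf_part <- tens %% 10       # tens: the leaf

# Put the pieces side by side with the value read off the plot
data.frame(length   = x,
           stem     = stem_part,
           leaf     = leaf_part,
           reads_as = (stem_part * 10 + leaf_part) * 10)
```

The first row should show a length of 135 with stem 1 and leaf 4, that is, a reading of 140 miles.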

10.2 Rescaling the Stemplot

To rescale the stemplot, change the scale argument of the function stem( ). The default is scale = 1. A scale greater than 1 makes the plot longer by spreading the data over more stems.

stem(rivers, scale = 2)
##
##   The decimal point is 2 digit(s) to the right of the |
##
##    1 | 4
##    2 | 0112233345555666677788888999
##    3 | 00001111223333344455555666688888999
##    4 | 111222333445566779
##    5 | 001233344567
##    6 | 000112233578
##    7 | 012234468
##    8 | 04579
##    9 | 0018
##   10 | 045
##   11 | 07
##   12 | 147
##   13 | 1
##   14 | 56
##   15 |
##   16 |
##   17 | 7
##   18 | 9
##   19 |
##   20 |
##   21 |
##   22 |
##   23 | 25
##   24 |
##   25 | 3
##   26 |
##   27 |
##   28 |
##   29 |
##   30 |
##   31 |
##   32 |
##   33 |
##   34 |
##   35 |
##   36 |
##   37 | 1

Notice that the decimal point is still 2 digits to the right of the vertical bar, or 1 digit to the right of the leaf. Every stem now has its own row. Therefore, the shortest river, with a stem of 1 and a leaf of 4, reads as 140 miles (the actual length, 135 miles, is rounded). The longest river, with a stem of 37 and a leaf of 1, is 3710 miles long.

A scale between 0 and 1 makes the plot shorter by using fewer stems.

stem(rivers, scale = 0.5)
##
##   The decimal point is 3 digit(s) to the right of the |
##
##   0 | 12222222222333333333333333333333333333333333333444444444444444444444
##   0 | 55555555555555556666666666677777777778888999999
##   1 | 0001122233
##   1 | 5589
##   2 | 33
##   2 | 5
##   3 |
##   3 | 7

Notice that each stem is now split across two rows: the first row holds leaves 0 to 4 and the second row holds leaves 5 to 9.

Note where the decimal point is placed. It is now 3 digits to the right of the vertical bar, or 2 digits to the right of the leaf. That means you have to add two 0s after each leaf. Therefore, the shortest river, with a stem of 0 and a leaf of 1, reads as roughly 100 miles. The longest river, with a stem of 3 and a leaf of 7, reads as roughly 3700 miles.
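To see the effect of scale at a glance, the number of stem rows printed for each setting can be counted. This is only a sketch: count_stem_rows is a hypothetical helper written for this illustration, and it relies on capture.output() to collect the text that stem() prints.

```r
# Count the stem rows that stem() prints for a given scale.
count_stem_rows <- function(x, scale) {
  out <- capture.output(stem(x, scale = scale))
  # Stem rows look like "  12 | 147"; the header and blank lines do not match
  sum(grepl("^ *-?[0-9]+ \\|", out))
}

rows <- sapply(c(0.5, 1, 2), function(s) count_stem_rows(rivers, s))
names(rows) <- c("scale = 0.5", "scale = 1", "scale = 2")
rows
```

The counts should increase from left to right, matching the three plots shown in this chapter.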

From each of the plots above, we see that no matter how we rescale, the length distribution is always skewed to the right with possible outliers.
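The visual impression can be cross-checked numerically. As a rough check (not a formal test), a right-skewed distribution typically has a mean larger than its median, and boxplot.stats() lists the values that fall beyond the 1.5 * IQR whiskers of a boxplot. The name outliers is chosen for this illustration.

```r
summary(rivers)                  # min, quartiles, median, mean and max
mean(rivers) > median(rivers)    # TRUE is consistent with a right skew

# Values flagged as potential outliers by the 1.5 * IQR rule
outliers <- boxplot.stats(rivers)$out
sort(outliers)
```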
