Chapter 4 Tidyverse / kableExtra pipelines

To really levarage the power of R for tables, you’ll likely want to get in the habit of creating pipelines that use functions from the tidyverse package to reshape or clean your data before piping that data into kable and kableExtra functions to create the table. This section will walk through some examples of this pipeline, and how it can provide a powerful approach for creating tables from data in R.

To run this code, please be sure to load these packages:

4.1 Example data

Again, this section uses a dataset from the dslabs package called “us_contagious_diseases”. You can load that data by loading the dslabs package (if you don’t have it, you can install it with install.packages("dslabs")) and then using the data function to load the dataset:

This dataset includes counts by states for various infectious diseases. It gives these counts by year over a several decades. It also gives the number of weeks for which that state reported data that year and the population of the state that year (in case you’d like to calculate rates):

##       disease   state year weeks_reporting count population
## 1 Hepatitis A Alabama 1966              50   321    3345787
## 2 Hepatitis A Alabama 1967              49   291    3364130
## 3 Hepatitis A Alabama 1968              52   314    3386068
## 4 Hepatitis A Alabama 1969              49   380    3412450

4.2 Example 1

One quick table we might want to create may be a table where each state is listed, with the range of years that the state reported data. This way, we can quickly see the years that each state reported cases for each disease, and if all states have data for the same number of years.

To do this, we’ll need to get a summary from each combination of state and disease, where we get the minimum and maximum values of year for that combination. Then, we can paste those values together to get the year range. We can select just the columns we need for the final table, getting rid of any extra columns we created along the way to help with our calculation. Finally, this table will look much nicer if we have a separate column for each disease (while “long” data is easier to work with when coding, “wide” data often is clearer in a presentation format). The pivot_wider can do that reshaping.

Here’s the full pipeline:

Table 4.1: Years of reporting for each state and disease.
state Hepatitis A Measles Mumps Pertussis Polio Rubella Smallpox
Alabama 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Alaska 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Arizona 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Arkansas 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
California 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Colorado 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Connecticut 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Delaware 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
District Of Columbia 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Florida 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Georgia 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Hawaii 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Idaho 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Illinois 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Indiana 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Iowa 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Kansas 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Kentucky 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Louisiana 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Maine 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Maryland 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Massachusetts 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Michigan 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Minnesota 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Mississippi 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Missouri 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Montana 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Nebraska 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Nevada 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
New Hampshire 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
New Jersey 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
New Mexico 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
New York 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
North Carolina 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
North Dakota 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Ohio 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Oklahoma 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Oregon 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Pennsylvania 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Rhode Island 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
South Carolina 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
South Dakota 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Tennessee 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Texas 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Utah 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Vermont 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Virginia 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Washington 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
West Virginia 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Wisconsin 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952
Wyoming 1966–2011 1928–2002 1968–2002 1938–2011 1928–1968 1966–2002 1928–1952

4.3 Example 2

As another example, you might want to use this data to create a table that shows the number of cases of a specific disease as it’s changed over the last five years of reporting in a certain region. For example, you could pull data on states along the west coast of the US (California, Oregon, and Washington) and look at changes in pertussis over the last five available years.

Table 4.2: Average weekly number of cases of pertussis in west coast states for the last five years of reporting.
state 2007 2008 2009 2010 2011
California 3.3 7.1 10.4 26.4 23.4
Oregon 2.0 1.6 3.2 3.3 3.6
Washington 6.0 9.1 6.3 11.7 15.2