Chapter 3 The basics

3.1 Create a basic table

3.1.1 Example data

You can read in some data from the dslabs package on infectious disease cases in US states and check it out:
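The output below is consistent with loading a data frame from dslabs and then printing its first rows and a summary. As a sketch, assuming the dataset in question is `us_contagious_diseases` (the text does not name it) and that dslabs is installed:

```r
# Assumption: the data shown in this section is the us_contagious_diseases
# data frame that ships with the dslabs package.
library(dslabs)

data("us_contagious_diseases")

# First three rows, then a per-column summary
head(us_contagious_diseases, 3)
summary(us_contagious_diseases)
```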

##       disease   state year weeks_reporting count population
## 1 Hepatitis A Alabama 1966              50   321    3345787
## 2 Hepatitis A Alabama 1967              49   291    3364130
## 3 Hepatitis A Alabama 1968              52   314    3386068
##         disease            state            year      weeks_reporting
##  Hepatitis A:2346   Alabama   :  315   Min.   :1928   Min.   : 0.00  
##  Measles    :3825   Alaska    :  315   1st Qu.:1950   1st Qu.:31.00  
##  Mumps      :1785   Arizona   :  315   Median :1975   Median :46.00  
##  Pertussis  :2856   Arkansas  :  315   Mean   :1971   Mean   :37.38  
##  Polio      :2091   California:  315   3rd Qu.:1990   3rd Qu.:50.00  
##  Rubella    :1887   Colorado  :  315   Max.   :2011   Max.   :52.00  
##  Smallpox   :1275   (Other)   :14175                                 
##      count          population      
##  Min.   :     0   Min.   :   86853  
##  1st Qu.:     7   1st Qu.: 1018755  
##  Median :    69   Median : 2749249  
##  Mean   :  1492   Mean   : 4107584  
##  3rd Qu.:   525   3rd Qu.: 4996229  
##  Max.   :132342   Max.   :37607525  
##                   NA's   :214

With this data, you can create a basic table with the total counts of pertussis by state over the period of the data (1928–2011), limiting to the top 10 states. First, you can use tidyverse code to create the dataframe with the data you’d like to show in the table:
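One way to build that dataframe with dplyr is sketched here. The object name `pertussis_top10` and the column name `total_count` are choices made for illustration (the latter matches the table header shown in this chapter), and the dataset name `us_contagious_diseases` is an assumption.

```r
library(dplyr)
library(dslabs)

data("us_contagious_diseases")

# pertussis_top10 is a hypothetical name for the data behind the table
pertussis_top10 <- us_contagious_diseases %>%
  filter(disease == "Pertussis") %>%        # keep pertussis rows only
  group_by(state) %>%
  summarize(total_count = sum(count)) %>%   # total cases per state, all years
  arrange(desc(total_count)) %>%            # largest totals first
  slice(1:10)                               # keep the top 10 states

pertussis_top10
```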

3.1.2 Creating a basic table

Then, you can use the kable function to create a basic PDF table:
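A minimal sketch of that call follows. To keep the block runnable on its own, the data are re-entered by hand from the table in this section under the hypothetical name `pertussis_top10`. Setting `format = "latex"` explicitly is also an assumption, made so the call gives LaTeX output outside of a PDF knit.

```r
library(knitr)

# Hypothetical object; values copied from the table in this section
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# Basic table with all of kable's defaults
tbl <- kable(pertussis_top10, format = "latex")
tbl
```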

state total_count
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

The booktabs = TRUE option, if you include it in the kable function call, will give some cleaner default formatting for the table:
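As an illustration, the same call with `booktabs = TRUE` added might look like this. The `pertussis_top10` object is a hypothetical stand-in, typed in from the table above, and `format = "latex"` is set explicitly as an assumption.

```r
library(knitr)

# Hypothetical object; values copied from the table in this section
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# booktabs = TRUE switches the LaTeX output to booktabs-style rules
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE)
tbl
```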

state total_count
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115


3.2 Change column names

3.2.1 Basics of changing column names

By default, each column will have the same name as the column name in the data that is input to kable. You can easily change those, though, with the col.names option. You just put in a character vector with as many values as you have columns, giving the names you would prefer:
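For the two-column example table, a call along these lines would rename both headers. The data frame name `pertussis_top10` is hypothetical, and `format = "latex"` is an assumption so the block runs outside a PDF knit.

```r
library(knitr)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# One name per column, in column order
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases"))
tbl
```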

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

3.2.2 Changing formatting in column names

The kableExtra package includes a row_spec function. This will make changes to the formatting of a specific row in the table.

You can use this for any row, and that includes the row with the column headers. The kableExtra conventions consider this as row 0.

For example, you can put the column headers in italics by setting italic = TRUE in a row_spec call for row 0:
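A sketch of that call, assuming LaTeX output and a hypothetical `pertussis_top10` data frame holding the values from the table in this chapter:

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# Row 0 is the header row; italic = TRUE italicizes the column names
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases")) %>%
  row_spec(0, italic = TRUE)
tbl
```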

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

If you would like to underline the column headers, you can set underline = TRUE in the row_spec call for row 0:
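The underlined version differs only in the argument passed to `row_spec`. As before, `pertussis_top10` is a hypothetical object re-entered from the table values, and `format = "latex"` is assumed.

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# Underline the header row (row 0)
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases")) %>%
  row_spec(0, underline = TRUE)
tbl
```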

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

You can change the color of the column names by using the color option in the row_spec call:
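Here is a small sketch using a named color. The choice of `"red"` is arbitrary, `pertussis_top10` is a hypothetical object, and LaTeX output is assumed.

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# Color the header text; "red" is an arbitrary example color
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases")) %>%
  row_spec(0, color = "red")
tbl
```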

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

This function uses the R color names, which you can look up at sites like http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf. It will also accept colors specified by hexadecimal code in the “#BBBBBB” format:
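The same idea with a hexadecimal code can be sketched as follows. The specific code `"#7570B3"` is an arbitrary example, `pertussis_top10` is a hypothetical object, and `format = "latex"` is assumed.

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# Header text color given as a hexadecimal code (arbitrary example value)
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases")) %>%
  row_spec(0, color = "#7570B3")
tbl
```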

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

To learn more about specifying colors with hexadecimal codes, you can check out sites like https://htmlcolorcodes.com/.

If you’d like to change the background color as well as the text color, you can add the background option to the row_spec call for row 0:
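For instance, white header text on a dark background might be set up like this. Both colors are arbitrary picks, `pertussis_top10` is a hypothetical object, and LaTeX output is assumed.

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# color sets the header text color, background sets the cell fill
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases")) %>%
  row_spec(0, color = "white", background = "darkgray")
tbl
```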

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

You can change the font size of the column names, without changing the font size for the rest of the table, with the font_size parameter in row_spec for row 0. The default text size is somewhere in the “14” range, so pick larger numbers to make the text larger in the column names and smaller numbers to make the text smaller.
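A sketch with a larger header font follows. The value 18 is an arbitrary example, `pertussis_top10` is a hypothetical object, and `format = "latex"` is assumed.

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# font_size applies only to the row given to row_spec (here, the header)
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases")) %>%
  row_spec(0, font_size = 18)
tbl
```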

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

3.2.3 Adding footnotes to column names

You can also add footnotes based on a column name. This gets just a touch more complex, because what you actually need to do is rename the column name to include the footnote marker in the column’s name.

These markers can be added with functions that start with footnote_marker. For example, footnote_marker_alphabet gives letters for each footnote. The other options are footnote_marker_number and footnote_marker_symbol. You can add one of these to a column name directly in the col.names vector of column names by using paste0 to add the marker on to the existing column name. You’ll put in 1 for the argument for the first footnote marker of a certain type, 2 for the second, and so on.

Once you’ve added this footnote marker, you’ll need to also add the footnote itself. You do this with the footnote function. In this function, you can add a vector the same length as the number of footnotes of that type (alphabet, symbol, or number) with the footnote text in order.

For a simple example, the following code will add a single footnote for a single column header, using a symbol to denote the footnote:
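One possible version of that code is below. It assumes LaTeX output and a hypothetical `pertussis_top10` data frame. Note the `escape = FALSE` in the `kable` call, which keeps the marker's markup from being escaped into literal text.

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# Append the first symbol marker to the second column name,
# then supply the matching footnote text
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             escape = FALSE,
             col.names = c("State",
                           paste0("Total Pertussis Cases",
                                  footnote_marker_symbol(1, format = "latex")))) %>%
  footnote(symbol = "Includes all cases for 1928–2011")
tbl
```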

State Total Pertussis Cases*
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115
* Includes all cases for 1928–2011

Here is an example of adding footnotes to two column names, using alphabetical symbols:
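A sketch of the two-footnote case, under the same assumptions (LaTeX output, a hypothetical `pertussis_top10` object, and `escape = FALSE` so the markers render):

```r
library(knitr)
library(kableExtra)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# Marker 1 ("a") goes on the first column name, marker 2 ("b") on the second;
# the footnote texts are given in the same order
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             escape = FALSE,
             col.names = c(paste0("State",
                                  footnote_marker_alphabet(1, format = "latex")),
                           paste0("Total Pertussis Cases",
                                  footnote_marker_alphabet(2, format = "latex")))) %>%
  footnote(alphabet = c("Top 10 states in terms of total case counts.",
                        "Includes all cases for 1928–2011."))
tbl
```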

Stateᵃ Total Pertussis Casesᵇ
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115
ᵃ Top 10 states in terms of total case counts.
ᵇ Includes all cases for 1928–2011.

3.3 Change the column alignment

You might want to change how the values in each column are aligned. By default, the numeric columns will be right-aligned and all others will be left-aligned, but you might want something different, like everything to be centered.

You can use the align parameter in the kable function to change the column alignment. This parameter takes a character vector that combines the letters “c”, “l”, and “r”. These stand for center aligned (“c”), left-aligned (“l”), and right-aligned (“r”).

If you’d like all of your columns to be aligned in the same way (for example, all center-aligned), you can just put one character into the argument for align. For example, the following code will center-align both columns in the example table:
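A minimal sketch of the centered table, assuming a hypothetical `pertussis_top10` data frame and explicit LaTeX output:

```r
library(knitr)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# A single "c" asks for every column to be centered
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases"),
             align = "c")
tbl
```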

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

If you want to set different alignments for different columns, you will need to include as many characters in the argument to the align parameter as there are columns in the dataframe. The characters line up with the columns in order. For example, if you want the first column to be right-aligned and the next to be center-aligned, you could do that with the argument "rc", where “r” is in the first position to stand for the first column and “c” is in the second position to stand for the second column:
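The "rc" case could be written along these lines, again with a hypothetical `pertussis_top10` object and `format = "latex"` as assumptions:

```r
library(knitr)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# "r" applies to the first column (State), "c" to the second (case totals)
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases"),
             align = "rc")
tbl
```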

State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

3.4 Add a table caption

You can add a caption to your table with the caption parameter. This lets you put your caption in a character string, and it will be added to the table output:
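As a sketch, the caption shown in the table below could be supplied like this. The `pertussis_top10` object is hypothetical and LaTeX output is assumed. The "Table 3.1:" prefix is not part of the string, since the numbering is added when the document is built.

```r
library(knitr)

# Hypothetical object; values copied from the table in this chapter
pertussis_top10 <- data.frame(
  state = c("New York", "Texas", "Pennsylvania", "California", "Michigan",
            "Ohio", "Wisconsin", "New Jersey", "Massachusetts", "Illinois"),
  total_count = c(214266, 191626, 160331, 155110, 136302,
                  124964, 116211, 109192, 105285, 103115)
)

# caption takes the caption text as a single character string
tbl <- kable(pertussis_top10, format = "latex", booktabs = TRUE,
             col.names = c("State", "Total Pertussis Cases"),
             caption = "States with most total cases of Pertussis")
tbl
```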

Table 3.1: States with most total cases of Pertussis
State Total Pertussis Cases
New York 214266
Texas 191626
Pennsylvania 160331
California 155110
Michigan 136302
Ohio 124964
Wisconsin 116211
New Jersey 109192
Massachusetts 105285
Illinois 103115

The captions will be numbered based on their order in the document. If you’re building a fancier document (like the bookdown book we’ve created here), the table number will start with the chapter’s number, followed by the number of the table within that chapter.