Chapter 5 How to…
5.1 Make a table full width
5.2 Add a top header
You can also add headers above certain column names, to help distinguish groupings.
For this section, we’ll use an example based on the “us_contagious_diseases” data
from the dslabs
package. The following code will load that data and then create the
pertussis_latest_years
dataframe from it:
library(tidyverse)
library(dslabs)
data("us_contagious_diseases")
pertussis_latest_years <- us_contagious_diseases %>%
# Limit to states in the region of interest
filter(state %in% c("California", "Oregon", "Washington")) %>%
# Limit to the disease of interest
filter(disease == "Pertussis") %>%
# Limit to the latest five years reported for each state
group_by(state) %>%
arrange(desc(year)) %>%
slice(1:5) %>%
# Calculate the average weekly count from the total yearly count and
# the number of weeks reporting (should check data---does an
# 'unreported' week mean there were no cases that week?)
mutate(ave_weekly_count = count / weeks_reporting) %>%
# Limit to only the data to show in the table
select(state, year, ave_weekly_count) %>%
# Reorder by year so that the earliest year comes first
arrange(year) %>%
# "Widen" the data for a nicer table
pivot_wider(names_from = year, values_from = ave_weekly_count)
This dataframe gives average weekly counts of pertussis in three west coast states over the last five years of reporting. The code used to create this dataset from the original data is discussed in a previous section of this book.
Previously, we created this table from the data:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.")
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
We might want to add a top header over the five columns that show years, to distinguish
that those are all similar types of columns. With kableExtra
, this can be added with
the function add_header_above
. The convention for this argument is to include a value
for header
that’s a vector; this vector will give the names to use for top-level headers
as well as the number of columns they should span. In this case, we’d like to not add
anything over the “state” column, so we can use the marker " "
to indicate that
no top-level header should be added. For all the other columns, we’ll add “Year”.
Since we only have one column (the first) that will lack a top-level header, and
then five (the years) that will all get the same header (“Year”), we can express
that in the “header” argument as c(" " = 1, "Year" = 5)
. In other words, don’t
have a top-level header for one column, and then have one named “Year” that spans
five columns.
Here’s the result:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "Year" = 5))
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
In the add_header_above
call, you’ll need to make sure that you specify values
for all the columns in your table. In other words, all the numbers in that argument
should add up to the number of columns that you have. In this case, we have six
columns in the table (the “state” column plus each of the yearly ones). Since
5 + 1 = 6, we’re good here.
Here’s another example, where separate headers are added for the first two then the next three years:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "First two years" = 2, "Next three years" = 3))
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
Again, you can see that 1 + 2 + 3 = 6, so we’ve covered all of our columns in the table.
If you want, you can even add headers on top of headers. Just keep adding calls to
add_header_above
, going from the header you want closest to the column names to the
top-most one you want. For example, you could add a header above both of the two
top-level headers in the previous example with:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "First two years" = 2, "Next three years" = 3)) %>%
add_header_above(header = c(" " = 1, "Year" = 5))
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
You can change the formatting of the header in the add_header_above
call, as well.
For example, you can change it to be in italics with italic = TRUE
:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "Year" = 5),
italic = TRUE)
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
You can change the alignment, too. The default is to be centered over the columns that
it spans, but you can change that with align
. For example, align = "l"
would change
it to be left-aligned:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "Year" = 5),
align = "l")
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
You can change the color of both the text and the background color for this new
header section with teh color
and background
arguments, respectively:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "Year" = 5),
color = "white",
background = "lightsalmon")
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
You can change the angle of the text for the new header with angle
:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "Year" = 5),
angle = -45)
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
If you’d like the text for a column to have a line break, you can include the code for a new line (“”) inside the column name:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "Year\n(Georgian calendar)" = 5))
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
If you have more than one top-level header, you may want to change the amount
of separation between each group. You can do that with the line_sep
argument.
Here is the example table where separate top-level headers are added for the
first two years versus the last three from earlier:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "First two years" = 2, "Next three years" = 3))
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
If you set line_sep = 0
, then you’ll see that there’s no space between the two
header sections, so the line under the two headers runs together into one line:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "First two years" = 2, "Next three years" = 3),
line_sep = 0)
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
On the other hand, if you set line_sep
to a value higher than it’s default of 3,
you can increase the separation in the headers:
pertussis_latest_years %>%
kable(digits = 1,
caption = "Average weekly number of cases of pertussis in west coast states for the last five years of reporting.") %>%
add_header_above(header = c(" " = 1, "First two years" = 2, "Next three years" = 3),
line_sep = 10)
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
5.3 Group rows
You might also want to show structure or grouping within your rows. There
are a number of functions in kableExtra
that help with this.
These examples will again use data derived from the “us_contagious_diseases” data
from the dslabs
package. The following code will load that data and then create the
per_hepa_latest_years
dataframe from it:
library(tidyverse)
library(dslabs)
data("us_contagious_diseases")
per_hepa_latest_years <- us_contagious_diseases %>%
# Limit to states in the region of interest
filter(state %in% c("California", "Oregon", "Washington")) %>%
# Limit to the two diseases of interest
filter(disease %in% c("Pertussis", "Hepatitis A")) %>%
# Limit to the latest five years reported for each combination of
# state and disease
group_by(state, disease) %>%
arrange(desc(year)) %>%
slice(1:5) %>%
# Calculate the average weekly count from the total yearly count and
# the number of weeks reporting (should check data---does an
# 'unreported' week mean there were no cases that week?)
ungroup() %>%
mutate(ave_weekly_count = count / weeks_reporting) %>%
# Limit to only the data to show in the table
select(state, disease, year, ave_weekly_count) %>%
# Reorder by year so that the earliest year comes first
arrange(disease, state, year) %>%
# "Widen" the data for a nicer table
pivot_wider(names_from = year, values_from = ave_weekly_count)
This dataframe gives the average weekly counts of two diseases (separately) in three west coast states over the last five years of reporting. The code used to create this dataset is an extension of the code used in the section on adding top headings to tables—in this case, we’re adding on data for a second disease.
Here’s what the dataframe looks like:
## # A tibble: 6 x 7
## state disease `2007` `2008` `2009` `2010` `2011`
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 California Hepatitis A 6.12 6.48 4.60 4.10 3.59
## 2 Oregon Hepatitis A 0.612 0.596 0.422 0.413 0.277
## 3 Washington Hepatitis A 1.12 0.896 0.875 0.417 0.583
## 4 California Pertussis 3.28 7.08 10.4 26.4 23.4
## 5 Oregon Pertussis 2.04 1.61 3.15 3.26 3.60
## 6 Washington Pertussis 5.98 9.08 6.3 11.7 15.2
When we create a table from this, we may want to group the rows by disease (or by state), so that readers can see common factors in certain rows.
You can create those groupings using the pack_rows
function. You will include one
of these for each of the groupings that you want to create. For example, here there’s
one pack_rows
to group the first three rows (all with measurements of Hepatitis A)
and a second pack_rows
to group the fourth through six rows (all with measurements
of pertussis). For each pack_rows
call, it gives the name that will be used for
the grouping (this is the name that shows up as a label for those groups in the
table) as well as the number of the line where the grouping should start (start_row
)
and the line where it should end (end_row
):
per_hepa_latest_years %>%
select(-disease) %>%
kable(digits = 1) %>%
pack_rows("Hepatitis A", start_row = 1, end_row = 3) %>%
pack_rows("Pertussis", start_row = 4, end_row = 6)
state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
Hepatitis A | |||||
California | 6.1 | 6.5 | 4.6 | 4.1 | 3.6 |
Oregon | 0.6 | 0.6 | 0.4 | 0.4 | 0.3 |
Washington | 1.1 | 0.9 | 0.9 | 0.4 | 0.6 |
Pertussis | |||||
California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
This method of grouping the rows required us to list, by hand, the rows in each
group. However, that’s something that we originally had in the data, in the
disease
column.
The kableExtra
has a function called collapse_rows
that will do the grouping
by a column in the data. You need to move the columns to use to early in the
dataframe, which you can do with select
. Then, you can specify to collapse
the rows by the first column using collapse_rows
:
per_hepa_latest_years %>%
# Move disease to be the first column.
select(disease, state, `2007`:`2011`) %>%
kable(digits = 1) %>%
collapse_rows(columns = 1)
disease | state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|---|
Hepatitis A | California | 6.1 | 6.5 | 4.6 | 4.1 | 3.6 |
Oregon | 0.6 | 0.6 | 0.4 | 0.4 | 0.3 | |
Washington | 1.1 | 0.9 | 0.9 | 0.4 | 0.6 | |
Pertussis | California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 | |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
The collapse_rows
function includes some different formatting options. For example,
you can change the vertical alignment of the headings for the row groups with valign
.
Here’s the code to put them at the top of each row group, instead of vertically centered:
per_hepa_latest_years %>%
# Move disease to be the first column.
select(disease, state, `2007`:`2011`) %>%
kable(digits = 1) %>%
collapse_rows(columns = 1, valign = "top")
disease | state | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|---|
Hepatitis A | California | 6.1 | 6.5 | 4.6 | 4.1 | 3.6 |
Oregon | 0.6 | 0.6 | 0.4 | 0.4 | 0.3 | |
Washington | 1.1 | 0.9 | 0.9 | 0.4 | 0.6 | |
Pertussis | California | 3.3 | 7.1 | 10.4 | 26.4 | 23.4 |
Oregon | 2.0 | 1.6 | 3.2 | 3.3 | 3.6 | |
Washington | 6.0 | 9.1 | 6.3 | 11.7 | 15.2 |
5.4 Bold columns or rows
When tables have a lot of content, you may want to highlight elements to draw attention to them. This example uses data from the AmesHousing
package.
library(tidyverse)
library(AmesHousing)
top_neighborhoods <- make_ames() %>%
filter(Year_Sold == 2010) %>%
group_by(Neighborhood) %>%
summarize(Home_Sales = n(),
Total = scales::dollar(sum(Sale_Price))) %>%
arrange(-Home_Sales) %>%
slice(1:8)
The column_spec() function allows the selection of a column and then a specification for the look. For example, this is a bold treatment for the third column.
Neighborhood | Home_Sales | Total |
---|---|---|
North_Ames | 58 | $8,721,963 |
Old_Town | 29 | $3,556,339 |
Edwards | 23 | $2,835,421 |
Sawyer | 23 | $3,106,800 |
College_Creek | 21 | $4,163,850 |
Somerset | 21 | $4,334,126 |
Sawyer_West | 18 | $3,268,000 |
Mitchell | 17 | $3,043,600 |
The row_spec() function allows the selection of rows and a specification for a style. For greater emphasis, this example sets a background fill color and the font color.
top_neighborhoods %>%
kable() %>%
kable_styling() %>%
row_spec(3:4, bold = T, color = "white", background = "#D7261E")
Neighborhood | Home_Sales | Total |
---|---|---|
North_Ames | 58 | $8,721,963 |
Old_Town | 29 | $3,556,339 |
Edwards | 23 | $2,835,421 |
Sawyer | 23 | $3,106,800 |
College_Creek | 21 | $4,163,850 |
Somerset | 21 | $4,334,126 |
Sawyer_West | 18 | $3,268,000 |
Mitchell | 17 | $3,043,600 |
5.5 Dynamically make changes
You might want to make changes to your tables that depend on features of your data that aren’t known in advance or can’t be hardcoded.
Let’s look at the comma_survey
data set, which polled Americans on the oxford comma and other grammatical concepts. In particular, we’ll look at the education of those who care about the debate of whether “data” is a singular or plural noun:
library(fivethirtyeight)
library(tidyverse)
data_noun <- comma_survey %>%
filter(!is.na(care_data)) %>%
select(education, care_data)
data_noun
## # A tibble: 1,091 x 2
## education care_data
## <ord> <ord>
## 1 Bachelor degree Not much
## 2 Graduate degree Not much
## 3 <NA> Not at all
## 4 Less than high school degree Some
## 5 <NA> Not much
## 6 Some college or Associate degree Some
## 7 Some college or Associate degree Some
## 8 Some college or Associate degree A lot
## 9 Graduate degree Not much
## 10 Bachelor degree Some
## # … with 1,081 more rows
We want to make a table of what percent, by education, answered each of “Not much”, “Not at all”, “Some”, and “A lot” to the question: “How much, if at all, do you care about the debate over the use of the word”data" as a singular or plural noun?".
First, we need to do some wrangling to replace NA
values of Education
with “Education unknown”, and to calculate the percent of each response.
library(scales)
data_noun_percent <- data_noun %>%
mutate(education = fct_explicit_na(education, na_level = "Education unknown")) %>%
group_by(education, care_data) %>%
summarise(n = n()) %>%
mutate(
prop = n / sum(n),
percent = percent(prop)
) %>%
ungroup() %>%
select(-n)
data_noun_percent
## # A tibble: 24 x 4
## education care_data prop percent
## <ord> <ord> <dbl> <chr>
## 1 Less than high school degree Not at all 0.273 27.3%
## 2 Less than high school degree Not much 0.455 45.5%
## 3 Less than high school degree Some 0.182 18.2%
## 4 Less than high school degree A lot 0.0909 9.1%
## 5 High school degree Not at all 0.28 28.0%
## 6 High school degree Not much 0.34 34.0%
## 7 High school degree Some 0.3 30.0%
## 8 High school degree A lot 0.08 8.0%
## 9 Some college or Associate degree Not at all 0.2 20.0%
## 10 Some college or Associate degree Not much 0.403 40.3%
## # … with 14 more rows
Then reshape the data:
data_noun_percent_wide <- data_noun_percent %>%
select(-prop) %>%
pivot_wider(
names_from = "care_data",
values_from = "percent"
)
data_noun_percent_wide
## # A tibble: 6 x 5
## education `Not at all` `Not much` Some `A lot`
## <ord> <chr> <chr> <chr> <chr>
## 1 Less than high school degree 27.3% 45.5% 18.2% 9.1%
## 2 High school degree 28.0% 34.0% 30.0% 8.0%
## 3 Some college or Associate degree 20.0% 40.3% 29.8% 9.8%
## 4 Bachelor degree 18.90% 35.47% 35.76% 9.88%
## 5 Graduate degree 13.0% 37.3% 31.9% 17.8%
## 6 Education unknown 18% 31% 32% 18%
Then, we can make a table. We’ll clean up the column names, make sure the education column is left aligned and the responses are right aligned, and add a top header with the question “How much, if at all, do you care about the debate over the use of the word ‘data’ as a singular or plural noun?” above the “Not at all”, “Not much”, “Some”, and “A lot” columns:
library(kableExtra)
data_noun_percent_wide %>%
kable(
col.names = c("Education", "Not at all", "Not much", "Some", "A lot"),
align = c("lrrrr")
) %>%
add_header_above(header = c(" " = 1, "How much, if at all, do you care about the debate over the use of the word 'data' as a singular or plural noun?" = 4))
Education | Not at all | Not much | Some | A lot |
---|---|---|---|---|
Less than high school degree | 27.3% | 45.5% | 18.2% | 9.1% |
High school degree | 28.0% | 34.0% | 30.0% | 8.0% |
Some college or Associate degree | 20.0% | 40.3% | 29.8% | 9.8% |
Bachelor degree | 18.90% | 35.47% | 35.76% | 9.88% |
Graduate degree | 13.0% | 37.3% | 31.9% | 17.8% |
Education unknown | 18% | 31% | 32% | 18% |
Doing this required us to hardcode in the question responses and the number of columns (4) corresponding to them (both for the header above, and for the column alignment). Our code is now fragile if this isn’t known in advance, or could change!
Instead, we can think about it this way: we know that the first column contains Education and the rest contain the answers. So, we can set the column names, alignment, and the header using attributes of data_noun_percent_wide
.
For col.names
and align
, we can quickly use names()
and ncol()
, respectively:
k <- data_noun_percent_wide %>%
kable(
col.names = c(
"Education",
names(data_noun_percent_wide)[-1]
),
align = c("l", rep("r", ncol(data_noun_percent_wide) - 1))
)
k
Education | Not at all | Not much | Some | A lot |
---|---|---|---|---|
Less than high school degree | 27.3% | 45.5% | 18.2% | 9.1% |
High school degree | 28.0% | 34.0% | 30.0% | 8.0% |
Some college or Associate degree | 20.0% | 40.3% | 29.8% | 9.8% |
Bachelor degree | 18.90% | 35.47% | 35.76% | 9.88% |
Graduate degree | 13.0% | 37.3% | 31.9% | 17.8% |
Education unknown | 18% | 31% | 32% | 18% |
And for the header, we can create a named vector with the number of columns each header spans (again, using ncol()
):
k_header <- c(1, ncol(data_noun_percent_wide) - 1)
names(k_header) <- c(" ", "How much, if at all, do you care about the debate over the use of the word 'data' as a singular or plural noun?")
k %>%
add_header_above(header = k_header)
Education | Not at all | Not much | Some | A lot |
---|---|---|---|---|
Less than high school degree | 27.3% | 45.5% | 18.2% | 9.1% |
High school degree | 28.0% | 34.0% | 30.0% | 8.0% |
Some college or Associate degree | 20.0% | 40.3% | 29.8% | 9.8% |
Bachelor degree | 18.90% | 35.47% | 35.76% | 9.88% |
Graduate degree | 13.0% | 37.3% | 31.9% | 17.8% |
Education unknown | 18% | 31% | 32% | 18% |
Now our code is more robust and can easily adapt to changes in the data (e.g., if there were additional responses).
We might want to do other things, too, like bold the cell with the highest percent for each of the levels of Education.
To do this, we can use cell_spec()
to set values of percent
to bold if they are the highest value of prop
(the unformatted percent
) for their level of education
.
data_noun_bold_highest_prop <- data_noun_percent %>%
group_by(education) %>%
mutate(highest_prop = max(prop)) %>%
mutate(percent = cell_spec(percent, format = "html", bold = prop == highest_prop))
data_noun_bold_highest_prop
## # A tibble: 24 x 5
## # Groups: education [6]
## education care_data prop percent highest_prop
## <ord> <ord> <dbl> <chr> <dbl>
## 1 Less than high sch… Not at a… 0.273 "<span style=\" \" >27… 0.455
## 2 Less than high sch… Not much 0.455 "<span style=\" font-weigh… 0.455
## 3 Less than high sch… Some 0.182 "<span style=\" \" >18… 0.455
## 4 Less than high sch… A lot 0.0909 "<span style=\" \" >9.… 0.455
## 5 High school degree Not at a… 0.28 "<span style=\" \" >28… 0.34
## 6 High school degree Not much 0.34 "<span style=\" font-weigh… 0.34
## 7 High school degree Some 0.3 "<span style=\" \" >30… 0.34
## 8 High school degree A lot 0.08 "<span style=\" \" >8.… 0.34
## 9 Some college or As… Not at a… 0.2 "<span style=\" \" >20… 0.403
## 10 Some college or As… Not much 0.403 "<span style=\" font-weigh… 0.403
## # … with 14 more rows
This applies HTML formatting to percent
(specifically, bold formatting to the highest value for each level of education
) that will be picked up by kableExtra.
Then, we again reshape the data to a wide format and create a table. This time, we need to set escape = FALSE
in kable()
to let it know that there is HTML in the cells that should not be escaped (i.e., not treated as regular text):
data_noun_bold_highest_prop_wide <- data_noun_bold_highest_prop %>%
select(-prop, -highest_prop) %>%
pivot_wider(
names_from = "care_data",
values_from = "percent"
)
k_header <- c(1, ncol(data_noun_bold_highest_prop_wide) - 1)
names(k_header) <- c(" ", "How much, if at all, do you care about the debate over the use of the word 'data' as a singular or plural noun?")
data_noun_bold_highest_prop_wide %>%
kable("html",
col.names = c("Education", names(data_noun_bold_highest_prop_wide)[-1]),
align = c("l", rep("r", ncol(data_noun_bold_highest_prop_wide) - 1)),
escape = FALSE
) %>%
add_header_above(header = k_header)
Education | Not at all | Not much | Some | A lot |
---|---|---|---|---|
Less than high school degree | 27.3% | 45.5% | 18.2% | 9.1% |
High school degree | 28.0% | 34.0% | 30.0% | 8.0% |
Some college or Associate degree | 20.0% | 40.3% | 29.8% | 9.8% |
Bachelor degree | 18.90% | 35.47% | 35.76% | 9.88% |
Graduate degree | 13.0% | 37.3% | 31.9% | 17.8% |
Education unknown | 18% | 31% | 32% | 18% |
This way, we can apply formatting based on specific values of the data, which may change.
5.6 Take advantage of PDF-only features
Example code that will allow us to embed PDF-only table output. This
uses a couple of basic examples of doing this, with code from the dslabs
package.
library(tidyverse)
library(dslabs)
data("us_contagious_diseases")
top_10_data <- us_contagious_diseases %>%
filter(disease == "Pertussis") %>%
group_by(state) %>%
summarize(total_count = sum(count)) %>%
arrange(desc(total_count)) %>%
slice(1:10)
Then, you can use the kable
function to create a basic PDF table:
The booktabs = TRUE
option, if you include it in the kable
function call, will give
some cleaner default formatting for the table: