vignettes/articles/spatial_data.Rmd
There is a ton of spatial data on the City of Toronto Open Data Portal. Spatial resources are retrieved the same way as all other resources, by using get_resource(), and may require the sf package.
We can look at the locations of EarlyON Child and Family Centres in Toronto. As the portal describes, these centres offer free programs for caregivers and children that strengthen relationships, support education, and foster healthy child development. The result of pulling this data in through the package is an sf object in the WGS 84 geographic coordinate reference system.
library(opendatatoronto)
library(dplyr) # provides the %>% pipe used below

earlyon_centres <- search_packages("EarlyON Child and Family Centres") %>%
  list_package_resources() %>%
  get_resource()
earlyon_centres
#> Simple feature collection with 262 features and 7 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -79.5969 ymin: 43.59557 xmax: -79.14029 ymax: 43.82965
#> geographic CRS: WGS 84
#> # A tibble: 262 x 8
#> `_id` loc_id program agency address phone rundate geometry
#> <int> <int> <chr> <chr> <chr> <chr> <chr> <POINT [°]>
#> 1 2621 6197 Alexand… Alexan… 105 Gra… 4166… 22JAN21 (-79.39888 43.65154)
#> 2 2622 6199 Applegr… Appleg… 60 Wood… 4164… 22JAN21 (-79.32191 43.66604)
#> 3 2623 6200 Applegr… Appleg… 31 East… 4164… 22JAN21 (-79.31814 43.67293)
#> 4 2624 6202 Birchmo… Birchm… 93 Birc… 4163… 22JAN21 (-79.26311 43.69563)
#> 5 2625 6209 St. Hel… Colleg… 66 Sher… 4168… 22JAN21 (-79.43344 43.64734)
#> 6 2626 6217 Kimbour… East E… 200 Wol… 4164… 22JAN21 (-79.32252 43.68569)
#> 7 2627 6218 Terry F… East E… 2 Gledh… 4164… 22JAN21 (-79.30929 43.68755)
#> 8 2628 6235 Eastvie… East T… 86 Blak… 4163… 22JAN21 (-79.33993 43.67503)
#> 9 2629 6238 Lakesho… Lakesh… 185 Fif… 4162… 22JAN21 (-79.50343 43.60356)
#> 10 2630 6242 The Chi… The To… 826 Blo… 4165… 22JAN21 (-79.42301 43.66268)
#> # … with 252 more rows
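The "geographic CRS: WGS 84" line in the printout can also be checked programmatically. The following is a minimal sketch on a one-row toy object rather than the portal data: the coordinates are copied from the first row above, and the program label is a hypothetical placeholder. On the real data, the same calls would be made on earlyon_centres.

```r
library(sf)

# Toy stand-in for one centre: a single point in longitude/latitude (EPSG:4326 = WGS 84)
toy_centre <- st_sf(
  program = "Example centre", # hypothetical label, not from the dataset
  geometry = st_sfc(st_point(c(-79.39888, 43.65154)), crs = 4326)
)

# TRUE when the coordinates are longitude/latitude rather than projected
is_lonlat <- st_is_longlat(toy_centre)

# TRUE when the object's CRS is WGS 84
is_wgs84 <- st_crs(toy_centre) == st_crs(4326)
```

Checking the CRS matters before combining layers, since sf expects both layers in a spatial join to share the same CRS.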
To plot this data on a map of Toronto, we also need the neighbourhood boundaries, which are available from the portal:
neighbourhoods <- list_package_resources("https://open.toronto.ca/dataset/neighbourhoods/") %>%
  get_resource()
neighbourhoods[c("AREA_NAME", "geometry")]
#> Simple feature collection with 140 features and 1 field
#> geometry type: POLYGON
#> dimension: XY
#> bbox: xmin: -79.63926 ymin: 43.581 xmax: -79.11527 ymax: 43.85546
#> geographic CRS: WGS 84
#> # A tibble: 140 x 2
#> AREA_NAME geometry
#> <chr> <POLYGON [°]>
#> 1 Casa Loma (96) ((-79.41469 43.67391, -79.41485 43.67434, -79.41553 …
#> 2 Annex (95) ((-79.39414 43.66872, -79.39588 43.66833, -79.39738 …
#> 3 Caledonia-Fairbank (10… ((-79.46021 43.68156, -79.46044 43.6819, -79.46075 4…
#> 4 Woodbine Corridor (64) ((-79.31485 43.66674, -79.3166 43.66636, -79.31692 4…
#> 5 Lawrence Park South (1… ((-79.41096 43.70408, -79.41165 43.70394, -79.41208 …
#> 6 Milliken (130) ((-79.24308 43.81297, -79.24433 43.81271, -79.24514 …
#> 7 Henry Farm (53) ((-79.35966 43.76649, -79.35966 43.76655, -79.35967 …
#> 8 Downsview-Roding-CFB (… ((-79.50783 43.71776, -79.50854 43.71767, -79.51265 …
#> 9 Kingsview Village-The … ((-79.55236 43.70947, -79.55229 43.7095, -79.55219 4…
#> 10 Kennedy Park (124) ((-79.24549 43.7306, -79.24555 43.73055, -79.24563 4…
#> # … with 130 more rows
Then, we can plot the EarlyON centres along with a map of Toronto:
library(ggplot2)

ggplot() +
  geom_sf(data = neighbourhoods) +
  geom_sf(data = earlyon_centres) +
  theme_void()
We may also want to analyze how many EarlyON centres there are in each neighbourhood. We can count by neighbourhood, using the sf package to join the two datasets, then dplyr to summarise, and finally ggiraph to create an interactive visualization, replacing geom_sf with geom_sf_interactive and supplying a tooltip:
library(sf)
library(dplyr)
library(ggiraph)
library(glue)

earlyon_by_neighbourhood <- neighbourhoods %>%
  # Spatial left join: every neighbourhood is kept, matched to the centres that intersect it
  st_join(earlyon_centres) %>%
  group_by(neighbourhood = AREA_NAME) %>%
  # Count distinct program names; neighbourhoods with no centres get 0
  summarise(n_earlyon = n_distinct(program, na.rm = TRUE)) %>%
  mutate(tooltip = glue("{neighbourhood}: {n_earlyon}"))

p <- ggplot() +
  geom_sf_interactive(
    data = earlyon_by_neighbourhood,
    aes(fill = n_earlyon, tooltip = tooltip)
  ) +
  theme_void()

girafe(code = print(p))
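To see what the join-and-count step does, here is a small sketch on toy data. The three square "neighbourhoods", the three "centres", and the unit_square() helper are all hypothetical and only mimic the shape of the real objects. The geometry is dropped before summarising, which is a simplification for counting and not what the code above does.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square whose lower-left corner is (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Toy polygons and points in plain planar coordinates (no CRS set)
toy_areas <- st_sf(
  AREA_NAME = c("A", "B", "C"),
  geometry = st_sfc(unit_square(0), unit_square(2), unit_square(4))
)
toy_centres <- st_sf(
  program = c("p1", "p2", "p3"),
  geometry = st_sfc(
    st_point(c(0.5, 0.5)), # inside A
    st_point(c(0.2, 0.8)), # inside A
    st_point(c(2.5, 0.5))  # inside B
  )
)

toy_counts <- toy_areas %>%
  st_join(toy_centres) %>% # left join: C is kept, with program = NA
  st_drop_geometry() %>%   # counting does not need the polygons
  group_by(neighbourhood = AREA_NAME) %>%
  summarise(n_earlyon = n_distinct(program, na.rm = TRUE))

toy_counts
```

Because the join keeps unmatched polygons and n_distinct() ignores the resulting NA, area C ends up with a count of 0 instead of disappearing. Note that n_distinct(program) counts distinct program names, so two centres sharing a name within one neighbourhood would be counted once. Counting a unique identifier such as loc_id might be more robust, though that is an assumption about the data. The same logic produces the per-neighbourhood counts in the real data.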
The interactive map shows, for example, that there are 10 EarlyON Centres in West Hill, 7 in South Riverdale, and 5 in Moss Park. The neighbourhoods with the most centres are:
earlyon_by_neighbourhood %>%
  as_tibble() %>%
  select(neighbourhood, n_earlyon) %>%
  arrange(desc(n_earlyon)) %>%
  head()
#> # A tibble: 6 x 2
#> neighbourhood n_earlyon
#> <chr> <int>
#> 1 West Hill (136) 10
#> 2 Malvern (132) 9
#> 3 Milliken (130) 8
#> 4 South Riverdale (70) 7
#> 5 Glenfield-Jane Heights (25) 6
#> 6 Dovercourt-Wallace Emerson-Junction (93) 5
But it does not tell us anything about whether these neighbourhoods are over- or under-served in terms of child and family centres.
Instead, it may be better to normalize the number of EarlyON Centres by something like the population or, better yet, the number of children in each neighbourhood, assuming that families are able to attend programs at the EarlyON Centres in the neighbourhoods they live in.
For this, we can integrate the Neighbourhood Profiles dataset, in which the City of Toronto uses Census data to provide a profile of the demographic, social, and economic characteristics of the people and households in Toronto neighbourhoods. Note that the latest data is from the 2016 Census, while the EarlyON centres data is up to date, so this analysis is purely for illustrative purposes.
We can pull in the Neighbourhood Profiles data and focus on the number of children in each neighbourhood. We make additional use of the tidyr and stringr packages to reshape and clean the data.
library(tidyr)
library(stringr)

neighbourhood_profiles <- list_package_resources("https://open.toronto.ca/dataset/neighbourhood-profiles/") %>%
  filter(name == "neighbourhood-profiles-2016-csv") %>%
  get_resource()

neighbourhoods_children <- neighbourhood_profiles %>%
  filter(Characteristic == "Children (0-14 years)") %>%
  select(`Agincourt North`:`Yorkdale-Glen Park`) %>%
  pivot_longer(cols = everything(), names_to = "neighbourhood", values_to = "children") %>%
  mutate(
    children = str_remove_all(children, ","),
    children = as.numeric(children)
  )
neighbourhoods_children
#> # A tibble: 140 x 2
#> neighbourhood children
#> <chr> <dbl>
#> 1 Agincourt North 3840
#> 2 Agincourt South-Malvern West 3075
#> 3 Alderwood 1760
#> 4 Annex 2360
#> 5 Banbury-Don Mills 3605
#> 6 Bathurst Manor 2325
#> 7 Bay Street Corridor 1695
#> 8 Bayview Village 2415
#> 9 Bayview Woods-Steeles 1515
#> 10 Bedford Park-Nortown 4555
#> # … with 130 more rows
There are some differences in how the neighbourhoods are named between the two datasets, so additional cleaning is required, such as removing the neighbourhood numbers from the spatial data set, and fixing inconsistencies and misspellings, before we can combine them.
earlyon_by_neighbourhood <- earlyon_by_neighbourhood %>%
  # Keep only the text before " (", dropping the neighbourhood number;
  # extra = "drop" silences the warning about the discarded piece
  separate(neighbourhood, into = "neighbourhood", sep = " \\(", extra = "drop") %>%
  mutate(neighbourhood = case_when(
    neighbourhood == "Cabbagetown-South St.James Town" ~ "Cabbagetown-South St. James Town",
    neighbourhood == "North St.James Town" ~ "North St. James Town",
    TRUE ~ neighbourhood
  ))

neighbourhoods_children <- neighbourhoods_children %>%
  mutate(neighbourhood = case_when(
    neighbourhood == "Mimico (includes Humber Bay Shores)" ~ "Mimico",
    neighbourhood == "Weston-Pelham Park" ~ "Weston-Pellam Park",
    TRUE ~ neighbourhood
  ))
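One way to find which names need fixing, or to confirm that none remain, is an anti-join on the neighbourhood column. The sketch below uses two toy name tables, spatial_names and census_names (hypothetical objects), filled with spellings taken from the cleaning code above as they appear before the fixes.

```r
library(dplyr)

# Toy name tables holding the pre-cleaning spellings from each source
spatial_names <- tibble(
  neighbourhood = c("Annex", "North St.James Town", "Weston-Pellam Park")
)
census_names <- tibble(
  neighbourhood = c("Annex", "North St. James Town", "Weston-Pelham Park")
)

# Rows of spatial_names that have no exact match in census_names
unmatched <- anti_join(spatial_names, census_names, by = "neighbourhood")
unmatched$neighbourhood
```

On the real tables, the same check run after cleaning would be expected to return zero rows if the fixes above cover every mismatch.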
Finally, we can combine the data sets, and calculate the number of EarlyON Centres per 1,000 children:
earlyon_by_neighbourhood_with_children <- earlyon_by_neighbourhood %>%
  left_join(neighbourhoods_children, by = "neighbourhood") %>%
  mutate(
    n_earlyon_per_child = n_earlyon / children,
    n_earlyon_per_1k_children = round(n_earlyon_per_child * 1000, 2),
    tooltip = glue("{neighbourhood}: {n_earlyon_per_1k_children}")
  )
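The rate itself is simple arithmetic: centres divided by children, times 1,000. As a rough sketch with made-up numbers (the table below is hypothetical and not drawn from either dataset), the NA row mimics a neighbourhood whose name failed to match in the left join:

```r
library(dplyr)

toy_rates <- tibble(
  neighbourhood = c("A", "B", "C"),
  n_earlyon = c(5L, 0L, 2L),
  children = c(2000, 1500, NA) # NA: no matching row in the children table
) %>%
  mutate(n_earlyon_per_1k_children = round(n_earlyon / children * 1000, 2))

toy_rates$n_earlyon_per_1k_children
```

Here 5 centres for 2,000 children gives 2.5 per 1,000, a neighbourhood with no centres gives 0, and an unmatched neighbourhood gives NA instead of a misleading number. Any NA in the real result would point back to a name that still needs cleaning.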
We can then visualize the centres per 1,000 children along with the locations of the centres themselves, adjusting the colour scheme to better highlight neighbourhoods without any centres:
p <- ggplot() +
  geom_sf_interactive(
    data = earlyon_by_neighbourhood_with_children,
    aes(fill = n_earlyon_per_1k_children, tooltip = tooltip)
  ) +
  geom_sf_interactive(data = earlyon_centres, size = 0.25) +
  scale_fill_gradient(low = "white", high = "#992a2a") +
  labs(title = "Number of EarlyON Child and Family Centres, per 1,000 Children") +
  theme_void() +
  theme(legend.title = element_blank())

girafe(code = print(p))
Now, we can see that most neighbourhoods have fewer than 1 EarlyON Centre per 1,000 children, with a number having none. Moss Park, one of the neighbourhoods we highlighted before, has 3.25 centres per 1,000 children, and Kensington-Chinatown has the highest rate, at 3.8 per 1,000 children.
It could be interesting to further quantify the number of children living in neighbourhoods without any centres, since these neighbourhoods are all just left at zero in this visualization, but that’s an exercise for another day!
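As a starting point for that exercise, one possible approach is to filter to neighbourhoods with zero centres and sum their children. This is only a sketch on a hypothetical toy table, toy_joined, with made-up values. On the real data, the input would be earlyon_by_neighbourhood_with_children, probably after st_drop_geometry(), so that summarise() does not also try to combine the polygons.

```r
library(dplyr)

# Hypothetical stand-in for the joined neighbourhood table
toy_joined <- tibble(
  neighbourhood = c("A", "B", "C"),
  n_earlyon = c(2L, 0L, 0L),
  children = c(1000, 1500, 500)
)

children_without_centres <- toy_joined %>%
  filter(n_earlyon == 0) %>%
  summarise(
    n_neighbourhoods = n(),
    children = sum(children, na.rm = TRUE)
  )

children_without_centres
```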