There is a ton of spatial data on the City of Toronto Open Data Portal. Spatial resources are retrieved the same way as all other resources, by using get_resource(), and may require the sf package.

We can look at the locations of EarlyON Child and Family Centres in Toronto. As the portal describes, these centres offer free programs for caregivers and children that strengthen relationships, support education, and foster healthy child development. Pulling this data in through the package returns an sf object in the WGS 84 coordinate reference system.

library(opendatatoronto)
library(dplyr) # provides the %>% pipe used below

earlyon_centres <- search_packages("EarlyON Child and Family Centres") %>%
  list_package_resources() %>%
  get_resource()

earlyon_centres
#> Simple feature collection with 262 features and 7 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: -79.5969 ymin: 43.59557 xmax: -79.14029 ymax: 43.82965
#> geographic CRS: WGS 84
#> # A tibble: 262 x 8
#>    `_id` loc_id program  agency  address  phone rundate                 geometry
#>    <int>  <int> <chr>    <chr>   <chr>    <chr> <chr>                <POINT [°]>
#>  1  2621   6197 Alexand… Alexan… 105 Gra… 4166… 22JAN21     (-79.39888 43.65154)
#>  2  2622   6199 Applegr… Appleg… 60 Wood… 4164… 22JAN21     (-79.32191 43.66604)
#>  3  2623   6200 Applegr… Appleg… 31 East… 4164… 22JAN21     (-79.31814 43.67293)
#>  4  2624   6202 Birchmo… Birchm… 93 Birc… 4163… 22JAN21     (-79.26311 43.69563)
#>  5  2625   6209 St. Hel… Colleg… 66 Sher… 4168… 22JAN21     (-79.43344 43.64734)
#>  6  2626   6217 Kimbour… East E… 200 Wol… 4164… 22JAN21     (-79.32252 43.68569)
#>  7  2627   6218 Terry F… East E… 2 Gledh… 4164… 22JAN21     (-79.30929 43.68755)
#>  8  2628   6235 Eastvie… East T… 86 Blak… 4163… 22JAN21     (-79.33993 43.67503)
#>  9  2629   6238 Lakesho… Lakesh… 185 Fif… 4162… 22JAN21     (-79.50343 43.60356)
#> 10  2630   6242 The Chi… The To… 826 Blo… 4165… 22JAN21     (-79.42301 43.66268)
#> # … with 252 more rows

To plot these centres on a map of Toronto, we also need the neighbourhood boundaries, which are available from the portal as well:

neighbourhoods <- list_package_resources("https://open.toronto.ca/dataset/neighbourhoods/") %>%
  get_resource()

neighbourhoods[c("AREA_NAME", "geometry")]
#> Simple feature collection with 140 features and 1 field
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: -79.63926 ymin: 43.581 xmax: -79.11527 ymax: 43.85546
#> geographic CRS: WGS 84
#> # A tibble: 140 x 2
#>    AREA_NAME                                                            geometry
#>    <chr>                                                           <POLYGON [°]>
#>  1 Casa Loma (96)          ((-79.41469 43.67391, -79.41485 43.67434, -79.41553 …
#>  2 Annex (95)              ((-79.39414 43.66872, -79.39588 43.66833, -79.39738 …
#>  3 Caledonia-Fairbank (10… ((-79.46021 43.68156, -79.46044 43.6819, -79.46075 4…
#>  4 Woodbine Corridor (64)  ((-79.31485 43.66674, -79.3166 43.66636, -79.31692 4…
#>  5 Lawrence Park South (1… ((-79.41096 43.70408, -79.41165 43.70394, -79.41208 …
#>  6 Milliken (130)          ((-79.24308 43.81297, -79.24433 43.81271, -79.24514 …
#>  7 Henry Farm (53)         ((-79.35966 43.76649, -79.35966 43.76655, -79.35967 …
#>  8 Downsview-Roding-CFB (… ((-79.50783 43.71776, -79.50854 43.71767, -79.51265 …
#>  9 Kingsview Village-The … ((-79.55236 43.70947, -79.55229 43.7095, -79.55219 4…
#> 10 Kennedy Park (124)      ((-79.24549 43.7306, -79.24555 43.73055, -79.24563 4…
#> # … with 130 more rows
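
Because the two layers are plotted and joined together, it can be worth confirming first that they share a coordinate reference system. The following is a minimal sketch on made-up points (the names `layer_a` and `layer_b` are hypothetical, not objects from the portal); the same `st_crs()` comparison should apply to `earlyon_centres` and `neighbourhoods`.

```r
library(sf)

# Two hypothetical single-point layers, both declared as WGS 84 (EPSG:4326)
layer_a <- st_sfc(st_point(c(-79.4, 43.65)), crs = 4326)
layer_b <- st_sfc(st_point(c(-79.3, 43.70)), crs = 4326)

# Compare the coordinate reference systems of the two layers
same_crs <- st_crs(layer_a) == st_crs(layer_b)

# If they differed, one layer could be transformed to match the other
if (!same_crs) {
  layer_b <- st_transform(layer_b, st_crs(layer_a))
}
```

Here both layers already match, so no transformation is applied.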

Then, we can plot the EarlyON centres along with a map of Toronto:

library(ggplot2)

ggplot() +
  geom_sf(data = neighbourhoods) +
  geom_sf(data = earlyon_centres) +
  theme_void()

We may also want to analyze how many EarlyON centres there are in each neighbourhood. We can use the sf package to join the two datasets, dplyr to summarise by neighbourhood, and ggiraph to create an interactive visualization, replacing geom_sf with geom_sf_interactive and supplying a tooltip:

library(sf)
library(dplyr)
library(ggiraph)
library(glue)

earlyon_by_neighbourhood <- neighbourhoods %>%
  st_join(earlyon_centres) %>%
  group_by(neighbourhood = AREA_NAME) %>%
  summarise(n_earlyon = n_distinct(program, na.rm = TRUE)) %>%
  mutate(tooltip = glue("{neighbourhood}: {n_earlyon}"))

p <- ggplot() +
  geom_sf_interactive(data = earlyon_by_neighbourhood, aes(fill = n_earlyon, tooltip = tooltip)) +
  theme_void()

girafe(code = print(p))
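
To see why the summary counts distinct non-missing programs rather than rows, here is a small sketch on toy data. The squares, points, and names (`toy_areas`, `toy_points`, `square()`) are all hypothetical and use plain planar coordinates, not portal data. `st_join()` behaves like a left join by default, so an area with no points still yields one row, with `NA` in the joined columns.

```r
library(sf)
library(dplyr)

# Helper (hypothetical) that builds a unit square starting at x0
square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two toy "neighbourhoods"; no CRS is set, so geometry is treated as planar
toy_areas <- st_sf(
  area = c("A", "B"),
  geometry = st_sfc(square(0), square(2))
)

# Two toy "centres", both located inside area A
toy_points <- st_sf(
  program = c("p1", "p2"),
  geometry = st_sfc(st_point(c(0.2, 0.5)), st_point(c(0.7, 0.5)))
)

toy_counts <- toy_areas %>%
  st_join(toy_points) %>% # left join: area B is kept, with program = NA
  st_drop_geometry() %>%
  group_by(area) %>%
  summarise(
    n_rows = n(), # counts the NA row for area B as 1
    n_programs = n_distinct(program, na.rm = TRUE) # counts 0 for area B
  )
```

In this toy case a plain row count would report one centre in the empty area B, while `n_distinct(program, na.rm = TRUE)` reports zero.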

This shows us, for example, that there are 10 EarlyON Centres in West Hill, 7 in South Riverdale, and 5 in Moss Park:

earlyon_by_neighbourhood %>%
  as_tibble() %>%
  select(neighbourhood, n_earlyon) %>%
  arrange(desc(n_earlyon)) %>%
  head()
#> # A tibble: 6 x 2
#>   neighbourhood                            n_earlyon
#>   <chr>                                        <int>
#> 1 West Hill (136)                                 10
#> 2 Malvern (132)                                    9
#> 3 Milliken (130)                                   8
#> 4 South Riverdale (70)                             7
#> 5 Glenfield-Jane Heights (25)                      6
#> 6 Dovercourt-Wallace Emerson-Junction (93)         5

But it does not tell us anything about whether these neighbourhoods are over- or under-served in terms of child and family centres.

Instead, it may be better to normalize the number of EarlyON Centres by something like the population or, better yet, by the number of children in each neighbourhood. This assumes that families attend programs at the EarlyON Centres in the neighbourhoods they live in.

For this, we can integrate the Neighbourhood Profiles dataset, in which the City of Toronto uses Census data to provide a profile of the demographic, social, and economic characteristics of the people and households in Toronto neighbourhoods. Note that the latest profile data is from the 2016 Census, while the EarlyON centres data is up to date, so this analysis is purely for illustrative purposes.

We can pull in the Neighbourhood Profiles data and focus on the number of children in each neighbourhood. We make additional use of the tidyr and stringr packages to reshape and clean the data.

library(tidyr)
library(stringr)

neighbourhood_profiles <- list_package_resources("https://open.toronto.ca/dataset/neighbourhood-profiles/") %>%
  filter(name == "neighbourhood-profiles-2016-csv") %>%
  get_resource()

neighbourhoods_children <- neighbourhood_profiles %>%
  filter(Characteristic == "Children (0-14 years)") %>%
  select(`Agincourt North`:`Yorkdale-Glen Park`) %>%
  pivot_longer(cols = everything(), names_to = "neighbourhood", values_to = "children") %>%
  mutate(
    children = str_remove_all(children, ","),
    children = as.numeric(children)
  )

neighbourhoods_children
#> # A tibble: 140 x 2
#>    neighbourhood                children
#>    <chr>                           <dbl>
#>  1 Agincourt North                  3840
#>  2 Agincourt South-Malvern West     3075
#>  3 Alderwood                        1760
#>  4 Annex                            2360
#>  5 Banbury-Don Mills                3605
#>  6 Bathurst Manor                   2325
#>  7 Bay Street Corridor              1695
#>  8 Bayview Village                  2415
#>  9 Bayview Woods-Steeles            1515
#> 10 Bedford Park-Nortown             4555
#> # … with 130 more rows
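
The reshape-and-clean step can be seen in isolation on a tiny example. This is a sketch only: the table `toy_profiles`, its column names, and its values are made up, and merely mimic the wide layout with comma-formatted counts stored as text.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical one-row extract in the wide layout (values are invented)
toy_profiles <- tibble(
  Characteristic = "Children (0-14 years)",
  `Area One` = "1,250",
  `Area Two` = "980"
)

toy_children <- toy_profiles %>%
  select(-Characteristic) %>%
  # One row per neighbourhood instead of one column per neighbourhood
  pivot_longer(
    cols = everything(),
    names_to = "neighbourhood",
    values_to = "children"
  ) %>%
  # Strip thousands separators before converting text to numbers
  mutate(children = as.numeric(str_remove_all(children, ",")))
```

Without removing the commas, `as.numeric("1,250")` would return `NA` with a warning, which is why the cleaning happens before the conversion.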

There are some differences in how the neighbourhoods are named between the two datasets, so additional cleaning is required, such as removing the neighbourhood numbers from the spatial data set, and fixing inconsistencies and misspellings, before we can combine them.

earlyon_by_neighbourhood <- earlyon_by_neighbourhood %>%
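  # Keep only the name before " (" and drop the neighbourhood number without a warning
  separate(neighbourhood, into = "neighbourhood", sep = " \\(", extra = "drop") %>%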
  mutate(neighbourhood = case_when(
    neighbourhood == "Cabbagetown-South St.James Town" ~ "Cabbagetown-South St. James Town",
    neighbourhood == "North St.James Town" ~ "North St. James Town",
    TRUE ~ neighbourhood
  ))

neighbourhoods_children <- neighbourhoods_children %>%
  mutate(neighbourhood = case_when(
    neighbourhood == "Mimico (includes Humber Bay Shores)" ~ "Mimico",
    neighbourhood == "Weston-Pelham Park" ~ "Weston-Pellam Park",
    TRUE ~ neighbourhood
  ))
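
One way to find such naming mismatches, or to confirm that none remain after cleaning, is an `anti_join()` in each direction. The sketch below uses two small hypothetical tables (`spatial_names` and `profile_names`) as stand-ins for the real datasets; the names in them are taken from the fixes above.

```r
library(dplyr)

# Hypothetical stand-ins for the neighbourhood names in each dataset
spatial_names <- tibble(
  neighbourhood = c("Annex", "North St.James Town", "Mimico")
)
profile_names <- tibble(
  neighbourhood = c(
    "Annex",
    "North St. James Town",
    "Mimico (includes Humber Bay Shores)"
  )
)

# Names in the spatial data with no match in the profiles, and vice versa
unmatched_spatial <- anti_join(spatial_names, profile_names, by = "neighbourhood")
unmatched_profile <- anti_join(profile_names, spatial_names, by = "neighbourhood")
```

Both results have zero rows once the names agree; any rows that remain point to names that still need fixing before the join.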

Finally, we can combine the data sets, and calculate the number of EarlyON Centres per 1,000 children:

earlyon_by_neighbourhood_with_children <- earlyon_by_neighbourhood %>%
  left_join(neighbourhoods_children, by = "neighbourhood") %>%
  mutate(
    n_earlyon_per_child = n_earlyon / children,
    n_earlyon_per_1k_children = round(n_earlyon_per_child * 1000, 2),
    tooltip = glue("{neighbourhood}: {n_earlyon_per_1k_children}")
  )

We can then visualize that along with the locations of the centres themselves, adjusting the colour scheme to better highlight neighbourhoods without any centres:

p <- ggplot() +
  geom_sf_interactive(
    data = earlyon_by_neighbourhood_with_children,
    aes(fill = n_earlyon_per_1k_children, tooltip = tooltip)
  ) +
  geom_sf_interactive(data = earlyon_centres, size = 0.25) +
  scale_fill_gradient(low = "white", high = "#992a2a") +
  labs(title = "Number of EarlyON Child and Family Centres, per 1,000 Children") +
  theme_void() +
  theme(legend.title = element_blank())

girafe(code = print(p))

Now, we can see that most neighbourhoods have fewer than 1 EarlyON Centre per 1,000 children, with a number having zero. Moss Park, one of the neighbourhoods we highlighted before, has 3.25 centres per 1,000 children, and Kensington-Chinatown has the highest rate, at 3.8 per 1,000 children.

It could be interesting to further quantify the number of children living in neighbourhoods that don’t have any centres, since those neighbourhoods are all just left at zero in this visualization, but that’s an exercise for another day!