opendatatoronto
is an R interface to the City of Toronto Open Data Portal. The goal of the package is to help read data directly into R without needing to manually download it via the portal.
For more information, please visit the package website and vignettes:
opendatatoronto
purrr
You can intall the released version of opendatatoronto from CRAN:
install.packages("opendatatoronto")
or the development version from GitHub with:
devtools::install_github("sharlagelfand/opendatatoronto", ref = "main")
In the Portal, datasets are called packages. You can see a list of available packages by using list_packages()
. This will show metadata about the package, including what topics (i.e. tags) the package covers, any civic issues it addresses, a description of it, how many resources there are (and their formats), how often it is is refreshed and when it was last refreshed.
library(opendatatoronto)
packages <- list_packages(limit = 10)
packages
#> # A tibble: 10 x 11
#> title id topics civic_issues publisher excerpt dataset_category
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Cata… 473d… City … <NA> Informat… Histor… Table
#> 2 Lobb… 6a87… City … <NA> Lobbyist… The Lo… Document
#> 3 Addr… abed… Locat… Mobility Informat… This d… Document
#> 4 Prop… 1aca… Locat… Mobility Informat… This d… Document
#> 5 Buil… 108c… Devel… Affordable … Toronto … Provid… Document
#> 6 Buil… 8219… Devel… <NA> Toronto … Provid… Document
#> 7 Muni… 5da2… City … Affordable … Municipa… This d… Document
#> 8 Shor… fc41… Permi… Affordable … Municipa… This d… Table
#> 9 Poll… 7bce… City … <NA> City Cle… Polls … Table
#> 10 Dail… 8a6e… City … Affordable … Shelter,… Daily … Table
#> # … with 4 more variables: num_resources <int>, formats <chr>,
#> # refresh_rate <chr>, last_refreshed <date>
You can also search packages by title:
ttc_packages <- search_packages("ttc")
ttc_packages
#> # A tibble: 14 x 11
#> title id topics civic_issues publisher excerpt dataset_category
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 TTC … 7795… Trans… Mobility Toronto … "Data … Document
#> 2 TTC … 996c… Trans… Mobility Toronto … "TTC S… Document
#> 3 TTC … b68c… Trans… Mobility Toronto … "TTC S… Document
#> 4 TTC … e271… Trans… Mobility Toronto … "TTC B… Document
#> 5 TTC … ef35… Trans… Mobility Toronto … "This … Document
#> 6 TTC … aedd… Trans… Mobility Toronto … "This … Website
#> 7 TTC … 1444… Trans… Mobility Toronto … "This … Website
#> 8 TTC … 4b80… Trans… Mobility Toronto … "This … Website
#> 9 TTC … d2a7… Trans… Mobility Toronto … "This … Website
#> 10 TTC … d9dc… Trans… Mobility Toronto … "This … Document
#> 11 TTC … 2c4c… Finan… Mobility,Fi… Toronto … "This … Website
#> 12 TTC … 4eb6… Trans… Mobility Toronto … "This … Document
#> 13 TTC … c01c… <NA> Mobility Toronto … "This … Document
#> 14 TTC … 8217… Trans… Mobility Toronto … "The N… Document
#> # … with 4 more variables: num_resources <int>, formats <chr>,
#> # refresh_rate <chr>, last_refreshed <date>
Or see metadata for a specific package:
show_package("996cfe8d-fb35-40ce-b569-698d51fc683b")
#> # A tibble: 1 x 11
#> title id topics civic_issues publisher excerpt dataset_category
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 TTC … 996c… <NA> <NA> <NA> <NA> <NA>
#> # … with 4 more variables: num_resources <int>, formats <chr>,
#> # refresh_rate <chr>, last_refreshed <date>
Within a package, there are a number of resources - e.g. CSV, XSLX, JSON, SHP files, and more. Resources are the actual “data”.
For a given package, you can get a list of resources using list_package_resources()
. You can pass it the package id (which is contained in marriage_license_packages
below):
marriage_licence_packages <- search_packages("Marriage Licence Statistics")
marriage_licence_resources <- marriage_licence_packages %>%
list_package_resources()
marriage_licence_resources
#> # A tibble: 1 x 4
#> name id format last_modified
#> <chr> <chr> <chr> <date>
#> 1 Marriage Licence Statistic… 4d985c1d-9c7e-4f74-9864-7321… CSV 2021-01-01
But you can also get a list of resources by using the package’s URL from the Portal:
list_package_resources("https://open.toronto.ca/dataset/sexual-health-clinic-locations-hours-and-services/")
#> # A tibble: 2 x 4
#> name id format last_modified
#> <chr> <chr> <chr> <date>
#> 1 sexual-health-clinic-locations-ho… e958dd45-9426-4298-ac… XLSX 2019-08-15
#> 2 Sexual-health-clinic-locations-ho… 2edcc4a3-c095-4ce3-b0… XLSX 2019-08-15
Finally (and most usefully!), you can download the resource (i.e., the actual data) directly into R using get_resource()
:
marriage_licence_statistics <- marriage_licence_resources %>%
get_resource()
marriage_licence_statistics
#> # A tibble: 461 x 4
#> `_id` CIVIC_CENTRE MARRIAGE_LICENSES TIME_PERIOD
#> <int> <chr> <dbl> <chr>
#> 1 8354 ET 80 2011-01
#> 2 8355 NY 136 2011-01
#> 3 8356 SC 159 2011-01
#> 4 8357 TO 367 2011-01
#> 5 8358 ET 109 2011-02
#> 6 8359 NY 150 2011-02
#> 7 8360 SC 154 2011-02
#> 8 8361 TO 383 2011-02
#> 9 8362 ET 177 2011-03
#> 10 8363 NY 231 2011-03
#> # … with 451 more rows