vignettes/multiple_resources_purrr.Rmd
It may be useful to retrieve several, or all, of the resources in a package. For example, the TTC Subway Delay Data is released monthly, with a new file added each month.
library(opendatatoronto)
ttc_subway_delays_resources <- list_package_resources("https://open.toronto.ca/dataset/ttc-subway-delay-data/")
ttc_subway_delays_resources
#> # A tibble: 35 x 4
#> name id format last_modified
#> <chr> <chr> <chr> <date>
#> 1 ttc-subway-delay-codes fece136b-224a-412a-b191-8d… XLSX 2019-08-15
#> 2 ttc-subway-delay-jan-2014-a… 6664420f-316f-4f94-9ba4-d4… XLSX 2019-08-15
#> 3 ttc-subway-delay-may-2017 d1159888-0035-45a0-b238-86… XLSX 2019-08-15
#> 4 ttc-subway-delay-june-2017 240d8e8c-d300-4f91-b94f-cb… XLSX 2019-08-15
#> 5 ttc-subway-delay-july-2017 98d4ac77-aa9f-40a3-97ee-6f… XLSX 2019-08-15
#> 6 ttc-subway-delay-august-2017 92e7649a-cf2f-4ac7-9802-b7… XLSX 2019-08-15
#> 7 ttc-subway-delay-september-… 61412f10-656b-4992-9a1a-a1… XLSX 2019-08-15
#> 8 ttc-subway-delay-october-20… 69a6db37-7982-49c7-8dbc-56… XLSX 2019-08-15
#> 9 ttc-subway-delay-november-2… 10080217-8022-41c0-a8ba-2a… XLSX 2019-08-15
#> 10 ttc-subway-delay-december-2… a731c4bb-630a-4530-b590-b3… XLSX 2019-08-15
#> # … with 25 more rows
It would be cumbersome to retrieve them one at a time, e.g. via
library(dplyr)
delays_jan_2019 <- ttc_subway_delays_resources %>%
  filter(name == "ttc-subway-delay-january-2019") %>%
  get_resource()
delays_feb_2019 <- ttc_subway_delays_resources %>%
  filter(name == "ttc-subway-delay-february-2019") %>%
  get_resource()
and so on.
Instead, you can iterate through the different resources using the purrr
package and get them in a nested tibble:
library(purrr)
ttc_subway_delays_2019 <- ttc_subway_delays_resources %>%
  filter(grepl("2019", name)) %>%
  mutate(data = map(id, get_resource))
ttc_subway_delays_2019
#> # A tibble: 12 x 5
#> name id format last_modified data
#> <chr> <chr> <chr> <date> <list>
#> 1 "ttc-subway-delay-j… 9a824dba-20cc-40b1-8… XLSX 2019-08-15 <tibble [1,8…
#> 2 "ttc-subway-delay-f… e6bac74e-2da2-4429-a… XLSX 2019-08-15 <tibble [1,5…
#> 3 "ttc-subway-delay-m… 0511879f-3233-4a42-8… XLSX 2019-08-15 <tibble [1,5…
#> 4 "ttc-subway-delay-a… 447b4a5a-f696-4f05-8… XLSX 2019-08-15 <tibble [1,5…
#> 5 "ttc-subway-delay-m… a302fcab-81a1-4142-b… XLSX 2019-08-15 <tibble [1,7…
#> 6 "ttc-subway-delay-j… 655a138c-d381-4fe7-b… XLSX 2019-08-15 <tibble [1,6…
#> 7 "ttc-subway-delay-J… 34d9619f-0239-4dad-a… XLSX 2019-10-07 <tibble [1,6…
#> 8 "ttc-subway-delay-A… b6557580-a0f4-4c96-9… XLSX 2019-10-07 <tibble [1,6…
#> 9 "ttc-subway-delay-S… e2a5e386-ddf7-4416-8… XLSX 2019-10-07 <tibble [1,5…
#> 10 "ttc-subway-delay-O… fd837bd2-85ed-485e-b… XLSX 2019-11-08 <tibble [1,4…
#> 11 "ttc-subway-delay-N… ac734fde-145d-4313-9… XLSX 2020-02-03 <tibble [1,4…
#> 12 "ttc-subway-delay-D… 967ea5f7-de10-4ca8-a… XLSX 2020-01-30 <tibble [1,5…
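Each call to get_resource() downloads a file, so if one call raises an error, the whole map() call stops. One possible guard, sketched here rather than taken from the package, is to wrap the function with purrr::possibly(), which returns a fallback value instead of an error. To keep the sketch runnable offline, `fetch_resource()` and the ids below are hypothetical stand-ins for get_resource() and real resource ids.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for get_resource(): fails for one id, otherwise
# returns a small tibble. It is not part of opendatatoronto.
fetch_resource <- function(id) {
  if (id == "bad-id") {
    stop("download failed")
  }
  tibble(id = id, value = 1)
}

# possibly() returns `otherwise` (here NULL) instead of raising an error
safe_fetch <- possibly(fetch_resource, otherwise = NULL)

# Made-up resource listing with one id that will fail
resources <- tibble(
  name = c("resource-a", "resource-b", "resource-c"),
  id = c("id-a", "bad-id", "id-c")
)

fetched <- resources %>%
  mutate(
    data = map(id, safe_fetch),
    # flag the rows whose retrieval failed
    failed = map_lgl(data, is.null)
  )

fetched
```

With real data, the same pattern would presumably be `map(id, possibly(get_resource, otherwise = NULL))`, after which the rows flagged in `failed` can be retried or dropped.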
All of these resources contain equivalent data, with the same column names
map(ttc_subway_delays_2019[["data"]], colnames)
#> [[1]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[2]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[3]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[4]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[5]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[6]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[7]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[8]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[9]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[10]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[11]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
#>
#> [[12]]
#> [1] "Date" "Time" "Day" "Station" "Code" "Min Delay"
#> [7] "Min Gap" "Bound" "Line" "Vehicle"
so the data can easily be combined into a single tibble:
bind_rows(ttc_subway_delays_2019[["data"]])
#> # A tibble: 19,222 x 10
#> Date Time Day Station Code `Min Delay` `Min Gap` Bound
#> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 2019-01-01 00:00:00 01:08 Tues… YORK M… PUSI 0 0 S
#> 2 2019-01-01 00:00:00 02:14 Tues… ST AND… PUMST 0 0 <NA>
#> 3 2019-01-01 00:00:00 02:16 Tues… JANE S… TUSC 0 0 W
#> 4 2019-01-01 00:00:00 02:27 Tues… BLOOR … SUO 0 0 N
#> 5 2019-01-01 00:00:00 03:03 Tues… DUPONT… MUATC 11 16 N
#> 6 2019-01-01 00:00:00 03:08 Tues… EGLINT… EUATC 11 16 S
#> 7 2019-01-01 00:00:00 03:09 Tues… DUPONT… EUATC 6 11 N
#> 8 2019-01-01 00:00:00 03:26 Tues… ST CLA… EUATC 4 9 N
#> 9 2019-01-01 00:00:00 03:37 Tues… KENNED… TUMVS 0 0 E
#> 10 2019-01-01 00:00:00 08:04 Tues… DAVISV… MUNOA 5 10 S
#> # … with 19,212 more rows, and 2 more variables: Line <chr>, Vehicle <dbl>
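Two small variations may be useful here. They are sketched with toy data so that the code runs without downloading anything; the `nested` tibble and its values are made up for illustration. First, instead of reading through twelve printed vectors, the column names can be compared programmatically. Second, tidyr::unnest() combines the nested tibbles while keeping the resource name next to each row, which bind_rows() on the list column alone does not.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Toy nested tibble standing in for ttc_subway_delays_2019 (made-up values)
nested <- tibble(
  name = c("delays-january", "delays-february"),
  data = list(
    tibble(Station = c("A", "B"), `Min Delay` = c(0, 5)),
    tibble(Station = "C", `Min Delay` = 3)
  )
)

# Collect the distinct sets of column names across the nested tibbles;
# exactly one distinct set means every tibble has the same columns,
# in the same order
column_sets <- unique(map(nested[["data"]], colnames))
same_columns <- length(column_sets) == 1
same_columns

# Expand the list column, repeating `name` for each row of its tibble
combined <- nested %>%
  unnest(cols = data)

combined
```

Applied to the real object, this would be `unnest(ttc_subway_delays_2019, cols = data)`, assuming the column types are also compatible across months.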