vignettes/async-example.Rmd
Get the URLs of all US CRAN mirrors; we’ll use these later.
mirrors <- getCRANmirrors()
mirror_urls <- mirrors$URL[mirrors$CountryCode == "us"]
`response_time()` will be an async function that returns the total response time of a HEAD request to a single URL. `http_stop_for_status()` is similar to `httr::stop_for_status()`: it throws an error if the status code is 400 or higher. Should this (or any other) error happen, we simply return `Inf` for that URL. We also add the URL to the result as a name.
response_time <- async(function(url) {
  http_head(url)$
    then(http_stop_for_status)$
    then(function(x) setNames(x[["times"]][["total"]], url))$
    catch(error = function(e) setNames(Inf, url))
})
Let’s try it with a real CRAN URL, and with a test URL that returns an HTTP 404 status code.
synchronise(response_time(mirror_urls[[1]]))
## https://cloud.r-project.org/
## 0.045453
synchronise(response_time("https://httpbin.org/status/404"))
## https://httpbin.org/status/404
## Inf
`fastest_urls()` will run `response_time()` on all supplied URLs, and stop as soon as 10 of them respond. Then we sort the results, so the fastest response comes first.
fastest_urls <- async(function(urls) {
  reqs <- lapply(urls, response_time)
  when_some(10, .list = reqs)$
    then(function(x) sort(unlist(x)))
})
Let’s run it on the real data.
synchronise(fastest_urls(mirror_urls))
## https://cloud.r-project.org/ https://mirrors.cicku.me/cran/
## 0.053387 0.153928
## https://cran.mirrors.hoobly.com/ https://mirror.its.umich.edu/cran/
## 0.213738 0.221988
## https://ftp.osuosl.org/pub/cran/ https://mirror.las.iastate.edu/CRAN/
## 0.227401 0.238722
## https://cran.case.edu/ https://lib.stat.cmu.edu/R/CRAN/
## 0.405589 0.416950
## https://archive.linux.duke.edu/cran/ http://ftp.ussg.iu.edu/CRAN/
## 0.423765 Inf
By resolving we mean (1) getting the `DESCRIPTION` file of the package, to be able to look up dependencies and other metadata, and (2) getting a download link, in case we want to download the package later.
For this example we’ll implement two package types: (1) a package on GitHub, and (2) a package file at a URL.
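Both resolvers produce the same shape of result: a list with a download `url` and a parsed `description`. A minimal sketch of that shape, with made-up placeholder values for both fields:

```r
# Hypothetical resolver result -- both field values are placeholders
resolved <- list(
  # where the package can be downloaded from
  url = "https://example.com/pkg_1.0.tar.gz",
  # parsed package metadata, as a desc::description object
  description = desc::desc(text = "Package: pkg\nVersion: 1.0")
)
resolved$description$get("Package")
```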
We start with GitHub packages, as these are more complicated. First, we need to set the `Accept` HTTP header when we talk to the GitHub API, so we define a helper function for that. We also set a GitHub token to avoid GitHub rate limits; it should be in the `GITHUB_PAT` environment variable.
default_gh_headers <- function() {
  headers <- c("Accept" = "application/vnd.github.v3+json")
  pat <- Sys.getenv("GITHUB_PAT", "")
  if (pat != "") {
    headers <- c(headers, Authorization = paste("token", pat))
  }
  headers
}
Getting the `DESCRIPTION` file is easy: we just need to hit the right API endpoint and then extract the response. `desc::desc()` creates a nice `desc::description` object that can be queried.
We use `http_stop_for_status()` to error out if any error happens, e.g. the repo or the file does not exist, we cannot access it, etc. We don’t catch this (or any other) error here, so `get_gh_description()` will throw it to its `$then()` (or `$when_any()` or `$when_all()`) child, or to `synchronise()`.
get_gh_description <- async(function(user, repo) {
  desc_url <- paste0(
    "https://raw.githubusercontent.com/", user, "/", repo,
    "/HEAD/DESCRIPTION")
  http_get(desc_url, headers = default_gh_headers())$
    then(http_stop_for_status)$
    then(function(resp) rawToChar(resp$content))$
    then(function(txt) desc::desc(text = txt))
})
Test:
synchronise(get_gh_description("jeroen", "curl"))
## Type: Package
## Package: curl
## Title: A Modern and Flexible Web Client for R
## Version: 6.99999
## Authors@R (parsed):
## * Jeroen Ooms <jeroenooms@gmail.com> [aut, cre] (ORCID: <https://orcid.org/0000-0002-4035-0289>)
## * Hadley Wickham [ctb]
## * Posit Software, PBC [cph]
## Description: Bindings to 'libcurl' <https://curl.se/libcurl/> for
## performing fully configurable HTTP/FTP requests where responses can be
## processed in memory, on disk, or streaming via the callback or
## connection interfaces. Some knowledge of 'libcurl' is recommended; for
## a more-user-friendly web client see the 'httr2' package which builds
## on this package with http specific tools and logic.
## License: MIT + file LICENSE
## URL: https://jeroen.r-universe.dev/curl
## BugReports: https://github.com/jeroen/curl/issues
## Depends:
## R (>= 3.0.0)
## Suggests:
## httpuv (>= 1.4.4),
## jsonlite,
## knitr,
## later,
## rmarkdown,
## spelling,
## testthat (>= 1.0.0),
## webutils
## VignetteBuilder:
## knitr
## Encoding: UTF-8
## Language: en-US
## Roxygen: list(load = "installed", markdown = TRUE)
## RoxygenNote: 7.3.2
## SystemRequirements: libcurl (>= 7.73): libcurl-devel (rpm) or
## libcurl4-openssl-dev (deb)
Test a non-existing repo, and other errors:
synchronise(get_gh_description("jeroen", "curlqwerty"))
## Error in stop(http_error(resp)): Not Found (HTTP 404).
synchronise(get_gh_description("git", "git"))
## Error in stop(http_error(resp)): Not Found (HTTP 404).
The next thing we need is the SHA of the commit we want to download. For simplicity we always use the last commit of the default branch here. Otherwise this function is very similar to the previous one.
get_gh_sha <- async(function(user, repo) {
  commit_url <- paste0(
    "https://api.github.com/repos/", user, "/", repo, "/git/trees/HEAD")
  http_get(commit_url, headers = default_gh_headers())$
    then(http_stop_for_status)$
    then(function(resp) {
      cdata <- jsonlite::fromJSON(rawToChar(resp$content),
        simplifyVector = FALSE)
      cdata$sha
    })
})
Test:
synchronise(get_gh_sha("jeroen", "curl"))
## [1] "57b713c909cbd88a22444399ea39f38950675771"
A small helper function builds the GitHub download URL from the user name, the repo name, and the SHA hash:
get_gh_download_url <- function(user, repo, sha) {
  paste0("https://api.github.com/repos/", user, "/", repo,
    "/zipball/", sha)
}
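For example, with a shortened, made-up SHA (the user and repo are just illustrative):

```r
get_gh_download_url("jeroen", "curl", "57b713c")
## [1] "https://api.github.com/repos/jeroen/curl/zipball/57b713c"
```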
We are ready to write the GitHub resolver now. It takes a slug, i.e. the `user/repo` form, and queries both the `DESCRIPTION` file and the SHA hash. `when_all()` resolves once both of them are done, and returns a named list with both pieces of data. Then we just create the download URL, and return it together with the already parsed `DESCRIPTION`.
resolve_gh <- async(function(slug) {
  slug <- strsplit(slug, "/")[[1]]
  user <- slug[[1]]
  repo <- slug[[2]]
  desc <- get_gh_description(user, repo)
  sha <- get_gh_sha(user, repo)
  when_all(desc = desc, sha = sha)$
    then(function(x) {
      list(
        url = get_gh_download_url(user, repo, x$sha),
        description = x$desc)
    })
})
Test:
synchronise(resolve_gh("jeroen/curl"))
## $url
## [1] "https://api.github.com/repos/jeroen/curl/zipball/57b713c909cbd88a22444399ea39f38950675771"
##
## $description
## Type: Package
## Package: curl
## Title: A Modern and Flexible Web Client for R
## Version: 6.99999
## Authors@R (parsed):
## * Jeroen Ooms <jeroenooms@gmail.com> [aut, cre] (ORCID: <https://orcid.org/0000-0002-4035-0289>)
## * Hadley Wickham [ctb]
## * Posit Software, PBC [cph]
## Description: Bindings to 'libcurl' <https://curl.se/libcurl/> for
## performing fully configurable HTTP/FTP requests where responses can be
## processed in memory, on disk, or streaming via the callback or
## connection interfaces. Some knowledge of 'libcurl' is recommended; for
## a more-user-friendly web client see the 'httr2' package which builds
## on this package with http specific tools and logic.
## License: MIT + file LICENSE
## URL: https://jeroen.r-universe.dev/curl
## BugReports: https://github.com/jeroen/curl/issues
## Depends:
## R (>= 3.0.0)
## Suggests:
## httpuv (>= 1.4.4),
## jsonlite,
## knitr,
## later,
## rmarkdown,
## spelling,
## testthat (>= 1.0.0),
## webutils
## VignetteBuilder:
## knitr
## Encoding: UTF-8
## Language: en-US
## Roxygen: list(load = "installed", markdown = TRUE)
## RoxygenNote: 7.3.2
## SystemRequirements: libcurl (>= 7.73): libcurl-devel (rpm) or
## libcurl4-openssl-dev (deb)
Resolving a package at an HTTP URL is much simpler: it needs a single HTTP request, because we simply need to download the package to get the `DESCRIPTION` file. We place the downloaded package in a temporary file (with the same file name), and return a `file://` URL to this file.
resolve_url <- async(function(url) {
  dir.create(tmpdir <- tempfile())
  dest <- file.path(tmpdir, basename(url))
  http_get(url, file = dest)$
    then(http_stop_for_status)$
    then(function() {
      dest <- normalizePath(dest)
      list(
        url = paste0("file://", dest),
        description = desc::desc(dest))
    })
})
Test:
curl20_url <- "https://cloud.r-project.org/src/contrib/Archive/curl/curl_2.0.tar.gz"
synchronise(resolve_url(curl20_url))
## $url
## [1] "file:///tmp/Rtmp117fgr/file1fe4341c6f38/curl_2.0.tar.gz"
##
## $description
## Type: Package
## Package: curl
## Title: A Modern and Flexible Web Client for R
## Version: 2.0
## Authors@R (parsed):
## * Jeroen Ooms <jeroen.ooms@stat.ucla.edu> [cre, aut]
## * Hadley Wickham <hadley@rstudio.com> [ctb]
## * RStudio [cph]
## Author: Jeroen Ooms [cre, aut], Hadley Wickham [ctb], RStudio
## [cph]
## Maintainer: Jeroen Ooms <jeroen.ooms@stat.ucla.edu>
## Description: The curl() and curl_download() functions provide
## highly configurable drop-in replacements for base url() and
## download.file() with better performance, support for encryption
## (https, ftps), gzip compression, authentication, and other libcurl
## goodies. The core of the package implements a framework for performing
## fully customized requests where data can be processed either in
## memory, on disk, or streaming via the callback or connection
## interfaces. Some knowledge of libcurl is recommended; for a
## more-user-friendly web client see the 'httr' package which builds on
## this package with http specific tools and logic.
## License: MIT + file LICENSE
## URL: https://github.com/jeroenooms/curl#readme
## BugReports: https://github.com/jeroenooms/curl/issues
## Depends:
## R (>= 3.0.0)
## Suggests:
## jsonlite,
## knitr,
## magrittr,
## rmarkdown,
## testthat (>= 1.0.0)
## VignetteBuilder:
## knitr
## Date/Publication: 2016-09-17 00:42:17
## LazyData: true
## NeedsCompilation: yes
## Packaged: 2016-09-16 15:48:22 UTC; jeroen
## Repository: CRAN
## RoxygenNote: 5.0.1
## SystemRequirements: libcurl: libcurl-devel (rpm) or
## libcurl4-openssl-dev (deb).
Now combine the two functions:
res <- synchronise(when_all(
  resolve_gh("jeroen/curl"),
  resolve_gh("ropensci/magick"),
  resolve_url(curl20_url)
))
length(res)
## [1] 3
Errors bubble up to the top level; the other downloads are cancelled automatically (unless they have already completed, of course), and an error is thrown from `synchronise()`.
tryCatch(
  res <- synchronise(when_all(
    resolve_gh("jeroen/curl"),
    resolve_gh("ropensci/magick"),
    resolve_url(curl20_url),
    resolve_url("https://httpbin.org/delay/2")
  )),
  error = function(e) e
)
## <async error: Line starting '{ ...' is malformed!
## in *parent* callback of `http_get(url, file = dest)$then(http_stop_for_status)$then` at ?/?:?:?>
But we can also handle errors locally, at any point we wish. E.g. we can simply return `NULL` values for the failed packages.
res <- synchronise(when_all(
  resolve_gh("jeroen/curl"),
  resolve_gh("ropensci/magickfoooooobar")$catch(error = function(e) NULL),
  resolve_url(curl20_url),
  resolve_url("https://httpbin.org/status/401")$catch(error = function(e) NULL)
))
res[[1]]$description$get("Package")
## Package
## "curl"
res[[2]]
## NULL
res[[3]]$description$get("Version")
## Version
## "2.0"
res[[4]]
## NULL