Cloudy with a Chance of Cron
2017-11-08 · 626 words

Sometimes the data we need for an analysis can’t be found in one consolidated table. Often the information we want isn’t immediately available and has to be collected over time through a slow, monotonous process.

Imagine, for example, that we’d like to download meteorological data from the world’s largest cities every 12 hours for an analysis of weather forecasts. A naive programmer might set alarms on their watch and download the necessary tables whenever the alarms go off.

But this doesn’t look like a good strategy, right?

DarkSky

To demonstrate an alternative to this method, we’ll use a weather forecasting service called DarkSky. This platform has become well known for its incredible precision and for its very well-made app, but most people don’t know that DarkSky also has an API for anyone interested in meteorological data.

Luckily for us, hrbrmstr has already created an R interface for this API that can be easily installed with the command below:

# install.packages("devtools")
devtools::install_github("hrbrmstr/darksky")

After the package is installed, go to DarkSky’s dev website, create an account and get a secret key to access the API.

Sys.setenv(DARKSKY_API_KEY = "YOUR SECRET KEY")
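
A key set with Sys.setenv() only lasts for the current R session, so a script that runs unattended has to set it itself or read it from a file such as ~/.Renviron. It can also help to fail early when the key is missing. Here is a minimal sketch in base R, assuming the key lives in the DARKSKY_API_KEY environment variable shown above; has_api_key() is a hypothetical helper and not part of the darksky package:

```r
# Hypothetical helper (not from the darksky package): TRUE if the given
# environment variable is set and non-empty
has_api_key <- function(var = "DARKSKY_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

# Warn early instead of discovering the problem at the first API request
if (!has_api_key()) {
  message("DARKSKY_API_KEY is not set; API requests will fail.")
}
```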

Downloading the data

The first step in our analysis is determining the latitudes and longitudes of the biggest cities in the world so that we can get the weather forecasts for these coordinates.

With the maps package we can do this very quickly:

library(magrittr) # provides the %>% pipe used in the snippets below

forecasts <- maps::world.cities %>%
  dplyr::as_tibble() %>%
  dplyr::filter(pop > 2000000) %>%
  dplyr::rename(country = country.etc) %>%
  dplyr::select(name, country, lat, long) %>%
  dplyr::mutate(
    currently = list(""),
    hourly = list(""),
    daily = list(""))

In the snippet above we selected all cities with more than 2 million inhabitants (alongside their locations) from the maps::world.cities database. The last 4 lines create placeholder list-columns that will hold the actual forecasts once we download them:

for (i in seq_len(nrow(forecasts))) {
  forecast <- darksky::get_current_forecast(forecasts$lat[i], forecasts$long[i])
  forecasts$currently[i] <- forecast$currently %>% dplyr::as_tibble() %>% list()
  forecasts$hourly[i] <- forecast$hourly %>% dplyr::as_tibble() %>% list()
  forecasts$daily[i] <- forecast$daily %>% dplyr::as_tibble() %>% list()
}
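
As written, the loop stops at the first request that fails, which matters once nobody is watching the script run. One possible way to skip a failed city instead is sketched below. safe_forecast() is a hypothetical wrapper, and the function that does the actual request is passed in as an argument, so the sketch makes no assumptions about the darksky package beyond the call already used above:

```r
# Hypothetical wrapper: call `fetch` for one coordinate pair and return NULL
# (after logging a message) instead of aborting the whole loop on error
safe_forecast <- function(lat, long, fetch) {
  tryCatch(
    fetch(lat, long),
    error = function(e) {
      message("Request failed for (", lat, ", ", long, "): ", conditionMessage(e))
      NULL
    }
  )
}

# Usage inside the loop (not run here, since it needs an API key):
# forecast <- safe_forecast(forecasts$lat[i], forecasts$long[i],
#                           darksky::get_current_forecast)
# if (is.null(forecast)) next
```

Cities that fail simply keep their placeholder values, so a later analysis would need to filter those rows out.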

In the currently column we store the current meteorological state of the cities, while in hourly and daily we store the forecasts for the next 48 hours and next 7 days respectively. Now we just have to save all of this in an RDS file:

# Build a timestamped file name such as 2017_11_08_12_00_00.rds
file <- lubridate::now() %>%
  format("%Y_%m_%d_%H_%M_%S") %>%
  stringr::str_c(".rds")

# file.path() inserts the separator between the folder and the file name
readr::write_rds(forecasts, file.path("FOLDER FOR FILES", file))
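
For the later analysis, the snapshots have to be read back in together with the moment each one was taken. Below is a sketch in base R, assuming the file names consist of year, month, day, hour, minute and second separated by underscores (e.g. 2017_11_08_12_00_00.rds). parse_snapshot_time() and read_snapshots() are hypothetical helpers, and the time zone is an assumption that should match the machine that wrote the files:

```r
# Hypothetical helper: recover the timestamp encoded in a snapshot file name.
# The default time zone is an assumption; use the one of the writing machine.
parse_snapshot_time <- function(path, tz = "UTC") {
  stamp <- sub("\\.rds$", "", basename(path))
  as.POSIXct(stamp, format = "%Y_%m_%d_%H_%M_%S", tz = tz)
}

# Hypothetical helper: read every .rds snapshot in a folder into a list
# named after the files
read_snapshots <- function(folder) {
  paths <- list.files(folder, pattern = "\\.rds$", full.names = TRUE)
  snapshots <- lapply(paths, readRDS)
  names(snapshots) <- basename(paths)
  snapshots
}

# Usage (the folder is the same placeholder as above):
# snapshots <- read_snapshots("FOLDER FOR FILES")
# taken_at <- parse_snapshot_time(names(snapshots))
```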

cronR

As you can see, the script described in the section above doesn’t depend on any human input and can be run automatically. Now all that’s left is automating this execution, a task we’ll accomplish with cronR.

This package allows us to schedule the execution of any command so that it runs every so many minutes/hours/days/… Make sure that you’re on a machine or server that will not shut down, check that the cron daemon is active, and then schedule the execution of our script:

cmd <- cronR::cron_rscript("PATH TO SCRIPT")

cronR::cron_add(cmd, "daily", "12AM")
cronR::cron_add(cmd, "daily", "12PM")

And that’s all! In my case, I scheduled the script to run daily at noon (12PM) and at midnight (12AM), but the frequency is up to you (just remember that the free plan only allows 1000 calls per day). To learn more about how to change the frequency of executions, check out cronR’s documentation.
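
Since every run makes one request per city, the daily limit translates directly into a maximum number of runs. The arithmetic can be sketched as follows; calls_per_day() and max_runs_per_day() are hypothetical helpers, and the city count in the example is illustrative only (nrow(forecasts) gives the real one):

```r
# Hypothetical helper: API calls made per day when each run requests one
# forecast per city
calls_per_day <- function(n_cities, runs_per_day) {
  n_cities * runs_per_day
}

# Hypothetical helper: largest number of runs per day that stays within
# a daily call limit
max_runs_per_day <- function(n_cities, daily_limit = 1000) {
  floor(daily_limit / n_cities)
}

# Illustrative numbers only: 100 cities fetched twice a day
calls_per_day(100, 2)
max_runs_per_day(100)
```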

Wrap-up

As we’ve seen, it’s not hard to schedule the execution of a script. The hardest part is creating code that works independently of the programmer (e.g. naming the generated files automatically), but after that you only have to call cronR::cron_rscript() and cronR::cron_add().

In my next post I’ll use the data downloaded with this tutorial for an analysis about weather forecasts, so stay tuned for part two!

P.S.: If you want the complete code for my get_forecasts.R file, I’ve made it available as a Gist.

