4  Daily Download Counts for All Packages

The following code does not run here due to time consumption. The generated data is timestamped with a data collection time and produced from local R scripts that are pushed up to Github.

The below walks through a script designed to generate daily download count data for identified packages in the previously generated all_packages file to utilize in creating plots.

First import the necessary packages.

library(tidyverse)
library(packageRank)

Read in the packages stored in the ‘all_packages.csv’ file scraped from the tidyverse gallery and CRAN webpages in previous appendices.

sorted_packages <- read_csv("generated_data/all_packages.csv", skip = 2)

#top 30 packages by download count, excludes ggplot2
top_packages <- c(sorted_packages$package[2:31])

#all packages that can be found on CRAN (cran download count is not null)
packages_in_cran <- c(sorted_packages$package[!is.na(sorted_packages$downloads)])

Defines a ‘get_cd_data’ function that retrieves historical cran download count data using the cranDownloads function of packageRank.

get_cd_data <- function(pkg) {
  cranDownloads(packages = pkg, to = 2025)$cranlogs.data
}

Retrieves cran download data for top 30 packages and all cran packages.

dc_top_packages <- map_dfr(top_packages, get_cd_data)
dc_cran_packages <- map_dfr(packages_in_cran, get_cd_data)
write_csv(dc_top_packages, "generated_data/dc_top_packages.csv")
write_csv(dc_cran_packages, "generated_data/dc_cran_packages.csv")