Managing custom formats

The pins package provides a robust set of functions to read and write standard types of files using standard tools, e.g. CSV files using read.csv() and write.csv(). However, from time to time, you may wish to read or write files in other ways. For example, you may want to read and write:

  • CSV files using readr or vroom
  • Arrow files without using compression
  • Whole directories that are archived/zipped

You can create a customized approach using pin_upload() and pin_download(). The goal of this vignette is to show how you can incorporate this customization into your workflow. For a different approach, suited to writing and reading with consistent metadata, see vignette("customize-pins-metadata").

We’ll begin with an example where we write and read uncompressed Arrow files, starting by creating a temporary board:

library(pins)

board <- board_temp()

Upload a single file

Two points to keep in mind:

  • pin_upload() takes a vector of paths to local files.
  • pin_download() returns a vector of paths to local files.
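Because both functions work with vectors of paths, a single pin can hold more than one file. The following is a minimal sketch, assuming a temporary board and two CSV files written with write.csv(); the pin name `two-files` and the file names are illustrative only.

```r
library(pins)

board <- board_temp()

# write two small CSV files into a temporary directory
dir <- fs::path_temp("two-files")
fs::dir_create(dir)
paths <- fs::path(dir, c("cars.csv", "iris.csv"))
write.csv(mtcars, paths[[1]], row.names = FALSE)
write.csv(iris, paths[[2]], row.names = FALSE)

# pin_upload() accepts a character vector of local paths
pin_upload(board, paths = paths, name = "two-files")

# pin_download() returns a character vector of local (cached) paths
downloaded <- pin_download(board, "two-files")
basename(downloaded)
```

Each downloaded path can then be passed to whichever reader suits that file.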

If you are writing a one-off file, you can do everything directly:

pin_name <- "mtcars-arrow"

# file name will be `mtcars-arrow.arrow`
path <- fs::path_temp(fs::path_ext_set(pin_name, "arrow"))

arrow::write_feather(mtcars, path, compression = "uncompressed")

pin_upload(board, paths = path, name = pin_name)
#> Creating new version '20241106T055945Z-a863e'

Reading from the downloaded pin is straightforward; pin_download() returns a local path that can be piped to arrow::read_feather():

mtcars_download <- 
  pin_download(board, pin_name) %>%
  arrow::read_feather()

head(mtcars_download)
#> # A tibble: 6 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
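The same one-off pattern works with other writers and readers, such as the readr option mentioned at the start of this vignette. A minimal sketch, assuming readr is installed; the pin name `mtcars-csv` is illustrative.

```r
library(pins)

board <- board_temp()
pin_name <- "mtcars-csv"

# file name will be `mtcars-csv.csv`
path <- fs::path_temp(fs::path_ext_set(pin_name, "csv"))

# custom writer: readr instead of write.csv()
readr::write_csv(mtcars, path)
pin_upload(board, paths = path, name = pin_name)

# custom reader: pass the downloaded path to readr::read_csv()
mtcars_csv <- readr::read_csv(pin_download(board, pin_name), show_col_types = FALSE)
```

Note that readr::write_csv() does not write row names, so the car names in `rownames(mtcars)` are not stored in this pin.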

Function to manage uploading

If you want to write several files of a certain type, or with a certain tool, consider writing a helper function:

pin_upload_arrow <- function(board, x, name, ...) {
  # path deleted when `pin_upload_arrow()` exits
  path <- fs::path_temp(fs::path_ext_set(name, "arrow"))
  withr::defer(fs::file_delete(path))
  
  # custom writer
  arrow::write_feather(x, path, compression = "uncompressed")
  
  pin_upload(board, paths = path, name = name, ...) 
}

This helper function is designed to work like pin_write(), in that it takes an R object rather than a path to a file:

pin_upload_arrow(board, x = mtcars, name = "mtcars-arrow2")
#> Creating new version '20241106T055945Z-a863e'

As before, you can pipe the result of pin_download() to your reader function:

pin_download(board, name = "mtcars-arrow2") %>%
  arrow::read_feather() %>%
  head()
#> # A tibble: 6 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
#> 6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
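If you read these pins often, a matching download helper can wrap the reader, too. The sketch below defines a hypothetical `pin_download_arrow()` (not part of pins) that assumes the pin contains exactly one Arrow file; any extra arguments, such as `version`, are forwarded to pin_download().

```r
library(pins)

# hypothetical read-side counterpart to pin_upload_arrow()
pin_download_arrow <- function(board, name, ...) {
  # `...` is forwarded to pin_download(), e.g. `version`
  path <- pin_download(board, name = name, ...)
  # this sketch assumes the pin holds a single file
  stopifnot(length(path) == 1)
  arrow::read_feather(path)
}

# usage: upload an uncompressed Arrow file, then read it back
board <- board_temp()
path <- fs::path_temp("mtcars-arrow3.arrow")
arrow::write_feather(mtcars, path, compression = "uncompressed")
pin_upload(board, paths = path, name = "mtcars-arrow3")

mtcars_arrow <- pin_download_arrow(board, "mtcars-arrow3")
```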

Another example: upload a zipped directory archive as a pin

If you want to use this same approach to archive and pin a whole directory, you can write a helper function like:

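pin_upload_archive <- function(board, dir, name, ...) {
  # archive path deleted when `pin_upload_archive()` exits
  path <- fs::path_temp(fs::path_ext_set(name, "tar.gz"))
  withr::defer(fs::file_delete(path))
  # compress the whole directory; the format is inferred from the extension
  archive::archive_write_dir(path, dir)
  pin_upload(board = board, paths = path, name = name, ...)
}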

You can download the compressed archive via pin_download(board, name) and then pass that path straight to archive::archive_extract(), setting its dir argument to extract the archive into a new directory (by default it extracts into the current working directory).
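A round trip might look like the following sketch. It repeats pin_upload_archive() so the block is self-contained, and adds a hypothetical `pin_download_archive()` helper (not part of pins) that extracts into a fresh temporary directory and returns that directory's path. The directory and pin name `report-files` are illustrative.

```r
library(pins)

pin_upload_archive <- function(board, dir, name, ...) {
  path <- fs::path_temp(fs::path_ext_set(name, "tar.gz"))
  withr::defer(fs::file_delete(path))
  archive::archive_write_dir(path, dir)
  pin_upload(board = board, paths = path, name = name, ...)
}

# hypothetical download counterpart: extract into a new directory
pin_download_archive <- function(board, name, dir = tempfile("pin-archive-"), ...) {
  path <- pin_download(board, name = name, ...)
  fs::dir_create(dir)
  archive::archive_extract(path, dir = dir)
  dir
}

board <- board_temp()

# a small directory to archive
src <- fs::path_temp("report-files")
fs::dir_create(src)
writeLines("hello", fs::path(src, "notes.txt"))
write.csv(mtcars, fs::path(src, "mtcars.csv"), row.names = FALSE)

pin_upload_archive(board, dir = src, name = "report-files")
out <- pin_download_archive(board, "report-files")
fs::dir_ls(out)
```

Returning the directory rather than the extracted file paths keeps the helper symmetric with pin_upload_archive(), which takes a directory as input.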