--- title: "Input Functions" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Input Functions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} type: docs repo: https://github.com/rstudio/tfestimators menu: main: name: "Input Functions" identifier: "tfestimators-input-functions" parent: "tfestimators-using-tfestimators" weight: 30 --- ```{r setup, include=FALSE} # # https://www.tensorflow.org/get_started/input_fn library(tfestimators) knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(eval = FALSE) ``` ## Overview TensorFlow estimators receive data through input functions. Input functions take an arbitrary data source (in-memory data sets, streaming data, custom data format, and so on) and generate Tensors that can be supplied to TensorFlow models. More concretely, input functions are used to: 1) Turn raw data sources into Tensors, and 2) Configure how data is drawn during training (shuffling, batch size, epochs, etc.) You can also perform feature engineering within an input function; however, it's better to use [feature columns](feature_columns.html) for this purpose whenever possible, as in that case the tranformations are made part of the TensorFlow graph and so can be executed without an R runtime (e.g. when the model is deployed onto a device or server). The **tfestimators** package includes an `input_fn()` function that can create TensorFlow input functions from common R data sources (e.g. data frames and matrices). It's also possible to write a fully custom input function. Both methods of creating input functions are covered below. ## Data Frame Input You can create an input function from an R data frame using the `input_fn()` method. You can specify feature and response variables either explicitly or using the R formula interface. For example, to create an input function for the **mtcars** dataset with features "drat" and "cyl" and response "mpg" you could use this code: ```{r} model %>% train( input_fn(mtcars, features = c(drat, cyl), response = mpg, batch_size = 128, epochs = 3) ) ``` Or alternatively use the R formula interface like this: ```{r} model %>% train( input_fn(mpg ~ drat + cyl, data = mtcars, batch_size = 128, epochs = 3) ) ``` Note that `input_fn` functions provide several parameters for controlling how data is drawn from the input source. These include `batch_size` (defaults to 128), `shuffle` (default to `"auto"`), and `epochs` (defaults to 1). Note that, by default, shuffling is disabled during prediction. ### Training vs. Evaluation It's often the case that you'll want to use the same basic input function for training and evaluation, but need to provide a distinct dataset for each step. In that case you can create a wrapper function that returns the same input function with varying input data. For example, imagine we have already split the **mtcars** dataset into training and test subsets. We could have an input function generator like this: ```{r} mtcars_input_fn <- function(data, ...) { input_fn(data, features = c("drat", "cyl"), response = "mpg", ...) } ``` The `...` parameter is used to forward additional options to `input_fn()`. This helper function could then be used during training and evaluation as follows: ```{r} # train the model model %>% train(mtcars_input_fn(train_data)) # evaluate the model model %>% evaluate(mtcars_input_fn(test_data)) ``` ## Matrix Input As with data frames, you can also pass an R matrix to `input_fn()` to automatically create an input function for the matrix. Note however that in order to specify the `features` and `response` parameters you will need to ensure that your matrix columns are named. For example: ```{r} m <- matrix(c(1:12), nrow = 4, ncol = 3) colnames(m) <- c("x1", "x2", "y") input_fn(m, features = c("x1", "x2"), response = "y") ``` ## List Input There's also a built-in `input_fn()` that works on nested lists, for example: ```{r} input_fn( object = list( inputs = list( list(list(1), list(2), list(3)), list(list(4), list(5), list(6))), output = list( list(1, 2, 3), list(4, 5, 6))), features = "inputs", response = "output" ) ``` In the above example, the data is a list of two named lists where each named list can be seen as different columns in a dataset. In this case, a column named `features` is being used as features to the model and a column named `response` is being used as the response variable.