TensorFlow estimators receive data through input functions. Input functions take an arbitrary data source (in-memory data sets, streaming data, custom data format, and so on) and generate Tensors that can be supplied to TensorFlow models.
More concretely, input functions are used to:
You can also perform feature engineering within an input function; however, it’s better to use feature columns for this purpose whenever possible, as in that case the tranformations are made part of the TensorFlow graph and so can be executed without an R runtime (e.g. when the model is deployed onto a device or server).
The tfestimators package includes an
input_fn()
function that can create TensorFlow input
functions from common R data sources (e.g. data frames and matrices).
It’s also possible to write a fully custom input function. Both methods
of creating input functions are covered below.
You can create an input function from an R data frame using the
input_fn()
method. You can specify feature and response
variables either explicitly or using the R formula interface.
For example, to create an input function for the mtcars dataset with features “drat” and “cyl” and response “mpg” you could use this code:
model %>% train(
input_fn(mtcars,
features = c(drat, cyl),
response = mpg,
batch_size = 128,
epochs = 3)
)
Or alternatively use the R formula interface like this:
Note that input_fn
functions provide several parameters
for controlling how data is drawn from the input source. These include
batch_size
(defaults to 128), shuffle
(default
to "auto"
), and epochs
(defaults to 1). Note
that, by default, shuffling is disabled during prediction.
It’s often the case that you’ll want to use the same basic input function for training and evaluation, but need to provide a distinct dataset for each step. In that case you can create a wrapper function that returns the same input function with varying input data.
For example, imagine we have already split the mtcars dataset into training and test subsets. We could have an input function generator like this:
mtcars_input_fn <- function(data, ...) {
input_fn(data,
features = c("drat", "cyl"),
response = "mpg",
...)
}
The ...
parameter is used to forward additional options
to input_fn()
.
This helper function could then be used during training and evaluation as follows:
As with data frames, you can also pass an R matrix to
input_fn()
to automatically create an input function for
the matrix. Note however that in order to specify the
features
and response
parameters you will need
to ensure that your matrix columns are named. For example:
There’s also a built-in input_fn()
that works on nested
lists, for example:
input_fn(
object = list(
inputs = list(
list(list(1), list(2), list(3)),
list(list(4), list(5), list(6))),
output = list(
list(1, 2, 3), list(4, 5, 6))),
features = "inputs",
response = "output"
)
In the above example, the data is a list of two named lists where
each named list can be seen as different columns in a dataset. In this
case, a column named features
is being used as features to
the model and a column named response
is being used as the
response variable.