The tfestimators framework makes it easy to construct and build
machine learning models via its high-level Estimator API.
Estimator
offers classes you can instantiate to quickly
configure common model types such as regressors and classifiers.
But what if none of the predefined model types meets your needs?
Perhaps you need more granular control over model configuration, such as
the ability to customize the loss function used for optimization, or
specify different activation functions for each neural network layer. Or
maybe you’re implementing a ranking or recommendation system, and
neither a classifier nor a regressor is appropriate for generating
predictions. The figure on the right illustrates the basic components of
an estimator. Users can implement custom behaviors and or architecture
inside the model_fn
of the estimator.
This tutorial covers how to create your own Estimator
using the building blocks provided in tfestimators
package,
which will predict the ages of abalones based on their
physical measurements. You’ll learn how to do the following:
Estimator
tf$feature_column
and
tf$layers
tf$losses
The complete code for this tutorial can be found here.
It’s possible to estimate the age of an abalone (sea snail) by the number of rings on its shell. However, because this task requires cutting, staining, and viewing the shell under a microscope, it’s desirable to find other measurements that can predict age.
The Abalone Data Set contains the following feature data for abalone:
Feature | Description |
---|---|
Length | Length of abalone (in longest direction; in mm) |
Diameter | Diameter of abalone (measurement perpendicular to length; in mm) |
Height | Height of abalone (with its meat inside shell; in mm) |
Whole Weight | Weight of entire abalone (in grams) |
Shucked Weight | Weight of abalone meat only (in grams) |
Viscera Weight | Gut weight of abalone (in grams), after bleeding |
Shell Weight | Weight of dried abalone shell (in grams) |
This tutorial uses three data sets. abalone_train.csv
contains labeled training data comprising 3,320 examples. abalone_test.csv
contains labeled test data for 850 examples. abalone_predict
contains 7 examples on which to make predictions.
The following sections walk through writing the
Estimator
code step by step; the full,
final code is available here.
We first write a function that downloads the training, testing, and evaluation data from TensorFlow website if we haven’t downloaded them before.
library(tfestimators)
maybe_download_abalone <- function(train_data_path, test_data_path, predict_data_path, column_names_to_assign) {
if (!file.exists(train_data_path) || !file.exists(test_data_path) || !file.exists(predict_data_path)) {
cat("Downloading abalone data ...")
train_data <- read.csv("http://download.tensorflow.org/data/abalone_train.csv", header = FALSE)
test_data <- read.csv("http://download.tensorflow.org/data/abalone_test.csv", header = FALSE)
predict_data <- read.csv("http://download.tensorflow.org/data/abalone_predict.csv", header = FALSE)
colnames(train_data) <- column_names_to_assign
colnames(test_data) <- column_names_to_assign
colnames(predict_data) <- column_names_to_assign
write.csv(train_data, train_data_path, row.names = FALSE)
write.csv(test_data, test_data_path, row.names = FALSE)
write.csv(predict_data, predict_data_path, row.names = FALSE)
} else {
train_data <- read.csv(train_data_path, header = TRUE)
test_data <- read.csv(test_data_path, header = TRUE)
predict_data <- read.csv(predict_data_path, header = TRUE)
}
return(list(train_data = train_data, test_data = test_data, predict_data = predict_data))
}
COLNAMES <- c("length", "diameter", "height", "whole_weight", "shucked_weight", "viscera_weight", "shell_weight", "num_rings")
downloaded_data <- maybe_download_abalone(
file.path(getwd(), "train_abalone.csv"),
file.path(getwd(), "test_abalone.csv"),
file.path(getwd(), "predict_abalone.csv"),
COLNAMES
)
train_data <- downloaded_data$train_data
test_data <- downloaded_data$test_data
predict_data <- downloaded_data$predict_data
Next, we construct the input function as follows:
When defining a model using one of tf.estimator’s provided classes,
such as linear_dnn_combined_classifier
, you supply all the
configuration parameters right in the constructor, e.g.:
diameter <- column_numeric("diameter")
height <- column_numeric("height")
model <- dnn_linear_combined_classifier(
linear_feature_columns = feature_columns(diameter),
dnn_feature_columns = feature_columns(height),
dnn_hidden_units = c(100L, 50L)
)
You don’t need to write any further code to instruct TensorFlow how
to train the model, calculate loss, or return predictions; that logic is
already baked into the linear_dnn_combined_classifier
.
By contrast, when you’re creating your own estimator from scratch,
the constructor accepts just two high-level parameters for model
configuration, model_fn
and params
:
model_fn
: A function object that contains all the
aforementioned logic to support training, evaluation, and prediction.
You are responsible for implementing that functionality. The next
section, Constructing the
model_fn
covers creating a model function in
detail.
params
: An optional dict of hyperparameters (e.g.,
learning rate, dropout) that will be passed into the
model_fn
.
Note: Just like tfestimators
’ predefined regressors and
classifiers, the estimator
initializer also accepts the
general configuration arguments model_dir
and
config
.
For the abalone age predictor, the model will accept one
hyperparameter: learning rate. Here, learning_rate
is set
to 0.001
, but you can tune this value as needed to achieve
the best results during model training.
The following code creates the list model_params
containing the learning rate and instantiates the
Estimator
:
The basic skeleton for an Estimator
API model function
looks like this:
model_fn <- function(features, labels, mode, params, config) {
# Logic to do the following:
# 1. Configure the model via TensorFlow operations
# 2. Define the loss function for training/evaluation
# 3. Define the training operation/optimizer
# 4. Generate predictions
# 5. Return predictions/loss/train_op/eval_metric_ops in estimator_spec object
}
The model_fn
must accept three arguments:
features
: A dict containing the features passed to the
model via input_fn
.labels
: A Tensor
containing the labels
passed to the model via input_fn
. Will be empty for
predict()
calls, as these are the values the model will
infer.mode
: One of the following mode_keys()
string values indicating the context in which the model_fn was invoked:
"train"
The model_fn
was invoked in
training mode, namely via a train()
call."eval"
. The model_fn
was invoked in
evaluation mode, namely via an evaluate()
call."infer"
. The model_fn
was invoked in
predict mode, namely via a predict()
call.model_fn
may also accept a params
argument
containing a dict of hyperparameters used for training (as shown in the
skeleton above) and a config
that represents the
configurations used in a model, including GPU percentage, cluster
information, etc.
The body of the function performs the following tasks (described in detail in the sections that follow):
optimizer
algorithm to minimize the loss values calculated
by the loss function.The model_fn
must return an estimator_spec
object, which contains the following values:
mode
(required). The mode in which the model was
run. Typically, you will return the mode
argument of the
model_fn
here.
predictions
(required in infer
mode). A
dict that maps key names of your choice to Tensor
s
containing the predictions from the model, e.g.:
In infer
mode, the dict that you return in
estimator_spec
will then be returned by
predict()
, so you can construct it in the format in which
you’d like to consume it.
loss
(required in eval
and
train
modes). A Tensor
containing a scalar
loss value: the output of the model’s loss function (discussed in more
depth later in Defining loss for the model)
calculated over all the input examples. This is used in
train
mode for error handling and logging, and is
automatically included as a metric in eval
mode.
train_op
(required only in train
mode).
An Op that runs one step of training.
eval_metric_ops
(optional). A dict of name/value
pairs specifying the metrics that will be calculated when the model runs
in eval
mode. The name is a label of your choice for the
metric, and the value is the result of your metric calculation. The
tf$metrics
module provides predefined functions for a
variety of common metrics. The following eval_metric_ops
contains an "accuracy"
metric calculated using
tf$metrics$accuracy
:
If you do not specify eval_metric_ops
, only
loss
will be calculated during evaluation.
Constructing a neural network entails creating and connecting the input layer, the hidden layers, and the output layer.
The input layer is a series of nodes (one for each feature in the
model) that will accept the feature data that is passed to the
model_fn
in the features
argument. If
features
contains an n-dimensional Tensor
with
all your feature data, then it can serve as the input layer. If
features
contains a dict of feature columns passed to the
model via an input function, you can convert it to an input-layer
Tensor
with the input_layer
function:
As shown above, input_layer()
takes two required
arguments:
features
. A mapping from string keys to the
Tensors
containing the corresponding feature data. This is
exactly what is passed to the model_fn
in the
features
argument.feature_columns
. A list of all the
FeatureColumns
in the model — age
,
height
, and weight
in the above example.The input layer of the neural network then must be connected to one
or more hidden layers via an activation
function that performs a nonlinear transformation on the data from
the previous layer. The last hidden layer is then connected to the
output layer, the final layer in the model. tf$layers
provides the tf$layers$dense
function for constructing
fully connected layers. The activation is controlled by the
activation
argument. Some options to pass to the
activation
argument are:
tf$nn$relu
. The following code creates a layer of
units
nodes fully connected to the previous layer
input_layer
with a ReLU
activation function:tf$nn$relu6
. The following code creates a layer of
units
nodes fully connected to the previous layer
hidden_layer
with a ReLU 6 activation function:second_hidden_layer <- tf$layers$dense(
inputs = hidden_layer, units = 20L, activation = tf$nn$relu)
NULL
. The following code creates a layer of
units
nodes fully connected to the previous layer
second_hidden_layer
with no activation function,
just a linear transformation:Other activation functions are possible, e.g.:
output_layer <- tf$layers$dense(inputs = second_hidden_layer,
units = 10L, activation_fn = tf$sigmoid)
The above code creates the neural network layer
output_layer
, which is fully connected to
second_hidden_layer
with a sigmoid activation function
tf$sigmoid
.
The network contains two hidden layers, each with 10 nodes and a ReLU
activation function. The output layer contains no activation function,
and is tf$reshape
to a one-dimensional tensor to capture
the model’s predictions, which are stored in
predictions_dict
.
The estimator_spec
returned by the model_fn
must contain loss
: a Tensor
representing the
loss value, which quantifies how well the model’s predictions reflect
the label values during training and evaluation runs. The
tf$losses
module provides convenience functions for
calculating loss using a variety of metrics, including:
absolute_difference(labels, predictions)
. Calculates
loss using the absolute-difference
formula (also known as L1 loss).
log_loss(labels, predictions)
. Calculates loss using
the logistic
loss forumula (typically used in logistic regression).
mean_squared_error(labels, predictions)
. Calculates
loss using the mean squared
error (MSE; also known as L2 loss).
The following example adds a definition for loss
to the
abalone model_fn
using
mean_squared_error()
:
Supplementary metrics for evaluation can be added to an
eval_metric_ops
dict. The following code defines an
rmse
metric, which calculates the root mean squared error
for the model predictions. Note that the labels
tensor is
cast to a float64
type to match the data type of the
predictions
tensor, which will contain real values:
The training op defines the optimization algorithm TensorFlow will
use when fitting the model to the training data. Typically when
training, the goal is to minimize loss. A simple way to create the
training op is to instantiate a tf$train$Optimizer
subclass
and call the minimize
method.
The following code defines a training op for the abalone
model_fn
using the loss value calculated in Defining Loss for the Model, the learning rate
passed to the function in params
, and the gradient descent
optimizer. For global_step
, the convenience function
tf$train$get_global_step
takes care of generating an
integer variable:
Here’s the final, complete model_fn
for the abalone age
predictor. The following code configures the neural network; defines
loss and the training op; and returns a estimator_spec
object containing mode
, predictions_dict
,
loss
, and train_op
:
model_fn <- function(features, labels, mode, params, config) {
# Connect the first hidden layer to input layer
first_hidden_layer <- tf$layers$dense(features, 10L, activation = tf$nn$relu)
# Connect the second hidden layer to first hidden layer with relu
second_hidden_layer <- tf$layers$dense(first_hidden_layer, 10L, activation = tf$nn$relu)
# Connect the output layer to second hidden layer (no activation fn)
output_layer <- tf$layers$dense(second_hidden_layer, 1L)
# Reshape output layer to 1-dim Tensor to return predictions
predictions <- tf$reshape(output_layer, list(-1L))
predictions_list <- list(ages = predictions)
# Calculate loss using mean squared error
loss <- tf$losses$mean_squared_error(labels, predictions)
eval_metric_ops <- list(rmse = tf$metrics$root_mean_squared_error(
tf$cast(labels, tf$float64), predictions
))
optimizer <- tf$train$GradientDescentOptimizer(learning_rate = params$learning_rate)
train_op <- optimizer$minimize(loss = loss, global_step = tf$train$get_global_step())
return(estimator_spec(
mode = mode,
predictions = predictions_list,
loss = loss,
train_op = train_op,
eval_metric_ops = eval_metric_ops
))
}
model_params <- list(learning_rate = 0.001)
model <- estimator(model_fn, params = model_params)
You’ve instantiated an Estimator
for the abalone
predictor and defined its behavior in model_fn
; all that’s
left to do is train, evaluate, and make predictions.
The following code fits the neural network to the training data and
evaluates the model performance based on the
eval_metric_ops
that we have defined: