---
title: "Hyperparameter Tuning"
output: 
  rmarkdown::html_vignette: default
vignette: >
  %\VignetteIndexEntry{Hyperparameter Tuning}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
type: docs
repo: https://github.com/rstudio/tfruns
menu:
  main:
    name: "Hyperparameter Tuning"
    identifier: "tools-tfruns-tuning"
    parent: "tfruns-top"
    weight: 20
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```


## Overview

Tuning a model often requires exploring the impact of changes to many hyperparameters. The best way to approach this is generally not by changing the source code of the training script as we did above, but instead by defining flags for key parameters then training over the combinations of those flags to determine which combination of flags yields the best model.

## Training Flags

Here's a declaration of 2 flags that control dropout rate within a model:

```{r}
FLAGS <- flags(
  flag_numeric("dropout1", 0.4),
  flag_numeric("dropout2", 0.3)
)
```

These flags are then used in the definition of the model here:

```{r}
model <- keras_model_sequential()
model %>%
  layer_dense(units = 128, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = FLAGS$dropout1) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = FLAGS$dropout2) %>%
  layer_dense(units = 10, activation = 'softmax')
```

Once we've defined flags, we can pass alternate flag values to `training_run()` as follows:

```{r}
training_run('mnist_mlp.R', flags = list(dropout1 = 0.2, dropout2 = 0.2))
```

You aren't required to specify all of the flags (any flags excluded will simply use their default value).

Flags make it very straightforward to systematically explore the impact of changes to hyperparameters on model performance, for example:

```{r}
for (dropout1 in c(0.1, 0.2, 0.3))
  training_run('mnist_mlp.R', flags = list(dropout1 = dropout1))
```

Flag values are automatically included in run data with a "flag_" prefix (e.g. `flag_dropout1`, `flag_dropout2`).

See the article on [training flags](https://tensorflow.rstudio.com/tools/tfruns/overview/) for additional documentation on using flags.

## Tuning Runs

Above we demonstrated writing a loop to call `training_run()` with various different flag values. A better way to accomplish this is the `tuning_run()` function, which allows you to specify multiple values for each flag, and executes training runs for all combinations of the specified flags. For example:

```{r}
# run various combinations of dropout1 and dropout2
runs <- tuning_run("mnist_mlp.R", flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))

# find the best evaluation accuracy
runs[order(runs$eval_acc, decreasing = TRUE), ]
```
```
Data frame: 9 x 28 
                    run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
9 runs/2018-01-26T13-21-03Z    0.1002   0.9817      0.0346     0.9900          0.1086         0.9794
6 runs/2018-01-26T13-23-26Z    0.1133   0.9799      0.0409     0.9880          0.1236         0.9778
5 runs/2018-01-26T13-24-11Z    0.1056   0.9796      0.0613     0.9826          0.1119         0.9777
4 runs/2018-01-26T13-24-57Z    0.1098   0.9788      0.0868     0.9770          0.1071         0.9771
2 runs/2018-01-26T13-26-28Z    0.1185   0.9783      0.0688     0.9819          0.1150         0.9783
3 runs/2018-01-26T13-25-43Z    0.1238   0.9782      0.0431     0.9883          0.1246         0.9779
8 runs/2018-01-26T13-21-53Z    0.1064   0.9781      0.0539     0.9843          0.1086         0.9795
7 runs/2018-01-26T13-22-40Z    0.1043   0.9778      0.0796     0.9772          0.1094         0.9777
1 runs/2018-01-26T13-27-14Z    0.1330   0.9769      0.0957     0.9744          0.1304         0.9751
# ... with 21 more columns:
#   flag_batch_size, flag_dropout1, flag_dropout2, samples, validation_samples, batch_size,
#   epochs, epochs_completed, metrics, model, loss_function, optimizer, learning_rate, script,
#   start, end, completed, output, source_code, context, type
```

Note that the `tuning_run()` function returns a data frame containing a summary of all of the executed training runs.

## Experiment Scopes

By default all runs go into the "runs" sub-directory of the current working directory. For various types of ad-hoc experimentation this works well, but in some cases for a tuning run you may want to create a separate directory scope.

You can do this by specifying the `runs_dir` argument:

```{r}
# run various combinations of dropout1 and dropout2
tuning_run("mnist_mlp.R", runs_dir = "dropout_tuning", flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))

# list runs witin the specified runs_dir
ls_runs(order = eval_acc, runs_dir = "dropout_tuning")
```
```
Data frame: 9 x 28 
                              run_dir eval_acc eval_loss metric_loss metric_acc metric_val_loss metric_val_acc
9 dropout_tuning/2018-01-26T13-38-02Z   0.9803    0.0980      0.0324     0.9902          0.1096         0.9789
6 dropout_tuning/2018-01-26T13-40-40Z   0.9795    0.1243      0.0396     0.9885          0.1341         0.9784
2 dropout_tuning/2018-01-26T13-43-55Z   0.9791    0.1138      0.0725     0.9813          0.1205         0.9773
7 dropout_tuning/2018-01-26T13-39-49Z   0.9786    0.1027      0.0796     0.9778          0.1053         0.9761
3 dropout_tuning/2018-01-26T13-43-08Z   0.9784    0.1206      0.0479     0.9871          0.1246         0.9775
4 dropout_tuning/2018-01-26T13-42-21Z   0.9784    0.1026      0.0869     0.9766          0.1108         0.9769
5 dropout_tuning/2018-01-26T13-41-31Z   0.9783    0.1086      0.0589     0.9832          0.1216         0.9764
8 dropout_tuning/2018-01-26T13-38-57Z   0.9780    0.1007      0.0511     0.9855          0.1100         0.9771
1 dropout_tuning/2018-01-26T13-44-41Z   0.9770    0.1178      0.1017     0.9734          0.1244         0.9757
# ... with 21 more columns:
#   flag_batch_size, flag_dropout1, flag_dropout2, samples, validation_samples, batch_size, epochs,
#   epochs_completed, metrics, model, loss_function, optimizer, learning_rate, script, start, end,
#   completed, output, source_code, context, type
```

## Sampling Flag Combinations

If the number of flag combinations is very large, you can also specify that only a random sample of combinations should be tried using the `sample` parameter. For example:

```{r}
# run random sample (0.3) of dropout1 and dropout2 combinations
runs <- tuning_run("mnist_mlp.R", sample = 0.3, flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))
```