---
title: "Parsing Utilities"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Parsing Utilities}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
type: docs
repo: https://github.com/rstudio/tfestimators
menu:
  main:
    name: "Parsing Utilities"
    identifier: "tfestimators-parsing-utilities"
    parent: "tfestimators-advanced"
    weight: 30
---

```{r setup, include=FALSE}
library(tfestimators)
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(eval = FALSE)
```

## Overview

Parsing utilities are a set of functions that help generate the parsing spec that `tf$parse_example` needs when it is used with estimators. If users keep their data in `tf$Example` format, they need to call `tf$parse_example` with a proper feature spec. These utility functions help in two main ways:

* Users need to combine the parsing spec of the features with the specs of the labels and weights (if any), since they are all parsed from the same `tf$Example` instance. The utility functions combine these specs.
* It is difficult to map the label expected by an estimator such as `dnn_classifier` to the corresponding `tf$parse_example` spec. The utility functions encode this mapping from information supplied by the user (key, dtype).
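The first point above, merging the feature, weight, and label specs into one named list, can be sketched in plain R. This is only an illustration under stated assumptions, not how tfestimators implements it: `make_entry()` and `combine_specs()` are hypothetical helpers, the entries are plain lists standing in for real `FixedLenFeature` objects, and rejecting key clashes is a choice made for this sketch.

```r
# Hypothetical stand-in for a parsing-spec entry (not a real FixedLenFeature).
make_entry <- function(shape, dtype) list(shape = shape, dtype = dtype)

# Hypothetical helper: merge feature, weight, and label entries into one spec.
combine_specs <- function(feature_spec, label_key, label_dtype,
                          weight_key = NULL) {
  spec <- feature_spec
  # The label and weight keys must not collide with any feature key,
  # because all entries are parsed from the same example.
  clash <- intersect(names(spec), c(label_key, weight_key))
  if (length(clash) > 0) {
    stop("Key already used by a feature: ", paste(clash, collapse = ", "))
  }
  # Weights are assumed here to be a single float per example.
  if (!is.null(weight_key)) {
    spec[[weight_key]] <- make_entry(1L, "float32")
  }
  # The label is a single value of the dtype supplied by the user.
  spec[[label_key]] <- make_entry(1L, label_dtype)
  spec
}

spec <- combine_specs(
  feature_spec = list(a = make_entry(1L, "float32")),
  label_key = "b",
  label_dtype = "int64",
  weight_key = "c"
)
names(spec)
```

The result is a single named list with one entry per key, which is the shape of object a parsing function consumes.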
## Example output of parsing spec

```{r}
parsing_spec <- classifier_parse_example_spec(
  feature_columns = column_numeric('a'),
  label_key = 'b',
  weight_column = 'c'
)
```

For the above example, `classifier_parse_example_spec` would return the following:

```{r}
expected_spec <- list(
  a = tf$python$ops$parsing_ops$FixedLenFeature(reticulate::tuple(1L), dtype = tf$float32),
  c = tf$python$ops$parsing_ops$FixedLenFeature(reticulate::tuple(1L), dtype = tf$float32),
  b = tf$python$ops$parsing_ops$FixedLenFeature(reticulate::tuple(1L), dtype = tf$int64)
)

# This should be the same as the one we constructed using `classifier_parse_example_spec`
testthat::expect_equal(parsing_spec, expected_spec)
```

## Example usage with a classifier

First, define the feature transformations and initialize your classifier, similar to the following:

```{r}
fcs <- feature_columns(...)

model <- dnn_classifier(
  n_classes = 1000,
  feature_columns = fcs,
  weight_column = 'example-weight',
  label_vocabulary = c('photos', 'keep', ...),
  hidden_units = c(256, 64, 16)
)
```

Next, create the parsing configuration for `tf$parse_example` using `classifier_parse_example_spec` and the feature columns `fcs` we have just defined:

```{r}
parsing_spec <- classifier_parse_example_spec(
  feature_columns = fcs,
  label_key = 'my-label',
  label_dtype = tf$string,
  weight_column = 'example-weight'
)
```

This label configuration tells the classifier the following:

* weights are retrieved with the key 'example-weight'
* the label is a string and can be one of the values in `c('photos', 'keep', ...)`
* the integer id for the label 'photos' is 0, for 'keep' it is 1, and so on

Then define your input function with the help of `read_batch_features`, which reads batches of features from files in `tf$Example` format using the parsing configuration `parsing_spec` we just defined:

```{r}
# `train_files` and `batch_size` are assumed to be defined elsewhere.
input_fn_train <- function() {
  features <- tf$contrib$learn$read_batch_features(
    file_pattern = train_files,
    batch_size = batch_size,
    features = parsing_spec,
    reader = tf$RecordIOReader)
  # The label is parsed together with the features, so extract it by its key.
  labels <- features[["my-label"]]
  return(list(features, labels))
}
```

Finally, we can train the model with this input function, which parses each batch using the spec produced by `classifier_parse_example_spec`:

```{r}
train(model, input_fn = input_fn_train)
```
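
The zero-based label ids implied by `label_vocabulary` can be illustrated in plain R. This is a minimal sketch, not the estimator's internal lookup: `label_to_id()` is a hypothetical helper, and the two-item vocabulary is a stand-in for the elided one used above.

```r
# Hypothetical two-item vocabulary; the vocabulary in the example above is elided.
label_vocabulary <- c("photos", "keep")

# Hypothetical helper: map string labels to zero-based integer ids
# by their position in the vocabulary. Unknown labels become NA.
label_to_id <- function(labels, vocabulary) {
  match(labels, vocabulary) - 1L
}

ids <- label_to_id(c("keep", "photos", "keep"), label_vocabulary)
ids
```

Because R vectors are one-based, the sketch subtracts one from the matched position to obtain the ids 0 for 'photos' and 1 for 'keep'.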