Title: | Interface to 'TensorFlow' Datasets |
---|---|
Description: | Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details. |
Authors: | Tomasz Kalinowski [ctb, cph, cre], Daniel Falbel [ctb, cph], JJ Allaire [aut, cph], Yuan Tang [aut], Kevin Ushey [aut], RStudio [cph, fnd], Google Inc. [cph] |
Maintainer: | Tomasz Kalinowski <[email protected]> |
License: | Apache License 2.0 |
Version: | 2.17.0.9000 |
Built: | 2024-12-14 03:35:35 UTC |
Source: | https://github.com/rstudio/tfdatasets |
Currently we only consider "string" type as nominal.
all_nominal()
Other Selectors: all_numeric(), has_type()
Find all the variables with the following types: "float16", "float32", "float64", "int16", "int32", "int64", "half", "double".
all_numeric()
Other Selectors: all_nominal(), has_type()
Convert tf_dataset to an iterator that yields R arrays.
as_array_iterator(dataset)
dataset |
A tensorflow dataset |
An iterable. Use iterate() or iter_next() to access values from the iterator.
The function enables you to use a TF Dataset in a stateless "tensor-in tensor-out" expression, without creating an iterator. This makes it easier to transform tensors using the optimized TF Dataset abstraction.
## S3 method for class 'tensorflow.python.data.ops.dataset_ops.DatasetV2'
as_tensor(x, ..., name = NULL)

## S3 method for class 'tensorflow.python.data.ops.dataset_ops.DatasetV2'
as.array(x, ...)
x |
A TF Dataset |
... |
passed on to |
name |
(Optional.) A name for the TensorFlow operation. |
For example, consider a preprocess_batch() function that takes a batch of raw features as input and returns the processed features.
preprocess_one_case <- function(x) x + 100

preprocess_batch <- function(raw_features) {
  batch_size <- dim(raw_features)[1]
  ds <- raw_features %>%
    tensor_slices_dataset() %>%
    dataset_map(preprocess_one_case, num_parallel_calls = batch_size) %>%
    dataset_batch(batch_size)
  as_tensor(ds)
}

raw_features <- array(seq(prod(4, 5)), c(4, 5))
preprocess_batch(raw_features)
In the above example, the batch of raw_features was converted to a TF Dataset. Next, each of the raw_features cases in the batch was mapped using preprocess_one_case, and the processed features were grouped into a single batch. The final dataset contains only one element, which is a batch of all the processed features.
Note: The dataset should contain only one element. Instead of creating an iterator for the dataset and retrieving the batch of features, as_tensor() skips the iterator creation and directly outputs the batch of features.
This can be particularly useful when your tensor transformations are expressed as TF Dataset operations, and you want to use those transformations while serving your model.
Creates a dataset that deterministically chooses elements from datasets.
choose_from_datasets(datasets, choice_dataset, stop_on_empty_dataset = TRUE)
datasets |
A non-empty list of tf.data.Dataset objects with compatible structure. |
choice_dataset |
A |
stop_on_empty_dataset |
If |
Returns a dataset that interleaves elements from datasets according to the values of choice_dataset.
## Not run:
datasets <- list(tensors_dataset("foo") %>% dataset_repeat(),
                 tensors_dataset("bar") %>% dataset_repeat(),
                 tensors_dataset("baz") %>% dataset_repeat())

# Define a dataset containing `[0, 1, 2, 0, 1, 2, 0, 1, 2]`.
choice_dataset <- range_dataset(0, 3) %>% dataset_repeat(3)
result <- choose_from_datasets(datasets, choice_dataset)
result %>% as_array_iterator() %>% iterate(function(s) s$decode()) %>% print()
# [1] "foo" "bar" "baz" "foo" "bar" "baz" "foo" "bar" "baz"

## End(Not run)
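The selection rule can be sketched in plain R, without TensorFlow. This is only an illustration of the idea: choose_from() is a hypothetical helper, vectors stand in for datasets, and stopping at an exhausted source is an assumption about how stop_on_empty_dataset = TRUE behaves.

```r
# Hypothetical helper, plain R only: `sources` is a list of vectors standing in
# for datasets, `choices` holds 0-based indices as in choice_dataset.
choose_from <- function(sources, choices) {
  cursor <- rep(1L, length(sources))   # next unread position in each source
  out <- character(0)
  for (k in choices + 1L) {            # convert to R's 1-based indexing
    # Assumed behaviour for stop_on_empty_dataset = TRUE: stop at an exhausted source.
    if (cursor[k] > length(sources[[k]])) break
    out <- c(out, sources[[k]][cursor[k]])
    cursor[k] <- cursor[k] + 1L
  }
  out
}

sources <- list(rep("foo", 3), rep("bar", 3), rep("baz", 3))
choices <- rep(0:2, times = 3)
picked <- choose_from(sources, choices)
print(picked)
```

Each choice consumes the next unread element of the selected source, so the result depends only on the order of the values in choices.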
The components of the resulting element will have an additional outer dimension, which will be batch_size (or N %% batch_size for the last element if batch_size does not divide the number of input elements N evenly and drop_remainder is FALSE). If your program depends on the batches having the same outer dimension, you should set the drop_remainder argument to TRUE to prevent the smaller batch from being produced.
dataset_batch(
  dataset,
  batch_size,
  drop_remainder = FALSE,
  num_parallel_calls = NULL,
  deterministic = NULL
)
dataset |
A dataset |
batch_size |
An integer, representing the number of consecutive elements of this dataset to combine in a single batch. |
drop_remainder |
(Optional.) A boolean, representing whether the last
batch should be dropped in the case it has fewer than |
num_parallel_calls |
(Optional.) A scalar integer, representing the
number of batches to compute asynchronously in parallel. If not specified,
batches will be computed sequentially. If the value |
deterministic |
(Optional.) When |
A dataset
If your program requires data to have a statically known shape (e.g., when using XLA), you should use drop_remainder=TRUE. Without drop_remainder=TRUE the shape of the output dataset will have an unknown leading dimension due to the possibility of a smaller final batch.
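The batch-size arithmetic described above can be checked with a short base R sketch. batch_sizes() is a hypothetical helper written for this illustration; it is not part of tfdatasets and does not touch TensorFlow.

```r
# Hypothetical helper: outer dimension of each batch produced for n input elements.
batch_sizes <- function(n, batch_size, drop_remainder = FALSE) {
  full <- n %/% batch_size            # number of complete batches
  rest <- n %% batch_size             # size of the trailing partial batch, if any
  sizes <- rep(batch_size, full)
  if (rest > 0 && !drop_remainder) sizes <- c(sizes, rest)
  sizes
}

with_remainder <- batch_sizes(10, 4)
without_remainder <- batch_sizes(10, 4, drop_remainder = TRUE)
print(with_remainder)
print(without_remainder)
```

With drop_remainder = TRUE every batch has the same outer dimension, which is what makes the leading dimension statically known.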
Other dataset methods: dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
A transformation that buckets elements in a Dataset by length
dataset_bucket_by_sequence_length(
  dataset,
  element_length_func,
  bucket_boundaries,
  bucket_batch_sizes,
  padded_shapes = NULL,
  padding_values = NULL,
  pad_to_bucket_boundary = FALSE,
  no_padding = FALSE,
  drop_remainder = FALSE,
  name = NULL
)
dataset |
A |
element_length_func |
function from element in |
bucket_boundaries |
integers, upper length boundaries of the buckets. |
bucket_batch_sizes |
integers, batch size per bucket. Length should be
|
padded_shapes |
Nested structure of |
padding_values |
Values to pad with, passed to
|
pad_to_bucket_boundary |
bool, if |
no_padding |
boolean, indicates whether to pad the batch features (features
need to be either of type |
drop_remainder |
(Optional.) A logical scalar, representing
whether the last batch should be dropped in the case it has fewer than
|
name |
(Optional.) A name for the tf.data operation. |
Elements of the Dataset are grouped together by length and then are padded and batched.
This is useful for sequence tasks in which the elements have variable length. Grouping together elements that have similar lengths reduces the total fraction of padding in a batch, which increases training step efficiency.
Below is an example to bucketize the input data to the 3 buckets "[0, 3), [3, 5), [5, Inf)" based on sequence length, with batch size 2.
## Not run:
dataset <- list(c(0),
                c(1, 2, 3, 4),
                c(5, 6, 7),
                c(7, 8, 9, 10, 11),
                c(13, 14, 15, 16, 17, 18, 19, 20),
                c(21, 22)) %>%
  lapply(as.array) %>%
  lapply(as_tensor, "int32") %>%
  lapply(tensors_dataset) %>%
  Reduce(dataset_concatenate, .)

dataset %>%
  dataset_bucket_by_sequence_length(
    element_length_func = function(elem) tf$shape(elem)[1],
    bucket_boundaries = c(3, 5),
    bucket_batch_sizes = c(2, 2, 2)
  ) %>%
  as_array_iterator() %>%
  iterate(print)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    0
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    7    8    9   10   11    0    0    0
# [2,]   13   14   15   16   17   18   19   20
#      [,1] [,2]
# [1,]    0    0
# [2,]   21   22

## End(Not run)
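The bucket assignment in the example can be reproduced in base R with findInterval(). This is a sketch of the grouping rule only, assuming the half-open buckets [0, 3), [3, 5), [5, Inf) stated above; padding and batching are left out, and the variable names are illustrative.

```r
# Lengths of the six example elements, in input order.
element_lengths <- c(1, 4, 3, 5, 8, 2)
bucket_boundaries <- c(3, 5)

# findInterval() returns 0 for lengths below the first boundary,
# so add 1 to get a 1-based bucket id.
bucket_id <- findInterval(element_lengths, bucket_boundaries) + 1

# Elements that share a bucket id are the ones batched together.
buckets <- split(element_lengths, bucket_id)
print(buckets)
```

Bucket 2 holds the elements of length 4 and 3, bucket 3 those of length 5 and 8, and bucket 1 those of length 1 and 2, which matches the three printed batches.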
Caches the elements in this dataset.
dataset_cache(dataset, filename = NULL)
dataset |
A dataset |
filename |
String with the name of a directory on the filesystem to use for caching tensors in this Dataset. If a filename is not provided, the dataset will be cached in memory. |
A dataset
Other dataset methods: dataset_batch(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Iterates through the dataset, collecting every element into a list. It's useful for looking at the full result of the dataset. Note: You may run out of memory if your dataset is too big.
dataset_collect(dataset, iter_max = Inf)
dataset |
A dataset |
iter_max |
Maximum number of iterations. |
Other dataset methods: dataset_batch(), dataset_cache(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Creates a dataset by concatenating the given datasets with this dataset.
dataset_concatenate(dataset, ...)
dataset , ...
|
|
A dataset
The input dataset and the dataset to be concatenated should have the same nested structure and output types.
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Transform a dataset with delimited text lines into a dataset with named columns
dataset_decode_delim(dataset, record_spec, parallel_records = NULL)
dataset |
Dataset containing delimited text lines (e.g. a CSV) |
record_spec |
Specification of column names and types (see |
parallel_records |
(Optional) An integer, representing the number of records to decode in parallel. If not specified, records will be processed sequentially. |
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Enumerates the elements of this dataset
dataset_enumerate(dataset, start = 0L)
dataset |
A tensorflow dataset |
start |
An integer (coerced to a |
It is similar to Python's enumerate: it transforms a sequence of elements into a sequence of list(index, element), where index is an integer that indicates the position of the element in the sequence.
## Not run:
dataset <- tensor_slices_dataset(100:103) %>%
  dataset_enumerate()

iterator <- reticulate::as_iterator(dataset)
reticulate::iter_next(iterator) # list(0, 100)
reticulate::iter_next(iterator) # list(1, 101)
reticulate::iter_next(iterator) # list(2, 102)
reticulate::iter_next(iterator) # list(3, 103)
reticulate::iter_next(iterator) # NULL (iterator exhausted)
reticulate::iter_next(iterator) # NULL (iterator exhausted)

## End(Not run)
Filter a dataset by a predicate
dataset_filter(dataset, predicate)
dataset |
A dataset |
predicate |
A function mapping a nested structure of tensors (having
shapes and types defined by |
Note that the functions used inside the predicate must be tensor operations (e.g. tf$not_equal, tf$less, etc.). R generic methods for relational operators (e.g. <, >, <=, etc.) and logical operators (e.g. !, &, |, etc.) are provided so you can use shorthand syntax for most common comparisons (this is illustrated by the example below).
A dataset composed of records that matched the predicate.
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
## Not run:
dataset <- text_line_dataset("mtcars.csv", record_spec = mtcars_spec) %>%
  dataset_filter(function(record) {
    record$mpg >= 20
  })

dataset <- text_line_dataset("mtcars.csv", record_spec = mtcars_spec) %>%
  dataset_filter(function(record) {
    record$mpg >= 20 & record$cyl >= 6L
  })

## End(Not run)
Maps map_func across this dataset and flattens the result.
dataset_flat_map(dataset, map_func)
dataset |
A dataset |
map_func |
A function mapping a nested structure of tensors (having
shapes and types defined by |
A dataset
Group windows of elements by key and reduce them
dataset_group_by_window(
  dataset,
  key_func,
  reduce_func,
  window_size = NULL,
  window_size_func = NULL,
  name = NULL
)
dataset |
a TF Dataset |
key_func |
A function mapping a nested structure of tensors (having
shapes and types defined by |
reduce_func |
A function mapping a key and a dataset of up to
|
window_size |
A |
window_size_func |
A function mapping a key to a |
name |
(Optional.) A name for the TensorFlow operation. |
This transformation maps each consecutive element in a dataset to a key using key_func() and groups the elements by key. It then applies reduce_func() to at most window_size_func(key) elements matching the same key. All except the final window for each key will contain window_size_func(key) elements; the final window may be smaller.
You may provide either a constant window_size or a window size determined by the key through window_size_func.
window_size <- 5
dataset <- range_dataset(to = 10) %>%
  dataset_group_by_window(
    key_func = function(x) x %% 2,
    reduce_func = function(key, ds) dataset_batch(ds, window_size),
    window_size = window_size
  )

it <- as_array_iterator(dataset)
while (!is.null(elem <- iter_next(it)))
  print(elem)
#> tf.Tensor([0 2 4 6 8], shape=(5), dtype=int64)
#> tf.Tensor([1 3 5 7 9], shape=(5), dtype=int64)
Maps map_func across this dataset, and interleaves the results
dataset_interleave(dataset, map_func, cycle_length, block_length = 1)
dataset |
A dataset |
map_func |
A function mapping a nested structure of tensors (having
shapes and types defined by |
cycle_length |
The number of elements from this dataset that will be processed concurrently. |
block_length |
The number of consecutive elements to produce from each input element before cycling to another input element. |
The cycle_length and block_length arguments control the order in which elements are produced. cycle_length controls the number of input elements that are processed concurrently. In general, this transformation will apply map_func to cycle_length input elements, open iterators on the returned dataset objects, and cycle through them producing block_length consecutive elements from each iterator, and consuming the next input element each time it reaches the end of an iterator.
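The ordering rule above can be simulated in base R. The sketch below is not the TensorFlow implementation: simulate_interleave() is a hypothetical helper, plain vectors stand in for the datasets returned by map_func, and a slot is refilled with the next input element the next time it is visited after running empty.

```r
# Plain R simulation of the interleave order; no TensorFlow involved.
# `simulate_interleave` is a hypothetical helper, not part of tfdatasets.
simulate_interleave <- function(inputs, map_func, cycle_length, block_length = 1) {
  slots <- vector("list", cycle_length)  # remaining elements of each open "iterator"
  next_input <- 1
  out <- c()
  repeat {
    produced <- FALSE
    for (s in seq_len(cycle_length)) {
      # Open the next input element when this slot has run empty.
      if (length(slots[[s]]) == 0 && next_input <= length(inputs)) {
        slots[s] <- list(map_func(inputs[[next_input]]))
        next_input <- next_input + 1
      }
      # Emit up to block_length consecutive elements from this slot.
      n <- min(block_length, length(slots[[s]]))
      if (n > 0) {
        out <- c(out, slots[[s]][seq_len(n)])
        slots[s] <- list(slots[[s]][-seq_len(n)])
        produced <- TRUE
      }
    }
    # Stop once every slot is empty and no input elements remain.
    if (!produced && next_input > length(inputs)) break
  }
  out
}

result <- simulate_interleave(1:5, function(x) rep(x, 6),
                              cycle_length = 2, block_length = 4)
print(result)
```

With five inputs each repeated six times, cycle_length = 2 and block_length = 4, the simulation emits blocks of four elements and then the two leftover elements of each input before moving on to the next pair of inputs.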
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
## Not run:
dataset <- tensor_slices_dataset(c(1,2,3,4,5)) %>%
  dataset_interleave(cycle_length = 2, block_length = 4, function(x) {
    tensors_dataset(x) %>%
      dataset_repeat(6)
  })

# resulting dataset (newlines indicate "block" boundaries):
c(1, 1, 1, 1,
  2, 2, 2, 2,
  1, 1,
  2, 2,
  3, 3, 3, 3,
  4, 4, 4, 4,
  3, 3,
  4, 4,
  5, 5, 5, 5,
  5, 5)

## End(Not run)
Map a function across a dataset.
dataset_map(dataset, map_func, num_parallel_calls = NULL)
dataset |
A dataset |
map_func |
A function mapping a nested structure of tensors (having
shapes and types defined by |
num_parallel_calls |
(Optional) An integer, representing the number of elements to process in parallel. If not specified, elements will be processed sequentially. |
A dataset
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Maps map_func across batch_size consecutive elements of this dataset and then combines them into a batch. Functionally, it is equivalent to map followed by batch. However, by fusing the two transformations together, the implementation can be more efficient.
dataset_map_and_batch(
  dataset,
  map_func,
  batch_size,
  num_parallel_batches = NULL,
  drop_remainder = FALSE,
  num_parallel_calls = NULL
)
dataset |
A dataset |
map_func |
A function mapping a nested structure of tensors (having
shapes and types defined by |
batch_size |
An integer, representing the number of consecutive elements of this dataset to combine in a single batch. |
num_parallel_batches |
(Optional) An integer, representing the number of batches to create in parallel. On one hand, higher values can help mitigate the effect of stragglers. On the other hand, higher values can increase contention if CPU is scarce. |
drop_remainder |
(Optional.) A boolean, representing whether the last
batch should be dropped in the case it has fewer than |
num_parallel_calls |
(Optional) An integer, representing the number of elements to process in parallel. If not specified, elements will be processed sequentially. |
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Get or Set Dataset Options
dataset_options(dataset, ...)
dataset |
a tensorflow dataset |
... |
Valid values include:
|
The options are "global" in the sense they apply to the entire dataset. If options are set multiple times, they are merged as long as different options do not use different non-default values.
If values are supplied to ..., returns a tf.data.Dataset with the given options set/updated. Otherwise, returns the currently set options for the dataset.
## Not run:
# pass options directly:
range_dataset(0, 10) %>%
  dataset_options(
    experimental_deterministic = FALSE,
    threading.private_threadpool_size = 10
  )

# pass options as a named list:
opts <- list(
  experimental_deterministic = FALSE,
  threading.private_threadpool_size = 10
)
range_dataset(0, 10) %>%
  dataset_options(opts)

# pass a tf.data.Options() instance
opts <- tf$data$Options()
opts$experimental_deterministic <- FALSE
opts$threading$private_threadpool_size <- 10L
range_dataset(0, 10) %>%
  dataset_options(opts)

# get currently set options
range_dataset(0, 10) %>% dataset_options()

## End(Not run)
Combines consecutive elements of this dataset into padded batches.
dataset_padded_batch(
  dataset,
  batch_size,
  padded_shapes = NULL,
  padding_values = NULL,
  drop_remainder = FALSE,
  name = NULL
)
dataset |
A dataset |
batch_size |
An integer, representing the number of consecutive elements of this dataset to combine in a single batch. |
padded_shapes |
(Optional.) A (nested) structure of
|
padding_values |
(Optional.) A (nested) structure of scalar-shaped
|
drop_remainder |
(Optional.) A boolean scalar, representing
whether the last batch should be dropped in the case it has fewer than
|
name |
(Optional.) A name for the tf.data operation. Requires tensorflow version >= 2.7. |
This transformation combines multiple consecutive elements of the input dataset into a single element.
Like dataset_batch(), the components of the resulting element will have an additional outer dimension, which will be batch_size (or N %% batch_size for the last element if batch_size does not divide the number of input elements N evenly and drop_remainder is FALSE). If your program depends on the batches having the same outer dimension, you should set the drop_remainder argument to TRUE to prevent the smaller batch from being produced.
Unlike dataset_batch(), the input elements to be batched may have different shapes, and this transformation will pad each component to the respective shape in padded_shapes. The padded_shapes argument determines the resulting shape for each dimension of each component in an output element:
If the dimension is a constant, the component will be padded out to that length in that dimension.
If the dimension is unknown, the component will be padded out to the maximum length of all elements in that dimension.
See also tf$data$experimental$dense_to_sparse_batch, which combines elements that may have different shapes into a tf$sparse$SparseTensor.
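The two padding rules can be illustrated for one-dimensional elements in base R. pad_batch() is a hypothetical helper written for this sketch, not a tfdatasets function; passing padded_length = NULL stands in for an unknown dimension.

```r
# Hypothetical helper: pad 1-D elements and stack them into a batch matrix.
# padded_length = NULL mimics an unknown dimension (pad to the longest element).
pad_batch <- function(elements, padded_length = NULL, padding_value = 0) {
  if (is.null(padded_length)) padded_length <- max(lengths(elements))
  rows <- lapply(elements, function(e) {
    c(e, rep(padding_value, padded_length - length(e)))
  })
  do.call(rbind, rows)   # one row per element in the batch
}

# Elements of length 1, 2, 3 and 4, similar to tf$fill(list(x), x) for x in 1..4.
elements <- lapply(1:4, function(x) rep(x, x))

batch_unknown <- pad_batch(elements[1:2])   # padded to the longest element
batch_fixed <- pad_batch(elements[3:4], padded_length = 5, padding_value = -1)
print(batch_unknown)
print(batch_fixed)
```

The first batch is padded only as far as its longest element, while the second is padded to the constant length 5 with the custom value.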
A tf_dataset
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
## Not run:
A <- range_dataset(1, 5, dtype = tf$int32) %>%
  dataset_map(function(x) tf$fill(list(x), x))

# Pad to the smallest per-batch size that fits all elements.
B <- A %>% dataset_padded_batch(2)
B %>% as_array_iterator() %>% iterate(print)

# Pad to a fixed size.
C <- A %>% dataset_padded_batch(2, padded_shapes=5)
C %>% as_array_iterator() %>% iterate(print)

# Pad with a custom value.
D <- A %>% dataset_padded_batch(2, padded_shapes=5, padding_values = -1L)
D %>% as_array_iterator() %>% iterate(print)

# Pad with a single value and multiple components.
E <- zip_datasets(A, A) %>% dataset_padded_batch(2, padding_values = -1L)
E %>% as_array_iterator() %>% iterate(print)

## End(Not run)
Creates a Dataset that prefetches elements from this dataset.
dataset_prefetch(dataset, buffer_size = tf$data$AUTOTUNE)
dataset |
A dataset |
buffer_size |
An integer, representing the maximum number of elements that will be buffered when prefetching. |
A dataset
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
A transformation that prefetches dataset values to the given device
dataset_prefetch_to_device(dataset, device, buffer_size = NULL)
dataset |
A dataset |
device |
A string. The name of a device to which elements will be prefetched (e.g. "/gpu:0"). |
buffer_size |
(Optional.) The number of elements to buffer on device. Defaults to an automatically chosen value. |
A dataset
Although the transformation creates a dataset, the transformation must be the final dataset in the input pipeline.
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Transform a dataset with named columns into a list with features (x) and response (y) elements.
dataset_prepare(
  dataset,
  x,
  y = NULL,
  named = TRUE,
  named_features = FALSE,
  parallel_records = NULL,
  batch_size = NULL,
  num_parallel_batches = NULL,
  drop_remainder = FALSE
)
dataset |
A dataset |
x |
Features to include. When |
y |
(Optional). Response variable. |
named |
|
named_features |
|
parallel_records |
(Optional) An integer, representing the number of records to decode in parallel. If not specified, records will be processed sequentially. |
batch_size |
(Optional). Batch size if you would like to fuse the
|
num_parallel_batches |
(Optional) An integer, representing the number of batches to create in parallel. On one hand, higher values can help mitigate the effect of stragglers. On the other hand, higher values can increase contention if CPU is scarce. |
drop_remainder |
(Optional.) A boolean, representing whether the last
batch should be dropped in the case it has fewer than |
A dataset. The dataset will have a structure of either:
When named_features is TRUE: list(x = list(feature_name = feature_values, ...), y = response_values)
When named_features is FALSE: list(x = features_array, y = response_values), where features_array is a Rank 2 array of (batch_size, num_features).
Note that the y element will be omitted when y is NULL.
input_fn() for use with tfestimators.
The transformation calls reduce_func successively on every element of the input dataset until the dataset is exhausted, aggregating information in its internal state. The initial_state argument is used for the initial state and the final state is returned as the result.
dataset_reduce(dataset, initial_state, reduce_func)
dataset |
A dataset |
initial_state |
An element representing the initial state of the transformation. |
reduce_func |
A function that maps |
A dataset element.
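The same accumulate-until-exhausted pattern exists in base R as Reduce(). The sketch below is only an analogy, with no TensorFlow involved; it assumes reduce_func takes the current state and the next element and returns the new state, and the variable names are illustrative.

```r
# Plain R analogue of a reduction over a dataset of the integers 1..5.
elements <- 1:5
initial_state <- 0

# Assumed shape of reduce_func: (state, element) -> new state.
reduce_func <- function(state, element) state + element

# Reduce() calls reduce_func on every element in turn, starting from initial_state,
# and returns the final state.
final_state <- Reduce(reduce_func, elements, initial_state)
print(final_state)
```

Here the final state is the sum of the elements, because each call adds the next element to the running state.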
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
A transformation that resamples a dataset to a target distribution.
dataset_rejection_resample(
  dataset,
  class_func,
  target_dist,
  initial_dist = NULL,
  seed = NULL,
  name = NULL
)
dataset |
A |
class_func |
A function mapping an element of the input dataset to a
scalar |
target_dist |
A floating point type tensor, shaped |
initial_dist |
(Optional.) A floating point type tensor, shaped
|
seed |
(Optional.) Integer seed for the resampler. |
name |
(Optional.) A name for the tf.data operation. |
A tf.Dataset
## Not run: 
initial_dist <- c(.5, .5)
target_dist <- c(.6, .4)
num_classes <- length(initial_dist)
num_samples <- 100000
data <- sample.int(num_classes, num_samples, prob = initial_dist, replace = TRUE)
dataset <- tensor_slices_dataset(data)
tally <- c(0, 0)
`add<-` <- function (x, value) x + value
# tfautograph::autograph({
#   for(i in dataset)
#     add(tally[as.numeric(i)]) <- 1
# })
dataset %>%
  as_array_iterator() %>%
  iterate(function(i) {
    add(tally[i]) <<- 1
  }, simplify = FALSE)
# The value of `tally` will be close to c(50000, 50000) as
# per the `initial_dist` distribution.
tally # c(50287, 49713)

tally <- c(0, 0)
dataset %>%
  dataset_rejection_resample(
    class_func = function(x) (x-1) %% 2,
    target_dist = target_dist,
    initial_dist = initial_dist
  ) %>%
  as_array_iterator() %>%
  iterate(function(element) {
    names(element) <- c("class_id", "i")
    add(tally[element$i]) <<- 1
  }, simplify = FALSE)
# The value of `tally` will now be close to c(75000, 50000),
# thus satisfying the `target_dist` distribution.
tally # c(74822, 49921)
## End(Not run)
Repeats a dataset count times.
dataset_repeat(dataset, count = NULL)
dataset |
A dataset |
count |
(Optional.) An integer value representing the number of times
the elements of this dataset should be repeated. The default behavior (if
|
A dataset
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
A transformation that scans a function across an input dataset
dataset_scan(dataset, initial_state, scan_func)
dataset |
A tensorflow dataset |
initial_state |
A nested structure of tensors, representing the initial state of the accumulator. |
scan_func |
A function that maps |
This transformation is a stateful relative of dataset_map(). In addition to mapping scan_func across the elements of the input dataset, scan() accumulates one or more state tensors, whose initial values are initial_state.
## Not run: 
initial_state <- as_tensor(0, dtype="int64")
scan_func <- function(state, i) list(state + i, state + i)
dataset <- range_dataset(0, 10) %>%
  dataset_scan(initial_state, scan_func)

reticulate::iterate(dataset, as.array) %>%
  unlist()
# 0 1 3 6 10 15 21 28 36 45
## End(Not run)
This dataset operator is very useful when running distributed training, as it allows each worker to read a unique subset.
dataset_shard(dataset, num_shards, index)
dataset |
A dataset |
num_shards |
An integer representing the number of shards operating in parallel. |
index |
An integer, representing the worker index. |
A dataset
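How sharding gives each worker a disjoint subset can be modeled in plain R. This sketch assumes the behavior documented for tf.data's shard(): the element at zero-based position p is kept by the worker whose index equals p modulo num_shards. shard_elements() is a hypothetical helper for illustration, not a tfdatasets function.

```r
# Hypothetical plain-R model of sharding (no TensorFlow needed).
shard_elements <- function(elements, num_shards, index) {
  # Zero-based position of every element in the input.
  positions <- seq_along(elements) - 1L
  # Keep only the elements assigned to this worker.
  elements[positions %% num_shards == index]
}

elements <- 0:9
worker_0 <- shard_elements(elements, num_shards = 3, index = 0)
worker_1 <- shard_elements(elements, num_shards = 3, index = 1)
worker_2 <- shard_elements(elements, num_shards = 3, index = 2)
```

Taken together the three workers see every element exactly once, which is why sharding is usually applied before shuffling or repeating.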
Randomly shuffles the elements of this dataset.
dataset_shuffle( dataset, buffer_size, seed = NULL, reshuffle_each_iteration = NULL )
dataset |
A dataset |
buffer_size |
An integer, representing the number of elements from this dataset from which the new dataset will sample. |
seed |
(Optional) An integer, representing the random seed that will be used to create the distribution. |
reshuffle_each_iteration |
(Optional) A boolean, which if true indicates
that the dataset should be pseudorandomly reshuffled each time it is iterated
over. (Defaults to |
A dataset
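The effect of buffer_size is easier to see in a small simulation. The sketch below is a plain-R model of buffer-based shuffling, not the TensorFlow implementation: buffered_shuffle() is a hypothetical helper that keeps up to buffer_size pending elements and emits one of them at random each time the buffer is full.

```r
# Hypothetical plain-R model of buffer-based shuffling (no TensorFlow needed).
buffered_shuffle <- function(elements, buffer_size) {
  buffer <- c()
  output <- c()
  for (element in elements) {
    buffer <- c(buffer, element)
    if (length(buffer) >= buffer_size) {
      # Buffer is full: emit one buffered element chosen at random.
      pick <- sample.int(length(buffer), 1)
      output <- c(output, buffer[pick])
      buffer <- buffer[-pick]
    }
  }
  # Input exhausted: drain whatever is left in random order.
  while (length(buffer) > 0) {
    pick <- sample.int(length(buffer), 1)
    output <- c(output, buffer[pick])
    buffer <- buffer[-pick]
  }
  output
}

set.seed(42)
shuffled <- buffered_shuffle(1:10, buffer_size = 4)

# A buffer of one element cannot reorder anything.
unshuffled <- buffered_shuffle(1:10, buffer_size = 1)
```

In this model the first emitted element always comes from the first buffer_size inputs, so a buffer much smaller than the dataset only mixes elements locally. A uniform shuffle needs a buffer at least as large as the dataset.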
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Shuffles and repeats a dataset returning a new permutation for each epoch.
dataset_shuffle_and_repeat(dataset, buffer_size, count = NULL, seed = NULL)
dataset |
A dataset |
buffer_size |
An integer, representing the number of elements from this dataset from which the new dataset will sample. |
count |
(Optional.) An integer value representing the number of times
the elements of this dataset should be repeated. The default behavior (if
|
seed |
(Optional) An integer, representing the random seed that will be used to create the distribution. |
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_skip(), dataset_take(), dataset_take_while(), dataset_window()
Creates a dataset that skips count elements from this dataset
dataset_skip(dataset, count)
dataset |
A dataset |
count |
An integer, representing the number of elements of this dataset that should be skipped to form the new dataset. If count is greater than the size of this dataset, the new dataset will contain no elements. If count is -1, skips the entire dataset. |
A dataset
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_take(), dataset_take_while(), dataset_window()
Persist the output of a dataset
dataset_snapshot( dataset, path, compression = c("AUTO", "GZIP", "SNAPPY", "None"), reader_func = NULL, shard_func = NULL )
dataset |
A tensorflow dataset |
path |
Required. A directory to use for storing/loading the snapshot to/from. |
compression |
Optional. The type of compression to apply to the snapshot
written to disk. Supported options are |
reader_func |
Optional. A function to control how to read data from snapshot shards. |
shard_func |
Optional. A function to control how to shard data when writing a snapshot. |
The snapshot API allows users to transparently persist the output of their preprocessing pipeline to disk, and materialize the pre-processed data on a different training run.
This API enables repeated preprocessing steps to be consolidated, and allows re-use of already processed data, trading off disk storage and network bandwidth for freeing up more valuable CPU resources and accelerator compute time.
https://github.com/tensorflow/community/blob/master/rfcs/20200107-tf-data-snapshot.md has detailed design documentation of this feature.
Users can specify various options to control the behavior of snapshot, including how snapshots are read from and written to by passing in user-defined functions to the reader_func and shard_func parameters.
shard_func is a user specified function that maps input elements to snapshot shards.
NUM_SHARDS <- parallel::detectCores()
dataset %>%
  dataset_enumerate() %>%
  dataset_snapshot(
    "/path/to/snapshot/dir",
    shard_func = function(index, ds_elem) index %% NUM_SHARDS) %>%
  dataset_map(function(index, ds_elem) ds_elem)
reader_func is a user specified function that accepts a single argument: a Dataset of Datasets, each representing a "split" of elements of the original dataset. The cardinality of the input dataset matches the number of the shards specified in the shard_func. The function should return a Dataset of elements of the original dataset.
Users may want to specify this function to control how snapshot files should be read from disk, including the amount of shuffling and parallelism.
Here is an example of a standard reader function a user can define. This function enables both dataset shuffling and parallel reading of datasets:
user_reader_func <- function(datasets) {
  num_cores <- parallel::detectCores()
  datasets %>%
    dataset_shuffle(num_cores) %>%
    dataset_interleave(function(x) x, num_parallel_calls=AUTOTUNE)
}

dataset <- dataset %>%
  dataset_snapshot("/path/to/snapshot/dir", reader_func = user_reader_func)
By default, snapshot parallelizes reads by the number of cores available on the system, but will not attempt to shuffle the data.
Creates a dataset with at most count elements from this dataset
dataset_take(dataset, count)
dataset |
A dataset |
count |
Integer representing the number of elements of this dataset that
should be taken to form the new dataset. If |
A dataset
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take_while(), dataset_window()
A transformation that stops dataset iteration based on a predicate.
dataset_take_while(dataset, predicate, name = NULL)
dataset |
A TF dataset |
predicate |
A function that maps a nested structure of tensors (having
shapes and types defined by |
name |
(Optional.) A name for the tf.data operation. |
Example usage:
range_dataset(from = 0, to = 10) %>%
  dataset_take_while( ~ .x < 5) %>%
  as_array_iterator() %>%
  iterate(simplify = FALSE) %>%
  str()
#> List of 5
#>  $ : num 0
#>  $ : num 1
#>  $ : num 2
#>  $ : num 3
#>  $ : num 4
A TF Dataset
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_window()
Splits elements of a dataset into multiple elements.
dataset_unbatch(dataset, name = NULL)
dataset |
A dataset |
name |
(Optional.) A name for the tf.data operation. |
Use this transformation to produce a dataset that contains one instance of each unique element in the input (See example).
dataset_unique(dataset, name = NULL)
dataset |
A tf.Dataset. |
name |
(Optional.) A name for the tf.data operation. |
A tf.Dataset
This transformation only supports datasets which fit into memory and have elements of either tf.int32, tf.int64 or tf.string type.
## Not run: 
c(0, 37, 2, 37, 2, 1) %>% as_tensor("int32") %>%
  tensor_slices_dataset() %>%
  dataset_unique() %>%
  as_array_iterator() %>% iterate() %>% sort()
# [1] 0 1 2 37
## End(Not run)
Prepares the dataset to be used directly in a model. The transformed dataset returns tuples (x, y) that can be used directly in Keras.
dataset_use_spec(dataset, spec)
dataset |
A TensorFlow dataset. |
spec |
A feature specification created with |
A TensorFlow dataset.
feature_spec() to initialize the feature specification.
fit.FeatureSpec() to fit the FeatureSpec.
steps for a list of all implemented steps.
Other Feature Spec Functions: feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run: 
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ age) %>%
  step_numeric_column(age)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
Combines input elements into a dataset of windows.
dataset_window(dataset, size, shift = NULL, stride = 1, drop_remainder = FALSE)
dataset |
A dataset |
size |
representing the number of elements of the input dataset to combine into a window. |
shift |
representing the forward shift of the sliding window in each
iteration. Defaults to |
stride |
representing the stride of the input elements in the sliding window. |
drop_remainder |
representing whether a window should be dropped in
case its size is smaller |
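The interaction of size, shift, stride and drop_remainder can be modeled in plain R. This sketch follows the windowing behavior described in the tf.data documentation and assumes that shift falls back to size when it is NULL. window_elements() is a hypothetical helper for illustration, not a tfdatasets function.

```r
# Hypothetical plain-R model of windowing (no TensorFlow needed).
window_elements <- function(elements, size, shift = NULL, stride = 1,
                            drop_remainder = FALSE) {
  # Assumption: without an explicit shift, windows do not overlap.
  if (is.null(shift)) shift <- size
  n <- length(elements)
  windows <- list()
  start <- 1
  while (start <= n) {
    # Positions of the elements that belong to this window.
    positions <- seq(from = start, by = stride, length.out = size)
    positions <- positions[positions <= n]
    # Short windows at the end are kept unless drop_remainder is TRUE.
    if (length(positions) == size || !drop_remainder) {
      windows[[length(windows) + 1]] <- elements[positions]
    }
    start <- start + shift
  }
  windows
}

# Overlapping windows, including the short ones at the end.
sliding <- window_elements(0:6, size = 3, shift = 1)

# Same windows, but only the complete ones.
full_only <- window_elements(0:6, size = 3, shift = 1, drop_remainder = TRUE)

# Every second element inside each window.
strided <- window_elements(0:6, size = 3, shift = 1, stride = 2,
                           drop_remainder = TRUE)
```

In the real transformation each window is itself a dataset rather than a vector, so it is typically flattened or batched before use.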
Other dataset methods: dataset_batch(), dataset_cache(), dataset_collect(), dataset_concatenate(), dataset_decode_delim(), dataset_filter(), dataset_interleave(), dataset_map(), dataset_map_and_batch(), dataset_padded_batch(), dataset_prefetch(), dataset_prefetch_to_device(), dataset_reduce(), dataset_repeat(), dataset_shuffle(), dataset_shuffle_and_repeat(), dataset_skip(), dataset_take(), dataset_take_while()
Specification for reading a record from a text file with delimited values
delim_record_spec(
  example_file,
  delim = ",",
  skip = 0,
  names = NULL,
  types = NULL,
  defaults = NULL
)

csv_record_spec(
  example_file,
  skip = 0,
  names = NULL,
  types = NULL,
  defaults = NULL
)

tsv_record_spec(
  example_file,
  skip = 0,
  names = NULL,
  types = NULL,
  defaults = NULL
)
example_file |
File that provides an example of the records to be read. If you don't explicitly specify names and types (or defaults) then this file will be read to generate default values. |
delim |
Character delimiter to separate fields in a record (defaults to ",") |
skip |
Number of lines to skip before reading data. Note that if
|
names |
Character vector with column names (or If If |
types |
Column types. If Types can be explicitly specified in a character vector as "integer",
"double", and "character" (e.g. Alternatively, you can use a compact string representation where each
character represents one column: c = character, i = integer, d = double
(e.g. |
defaults |
List of default values which are used when data is
missing from a record (e.g. |
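The compact string form of types uses one character per column. As an illustration of that mapping only, the following plain-R sketch expands a compact string into the equivalent character vector. expand_types() is a hypothetical helper, not a tfdatasets function, and the package's own parsing may differ in details such as error handling.

```r
# Hypothetical helper: expand a compact types string into full type names.
expand_types <- function(compact) {
  lookup <- c(c = "character", i = "integer", d = "double")
  # One code per column, in column order.
  codes <- strsplit(compact, "")[[1]]
  unknown <- setdiff(codes, names(lookup))
  if (length(unknown) > 0) {
    stop("Unknown type code(s): ", paste(unknown, collapse = ", "))
  }
  unname(lookup[codes])
}

# Three columns: integer, double, character.
column_types <- expand_types("idc")
```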
Retrieves the Dense Features from a spec.
dense_features(spec)
spec |
A feature specification created with |
A list of feature columns.
Used to initialize a feature columns specification.
feature_spec(dataset, x, y = NULL)
dataset |
A TensorFlow dataset. |
x |
Features to include can use |
y |
(Optional) The response variable. Can also be specified using
a |
After creating the feature_spec object you can add steps using the step functions.
A FeatureSpec object.
fit.FeatureSpec() to fit the FeatureSpec.
dataset_use_spec() to create a tensorflow dataset prepared for modeling.
steps for a list of all implemented steps.
Other Feature Spec Functions: dataset_use_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run: 
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ .)

# select using `tidyselect` helpers
spec <- feature_spec(hearts, x = c(thal, age), y = target)
## End(Not run)
A dataset of all files matching a pattern
file_list_dataset(file_pattern, shuffle = NULL, seed = NULL)
file_pattern |
A string, representing the filename pattern that will be matched. |
shuffle |
(Optional) If |
seed |
(Optional) An integer, representing the random seed that will be used to create the distribution. |
For example, if we had the following files on our filesystem:
/path/to/dir/a.txt
/path/to/dir/b.csv
/path/to/dir/c.csv
If we pass "/path/to/dir/*.csv" as the file_pattern, the dataset would produce:
/path/to/dir/b.csv
/path/to/dir/c.csv
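The same matching can be tried locally with base R's Sys.glob(), which is used here only as a stand-in to show which files a glob pattern selects. It is an assumption of this sketch that its wildcard rules agree with TensorFlow's for a simple pattern like "*.csv". The temporary directory and file names are made up for the illustration.

```r
# Create a throwaway directory holding the three example files.
demo_dir <- tempfile("glob-demo-")
dir.create(demo_dir)
created <- file.create(file.path(demo_dir, c("a.txt", "b.csv", "c.csv")))

# Match only the CSV files, as the "*.csv" pattern in the text would.
matched <- basename(Sys.glob(file.path(demo_dir, "*.csv")))

# Clean up the temporary files.
unlink(demo_dir, recursive = TRUE)
```

Here matched holds the two CSV file names and a.txt is excluded.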
A dataset of strings corresponding to file names.
The shuffle and seed arguments only apply for TensorFlow >= v1.8.
This function will fit the specification. Depending on the steps added to the specification it will compute, for example, the levels of categorical features, normalization constants, etc.
## S3 method for class 'FeatureSpec'
fit(object, dataset = NULL, ...)
object |
A feature specification created with |
dataset |
(Optional) A TensorFlow dataset. If |
... |
(unused) |
A fitted FeatureSpec object.
feature_spec() to initialize the feature specification.
dataset_use_spec() to create a tensorflow dataset prepared for modeling.
steps for a list of all implemented steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run: 
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ age) %>%
  step_numeric_column(age)

spec_fit <- fit(spec)
spec_fit
## End(Not run)
A dataset of fixed-length records from one or more binary files.
fixed_length_record_dataset( filenames, record_bytes, header_bytes = NULL, footer_bytes = NULL, buffer_size = NULL )
filenames |
A string tensor containing one or more filenames. |
record_bytes |
An integer representing the number of bytes in each record. |
header_bytes |
(Optional) An integer scalar representing the number of bytes to skip at the start of a file. |
footer_bytes |
(Optional) An integer scalar representing the number of bytes to ignore at the end of a file. |
buffer_size |
(Optional) An integer scalar representing the number of bytes to buffer when reading. |
A dataset
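How record_bytes, header_bytes and footer_bytes carve a file into records can be modeled on an in-memory raw vector. This is a plain-R sketch of the layout only: split_fixed_records() is a hypothetical helper, not a tfdatasets function, and it silently ignores a trailing partial record, which the real reader may treat differently.

```r
# Hypothetical plain-R model of fixed-length record splitting.
split_fixed_records <- function(bytes, record_bytes,
                                header_bytes = 0, footer_bytes = 0) {
  payload_len <- length(bytes) - header_bytes - footer_bytes
  stopifnot(payload_len >= 0, record_bytes > 0)
  # Number of complete records between header and footer.
  n_records <- payload_len %/% record_bytes
  lapply(seq_len(n_records), function(i) {
    first <- header_bytes + (i - 1) * record_bytes + 1
    bytes[first:(first + record_bytes - 1)]
  })
}

# One header byte, two 3-byte records, one footer byte.
file_bytes <- as.raw(c(0xFF, 1:6, 0xEE))
records <- split_fixed_records(file_bytes, record_bytes = 3,
                               header_bytes = 1, footer_bytes = 1)
```

Each element of records is a raw vector of length record_bytes. With a real dataset the records arrive as string tensors that still need to be decoded.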
Can only be used inside the steps specifications to find variables by type.
has_type(match = "float32")
match |
A list of types to match. |
Other Selectors: all_nominal(), all_numeric()
Heart disease (angiographic disease status) dataset.
hearts
A data frame with 303 rows and 14 variables:
age in years
sex (1 = male; 0 = female)
chest pain type: Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic
resting blood pressure (in mm Hg on admission to the hospital)
serum cholesterol in mg/dl
(fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
resting electrocardiographic results: Value 0: normal, Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
maximum heart rate achieved
exercise induced angina (1 = yes; 0 = no)
ST depression induced by exercise relative to rest
the slope of the peak exercise ST segment: Value 1: upsloping, Value 2: flat, Value 3: downsloping
number of major vessels (0-3) colored by fluoroscopy
3 = normal; 6 = fixed defect; 7 = reversible defect
diagnosis of heart disease angiographic
https://archive.ics.uci.edu/ml/datasets/heart+Disease
The authors of the databases have requested that any publications resulting from the use of the data include the names of the principal investigator responsible for the data collection at each institution. They would be:
Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Construct a tfestimators input function from a dataset
input_fn.tf_dataset(dataset, features, response = NULL)
dataset |
A dataset |
features |
The names of feature variables to be used. |
response |
The name of the response variable. |
Creating an input_fn from a dataset requires that the dataset consist of a set of named output tensors (e.g. like the dataset produced by the tfrecord_dataset() or text_line_dataset() function).
An input_fn suitable for use with tfestimators train, evaluate, and predict methods
Returns a nested list of tensors that when evaluated will yield the next element(s) in the dataset.
iterator_get_next(iterator, name = NULL)
iterator |
An iterator |
name |
(Optional) A name for the created operation. |
A nested list of tensors
Other iterator functions: iterator_initializer(), iterator_make_initializer(), iterator_string_handle(), make-iterator
An operation that should be run to initialize this iterator.
iterator_initializer(iterator)
iterator |
An iterator |
Other iterator functions: iterator_get_next(), iterator_make_initializer(), iterator_string_handle(), make-iterator
Create an operation that can be run to initialize this iterator
iterator_make_initializer(iterator, dataset, name = NULL)
iterator |
An iterator |
dataset |
A dataset |
name |
(Optional) A name for the created operation. |
A tf$Operation that can be run to initialize this iterator on the given dataset.
Other iterator functions: iterator_get_next(), iterator_initializer(), iterator_string_handle(), make-iterator
String-valued tensor that represents this iterator
iterator_string_handle(iterator, name = NULL)
iterator |
An iterator |
name |
(Optional) A name for the created operation. |
Scalar tensor of type string
Other iterator functions: iterator_get_next(), iterator_initializer(), iterator_make_initializer(), make-iterator
DEPRECATED: Use keras3::layer_feature_space() instead.
layer_input_from_dataset(dataset)
dataset |
a TensorFlow dataset or a data.frame |
Create a list of Keras input layers that can be used together with keras::layer_dense_features().
a list of Keras input layers
## Not run: 
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ age + slope) %>%
  step_numeric_column(age, slope) %>%
  step_bucketized_column(age, boundaries = c(10, 20, 30))

spec <- fit(spec)
dataset <- hearts %>% dataset_use_spec(spec)

input <- layer_input_from_dataset(dataset)
## End(Not run)
Returns the length of the dataset.
## S3 method for class 'tf_dataset'
length(x)

## S3 method for class 'tensorflow.python.data.ops.dataset_ops.DatasetV2'
length(x)
x |
a |
Either Inf if the dataset is infinite, NA if the dataset length is unknown, or an R numeric if it is known.
## Not run: 
range_dataset(0, 42) %>% length()
# 42

range_dataset(0, 42) %>% dataset_repeat() %>% length()
# Inf

range_dataset(0, 42) %>% dataset_repeat() %>%
  dataset_filter(function(x) TRUE) %>% length()
# NA
## End(Not run)
Reads CSV files into a dataset, where each element is a (features, labels) list that corresponds to a batch of CSV rows. The features dictionary maps feature column names to tensors containing the corresponding feature data, and labels is a tensor containing the batch's label data.
make_csv_dataset( file_pattern, batch_size, column_names = NULL, column_defaults = NULL, label_name = NULL, select_columns = NULL, field_delim = ",", use_quote_delim = TRUE, na_value = "", header = TRUE, num_epochs = NULL, shuffle = TRUE, shuffle_buffer_size = 10000, shuffle_seed = NULL, prefetch_buffer_size = 1, num_parallel_reads = 1, num_parallel_parser_calls = 2, sloppy = FALSE, num_rows_for_inference = 100 )
file_pattern |
List of files or glob patterns of file paths containing CSV records. |
batch_size |
An integer representing the number of records to combine in a single batch. |
column_names |
An optional list of strings that corresponds to the CSV columns, in order. One per column of the input record. If this is not provided, infers the column names from the first row of the records. These names will be the keys of the features dict of each dataset element. |
column_defaults |
An optional list of default values for the CSV fields. One item
per selected column of the input record. Each item in the list is either a valid CSV
dtype (integer, numeric, or string), or a tensor with one of the
aforementioned types. The tensor can either be a scalar default value (if the column
is optional), or an empty tensor (if the column is required). If a dtype is provided
instead of a tensor, the column is also treated as required. If this list is not
provided, tries to infer types based on reading the first |
label_name |
An optional string corresponding to the label column. If provided, the data for this column is returned as a separate tensor from the features dictionary, so that the dataset complies with the format expected by TF Estimators and Keras. |
select_columns |
(Ignored if using TensorFlow version 1.8.) An optional list of
integer indices or string column names, that specifies a subset of columns of CSV data
to select. If column names are provided, these must correspond to names provided in
|
field_delim |
An optional string. Defaults to |
use_quote_delim |
An optional bool. Defaults to |
na_value |
Additional string to recognize as NA/NaN. |
header |
A bool that indicates whether the first rows of provided CSV files correspond to header lines with column names, and should not be included in the data. |
num_epochs |
An integer specifying the number of times this dataset is repeated. If NULL, cycles through the dataset forever. |
shuffle |
A bool that indicates whether the input should be shuffled. |
shuffle_buffer_size |
Buffer size to use for shuffling. A large buffer size ensures better shuffling, but increases memory usage and startup time. |
shuffle_seed |
Randomization seed to use for shuffling. |
prefetch_buffer_size |
An int specifying the number of feature batches to prefetch for performance improvement. Recommended value is the number of batches consumed per training step. |
num_parallel_reads |
Number of threads used to read CSV records from files. If >1, the results will be interleaved. |
num_parallel_parser_calls |
(Ignored if using TensorFlow version 1.11 or later.) Number of parallel invocations of the CSV parsing function on CSV records. |
sloppy |
If |
num_rows_for_inference |
Number of rows of a file to use for type inference if
record_defaults is not provided. If |
A dataset, where each element is a (features, labels) list that corresponds to a batch of batch_size CSV rows. The features dictionary maps feature column names to tensors containing the corresponding column data, and labels is a tensor containing the column data for the label column specified by label_name.
Creates an iterator for enumerating the elements of this dataset.
make_iterator_one_shot(dataset)
make_iterator_initializable(dataset, shared_name = NULL)
make_iterator_from_structure(output_types, output_shapes = NULL, shared_name = NULL)
make_iterator_from_string_handle(string_handle, output_types, output_shapes = NULL)
dataset |
A dataset |
shared_name |
(Optional) If non-empty, the returned iterator will be shared under the given name across multiple sessions that share the same devices (e.g. when using a remote server). |
output_types |
A nested structure of tf$DType objects corresponding to each component of an element of this iterator. |
output_shapes |
(Optional) A nested structure of tf$TensorShape objects corresponding to each component of an element of this dataset. If omitted, each component will have an unconstrained shape. |
string_handle |
A scalar tensor of type string that evaluates
to a handle produced by the |
An Iterator over the elements of this dataset.
For make_iterator_one_shot(), the returned iterator will be initialized automatically. A "one-shot" iterator does not currently support re-initialization.
For make_iterator_initializable(), the returned iterator will be in an uninitialized state, and you must run the object returned from iterator_initializer() before using it.
For make_iterator_from_structure(), the returned iterator is not bound to a particular dataset, and it has no initializer. To initialize the iterator, run the operation returned by iterator_make_initializer().
Other iterator functions: iterator_get_next(), iterator_initializer(), iterator_make_initializer(), iterator_string_handle()
Tensor(s) for retrieving the next batch from a dataset
next_batch(dataset)
dataset |
A dataset |
To access the underlying data within the dataset you iteratively evaluate the tensor(s) to read batches of data.
Note that in many cases you won't need to explicitly evaluate the tensors. Rather, you will pass the tensors to another function that will perform the evaluation (e.g. the Keras layer_input() and compile() functions).
If you do need to perform iteration manually by evaluating the tensors, there are a couple of possible approaches to controlling/detecting when iteration should end.
One approach is to create a dataset that yields batches infinitely (traversing the dataset multiple times with different batches randomly drawn). In this case you'd use another mechanism like a global step counter or detecting a learning plateau.
Another approach is to detect when all batches have been yielded from the dataset. When the tensor reaches the end of iteration a runtime error will occur. You can catch and ignore the error when it occurs by wrapping your iteration code in the with_dataset() function.
See the examples below for a demonstration of each of these methods of iteration.
Tensor(s) that can be evaluated to yield the next batch of training data.
## Not run:
# iteration with 'infinite' dataset and explicit step counter
library(tfdatasets)
dataset <- text_line_dataset("mtcars.csv", record_spec = mtcars_spec) %>%
  dataset_prepare(x = c(mpg, disp), y = cyl) %>%
  dataset_shuffle(5000) %>%
  dataset_batch(128) %>%
  dataset_repeat() # repeat infinitely
batch <- next_batch(dataset)
steps <- 200
for (i in 1:steps) {
  # use batch$x and batch$y tensors
}

# iteration that detects and ignores end of iteration error
library(tfdatasets)
dataset <- text_line_dataset("mtcars.csv", record_spec = mtcars_spec) %>%
  dataset_prepare(x = c(mpg, disp), y = cyl) %>%
  dataset_batch(128) %>%
  dataset_repeat(10)
batch <- next_batch(dataset)
with_dataset({
  while(TRUE) {
    # use batch$x and batch$y tensors
  }
})
## End(Not run)
Output types and shapes
output_types(object)
output_shapes(object)
object |
A dataset or iterator |
output_types() returns the type of each component of an element of this object; output_shapes() returns the shape of each component of an element of this object.
Creates a Dataset of pseudorandom values.
random_integer_dataset(seed = NULL)
seed |
(Optional) If specified, the dataset produces a deterministic sequence of values. |
The dataset generates a sequence of uniformly distributed integer values (dtype int64).
Creates a dataset of a step-separated range of values.
range_dataset(from = 0, to = 0, by = 1, ..., dtype = tf$int64)
from |
Range start |
to |
Range end (exclusive) |
by |
Increment of the sequence |
... |
ignored |
dtype |
Output dtype. (Optional, default: |
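The meaning of from, to (exclusive) and by can be checked with plain R arithmetic. The sketch below is an illustration only: range_values() is a hypothetical helper that lists the values such a range would contain, and it does not create a TensorFlow dataset.

```r
# Hypothetical helper mirroring the from/to/by arguments; 'to' is exclusive.
range_values <- function(from = 0, to = 0, by = 1) {
  stopifnot(by != 0)
  n <- max(0, ceiling((to - from) / by))   # number of values before reaching 'to'
  from + by * (seq_len(n) - 1)
}

up    <- range_values(0, 5)          # 0 1 2 3 4
steps <- range_values(0, 10, by = 3) # 0 3 6 9
empty <- range_values()              # defaults give an empty range
```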
Read files into a dataset, optionally processing them in parallel.
read_files( files, reader, ..., parallel_files = 1, parallel_interleave = 1, num_shards = NULL, shard_index = NULL )
files |
List of filenames or glob pattern for files (e.g. "*.csv") |
reader |
Function that maps a file into a dataset (e.g.
|
... |
Additional arguments to pass to |
parallel_files |
An integer, number of files to process in parallel |
parallel_interleave |
An integer, number of consecutive records to produce from each file before cycling to another file. |
num_shards |
An integer representing the number of shards operating in parallel. |
shard_index |
An integer, representing the worker index. Shard indexes are 0-based, so for e.g. 8 shards valid indexes would be 0-7. |
A dataset
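To make the relationship between num_shards and shard_index concrete, here is a small base R sketch. It assumes files are assigned to shards round-robin by position, which is how a shard operation typically behaves; files_for_shard() and the file names are hypothetical and only serve the illustration.

```r
# Hypothetical helper: which files a worker would read under round-robin sharding.
files_for_shard <- function(files, num_shards, shard_index) {
  stopifnot(shard_index >= 0, shard_index < num_shards)  # indexes are 0-based
  position <- seq_along(files) - 1                       # zero-based file position
  files[position %% num_shards == shard_index]
}

files  <- sprintf("part-%02d.csv", 0:9)
shard0 <- files_for_shard(files, num_shards = 8, shard_index = 0)
shard7 <- files_for_shard(files, num_shards = 8, shard_index = 7)
```

Every file lands in exactly one shard, so workers with distinct shard_index values read disjoint subsets.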
Samples elements at random from the datasets in datasets.
sample_from_datasets( datasets, weights = NULL, seed = NULL, stop_on_empty_dataset = TRUE )
datasets |
A list of objects with compatible structure. |
weights |
(Optional.) A list of |
seed |
(Optional.) An integer, representing the random seed that will be used to create the distribution. |
stop_on_empty_dataset |
If |
A dataset that interleaves elements from datasets at random, according to weights if provided, otherwise with uniform probability.
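The weighted interleaving can be mimicked in base R. The sketch below is an analogue under stated assumptions, not the TensorFlow implementation: sample_from_lists() is a hypothetical helper, sources are plain vectors, and it stops as soon as a selected source is exhausted.

```r
# Hypothetical base R analogue: draw elements from several sources at random.
sample_from_lists <- function(sources, weights = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  if (is.null(weights)) weights <- rep(1, length(sources))  # uniform by default
  position <- rep(1L, length(sources))      # next unread element of each source
  out <- list()
  repeat {
    i <- sample.int(length(sources), 1, prob = weights)
    if (position[i] > length(sources[[i]])) break  # chosen source is empty: stop
    out[[length(out) + 1]] <- sources[[i]][[position[i]]]
    position[i] <- position[i] + 1L
  }
  unlist(out)
}

only_first <- sample_from_lists(list(1:3, 101:103), weights = c(1, 0))
mixed      <- sample_from_lists(list(1:3, 101:103), seed = 1)
```

With weights c(1, 0) only the first source is ever drawn from, while the default draws from both with equal probability.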
scaler_standard: mean and standard deviation normalizer.
scaler_min_max: min max normalizer
This scaler will learn the min and max of the numeric variable and use this to create a normalizer_fn.
scaler_min_max()
scaler for a complete list of normalizers.
Other scaler: scaler_standard()
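The idea of learning the min and max and then producing a normalizer function can be sketched in base R. This is an illustration under the assumption that values are rescaled to the range 0 to 1; fit_min_max() is a hypothetical helper, not the package's internal code.

```r
# Hypothetical sketch: learn min and max, then return a normalizer function.
fit_min_max <- function(x) {
  lo <- min(x)
  hi <- max(x)
  function(new_x) (new_x - lo) / (hi - lo)   # maps lo to 0 and hi to 1
}

normalize <- fit_min_max(c(29, 41, 54, 77))
scaled    <- normalize(c(29, 53, 77))
```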
This scaler will learn the mean and the standard deviation and use this to create a normalizer_fn.
scaler_standard()
scaler for a complete list of normalizers.
Other scaler: scaler_min_max()
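A comparable base R sketch for standardization follows. It assumes R's sd(), the sample standard deviation; the package may use a different estimator. fit_standard() is a hypothetical helper used only for illustration.

```r
# Hypothetical sketch: learn mean and standard deviation, return a normalizer.
fit_standard <- function(x) {
  mu <- mean(x)
  sigma <- sd(x)                    # sample standard deviation (assumption)
  function(new_x) (new_x - mu) / sigma
}

ages      <- c(29, 41, 54, 63, 77)
normalize <- fit_standard(ages)
scaled    <- normalize(ages)        # centered at 0 with unit spread
```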
List of selectors that can be used to specify variables inside steps.
cur_info_env
An object of class environment of length 0.
Splits each rank-N tf$SparseTensor in this dataset row-wise.
sparse_tensor_slices_dataset(sparse_tensor)
sparse_tensor |
A |
A dataset of rank-(N-1) sparse tensors.
Other tensor datasets: tensor_slices_dataset(), tensors_dataset()
A dataset consisting of the results from a SQL query
sql_record_spec(names, types)
sql_dataset(driver_name, data_source_name, query, record_spec)
sqlite_dataset(filename, query, record_spec)
names |
Names of columns returned from the query |
types |
List of |
driver_name |
String containing the database type. Currently, the only supported value is 'sqlite'. |
data_source_name |
String containing a connection string to connect to the database. |
query |
String containing the SQL query to execute. |
record_spec |
Names and types of database columns |
filename |
Filename for the database |
A dataset
Use this step to create bucketized columns from numeric columns.
step_bucketized_column(spec, ..., boundaries)
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
boundaries |
A sorted list or tuple of floats specifying the boundaries. |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ age) %>%
  step_numeric_column(age) %>%
  step_bucketized_column(age, boundaries = c(10, 20, 30))

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
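What the boundaries argument does to a numeric value can be previewed with base R's findInterval(). This sketch assumes buckets that include their left boundary and exclude their right one, which matches TensorFlow's documented bucketized column behavior; the ages vector is made up for the illustration.

```r
# Preview of bucket ids for boundaries c(10, 20, 30) using base R only.
boundaries <- c(10, 20, 30)
ages       <- c(5, 10, 25, 30, 99)

# Buckets: (-Inf, 10) -> 0, [10, 20) -> 1, [20, 30) -> 2, [30, Inf) -> 3
bucket_id <- findInterval(ages, boundaries)
```

Three boundaries produce four buckets, so a downstream indicator or embedding step would see four distinct ids.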
Represents sparse feature where ids are set by hashing.
step_categorical_column_with_hash_bucket( spec, ..., hash_bucket_size, dtype = tf$string )
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
hash_bucket_size |
An int > 1. The number of buckets. |
dtype |
The type of features. Only string and integer types are supported. |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_hash_bucket(thal, hash_bucket_size = 3)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
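Setting ids by hashing means each value is mapped to an integer in the range 0 to hash_bucket_size - 1, with collisions possible. The base R sketch below uses a toy polynomial hash purely to show the mechanics; TensorFlow uses its own hash function, so the actual ids will differ. toy_hash_bucket() is a hypothetical helper.

```r
# Toy hash for illustration only; TensorFlow's real hash function differs.
toy_hash_bucket <- function(x, hash_bucket_size) {
  vapply(x, function(value) {
    h <- 0
    for (code in utf8ToInt(value)) {
      h <- (h * 31 + code) %% hash_bucket_size   # keep the running hash in range
    }
    as.integer(h)
  }, integer(1), USE.NAMES = FALSE)
}

thal <- c("fixed", "normal", "reversible", "fixed")
ids  <- toy_hash_bucket(thal, hash_bucket_size = 3)
```

Equal strings always receive the same id, and no vocabulary has to be known in advance, at the cost of possible collisions between different values.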
Use this when your inputs are integers in the range [0, num_buckets).
step_categorical_column_with_identity( spec, ..., num_buckets, default_value = NULL )
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
num_buckets |
Range of inputs and outputs is |
default_value |
If |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
hearts$thal <- as.integer(as.factor(hearts$thal)) - 1L
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_identity(thal, num_buckets = 5)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
Use this function when the vocabulary of a categorical variable is written to a file.
step_categorical_column_with_vocabulary_file( spec, ..., vocabulary_file, vocabulary_size = NULL, dtype = tf$string, default_value = NULL, num_oov_buckets = 0L )
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
vocabulary_file |
The vocabulary file name. |
vocabulary_size |
Number of the elements in the vocabulary. This
must be no greater than length of |
dtype |
The type of features. Only string and integer types are supported. |
default_value |
The integer ID value to return for out-of-vocabulary
feature values, defaults to |
num_oov_buckets |
Non-negative integer, the number of out-of-vocabulary
buckets. All out-of-vocabulary inputs will be assigned IDs in the range
|
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_vocabulary_file(thal, vocabulary_file = file)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
Creates a categorical column specification
step_categorical_column_with_vocabulary_list( spec, ..., vocabulary_list = NULL, dtype = NULL, default_value = -1L, num_oov_buckets = 0L )
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
vocabulary_list |
An ordered iterable defining the vocabulary. Each
feature is mapped to the index of its value (if present) in vocabulary_list.
Must be castable to |
dtype |
The type of features. Only string and integer types are supported.
If |
default_value |
The integer ID value to return for out-of-vocabulary feature
values, defaults to |
num_oov_buckets |
Non-negative integer, the number of out-of-vocabulary buckets.
All out-of-vocabulary inputs will be assigned IDs in the range
|
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_vocabulary_list(thal)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
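The mapping from a value to the index of that value in vocabulary_list, with default_value returned for unknown values, can be sketched in base R. vocabulary_lookup() is a hypothetical helper; it assumes zero-based ids and covers only the default_value case, not num_oov_buckets.

```r
# Hypothetical sketch of a vocabulary lookup with a default for unknown values.
vocabulary_lookup <- function(x, vocabulary_list, default_value = -1L) {
  id <- match(x, vocabulary_list) - 1L   # zero-based index of each value
  id[is.na(id)] <- default_value         # out-of-vocabulary values
  id
}

vocabulary <- c("fixed", "normal", "reversible")
ids <- vocabulary_lookup(c("normal", "unknown", "fixed"), vocabulary)
```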
Use this step to create crosses between categorical columns.
step_crossed_column(spec, ..., hash_bucket_size, hash_key = NULL)
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
hash_bucket_size |
An int > 1. The number of buckets. |
hash_key |
(optional) Specify the hash_key that will be used by the FingerprintCat64 function to combine the crosses fingerprints on SparseCrossOp. |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ age) %>%
  step_numeric_column(age) %>%
  step_bucketized_column(age, boundaries = c(10, 20, 30))

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
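Conceptually, a cross combines the values of several categorical columns into one feature and hashes the combination into hash_bucket_size buckets. The base R sketch below shows that idea with a toy hash; it is not the FingerprintCat64-based procedure TensorFlow uses, and toy_cross() and the input vectors are hypothetical.

```r
# Toy illustration of crossing two categorical vectors and hashing the pairs.
toy_cross <- function(a, b, hash_bucket_size) {
  crossed <- paste(a, b, sep = "_X_")          # one string per row-wise pair
  vapply(crossed, function(value) {
    h <- 0
    for (code in utf8ToInt(value)) {
      h <- (h * 31 + code) %% hash_bucket_size # toy hash, not TensorFlow's
    }
    as.integer(h)
  }, integer(1), USE.NAMES = FALSE)
}

group <- c("a", "b", "a")                      # made-up categorical values
thal  <- c("fixed", "normal", "fixed")
cross_id <- toy_cross(group, thal, hash_bucket_size = 10)
```

Rows with the same pair of values share an id, which lets a model learn a weight for the combination rather than for each column separately.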
Use this step to create embedding columns from categorical columns.
step_embedding_column( spec, ..., dimension = function(x) { as.integer(x^0.25) }, combiner = "mean", initializer = NULL, ckpt_to_load_from = NULL, tensor_name_in_ckpt = NULL, max_norm = NULL, trainable = TRUE )
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
dimension |
An integer specifying dimension of the embedding, must be > 0. Can also be a function of the size of the vocabulary. |
combiner |
A string specifying how to reduce if there are multiple entries in
a single row. Currently 'mean', 'sqrtn' and 'sum' are supported, with 'mean' the
default. 'sqrtn' often achieves good accuracy, in particular with bag-of-words
columns. Each of these can be thought of as an example-level normalization on
the column. For more information, see |
initializer |
A variable initializer function to be used in embedding
variable initialization. If not specified, defaults to
|
ckpt_to_load_from |
String representing checkpoint name/pattern from
which to restore column weights. Required if |
tensor_name_in_ckpt |
Name of the Tensor in ckpt_to_load_from from which to
restore the column weights. Required if |
max_norm |
If not |
trainable |
Whether or not the embedding is trainable. Default is |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_vocabulary_list(thal) %>%
  step_embedding_column(thal, dimension = 3)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
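Two parts of this step lend themselves to a quick base R check: the default dimension function from the usage above, and the three combiner options. The sketch assumes unweighted entries, for which 'sum' adds the embedding rows, 'mean' divides the sum by the number of rows and 'sqrtn' divides it by the square root of that number; combine() is a hypothetical helper.

```r
# Default dimension rule from the usage: fourth root of the vocabulary size.
default_dimension <- function(x) as.integer(x^0.25)
dims <- vapply(c(5, 100, 1000), default_dimension, integer(1))

# Hypothetical sketch of the combiners for several embedding rows in one example.
combine <- function(embeddings, combiner = c("mean", "sqrtn", "sum")) {
  combiner <- match.arg(combiner)
  total <- colSums(embeddings)
  n <- nrow(embeddings)
  switch(combiner,
         sum   = total,
         mean  = total / n,
         sqrtn = total / sqrt(n))
}

rows <- rbind(c(1, 2), c(3, 4), c(5, 6), c(7, 8))  # four 2-dimensional embeddings
by_sum   <- combine(rows, "sum")
by_mean  <- combine(rows, "mean")
by_sqrtn <- combine(rows, "sqrtn")
```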
Use this step to create indicator columns from categorical columns.
step_indicator_column(spec, ...)
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
file <- tempfile()
writeLines(unique(hearts$thal), file)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ thal) %>%
  step_categorical_column_with_vocabulary_list(thal) %>%
  step_indicator_column(thal)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
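An indicator column turns categorical ids into a 0/1 encoding. For the simple case of one id per row this is a one-hot matrix, which the following base R sketch builds; one_hot() is a hypothetical helper and assumes zero-based ids such as those produced by a vocabulary step.

```r
# Hypothetical one-hot encoder for zero-based categorical ids.
one_hot <- function(ids, depth) {
  out <- matrix(0, nrow = length(ids), ncol = depth)
  out[cbind(seq_along(ids), ids + 1L)] <- 1   # shift to R's one-based columns
  out
}

encoded <- one_hot(c(0L, 2L, 1L), depth = 3)
```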
step_numeric_column creates a numeric column specification. It can also be used to normalize numeric columns.
step_numeric_column( spec, ..., shape = 1L, default_value = NULL, dtype = tf$float32, normalizer_fn = NULL )
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
shape |
An iterable of integers that specifies the shape of the Tensor. A single integer can be given,
which means a single-dimension Tensor with the given width. The Tensor representing the column will
have the shape of |
default_value |
A single value compatible with |
dtype |
defines the type of values. Default value is |
normalizer_fn |
If not |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_remove_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ age) %>%
  step_numeric_column(age, normalizer_fn = scaler_standard())

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
Removes features of the feature specification.
step_remove_column(spec, ...)
spec |
A feature specification created with |
... |
Comma separated list of variable names to apply the step. selectors can also be used. |
A FeatureSpec object.
steps for a complete list of allowed steps.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_shared_embeddings_column(), steps
## Not run:
library(tfdatasets)
data(hearts)
hearts <- tensor_slices_dataset(hearts) %>% dataset_batch(32)

# use the formula interface
spec <- feature_spec(hearts, target ~ age) %>%
  step_numeric_column(age, normalizer_fn = scaler_standard()) %>%
  step_bucketized_column(age, boundaries = c(20, 50)) %>%
  step_remove_column(age)

spec_fit <- fit(spec)
final_dataset <- hearts %>% dataset_use_spec(spec_fit)
## End(Not run)
List of steps that can be used to specify columns in the feature_spec interface.
step_numeric_column() to define numeric columns.
step_categorical_column_with_vocabulary_list() to define categorical columns.
step_categorical_column_with_hash_bucket() to define categorical columns where ids are set by hashing.
step_categorical_column_with_identity() to define categorical columns represented by integers in the range [0, num_buckets).
step_categorical_column_with_vocabulary_file() to define categorical columns when their vocabulary is available in a file.
step_indicator_column() to create indicator columns from categorical columns.
step_embedding_column() to create embedding columns from categorical columns.
step_bucketized_column() to create bucketized columns from numeric columns.
step_crossed_column() to perform crosses of categorical columns.
step_shared_embeddings_column() to share embeddings between a list of categorical columns.
step_remove_column() to remove columns from the specification.
selectors for a list of selectors that can be used to specify variables.
Other Feature Spec Functions: dataset_use_spec(), feature_spec(), fit.FeatureSpec(), step_bucketized_column(), step_categorical_column_with_hash_bucket(), step_categorical_column_with_identity(), step_categorical_column_with_vocabulary_file(), step_categorical_column_with_vocabulary_list(), step_crossed_column(), step_embedding_column(), step_indicator_column(), step_numeric_column(), step_remove_column(), step_shared_embeddings_column()
Creates a dataset whose elements are slices of the given tensors.
tensor_slices_dataset(tensors)
tensors |
A nested structure of tensors, each having the same size in the first dimension. |
A dataset.
Other tensor datasets: sparse_tensor_slices_dataset(), tensors_dataset()
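The difference between slicing tensors along the first dimension and wrapping them as a single element can be pictured with a plain R matrix. This is only an analogy in base R, with lists standing in for datasets; no TensorFlow objects are created.

```r
# Base R analogy: a 3 x 2 matrix viewed as slices versus as one element.
x <- matrix(1:6, nrow = 3)   # rows are (1, 4), (2, 5), (3, 6)

# Like tensor_slices_dataset(): one element per row of the first dimension.
slices <- lapply(seq_len(nrow(x)), function(i) x[i, ])

# Like tensors_dataset(): a single element holding the whole matrix.
single <- list(x)
```

Here slices has three elements of length 2, while single has one element of dimension 3 x 2.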
Creates a dataset with a single element, comprising the given tensors.
tensors_dataset(tensors)
tensors |
A nested structure of tensors. |
A dataset.
Other tensor datasets: sparse_tensor_slices_dataset(), tensor_slices_dataset()
A dataset comprising lines from one or more text files.
text_line_dataset( filenames, compression_type = NULL, record_spec = NULL, parallel_records = NULL )
filenames |
String(s) specifying one or more filenames |
compression_type |
A string, one of: |
record_spec |
(Optional) Specification used to decode delimited text lines
into records (see |
parallel_records |
(Optional) An integer, representing the number of records to decode in parallel. If not specified, records will be processed sequentially. |
A dataset
A dataset comprising records from one or more TFRecord files.
tfrecord_dataset( filenames, compression_type = NULL, buffer_size = NULL, num_parallel_reads = NULL )
filenames |
String(s) specifying one or more filenames |
compression_type |
A string, one of: |
buffer_size |
An integer representing the number of bytes in the read buffer. (0 means no buffering). |
num_parallel_reads |
An integer representing the number of files to read in parallel. Defaults to reading files sequentially. |
If the dataset encodes a set of TFExample instances, then they can be decoded into named records using the dataset_map() function (see example below).
## Not run:
# Creates a dataset that reads all of the examples from two files, and extracts
# the image and label features.
filenames <- c("/var/data/file1.tfrecord", "/var/data/file2.tfrecord")
dataset <- tfrecord_dataset(filenames) %>%
  dataset_map(function(example_proto) {
    features <- list(
      image = tf$FixedLenFeature(shape(), tf$string, default_value = ""),
      label = tf$FixedLenFeature(shape(), tf$int32, default_value = 0L)
    )
    tf$parse_single_example(example_proto, features)
  })
## End(Not run)
Execute code that traverses a dataset until an out of range condition occurs
until_out_of_range(expr)
out_of_range_handler(e)
expr |
Expression to execute (will be executed multiple times until the condition occurs) |
e |
Error object |
When a dataset iterator reaches the end, an out of range runtime error will occur. This function will catch and ignore the error when it occurs.
## Not run:
library(tfdatasets)

# `mtcars_spec` (a record spec for the file) and `sess` (a TensorFlow session)
# are assumed to have been created beforehand.
dataset <- text_line_dataset("mtcars.csv", record_spec = mtcars_spec) %>%
  dataset_batch(128) %>%
  dataset_repeat(10) %>%
  dataset_prepare(x = c(mpg, disp), y = cyl)

iter <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iter)

until_out_of_range({
  batch <- sess$run(next_batch)
  # use batch$x and batch$y tensors
})
## End(Not run)
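Under eager execution, the default in TensorFlow 2, no session is required. The sketch below assumes that iterator_get_next() can be called directly inside until_out_of_range() in that mode. The three-element dataset and the `state` counter are hypothetical and serve only to show that the expression is repeated until the iterator is exhausted.

```r
library(tfdatasets)

# A small in-memory dataset with three elements (illustrative values).
dataset <- tensor_slices_dataset(c(1, 2, 3))
iter <- make_iterator_one_shot(dataset)

# Hypothetical counter, kept in an environment so that updates made inside
# the repeated expression remain visible afterwards.
state <- new.env()
state$batches <- 0L

until_out_of_range({
  # Signals an out of range error once every element has been consumed.
  batch <- iterator_get_next(iter)
  state$batches <- state$batches + 1L
})
```

The counter is incremented only after a successful fetch, so it should end up equal to the number of elements in the dataset.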
Execute code that traverses a dataset
with_dataset(expr)
expr |
Expression to execute |
When a dataset iterator reaches the end, an out of range runtime error will occur. You can catch and ignore the error when it occurs by wrapping your iteration code in a call to with_dataset() (see the example below for an illustration).
## Not run:
library(tfdatasets)

# `mtcars_spec` (a record spec for the file) and `sess` (a TensorFlow session)
# are assumed to have been created beforehand.
dataset <- text_line_dataset("mtcars.csv", record_spec = mtcars_spec) %>%
  dataset_prepare(x = c(mpg, disp), y = cyl) %>%
  dataset_batch(128) %>%
  dataset_repeat(10)

iter <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iter)

with_dataset({
  while (TRUE) {
    batch <- sess$run(next_batch)
    # use batch$x and batch$y tensors
  }
})
## End(Not run)
Merges datasets together into pairs or tuples that contain an element from each dataset.
zip_datasets(...)
... |
Datasets to zip (or a single argument with a list or list of lists of datasets). |
A dataset
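A minimal sketch of pairing two datasets element by element, assuming two small in-memory datasets whose names and values are illustrative. Each element of the zipped dataset is a tuple holding one value from each input.

```r
library(tfdatasets)

# Two datasets of equal length (illustrative values).
features <- tensor_slices_dataset(c(1, 2, 3))
labels <- tensor_slices_dataset(c(10, 20, 30))

# Combine them so that each element pairs a feature with its label.
pairs <- zip_datasets(features, labels)

# Drain the zipped dataset into an R list of pairs.
elements <- reticulate::iterate(as_array_iterator(pairs))
n_pairs <- length(elements)
pair_size <- length(elements[[1]])
```

Here `n_pairs` counts the zipped elements and `pair_size` counts the components within the first one.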