Package 'tfprobability'

Title: Interface to 'TensorFlow Probability'
Description: Interface to 'TensorFlow Probability', a 'Python' library built on 'TensorFlow' that makes it easy to combine probabilistic models and deep learning on modern hardware ('TPU', 'GPU'). 'TensorFlow Probability' includes a wide selection of probability distributions and bijectors, probabilistic layers, variational inference, Markov chain Monte Carlo, and optimizers such as Nelder-Mead, BFGS, and SGLD.
Authors: Tomasz Kalinowski [ctb, cre], Sigrid Keydana [aut], Daniel Falbel [ctb], Kevin Kuo [ctb], RStudio [cph]
Maintainer: Tomasz Kalinowski <[email protected]>
License: Apache License (>= 2.0)
Version: 0.15.1.9000
Built: 2024-10-06 05:50:09 UTC
Source: https://github.com/rstudio/tfprobability

Help Index


GLM families

Description

A list of models that can be used as the model argument in glm_fit():

Details

  • Bernoulli: Bernoulli(probs=mean) where mean = sigmoid(matmul(X, weights))

  • BernoulliNormalCDF: Bernoulli(probs=mean) where mean = Normal(0, 1).cdf(matmul(X, weights))

  • GammaExp: Gamma(concentration=1, rate=1 / mean) where mean = exp(matmul(X, weights))

  • GammaSoftplus: Gamma(concentration=1, rate=1 / mean) where mean = softplus(matmul(X, weights))

  • LogNormal: LogNormal(loc=log(mean) - log(2) / 2, scale=sqrt(log(2))) where mean = exp(matmul(X, weights)).

  • LogNormalSoftplus: LogNormal(loc=log(mean) - log(2) / 2, scale=sqrt(log(2))) where mean = softplus(matmul(X, weights))

  • Normal: Normal(loc=mean, scale=1) where mean = matmul(X, weights).

  • NormalReciprocal: Normal(loc=mean, scale=1) where mean = 1 / matmul(X, weights)

  • Poisson: Poisson(rate=mean) where mean = exp(matmul(X, weights)).

  • PoissonSoftplus: Poisson(rate=mean) where mean = softplus(matmul(X, weights)).
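The mean functions in the list above can be evaluated directly in base R. The following is a minimal sketch, not the package's implementation; sigmoid(), softplus(), and the toy X and weights are assumptions introduced only for illustration.

```r
# Illustrative base-R versions of the inverse link functions named above.
sigmoid  <- function(z) 1 / (1 + exp(-z))
softplus <- function(z) log1p(exp(z))

# Made-up design matrix (3 samples, 2 features) and weights.
X <- matrix(c(1, 1, 1,
              0, 1, 2), ncol = 2)
weights <- c(0, 0.5)

eta <- drop(X %*% weights)               # linear predictor, matmul(X, weights)

mean_bernoulli        <- sigmoid(eta)    # Bernoulli
mean_bernoulli_probit <- pnorm(eta)      # BernoulliNormalCDF
mean_exp              <- exp(eta)        # Poisson, GammaExp, LogNormal
mean_softplus         <- softplus(eta)   # PoissonSoftplus, GammaSoftplus, LogNormalSoftplus

# LogNormal family: loc and scale are chosen so that the distribution's
# mean, exp(loc + scale^2 / 2), equals `mean`.
loc   <- log(mean_exp) - log(2) / 2
scale <- sqrt(log(2))
lognormal_mean <- exp(loc + scale^2 / 2)
```

The last three lines show why the LogNormal entries subtract log(2) / 2 from the location: with scale = sqrt(log(2)), the correction cancels and the distribution mean is exactly mean.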

Value

list of models that can be used as the model argument in glm_fit()

See Also

Other glm_fit: glm_fit.tensorflow.tensor(), glm_fit_one_step.tensorflow.tensor()


Runs multiple Fisher scoring steps

Description

Runs multiple Fisher scoring steps

Usage

glm_fit(x, ...)

Arguments

x

float-like, matrix-shaped Tensor where each row represents a sample's features.

...

other arguments passed to specific methods.

Value

A glm_fit object with parameter estimates, number of iterations, etc.

See Also

glm_fit.tensorflow.tensor()


Runs one Fisher scoring step

Description

Runs one Fisher scoring step

Usage

glm_fit_one_step(x, ...)

Arguments

x

float-like, matrix-shaped Tensor where each row represents a sample's features.

...

other arguments passed to specific methods.

Value

A glm_fit object with parameter estimates, number of iterations, etc.

See Also

glm_fit_one_step.tensorflow.tensor()


Runs one Fisher Scoring step

Description

Runs one Fisher Scoring step

Usage

## S3 method for class 'tensorflow.tensor'
glm_fit_one_step(
  x,
  response,
  model,
  model_coefficients_start = NULL,
  predicted_linear_response_start = NULL,
  l2_regularizer = NULL,
  dispersion = NULL,
  offset = NULL,
  learning_rate = NULL,
  fast_unsafe_numerics = TRUE,
  name = NULL,
  ...
)

Arguments

x

float-like, matrix-shaped Tensor where each row represents a sample's features.

response

vector-shaped Tensor where each element represents a sample's observed response (to the corresponding row of features). Must have same dtype as x.

model

a string naming the model (see glm_families) or a tfp$glm$ExponentialFamily-like instance which implicitly characterizes a negative log-likelihood loss by specifying the distribution's mean, gradient_mean, and variance.

model_coefficients_start

Optional (batch of) vector-shaped Tensor representing the initial model coefficients, one for each column in x. Must have same dtype as x. Default value: Zeros.

predicted_linear_response_start

Optional Tensor with shape, dtype matching response; represents offset-shifted initial linear predictions based on model_coefficients_start. Default value: offset if model_coefficients_start is NULL, and tf$linalg$matvec(x, model_coefficients_start) + offset otherwise.

l2_regularizer

Optional scalar Tensor representing L2 regularization penalty. Default: NULL (i.e., no regularization).

dispersion

Optional (batch of) Tensor representing response dispersion.

offset

Optional Tensor representing constant shift applied to predicted_linear_response.

learning_rate

Optional (batch of) scalar Tensor used to dampen iterative progress. Typically only needed if optimization diverges; it should be no larger than 1 and is typically very close to 1. Default value: NULL (i.e., 1).

fast_unsafe_numerics

Optional Python bool indicating if faster, less numerically accurate methods can be employed for computing the weighted least-squares solution. Default value: TRUE (i.e., "fast but possibly diminished accuracy").

name

Used as name prefix to ops created by this function. Default value: "fit".

...

other arguments passed to specific methods.

Value

A glm_fit object with parameter estimates, and number of required steps.
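A single Fisher scoring step is one weighted least-squares solve around the current coefficients. The base-R sketch below illustrates this for a Poisson model with log link. It is a textbook formulation under stated assumptions, not the package's implementation: fisher_step_poisson() is a hypothetical helper, the L2 penalty is treated as a plain ridge term, and learning_rate is applied as a damping factor on the update.

```r
# One Fisher scoring (IRLS) step for a Poisson GLM with log link, in base R.
fisher_step_poisson <- function(x, response, coef_start = rep(0, ncol(x)),
                                offset = 0, l2_regularizer = 0,
                                learning_rate = 1) {
  eta <- drop(x %*% coef_start) + offset       # predicted linear response
  mu  <- exp(eta)                              # mean; for the log link d mu / d eta = mu
  w   <- mu                                    # grad_mean^2 / variance = mu^2 / mu
  z   <- eta - offset + (response - mu) / mu   # working response
  xtwx <- crossprod(x, w * x) + l2_regularizer * diag(ncol(x))
  coef_full <- drop(solve(xtwx, crossprod(x, w * z)))
  # Damped update: learning_rate = 1 takes the full step.
  coef_start + learning_rate * (coef_full - coef_start)
}

# Simulated data (made up for illustration).
set.seed(1)
x <- cbind(1, rnorm(50))
y <- rpois(50, lambda = exp(drop(x %*% c(0.3, 0.7))))

coef_1 <- fisher_step_poisson(x, y)                        # after one step
coef_2 <- fisher_step_poisson(x, y, coef_start = coef_1)   # after two steps
```

Feeding the result back in as coef_start repeats the step; iterating until the coefficients stop changing yields the maximum-likelihood fit.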

See Also

Other glm_fit: glm_families, glm_fit.tensorflow.tensor()


Runs multiple Fisher scoring steps

Description

Runs multiple Fisher scoring steps

Usage

## S3 method for class 'tensorflow.tensor'
glm_fit(
  x,
  response,
  model,
  model_coefficients_start = NULL,
  predicted_linear_response_start = NULL,
  l2_regularizer = NULL,
  dispersion = NULL,
  offset = NULL,
  convergence_criteria_fn = NULL,
  learning_rate = NULL,
  fast_unsafe_numerics = TRUE,
  maximum_iterations = NULL,
  name = NULL,
  ...
)

Arguments

x

float-like, matrix-shaped Tensor where each row represents a sample's features.

response

vector-shaped Tensor where each element represents a sample's observed response (to the corresponding row of features). Must have same dtype as x.

model

a string naming the model (see glm_families) or a tfp$glm$ExponentialFamily-like instance which implicitly characterizes a negative log-likelihood loss by specifying the distribution's mean, gradient_mean, and variance.

model_coefficients_start

Optional (batch of) vector-shaped Tensor representing the initial model coefficients, one for each column in x. Must have same dtype as x. Default value: Zeros.

predicted_linear_response_start

Optional Tensor with shape, dtype matching response; represents offset-shifted initial linear predictions based on model_coefficients_start. Default value: offset if model_coefficients_start is NULL, and tf$linalg$matvec(x, model_coefficients_start) + offset otherwise.

l2_regularizer

Optional scalar Tensor representing L2 regularization penalty. Default: NULL (i.e., no regularization).

dispersion

Optional (batch of) Tensor representing response dispersion.

offset

Optional Tensor representing constant shift applied to predicted_linear_response.

convergence_criteria_fn

callable taking: is_converged_previous, iter_, model_coefficients_previous, predicted_linear_response_previous, model_coefficients_next, predicted_linear_response_next, response, model, dispersion and returning a logical Tensor indicating that Fisher scoring has converged.

learning_rate

Optional (batch of) scalar Tensor used to dampen iterative progress. Typically only needed if optimization diverges; it should be no larger than 1 and is typically very close to 1. Default value: NULL (i.e., 1).

fast_unsafe_numerics

Optional Python bool indicating if faster, less numerically accurate methods can be employed for computing the weighted least-squares solution. Default value: TRUE (i.e., "fast but possibly diminished accuracy").

maximum_iterations

Optional maximum number of iterations of Fisher scoring to run; "and-ed" with result of convergence_criteria_fn. Default value: NULL (i.e., infinity).

name

Used as name prefix to ops created by this function. Default value: "fit".

...

other arguments passed to specific methods.

Value

A glm_fit object with parameter estimates, and number of required steps.
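Running multiple Fisher scoring steps means repeating the weighted least-squares update until a convergence criterion is met or maximum_iterations is reached. The base-R sketch below shows that loop for a Bernoulli model with logistic link. It is an illustration only: fisher_scoring_bernoulli(), its tolerance argument, and the simple "largest coefficient change" criterion are assumptions, not the package's defaults.

```r
# Iterated Fisher scoring for logistic regression, in base R.
fisher_scoring_bernoulli <- function(x, response, maximum_iterations = 25,
                                     tolerance = 1e-8) {
  beta <- rep(0, ncol(x))                  # start from zeros
  is_converged <- FALSE
  iterations <- 0
  for (iter in seq_len(maximum_iterations)) {
    eta <- drop(x %*% beta)
    mu  <- 1 / (1 + exp(-eta))             # sigmoid mean
    w   <- mu * (1 - mu)                   # grad_mean^2 / variance
    z   <- eta + (response - mu) / w       # working response
    beta_next <- drop(solve(crossprod(x, w * x), crossprod(x, w * z)))
    # Stand-in convergence criterion: largest coefficient change is tiny.
    is_converged <- max(abs(beta_next - beta)) < tolerance
    beta <- beta_next
    iterations <- iter
    if (is_converged) break
  }
  list(coefficients = beta, iterations = iterations, is_converged = is_converged)
}

# Simulated data (made up for illustration).
set.seed(2)
x <- cbind(1, rnorm(200))
y <- rbinom(200, size = 1, prob = 1 / (1 + exp(-drop(x %*% c(-0.5, 1)))))

fit <- fisher_scoring_bernoulli(x, y)
```

The loop stops as soon as either condition is satisfied, so the returned iteration count never exceeds maximum_iterations.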

See Also

Other glm_fit: glm_families, glm_fit_one_step.tensorflow.tensor()


Blockwise Initializer

Description

Initializer which concatenates other initializers

Usage

initializer_blockwise(initializers, sizes, validate_args = FALSE)

Arguments

initializers

list of Keras initializers, e.g., keras::initializer_glorot_uniform() or initializer_constant().

sizes

list of integer scalars representing the number of elements associated with each initializer in initializers.

validate_args

bool indicating we should do (possibly expensive) graph-time assertions, if necessary.

Value

Initializer which concatenates other initializers
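The idea of a blockwise initializer can be sketched in base R: each initializer fills its own block of the given size, and the blocks are concatenated. Everything below is hypothetical illustration code; blockwise_init(), init_zeros(), and init_constant() are stand-ins and not part of the package or of Keras.

```r
# Hypothetical helper: concatenate the outputs of several initializer functions.
blockwise_init <- function(initializers, sizes) {
  stopifnot(length(initializers) == length(sizes))
  blocks <- Map(function(init, size) init(size), initializers, sizes)
  unlist(blocks, use.names = FALSE)
}

# Stand-ins for Keras initializers: each takes a size and returns that many values.
init_zeros    <- function(n) rep(0, n)
init_constant <- function(value) function(n) rep(value, n)

# Two zeros followed by three copies of 0.5.
values <- blockwise_init(list(init_zeros, init_constant(0.5)),
                         sizes = list(2L, 3L))
```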


Installs TensorFlow Probability

Description

Installs TensorFlow Probability

Usage

install_tfprobability(
  method = c("auto", "virtualenv", "conda"),
  conda = "auto",
  version = "default",
  tensorflow = "default",
  extra_packages = NULL,
  ...,
  pip_ignore_installed = TRUE
)

Arguments

method

Installation method. By default, "auto" automatically finds a method that will work in the local environment. Change the default to force a specific installation method. Note that the "virtualenv" method is not available on Windows.

conda

The path to a conda executable. Use "auto" to allow reticulate to automatically find an appropriate conda binary. See Finding Conda and conda_binary() for more details.

version

TensorFlow version to install. Valid values include:

  • "default" installs 2.9

  • "release" installs the latest release version of tensorflow (which may be incompatible with the current version of the R package)

  • A version specification like "2.4" or "2.4.0". Note that if the patch version is not supplied, the latest patch release is installed (e.g., "2.4" today installs version "2.4.2")

  • "nightly" for the latest available nightly build.

  • To any specification, you can append "-cpu" to install the cpu version only of the package (e.g., "2.4-cpu")

  • The full URL or path to an installer binary or python *.whl file.

tensorflow

Synonym for version. Maintained for backwards compatibility.

extra_packages

Additional Python packages to install along with TensorFlow.

...

other arguments passed to reticulate::conda_install() or reticulate::virtualenv_install(), depending on the method used.

pip_ignore_installed

Whether pip should ignore installed python packages and reinstall all already installed python packages. This defaults to TRUE, to ensure that TensorFlow dependencies like NumPy are compatible with the prebuilt TensorFlow binaries.

Value

invisible


Masked Autoencoder for Distribution Estimation

Description

layer_autoregressive takes as input a Tensor of shape [..., event_size] and returns a Tensor of shape [..., event_size, params]. The output satisfies the autoregressive property. That is, the layer is configured with some permutation ord of {0, ..., event_size-1} (i.e., an ordering of the input dimensions), and the output output[batch_idx, i, ...] for input dimension i depends only on inputs x[batch_idx, j] where ord(j) < ord(i).

Usage

layer_autoregressive(
  object,
  params,
  event_shape = NULL,
  hidden_units = NULL,
  input_order = "left-to-right",
  hidden_degrees = "equal",
  activation = NULL,
  use_bias = TRUE,
  kernel_initializer = "glorot_uniform",
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

params

integer specifying the number of parameters to output per input.

event_shape

list-like of positive integers (or a single int), specifying the shape of the input to this layer, which is also the event_shape of the distribution parameterized by this layer. Currently only rank-1 shapes are supported. That is, event_shape must be a single integer. If not specified, the event shape is inferred when this layer is first called or built.

hidden_units

list-like of non-negative integers, specifying the number of units in each hidden layer.

input_order

Order of degrees to the input units: 'random', 'left-to-right', 'right-to-left', or an array of an explicit order. For example, 'left-to-right' builds an autoregressive model: p(x) = p(x1) p(x2 | x1) ... p(xD | x<D). Default: 'left-to-right'.

hidden_degrees

Method for assigning degrees to the hidden units: 'equal', 'random'. If 'equal', hidden units in each layer are allocated equally (up to a remainder term) to each degree. Default: 'equal'.

activation

An activation function. See keras::layer_dense. Default: NULL.

use_bias

Whether or not the dense layers constructed in this layer should have a bias term. See keras::layer_dense. Default: TRUE.

kernel_initializer

Initializer for the kernel weights matrix. Default: 'glorot_uniform'.

validate_args

logical, default FALSE. When TRUE, layer parameters are checked for validity despite possibly degrading runtime performance. When FALSE invalid inputs may silently render incorrect outputs.

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.

Details

The autoregressive property allows us to use output[batch_idx, i] to parameterize conditional distributions: p(x[batch_idx, i] | x[batch_idx, j] for ord(j) < ord(i)), which give us a tractable distribution over input x[batch_idx]:

p(x[batch_idx]) = prod_i p(x[batch_idx, ord(i)] | x[batch_idx, ord(0:i)])

For example, when params is 2, the output of the layer can parameterize the location and log-scale of an autoregressive Gaussian distribution.
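The autoregressive property is enforced by masking the dense layers. The base-R sketch below builds the two masks for a network with one hidden layer and 'left-to-right' input order, then checks which inputs each output can depend on. It is an illustration under assumptions: the sizes are made up, and the round-robin assignment of hidden degrees is a simple stand-in for the 'equal' scheme, not the library's exact rule.

```r
event_size   <- 4L
hidden_units <- 6L

# 'left-to-right' input order: input i has degree i.
input_degrees <- seq_len(event_size)

# Hidden degrees lie in 1..(event_size - 1); spread them round-robin
# (illustrative stand-in for the 'equal' assignment).
hidden_degrees <- rep_len(seq_len(event_size - 1L), hidden_units)

# Hidden unit k may see input j when degree(k) >= degree(j).
mask_in  <- outer(hidden_degrees, input_degrees, ">=") * 1
# Output i may see hidden unit k when degree(i) > degree(k).
mask_out <- outer(input_degrees, hidden_degrees, ">") * 1

# connectivity[i, j] > 0 means output i can depend on input j.
connectivity <- mask_out %*% mask_in
```

The resulting connectivity matrix is strictly lower triangular: output i can depend only on inputs that come earlier in the ordering, which is exactly the property stated above.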

Value

a Keras layer

See Also

Other layers: layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


An autoregressive normalizing flow layer, given a layer_autoregressive.

Description

Following Papamakarios et al. (2017), given an autoregressive model p(x) with conditional distributions in the location-scale family, we can construct a normalizing flow for p(x).

Usage

layer_autoregressive_transform(object, made, ...)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

made

A Made layer, which must output two parameters for each input.

...

Additional parameters passed to Keras Layer.

Details

Specifically, suppose made is a layer_autoregressive() (a layer implementing a Masked Autoencoder for Distribution Estimation, MADE) that computes location and log-scale parameters made(x)[i] for each input x[i]. Then we can represent the autoregressive model p(x) as x = f(u), where u is drawn from some base distribution and where f is an invertible and differentiable function (i.e., a Bijector) and f^{-1}(x) is defined by:

library(tensorflow)
library(zeallot)
f_inverse <- function(x) {
  c(shift, log_scale) %<-% tf$unstack(made(x), num = 2L, axis = -1L)
  (x - shift) * tf$math$exp(-log_scale)
}

Given a layer_autoregressive() made, a layer_autoregressive_transform() transforms an input tfd_* p(u) to an output tfd_* p(x) where x = f(u).
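To make the relationship between f and its inverse concrete, the base-R sketch below replaces the MADE network with a toy function whose shift and log-scale for position i depend only on earlier entries of x. All names here (toy_made(), f_forward()) are hypothetical; the sketch only illustrates that the inverse is computed in one pass while the forward direction must be computed one coordinate at a time.

```r
# Toy stand-in for a MADE network: the shift and log-scale for x[i]
# depend only on x[1:(i - 1)].
toy_made <- function(x) {
  prefix_sum <- c(0, cumsum(x)[-length(x)])   # sum of earlier entries
  list(shift = 0.5 * prefix_sum, log_scale = 0.1 * prefix_sum)
}

# Inverse direction: all coordinates at once.
f_inverse <- function(x) {
  p <- toy_made(x)
  (x - p$shift) * exp(-p$log_scale)
}

# Forward direction: sequential, because x[i] needs x[1:(i - 1)].
f_forward <- function(u) {
  x <- numeric(length(u))
  for (i in seq_along(u)) {
    p <- toy_made(x)   # position i only uses entries already filled in
    x[i] <- u[i] * exp(p$log_scale[i]) + p$shift[i]
  }
  x
}

u <- c(0.3, -1.2, 0.8)
x <- f_forward(u)
u_recovered <- f_inverse(x)   # matches u up to floating-point error
```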

Value

a Keras layer

References

Papamakarios et al. (2017)

See Also

tfb_masked_autoregressive_flow() and layer_autoregressive()


A OneHotCategorical mixture Keras layer from k * (1 + d) params.

Description

k (i.e., num_components) represents the number of component OneHotCategorical distributions and d (i.e., event_size) represents the number of categories within each OneHotCategorical distribution.

Usage

layer_categorical_mixture_of_one_hot_categorical(
  object,
  event_size,
  num_components,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  sample_dtype = NULL,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

event_size

Scalar integer representing the size of a single draw from this distribution.

num_components

Scalar integer representing the number of mixture components. Must be at least 1. (If num_components=1, it's more efficient to use the OneHotCategorical layer.)

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

sample_dtype

dtype of samples produced by this distribution. Default value: NULL (i.e., previous layer's dtype).

validate_args

Logical, default FALSE. When TRUE distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE invalid inputs may silently render incorrect outputs. Default value: FALSE.

...

Additional arguments passed to args of keras::create_layer.

Details

Typical choices for convert_to_tensor_fn include:

  • tfp$distributions$Distribution$sample

  • tfp$distributions$Distribution$mean

  • tfp$distributions$Distribution$mode
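The k * (1 + d) parameter count in the title can be read as k mixture logits plus k sets of d category logits. The base-R sketch below computes that count and splits a flat parameter vector accordingly. It is a hedged illustration: params_size() and split_params() are hypothetical helpers, and the layout (mixture logits first, then component logits row by row) is an assumption made for this example.

```r
# Number of parameters: k mixture logits plus k * d component logits.
params_size <- function(event_size, num_components) {
  num_components * (1L + event_size)
}

# Split a flat parameter vector (assumed layout: mixture logits first).
split_params <- function(params, event_size, num_components) {
  stopifnot(length(params) == params_size(event_size, num_components))
  k <- num_components
  list(
    mixture_logits   = params[seq_len(k)],
    component_logits = matrix(params[-seq_len(k)], nrow = k, byrow = TRUE)
  )
}

softmax <- function(z) exp(z - max(z)) / sum(exp(z - max(z)))

# k = 3 components, d = 4 categories: 3 * (1 + 4) = 15 parameters.
parts <- split_params((0:14) / 10, event_size = 4L, num_components = 3L)
mixture_probs <- softmax(parts$mixture_logits)
```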

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


1D convolution layer (e.g. temporal convolution) with Flipout

Description

This layer creates a convolution kernel that is convolved (actually cross-correlated) with the layer input to produce a tensor of outputs. It may also include a bias addition and activation function on the outputs. It assumes the kernel and/or bias are drawn from distributions.
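The remark "actually cross-correlated" means the kernel is slid over the input without being flipped. A minimal base-R sketch for a single channel, a single filter, and "valid" padding follows; cross_correlate_1d() is a hypothetical helper written for this illustration, not a package function.

```r
# Slide the kernel over the input and take dot products ("valid" padding).
cross_correlate_1d <- function(input, kernel, stride = 1L) {
  starts <- seq(1L, length(input) - length(kernel) + 1L, by = stride)
  vapply(
    starts,
    function(s) sum(input[s:(s + length(kernel) - 1L)] * kernel),
    numeric(1)
  )
}

input  <- c(1, 2, 3, 4, 5)
kernel <- c(1, 0, -1)

correlated <- cross_correlate_1d(input, kernel)        # c(-2, -2, -2)
# A true convolution flips the kernel first.
convolved  <- cross_correlate_1d(input, rev(kernel))   # c(2, 2, 2)
```

With "valid" padding and stride 1, the output has length(input) - length(kernel) + 1 elements, here 3.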

Usage

layer_conv_1d_flipout(
  object,
  filters,
  kernel_size,
  strides = 1,
  padding = "valid",
  data_format = "channels_last",
  dilation_rate = 1,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

filters

Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).

kernel_size

An integer or list of a single integer, specifying the length of the 1D convolution window.

strides

An integer or list of a single integer, specifying the stride length of the convolution. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

padding

One of "valid" or "same" (case-insensitive).

data_format

A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, length, channels) while channels_first corresponds to inputs with shape (batch, channels, length).

dilation_rate

An integer or tuple/list of a single integer, specifying the dilation rate to use for dilated convolution. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfd_normal(loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates tfd instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference)

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.

Details

This layer implements the Bayesian variational inference analogue to a convolution layer by assuming the kernel and/or the bias are drawn from distributions.

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

outputs = f(inputs; kernel, bias), kernel, bias ~ posterior

where f denotes the layer's calculation. It uses the Flipout estimator (Wen et al., 2018), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias. Flipout uses roughly twice as many floating point operations as the reparameterization estimator but has the advantage of significantly lower variance.
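The core of the Flipout estimator can be sketched in base R. For simplicity the sketch uses a dense (matrix-multiply) layer rather than a convolution, and all sizes and values are made up. One shared weight perturbation is drawn for the whole mini-batch, and random sign vectors give each example its own pseudo-independent perturbation while keeping the computation vectorized.

```r
set.seed(3)
n <- 4; n_in <- 3; n_out <- 2

x       <- matrix(rnorm(n * n_in), n, n_in)                    # mini-batch of inputs
w_mean  <- matrix(rnorm(n_in * n_out), n_in, n_out)            # posterior mean
delta_w <- matrix(rnorm(n_in * n_out, sd = 0.1), n_in, n_out)  # one shared perturbation

# Random +/-1 signs, one row per example.
sign_in  <- matrix(sample(c(-1, 1), n * n_in,  replace = TRUE), n, n_in)
sign_out <- matrix(sample(c(-1, 1), n * n_out, replace = TRUE), n, n_out)

# Flipout: vectorized computation over the mini-batch.
flipout_out <- x %*% w_mean + ((x * sign_in) %*% delta_w) * sign_out

# The same result written as an explicit per-example weight perturbation.
per_example <- t(sapply(seq_len(n), function(i) {
  w_i <- w_mean + delta_w * outer(sign_in[i, ], sign_out[i, ])
  drop(x[i, , drop = FALSE] %*% w_i)
}))
```

The two results agree, which shows that the sign trick is equivalent to giving every example its own perturbed weight matrix. The second matrix product is the source of the roughly doubled floating point cost mentioned above.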

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.
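The scaling advice for the KL term can be checked with simple arithmetic. The base-R sketch below uses made-up numbers and assumes the data loss is a per-example mean negative log-likelihood; dividing the KL term by the number of examples per epoch then makes it count exactly once per epoch.

```r
num_examples_per_epoch <- 1000
batch_size <- 50

kl        <- 12.5   # made-up sum of the layer's KL losses
batch_nll <- 0.8    # made-up mean negative log-likelihood per example

# Loss used for each mini-batch: data term plus scaled KL term.
loss_per_batch <- batch_nll + kl / num_examples_per_epoch

# Summed over every example in one epoch, the KL term contributes exactly once.
batches_per_epoch <- num_examples_per_epoch / batch_size
epoch_total <- batches_per_epoch * batch_size * loss_per_batch
```

Here epoch_total equals 1000 * 0.8 + 12.5, that is, the total data loss plus a single copy of the KL divergence.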

Value

a Keras layer

References

See Also

Other layers: layer_autoregressive(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


1D convolution layer (e.g. temporal convolution).

Description

This layer creates a convolution kernel that is convolved (actually cross-correlated) with the layer input to produce a tensor of outputs. It may also include a bias addition and activation function on the outputs. It assumes the kernel and/or bias are drawn from distributions.

Usage

layer_conv_1d_reparameterization(
  object,
  filters,
  kernel_size,
  strides = 1,
  padding = "valid",
  data_format = "channels_last",
  dilation_rate = 1,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

filters

Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).

kernel_size

An integer or list of a single integer, specifying the length of the 1D convolution window.

strides

An integer or list of a single integer, specifying the stride length of the convolution. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

padding

One of "valid" or "same" (case-insensitive).

data_format

A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, length, channels) while channels_first corresponds to inputs with shape (batch, channels, length).

dilation_rate

An integer or tuple/list of a single integer, specifying the dilation rate to use for dilated convolution. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfd_normal(loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates tfd instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference)

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.

Details

This layer implements the Bayesian variational inference analogue to a convolution layer by assuming the kernel and/or the bias are drawn from distributions.

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

outputs = f(inputs; kernel, bias), kernel, bias ~ posterior

where f denotes the layer's calculation. It uses the reparameterization estimator (Kingma and Welling, 2014), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias.
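The reparameterization estimator draws a weight as a deterministic function of the posterior parameters and parameter-free noise, so that a Monte Carlo average can be differentiated with respect to those parameters. The base-R sketch below illustrates only the sampling and averaging part for a two-element Normal posterior; the values of loc and scale and the test function sum(w^2) are made up for illustration.

```r
set.seed(4)
loc   <- c(0.5, -1)    # made-up posterior means
scale <- c(0.2, 0.1)   # made-up posterior standard deviations

# Noise that does not depend on loc or scale.
eps <- matrix(rnorm(2 * 10000), ncol = 2)

# Reparameterized draws: kernel = loc + scale * eps, column by column.
kernel_draws <- sweep(sweep(eps, 2, scale, "*"), 2, loc, "+")

# Monte Carlo estimate of E[f(kernel)] for f(w) = sum(w^2) ...
mc_estimate <- mean(rowSums(kernel_draws^2))
# ... compared with the closed form sum(loc^2 + scale^2).
exact <- sum(loc^2 + scale^2)
```

Because the randomness sits entirely in eps, the estimate is a smooth function of loc and scale, which is what lets gradients flow to the posterior parameters during training.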

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.

Value

a Keras layer

References

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


2D convolution layer (e.g. spatial convolution over images) with Flipout

Description

This layer creates a convolution kernel that is convolved (actually cross-correlated) with the layer input to produce a tensor of outputs. It may also include a bias addition and activation function on the outputs. It assumes the kernel and/or bias are drawn from distributions.

Usage

layer_conv_2d_flipout(
  object,
  filters,
  kernel_size,
  strides = 1,
  padding = "valid",
  data_format = "channels_last",
  dilation_rate = 1,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

filters

Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).

kernel_size

An integer or list of 2 integers, specifying the height and width of the 2D convolution window. A single integer applies the same value to both spatial dimensions.

strides

An integer or list of 2 integers, specifying the strides of the convolution along the height and width. A single integer applies the same value to both spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

padding

One of "valid" or "same" (case-insensitive).

data_format

A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

dilation_rate

An integer or list of 2 integers, specifying the dilation rate to use for dilated convolution. A single integer applies the same value to both spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfp$layers$util$default_multivariate_normal_fn, which creates a standard normal prior (loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference).

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the underlying layer constructor.

Details

This layer implements the Bayesian variational inference analogue to a convolution layer by assuming the kernel and/or the bias are drawn from distributions.

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

outputs = f(inputs; kernel, bias), kernel, bias ~ posterior

where f denotes the layer's calculation. It uses the Flipout estimator (Wen et al., 2018), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias. Flipout uses roughly twice as many floating point operations as the reparameterization estimator but has the advantage of significantly lower variance.

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.
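
The scaling advice above can be illustrated without TensorFlow. The following base-R sketch computes the closed-form KL divergence between a mean-field normal posterior and a standard normal prior, then divides it by the number of training examples. The helper names kl_normal_std and scaled_kl, the kernel size, and the value 60000 are assumptions made for illustration; they are not part of tfprobability.

```r
# KL( N(loc, scale^2) || N(0, 1) ), summed over all independent weights.
kl_normal_std <- function(loc, scale) {
  sum(-log(scale) + (scale^2 + loc^2) / 2 - 0.5)
}

# Hypothetical helper: scale the KL term by the number of examples per epoch.
scaled_kl <- function(loc, scale, num_examples_per_epoch) {
  kl_normal_std(loc, scale) / num_examples_per_epoch
}

set.seed(1)
# Illustrative posterior parameters for a 3x3 kernel, 1 input channel, 8 filters.
loc   <- rnorm(3 * 3 * 1 * 8, sd = 0.1)
scale <- rep(0.05, length(loc))

kl_total  <- kl_normal_std(loc, scale)
kl_scaled <- scaled_kl(loc, scale, num_examples_per_epoch = 60000)
```

In the layer itself, one way to get the same effect is to pass a divergence function that performs the division, for example kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p) / num_examples_per_epoch, where num_examples_per_epoch is a value you define.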

Value

a Keras layer

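References

Wen, Y., Vicol, P., Ba, J., Tran, D., and Grosse, R. (2018). Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches. International Conference on Learning Representations.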

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


2D convolution layer (e.g. spatial convolution over images)

Description

This layer creates a convolution kernel that is convolved (actually cross-correlated) with the layer input to produce a tensor of outputs. It may also include a bias addition and activation function on the outputs. It assumes the kernel and/or bias are drawn from distributions.

Usage

layer_conv_2d_reparameterization(
  object,
  filters,
  kernel_size,
  strides = 1,
  padding = "valid",
  data_format = "channels_last",
  dilation_rate = 1,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

filters

Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).

kernel_size

An integer or list of 2 integers, specifying the height and width of the 2D convolution window. A single integer applies the same value to both spatial dimensions.

strides

An integer or list of 2 integers, specifying the strides of the convolution along the height and width. A single integer applies the same value to both spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

padding

One of "valid" or "same" (case-insensitive).

data_format

A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width).

dilation_rate

An integer or list of 2 integers, specifying the dilation rate to use for dilated convolution. A single integer applies the same value to both spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfp$layers$util$default_multivariate_normal_fn, which creates a standard normal prior (loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference).

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the underlying layer constructor.

Details

This layer implements the Bayesian variational inference analogue to a convolution layer by assuming the kernel and/or the bias are drawn from distributions.

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

outputs = f(inputs; kernel, bias), kernel, bias ~ posterior

where f denotes the layer's calculation. It uses the reparameterization estimator (Kingma and Welling, 2014), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias.

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.
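
As a rough illustration of the stochastic forward pass described above, the base-R sketch below draws a kernel with the reparameterization trick (kernel = loc + scale * eps) and applies it to a single-channel input by cross-correlation with "valid" padding and stride 1. It is a simplified stand-in rather than the layer's actual implementation; cross_correlate_2d, sample_kernel, and all numeric values are hypothetical.

```r
# Valid-padding, stride-1 cross-correlation of a single-channel image.
cross_correlate_2d <- function(x, k) {
  out_h <- nrow(x) - nrow(k) + 1
  out_w <- ncol(x) - ncol(k) + 1
  out <- matrix(0, out_h, out_w)
  for (i in seq_len(out_h)) {
    for (j in seq_len(out_w)) {
      patch <- x[i:(i + nrow(k) - 1), j:(j + ncol(k) - 1), drop = FALSE]
      out[i, j] <- sum(patch * k)  # the kernel is not flipped: cross-correlation
    }
  }
  out
}

set.seed(42)
x <- matrix(rnorm(6 * 6), 6, 6)  # one 6x6 input "image"

# Mean-field normal surrogate posterior over a 3x3 kernel (made-up values).
loc   <- matrix(rnorm(9, sd = 0.1), 3, 3)
scale <- matrix(0.05, 3, 3)

# Reparameterization: kernel = loc + scale * eps, with eps ~ N(0, 1).
sample_kernel <- function() loc + scale * matrix(rnorm(9), 3, 3)

# Two forward passes on the same input use two different kernel draws.
out1 <- cross_correlate_2d(x, sample_kernel())
out2 <- cross_correlate_2d(x, sample_kernel())
```

Because each call draws a fresh kernel, out1 and out2 differ even though the input is identical, which is what makes the forward pass stochastic.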

Value

a Keras layer

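References

Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes. International Conference on Learning Representations.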

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


3D convolution layer (e.g. spatial convolution over volumes) with Flipout

Description

This layer creates a convolution kernel that is convolved (actually cross-correlated) with the layer input to produce a tensor of outputs. It may also include a bias addition and activation function on the outputs. It assumes the kernel and/or bias are drawn from distributions.

Usage

layer_conv_3d_flipout(
  object,
  filters,
  kernel_size,
  strides = 1,
  padding = "valid",
  data_format = "channels_last",
  dilation_rate = 1,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

filters

Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).

kernel_size

An integer or list of 3 integers, specifying the depth, height and width of the 3D convolution window. A single integer applies the same value to all spatial dimensions.

strides

An integer or list of 3 integers, specifying the strides of the convolution along the depth, height and width. A single integer applies the same value to all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

padding

One of "valid" or "same" (case-insensitive).

data_format

A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, depth, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, depth, height, width).

dilation_rate

An integer or list of 3 integers, specifying the dilation rate to use for dilated convolution. A single integer applies the same value to all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfp$layers$util$default_multivariate_normal_fn, which creates a standard normal prior (loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference).

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the underlying layer constructor.

Details

This layer implements the Bayesian variational inference analogue to a convolution layer by assuming the kernel and/or the bias are drawn from distributions.

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

outputs = f(inputs; kernel, bias), kernel, bias ~ posterior

where f denotes the layer's calculation. It uses the Flipout estimator (Wen et al., 2018), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias. Flipout uses roughly twice as many floating point operations as the reparameterization estimator but has the advantage of significantly lower variance.

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.
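
To make the interaction of kernel_size, strides, dilation_rate and padding concrete, the base-R sketch below computes the expected output shape of a 3D convolution. It assumes the usual TensorFlow/Keras output-size rules and a channels_last input of shape (batch, depth, height, width, channels). The helper conv_output_length and the example sizes are hypothetical and not part of tfprobability.

```r
# Output length along one spatial dimension, assuming the usual
# TensorFlow/Keras rules for "valid" and "same" padding.
conv_output_length <- function(input_length, kernel_size, stride = 1,
                               dilation_rate = 1, padding = "valid") {
  # stride and dilation_rate must not both differ from 1
  stopifnot(stride == 1 || dilation_rate == 1)
  padding <- tolower(padding)  # padding is case-insensitive
  effective_kernel <- (kernel_size - 1) * dilation_rate + 1
  if (padding == "same") {
    ceiling(input_length / stride)
  } else {
    ceiling((input_length - effective_kernel + 1) / stride)
  }
}

# Illustrative channels_last input: (batch, depth, height, width, channels)
input_shape <- c(batch = 8, depth = 16, height = 32, width = 32, channels = 1)
filters <- 4

# Apply the rule to each spatial dimension; batch is unchanged and the
# channel dimension becomes the number of filters.
spatial <- sapply(input_shape[c("depth", "height", "width")],
                  conv_output_length, kernel_size = 3, stride = 2)
output_shape <- c(input_shape["batch"], spatial, channels = filters)
```

The last dimension of the result equals filters, matching the description of filters as the dimensionality of the output space.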

Value

a Keras layer

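References

Wen, Y., Vicol, P., Ba, J., Tran, D., and Grosse, R. (2018). Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches. International Conference on Learning Representations.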

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


3D convolution layer (e.g. spatial convolution over volumes)

Description

This layer creates a convolution kernel that is convolved (actually cross-correlated) with the layer input to produce a tensor of outputs. It may also include a bias addition and activation function on the outputs. It assumes the kernel and/or bias are drawn from distributions.

Usage

layer_conv_3d_reparameterization(
  object,
  filters,
  kernel_size,
  strides = 1,
  padding = "valid",
  data_format = "channels_last",
  dilation_rate = 1,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

filters

Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).

kernel_size

An integer or list of 3 integers, specifying the depth, height and width of the 3D convolution window. A single integer applies the same value to all spatial dimensions.

strides

An integer or list of 3 integers, specifying the strides of the convolution along the depth, height and width. A single integer applies the same value to all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.

padding

One of "valid" or "same" (case-insensitive).

data_format

A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, depth, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, depth, height, width).

dilation_rate

An integer or list of 3 integers, specifying the dilation rate to use for dilated convolution. A single integer applies the same value to all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfp$layers$util$default_multivariate_normal_fn, which creates a standard normal prior (loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference).

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the underlying layer constructor.

Details

This layer implements the Bayesian variational inference analogue to a convolution layer by assuming the kernel and/or the bias are drawn from distributions.

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

outputs = f(inputs; kernel, bias), kernel, bias ~ posterior

where f denotes the layer's calculation. It uses the reparameterization estimator (Kingma and Welling, 2014), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias.

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.

Value

a Keras layer

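References

Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes. International Conference on Learning Representations.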

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


Densely-connected layer class with Flipout estimator.

Description

This layer implements the Bayesian variational inference analogue to a dense layer by assuming the kernel and/or the bias are drawn from distributions.

Usage

layer_dense_flipout(
  object,
  units,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  seed = NULL,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

units

integer dimensionality of the output space

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfp$layers$util$default_multivariate_normal_fn, which creates a standard normal prior (loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference).

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

seed

scalar integer which initializes the random number generator. Default value: NULL (i.e., use global seed).

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.

Details

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

kernel, bias ~ posterior
outputs = activation(matmul(inputs, kernel) + bias)

It uses the Flipout estimator (Wen et al., 2018), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias. Flipout uses roughly twice as many floating point operations as the reparameterization estimator but has the advantage of significantly lower variance.

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer).
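
The Flipout idea mentioned above can be sketched in base R for a dense layer with a mean-field normal posterior. One kernel perturbation is shared across the mini-batch and is decorrelated between examples by random sign flips. This is a simplified illustration with no bias and a linear activation, not the library's implementation; every variable name and value here is an assumption.

```r
set.seed(123)
batch <- 4; n_in <- 3; units <- 2

x <- matrix(rnorm(batch * n_in), batch, n_in)

# Mean-field normal posterior over the kernel (illustrative values).
kernel_loc   <- matrix(rnorm(n_in * units, sd = 0.1), n_in, units)
kernel_scale <- matrix(0.05, n_in, units)

# One zero-mean perturbation shared by the whole mini-batch.
delta <- kernel_scale * matrix(rnorm(n_in * units), n_in, units)

# Independent random signs per example, for the inputs and for the outputs.
rademacher <- function(nr, nc) {
  matrix(sample(c(-1, 1), nr * nc, replace = TRUE), nr, nc)
}
sign_in  <- rademacher(batch, n_in)
sign_out <- rademacher(batch, units)

# Flipout forward pass: mean term plus sign-flipped perturbation term.
outputs <- x %*% kernel_loc + ((x * sign_in) %*% delta) * sign_out

# Equivalent per-example view: example n effectively uses the kernel
# kernel_loc + delta * (s_n r_n^T), where s_n and r_n are its sign vectors.
n <- 1
kernel_n  <- kernel_loc + delta * outer(sign_in[n, ], sign_out[n, ])
outputs_n <- x[n, , drop = FALSE] %*% kernel_n
```

The batched expression needs two matrix products instead of one, which is consistent with the statement that Flipout uses roughly twice as many floating point operations as the reparameterization estimator.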

Value

a Keras layer

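References

Wen, Y., Vicol, P., Ba, J., Tran, D., and Grosse, R. (2018). Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches. International Conference on Learning Representations.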

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


Densely-connected layer class with local reparameterization estimator.

Description

This layer implements the Bayesian variational inference analogue to a dense layer by assuming the kernel and/or the bias are drawn from distributions.

Usage

layer_dense_local_reparameterization(
  object,
  units,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

units

integer dimensionality of the output space

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: tfp$layers$util$default_multivariate_normal_fn, which creates a standard normal prior (loc = 0, scale = 1).

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates a tfd$Distribution instance. See default_mean_field_normal_fn docstring for required parameter signature. Default value: NULL (no prior, no variational inference).

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.

Details

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

kernel, bias ~ posterior
outputs = activation(matmul(inputs, kernel) + bias)

It uses the local reparameterization estimator (Kingma et al., 2015), which performs a Monte Carlo approximation of the distribution on the hidden units induced by the kernel and bias. The default kernel_posterior_fn is a normal distribution which factorizes across all elements of the weight matrix and bias vector. Unlike that paper's multiplicative parameterization, this distribution has trainable location and scale parameters, which is known as an additive noise parameterization (Molchanov et al., 2017).
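The idea behind the local reparameterization estimator can be sketched in base R, without calling TensorFlow. This is only an illustration under stated assumptions: the kernel posterior is a factorized normal with location `mu` and scale `sigma`, the bias is deterministic, and the helper name `local_reparam_forward` is hypothetical (it is not part of the package). Instead of sampling the kernel, the sketch samples the pre-activations directly from the normal distribution they follow.

```r
# Local reparameterization for a dense layer, base R only (illustrative sketch).
# Assumption: factorized normal kernel posterior (mu, sigma), deterministic bias.
set.seed(1)

local_reparam_forward <- function(x, mu, sigma, bias) {
  # Mean and variance of the pre-activations induced by the kernel posterior
  act_mean <- x %*% mu
  act_var  <- (x^2) %*% (sigma^2)
  # One standard normal draw per pre-activation (per example, per unit)
  eps <- matrix(rnorm(length(act_mean)), nrow = nrow(act_mean))
  out <- act_mean + sqrt(act_var) * eps
  # Add the bias to every row
  sweep(out, 2, bias, `+`)
}

x     <- matrix(rnorm(4 * 3), nrow = 4)    # 4 examples, 3 features
mu    <- matrix(rnorm(3 * 2), nrow = 3)    # posterior locations, 3 x 2
sigma <- matrix(0.1, nrow = 3, ncol = 2)   # posterior scales, 3 x 2
bias  <- c(0.5, -0.5)

outputs <- local_reparam_forward(x, mu, sigma, bias)
dim(outputs)
```

Because the noise is drawn per example rather than once per kernel, each row of `outputs` receives an independent perturbation.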

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.
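The scaling rule can be checked with a little arithmetic. The base-R sketch below uses made-up numbers (`kl`, `nll_per_example`, and the batch sizes are hypothetical) and assumes the data term of the loss is averaged over the examples in a batch. Under that assumption, dividing the KL term by the number of examples per epoch means it is counted exactly once per pass over the data.

```r
# Scaling the KL term so that it is counted once per epoch (illustrative numbers).
num_examples_per_epoch <- 1000
batch_size <- 100
num_batches <- num_examples_per_epoch / batch_size

kl <- 25                # hypothetical KL(q || p), summed over all weights
nll_per_example <- 2    # hypothetical average negative log-likelihood

# Per-batch loss with the data term averaged over the batch
batch_loss <- nll_per_example + kl / num_examples_per_epoch

# Weighting each batch by its size and summing over the epoch recovers
# the full-data objective: total NLL plus exactly one copy of the KL term.
epoch_objective <- sum(rep(batch_loss * batch_size, num_batches))
full_objective  <- nll_per_example * num_examples_per_epoch + kl
all.equal(epoch_objective, full_objective)
```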

Value

a Keras layer

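References

  • Kingma, D. P., Salimans, T., and Welling, M. (2015). Variational Dropout and the Local Reparameterization Trick. In Neural Information Processing Systems.

  • Molchanov, D., Ashukha, A., and Vetrov, D. (2017). Variational Dropout Sparsifies Deep Neural Networks. In International Conference on Machine Learning.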

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()


Densely-connected layer class with reparameterization estimator.

Description

This layer implements the Bayesian variational inference analogue to a dense layer by assuming the kernel and/or the bias are drawn from distributions.

Usage

layer_dense_reparameterization(
  object,
  units,
  activation = NULL,
  activity_regularizer = NULL,
  trainable = TRUE,
  kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(),
  kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn,
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE),
  bias_posterior_tensor_fn = function(d) d %>% tfd_sample(),
  bias_prior_fn = NULL,
  bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p),
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

units

integer dimensionality of the output space

activation

Activation function. Set it to NULL to maintain a linear activation.

activity_regularizer

Regularizer function for the output.

trainable

Whether the layer weights will be updated during training.

kernel_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the kernel parameter. Default value: default_mean_field_normal_fn().

kernel_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

kernel_prior_fn

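Function which creates a tfd$Distribution instance. See the default_mean_field_normal_fn docstring for the required parameter signature. Default value: tfp$layers$util$default_multivariate_normal_fn, which places a standard normal prior, tfd_normal(loc = 0, scale = 1), on each kernel element.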

kernel_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

bias_posterior_fn

Function which creates a tfd$Distribution instance representing the surrogate posterior of the bias parameter. Default value: default_mean_field_normal_fn(is_singular = TRUE) (which creates an instance of tfd_deterministic).

bias_posterior_tensor_fn

Function which takes a tfd$Distribution instance and returns a representative value. Default value: function(d) d %>% tfd_sample().

bias_prior_fn

Function which creates a tfd$Distribution instance. See the default_mean_field_normal_fn docstring for the required parameter signature. Default value: NULL (no prior, no variational inference).

bias_divergence_fn

Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s) from the surrogate posterior and computes or approximates the KL divergence. The distributions are tfd$Distribution-like instances and the sample is a Tensor.

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.

Details

By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,

kernel, bias ~ posterior
outputs = activation(matmul(inputs, kernel) + bias)

It uses the reparameterization estimator (Kingma and Welling, 2014), which performs a Monte Carlo approximation of the distribution integrating over the kernel and bias.
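A minimal base-R sketch of this estimator follows. It does not call TensorFlow, and it assumes a factorized normal kernel posterior with location `mu` and scale `sigma` plus a deterministic bias; the helper name `reparam_forward` is hypothetical. The kernel is written as a deterministic transform of parameter-free noise, which is what lets gradients flow to `mu` and `sigma` in the real layer.

```r
# Reparameterization estimator for a dense layer, base R only (illustrative sketch).
set.seed(2)

reparam_forward <- function(x, mu, sigma, bias, activation = identity) {
  # kernel = mu + sigma * eps with eps ~ Normal(0, 1): one draw shared by the batch
  eps <- matrix(rnorm(length(mu)), nrow = nrow(mu))
  kernel <- mu + sigma * eps
  # outputs = activation(matmul(inputs, kernel) + bias)
  activation(sweep(x %*% kernel, 2, bias, `+`))
}

x     <- matrix(rnorm(4 * 3), nrow = 4)    # 4 examples, 3 features
mu    <- matrix(rnorm(3 * 2), nrow = 3)    # posterior locations, 3 x 2
sigma <- matrix(0.1, nrow = 3, ncol = 2)   # posterior scales, 3 x 2
bias  <- c(0.5, -0.5)

outputs <- reparam_forward(x, mu, sigma, bias, activation = tanh)
dim(outputs)
```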

The arguments permit separate specification of the surrogate posterior (q(W|x)), prior (p(W)), and divergence for both the kernel and bias distributions.

Upon being built, this layer adds losses (accessible via the losses property) representing the divergences of kernel and/or bias surrogate posteriors and their respective priors. When doing minibatch stochastic optimization, make sure to scale this loss such that it is applied just once per epoch (e.g. if kl is the sum of losses for each element of the batch, you should pass kl / num_examples_per_epoch to your optimizer). You can access the kernel and/or bias posterior and prior distributions after the layer is built via the kernel_posterior, kernel_prior, bias_posterior and bias_prior properties.

Value

a Keras layer

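References

  • Kingma, D. P., and Welling, M. (2014). Auto-Encoding Variational Bayes. In International Conference on Learning Representations.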

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_variational(), layer_variable()


Dense Variational Layer

Description

This layer uses variational inference to fit a "surrogate" posterior to the distribution over both the kernel matrix and the bias terms which are otherwise used in a manner similar to layer_dense(). This layer fits the "weights posterior" according to the following generative process:

[K, b] ~ Prior()
M = matmul(X, K) + b
Y ~ Likelihood(M)
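The generative process can be simulated in base R as a sanity check. The sketch assumes a standard normal prior and a unit-variance normal likelihood purely for illustration; the layer itself leaves both choices to the user.

```r
# Simulating the generative process with base R (illustrative sketch).
set.seed(3)
n <- 5; d_in <- 3; units <- 2
X <- matrix(rnorm(n * d_in), nrow = n)

# [K, b] ~ Prior(): a standard normal prior (assumption for illustration)
K <- matrix(rnorm(d_in * units), nrow = d_in)
b <- rnorm(units)

# M = matmul(X, K) + b
M <- sweep(X %*% K, 2, b, `+`)

# Y ~ Likelihood(M): a unit-variance normal likelihood (also an assumption)
Y <- M + matrix(rnorm(n * units), nrow = n)
dim(Y)
```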

Usage

layer_dense_variational(
  object,
  units,
  make_posterior_fn,
  make_prior_fn,
  kl_weight = NULL,
  kl_use_exact = FALSE,
  activation = NULL,
  use_bias = TRUE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

units

Positive integer, dimensionality of the output space.

make_posterior_fn

Function taking tf$size(kernel), tf$size(bias), and dtype, and returning a callable which takes an input and produces a tfd$Distribution instance (the surrogate posterior).

make_prior_fn

Function taking tf$size(kernel), tf$size(bias), and dtype, and returning a callable which takes an input and produces a tfd$Distribution instance (the prior).

kl_weight

Amount by which to scale the KL divergence loss between prior and posterior.

kl_use_exact

Logical indicating that the analytical KL divergence should be used rather than a Monte Carlo approximation.

activation

An activation function. See keras::layer_dense. Default: NULL.

use_bias

Whether or not the dense layers constructed in this layer should have a bias term. See keras::layer_dense. Default: TRUE.

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.
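The difference between the two settings of kl_use_exact can be illustrated with two univariate normal distributions in base R. The numbers are hypothetical, and the sketch only mirrors the idea (closed-form KL versus an average of log-density differences over draws from the posterior); it is not the layer's implementation.

```r
# Exact versus Monte Carlo KL(q || p) for two univariate normals (base R sketch).
set.seed(4)
q_mean <- 0.5; q_sd <- 0.8   # surrogate posterior q (hypothetical values)
p_mean <- 0;   p_sd <- 1     # prior p

# Closed form for normal distributions
kl_exact <- log(p_sd / q_sd) +
  (q_sd^2 + (q_mean - p_mean)^2) / (2 * p_sd^2) - 0.5

# Monte Carlo estimate: average of log q(w) - log p(w) over draws w ~ q
w <- rnorm(1e5, q_mean, q_sd)
kl_mc <- mean(dnorm(w, q_mean, q_sd, log = TRUE) -
              dnorm(w, p_mean, p_sd, log = TRUE))

c(exact = kl_exact, monte_carlo = kl_mc)
```

The Monte Carlo estimate is noisy but works for any pair of distributions whose log densities can be evaluated, whereas the exact form requires a known closed-form KL for the pair.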

Value

a Keras layer

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_variable()


Keras layer enabling plumbing TFP distributions through Keras models

Description

Keras layer enabling plumbing TFP distributions through Keras models

Usage

layer_distribution_lambda(
  object,
  make_distribution_fn,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

make_distribution_fn

A callable that takes the previous layer's outputs and returns a tfd$Distribution instance.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

...

Additional arguments passed to args of keras::create_layer.
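The division of labour between the two callables can be mimicked in base R. This is a conceptual stand-in only: the "distribution" here is a plain list of closures, not a tfd$Distribution, and the unit-scale normal is an arbitrary choice for illustration.

```r
# Conceptual stand-in for the two callables, base R only (not the real API).
set.seed(6)

make_distribution_fn <- function(t) {
  # Treat the previous layer's output `t` as the location of a unit-scale normal
  list(
    mean   = function() t,
    sample = function() t + rnorm(length(t))
  )
}

# Turns the distribution into a concrete value for downstream layers
convert_to_tensor_fn <- function(d) d$sample()

previous_output <- c(0.2, -1.3, 4)
dist <- make_distribution_fn(previous_output)
as_tensor <- convert_to_tensor_fn(dist)
length(as_tensor)
```

Swapping `convert_to_tensor_fn` for `function(d) d$mean()` would make the downstream value deterministic, which is the same kind of choice the real argument offers.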

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


An Independent-Bernoulli Keras layer from prod(event_shape) params

Description

An Independent-Bernoulli Keras layer from prod(event_shape) params
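The parameter count in the title can be illustrated in base R. The sketch assumes a two-dimensional event shape (as the prod(event_shape) in the title suggests is possible) and that the incoming parameters are logits; the element ordering used when reshaping is R's column-major order and may differ from what TensorFlow does.

```r
# One logit per event element: prod(event_shape) parameters (base R sketch).
set.seed(7)
event_shape <- c(28, 28)
params_size <- prod(event_shape)

logits <- rnorm(params_size)                        # stand-in for the previous layer's output
probs  <- array(plogis(logits), dim = event_shape)  # sigmoid maps logits to probabilities
draw   <- array(rbinom(params_size, size = 1, prob = probs), dim = event_shape)

params_size
dim(draw)
```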

Usage

layer_independent_bernoulli(
  object,
  event_shape,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  sample_dtype = NULL,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

event_shape

Scalar integer representing the size of a single draw from this distribution.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

sample_dtype

dtype of samples produced by this distribution. Default value: NULL (i.e., previous layer's dtype).

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


An independent Logistic Keras layer.

Description

An independent Logistic Keras layer.

Usage

layer_independent_logistic(
  object,
  event_shape,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

event_shape

Scalar integer representing the size of a single draw from this distribution.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


An independent Normal Keras layer.

Description

An independent Normal Keras layer.

Usage

layer_independent_normal(
  object,
  event_shape,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

event_shape

Scalar integer representing the size of a single draw from this distribution.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()

Examples

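library(keras)
library(tfprobability)

input_shape <- c(28, 28, 1)
encoded_shape <- 2   # size of a single draw from the output distribution
n <- 2               # number of hidden units
model <- keras_model_sequential(
  list(
    layer_input(shape = input_shape),
    layer_flatten(),
    layer_dense(units = n),
    # produce as many units as the distribution layer needs parameters
    layer_dense(units = params_size_independent_normal(encoded_shape)),
    layer_independent_normal(event_shape = encoded_shape)
  )
)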

An independent Poisson Keras layer.

Description

An independent Poisson Keras layer.

Usage

layer_independent_poisson(
  object,
  event_shape,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

event_shape

Scalar integer representing the size of a single draw from this distribution.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


Pass-through layer that adds a KL divergence penalty to the model loss

Description

Pass-through layer that adds a KL divergence penalty to the model loss

Usage

layer_kl_divergence_add_loss(
  object,
  distribution_b,
  use_exact_kl = FALSE,
  test_points_reduce_axis = NULL,
  test_points_fn = tf$convert_to_tensor,
  weight = NULL,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

distribution_b

Distribution instance corresponding to b as in KL[a, b]. The previous layer's output is presumed to be a Distribution instance and is a.

use_exact_kl

Logical indicating if KL divergence should be calculated exactly via tfp$distributions$kl_divergence or via Monte Carlo approximation. Default value: FALSE.

test_points_reduce_axis

Integer vector or scalar representing dimensions over which to reduce_mean while calculating the Monte Carlo approximation of the KL divergence. As with all tf$reduce_* ops, NULL means reduce over all dimensions; () means reduce over none of them. Default value: () (i.e., no reduction).

test_points_fn

A callable taking a tfp$distributions$Distribution instance and returning a tensor used for random test points to approximate the KL divergence. Default value: tf$convert_to_tensor.

weight

Multiplier applied to the calculated KL divergence for each Keras batch member. Default value: NULL (i.e., do not weight each batch member).

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


Regularizer that adds a KL divergence penalty to the model loss

Description

When using Monte Carlo approximation (e.g., use_exact_kl = FALSE), it is presumed that the input distribution's concretization (i.e., tf$convert_to_tensor(distribution)) corresponds to a random sample. To override this behavior, set test_points_fn.

Usage

layer_kl_divergence_regularizer(
  object,
  distribution_b,
  use_exact_kl = FALSE,
  test_points_reduce_axis = NULL,
  test_points_fn = tf$convert_to_tensor,
  weight = NULL,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

distribution_b

Distribution instance corresponding to b as in KL[a, b]. The previous layer's output is presumed to be a Distribution instance and is a.

use_exact_kl

Logical indicating if KL divergence should be calculated exactly via tfp$distributions$kl_divergence or via Monte Carlo approximation. Default value: FALSE.

test_points_reduce_axis

Integer vector or scalar representing dimensions over which to reduce_mean while calculating the Monte Carlo approximation of the KL divergence. As with all tf$reduce_* ops, NULL means reduce over all dimensions; () means reduce over none of them. Default value: () (i.e., no reduction).

test_points_fn

A callable taking a tfp$distributions$Distribution instance and returning a tensor used for random test points to approximate the KL divergence. Default value: tf$convert_to_tensor.

weight

Multiplier applied to the calculated KL divergence for each Keras batch member. Default value: NULL (i.e., do not weight each batch member).

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


A mixture distribution Keras layer, with independent logistic components.

Description

A mixture distribution Keras layer, with independent logistic components.

Usage

layer_mixture_logistic(
  object,
  num_components,
  event_shape = list(),
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

num_components

Number of component distributions in the mixture distribution.

event_shape

Integer vector Tensor representing the shape of a single draw from this distribution.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


A mixture distribution Keras layer, with independent normal components.

Description

A mixture distribution Keras layer, with independent normal components.

Usage

layer_mixture_normal(
  object,
  num_components,
  event_shape = list(),
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

num_components

Number of component distributions in the mixture distribution.

event_shape

Integer vector Tensor representing the shape of a single draw from this distribution.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


A mixture (same-family) Keras layer.

Description

A mixture (same-family) Keras layer.

Usage

layer_mixture_same_family(
  object,
  num_components,
  component_layer,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

num_components

Number of component distributions in the mixture distribution.

component_layer

Function that, given a tensor of shape batch_shape + [num_components, component_params_size], returns a tfd$Distribution-like instance that implements the component distribution (with batch shape batch_shape + [num_components]), e.g., a TFP distribution layer.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.
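The shapes involved can be made concrete with a base-R sketch. It assumes that the flat input holds num_components mixture logits followed by num_components blocks of component parameters, so the total count is num_components + num_components * component_params_size; this layout and the value of `component_params_size` are assumptions for illustration and are not stated on this page.

```r
# Splitting a flat parameter vector for a same-family mixture (base R sketch).
set.seed(8)
num_components <- 3
component_params_size <- 4   # hypothetical, e.g. loc and scale for a 2-d normal

# Assumed layout: mixture logits first, then one row of parameters per component
params_size <- num_components + num_components * component_params_size
params <- rnorm(params_size)

mix_logits <- params[seq_len(num_components)]
component_params <- matrix(params[-seq_len(num_components)],
                           nrow = num_components, byrow = TRUE)

# Softmax turns the logits into mixture weights that sum to one
softmax <- function(z) { e <- exp(z - max(z)); e / sum(e) }
weights <- softmax(mix_logits)

params_size
dim(component_params)   # [num_components, component_params_size]
```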

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()


A d-variate Multivariate Normal TriL Keras layer from d + d * (d + 1) / 2 params

Description

A d-variate Multivariate Normal TriL Keras layer from d + d * (d + 1) / 2 params
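The parameter count splits into d location values plus the d * (d + 1) / 2 entries of a lower-triangular scale matrix. The base-R sketch below shows that bookkeeping for d = 3. The fill order (R's column-major order) and the raw positive diagonal are simplifications for illustration; the actual layer may order the entries differently and may transform the diagonal to keep it positive.

```r
# Splitting d + d * (d + 1) / 2 parameters into loc and a lower-triangular scale.
d <- 3
params_size <- d + d * (d + 1) / 2

params <- seq_len(params_size) / 10   # stand-in for the previous layer's output
loc <- params[seq_len(d)]

# Fill the lower triangle (including the diagonal) with the remaining values
scale_tril <- matrix(0, d, d)
scale_tril[lower.tri(scale_tril, diag = TRUE)] <- params[-seq_len(d)]

# The implied covariance matrix is scale_tril %*% t(scale_tril)
covariance <- scale_tril %*% t(scale_tril)

params_size
scale_tril
```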

Usage

layer_multivariate_normal_tri_l(
  object,
  event_size,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

event_size

Scalar integer representing the size of a single draw from this distribution (the dimension d).

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this layer in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_one_hot_categorical()


A d-variate OneHotCategorical Keras layer from d params.

Description

Typical choices for convert_to_tensor_fn include:

  • tfp$distributions$Distribution$sample

  • tfp$distributions$Distribution$mean

  • tfp$distributions$Distribution$mode

  • tfp$distributions$OneHotCategorical$logits
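What these choices return can be illustrated in base R for a single example with d = 3 logits. The sketch does not call TensorFlow; the helper `one_hot` and the numbers are hypothetical.

```r
# Sample, mean, and mode of a one-hot categorical with d = 3 logits (base R sketch).
set.seed(5)
logits <- c(1.0, 2.0, 0.5)

# Softmax turns the d logits into class probabilities
probs <- exp(logits - max(logits))
probs <- probs / sum(probs)

one_hot <- function(i, d) { v <- numeric(d); v[i] <- 1; v }

as_sample <- one_hot(sample.int(length(probs), 1, prob = probs), length(probs))
as_mean   <- probs                                      # mean of a one-hot vector is its probabilities
as_mode   <- one_hot(which.max(probs), length(probs))   # most likely class, one-hot encoded

as_mode
```

A sample or mode is a hard one-hot vector, while the mean is a soft vector of probabilities; which one suits a model depends on what the downstream layers expect.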

Usage

layer_one_hot_categorical(
  object,
  event_size,
  convert_to_tensor_fn = tfp$distributions$Distribution$sample,
  sample_dtype = NULL,
  validate_args = FALSE,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

event_size

Scalar integer representing the size of a single draw from this distribution.

convert_to_tensor_fn

A callable that takes a tfd$Distribution instance and returns a tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample.

sample_dtype

dtype of samples produced by this distribution. Default value: NULL (i.e., previous layer's dtype).

validate_args

Logical, default FALSE. When TRUE, distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE, invalid inputs may silently render incorrect outputs.

...

Additional arguments passed to args of keras::create_layer.

Value

a Keras layer

See Also

For an example of how to use this in a Keras model, see layer_independent_normal().

Other distribution_layers: layer_categorical_mixture_of_one_hot_categorical(), layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l()


Variable Layer

Description

Simply returns a (trainable) variable, regardless of input. This layer implements the mathematical function f(x) = c where c is a constant, i.e., unchanged for all x. Like other Keras layers, the constant is trainable. This layer can also be interpreted as the special case of layer_dense() when the kernel is forced to be the zero matrix (tf$zeros).

Usage

layer_variable(
  object,
  shape,
  dtype = NULL,
  activation = NULL,
  initializer = "zeros",
  regularizer = NULL,
  constraint = NULL,
  ...
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

shape

integer or integer vector specifying the shape of the output of this layer.

dtype

TensorFlow dtype of the variable created by this layer.

activation

An activation function. See keras::layer_dense. Default: NULL.

initializer

Initializer for the constant vector.

regularizer

Regularizer function applied to the constant vector.

constraint

Constraint function applied to the constant vector.

...

Additional keyword arguments passed to the keras::layer_dense constructed by this layer.

Value

a Keras layer

See Also

Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational()


A Variational Gaussian Process Layer.

Description

Create a Variational Gaussian Process distribution whose index_points are the inputs to the layer. Parameterized by number of inducing points and a kernel_provider, which should be a tf.keras.Layer with an @property that late-binds variable parameters to a tfp.positive_semidefinite_kernel.PositiveSemidefiniteKernel instance (this requirement has to do with the way that variables must be created in a keras model). The mean_fn is an optional argument which, if omitted, will be automatically configured to be a constant function with trainable variable output.

Usage

layer_variational_gaussian_process(
  object,
  num_inducing_points,
  kernel_provider,
  event_shape = 1,
  inducing_index_points_initializer = NULL,
  unconstrained_observation_noise_variance_initializer = NULL,
  mean_fn = NULL,
  jitter = 1e-06,
  name = NULL
)

Arguments

object

What to compose the new Layer instance with. Typically a Sequential model or a Tensor (e.g., as returned by layer_input()). The return value depends on object. If object is:

  • missing or NULL, the Layer instance is returned.

  • a Sequential model, the model with an additional layer is returned.

  • a Tensor, the output tensor from layer_instance(object) is returned.

num_inducing_points

number of inducing points in the Variational Gaussian Process distribution.

kernel_provider

a Layer instance equipped with an ⁠@property⁠, which yields a PositiveSemidefiniteKernel instance. The latter is used to parametrize the constructed Variational Gaussian Process distribution returned by calling the layer.

event_shape

the shape of the output of the layer. This translates to a batch of underlying Variational Gaussian Process distributions. For example, event_shape = 3 means we are modelling a batch of 3 distributions over functions. We can think of this as a distribution over 3-dimensional vector-valued functions.

inducing_index_points_initializer

a tf.keras.initializer.Initializer used to initialize the trainable inducing_index_points variables. Training VGPs is quite sensitive to the choice of initial inducing index point locations. A reasonable heuristic is to scatter them near the data, not too close to each other.

unconstrained_observation_noise_variance_initializer

a tf.keras.initializer.Initializer used to initialize the unconstrained observation noise variable. The observation noise variance is computed from this variable via the tf.nn.softplus function.

mean_fn

a callable that maps layer inputs to mean function values. Passed to the mean_fn parameter of Variational Gaussian Process distribution. If omitted, defaults to a constant function with trainable variable value.

jitter

a small term added to the diagonal of various kernel matrices for numerical stability.

name

name to give to this layer and the scope of ops and variables it contains.

Value

a Keras layer


Adapts the inner kernel's step_size based on log_accept_prob.

Description

The dual averaging policy uses a noisy step size for exploration, while averaging over tuning steps to provide a smoothed estimate of an optimal value. It is based on section 3.2 of Hoffman and Gelman (2013), which modifies the stochastic convex optimization scheme of Nesterov (2009). The modified algorithm applies extra weight to recent iterations while keeping the convergence guarantees of Robbins-Monro, and takes care not to make the step size too small too quickly when maintaining a constant trajectory length, to avoid expensive early iterations. A good target acceptance probability depends on the inner kernel. If this kernel is HamiltonianMonteCarlo, then 0.6-0.9 is a good range to aim for. For RandomWalkMetropolis this should be closer to 0.25. See the individual kernels' docstrings for guidance.

Usage

mcmc_dual_averaging_step_size_adaptation(
  inner_kernel,
  num_adaptation_steps,
  target_accept_prob = 0.75,
  exploration_shrinkage = 0.05,
  step_count_smoothing = 10,
  decay_rate = 0.75,
  step_size_setter_fn = NULL,
  step_size_getter_fn = NULL,
  log_accept_prob_getter_fn = NULL,
  validate_args = FALSE,
  name = NULL
)

Arguments

inner_kernel

TransitionKernel-like object.

num_adaptation_steps

Scalar integer Tensor giving the number of initial steps during which to adjust the step size. This may be greater than, less than, or equal to the number of burnin steps.

target_accept_prob

A floating point Tensor representing desired acceptance probability. Must be a positive number less than 1. This can either be a scalar, or have shape ⁠[num_chains]⁠. Default value: 0.75 (the center of asymptotically optimal rate for HMC).

exploration_shrinkage

Floating point scalar Tensor. How strongly the exploration rate is biased towards the shrinkage target.

step_count_smoothing

Int32 scalar Tensor. Number of "pseudo-steps" added to the number of steps taken to prevent noisy exploration during the early samples.

decay_rate

Floating point scalar Tensor. How much to favor recent iterations over earlier ones. A value of 1 gives equal weight to all history.

step_size_setter_fn

A function with the signature ⁠(kernel_results, new_step_size) -> new_kernel_results⁠ where kernel_results are the results of the inner_kernel, new_step_size is a Tensor or a nested collection of Tensors with the same structure as returned by the step_size_getter_fn, and new_kernel_results are a copy of kernel_results with the step size(s) set.

step_size_getter_fn

A callable with the signature (kernel_results) -> step_size where kernel_results are the results of the inner_kernel, and step_size is a floating point Tensor or a nested collection of such Tensors.

log_accept_prob_getter_fn

A callable with the signature (kernel_results) -> log_accept_prob where kernel_results are the results of the inner_kernel, and log_accept_prob is a floating point Tensor. log_accept_prob can either be a scalar, or have shape ⁠[num_chains]⁠. If it's the latter, step_size should also have the same leading dimension.

validate_args

logical. When TRUE kernel parameters are checked for validity. When FALSE invalid inputs may silently render incorrect outputs.

name

name prefixed to Ops created by this function. Default value: NULL (i.e., 'dual_averaging_step_size_adaptation').

Details

In general, adaptation prevents the chain from reaching a stationary distribution, so obtaining consistent samples requires num_adaptation_steps be set to a value somewhat smaller than the number of burnin steps. However, it may sometimes be helpful to set num_adaptation_steps to a larger value during development in order to inspect the behavior of the chain during adaptation.

The step size is assumed to broadcast with the chain state, potentially having leading dimensions corresponding to multiple chains. When there are fewer of those leading dimensions than there are chain dimensions, the corresponding dimensions in the log_accept_prob are averaged (in the direct space, rather than the log space) before being used to adjust the step size. This means that this kernel can do either cross-chain adaptation or per-chain step size adaptation, depending on the shape of the step size.

For example, if your problem has a state with shape [S], your chain state has shape [C0, C1, S] (meaning that there are C0 * C1 total chains) and log_accept_prob has shape [C0, C1] (one acceptance probability per chain), then depending on the shape of the step size, the following will happen:

  • If the step size has shape [], [S] or [1], the log_accept_prob will be averaged across its C0 and C1 dimensions. This means that you will learn a shared step size based on the mean acceptance probability across all chains. This can be useful if you don't have a lot of steps to adapt and want to average away the noise.

  • If the step size has shape [C1, 1] or [C1, S], the log_accept_prob will be averaged across its C0 dimension. This means that you will learn a shared step size based on the mean acceptance probability across chains that share the coordinate across the C1 dimension. This can be useful when the C1 dimension indexes different distributions, while C0 indexes replicas of a single distribution, all sampled in parallel.

  • If the step size has shape [C0, C1, 1] or [C0, C1, S], no averaging will happen. This means that each chain will learn its own step size. This can be useful when all chains are sampling from different distributions. Even when all chains are for the same distribution, this can help during the initial warmup period.

  • If the step size has shape [C0, 1, 1] or [C0, 1, S], the log_accept_prob will be averaged across its C1 dimension. This means that you will learn a shared step size based on the mean acceptance probability across chains that share the coordinate across the C0 dimension. This can be useful when the C0 dimension indexes different distributions, while C1 indexes replicas of a single distribution, all sampled in parallel.
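The averaging cases above can be illustrated with a small base-R sketch. It is not the package's implementation: the matrix of log acceptance probabilities is made up, and only the reductions (taken in direct space, not log space) are shown.

```r
set.seed(7)
C0 <- 3
C1 <- 4

# Hypothetical per-chain log acceptance probabilities, shape [C0, C1]
log_accept_prob <- log(matrix(runif(C0 * C1, 0.3, 1), nrow = C0, ncol = C1))
accept_prob <- exp(log_accept_prob)  # move to direct space before averaging

# Step size of shape [], [S] or [1]: average over both C0 and C1
shared <- mean(accept_prob)

# Step size of shape [C1, 1] or [C1, S]: average over C0 only
per_c1 <- colMeans(accept_prob)  # one value per C1 coordinate

# Step size of shape [C0, 1, 1] or [C0, 1, S]: average over C1 only
per_c0 <- rowMeans(accept_prob)  # one value per C0 coordinate

# For contrast: averaging in log space gives the geometric mean,
# which never exceeds the direct-space (arithmetic) mean
log_space <- exp(mean(log_accept_prob))
```

With a step size of shape [C0, C1, 1] or [C0, C1, S], accept_prob itself would be used, one entry per chain.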

Value

a Monte Carlo sampling kernel

See Also

For an example of how to use this kernel, see mcmc_no_u_turn_sampler().

Other mcmc_kernels: mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Estimate a lower bound on effective sample size for each independent chain.

Description

Roughly speaking, "effective sample size" (ESS) is the size of an iid sample with the same variance as states.

Usage

mcmc_effective_sample_size(
  states,
  filter_threshold = 0,
  filter_beyond_lag = NULL,
  name = NULL
)

Arguments

states

Tensor or list of Tensor objects. Dimension zero should index identically distributed states.

filter_threshold

Tensor or list of Tensor objects. Must broadcast with states. The auto-correlation sequence is truncated after the first appearance of a term less than filter_threshold. Setting to NULL means we use no threshold filter. Since |R_k| <= 1, setting to any number less than -1 has the same effect.

filter_beyond_lag

Tensor or list of Tensor objects. Must be int-like and scalar valued. The auto-correlation sequence is truncated to this length. Setting to NULL means we do not filter based on number of lags.

name

name to prepend to created ops.

Details

More precisely, given a stationary sequence of possibly correlated random variables X_1, X_2,...,X_N, each identically distributed, ESS is the number such that Variance{ N**-1 * Sum{X_i} } = ESS**-1 * Variance{ X_1 }.

If the sequence is uncorrelated, ESS = N. In general, one should expect ESS <= N, with more highly correlated sequences having smaller ESS.
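As a rough illustration of this definition, the following base-R sketch estimates ESS for a simulated AR(1) chain from its sample autocorrelations R_k, using the standard identity ESS = N / (1 + 2 * Sum{ (N - k) / N * R_k }). The AR(1) coefficient, the maximum lag, and the truncation rule (stop at the first negative autocorrelation, similar in spirit to filter_threshold = 0) are assumptions made for the example, not a description of the package's internals.

```r
set.seed(1)
n <- 5000
phi <- 0.8  # hypothetical AR(1) coefficient; larger means more correlation

# Simulate a stationary, positively correlated chain
x <- as.numeric(arima.sim(model = list(ar = phi), n = n))

# Sample autocorrelations R_0, R_1, ..., R_max_lag (R_0 = 1 comes first)
max_lag <- 200
rho <- as.numeric(acf(x, lag.max = max_lag, plot = FALSE)$acf)
r <- rho[-1]
lags <- seq_len(max_lag)

# Truncate the sequence at the first term that falls below 0
below <- which(r < 0)
if (length(below) > 0) {
  keep <- seq_len(below[1] - 1)
  r <- r[keep]
  lags <- lags[keep]
}

# ESS from the truncated autocorrelation sum
ess <- n / (1 + 2 * sum((n - lags) / n * r))
ess
```

Because the chain is positively correlated, the estimate comes out well below N, in line with the expectation ESS <= N stated above.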

Value

Tensor or list of Tensor objects. The effective sample size of each component of states. Shape will be ⁠states$shape[1:]⁠.

See Also

Other mcmc_functions: mcmc_potential_scale_reduction(), mcmc_sample_annealed_importance_chain(), mcmc_sample_chain(), mcmc_sample_halton_sequence()


Runs one step of Hamiltonian Monte Carlo.

Description

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that takes a series of gradient-informed steps to produce a Metropolis proposal. This class implements one random HMC step from a given current_state. Mathematical details and derivations can be found in Neal (2011).

Usage

mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn,
  step_size,
  num_leapfrog_steps,
  state_gradients_are_stopped = FALSE,
  step_size_update_fn = NULL,
  seed = NULL,
  store_parameters_in_results = FALSE,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

step_size

Tensor or list of Tensors representing the step size for the leapfrog integrator. Must broadcast with the shape of current_state. Larger step sizes lead to faster progress, but too-large step sizes make rejection exponentially more likely. When possible, it's often helpful to match per-variable step sizes to the standard deviations of the target distribution in each variable.

num_leapfrog_steps

Integer number of steps to run the leapfrog integrator for. Total progress per HMC step is roughly proportional to step_size * num_leapfrog_steps.

state_gradients_are_stopped

logical indicating that the proposed new state be run through tf$stop_gradient. This is particularly useful when combining optimization over samples from the HMC chain. Default value: FALSE (i.e., do not apply stop_gradient).

step_size_update_fn

Function that takes the current step_size (typically a tf$Variable) and kernel_results (typically collections$namedtuple) and returns the updated step_size (Tensors). Default value: NULL (i.e., do not update step_size automatically).

seed

integer to seed the random number generator.

store_parameters_in_results

If TRUE, then step_size and num_leapfrog_steps are written to and read from eponymous fields in the kernel results objects returned from one_step and bootstrap_results. This allows wrapper kernels to adjust those parameters on the fly. This is incompatible with step_size_update_fn, which must be set to NULL.

name

string prefixed to Ops created by this function. Default value: NULL (i.e., 'hmc_kernel').

Details

The one_step function can update multiple chains in parallel. It assumes that all leftmost dimensions of current_state index independent chain states (and are therefore updated independently). The output of target_log_prob_fn(current_state) should sum log-probabilities across all event dimensions. Slices along the rightmost dimensions may have different target distributions; for example, ⁠current_state[0, :]⁠ could have a different target distribution from ⁠current_state[1, :]⁠. These semantics are governed by target_log_prob_fn(current_state). (The number of independent chains is tf$size(target_log_prob_fn(current_state)).)
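To make the roles of step_size and num_leapfrog_steps concrete, here is a minimal base-R sketch of a single-chain HMC step for a standard normal target. It is only an illustration under simplifying assumptions (unit mass matrix, hand-written gradient, hypothetical helper names such as hmc_step), not the TensorFlow Probability implementation.

```r
set.seed(2024)

# Unnormalized log-density of a standard normal target and its gradient
target_log_prob <- function(q) -0.5 * sum(q^2)
grad_log_prob <- function(q) -q

# One HMC step: sample momentum, run the leapfrog integrator, accept/reject
hmc_step <- function(q, step_size, num_leapfrog_steps) {
  p <- rnorm(length(q))
  q_new <- q
  # Initial half step for the momentum
  p_new <- p + 0.5 * step_size * grad_log_prob(q_new)
  for (l in seq_len(num_leapfrog_steps)) {
    # Full step for the position
    q_new <- q_new + step_size * p_new
    # Full step for the momentum, except after the last position update
    if (l < num_leapfrog_steps) {
      p_new <- p_new + step_size * grad_log_prob(q_new)
    }
  }
  # Final half step for the momentum
  p_new <- p_new + 0.5 * step_size * grad_log_prob(q_new)

  # Metropolis correction based on the change in total energy
  log_accept_ratio <- (target_log_prob(q_new) - 0.5 * sum(p_new^2)) -
    (target_log_prob(q) - 0.5 * sum(p^2))
  if (log(runif(1)) < log_accept_ratio) q_new else q
}

n_steps <- 2000
draws <- matrix(NA_real_, nrow = n_steps, ncol = 2)
q <- c(0, 0)
for (i in seq_len(n_steps)) {
  q <- hmc_step(q, step_size = 0.3, num_leapfrog_steps = 5)
  draws[i, ] <- q
}
colMeans(draws)
```

The distance covered per step is governed by step_size * num_leapfrog_steps (here 1.5), which is why the two arguments are usually tuned together.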

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Runs one step of Metropolis-adjusted Langevin algorithm.

Description

The Metropolis-adjusted Langevin algorithm (MALA) is a Markov chain Monte Carlo (MCMC) algorithm that takes a step of a discretised Langevin diffusion as a proposal. This class implements one step of MALA using the Euler-Maruyama method for a given current_state and diagonal preconditioning volatility matrix.

Usage

mcmc_metropolis_adjusted_langevin_algorithm(
  target_log_prob_fn,
  step_size,
  volatility_fn = NULL,
  seed = NULL,
  parallel_iterations = 10,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

step_size

Tensor or list of Tensors representing the step size for the Euler-Maruyama discretisation of the Langevin diffusion. Must broadcast with the shape of current_state. Larger step sizes lead to faster progress, but too-large step sizes make rejection exponentially more likely. When possible, it's often helpful to match per-variable step sizes to the standard deviations of the target distribution in each variable.

volatility_fn

function which takes an argument like current_state (or ⁠*current_state⁠ if it's a list) and returns volatility value at current_state. Should return a Tensor or list of Tensors that must broadcast with the shape of current_state. Defaults to the identity function.

seed

integer to seed the random number generator.

parallel_iterations

the number of coordinates for which the gradients of the volatility matrix volatility_fn can be computed in parallel.

name

String prefixed to Ops created by this function. Default value: NULL (i.e., 'mala_kernel').

Details

Mathematical details and derivations can be found in Roberts and Rosenthal (1998) and Xifara et al. (2013).

The one_step function can update multiple chains in parallel. It assumes that all leftmost dimensions of current_state index independent chain states (and are therefore updated independently). The output of target_log_prob_fn(current_state) should reduce log-probabilities across all event dimensions. Slices along the rightmost dimensions may have different target distributions; for example, current_state[0, :] could have a different target distribution from current_state[1, :]. These semantics are governed by target_log_prob_fn(current_state). (The number of independent chains is tf$size(target_log_prob_fn(current_state)).)
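The following base-R sketch shows one common textbook form of a MALA step for a scalar standard normal target with constant unit volatility. The drift scaling 0.5 * step_size and noise scaling sqrt(step_size) are assumptions of this sketch and may differ from how the package parameterizes step_size; helper names such as mala_step are hypothetical.

```r
set.seed(99)

# Unnormalized log-density of a standard normal target and its gradient
log_p <- function(x) -0.5 * x^2
grad_log_p <- function(x) -x

step_size <- 0.5

# Log-density of the Euler-Maruyama proposal from `from` to `to`
log_q <- function(to, from) {
  dnorm(to,
        mean = from + 0.5 * step_size * grad_log_p(from),
        sd = sqrt(step_size),
        log = TRUE)
}

# One MALA step: Langevin proposal followed by a Metropolis-Hastings correction
mala_step <- function(x) {
  prop <- x + 0.5 * step_size * grad_log_p(x) + sqrt(step_size) * rnorm(1)
  log_accept_ratio <- log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
  if (log(runif(1)) < log_accept_ratio) prop else x
}

n_steps <- 5000
draws <- numeric(n_steps)
x <- 0
for (i in seq_len(n_steps)) {
  x <- mala_step(x)
  draws[i] <- x
}
c(mean = mean(draws), var = var(draws))
```

Because the proposal is not symmetric, the correction term log_q(x, prop) - log_q(prop, x) is needed for the chain to target the right distribution.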

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Runs one step of the Metropolis-Hastings algorithm.

Description

The Metropolis-Hastings algorithm is a Markov chain Monte Carlo (MCMC) technique which uses a proposal distribution to eventually sample from a target distribution.

Usage

mcmc_metropolis_hastings(inner_kernel, seed = NULL, name = NULL)

Arguments

inner_kernel

TransitionKernel-like object which has collections$namedtuple kernel_results and which contains a target_log_prob member and optionally a log_acceptance_correction member.

seed

integer to seed the random number generator.

name

string prefixed to Ops created by this function. Default value: NULL (i.e., "mh_kernel").

Details

Note: inner_kernel$one_step must return kernel_results as a collections$namedtuple which must:

  • have a target_log_prob field,

  • optionally have a log_acceptance_correction field, and,

  • have only fields which are Tensor-valued.

The Metropolis-Hastings log acceptance-probability is computed as:

log_accept_ratio = (current_kernel_results.target_log_prob
                   - previous_kernel_results.target_log_prob
                   + current_kernel_results.log_acceptance_correction)

If current_kernel_results$log_acceptance_correction does not exist, it is presumed 0 (i.e., that the proposal distribution is symmetric). The most common use-case for log_acceptance_correction is in the Metropolis-Hastings algorithm, i.e.,

accept_prob(x' | x) = p(x') / p(x) (g(x|x') / g(x'|x))
where,
p  represents the target distribution,
g  represents the proposal (conditional) distribution,
x' is the proposed state, and,
x  is current state

The log of the parenthetical term is the log_acceptance_correction. The log_acceptance_correction may not necessarily correspond to the ratio of proposal distributions, e.g., log_acceptance_correction has a different interpretation in Hamiltonian Monte Carlo.
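As a small worked example of the correction term, the base-R sketch below evaluates the log acceptance ratio for an asymmetric proposal. The Gamma target, the log-normal random-walk proposal, and the two states are all assumptions chosen for illustration; for this particular proposal the ratio g(x|x') / g(x'|x) simplifies to x' / x.

```r
# Unnormalized log-density of a Gamma(shape = 3, rate = 1) target (x > 0)
log_p <- function(x) 2 * log(x) - x

# Asymmetric proposal: x' ~ LogNormal(log(x), s)
s <- 0.5
log_g <- function(to, from) dlnorm(to, meanlog = log(from), sdlog = s, log = TRUE)

x <- 2.0      # current state
x_new <- 3.5  # proposed state

# log( g(x | x') / g(x' | x) ), the log of the parenthetical term
log_acceptance_correction <- log_g(x, x_new) - log_g(x_new, x)

# Same structure as the log_accept_ratio expression given earlier
log_accept_ratio <- log_p(x_new) - log_p(x) + log_acceptance_correction

# The acceptance probability is capped at 1
accept_prob <- min(1, exp(log_accept_ratio))
```

With a symmetric proposal, log_acceptance_correction would be 0 and the ratio would reduce to the difference of target log-densities.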

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Runs one step of the No U-Turn Sampler

Description

The No U-Turn Sampler (NUTS) is an adaptive variant of the Hamiltonian Monte Carlo (HMC) method for MCMC. NUTS adapts the distance traveled in response to the curvature of the target density. Conceptually, one proposal consists of reversibly evolving a trajectory through the sample space, continuing until that trajectory turns back on itself (hence the name, 'No U-Turn'). This class implements one random NUTS step from a given current_state. Mathematical details and derivations can be found in Hoffman & Gelman (2011).

Usage

mcmc_no_u_turn_sampler(
  target_log_prob_fn,
  step_size,
  max_tree_depth = 10,
  max_energy_diff = 1000,
  unrolled_leapfrog_steps = 1,
  seed = NULL,
  name = NULL
)

Arguments

target_log_prob_fn

function which takes an argument like current_state and returns its (possibly unnormalized) log-density under the target distribution.

step_size

Tensor or list of Tensors representing the step size for the leapfrog integrator. Must broadcast with the shape of current_state. Larger step sizes lead to faster progress, but too-large step sizes make rejection exponentially more likely. When possible, it's often helpful to match per-variable step sizes to the standard deviations of the target distribution in each variable.

max_tree_depth

Maximum depth of the tree implicitly built by NUTS. The maximum number of leapfrog steps is bounded by 2**max_tree_depth i.e. the number of nodes in a binary tree max_tree_depth nodes deep. The default setting of 10 takes up to 1024 leapfrog steps.

max_energy_diff

Scalar threshold of energy differences at each leapfrog step. Divergent samples are defined as leapfrog steps that exceed this threshold. Defaults to 1000.

unrolled_leapfrog_steps

The number of leapfrogs to unroll per tree expansion step. Applies a direct linear multiplier to the maximum trajectory length implied by max_tree_depth. Defaults to 1.

seed

integer to seed the random number generator.

name

name prefixed to Ops created by this function. Default value: NULL (i.e., 'nuts_kernel').
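The interaction between max_tree_depth and unrolled_leapfrog_steps described above can be checked with a line of arithmetic. The helper below is hypothetical and only restates the bound from the argument descriptions.

```r
# Hypothetical helper: upper bound on leapfrog steps per NUTS step
max_leapfrogs <- function(max_tree_depth = 10, unrolled_leapfrog_steps = 1) {
  unrolled_leapfrog_steps * 2^max_tree_depth
}

default_bound <- max_leapfrogs()     # default settings
shallow_bound <- max_leapfrogs(8, 2) # shallower tree, two leapfrogs per expansion
```

Lowering max_tree_depth is therefore the most direct way to cap the cost of a single step, at the risk of terminating trajectories early.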

Details

The one_step function can update multiple chains in parallel. It assumes that a prefix of leftmost dimensions of current_state index independent chain states (and are therefore updated independently). The output of target_log_prob_fn(current_state) should sum log-probabilities across all event dimensions. Slices along the rightmost dimensions may have different target distributions; for example, current_state[0][0, ...] could have a different target distribution from current_state[0][1, ...]. These semantics are governed by ⁠target_log_prob_fn(*current_state)⁠. (The number of independent chains is tf$size(target_log_prob_fn(current_state)).)

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()

Examples

library(tensorflow)
library(tfprobability)

predictors <- tf$cast(c(201,244,47,287,203,58,210,202,198,158,165,201,157,
  131,166,160,186,125,218,146), tf$float32)
obs <- tf$cast(c(592,401,583,402,495,173,479,504,510,416,393,442,317,311,400,
  337,423,334,533,344), tf$float32)
y_sigma <- tf$cast(c(61,25,38,15,21,15,27,14,30,16,14,25,52,16,34,31,42,26,
  16,22), tf$float32)

# Robust linear regression model
robust_lm <- tfd_joint_distribution_sequential(
  list(
    tfd_normal(loc = 0, scale = 1, name = "b0"),
    tfd_normal(loc = 0, scale = 1, name = "b1"),
    tfd_half_normal(5, name = "df"),
    function(df, b1, b0)
      tfd_independent(
        tfd_student_t(
          # Likelihood
          df = tf$expand_dims(df, axis = -1L),
          loc = tf$expand_dims(b0, axis = -1L) +
            tf$expand_dims(b1, axis = -1L) * predictors[tf$newaxis, ],
          scale = y_sigma,
          name = "st"
        ),
        name = "ind"
      )
  ),
  validate_args = TRUE
)

# Joint log-density of the parameters and the observed data
log_prob <- function(b0, b1, df) {
  robust_lm %>% tfd_log_prob(list(b0, b1, df, obs))
}

# One initial step size per parameter
step_size0 <- Map(function(x) tf$cast(x, tf$float32), c(1, .2, .5))

number_of_steps <- 10
burnin <- 5
nchain <- 50

run_chain <- function() {
  # random initialization of the starting position of each chain
  samples <- robust_lm %>% tfd_sample(nchain)
  b0 <- samples[[1]]
  b1 <- samples[[2]]
  df <- samples[[3]]

  # bijectors to map constrained parameters to the real line
  unconstraining_bijectors <- list(
    tfb_identity(), tfb_identity(), tfb_exp())

  # record the step size and log acceptance ratio at each step
  trace_fn <- function(x, pkr) {
    list(pkr$inner_results$inner_results$step_size,
         pkr$inner_results$inner_results$log_accept_ratio)
  }

  nuts <- mcmc_no_u_turn_sampler(
    target_log_prob_fn = log_prob,
    step_size = step_size0
  ) %>%
    mcmc_transformed_transition_kernel(bijector = unconstraining_bijectors) %>%
    mcmc_dual_averaging_step_size_adaptation(
      num_adaptation_steps = burnin,
      step_size_setter_fn = function(pkr, new_step_size)
        pkr$`_replace`(
          inner_results = pkr$inner_results$`_replace`(step_size = new_step_size)),
      step_size_getter_fn = function(pkr) pkr$inner_results$step_size,
      log_accept_prob_getter_fn = function(pkr) pkr$inner_results$log_accept_ratio
    )

  nuts %>% mcmc_sample_chain(
    num_results = number_of_steps,
    num_burnin_steps = burnin,
    current_state = list(b0, b1, df),
    trace_fn = trace_fn)
}

# compile the sampling loop into a TensorFlow graph before running it
run_chain <- tensorflow::tf_function(run_chain)
res <- run_chain()

Gelman and Rubin (1992)'s potential scale reduction for chain convergence.

Description

Given N > 1 states from each of C > 1 independent chains, the potential scale reduction factor, commonly referred to as R-hat, measures convergence of the chains (to the same target) by testing for equality of means.

Usage

mcmc_potential_scale_reduction(
  chains_states,
  independent_chain_ndims = 1,
  name = NULL
)

Arguments

chains_states

Tensor or list of Tensors representing the state(s) of a Markov Chain at each result step. The ith state is assumed to have shape ⁠[Ni, Ci1, Ci2,...,CiD] + A⁠. Dimension 0 indexes the Ni > 1 result steps of the Markov Chain. Dimensions 1 through D index the ⁠Ci1 x ... x CiD⁠ independent chains to be tested for convergence to the same target. The remaining dimensions, A, can have any shape (even empty).

independent_chain_ndims

Integer type Tensor with value >= 1 giving the number of dimensions, from dim = 1 to dim = D, holding independent chain results to be tested for convergence.

name

name to prepend to created ops. Default: potential_scale_reduction.

Details

Specifically, R-hat measures the degree to which variance (of the means) between chains exceeds what one would expect if the chains were identically distributed. See Gelman and Rubin (1992) and Brooks and Gelman (1998).

Some guidelines:

  • The initial state of the chains should be drawn from a distribution overdispersed with respect to the target.

  • If all chains converge to the target, then as N --> infinity, R-hat --> 1. Before that, R-hat > 1 (except in pathological cases, e.g. if the chain paths were identical).

  • The above holds for any number of chains C > 1. Increasing C improves effectiveness of the diagnostic.

  • Sometimes, R-hat < 1.2 is used to indicate approximate convergence, but of course this is problem dependent. See Brooks and Gelman (1998).

  • R-hat only measures non-convergence of the mean. If higher moments, or other statistics are desired, a different diagnostic should be used. See Brooks and Gelman (1998).

To see why R-hat is reasonable, let X be a random variable drawn uniformly from the combined states (combined over all chains). Then, in the limit N, C --> infinity, with E, Var denoting expectation and variance, R-hat = ( E[Var[X | chain]] + Var[E[X | chain]] ) / E[Var[X | chain]]. Using the law of total variance, the numerator is the variance of the combined states, and the denominator is the total variance minus the variance of the individual chain means. If the chains are all drawing from the same distribution, they will have the same mean, and thus the ratio should be one.
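The limiting expression above can be evaluated directly on simulated chains. The base-R sketch below is an illustration only: it plugs sample estimates into the ratio and omits any finite-sample corrections an actual implementation may apply, and the simulated chains are made up.

```r
set.seed(42)
n <- 1000       # result steps per chain
n_chains <- 4   # independent chains

# Rows index the N result steps, columns index the C chains
chains <- matrix(rnorm(n * n_chains), nrow = n, ncol = n_chains)

# Plug-in version of (E[Var[X | chain]] + Var[E[X | chain]]) / E[Var[X | chain]]
r_hat_limit <- function(chains) {
  within_var <- mean(apply(chains, 2, var))  # estimates E[Var[X | chain]]
  between_var <- var(colMeans(chains))       # estimates Var[E[X | chain]]
  (within_var + between_var) / within_var
}

# All chains target the same distribution: the ratio is close to 1
r_hat_good <- r_hat_limit(chains)

# One chain is stuck around a different mean: the ratio is clearly above 1
shifted <- chains
shifted[, n_chains] <- shifted[, n_chains] + 3
r_hat_bad <- r_hat_limit(shifted)

c(good = r_hat_good, bad = r_hat_bad)
```

The second case shows how a single chain with a different mean inflates the between-chain term and pushes the ratio past the commonly used 1.2 threshold.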

Value

Tensor or list of Tensors representing the R-hat statistic for the state(s). Same dtype as chains_states, and shape equal to chains_states$shape[1 + independent_chain_ndims:].

References

  • Stephen P. Brooks and Andrew Gelman. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics, 7(4), 1998.

  • Andrew Gelman and Donald B. Rubin. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4):457-472, 1992.

See Also

Other mcmc_functions: mcmc_effective_sample_size(), mcmc_sample_annealed_importance_chain(), mcmc_sample_chain(), mcmc_sample_halton_sequence()


Runs one step of the RWM algorithm with symmetric proposal.

Description

Random Walk Metropolis is a gradient-free Markov chain Monte Carlo (MCMC) algorithm. The algorithm involves a proposal-generating step, proposal_state = current_state + perturb, where perturb is a random perturbation, followed by a Metropolis-Hastings accept/reject step. For more details see Section 2.1 of Roberts and Rosenthal (2004).

Usage

mcmc_random_walk_metropolis(
  target_log_prob_fn,
  new_state_fn = NULL,
  seed = NULL,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list, current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

new_state_fn

Function which takes a list of state parts and a seed; returns a same-type list of Tensors, each being a perturbation of the input state parts. The perturbation distribution is assumed to be a symmetric distribution centered at the input state part. Default value: NULL which is mapped to tfp$mcmc$random_walk_normal_fn().

seed

integer to seed the random number generator.

name

String name prefixed to Ops created by this function. Default value: NULL (i.e., 'rwm_kernel').

Details

The current class implements RWM for normal and uniform proposals. Alternatively, the user can supply any custom proposal generating function. The function one_step can update multiple chains in parallel. It assumes that all leftmost dimensions of current_state index independent chain states (and are therefore updated independently). The output of target_log_prob_fn(current_state) should sum log-probabilities across all event dimensions. Slices along the rightmost dimensions may have different target distributions; for example, ⁠current_state[0, :]⁠ could have a different target distribution from ⁠current_state[1, :]⁠. These semantics are governed by target_log_prob_fn(current_state). (The number of independent chains is tf$size(target_log_prob_fn(current_state)).)
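The proposal and accept/reject logic can be sketched in a few lines of base R. This is only an illustration of the algorithm for a single chain with a normal proposal; the helper rwm_step and its scale argument are hypothetical and are not part of the package.

```r
# One Random Walk Metropolis step with a symmetric normal proposal.
rwm_step <- function(current_state, target_log_prob_fn, scale = 1) {
  # Symmetric perturbation centered at the current state.
  proposal_state <- current_state + rnorm(length(current_state), sd = scale)
  # With a symmetric proposal the acceptance ratio only involves the target.
  log_accept_ratio <- target_log_prob_fn(proposal_state) -
    target_log_prob_fn(current_state)
  if (log(runif(1)) < log_accept_ratio) proposal_state else current_state
}

set.seed(1)
# Log-density of a normal target with mean 2 and standard deviation 1.
target_log_prob_fn <- function(x) sum(dnorm(x, mean = 2, sd = 1, log = TRUE))

draws <- numeric(5000)
state <- 0
for (i in seq_along(draws)) {
  state <- rwm_step(state, target_log_prob_fn)
  draws[i] <- state
}
kept <- draws[-(1:500)]  # drop the first 500 draws as burn-in
```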

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Runs one step of the Replica Exchange Monte Carlo

Description

Replica Exchange Monte Carlo is a Markov chain Monte Carlo (MCMC) algorithm that is also known as Parallel Tempering. The algorithm runs several samplers (replicas) in parallel at different temperatures and exchanges their states according to the Metropolis-Hastings criterion. The K replicas are parameterized in terms of inverse temperatures, (beta[0], beta[1], ..., beta[K-1]). If the target distribution has probability density p(x), the kth replica has density p(x)**beta_k.

Usage

mcmc_replica_exchange_mc(
  target_log_prob_fn,
  inverse_temperatures,
  make_kernel_fn,
  swap_proposal_fn = tfp$mcmc$replica_exchange_mc$default_swap_proposal_fn(1),
  state_includes_replicas = FALSE,
  seed = NULL,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

inverse_temperatures

⁠1D⁠ Tensor of inverse temperatures to perform samplings with each replica. Must have statically known shape. inverse_temperatures[0] produces the states returned by samplers, and is typically == 1.

make_kernel_fn

Function which takes target_log_prob_fn and seed args and returns a TransitionKernel instance.

swap_proposal_fn

Function which takes the number of replicas and returns combinations of replicas for exchange.

state_includes_replicas

Boolean indicating whether the leftmost dimension of each state sample should index replicas. If TRUE, the leftmost dimension of the current_state argument to mcmc_sample_chain() will be interpreted as indexing replicas.

seed

integer to seed the random number generator.

name

string prefixed to Ops created by this function. Default value: NULL (i.e., "remc_kernel").

Details

Typically beta[0] = 1.0, and ⁠1.0 > beta[1] > beta[2] > ... > 0.0⁠.

  • beta[0] == 1 ==> The first replica samples from the target density, p.

  • beta[k] < 1, for ⁠k = 1, ..., K-1⁠ ==> Other replicas sample from "flattened" versions of p (peak is less high, valley less low). These distributions are somewhat closer to a uniform on the support of p. Samples from adjacent replicas i, i + 1 are used as proposals for each other in a Metropolis step. This allows the lower beta samples, which explore less dense areas of p, to occasionally be used to help the beta == 1 chain explore new regions of the support. Samples from replica 0 are returned, and the others are discarded.
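The Metropolis step for exchanging two replicas follows from the tempered densities p(x)**beta_i and p(x)**beta_j. The base R sketch below writes out the resulting log acceptance ratio; the function name log_swap_accept_ratio is hypothetical and the snippet is not the package's implementation.

```r
# Log acceptance ratio for swapping the states of replicas i and j.
# Joint density before the swap: p(x_i)^beta_i * p(x_j)^beta_j
# Joint density after the swap:  p(x_j)^beta_i * p(x_i)^beta_j
log_swap_accept_ratio <- function(beta_i, beta_j, log_p_xi, log_p_xj) {
  (beta_i - beta_j) * (log_p_xj - log_p_xi)
}

log_p <- function(x) dnorm(x, log = TRUE)  # standard normal target

# The beta = 1 replica sits in a low-density region (x = 3) while the
# beta = 0.2 replica has found the mode (x = 0): the swap is favorable.
good_swap <- log_swap_accept_ratio(1.0, 0.2, log_p(3), log_p(0))
# The reverse situation: the swap would move the beta = 1 replica away
# from the mode, so the ratio is negative and the swap is often rejected.
bad_swap <- log_swap_accept_ratio(1.0, 0.2, log_p(0), log_p(3))
```

A swap is accepted with probability min(1, exp(ratio)), so a positive log ratio is always accepted.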

Value

list of next_state (Tensor or Python list of Tensors representing the state(s) of the Markov chain(s) at each result step. Has same shape as current_state.) and kernel_results (collections$namedtuple of internal calculations used to advance the chain).

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Runs annealed importance sampling (AIS) to estimate normalizing constants.

Description

This function uses an MCMC transition operator (e.g., Hamiltonian Monte Carlo) to sample from a series of distributions that slowly interpolates between an initial "proposal" distribution: exp(proposal_log_prob_fn(x) - proposal_log_normalizer) and the target distribution: exp(target_log_prob_fn(x) - target_log_normalizer), accumulating importance weights along the way. The product of these importance weights gives an unbiased estimate of the ratio of the normalizing constants of the initial distribution and the target distribution: E[exp(ais_weights)] = exp(target_log_normalizer - proposal_log_normalizer).
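To make the weight accumulation concrete, here is a small base R sketch of annealed importance sampling on a one-dimensional problem. It assumes a geometric interpolation between proposal and target and uses a single random-walk Metropolis step per temperature instead of HMC; the names log_q, log_p and ais_log_weights are invented for this example.

```r
set.seed(1)
log_q <- function(x) dnorm(x, 0, 1, log = TRUE)  # normalized proposal
log_p <- function(x) -0.5 * (x / 2)^2            # unnormalized target, Z = 2 * sqrt(2 * pi)

ais_log_weights <- function(n_chains, n_steps) {
  betas <- seq(0, 1, length.out = n_steps + 1)
  x <- rnorm(n_chains)          # exact draws from the proposal
  log_w <- numeric(n_chains)
  for (k in seq_len(n_steps)) {
    # Importance weight: ratio of consecutive intermediate densities at x.
    log_w <- log_w + (betas[k + 1] - betas[k]) * (log_p(x) - log_q(x))
    # One Metropolis step that leaves the new intermediate density invariant.
    log_f <- function(z) (1 - betas[k + 1]) * log_q(z) + betas[k + 1] * log_p(z)
    proposal <- x + rnorm(n_chains)
    accept <- log(runif(n_chains)) < log_f(proposal) - log_f(x)
    x <- ifelse(accept, proposal, x)
  }
  log_w
}

log_w <- ais_log_weights(n_chains = 2000, n_steps = 100)
log_z_hat  <- log(mean(exp(log_w)))  # estimate of the log normalizer ratio
log_z_true <- log(2 * sqrt(2 * pi))  # proposal is normalized, so its log normalizer is 0
```

Averaging exp(log_w) over chains estimates the ratio of normalizing constants, which here equals the target's normalizer.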

Usage

mcmc_sample_annealed_importance_chain(
  num_steps,
  proposal_log_prob_fn,
  target_log_prob_fn,
  current_state,
  make_kernel_fn,
  parallel_iterations = 10,
  name = NULL
)

Arguments

num_steps

Integer number of Markov chain updates to run. More iterations means more expense, but smoother annealing between q and p, which in turn means exponentially lower variance for the normalizing constant estimator.

proposal_log_prob_fn

function that returns the log density of the initial distribution.

target_log_prob_fn

function which takes an argument like current_state and returns its (possibly unnormalized) log-density under the target distribution.

current_state

Tensor or list of Tensors representing the current state(s) of the Markov chain(s). The first r dimensions index independent chains, r = tf$rank(target_log_prob_fn(current_state)).

make_kernel_fn

function which returns a TransitionKernel-like object. Must take one argument representing the TransitionKernel's target_log_prob_fn. The target_log_prob_fn argument represents the TransitionKernel's target log distribution. Note: sample_annealed_importance_chain creates a new target_log_prob_fn which is an interpolation between the supplied target_log_prob_fn and proposal_log_prob_fn; it is this interpolated function which is used as an argument to make_kernel_fn.

parallel_iterations

The number of iterations allowed to run in parallel. It must be a positive integer. See tf$while_loop for more details.

name

string prefixed to Ops created by this function. Default value: NULL (i.e., "sample_annealed_importance_chain").

Details

Note: When running in graph mode, proposal_log_prob_fn and target_log_prob_fn are called exactly three times (although this may be reduced to two times in the future).

Value

list of next_state (Tensor or Python list of Tensors representing the state(s) of the Markov chain(s) at the final iteration. Has same shape as input current_state), ais_weights (Tensor with the estimated weight(s). Has shape matching target_log_prob_fn(current_state)), and kernel_results (collections.namedtuple of internal calculations used to advance the chain).

See Also

For an example of how to use it, see mcmc_sample_chain().

Other mcmc_functions: mcmc_effective_sample_size(), mcmc_potential_scale_reduction(), mcmc_sample_chain(), mcmc_sample_halton_sequence()


Implements Markov chain Monte Carlo via repeated TransitionKernel steps.

Description

This function samples from a Markov chain started at current_state whose stationary distribution is governed by the supplied TransitionKernel instance (kernel).

Usage

mcmc_sample_chain(
  kernel = NULL,
  num_results,
  current_state,
  previous_kernel_results = NULL,
  num_burnin_steps = 0,
  num_steps_between_results = 0,
  trace_fn = NULL,
  return_final_kernel_results = FALSE,
  parallel_iterations = 10,
  seed = NULL,
  name = NULL
)

Arguments

kernel

An instance of tfp$mcmc$TransitionKernel which implements one step of the Markov chain.

num_results

Integer number of Markov chain draws.

current_state

Tensor or list of Tensors representing the current state(s) of the Markov chain(s).

previous_kernel_results

A Tensor or a nested collection of Tensors representing internal calculations made within the previous call to this function (or as returned by bootstrap_results).

num_burnin_steps

Integer number of chain steps to take before starting to collect results. Default value: 0 (i.e., no burn-in).

num_steps_between_results

Integer number of chain steps between collecting a result. Only one out of every num_steps_between_results + 1 steps is included in the returned results. The number of returned chain states is still equal to num_results. Default value: 0 (i.e., no thinning).

trace_fn

A function that takes in the current chain state and the previous kernel results and returns a Tensor or a nested collection of Tensors that is then traced along with the chain state.

return_final_kernel_results

If TRUE, then the final kernel results are returned alongside the chain state and the trace specified by the trace_fn.

parallel_iterations

The number of iterations allowed to run in parallel. It must be a positive integer. See tf$while_loop for more details.

seed

Optional, a seed for reproducible sampling.

name

string prefixed to Ops created by this function. Default value: NULL, (i.e., "mcmc_sample_chain").

Details

This function can sample from multiple chains, in parallel. (Whether or not there are multiple chains is dictated by the kernel.)

The current_state can be represented as a single Tensor or a list of Tensors which collectively represent the current state. Since MCMC states are correlated, it is sometimes desirable to produce additional intermediate states, and then discard them, ending up with a set of states with decreased autocorrelation. See Owen (2017). Such "thinning" is made possible by setting num_steps_between_results > 0. The chain then takes num_steps_between_results extra steps between the steps that make it into the results. The extra steps are never materialized (in calls to sess$run), and thus do not increase memory requirements.
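The effect of thinning on autocorrelation can be seen on a toy chain in base R. The AR(1) process below merely stands in for correlated MCMC output, and the helper lag1 is made up for this sketch; only the name num_steps_between_results is taken from the argument list above.

```r
set.seed(7)
n <- 20000
chain <- numeric(n)
# AR(1) process with coefficient 0.9 as a stand-in for a correlated chain.
for (t in 2:n) chain[t] <- 0.9 * chain[t - 1] + rnorm(1)

# Lag-1 autocorrelation of a sequence.
lag1 <- function(x) cor(x[-1], x[-length(x)])

# Keep one state out of every num_steps_between_results + 1.
num_steps_between_results <- 9
kept <- chain[seq(1, n, by = num_steps_between_results + 1)]

lag1_full    <- lag1(chain)  # theoretical value: 0.9
lag1_thinned <- lag1(kept)   # theoretical value: 0.9^10
```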

Warning: when setting a seed in the kernel, ensure that sample_chain's parallel_iterations=1, otherwise results will not be reproducible.

In addition to returning the chain state, this function supports tracing of auxiliary variables used by the kernel. The traced values are selected by specifying trace_fn. By default, all kernel results are traced but in the future the default will be changed to no results being traced, so plan accordingly. See below for some examples of this feature.

Value

list of:

  • checkpointable_states_and_trace: if return_final_kernel_results is TRUE. The return value is an instance of CheckpointableStatesAndTrace.

  • all_states: if return_final_kernel_results is FALSE and trace_fn is NULL. The return value is a Tensor or Python list of Tensors representing the state(s) of the Markov chain(s) at each result step. Has same shape as input current_state but with a prepended num_results-size dimension.

  • states_and_trace: if return_final_kernel_results is FALSE and trace_fn is not NULL. The return value is an instance of StatesAndTrace.

See Also

Other mcmc_functions: mcmc_effective_sample_size(), mcmc_potential_scale_reduction(), mcmc_sample_annealed_importance_chain(), mcmc_sample_halton_sequence()

Examples

library(tfprobability)

# Target: a 10-dimensional Gaussian with a diagonal scale
dims <- 10
true_stddev <- sqrt(seq(1, 3, length.out = dims))
likelihood <- tfd_multivariate_normal_diag(scale_diag = true_stddev)

# HMC kernel targeting the Gaussian
kernel <- mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn = likelihood$log_prob,
  step_size = 0.5,
  num_leapfrog_steps = 2
)

# Draw 1000 states after 500 burn-in steps; trace_fn = NULL returns only the states
states <- kernel %>% mcmc_sample_chain(
  num_results = 1000,
  num_burnin_steps = 500,
  current_state = rep(0, dims),
  trace_fn = NULL
)

# Per-dimension sample mean and standard deviation
sample_mean <- tf$reduce_mean(states, axis = 0L)
sample_stddev <- tf$sqrt(
  tf$reduce_mean(tf$math$squared_difference(states, sample_mean), axis = 0L))


Returns a sample from the dim-dimensional Halton sequence.

Description

Warning: The sequence elements take values only between 0 and 1. Care must be taken to appropriately transform the domain of a function if it differs from the unit cube before evaluating integrals using Halton samples. It is also important to remember that quasi-random numbers without randomization are not a replacement for pseudo-random numbers in every context. Quasi-random numbers are completely deterministic and typically have significant negative autocorrelation unless randomization is used.

Usage

mcmc_sample_halton_sequence(
  dim,
  num_results = NULL,
  sequence_indices = NULL,
  dtype = tf$float32,
  randomized = TRUE,
  seed = NULL,
  name = NULL
)

Arguments

dim

Positive integer representing each sample's event_size. Must not be greater than 1000.

num_results

(Optional) Positive scalar Tensor of dtype int32. The number of samples to generate. Either this parameter or sequence_indices must be specified but not both. If this parameter is NULL, then the behaviour is determined by the sequence_indices. Default value: NULL.

sequence_indices

(Optional) Tensor of dtype int32 and rank 1. The elements of the sequence to compute specified by their position in the sequence. The entries index into the Halton sequence starting with 0 and hence, must be whole numbers. For example, sequence_indices=[0, 5, 6] will produce the first, sixth and seventh elements of the sequence. If this parameter is NULL, then the num_results parameter must be specified which gives the number of desired samples starting from the first sample. Default value: NULL.

dtype

(Optional) The dtype of the sample. One of: float16, float32 or float64. Default value: tf$float32.

randomized

(Optional) bool indicating whether to produce a randomized Halton sequence. If TRUE, applies the randomization described in Owen (2017). Default value: TRUE.

seed

(Optional) integer to seed the random number generator. Only used if randomized is TRUE. If not supplied and randomized is TRUE, no seed is set. Default value: NULL.

name

(Optional) string describing ops managed by this function. If not supplied the name of this function is used. Default value: "sample_halton_sequence".

Details

Computes the members of the low discrepancy Halton sequence in dimension dim. The dim-dimensional sequence takes values in the unit hypercube in dim dimensions. Currently, only dimensions up to 1000 are supported. The prime base for the k-th axis is the k-th prime starting from 2. For example, if dim = 3, then the bases will be [2, 3, 5] respectively and the first element of the non-randomized sequence will be: [0.5, 0.333, 0.2].

If randomized is TRUE, this function produces a scrambled version of the Halton sequence introduced by Owen (2017).

The number of samples produced is controlled by the num_results and sequence_indices parameters. The user must supply either num_results or sequence_indices but not both. The former is the number of samples to produce starting from the first element. If sequence_indices is given instead, the specified elements of the sequence are generated. For example, sequence_indices=tf$range(10) is equivalent to specifying num_results = 10.
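For intuition, the non-randomized sequence can be computed in base R with the radical inverse function. The helpers radical_inverse and halton below are hypothetical names for this sketch and are independent of the package; they assume that the element at sequence index 0 corresponds to the radical inverse of 1, which matches the first element quoted above.

```r
# Radical inverse of the positive integer i in the given base:
# write i in that base and mirror its digits around the decimal point.
radical_inverse <- function(i, base) {
  result <- 0
  f <- 1 / base
  while (i > 0) {
    result <- result + f * (i %% base)
    i <- i %/% base
    f <- f / base
  }
  result
}

# First num_results elements of the non-randomized Halton sequence.
# Row r holds the radical inverses of r, one column per prime base.
halton <- function(num_results, bases = c(2, 3, 5)) {
  t(sapply(seq_len(num_results), function(i) {
    sapply(bases, function(b) radical_inverse(i, b))
  }))
}

points <- halton(4)
first_element <- points[1, ]  # 1/2, 1/3, 1/5
```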

Value

halton_elements Elements of the Halton sequence. Tensor of supplied dtype and shape ⁠[num_results, dim]⁠ if num_results was specified or shape ⁠[s, dim]⁠ where s is the size of sequence_indices if sequence_indices were specified.

See Also

For an example of how to use it, see mcmc_sample_chain().

Other mcmc_functions: mcmc_effective_sample_size(), mcmc_potential_scale_reduction(), mcmc_sample_annealed_importance_chain(), mcmc_sample_chain()


Adapts the inner kernel's step_size based on log_accept_prob.

Description

The simple policy multiplicatively increases or decreases the step_size of the inner kernel based on the value of log_accept_prob. It is based on equation 19 of Andrieu and Thoms (2008). Given enough steps and small enough adaptation_rate the median of the distribution of the acceptance probability will converge to the target_accept_prob. A good target acceptance probability depends on the inner kernel. If this kernel is HamiltonianMonteCarlo, then 0.6-0.9 is a good range to aim for. For RandomWalkMetropolis this should be closer to 0.25. See the individual kernels' docstrings for guidance.
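A multiplicative update of this kind can be sketched in base R as follows. The exact rule used internally is an assumption here: the sketch grows the step size by a factor 1 + adaptation_rate when the acceptance probability exceeds the target and shrinks it by the same factor otherwise. The function name adapt_step_size is hypothetical.

```r
# Assumed multiplicative update: increase the step size when proposals are
# accepted more often than desired, decrease it otherwise.
adapt_step_size <- function(step_size, log_accept_prob,
                            target_accept_prob = 0.75,
                            adaptation_rate = 0.01) {
  ifelse(log_accept_prob > log(target_accept_prob),
         step_size * (1 + adaptation_rate),
         step_size / (1 + adaptation_rate))
}

grown  <- adapt_step_size(0.1, log(0.9))  # acceptance above target
shrunk <- adapt_step_size(0.1, log(0.5))  # acceptance below target
```

Because the update only looks at whether the acceptance probability is above or below the target, repeated application balances the two cases, which is consistent with the statement about the median above.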

Usage

mcmc_simple_step_size_adaptation(
  inner_kernel,
  num_adaptation_steps,
  target_accept_prob = 0.75,
  adaptation_rate = 0.01,
  step_size_setter_fn = NULL,
  step_size_getter_fn = NULL,
  log_accept_prob_getter_fn = NULL,
  validate_args = FALSE,
  name = NULL
)

Arguments

inner_kernel

TransitionKernel-like object.

num_adaptation_steps

Scalar integer Tensor number of initial steps during which to adjust the step size. This may be greater than, less than, or equal to the number of burnin steps.

target_accept_prob

A floating point Tensor representing desired acceptance probability. Must be a positive number less than 1. This can either be a scalar, or have shape list(num_chains). Default value: 0.75 (the center of asymptotically optimal rate for HMC).

adaptation_rate

Tensor representing amount to scale the current step_size.

step_size_setter_fn

A function with the signature ⁠(kernel_results, new_step_size) -> new_kernel_results⁠ where kernel_results are the results of the inner_kernel, new_step_size is a Tensor or a nested collection of Tensors with the same structure as returned by the step_size_getter_fn, and new_kernel_results are a copy of kernel_results with the step size(s) set.

step_size_getter_fn

A function with the signature (kernel_results) -> step_size where kernel_results are the results of the inner_kernel, and step_size is a floating point Tensor or a nested collection of such Tensors.

log_accept_prob_getter_fn

A function with the signature (kernel_results) -> log_accept_prob where kernel_results are the results of the inner_kernel, and log_accept_prob is a floating point Tensor. log_accept_prob can either be a scalar, or have shape list(num_chains). If it's the latter, step_size should also have the same leading dimension.

validate_args

Logical. When TRUE, kernel parameters are checked for validity. When FALSE, invalid inputs may silently render incorrect outputs.

name

string prefixed to Ops created by this class. Default: "simple_step_size_adaptation".

Details

In general, adaptation prevents the chain from reaching a stationary distribution, so obtaining consistent samples requires num_adaptation_steps be set to a value somewhat smaller than the number of burnin steps. However, it may sometimes be helpful to set num_adaptation_steps to a larger value during development in order to inspect the behavior of the chain during adaptation.

The step size is assumed to broadcast with the chain state, potentially having leading dimensions corresponding to multiple chains. When there are fewer of those leading dimensions than there are chain dimensions, the corresponding dimensions in the log_accept_prob are averaged (in the direct space, rather than the log space) before being used to adjust the step size. This means that this kernel can do both cross-chain adaptation, or per-chain step size adaptation, depending on the shape of the step size.

For example, if your problem has a state with shape [S], your chain state has shape [C0, C1, S] (meaning that there are C0 * C1 total chains) and log_accept_prob has shape [C0, C1] (one acceptance probability per chain), then depending on the shape of the step size, the following will happen:

  • If the step size has shape [], [S] or [1], the log_accept_prob will be averaged across its C0 and C1 dimensions. This means that you will learn a shared step size based on the mean acceptance probability across all chains. This can be useful if you don't have a lot of steps to adapt and want to average away the noise.

  • If the step size has shape [C1, 1] or [C1, S], the log_accept_prob will be averaged across its C0 dimension. This means that you will learn a shared step size based on the mean acceptance probability across chains that share the coordinate across the C1 dimension. This can be useful when the C1 dimension indexes different distributions, while C0 indexes replicas of a single distribution, all sampled in parallel.

  • If the step size has shape [C0, C1, 1] or [C0, C1, S], no averaging will happen. This means that each chain will learn its own step size. This can be useful when all chains are sampling from different distributions. Even when all chains are for the same distribution, this can help during the initial warmup period.

  • If the step size has shape [C0, 1, 1] or [C0, 1, S], the log_accept_prob will be averaged across its C1 dimension. This means that you will learn a shared step size based on the mean acceptance probability across chains that share the coordinate across the C0 dimension. This can be useful when the C0 dimension indexes different distributions, while C1 indexes replicas of a single distribution, all sampled in parallel.
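The averaging "in the direct space" described in these cases can be illustrated with a small base R matrix. The numbers are invented, with C0 = 3 rows and C1 = 2 columns; the snippet only mimics the reductions and does not call the package.

```r
# log_accept_prob with shape [C0 = 3, C1 = 2]: one entry per chain.
accept_prob <- matrix(c(0.9, 0.5,
                        0.7, 0.3,
                        0.8, 0.6), nrow = 3, byrow = TRUE)
log_accept_prob <- log(accept_prob)

# Scalar step size: average over both chain dimensions.
shared_all <- log(mean(exp(log_accept_prob)))
# One step size per C1 coordinate: average over the C0 dimension.
per_c1 <- log(colMeans(exp(log_accept_prob)))
# One step size per C0 coordinate: average over the C1 dimension.
per_c0 <- log(rowMeans(exp(log_accept_prob)))

# Averaging in log space would give a different (smaller) value.
log_space_mean <- mean(log_accept_prob)
```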

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()

Examples

library(tfprobability)

target_log_prob_fn <- tfd_normal(loc = 0, scale = 1)$log_prob
num_burnin_steps <- 500
num_results <- 500
num_chains <- 64L
# One initial step size per chain
step_size <- tf$fill(list(num_chains), 0.1)

# Wrap an HMC kernel so that its step size is adapted during 80% of burn-in
kernel <- mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn = target_log_prob_fn,
  num_leapfrog_steps = 2,
  step_size = step_size
) %>%
  mcmc_simple_step_size_adaptation(num_adaptation_steps = round(num_burnin_steps * 0.8))

# Trace the adapted step size and the log acceptance ratio alongside the states
res <- kernel %>% mcmc_sample_chain(
  num_results = num_results,
  num_burnin_steps = num_burnin_steps,
  current_state = rep(0, num_chains),
  trace_fn = function(x, pkr) {
    list(
      pkr$inner_results$accepted_results$step_size,
      pkr$inner_results$log_accept_ratio
    )
  }
)

samples <- res$all_states
step_size <- res$trace[[1]]
log_accept_ratio <- res$trace[[2]]


Runs one step of the slice sampler using a hit and run approach

Description

Slice Sampling is a Markov Chain Monte Carlo (MCMC) algorithm based, as stated by Neal (2003), on the observation that "...one can sample from a distribution by sampling uniformly from the region under the plot of its density function. A Markov chain that converges to this uniform distribution can be constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal slice defined by the current vertical position, or more generally, with some update that leaves the uniform distribution over this slice invariant". Mathematical details and derivations can be found in Neal (2003). The one dimensional slice sampler is extended to n-dimensions through use of a hit-and-run approach: choose a random direction in n-dimensional space and take a step, as determined by the one-dimensional slice sampling algorithm, along that direction (Belisle et al. 1993).
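The hit-and-run idea can be sketched in base R. Note the substitution: for brevity this sketch finds the slice interval with Neal's stepping-out procedure (without a cap on the number of expansions) rather than the doubling procedure that max_doublings refers to, so it illustrates the general scheme and not the package's kernel. The function slice_step is a hypothetical name.

```r
# One hit-and-run slice sampling step for a log-density log_f.
slice_step <- function(x, log_f, step_size = 1) {
  # Hit-and-run: pick a random unit direction and restrict log_f to that line.
  d <- rnorm(length(x))
  d <- d / sqrt(sum(d^2))
  g <- function(t) log_f(x + t * d)

  # Vertical move: uniform height under the density at the current point.
  log_y <- g(0) + log(runif(1))

  # Horizontal move, part 1: randomly placed interval around t = 0,
  # expanded in steps of step_size until both ends lie outside the slice.
  lo <- -step_size * runif(1)
  hi <- lo + step_size
  while (g(lo) > log_y) lo <- lo - step_size
  while (g(hi) > log_y) hi <- hi + step_size

  # Horizontal move, part 2: sample uniformly, shrinking the interval
  # towards t = 0 after every rejected point.
  repeat {
    t <- runif(1, lo, hi)
    if (g(t) > log_y) return(x + t * d)
    if (t < 0) lo <- t else hi <- t
  }
}

set.seed(3)
log_f <- function(x) sum(dnorm(x, log = TRUE))  # 2-d standard normal target
n <- 5000
samples <- matrix(0, n, 2)
x <- c(0, 0)
for (i in seq_len(n)) {
  x <- slice_step(x, log_f)
  samples[i, ] <- x
}
```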

Usage

mcmc_slice_sampler(
  target_log_prob_fn,
  step_size,
  max_doublings,
  seed = NULL,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

step_size

Tensor or list of Tensors representing the size of the initial interval used by the one-dimensional slice sampling step. Must broadcast with the shape of current_state. When possible, it's often helpful to match per-variable step sizes to the standard deviations of the target distribution in each variable.

max_doublings

Scalar positive int32 tf$Tensor. The maximum number of doublings to consider.

seed

integer to seed the random number generator.

name

string prefixed to Ops created by this function. Default value: NULL (i.e., 'slice_sampler_kernel').

Details

The one_step function can update multiple chains in parallel. It assumes that all leftmost dimensions of current_state index independent chain states (and are therefore updated independently). The output of ⁠target_log_prob_fn(*current_state)⁠ should sum log-probabilities across all event dimensions. Slices along the rightmost dimensions may have different target distributions; for example, ⁠current_state[0, :]⁠ could have a different target distribution from ⁠current_state[1, :]⁠. These semantics are governed by ⁠target_log_prob_fn(*current_state)⁠. (The number of independent chains is ⁠tf$size(target_log_prob_fn(*current_state))⁠.)

Note that the sampler only supports states where all components have a common dtype.

Value

list of next_state (Tensor or Python list of Tensors representing the state(s) of the Markov chain(s) at each result step. Has same shape as current_state.) and kernel_results (collections$namedtuple of internal calculations used to advance the chain).

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Applies a bijector to the MCMC's state space

Description

The transformed transition kernel enables fitting a bijector which serves to decorrelate the Markov chain Monte Carlo (MCMC) event dimensions thus making the chain mix faster. This is particularly useful when the geometry of the target distribution is unfavorable. In such cases it may take many evaluations of the target_log_prob_fn for the chain to mix between faraway states.

Usage

mcmc_transformed_transition_kernel(inner_kernel, bijector, name = NULL)

Arguments

inner_kernel

TransitionKernel-like object which has a target_log_prob_fn argument.

bijector

bijector or list of bijectors. These bijectors use forward to map the inner_kernel state space to the state expected by inner_kernel$target_log_prob_fn.

name

string prefixed to Ops created by this function. Default value: NULL (i.e., "transformed_kernel").

Details

The idea of training an affine function to decorrelate chain event dims was presented in Parno and Marzouk (2014). Used in conjunction with the Hamiltonian Monte Carlo transition kernel, the Parno and Marzouk (2014) idea is an instance of Riemannian manifold HMC (Girolami and Calderhead, 2011).

The transformed transition kernel enables arbitrary bijective transformations of arbitrary transition kernels, e.g., one could use bijectors tfb_affine, tfb_real_nvp, etc. with transition kernels mcmc_hamiltonian_monte_carlo, mcmc_random_walk_metropolis, etc.
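The change of variables behind such a transformation can be written out in base R for a simple case. The sketch assumes an exponential target on the positive reals and an exp bijector; all function names are invented for illustration, and the snippet only shows how the log-density seen by the inner kernel is formed.

```r
# Target density on the constrained space x > 0.
log_p <- function(x) dexp(x, rate = 2, log = TRUE)

# Bijector: forward maps the unconstrained z to x = exp(z).
forward <- function(z) exp(z)
forward_log_det_jacobian <- function(z) z  # log |d exp(z) / dz| = z

# Log-density on the unconstrained space that an inner kernel would target:
# target log-density at forward(z) plus the log-Jacobian correction.
transformed_log_prob <- function(z) {
  log_p(forward(z)) + forward_log_det_jacobian(z)
}

# Sanity check: the transformed density still integrates to one
# (the integrand is negligible outside [-30, 10]).
total <- integrate(function(z) exp(transformed_log_prob(z)),
                   lower = -30, upper = 10)$value
```

Dropping the Jacobian term would make the inner kernel sample from a different distribution after mapping back through forward.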

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Runs one step of Uncalibrated Hamiltonian Monte Carlo

Description

Warning: this kernel will not result in a chain which converges to the target_log_prob. To get a convergent MCMC, use mcmc_hamiltonian_monte_carlo(...) or mcmc_metropolis_hastings(mcmc_uncalibrated_hamiltonian_monte_carlo(...)). For more details on UncalibratedHamiltonianMonteCarlo, see HamiltonianMonteCarlo.
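The proposal mechanism of HMC is the leapfrog integrator; an uncalibrated step accepts the integrator's output without a Metropolis-Hastings correction. A minimal base R sketch of the integrator for a standard normal target follows. The names leapfrog and hamiltonian are hypothetical and only the arguments step_size and num_leapfrog_steps mirror the documentation.

```r
# Leapfrog integration of Hamiltonian dynamics with unit mass.
leapfrog <- function(x, p, grad_log_prob, step_size, num_leapfrog_steps) {
  p <- p + 0.5 * step_size * grad_log_prob(x)        # initial half step (momentum)
  for (i in seq_len(num_leapfrog_steps)) {
    x <- x + step_size * p                           # full step (position)
    if (i < num_leapfrog_steps) {
      p <- p + step_size * grad_log_prob(x)          # full step (momentum)
    }
  }
  p <- p + 0.5 * step_size * grad_log_prob(x)        # final half step (momentum)
  list(x = x, p = p)
}

grad_log_prob <- function(x) -x                      # standard normal target
hamiltonian <- function(x, p) 0.5 * sum(x^2) + 0.5 * sum(p^2)

start <- list(x = 1, p = 0.5)
end <- leapfrog(start$x, start$p, grad_log_prob,
                step_size = 0.1, num_leapfrog_steps = 10)

# The integrator conserves energy only approximately; the remaining error is
# what a Metropolis-Hastings correction would account for.
energy_error <- hamiltonian(end$x, end$p) - hamiltonian(start$x, start$p)
```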

Usage

mcmc_uncalibrated_hamiltonian_monte_carlo(
  target_log_prob_fn,
  step_size,
  num_leapfrog_steps,
  state_gradients_are_stopped = FALSE,
  seed = NULL,
  store_parameters_in_results = FALSE,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

step_size

Tensor or list of Tensors representing the step size for the leapfrog integrator. Must broadcast with the shape of current_state. Larger step sizes lead to faster progress, but too-large step sizes make rejection exponentially more likely. When possible, it's often helpful to match per-variable step sizes to the standard deviations of the target distribution in each variable.

num_leapfrog_steps

Integer number of steps to run the leapfrog integrator for. Total progress per HMC step is roughly proportional to step_size * num_leapfrog_steps.

state_gradients_are_stopped

logical indicating that the proposed new state be run through tf$stop_gradient. This is particularly useful when combining optimization over samples from the HMC chain. Default value: FALSE (i.e., do not apply stop_gradient).

seed

integer to seed the random number generator.

store_parameters_in_results

If TRUE, then step_size and num_leapfrog_steps are written to and read from eponymous fields in the kernel results objects returned from one_step and bootstrap_results. This allows wrapper kernels to adjust those parameters on the fly. This is incompatible with step_size_update_fn, which must be set to NULL.

name

string prefixed to Ops created by this function. Default value: NULL (i.e., 'hmc_kernel').

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_langevin(), mcmc_uncalibrated_random_walk()


Runs one step of Uncalibrated Langevin discretized diffusion.

Description

The class generates a Langevin proposal using _euler_method function and also computes helper UncalibratedLangevinKernelResults for the next iteration. Warning: this kernel will not result in a chain which converges to the target_log_prob. To get a convergent MCMC, use mcmc_metropolis_adjusted_langevin_algorithm(...) or mcmc_metropolis_hastings(mcmc_uncalibrated_langevin(...)).
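A base R sketch of an Euler-discretized Langevin proposal is given below. It assumes unit volatility and the common parameterization in which the drift is 0.5 * step_size times the gradient of the log-density and the noise has variance step_size; whether this matches the package's internals exactly is an assumption. The function langevin_proposal is a hypothetical name.

```r
# Euler discretization of the Langevin diffusion with unit volatility.
langevin_proposal <- function(current_state, grad_log_prob, step_size,
                              noise = rnorm(length(current_state))) {
  drift <- 0.5 * step_size * grad_log_prob(current_state)
  current_state + drift + sqrt(step_size) * noise
}

grad_log_prob <- function(x) -x  # standard normal target

# Without noise the proposal is a plain gradient step: 1 + 0.5 * 0.1 * (-1).
no_noise <- langevin_proposal(1, grad_log_prob, step_size = 0.1, noise = 0)

# Always accepting the proposal yields a biased chain. For this target and
# step_size = 1 the update is x' = 0.5 * x + noise, whose stationary
# variance is 4/3 rather than the target's variance of 1.
set.seed(11)
draws <- numeric(20000)
state <- 0
for (i in seq_along(draws)) {
  state <- langevin_proposal(state, grad_log_prob, step_size = 1)
  draws[i] <- state
}
uncalibrated_variance <- var(draws)
```

The inflated variance is a concrete instance of the warning above: without the Metropolis-Hastings correction the chain does not target the intended distribution.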

Usage

mcmc_uncalibrated_langevin(
  target_log_prob_fn,
  step_size,
  volatility_fn = NULL,
  parallel_iterations = 10,
  compute_acceptance = TRUE,
  seed = NULL,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

step_size

Tensor or list of Tensors representing the step size for the Euler discretization of the Langevin diffusion. Must broadcast with the shape of current_state. Larger step sizes lead to faster progress, but too-large step sizes make rejection exponentially more likely once a Metropolis-Hastings correction is applied. When possible, it's often helpful to match per-variable step sizes to the standard deviations of the target distribution in each variable.

volatility_fn

function which takes an argument like current_state (or ⁠*current_state⁠ if it's a list) and returns volatility value at current_state. Should return a Tensor or list of Tensors that must broadcast with the shape of current_state. Defaults to the identity function.

parallel_iterations

the number of coordinates for which the gradients of the volatility matrix volatility_fn can be computed in parallel.

compute_acceptance

logical indicating whether to compute the Metropolis log-acceptance ratio used to construct the MetropolisAdjustedLangevinAlgorithm kernel.

seed

integer to seed the random number generator.

name

String prefixed to Ops created by this function. Default value: NULL (i.e., 'mala_kernel').

Value

list of next_state (Tensor or Python list of Tensors representing the state(s) of the Markov chain(s) at each result step. Has the same shape as current_state.) and kernel_results (collections$namedtuple of internal calculations used to advance the chain).

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_random_walk()


Generate proposal for the Random Walk Metropolis algorithm.

Description

Warning: this kernel will not result in a chain which converges to the target_log_prob. To get a convergent MCMC, use mcmc_random_walk_metropolis(...) or mcmc_metropolis_hastings(mcmc_uncalibrated_random_walk(...)).
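To make the warning concrete, here is a minimal base R sketch of a symmetric random-walk proposal together with the log acceptance ratio that a Metropolis-Hastings wrapper would add on top of it. The function names are hypothetical and nothing here calls tfprobability.

```r
# Symmetric Gaussian random-walk proposal (the uncalibrated part).
random_walk_proposal <- function(x, scale = 1, noise = rnorm(length(x))) {
  x + scale * noise
}

# Log acceptance ratio that a Metropolis-Hastings correction would apply.
# For a symmetric proposal the proposal densities cancel.
log_accept_ratio <- function(current, proposed, target_log_prob) {
  target_log_prob(proposed) - target_log_prob(current)
}

# Unnormalized standard normal log-density.
target_log_prob <- function(x) -0.5 * sum(x^2)

current  <- 0
proposed <- random_walk_proposal(current, scale = 1, noise = 2)   # moves to 2
ratio    <- log_accept_ratio(current, proposed, target_log_prob)  # -2
```

Without the accept/reject step driven by this ratio, the chain is a plain random walk that ignores the target density, so its samples do not converge to the target distribution.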

Usage

mcmc_uncalibrated_random_walk(
  target_log_prob_fn,
  new_state_fn = NULL,
  seed = NULL,
  name = NULL
)

Arguments

target_log_prob_fn

Function which takes an argument like current_state (if it's a list, current_state will be unpacked) and returns its (possibly unnormalized) log-density under the target distribution.

new_state_fn

Function which takes a list of state parts and a seed; returns a same-type list of Tensors, each being a perturbation of the input state parts. The perturbation distribution is assumed to be a symmetric distribution centered at the input state part. Default value: NULL which is mapped to tfp$mcmc$random_walk_normal_fn().

seed

integer to seed the random number generator.

name

String name prefixed to Ops created by this function. Default value: NULL (i.e., 'rwm_kernel').

Value

a Monte Carlo sampling kernel

See Also

Other mcmc_kernels: mcmc_dual_averaging_step_size_adaptation(), mcmc_hamiltonian_monte_carlo(), mcmc_metropolis_adjusted_langevin_algorithm(), mcmc_metropolis_hastings(), mcmc_no_u_turn_sampler(), mcmc_random_walk_metropolis(), mcmc_replica_exchange_mc(), mcmc_simple_step_size_adaptation(), mcmc_slice_sampler(), mcmc_transformed_transition_kernel(), mcmc_uncalibrated_hamiltonian_monte_carlo(), mcmc_uncalibrated_langevin()


number of params needed to create a CategoricalMixtureOfOneHotCategorical distribution

Description

number of params needed to create a CategoricalMixtureOfOneHotCategorical distribution

Usage

params_size_categorical_mixture_of_one_hot_categorical(
  event_size,
  num_components
)

Arguments

event_size

event size of this distribution

num_components

number of components in the mixture

Value

a scalar


number of params needed to create an IndependentBernoulli distribution

Description

number of params needed to create an IndependentBernoulli distribution

Usage

params_size_independent_bernoulli(event_size)

Arguments

event_size

event size of this distribution

Value

a scalar


number of params needed to create an IndependentLogistic distribution

Description

number of params needed to create an IndependentLogistic distribution

Usage

params_size_independent_logistic(event_size)

Arguments

event_size

event size of this distribution

Value

a scalar


number of params needed to create an IndependentNormal distribution

Description

number of params needed to create an IndependentNormal distribution

Usage

params_size_independent_normal(event_size)

Arguments

event_size

event size of this distribution

Value

a scalar


number of params needed to create an IndependentPoisson distribution

Description

number of params needed to create an IndependentPoisson distribution

Usage

params_size_independent_poisson(event_size)

Arguments

event_size

event size of this distribution

Value

a scalar


number of params needed to create a MixtureLogistic distribution

Description

number of params needed to create a MixtureLogistic distribution

Usage

params_size_mixture_logistic(num_components, event_shape)

Arguments

num_components

Number of component distributions in the mixture distribution.

event_shape

Shape of a single draw (event) from each component distribution.

Value

a scalar


number of params needed to create a MixtureNormal distribution

Description

number of params needed to create a MixtureNormal distribution

Usage

params_size_mixture_normal(num_components, event_shape)

Arguments

num_components

Number of component distributions in the mixture distribution.

event_shape

Shape of a single draw (event) from each component distribution.

Value

a scalar


number of params needed to create a MixtureSameFamily distribution

Description

number of params needed to create a MixtureSameFamily distribution

Usage

params_size_mixture_same_family(num_components, component_params_size)

Arguments

num_components

Number of component distributions in the mixture distribution.

component_params_size

Number of parameters needed to create a single component distribution.

Value

a scalar
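For intuition, the count can be reproduced by hand. The base R sketch below assumes the usual counting rule for a same-family mixture (one mixture logit per component, plus one full set of component parameters per component); mixture_same_family_size is a hypothetical helper, not part of the package.

```r
# Assumed counting rule: one mixture logit per component, plus a full set of
# component parameters for each component.
mixture_same_family_size <- function(num_components, component_params_size) {
  num_components + num_components * component_params_size
}

# Example: 3 components, each a 2-dimensional independent normal
# (2 locations + 2 scales = 4 parameters per component).
n_params <- mixture_same_family_size(3, 4)  # 3 + 3 * 4 = 15
```

This is the number of units the preceding dense layer would need to output under that assumption.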


number of params needed to create a MultivariateNormalTriL distribution

Description

number of params needed to create a MultivariateNormalTriL distribution

Usage

params_size_multivariate_normal_tri_l(event_size)

Arguments

event_size

event size of this distribution

Value

a scalar
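As a sanity check on the returned scalar, the following base R sketch counts the parameters by hand, assuming the distribution is parameterized by a location vector plus the entries of a lower-triangular scale matrix. mvn_tri_l_size is a hypothetical helper used only for illustration.

```r
# Assumed parameterization: `event_size` location entries plus the
# event_size * (event_size + 1) / 2 entries of a lower-triangular scale matrix.
mvn_tri_l_size <- function(event_size) {
  event_size + event_size * (event_size + 1) / 2
}

n_params <- mvn_tri_l_size(3)  # 3 + 6 = 9
```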


number of params needed to create a OneHotCategorical distribution

Description

number of params needed to create a OneHotCategorical distribution

Usage

params_size_one_hot_categorical(event_size)

Arguments

event_size

event size of this distribution

Value

a scalar


A state space model representing a sum of component state space models.

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfd_linear_gaussian_state_space_model for details.

Usage

sts_additive_state_space_model(
  component_ssms,
  constant_offset = 0,
  observation_noise_scale = NULL,
  initial_state_prior = NULL,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

component_ssms

list containing one or more tfd_linear_gaussian_state_space_model instances. The components will in general implement different time-series models, with possibly different latent_size, but they must have the same dtype, event shape (num_timesteps and observation_size), and their batch shapes must broadcast to a compatible batch shape.

constant_offset

scalar float tensor, or batch of scalars, specifying a constant value added to the sum of outputs from the component models. This allows the components to model the shifted series observed_time_series - constant_offset. Default value: 0.

observation_noise_scale

Optional scalar float tensor indicating the standard deviation of the observation noise. May contain additional batch dimensions, which must broadcast with the batch shape of elements in component_ssms. If observation_noise_scale is specified for the sts_additive_state_space_model, the observation noise scales of component models are ignored. If NULL, the observation noise scale is derived by summing the noise variances of the component models, i.e., ⁠observation_noise_scale = sqrt(sum([ssm.observation_noise_scale**2 for ssm in component_ssms]))⁠.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states. Must have event shape ⁠[1]⁠ (as tfd_linear_gaussian_state_space_model requires a rank-1 event shape).

initial_step

Optional scalar integer tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

string prefixed to ops created by this class. Default value: "AdditiveStateSpaceModel".

Details

The sts_additive_state_space_model represents a sum of component state space models. Each of the N components describes a random process generating a distribution on observed time series ⁠x1[t], x2[t], ..., xN[t]⁠. The additive model represents the sum of these processes, y[t] = x1[t] + x2[t] + ... + xN[t] + eps[t], where eps[t] ~ N(0, observation_noise_scale) is an observation noise term.

Mathematical Details

The additive model concatenates the latent states of its component models. The generative process runs each component's dynamics in its own subspace of latent space, and then observes the sum of the observation models from the components.

Formally, the transition model is linear Gaussian:

p(z[t+1] | z[t]) ~ Normal(loc = transition_matrix.matmul(z[t]), cov = transition_cov)

where each z[t] is a latent state vector concatenating the component state vectors, ⁠z[t] = [z1[t], z2[t], ..., zN[t]]⁠, so it has size ⁠latent_size = sum([c.latent_size for c in components])⁠.

The transition matrix is the block-diagonal composition of transition matrices from the component processes:

transition_matrix =
 [[ c0.transition_matrix,  0.,                   ..., 0.                   ],
  [ 0.,                    c1.transition_matrix, ..., 0.                   ],
  [ ...                    ...                   ...                       ],
  [ 0.,                    0.,                   ..., cN.transition_matrix ]]

and the noise covariance is similarly the block-diagonal composition of component noise covariances:

transition_cov =
 [[ c0.transition_cov, 0.,                ..., 0.                ],
  [ 0.,                c1.transition_cov, ..., 0.                ],
  [ ...                ...                     ...               ],
  [ 0.,                0.,                ..., cN.transition_cov ]]

The observation model is also linear Gaussian,

p(y[t] | z[t]) ~ Normal(loc = observation_matrix.matmul(z[t]), stddev = observation_noise_scale)

This implementation assumes scalar observations, so observation_matrix has shape ⁠[1, latent_size]⁠. The additive observation matrix simply concatenates the observation matrices from each component:

observation_matrix = concat([c0.obs_matrix, c1.obs_matrix, ..., cN.obs_matrix], axis=-1)

The effect is that each component observation matrix acts on the dimensions of latent state corresponding to that component, and the overall expected observation is the sum of the expected observations from each component.

If observation_noise_scale is not explicitly specified, it is also computed by summing the noise variances of the component processes:

observation_noise_scale = sqrt(sum([c.observation_noise_scale**2 for c in components]))
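The construction above can be sketched in base R for two small components. The component matrices below (a local level and a local linear trend) and the helper block_diag are illustrative assumptions, not objects provided by the package.

```r
# Place square matrices along the diagonal of a larger zero matrix.
block_diag <- function(mats) {
  sizes <- vapply(mats, nrow, integer(1))
  out <- matrix(0, sum(sizes), sum(sizes))
  offset <- 0
  for (m in mats) {
    idx <- offset + seq_len(nrow(m))
    out[idx, idx] <- m
    offset <- offset + nrow(m)
  }
  out
}

# Two hypothetical components: a local level (latent size 1) and a
# local linear trend (latent size 2).
F_level <- matrix(1, 1, 1)
F_trend <- matrix(c(1, 0, 1, 1), 2, 2)   # rows: (1, 1) and (0, 1)
H_level <- matrix(1, 1, 1)
H_trend <- matrix(c(1, 0), 1, 2)

transition_matrix  <- block_diag(list(F_level, F_trend))  # 3 x 3
observation_matrix <- cbind(H_level, H_trend)             # 1 x 3

# Component noise scales combine by summing variances.
component_noise_scales  <- c(0.3, 0.4)
observation_noise_scale <- sqrt(sum(component_noise_scales^2))  # 0.5
```

Each component's transition block only touches its own slice of the latent state, while the concatenated observation matrix adds the components' expected observations together.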

Value

an instance of LinearGaussianStateSpaceModel.

See Also

Other sts: sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Formal representation of an autoregressive model.

Description

An autoregressive (AR) model posits a latent level whose value at each step is a noisy linear combination of previous steps:

level[t+1] = (sum(coefficients * levels[t:t-order:-1]) + Normal(0., level_scale))

Usage

sts_autoregressive(
  observed_time_series = NULL,
  order,
  coefficients_prior = NULL,
  level_scale_prior = NULL,
  initial_state_prior = NULL,
  coefficient_constraining_bijector = NULL,
  name = NULL
)

Arguments

observed_time_series

optional float tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations. Default value: NULL.

order

scalar positive integer specifying the number of past timesteps to regress on.

coefficients_prior

optional Distribution instance specifying a prior on the coefficients parameter. If NULL, a default standard normal (tfd_multivariate_normal_diag(scale_diag = tf$ones(list(order)))) prior is used. Default value: NULL.

level_scale_prior

optional Distribution instance specifying a prior on the level_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_state_prior

optional Distribution instance specifying a prior on the initial state, corresponding to the values of the process at a set of size order of imagined timesteps before the initial step. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

coefficient_constraining_bijector

optional Bijector instance representing a constraining mapping for the autoregressive coefficients. For example, tfb_tanh() constrains the coefficients to lie in ⁠(-1, 1)⁠, while tfb_softplus() constrains them to be positive, and tfb_identity() implies no constraint. If NULL, the default behavior constrains the coefficients to lie in ⁠(-1, 1)⁠ using a tanh bijector. Default value: NULL.

name

the name of this model component. Default value: 'Autoregressive'.

Details

The latent state is levels[t:t-order:-1]. We observe a noisy realization of the current level: f[t] = level[t] + Normal(0., observation_noise_scale) at each timestep.

If ⁠coefficients=[1.]⁠, the AR process is a simple random walk, equivalent to a LocalLevel model. However, a random walk's variance increases with time, while many AR processes (in particular, any first-order process with abs(coefficient) < 1) are stationary, i.e., they maintain a constant variance over time. This makes AR processes useful models of uncertainty.
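The contrast between a random walk and a stationary AR(1) process can be checked with a short base R calculation of the level's variance over time, using the recursion var[t+1] = coefficient^2 * var[t] + level_scale^2. This is a sketch under the assumption of a first-order process started from a known level (zero initial variance); ar1_variance_path is a hypothetical helper.

```r
# Variance of a first-order AR level after each step, starting from zero:
# var[t+1] = coefficient^2 * var[t] + level_scale^2.
ar1_variance_path <- function(coefficient, level_scale, num_steps) {
  v <- numeric(num_steps)
  current <- 0
  for (t in seq_len(num_steps)) {
    current <- coefficient^2 * current + level_scale^2
    v[t] <- current
  }
  v
}

random_walk <- ar1_variance_path(1.0, level_scale = 1, num_steps = 200)
stationary  <- ar1_variance_path(0.5, level_scale = 1, num_steps = 200)

# The random walk's variance grows linearly with t, while the stationary
# process settles at level_scale^2 / (1 - coefficient^2).
limit <- 1 / (1 - 0.5^2)
```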

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


State space model for an autoregressive process.

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfd_linear_gaussian_state_space_model for details.

Usage

sts_autoregressive_state_space_model(
  num_timesteps,
  coefficients,
  level_scale,
  initial_state_prior,
  observation_noise_scale = 0,
  initial_step = 0,
  validate_args = FALSE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer tensor number of timesteps to model with this distribution.

coefficients

float tensor of shape tf$concat(batch_shape, list(order)) defining the autoregressive coefficients. The coefficients are defined backwards in time: coefficients[0] * level[t] + coefficients[1] * level[t-1] + ... + coefficients[order-1] * level[t-order+1].

level_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the transition noise at each step.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states. Must have event shape list(order).

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the observation noise. Default value: 0.

initial_step

Optional scalar int tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

name

name prefixed to ops created by this class. Default value: "AutoregressiveStateSpaceModel".

Details

In an autoregressive process, the expected level at each timestep is a linear function of previous levels, with added Gaussian noise:

level[t+1] = (sum(coefficients * levels[t:t-order:-1]) + Normal(0., level_scale))

The process is characterized by a vector coefficients whose size determines the order of the process (how many previous values it looks at), and by level_scale, the standard deviation of the noise added at each step. This is formulated as a state space model by letting the latent state encode the most recent values; see 'Mathematical Details' below.

The parameters level_scale and observation_noise_scale are each (a batch of) scalars, and coefficients is a (batch) vector of size list(order). The batch shape of this Distribution is the broadcast batch shape of these parameters and of the initial_state_prior.

Mathematical Details

The autoregressive model implements a tfd_linear_gaussian_state_space_model with latent_size = order and observation_size = 1. The latent state vector encodes the recent history of the process, with the current value in the topmost dimension. At each timestep, the transition sums the previous values to produce the new expected value, shifts all other values down by a dimension, and adds noise to the current value. This is formally encoded by the transition model:

transition_matrix = [[ coefs[0], coefs[1], ..., coefs[order-2], coefs[order-1] ],
                     [ 1.,       0.,       ..., 0.,             0.             ],
                     [ 0.,       1.,       ..., 0.,             0.             ],
                     [ ...                                                     ],
                     [ 0.,       0.,       ..., 1.,             0.             ]]
transition_noise ~ N(loc=0., scale=diag([level_scale, 0., 0., ..., 0.]))

The observation model simply extracts the current (topmost) value, and optionally adds independent noise at each step:

observation_matrix = [[1., 0., ..., 0.]]
observation_noise ~ N(loc=0, scale=observation_noise_scale)

Models with observation_noise_scale = 0 are AR processes in the formal sense. Setting observation_noise_scale to a nonzero value corresponds to a latent AR process observed under an iid noise model.
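The transition structure described above can be reproduced in a few lines of base R. This is only an illustrative sketch: ar_transition_matrix is a hypothetical helper, the coefficients are made up, and R's 1-based indexing means coefficients[1] plays the role of coefs[0].

```r
# Companion-form transition matrix for an AR process: the first row holds the
# coefficients, and the sub-diagonal of ones shifts older values down a slot.
ar_transition_matrix <- function(coefficients) {
  order <- length(coefficients)
  m <- matrix(0, order, order)
  m[1, ] <- coefficients
  if (order > 1) {
    m[cbind(2:order, 1:(order - 1))] <- 1
  }
  m
}

coefs <- c(0.5, -0.2, 0.1)
transition_matrix <- ar_transition_matrix(coefs)

# Latent state: level[t], level[t-1], level[t-2].
state <- c(3, 2, 1)

# Expected next state (noise omitted): the new level is the weighted sum
# 0.5 * 3 - 0.2 * 2 + 0.1 * 1 = 1.2, and the older levels shift down.
next_state <- as.vector(transition_matrix %*% state)

# The observation matrix extracts the topmost (current) value.
observation_matrix <- matrix(c(1, 0, 0), nrow = 1)
expected_observation <- as.vector(observation_matrix %*% next_state)
```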

Value

an instance of LinearGaussianStateSpaceModel.

See Also

Other sts: sts_additive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Build a variational posterior that factors over model parameters.

Description

The surrogate posterior consists of independent Normal distributions for each parameter with trainable loc and scale, transformed using the parameter's bijector to the appropriate support space for that parameter.

Usage

sts_build_factored_surrogate_posterior(
  model,
  batch_shape = list(),
  seed = NULL,
  name = NULL
)

Arguments

model

An instance of StructuralTimeSeries representing a time-series model. This represents a joint distribution over time-series and their parameters with batch shape [b1, ..., bN].

batch_shape

Batch shape (list, or integer) of initial states to optimize in parallel. Default value: list() (i.e., just run a single optimization).

seed

integer to seed the random number generator.

name

string prefixed to ops created by this function. Default value: NULL (i.e., 'build_factored_surrogate_posterior').

Value

A tfd_joint_distribution_named instance (variational_posterior) defining a trainable surrogate posterior over model parameters. Samples from this distribution are named lists with character parameter names as keys.

See Also

Other sts-functions: sts_build_factored_variational_loss(), sts_decompose_by_component(), sts_decompose_forecast_by_component(), sts_fit_with_hmc(), sts_forecast(), sts_one_step_predictive(), sts_sample_uniform_initial_state()


Build a loss function for variational inference in STS models.

Description

Variational inference searches for the distribution within some family of approximate posteriors that minimizes a divergence between the approximate posterior q(z) and true posterior p(z|observed_time_series). By converting inference to optimization, it's generally much faster than sampling-based inference algorithms such as HMC. The tradeoff is that the approximating family rarely contains the true posterior, so it may miss important aspects of posterior structure (in particular, dependence between variables) and should not be blindly trusted. Results may vary; it's generally wise to compare to HMC to evaluate whether inference quality is sufficient for your task at hand.

Usage

sts_build_factored_variational_loss(
  observed_time_series,
  model,
  init_batch_shape = list(),
  seed = NULL,
  name = NULL
)

Arguments

observed_time_series

float tensor of shape ⁠concat([sample_shape, model.batch_shape, [num_timesteps, 1]])⁠ where sample_shape corresponds to i.i.d. observations, and the trailing ⁠[1]⁠ dimension may (optionally) be omitted if num_timesteps > 1. May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations.

model

An instance of StructuralTimeSeries representing a time-series model. This represents a joint distribution over time-series and their parameters with batch shape ⁠[b1, ..., bN]⁠.

init_batch_shape

Batch shape (list) of initial states to optimize in parallel. Default value: list() (i.e., just run a single optimization).

seed

integer to seed the random number generator.

name

name prefixed to ops created by this function. Default value: NULL (i.e., 'build_factored_variational_loss').

Details

This method constructs a loss function for variational inference using the Kullback-Leibler divergence KL[q(z) || p(z|observed_time_series)], with an approximating family given by independent Normal distributions transformed to the appropriate parameter space for each parameter. Minimizing this loss (the negative ELBO) maximizes a lower bound on the log model evidence log p(observed_time_series). This is equivalent to the 'mean-field' method implemented in Kucukelbir et al. (2017) and is a standard approach. The resulting posterior approximations are unimodal; they will tend to underestimate posterior uncertainty when the true posterior contains multiple modes (the KL[q||p] divergence encourages choosing a single mode) or dependence between variables.

Value

list of:

  • variational_loss: float Tensor of shape ⁠tf$concat([init_batch_shape, model$batch_shape])⁠, encoding a stochastic estimate of an upper bound on the negative model evidence ⁠-log p(y)⁠. Minimizing this loss performs variational inference; the gap between the variational bound and the true (generally unknown) model evidence corresponds to the divergence KL[q||p] between the approximate and true posterior.

  • variational_distributions: a named list giving the approximate posterior for each model parameter. The keys are character parameter names in order, corresponding to ⁠[param.name for param in model.parameters]⁠. The values are tfd$Distribution instances with batch shape ⁠tf$concat([init_batch_shape, model$batch_shape])⁠; these will typically be of the form tfd$TransformedDistribution(tfd.Normal(...), bijector=param.bijector).

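References

Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M. Blei. Automatic Differentiation Variational Inference. Journal of Machine Learning Research, 2017.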

See Also

Other sts-functions: sts_build_factored_surrogate_posterior(), sts_decompose_by_component(), sts_decompose_forecast_by_component(), sts_fit_with_hmc(), sts_forecast(), sts_one_step_predictive(), sts_sample_uniform_initial_state()


Seasonal state space model with effects constrained to sum to zero.

Description

Seasonal state space model with effects constrained to sum to zero.

Usage

sts_constrained_seasonal_state_space_model(
  num_timesteps,
  num_seasons,
  drift_scale,
  initial_state_prior,
  observation_noise_scale = 1e-04,
  num_steps_per_season = 1,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer tensor number of timesteps to model with this distribution.

num_seasons

Scalar integer number of seasons.

drift_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the change in effect between consecutive occurrences of a given season. This is assumed to be the same for all seasons.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states; must have event shape ⁠[num_seasons]⁠.

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the observation noise.

num_steps_per_season

integer number of steps in each season. This may be either a scalar (shape []), in which case all seasons have the same length, or an array of shape [num_seasons], in which case seasons have different lengths that remain constant across cycles, or an array of shape [num_cycles, num_seasons], in which case the number of steps per season also varies from cycle to cycle (e.g., a 4-year cycle with a leap day). Default value: 1.

initial_step

Optional scalar integer tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

string prefixed to ops created by this class. Default value: "SeasonalStateSpaceModel".

Value

an instance of LinearGaussianStateSpaceModel.

See Also

sts_seasonal_state_space_model().

Mathematical details

The constrained model implements a reparameterization of the naive SeasonalStateSpaceModel. Instead of directly representing the seasonal effects in the latent space, the latent space of the constrained model represents the difference between each effect and the mean effect. The following discussion assumes familiarity with the mathematical details of SeasonalStateSpaceModel.

Reparameterization and constraints: let the seasonal effects at a given timestep be ⁠E = [e_1, ..., e_N]⁠. The difference between each effect e_i and the mean effect is z_i = e_i - sum_i(e_i)/N. By itself, this transformation is not invertible because recovering the absolute effects requires that we know the mean as well. To fix this, we'll define z_N = sum_i(e_i)/N as the mean effect. It's easy to see that this is invertible: given the mean effect and the differences of the first N - 1 effects from the mean, it's easy to solve for all N effects. Formally, we've defined the invertible linear reparameterization ⁠Z = R E⁠, where

R = [[ 1 - 1/N, -1/N,    ..., -1/N ],
     [ -1/N,    1 - 1/N, ..., -1/N ],
     [ ...                         ],
     [ 1/N,     1/N,     ...,  1/N ]]

represents the change of basis from 'effect coordinates' E to 'residual coordinates' Z. The Zs form the latent space of the ConstrainedSeasonalStateSpaceModel. To constrain the mean effect z_N to zero, we fix the prior to zero, p(z_N) ~ N(0., 0), and after the transition at each timestep we project z_N back to zero. Note that this projection is linear: to set the Nth dimension to zero, we simply multiply by the identity matrix with a missing element in the bottom right, i.e., ⁠Z_constrained = P Z⁠, where ⁠P = eye(N) - scatter((N-1, N-1), 1)⁠.

Model: concretely, suppose a naive seasonal effect model has initial state prior N(m, S), transition matrix F and noise covariance Q, and observation matrix H. Then the corresponding constrained seasonal effect model has initial state prior N(P R m, P R S R' P'), transition matrix P R F R^-1 and noise covariance P R Q R' P', and observation matrix H R^-1, where the change-of-basis matrix R and constraint projection matrix P are as defined above. This follows directly from applying the reparameterization Z = R E, and then enforcing the zero-sum constraint on the prior and transition noise covariances. In practice, because the mean effect z_N is constrained to be zero, it will never contribute a term to any linear operation on the latent space, so we can drop that dimension from the model entirely. ConstrainedSeasonalStateSpaceModel does this, so that it implements the (N - 1)-dimensional latent space z_1, ..., z_[N-1]. Note that since we constrained the mean effect to be zero, the latent z_i's now recover their interpretation as the actual effects, z_i = e_i for i = 1, ..., N - 1, even though they were originally defined as residuals. The Nth effect is represented only implicitly, as the negative sum of the first N - 1 effects. Although the computational representation is not symmetric across all N effects, we derived the ConstrainedSeasonalStateSpaceModel by starting with a symmetric representation and imposing only a symmetric constraint (the zero-sum constraint), so the probability model remains symmetric over all N seasonal effects.
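The change of basis R and the projection P described in this section can be built explicitly in base R for a small number of seasons. The sketch below uses N = 4 and made-up effect values; it illustrates the algebra only and does not reflect how the package constructs these matrices internally.

```r
N <- 4

# Change of basis: rows 1..N-1 give residuals from the mean, row N the mean.
R <- diag(N) - matrix(1 / N, N, N)
R[N, ] <- 1 / N

# Projection that zeroes the N-th (mean) coordinate.
P <- diag(N)
P[N, N] <- 0

effects <- c(2, -1, 3, 0)                # hypothetical seasonal effects, mean 1
z <- as.vector(R %*% effects)            # residuals 1, -2, 2 and the mean 1
z_constrained <- as.vector(P %*% z)      # mean coordinate projected to zero
recovered <- as.vector(solve(R) %*% z)   # R is invertible: back to the effects
```

Once the mean coordinate is pinned to zero it carries no information, which is why the last latent dimension can be dropped.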

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Decompose an observed time series into contributions from each component.

Description

This method decomposes a time series according to the posterior representation of a structural time series model. In particular, it:

  • Computes the posterior marginal mean and covariances over the additive model's latent space.

  • Decomposes the latent posterior into the marginal blocks for each model component.

  • Maps the per-component latent posteriors back through each component's observation model, to generate the time series modeled by that component.

Usage

sts_decompose_by_component(observed_time_series, model, parameter_samples)

Arguments

observed_time_series

float tensor of shape ⁠concat([sample_shape, model.batch_shape, [num_timesteps, 1]])⁠ where sample_shape corresponds to i.i.d. observations, and the trailing ⁠[1]⁠ dimension may (optionally) be omitted if num_timesteps > 1. May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations.

model

An instance of sts_sum representing a structural time series model.

parameter_samples

list of tensors representing posterior samples of model parameters, with shapes ⁠list(tf$concat(list(list(num_posterior_draws), param<1>$prior$batch_shape, param<1>$prior$event_shape), list(list(num_posterior_draws), param<2>$prior$batch_shape, param<2>$prior$event_shape), ... ) )⁠ for all model parameters. This may optionally also be a named list mapping parameter names to tensor values.

Value

component_dists A named list mapping component StructuralTimeSeries instances (elements of model$components) to Distribution instances representing the posterior marginal distributions on the process modeled by each component. Each distribution has batch shape matching that of posterior_means/posterior_covs, and event shape of list(num_timesteps).

See Also

Other sts-functions: sts_build_factored_surrogate_posterior(), sts_build_factored_variational_loss(), sts_decompose_forecast_by_component(), sts_fit_with_hmc(), sts_forecast(), sts_one_step_predictive(), sts_sample_uniform_initial_state()

Examples

observed_time_series <- array(rnorm(2 * 1 * 12), dim = c(2, 1, 12))
day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7, name = "seasonal")
local_linear_trend <- observed_time_series %>% sts_local_linear_trend(name = "local_linear")
model <- observed_time_series %>%
  sts_sum(components = list(day_of_week, local_linear_trend))
states_and_results <- observed_time_series %>%
  sts_fit_with_hmc(
    model,
    num_results = 10,
    num_warmup_steps = 5,
    num_variational_steps = 15
    )
samples <- states_and_results[[1]]

component_dists <- observed_time_series %>%
  sts_decompose_by_component(model = model, parameter_samples = samples)

Decompose a forecast distribution into contributions from each component.

Description

Decompose a forecast distribution into contributions from each component.

Usage

sts_decompose_forecast_by_component(model, forecast_dist, parameter_samples)

Arguments

model

An instance of sts_sum representing a structural time series model.

forecast_dist

A Distribution instance returned by sts_forecast() (specifically, it must be a tfd_mixture_same_family over a tfd_linear_gaussian_state_space_model parameterized by posterior samples).

parameter_samples

list of tensors representing posterior samples of model parameters, with shapes ⁠list(tf$concat(list(list(num_posterior_draws), param<1>$prior$batch_shape, param<1>$prior$event_shape), list(list(num_posterior_draws), param<2>$prior$batch_shape, param<2>$prior$event_shape), ... ) )⁠ for all model parameters. This may optionally also be a named list mapping parameter names to tensor values.

Value

component_dists A named list mapping component StructuralTimeSeries instances (elements of model$components) to Distribution instances representing the marginal forecast for each component. Each distribution has batch and event shape matching forecast_dist (specifically, the event shape is [num_steps_forecast]).

See Also

Other sts-functions: sts_build_factored_surrogate_posterior(), sts_build_factored_variational_loss(), sts_decompose_by_component(), sts_fit_with_hmc(), sts_forecast(), sts_one_step_predictive(), sts_sample_uniform_initial_state()


Formal representation of a dynamic linear regression model.

Description

The dynamic linear regression model is a special case of a linear Gaussian SSM and a generalization of typical (static) linear regression. The model represents regression weights with a latent state which evolves via a Gaussian random walk (see Details).

Usage

sts_dynamic_linear_regression(
  observed_time_series = NULL,
  design_matrix,
  drift_scale_prior = NULL,
  initial_weights_prior = NULL,
  name = NULL
)

Arguments

observed_time_series

optional float tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations. Default value: NULL.

design_matrix

float tensor of shape tf$concat(list(batch_shape, list(num_timesteps, num_features))). This may also optionally be an instance of tf$linalg$LinearOperator.

drift_scale_prior

instance of Distribution specifying a prior on the drift_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_weights_prior

instance of tfd_multivariate_normal representing the prior distribution on the latent states (the regression weights). Must have event shape list(num_features). If NULL, a weakly-informative Normal(0, 10) prior is used. Default value: NULL.

name

the name of this component. Default value: 'DynamicLinearRegression'.

Details

weights[t] ~ Normal(weights[t-1], drift_scale)

The latent state has dimension num_features, while the parameters drift_scale and observation_noise_scale are each (a batch of) scalars. The batch shape of this distribution is the broadcast batch shape of these parameters, the initial_state_prior, and the design_matrix. num_features is determined from the last dimension of design_matrix (equivalent to the number of columns in the design matrix in linear regression).
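To make the generative story concrete, here is a minimal base R simulation of the random walk on the weights and the resulting observations. It is a sketch under stated assumptions: the covariates, the scale values and the standard normal initial draw are hypothetical choices made for illustration, and no package functions are used.

```r
set.seed(1)
num_timesteps <- 50
num_features <- 3
drift_scale <- 0.1                 # hypothetical value
observation_noise_scale <- 0.5     # hypothetical value

# Hypothetical covariates: one row per timestep, one column per feature.
design_matrix <- matrix(rnorm(num_timesteps * num_features),
                        nrow = num_timesteps, ncol = num_features)

# weights[t] ~ Normal(weights[t-1], drift_scale), started from an initial draw.
weights <- matrix(0, nrow = num_timesteps, ncol = num_features)
weights[1, ] <- rnorm(num_features)
for (i in 2:num_timesteps) {
  weights[i, ] <- weights[i - 1, ] + rnorm(num_features, sd = drift_scale)
}

# Each observation is the dot product of that timestep's covariates and
# weights, plus observation noise.
observed <- rowSums(design_matrix * weights) +
  rnorm(num_timesteps, sd = observation_noise_scale)
```

Setting drift_scale to zero keeps the weights constant over time, which recovers ordinary (static) linear regression.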

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


State space model for a dynamic linear regression from provided covariates.

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfd_linear_gaussian_state_space_model for details.

Usage

sts_dynamic_linear_regression_state_space_model(
  num_timesteps,
  design_matrix,
  drift_scale,
  initial_state_prior,
  observation_noise_scale = 0,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer tensor, number of timesteps to model with this distribution.

design_matrix

float tensor of shape tf$concat(list(batch_shape, list(num_timesteps, num_features))). This may also optionally be an instance of tf$linalg$LinearOperator.

drift_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the latent state transitions.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states. Must have event shape list(num_features).

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the observation noise. Default value: 0.

initial_step

scalar integer tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

name prefixed to ops created by this class. Default value: 'DynamicLinearRegressionStateSpaceModel'.

Details

The dynamic linear regression model is a special case of a linear Gaussian SSM and a generalization of typical (static) linear regression. The model represents regression weights with a latent state which evolves via a Gaussian random walk: weights[t] ~ Normal(weights[t-1], drift_scale)

The latent state (the weights) has dimension num_features, while the parameters drift_scale and observation_noise_scale are each (a batch of) scalars. The batch shape of this Distribution is the broadcast batch shape of these parameters, the initial_state_prior, and the design_matrix. num_features is determined from the last dimension of design_matrix (equivalent to the number of columns in the design matrix in linear regression).

Mathematical Details

The dynamic linear regression model implements a tfd_linear_gaussian_state_space_model with latent_size = num_features and observation_size = 1 following the transition model:

transition_matrix = eye(num_features)
transition_noise ~ Normal(0, diag([drift_scale]))

which implements the evolution of weights described above. The observation model is:

observation_matrix[t] = design_matrix[t]
observation_noise ~ Normal(0, observation_noise_scale)
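The matrices above are enough to write a small Kalman filter by hand. The following base R sketch is not the package's implementation. The function name kalman_filter_dlr and its argument names are hypothetical, and it assumes the prior applies to the state at the first timestep.

```r
# Hypothetical helper: filtered means of the regression weights.
kalman_filter_dlr <- function(y, design_matrix, drift_scale,
                              observation_noise_scale, prior_mean, prior_cov) {
  num_timesteps <- nrow(design_matrix)
  num_features <- ncol(design_matrix)
  transition_matrix <- diag(num_features)              # eye(num_features)
  transition_cov <- diag(drift_scale^2, num_features)  # scale is a std. dev.
  m <- prior_mean
  P <- prior_cov
  filtered_means <- matrix(NA_real_, num_timesteps, num_features)
  for (step in seq_len(num_timesteps)) {
    H <- design_matrix[step, , drop = FALSE]   # observation_matrix[t], 1 x k
    S <- as.numeric(H %*% P %*% t(H)) + observation_noise_scale^2
    K <- (P %*% t(H)) / S                      # Kalman gain, k x 1
    m <- m + as.vector(K) * (y[step] - as.numeric(H %*% m))
    P <- P - K %*% H %*% P
    filtered_means[step, ] <- m
    # Predict the next state through the random-walk transition.
    m <- as.vector(transition_matrix %*% m)
    P <- transition_matrix %*% P %*% t(transition_matrix) + transition_cov
  }
  filtered_means
}

# Usage: noise-free data generated from constant (hypothetical) weights.
set.seed(2)
X <- matrix(rnorm(30 * 2), nrow = 30, ncol = 2)
true_weights <- c(1.5, -0.7)
y <- as.vector(X %*% true_weights)
means <- kalman_filter_dlr(y, X, drift_scale = 0,
                           observation_noise_scale = 0.01,
                           prior_mean = c(0, 0), prior_cov = diag(100, 2))
```

With zero drift the filter reduces to recursive least squares, so the final row of means should sit very close to true_weights.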

Value

an instance of LinearGaussianStateSpaceModel.

See Also

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Draw posterior samples using Hamiltonian Monte Carlo (HMC)

Description

Markov chain Monte Carlo (MCMC) methods are considered the gold standard of Bayesian inference; under suitable conditions and in the limit of infinitely many draws they generate samples from the true posterior distribution. HMC (Neal, 2011) uses gradients of the model's log-density function to propose samples, allowing it to exploit posterior geometry. However, it is computationally more expensive than variational inference and relatively sensitive to tuning.

Usage

sts_fit_with_hmc(
  observed_time_series,
  model,
  num_results = 100,
  num_warmup_steps = 50,
  num_leapfrog_steps = 15,
  initial_state = NULL,
  initial_step_size = NULL,
  chain_batch_shape = list(),
  num_variational_steps = 150,
  variational_optimizer = NULL,
  variational_sample_size = 5,
  seed = NULL,
  name = NULL
)

Arguments

observed_time_series

float tensor of shape ⁠concat([sample_shape, model.batch_shape, [num_timesteps, 1]])⁠ where sample_shape corresponds to i.i.d. observations, and the trailing ⁠[1]⁠ dimension may (optionally) be omitted if num_timesteps > 1. May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations.

model

An instance of StructuralTimeSeries representing a time-series model. This represents a joint distribution over time-series and their parameters with batch shape ⁠[b1, ..., bN]⁠.

num_results

Integer number of Markov chain draws. Default value: 100.

num_warmup_steps

Integer number of steps to take before starting to collect results. The warmup steps are also used to adapt the step size towards a target acceptance rate of 0.75. Default value: 50.

num_leapfrog_steps

Integer number of steps to run the leapfrog integrator for. Total progress per HMC step is roughly proportional to step_size * num_leapfrog_steps. Default value: 15.

initial_state

Optional list of tensors, one for each model parameter, representing the initial state(s) of the Markov chain(s). These should have shape tf$concat(list(chain_batch_shape, param$prior$batch_shape, param$prior$event_shape)). If NULL, the initial state is set automatically using a sample from a variational posterior. Default value: NULL.

initial_step_size

list of tensors, one for each model parameter, representing the step size for the leapfrog integrator. Must broadcast with the shape of initial_state. Larger step sizes lead to faster progress, but too-large step sizes make rejection exponentially more likely. If NULL, the step size is set automatically using the standard deviation of a variational posterior. Default value: NULL.

chain_batch_shape

Batch shape (list or int) of chains to run in parallel. Default value: list() (i.e., a single chain).

num_variational_steps

int number of steps to run the variational optimization to determine the initial state and step sizes. Default value: 150.

variational_optimizer

Optional tf$train$Optimizer instance to use in the variational optimization. If NULL, defaults to tf$train$AdamOptimizer(0.1). Default value: NULL.

variational_sample_size

integer number of Monte Carlo samples to use in estimating the variational divergence. Larger values may stabilize the optimization, but at higher cost per step in time and memory. Default value: 5.

seed

integer to seed the random number generator.

name

name prefixed to ops created by this function. Default value: NULL (i.e., 'fit_with_hmc').

Details

This method attempts to provide a sensible default approach for fitting StructuralTimeSeries models using HMC. It first runs variational inference as a fast posterior approximation, and initializes the HMC sampler from the variational posterior, using the posterior standard deviations to set per-variable step sizes (equivalently, a diagonal mass matrix). During the warmup phase, it adapts the step size to target an acceptance rate of 0.75, which is thought to be in the desirable range for optimal mixing (Betancourt et al., 2014).
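To show how per-variable step sizes and num_leapfrog_steps enter an HMC update, here is a minimal base R sketch on a toy target (a standard bivariate normal). It is not the sampler used by sts_fit_with_hmc(): the function names leapfrog and hmc_step are hypothetical, the step sizes are fixed by hand, and no warmup adaptation is performed.

```r
# Toy target: standard bivariate normal (log density up to a constant).
log_prob <- function(q) -0.5 * sum(q^2)
grad_log_prob <- function(q) -q

# Leapfrog integrator; step_size may be a vector (one entry per variable).
leapfrog <- function(q, p, step_size, num_leapfrog_steps) {
  p <- p + 0.5 * step_size * grad_log_prob(q)
  for (i in seq_len(num_leapfrog_steps)) {
    q <- q + step_size * p
    if (i < num_leapfrog_steps) p <- p + step_size * grad_log_prob(q)
  }
  p <- p + 0.5 * step_size * grad_log_prob(q)
  list(q = q, p = p)
}

# One HMC transition with a Metropolis accept/reject correction.
hmc_step <- function(q, step_size, num_leapfrog_steps) {
  p0 <- rnorm(length(q))
  proposal <- leapfrog(q, p0, step_size, num_leapfrog_steps)
  h0 <- -log_prob(q) + 0.5 * sum(p0^2)                    # initial energy
  h1 <- -log_prob(proposal$q) + 0.5 * sum(proposal$p^2)   # proposed energy
  accept <- log(runif(1)) < (h0 - h1)
  list(q = if (accept) proposal$q else q, accepted = accept)
}

set.seed(3)
step_size <- c(0.2, 0.2)     # hand-picked, not adapted
num_leapfrog_steps <- 15
num_results <- 200
q <- c(0, 0)
draws <- matrix(NA_real_, num_results, 2)
accepted <- logical(num_results)
for (i in seq_len(num_results)) {
  out <- hmc_step(q, step_size, num_leapfrog_steps)
  q <- out$q
  draws[i, ] <- q
  accepted[i] <- out$accepted
}
acceptance_rate <- mean(accepted)
```

Increasing step_size moves the chain further per step but enlarges the energy error h1 - h0, which lowers acceptance_rate; this is the trade-off that step size adaptation during warmup tries to balance.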

Value

list of:

  • samples: list of Tensors representing posterior samples of model parameters, with shapes ⁠[concat([[num_results], chain_batch_shape, param.prior.batch_shape, param.prior.event_shape]) for param in model.parameters]⁠.

  • kernel_results: A (possibly nested) list of Tensors representing internal calculations made within the HMC sampler.

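References

  • Radford Neal. MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo, 2011.

  • M.J. Betancourt, Simon Byrne, and Mark Girolami. Optimizing The Integrator Step Size for Hamiltonian Monte Carlo. 2014.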

See Also

Other sts-functions: sts_build_factored_surrogate_posterior(), sts_build_factored_variational_loss(), sts_decompose_by_component(), sts_decompose_forecast_by_component(), sts_forecast(), sts_one_step_predictive(), sts_sample_uniform_initial_state()

Examples

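# Parenthesize the sum so that the whole series is converted to a tensor
# (%>% binds more tightly than +).
observed_time_series <-
  (rep(c(3.5, 4.1, 4.5, 3.9, 2.4, 2.1, 1.2), 5) +
     rep(c(1.1, 1.5, 2.4, 3.1, 4.0), each = 7)) %>%
  tensorflow::tf$convert_to_tensor(dtype = tensorflow::tf$float64)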
day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7)
local_linear_trend <- observed_time_series %>% sts_local_linear_trend()
model <- observed_time_series %>%
  sts_sum(components = list(day_of_week, local_linear_trend))
states_and_results <- observed_time_series %>%
  sts_fit_with_hmc(
    model,
    num_results = 10,
    num_warmup_steps = 5,
    num_variational_steps = 15)

Construct predictive distribution over future observations

Description

Given samples from the posterior over parameters, return the predictive distribution over future observations for num_steps_forecast timesteps.

Usage

sts_forecast(
  observed_time_series,
  model,
  parameter_samples,
  num_steps_forecast
)

Arguments

observed_time_series

float tensor of shape ⁠concat([sample_shape, model.batch_shape, [num_timesteps, 1]])⁠ where sample_shape corresponds to i.i.d. observations, and the trailing ⁠[1]⁠ dimension may (optionally) be omitted if num_timesteps > 1. May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations.

model

An instance of StructuralTimeSeries representing a time-series model. This represents a joint distribution over time-series and their parameters with batch shape ⁠[b1, ..., bN]⁠.

parameter_samples

list of tensors representing posterior samples of model parameters, with shapes ⁠list(tf$concat(list(list(num_posterior_draws), param<1>$prior$batch_shape, param<1>$prior$event_shape), list(list(num_posterior_draws), param<2>$prior$batch_shape, param<2>$prior$event_shape), ... ) )⁠ for all model parameters. This may optionally also be a named list mapping parameter names to tensor values.

num_steps_forecast

scalar integer tensor; the number of steps to forecast.

Value

forecast_dist a tfd_mixture_same_family instance with event shape list(num_steps_forecast, 1) and batch shape tf$concat(list(sample_shape, model$batch_shape)), with num_posterior_draws mixture components.
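Because the forecast is a mixture with one component per posterior draw, its mean and variance combine the per-draw moments. The base R sketch below illustrates that arithmetic with made-up numbers. It assumes equally weighted components, and the matrices component_means and component_sds are hypothetical stand-ins for the per-draw forecast moments.

```r
set.seed(4)
num_posterior_draws <- 10
num_steps_forecast <- 5

# Hypothetical per-draw forecast moments:
# one row per posterior draw, one column per forecast step.
component_means <- matrix(rnorm(num_posterior_draws * num_steps_forecast),
                          nrow = num_posterior_draws)
component_sds <- matrix(runif(num_posterior_draws * num_steps_forecast, 0.5, 1.5),
                        nrow = num_posterior_draws)

# Equal-weight mixture: the mean is the average of the component means.
mixture_mean <- colMeans(component_means)

# Law of total variance: average within-component variance plus the
# variance of the component means across draws.
mixture_var <- colMeans(component_sds^2) +
  colMeans(sweep(component_means, 2, mixture_mean)^2)
```

The second term of mixture_var is the contribution of parameter uncertainty: it vanishes only if every posterior draw yields the same forecast mean.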

See Also

Other sts-functions: sts_build_factored_surrogate_posterior(), sts_build_factored_variational_loss(), sts_decompose_by_component(), sts_decompose_forecast_by_component(), sts_fit_with_hmc(), sts_one_step_predictive(), sts_sample_uniform_initial_state()

Examples

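# Parenthesize the sum so that the whole series is converted to a tensor
# (%>% binds more tightly than +).
observed_time_series <-
  (rep(c(3.5, 4.1, 4.5, 3.9, 2.4, 2.1, 1.2), 5) +
     rep(c(1.1, 1.5, 2.4, 3.1, 4.0), each = 7)) %>%
  tensorflow::tf$convert_to_tensor(dtype = tensorflow::tf$float64)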
day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7)
local_linear_trend <- observed_time_series %>% sts_local_linear_trend()
model <- observed_time_series %>%
  sts_sum(components = list(day_of_week, local_linear_trend))
states_and_results <- observed_time_series %>%
  sts_fit_with_hmc(
    model,
    num_results = 10,
    num_warmup_steps = 5,
    num_variational_steps = 15)
samples <- states_and_results[[1]]
preds <- observed_time_series %>%
  sts_forecast(model,
               parameter_samples = samples,
               num_steps_forecast = 50)
predictions <- preds %>% tfd_sample(10)

Formal representation of a linear regression from provided covariates.

Description

This model defines a time series given by a linear combination of covariate time series provided in a design matrix:

observed_time_series <- tf$matmul(design_matrix, weights)

Usage

sts_linear_regression(design_matrix, weights_prior = NULL, name = NULL)

Arguments

design_matrix

float tensor of shape tf$concat(list(batch_shape, list(num_timesteps, num_features))). This may also optionally be an instance of tf$linalg$LinearOperator.

weights_prior

Distribution representing a prior over the regression weights. Must have event shape list(num_features) and batch shape broadcastable to the design matrix's batch_shape. Alternately, event_shape may be scalar (list()), in which case the prior is internally broadcast as tfd_transformed_distribution(weights_prior, tfb_identity(), event_shape = list(num_features), batch_shape = design_matrix$batch_shape). If NULL, defaults to tfd_student_t(df = 5, loc = 0, scale = 10), a weakly-informative prior loosely inspired by the Stan prior choice recommendations. Default value: NULL.

name

the name of this model component. Default value: 'LinearRegression'.

Details

The design matrix has shape list(num_timesteps, num_features). The weights are treated as an unknown random variable of size list(num_features) (both components also support batch shape), and are integrated over using the same approximate inference tools as other model parameters, i.e., generally HMC or variational inference.

This component does not itself include observation noise; it defines a deterministic distribution with mass at the point tf$matmul(design_matrix, weights). In practice, it should be combined with observation noise from another component such as sts_sum() (see the usage examples listed under See Also).
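The distinction between the deterministic regression effect and the noisy observations can be sketched in base R. The covariates, weights and noise level below are hypothetical, and the noise is added by hand to stand in for whatever the combined model contributes.

```r
set.seed(5)
num_timesteps <- 20

# Hypothetical design matrix with two features.
design_matrix <- cbind(
  seq_len(num_timesteps) / num_timesteps,     # a linear ramp
  rep(c(0, 1), length.out = num_timesteps)    # an on/off indicator
)
weights <- c(2.0, -1.0)   # hypothetical fixed weights

# Deterministic effect of the regression component: matmul(design_matrix, weights).
regression_effect <- as.vector(design_matrix %*% weights)

# Observation noise comes from elsewhere in the model, added here by hand.
observed <- regression_effect + rnorm(num_timesteps, sd = 0.3)
```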

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Formal representation of a local level model

Description

The local level model posits a level evolving via a Gaussian random walk:

level[t] = level[t-1] + Normal(0., level_scale)

Usage

sts_local_level(
  observed_time_series = NULL,
  level_scale_prior = NULL,
  initial_level_prior = NULL,
  name = NULL
)

Arguments

observed_time_series

optional float tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations. Default value: NULL.

level_scale_prior

optional tfp$distribution instance specifying a prior on the level_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_level_prior

optional tfp$distribution instance specifying a prior on the initial level. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

name

the name of this model component. Default value: 'LocalLevel'.

Details

The latent state is ⁠[level]⁠. We observe a noisy realization of the current level: f[t] = level[t] + Normal(0., observation_noise_scale) at each timestep.
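A minimal base R simulation of this model, with hypothetical scale values and a hypothetical starting level, looks as follows.

```r
set.seed(6)
num_timesteps <- 100
level_scale <- 0.2               # hypothetical value
observation_noise_scale <- 0.5   # hypothetical value
initial_level <- 1.0             # hypothetical value

# level[t] = level[t-1] + Normal(0, level_scale)
level <- initial_level +
  c(0, cumsum(rnorm(num_timesteps - 1, sd = level_scale)))

# f[t] = level[t] + Normal(0, observation_noise_scale)
observed <- level + rnorm(num_timesteps, sd = observation_noise_scale)
```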

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


State space model for a local level

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfd_linear_gaussian_state_space_model for details. The local level model is a special case of a linear Gaussian SSM, in which the latent state posits a level evolving via a Gaussian random walk:

level[t] = level[t-1] + Normal(0., level_scale)

Usage

sts_local_level_state_space_model(
  num_timesteps,
  level_scale,
  initial_state_prior,
  observation_noise_scale = 0,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer tensor, number of timesteps to model with this distribution.

level_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the level transitions.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states. Must have event shape ⁠[1]⁠ (as tfd_linear_gaussian_state_space_model requires a rank-1 event shape).

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the observation noise.

initial_step

Optional scalar integer tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

string name prefixed to ops created by this class. Default value: "LocalLevelStateSpaceModel".

Details

The latent state is ⁠[level]⁠ and ⁠[level]⁠ is observed (with noise) at each timestep.

The parameters level_scale and observation_noise_scale are each (a batch of) scalars. The batch shape of this Distribution is the broadcast batch shape of these parameters and of the initial_state_prior.

Mathematical Details

The local level model implements a tfp$distributions$LinearGaussianStateSpaceModel with latent_size = 1 and observation_size = 1, following the transition model:

transition_matrix = [[1]]
transition_noise ~ N(loc = 0, scale = diag([level_scale]))

which implements the evolution of level described above, and the observation model:

observation_matrix = [[1]]
observation_noise ~ N(loc = 0, scale = observation_noise_scale)
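With both matrices equal to [[1]], exact filtering reduces to a scalar recursion. The base R sketch below is illustrative rather than the package's implementation. The function name kalman_filter_local_level is hypothetical, and it assumes the prior applies to the level at the first timestep.

```r
# Hypothetical helper: scalar Kalman filter for the local level model.
kalman_filter_local_level <- function(y, level_scale, observation_noise_scale,
                                      prior_mean, prior_sd) {
  n <- length(y)
  m <- prior_mean
  v <- prior_sd^2
  filtered_mean <- numeric(n)
  filtered_var <- numeric(n)
  for (i in seq_len(n)) {
    # Update with observation_matrix = [[1]].
    gain <- v / (v + observation_noise_scale^2)
    m <- m + gain * (y[i] - m)
    v <- (1 - gain) * v
    filtered_mean[i] <- m
    filtered_var[i] <- v
    # Predict with transition_matrix = [[1]]; only the variance grows.
    v <- v + level_scale^2
  }
  list(mean = filtered_mean, var = filtered_var)
}

y <- c(1.0, 1.2, 0.9, 1.4)   # hypothetical observations
fit <- kalman_filter_local_level(y, level_scale = 0.2,
                                 observation_noise_scale = 0.5,
                                 prior_mean = 0, prior_sd = 10)
```

When observation_noise_scale is 0 the gain is 1 and the filtered mean simply tracks the observations.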

Value

an instance of LinearGaussianStateSpaceModel.

See Also

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Formal representation of a local linear trend model

Description

The local linear trend model posits a level and slope, each evolving via a Gaussian random walk:

level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
slope[t] = slope[t-1] + Normal(0., slope_scale)

Usage

sts_local_linear_trend(
  observed_time_series = NULL,
  level_scale_prior = NULL,
  slope_scale_prior = NULL,
  initial_level_prior = NULL,
  initial_slope_prior = NULL,
  name = NULL
)

Arguments

observed_time_series

optional float tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations. Default value: NULL.

level_scale_prior

optional tfp$distribution instance specifying a prior on the level_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

slope_scale_prior

optional tfd$Distribution instance specifying a prior on the slope_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_level_prior

optional tfp$distribution instance specifying a prior on the initial level. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_slope_prior

optional tfd$Distribution instance specifying a prior on the initial slope. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

name

the name of this model component. Default value: 'LocalLinearTrend'.

Details

The latent state is the two-dimensional tuple ⁠[level, slope]⁠. At each timestep we observe a noisy realization of the current level: f[t] = level[t] + Normal(0., observation_noise_scale). This model is appropriate for data where the trend direction and magnitude (latent slope) is consistent within short periods but may evolve over time.

Note that this model can produce very high uncertainty forecasts, as uncertainty over the slope compounds quickly. If you expect your data to have nonzero long-term trend, i.e. that slopes tend to revert to some mean, then the SemiLocalLinearTrend model may produce sharper forecasts.
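How quickly the slope uncertainty compounds can be seen by propagating the state covariance forward. The base R sketch below uses hypothetical scale values and assumes the level and slope are known exactly at the start of the forecast. The transition matrix is read off directly from the level and slope equations given above.

```r
level_scale <- 0.1    # hypothetical value
slope_scale <- 0.05   # hypothetical value

# State is [level, slope]: level[t] = level[t-1] + slope[t-1] + noise,
# slope[t] = slope[t-1] + noise.
transition_matrix <- matrix(c(1, 1,
                              0, 1), nrow = 2, byrow = TRUE)
transition_cov <- diag(c(level_scale^2, slope_scale^2))

horizon <- 20
state_cov <- matrix(0, 2, 2)   # assumption: state known exactly at step 0
level_var <- numeric(horizon)
for (h in seq_len(horizon)) {
  state_cov <- transition_matrix %*% state_cov %*% t(transition_matrix) +
    transition_cov
  level_var[h] <- state_cov[1, 1]
}
```

Under these assumptions the level variance after h steps is h * level_scale^2 + slope_scale^2 * (h - 1) * h * (2 * h - 1) / 6, so it grows cubically with the horizon, whereas a local level model's variance grows only linearly.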

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


State space model for a local linear trend

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfd_linear_gaussian_state_space_model for details.

Usage

sts_local_linear_trend_state_space_model(
  num_timesteps,
  level_scale,
  slope_scale,
  initial_state_prior,
  observation_noise_scale = 0,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer tensor giving the number of timesteps to model with this distribution.

level_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the level transitions.

slope_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the slope transitions.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states. Must have event shape [2], matching the two-dimensional latent state [level, slope].

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the observation noise.

initial_step

Optional scalar integer tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

string prefixed to ops created by this class. Default value: "LocalLinearTrendStateSpaceModel".

Details

The local linear trend model is a special case of a linear Gaussian SSM, in which the latent state posits a level and slope, each evolving via a Gaussian random walk:

level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
slope[t] = slope[t-1] + Normal(0., slope_scale)

The latent state is the two-dimensional tuple ⁠[level, slope]⁠. The level is observed at each timestep.

The parameters level_scale, slope_scale, and observation_noise_scale are each (a batch of) scalars. The batch shape of this Distribution is the broadcast batch shape of these parameters and of the initial_state_prior.

Mathematical Details

The linear trend model implements a tfd_linear_gaussian_state_space_model with latent_size = 2 and observation_size = 1, following the transition model:

transition_matrix = [[1., 1.],
                     [0., 1.]]
transition_noise ~ N(loc = 0, scale = diag([level_scale, slope_scale]))

which implements the evolution of ⁠[level, slope]⁠ described above, and the observation model:

observation_matrix = [[1., 0.]]
observation_noise ~ N(loc = 0, scale = observation_noise_scale)

which picks out the first latent component, i.e., the level, as the observation at each timestep.
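
The transition and observation models above can be sketched in base R. This is an illustration of the recursion only, not a call into tfprobability; the scale values, the initial state, and the seed are assumptions chosen for the example.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Illustrative parameter values (assumptions).
num_timesteps <- 20
level_scale <- 0.5
slope_scale <- 0.1
observation_noise_scale <- 0.2

transition_matrix <- matrix(c(1, 1,
                              0, 1), nrow = 2, byrow = TRUE)
observation_matrix <- matrix(c(1, 0), nrow = 1)

state <- c(0, 1)  # assumed initial c(level, slope)
observations <- numeric(num_timesteps)
for (t in seq_len(num_timesteps)) {
  # Observe the level (first latent component) plus observation noise.
  observations[t] <- drop(observation_matrix %*% state) +
    rnorm(1, mean = 0, sd = observation_noise_scale)
  # Advance the latent state: the level absorbs the slope, both receive noise.
  state <- drop(transition_matrix %*% state) +
    rnorm(2, mean = 0, sd = c(level_scale, slope_scale))
}

# Noise-free check: level 2 with slope 3 moves to level 5, slope 3.
noise_free_step <- drop(transition_matrix %*% c(2, 3))
```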

Value

an instance of LinearGaussianStateSpaceModel.

See Also

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Compute one-step-ahead predictive distributions for all timesteps

Description

Given samples from the posterior over parameters, return the predictive distribution over observations at each time T, given observations up through time T-1.

Usage

sts_one_step_predictive(
  observed_time_series,
  model,
  parameter_samples,
  timesteps_are_event_shape = TRUE
)

Arguments

observed_time_series

float tensor of shape ⁠concat([sample_shape, model.batch_shape, [num_timesteps, 1]])⁠ where sample_shape corresponds to i.i.d. observations, and the trailing ⁠[1]⁠ dimension may (optionally) be omitted if num_timesteps > 1. May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations.

model

An instance of StructuralTimeSeries representing a time-series model. This represents a joint distribution over time-series and their parameters with batch shape ⁠[b1, ..., bN]⁠.

parameter_samples

list of tensors representing posterior samples of model parameters, one tensor per parameter in model$parameters, each with shape tf$concat(list(list(num_posterior_draws), param$prior$batch_shape, param$prior$event_shape)). This may optionally also be a named list mapping parameter names to tensor values.

timesteps_are_event_shape

Deprecated, for backwards compatibility only. If FALSE, the predictive distribution will return per-timestep probabilities. Default value: TRUE.

Value

forecast_dist a tfd_mixture_same_family instance with event shape list(num_timesteps) and batch shape tf$concat(list(sample_shape, model$batch_shape)), with num_posterior_draws mixture components. The tth step represents the forecast distribution p(observed_time_series[t] | observed_time_series[0:t-1], parameter_samples).
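
sts_one_step_predictive() needs a model and a TensorFlow installation, so the following base-R sketch only mirrors the idea on a scalar local level model: for each posterior draw a Kalman filter yields the one-step-ahead predictive mean and variance, and the draws are then combined as equally weighted mixture components. The model choice, the helper one_step_predictive(), the prior (mean 0, variance 1), the data, and the two parameter draws are all hypothetical; this is not a description of the package internals.

```r
# One-step-ahead predictive moments for an assumed scalar local level model:
#   level[t] = level[t-1] + Normal(0, level_scale)
#   y[t]     = level[t]   + Normal(0, observation_noise_scale)
one_step_predictive <- function(y, level_scale, observation_noise_scale,
                                prior_mean = 0, prior_var = 1) {
  n <- length(y)
  pred_mean <- numeric(n)
  pred_var <- numeric(n)
  m <- prior_mean  # mean of level[t] given y[1:(t-1)]
  v <- prior_var   # variance of level[t] given y[1:(t-1)]
  for (t in seq_len(n)) {
    pred_mean[t] <- m
    pred_var[t] <- v + observation_noise_scale^2
    gain <- v / pred_var[t]            # Kalman gain
    m <- m + gain * (y[t] - m)         # condition on y[t]
    v <- (1 - gain) * v + level_scale^2  # then predict level[t+1]
  }
  list(mean = pred_mean, var = pred_var)
}

y <- c(1.0, 1.2, 0.9, 1.4)                      # hypothetical observations
draws <- data.frame(level_scale = c(0.1, 0.3),  # hypothetical posterior draws
                    observation_noise_scale = c(0.5, 0.4))

per_draw <- lapply(seq_len(nrow(draws)), function(i) {
  one_step_predictive(y, draws$level_scale[i], draws$observation_noise_scale[i])
})

# Mean of the equally weighted mixture over posterior draws.
mixture_mean <- Reduce(`+`, lapply(per_draw, `[[`, "mean")) / nrow(draws)
```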

See Also

Other sts-functions: sts_build_factored_surrogate_posterior(), sts_build_factored_variational_loss(), sts_decompose_by_component(), sts_decompose_forecast_by_component(), sts_fit_with_hmc(), sts_forecast(), sts_sample_uniform_initial_state()


Initialize from a uniform ⁠[-2, 2]⁠ distribution in unconstrained space.

Description

Initialize from a uniform ⁠[-2, 2]⁠ distribution in unconstrained space.

Usage

sts_sample_uniform_initial_state(
  parameter,
  return_constrained = TRUE,
  init_sample_shape = list(),
  seed = NULL
)

Arguments

parameter

sts$Parameter named tuple instance.

return_constrained

if TRUE, re-applies the constraining bijector to return initializations in the original domain. Otherwise, returns initializations in the unconstrained space. Default value: TRUE.

init_sample_shape

sample_shape of the sampled initializations. Default value: list().

seed

integer to seed the random number generator.

Value

uniform_initializer Tensor of shape concat([init_sample_shape, parameter$prior$batch_shape, transformed_event_shape]), where transformed_event_shape is parameter$prior$event_shape if return_constrained = TRUE, and parameter$bijector$inverse_event_shape(parameter$prior$event_shape) otherwise.
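
The idea behind return_constrained can be sketched in base R: draw uniformly on [-2, 2] in unconstrained space, then map through a constraining transformation. Softplus is used here purely as an assumed example of a positivity constraint; the actual bijector depends on the parameter, and the sample size and seed are arbitrary.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Assumed constraining transformation mapping the real line to (0, Inf).
softplus <- function(x) log1p(exp(x))

init_sample_shape <- 5  # illustrative number of initializations
unconstrained <- runif(init_sample_shape, min = -2, max = 2)

# Analogue of return_constrained = TRUE: map back to the original domain.
constrained <- softplus(unconstrained)
```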

See Also

Other sts-functions: sts_build_factored_surrogate_posterior(), sts_build_factored_variational_loss(), sts_decompose_by_component(), sts_decompose_forecast_by_component(), sts_fit_with_hmc(), sts_forecast(), sts_one_step_predictive()


Formal representation of a seasonal effect model.

Description

A seasonal effect model posits a fixed set of recurring, discrete 'seasons', each of which is active for a fixed number of timesteps and, while active, contributes a different effect to the time series. These are generally not meteorological seasons, but represent regular recurring patterns such as hour-of-day or day-of-week effects. The effect of each season drifts from one occurrence to the next following a Gaussian random walk, given under Details.

Usage

sts_seasonal(
  observed_time_series = NULL,
  num_seasons,
  num_steps_per_season = 1,
  drift_scale_prior = NULL,
  initial_effect_prior = NULL,
  constrain_mean_effect_to_zero = TRUE,
  name = NULL
)

Arguments

observed_time_series

optional float tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations. Default value: NULL.

num_seasons

Scalar integer number of seasons.

num_steps_per_season

integer number of steps in each season. This may be either a scalar (shape []), in which case all seasons have the same length; an array of shape [num_seasons], in which case seasons have different lengths that stay the same from cycle to cycle; or an array of shape [num_cycles, num_seasons], in which case the length of each season also varies from cycle to cycle (e.g., a four-year cycle with a leap day). Default value: 1.

drift_scale_prior

optional tfd$Distribution instance specifying a prior on the drift_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_effect_prior

optional tfd$Distribution instance specifying a normal prior on the initial effect of each season. This may be either a scalar tfd_normal prior, in which case it applies independently to every season, or it may be multivariate normal (e.g., tfd_multivariate_normal_diag) with event shape ⁠[num_seasons]⁠, in which case it specifies a joint prior across all seasons. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

constrain_mean_effect_to_zero

if TRUE, use a model parameterization that constrains the mean effect across all seasons to be zero. This constraint is generally helpful in identifying the contributions of different model components and can lead to more interpretable posterior decompositions. It may be undesirable if you plan to directly examine the latent space of the underlying state space model. Default value: TRUE.

name

the name of this model component. Default value: 'Seasonal'.

Details

effects[season, occurrence[i]] = (
  effects[season, occurrence[i-1]] + Normal(loc=0., scale=drift_scale))

The drift_scale parameter governs the standard deviation of the random walk; for example, in a day-of-week model it governs the change in effect from this Monday to next Monday.
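
A minimal base-R sketch of a day-of-week effect on hourly data, showing how num_seasons and num_steps_per_season map timesteps to seasons and how each effect drifts between occurrences. The drift scale, the initial effects, and the seed are assumptions for illustration.

```r
set.seed(7)  # arbitrary seed, for reproducibility only

num_seasons <- 7            # days of the week
num_steps_per_season <- 24  # hourly observations
cycle_length <- num_seasons * num_steps_per_season
timesteps <- 0:(2 * cycle_length - 1)  # two full weeks, zero-based

# Season active at each timestep (zero-based season index).
active_season <- (timesteps %/% num_steps_per_season) %% num_seasons

# Effects drift from one occurrence of a season to the next.
drift_scale <- 0.1     # illustrative value
num_occurrences <- 2
effects <- matrix(NA_real_, nrow = num_seasons, ncol = num_occurrences)
effects[, 1] <- rnorm(num_seasons)  # assumed initial effects
for (i in 2:num_occurrences) {
  effects[, i] <- effects[, i - 1] + rnorm(num_seasons, mean = 0, sd = drift_scale)
}

# Seasonal contribution at each timestep.
occurrence <- timesteps %/% cycle_length + 1
seasonal_effect <- effects[cbind(active_season + 1, occurrence)]
```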

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


State space model for a seasonal effect.

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfd_linear_gaussian_state_space_model for details.

Usage

sts_seasonal_state_space_model(
  num_timesteps,
  num_seasons,
  drift_scale,
  initial_state_prior,
  observation_noise_scale = 0,
  num_steps_per_season = 1,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer tensor giving the number of timesteps to model with this distribution.

num_seasons

Scalar integer number of seasons.

drift_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the change in effect between consecutive occurrences of a given season. This is assumed to be the same for all seasons.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states; must have event shape ⁠[num_seasons]⁠.

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the observation noise.

num_steps_per_season

integer number of steps in each season. This may be either a scalar (shape []), in which case all seasons have the same length; an array of shape [num_seasons], in which case seasons have different lengths that stay the same from cycle to cycle; or an array of shape [num_cycles, num_seasons], in which case the length of each season also varies from cycle to cycle (e.g., a four-year cycle with a leap day). Default value: 1.

initial_step

Optional scalar integer tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

string prefixed to ops created by this class. Default value: "SeasonalStateSpaceModel".

Details

A seasonal effect model is a special case of a linear Gaussian SSM. The latent states represent an unknown effect from each of several 'seasons'; these are generally not meteorological seasons, but represent regular recurring patterns such as hour-of-day or day-of-week effects. The effect of each season drifts from one occurrence to the next, following a Gaussian random walk:

effects[season, occurrence[i]] = (effects[season, occurrence[i-1]] + Normal(loc=0., scale=drift_scale))

The latent state has dimension num_seasons, containing one effect for each seasonal component. The parameters drift_scale and observation_noise_scale are each (a batch of) scalars. The batch shape of this Distribution is the broadcast batch shape of these parameters and of the initial_state_prior. Note: there is no requirement that the effects sum to zero.

Mathematical Details

The seasonal effect model implements a tfd_linear_gaussian_state_space_model with latent_size = num_seasons and observation_size = 1. The latent state is organized so that the current seasonal effect is always in the first (zeroth) dimension. The transition model rotates the latent state to shift to a new effect at the end of each season:

transition_matrix[t] = (permutation_matrix([1, 2, ..., num_seasons-1, 0])
                        if season_is_changing(t)
                        else eye(num_seasons))
transition_noise[t] ~ Normal(loc=0., scale_diag=(
                      [drift_scale, 0, ..., 0]
                      if season_is_changing(t)
                      else [0, 0, ..., 0]))

where season_is_changing(t) is True if ⁠t `mod` sum(num_steps_per_season)⁠ is in the set of final days for each season, given by cumsum(num_steps_per_season) - 1. The observation model always picks out the effect for the current season, i.e., the first element of the latent state:

observation_matrix = [[1., 0., ..., 0.]]
observation_noise ~ Normal(loc=0, scale=observation_noise_scale)
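
The rotation logic can be checked with a short base-R sketch. The season lengths and the initial effects are hypothetical, and noise is left out so that only the season_is_changing(t) rule and the permutation are exercised.

```r
# Hypothetical season lengths: three seasons lasting 2, 3 and 1 timesteps.
num_steps_per_season <- c(2, 3, 1)
num_seasons <- length(num_steps_per_season)

# TRUE on the final timestep of each season (t is zero-based).
season_is_changing <- function(t) {
  (t %% sum(num_steps_per_season)) %in% (cumsum(num_steps_per_season) - 1)
}

# Permutation that moves the next season's effect into the first position.
rotation <- diag(num_seasons)[c(2:num_seasons, 1), ]

transition_matrix <- function(t) {
  if (season_is_changing(t)) rotation else diag(num_seasons)
}

# Noise-free evolution over one full cycle of six timesteps.
state <- c(10, 20, 30)  # assumed effects; current season first
for (t in 0:5) {
  state <- drop(transition_matrix(t) %*% state)
}
```

After one full cycle the state has been rotated num_seasons times, so in the absence of noise it returns to its starting order.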

Value

an instance of LinearGaussianStateSpaceModel.

See Also

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Formal representation of a semi-local linear trend model.

Description

Like the sts_local_linear_trend model, a semi-local linear trend posits a latent level and slope, with the level component updated according to the current slope plus a random walk, as given under Details.

Usage

sts_semi_local_linear_trend(
  observed_time_series = NULL,
  level_scale_prior = NULL,
  slope_mean_prior = NULL,
  slope_scale_prior = NULL,
  autoregressive_coef_prior = NULL,
  initial_level_prior = NULL,
  initial_slope_prior = NULL,
  constrain_ar_coef_stationary = TRUE,
  constrain_ar_coef_positive = FALSE,
  name = NULL
)

Arguments

observed_time_series

optional float tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations. Default value: NULL.

level_scale_prior

optional tfd$Distribution instance specifying a prior on the level_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

slope_mean_prior

optional tfd$Distribution instance specifying a prior on the slope_mean parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

slope_scale_prior

optional tfd$Distribution instance specifying a prior on the slope_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

autoregressive_coef_prior

optional tfd$Distribution instance specifying a prior on the autoregressive_coef parameter. If NULL, the default prior is a standard Normal(0, 1). Note that the prior may be implicitly truncated by constrain_ar_coef_stationary and/or constrain_ar_coef_positive. Default value: NULL.

initial_level_prior

optional tfd$Distribution instance specifying a prior on the initial level. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_slope_prior

optional tfd$Distribution instance specifying a prior on the initial slope. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

constrain_ar_coef_stationary

if TRUE, perform inference using a parameterization that restricts autoregressive_coef to the interval (-1, 1), or (0, 1) if constrain_ar_coef_positive is also TRUE, corresponding to stationary processes. This will implicitly truncate the support of autoregressive_coef_prior. Default value: TRUE.

constrain_ar_coef_positive

if TRUE, perform inference using a parameterization that restricts autoregressive_coef to be positive, or in ⁠(0, 1)⁠ if constrain_ar_coef_stationary is also TRUE. This will implicitly truncate the support of autoregressive_coef_prior. Default value: FALSE.

name

the name of this model component. Default value: 'SemiLocalLinearTrend'.

Details

level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)

The slope component in a sts_semi_local_linear_trend model evolves according to a first-order autoregressive (AR1) process with potentially nonzero mean:

slope[t] = (slope_mean + autoregressive_coef * (slope[t-1] - slope_mean) + Normal(0., slope_scale))

Unlike the random walk used in LocalLinearTrend, a stationary AR1 process (coefficient in ⁠(-1, 1)⁠) maintains bounded variance over time, so a SemiLocalLinearTrend model will often produce more reasonable uncertainties when forecasting over long timescales.
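
The bounded-variance claim can be illustrated in base R by comparing the slope variance under a random walk with that under a stationary AR1 process. The values of slope_scale and autoregressive_coef and the horizon are assumptions for the example.

```r
# Illustrative values (assumptions).
slope_scale <- 0.1
autoregressive_coef <- 0.8
horizon <- 200

rw_var <- numeric(horizon)  # slope variance under a random walk
ar_var <- numeric(horizon)  # slope variance under the AR1 process
v_rw <- 0
v_ar <- 0
for (h in seq_len(horizon)) {
  v_rw <- v_rw + slope_scale^2
  v_ar <- autoregressive_coef^2 * v_ar + slope_scale^2
  rw_var[h] <- v_rw
  ar_var[h] <- v_ar
}

# Limit of the AR1 variance when |autoregressive_coef| < 1.
stationary_var <- slope_scale^2 / (1 - autoregressive_coef^2)
```

The random-walk variance grows linearly with the horizon, while the AR1 variance levels off at stationary_var.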

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


State space model for a semi-local linear trend.

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfd_linear_gaussian_state_space_model for details.

Usage

sts_semi_local_linear_trend_state_space_model(
  num_timesteps,
  level_scale,
  slope_mean,
  slope_scale,
  autoregressive_coef,
  initial_state_prior,
  observation_noise_scale = 0,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer tensor giving the number of timesteps to model with this distribution.

level_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the level transitions.

slope_mean

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the expected long-term mean of the latent slope.

slope_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the slope transitions.

autoregressive_coef

Scalar (any additional dimensions are treated as batch dimensions) float tensor defining the AR1 process on the latent slope.

initial_state_prior

instance of tfd_multivariate_normal representing the prior distribution on latent states. Must have event shape [2], matching the two-dimensional latent state [level, slope].

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float tensor indicating the standard deviation of the observation noise.

initial_step

Optional scalar integer tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

string prefixed to ops created by this class. Default value: "SemiLocalLinearTrendStateSpaceModel".

Details

The semi-local linear trend model is a special case of a linear Gaussian SSM, in which the latent state posits a level and slope. The level evolves via a Gaussian random walk centered at the current slope, while the slope follows a first-order autoregressive (AR1) process with mean slope_mean:

level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
slope[t] = (slope_mean + autoregressive_coef * (slope[t-1] - slope_mean) +
           Normal(0., slope_scale))

The latent state is the two-dimensional tuple ⁠[level, slope]⁠. The level is observed at each timestep. The parameters level_scale, slope_mean, slope_scale, autoregressive_coef, and observation_noise_scale are each (a batch of) scalars. The batch shape of this Distribution is the broadcast batch shape of these parameters and of the initial_state_prior.

Mathematical Details

The semi-local linear trend model implements a tfd_linear_gaussian_state_space_model with latent_size = 2 and observation_size = 1, following the transition model:

transition_matrix = [[1., 1.],
                     [0., autoregressive_coef]]
transition_noise ~ N(loc=[0., slope_mean - autoregressive_coef * slope_mean],
                     scale=diag([level_scale, slope_scale]))

which implements the evolution of ⁠[level, slope]⁠ described above, and the observation model:

observation_matrix = [[1., 0.]]
observation_noise ~ N(loc=0, scale=observation_noise_scale)

which picks out the first latent component, i.e., the level, as the observation at each timestep.
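
A noise-free step of this transition can be checked in base R against the level and slope recursions under Details. The noise mean is written here as a two-element vector whose level entry is zero, which is what those recursions imply; the parameter values and the starting state are assumptions.

```r
# Illustrative values (assumptions).
slope_mean <- 0.5
autoregressive_coef <- 0.9

transition_matrix <- matrix(c(1, 1,
                              0, autoregressive_coef), nrow = 2, byrow = TRUE)
# Mean of the transition noise: no offset on the level, AR1 offset on the slope.
transition_bias <- c(0, slope_mean - autoregressive_coef * slope_mean)

state <- c(1, 2)  # assumed c(level, slope)
next_state <- drop(transition_matrix %*% state) + transition_bias

# The same step written directly from the recursions.
expected_level <- state[1] + state[2]
expected_slope <- slope_mean + autoregressive_coef * (state[2] - slope_mean)
```

A slope equal to slope_mean is a fixed point of the noise-free update, which is the mean-reverting behaviour the AR1 term is meant to provide.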

Value

an instance of LinearGaussianStateSpaceModel.

See Also

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Formal representation of a smooth seasonal effect model

Description

The smooth seasonal model uses a set of trigonometric terms in order to capture a recurring pattern whereby adjacent (in time) effects are similar. The frequencies used by the model are computed from period and frequency_multipliers, as given under Details.

Usage

sts_smooth_seasonal(
  period,
  frequency_multipliers,
  allow_drift = TRUE,
  drift_scale_prior = NULL,
  initial_state_prior = NULL,
  observed_time_series = NULL,
  name = NULL
)

Arguments

period

positive scalar float Tensor giving the number of timesteps required for the longest cyclic effect to repeat.

frequency_multipliers

One-dimensional float Tensor listing the frequencies (cyclic components) included in the model, as multipliers of the base/fundamental frequency 2. * pi / period. Each component is specified by the number of times it repeats per period, and adds two latent dimensions to the model. A smooth seasonal model that can represent any periodic function is given by ⁠frequency_multipliers = [1,2, ..., floor(period / 2)]⁠. However, it is often desirable to enforce a smoothness assumption (and reduce the computational burden) by dropping some of the higher frequencies.

allow_drift

optional logical specifying whether the seasonal effects can drift over time. Setting this to FALSE removes the drift_scale parameter from the model. This is mathematically equivalent to drift_scale_prior = tfd.Deterministic(0.), but removing drift directly is preferred because it avoids the use of a degenerate prior. Default value: TRUE.

drift_scale_prior

optional tfd$Distribution instance specifying a prior on the drift_scale parameter. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

initial_state_prior

instance of tfd$MultivariateNormal representing the prior distribution on the latent states. Must have event shape ⁠[2 * len(frequency_multipliers)]⁠. If NULL, a heuristic default prior is constructed based on the provided observed_time_series.

observed_time_series

optional float Tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of tfp$sts$MaskedTimeSeries, which includes a mask Tensor to specify timesteps with missing observations. Default value: NULL.

name

the name of this model component. Default value: 'SmoothSeasonal'.

Details

frequencies[j] = 2. * pi * frequency_multipliers[j] / period

and then posits two latent states for each frequency. The two latent states associated with frequency j drift over time via:

effect[t] = (effect[t-1] * cos(frequencies[j]) +
             auxiliary[t-1] * sin(frequencies[j]) +
             Normal(0., drift_scale))
auxiliary[t] = (-effect[t-1] * sin(frequencies[j]) +
                auxiliary[t-1] * cos(frequencies[j]) +
                Normal(0., drift_scale))

where effect is the smooth seasonal effect and auxiliary only appears as a matter of construction. The interpretation of auxiliary is thus not particularly important.
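
A base-R sketch of the noise-free recursion for a single frequency shows why the effect is periodic: each step rotates the pair (effect, auxiliary) by frequencies[j], so the fundamental frequency returns to its starting point after period steps. The period, the multipliers, and the initial state are assumptions for the example.

```r
# Illustrative values (assumptions).
period <- 12
frequency_multipliers <- c(1, 2)
frequencies <- 2 * pi * frequency_multipliers / period

# One noise-free update of c(effect, auxiliary) for a given frequency.
step <- function(state, frequency) {
  effect <- state[1]
  auxiliary <- state[2]
  c(effect * cos(frequency) + auxiliary * sin(frequency),
    -effect * sin(frequency) + auxiliary * cos(frequency))
}

state <- c(1, 0)  # assumed initial c(effect, auxiliary)
effect_path <- numeric(period)
for (t in seq_len(period)) {
  state <- step(state, frequencies[1])
  effect_path[t] <- state[1]
}
```

With this starting state the effect traces cos(t * frequencies[1]), a single sinusoid with the stated period.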

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_sparse_linear_regression(), sts_sum()


State space model for a smooth seasonal effect

Description

A state space model (SSM) posits a set of latent (unobserved) variables that evolve over time with dynamics specified by a probabilistic transition model p(z[t+1] | z[t]). At each timestep, we observe a value sampled from an observation model conditioned on the current state, p(x[t] | z[t]). The special case where both the transition and observation models are Gaussians with mean specified as a linear function of the inputs is known as a linear Gaussian state space model and supports tractable exact probabilistic calculations; see tfp$distributions$LinearGaussianStateSpaceModel for details. A smooth seasonal effect model is a special case of a linear Gaussian SSM. It is the sum of a set of "cyclic" components, with one component for each frequency:

frequencies[j] = 2. * pi * frequency_multipliers[j] / period

Each cyclic component contains two latent states which we denote effect and auxiliary. The two latent states for component j drift over time via:

effect[t] = (effect[t-1] * cos(frequencies[j]) +
             auxiliary[t-1] * sin(frequencies[j]) +
             Normal(0., drift_scale))
auxiliary[t] = (-effect[t-1] * sin(frequencies[j]) +
                auxiliary[t-1] * cos(frequencies[j]) +
                Normal(0., drift_scale))

Usage

sts_smooth_seasonal_state_space_model(
  num_timesteps,
  period,
  frequency_multipliers,
  drift_scale,
  initial_state_prior,
  observation_noise_scale = 0,
  initial_step = 0,
  validate_args = FALSE,
  allow_nan_stats = TRUE,
  name = NULL
)

Arguments

num_timesteps

Scalar integer Tensor giving the number of timesteps to model with this distribution.

period

positive scalar float Tensor giving the number of timesteps required for the longest cyclic effect to repeat.

frequency_multipliers

One-dimensional float Tensor listing the frequencies (cyclic components) included in the model, as multipliers of the base/fundamental frequency 2. * pi / period. Each component is specified by the number of times it repeats per period, and adds two latent dimensions to the model. A smooth seasonal model that can represent any periodic function is given by ⁠frequency_multipliers = [1,2, ..., floor(period / 2)]⁠. However, it is often desirable to enforce a smoothness assumption (and reduce the computational burden) by dropping some of the higher frequencies.

drift_scale

Scalar (any additional dimensions are treated as batch dimensions) float Tensor indicating the standard deviation of the latent state transitions.

initial_state_prior

instance of tfd$MultivariateNormal representing the prior distribution on latent states. Must have event shape ⁠[num_features]⁠.

observation_noise_scale

Scalar (any additional dimensions are treated as batch dimensions) float Tensor indicating the standard deviation of the observation noise. Default value: 0.

initial_step

scalar integer Tensor specifying the starting timestep. Default value: 0.

validate_args

logical. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. Default value: FALSE.

allow_nan_stats

logical. If FALSE, raise an exception if a statistic (e.g. mean/mode/etc...) is undefined for any batch member. If TRUE, batch members with valid parameters leading to undefined statistics will return NaN for this statistic. Default value: TRUE.

name

string prefixed to ops created by this class. Default value: "SmoothSeasonalStateSpaceModel".

Details

The auxiliary latent state only appears as a matter of construction and thus its interpretation is not particularly important. The total smooth seasonal effect is the sum of the effect values from each of the cyclic components. The parameters drift_scale and observation_noise_scale are each (a batch of) scalars. The batch shape of this Distribution is the broadcast batch shape of these parameters and of the initial_state_prior.

Mathematical Details

The smooth seasonal effect model implements a tfp$distributions$LinearGaussianStateSpaceModel with latent_size = 2 * len(frequency_multipliers) and observation_size = 1. The latent state is the concatenation of the cyclic latent states which themselves comprise an effect and an auxiliary state. The transition matrix is a block diagonal matrix where block j is:

transition_matrix[j] =  [[cos(frequencies[j]), sin(frequencies[j])],
                         [-sin(frequencies[j]), cos(frequencies[j])]]

The observation model picks out the cyclic effect values from the latent state:

observation_matrix = [[1., 0., 1., 0., ..., 1., 0.]]
observation_noise ~ Normal(loc=0, scale=observation_noise_scale)
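The block-diagonal structure above can be sketched in plain Python. This is only an illustration of the formulas, not the tfprobability API: the helper names `smooth_seasonal_matrices` and `step` are hypothetical, and drift and observation noise are left out so that the periodicity is visible.

```python
import math

def smooth_seasonal_matrices(period, frequency_multipliers):
    """Build the block-diagonal transition matrix and the observation row."""
    n = 2 * len(frequency_multipliers)
    transition = [[0.0] * n for _ in range(n)]
    observation = [0.0] * n
    for j, multiplier in enumerate(frequency_multipliers):
        freq = 2.0 * math.pi * multiplier / period
        e, a = 2 * j, 2 * j + 1  # positions of the effect and auxiliary states
        transition[e][e] = math.cos(freq)
        transition[e][a] = math.sin(freq)
        transition[a][e] = -math.sin(freq)
        transition[a][a] = math.cos(freq)
        observation[e] = 1.0  # only the effect state contributes to the observation
    return transition, observation

def step(transition, state):
    """One noise-free transition: state[t] = transition @ state[t-1]."""
    return [sum(row[k] * state[k] for k in range(len(state))) for row in transition]

transition, observation = smooth_seasonal_matrices(period=12, frequency_multipliers=[1, 2])
initial_state = [1.0, 0.0, 0.5, 0.0]
state = list(initial_state)
for _ in range(12):
    state = step(transition, state)
# Without drift noise, the latent state returns to its start after one period.
```

Each block is a plane rotation, so with integer frequency multipliers every component completes a whole number of turns after `period` steps.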

For further mathematical details please see Harvey (1990).

Value

an instance of LinearGaussianStateSpaceModel.

References

  • Harvey, A. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press, 1990.

See Also

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal(), sts_sparse_linear_regression(), sts_sum()


Formal representation of a sparse linear regression.

Description

This model defines a time series given by a sparse linear combination of covariate time series provided in a design matrix; the defining equation is given under Details.

Usage

sts_sparse_linear_regression(
  design_matrix,
  weights_prior_scale = 0.1,
  weights_batch_shape = NULL,
  name = NULL
)

Arguments

design_matrix

float tensor of shape tf$concat(list(batch_shape, list(num_timesteps, num_features))). This may also optionally be an instance of tf$linalg$LinearOperator.

weights_prior_scale

float Tensor defining the scale of the Horseshoe prior on regression weights. Small values encourage the weights to be sparse. The shape must broadcast with weights_batch_shape. Default value: 0.1.

weights_batch_shape

if NULL, defaults to design_matrix.batch_shape_tensor(). Must broadcast with the batch shape of design_matrix. Default value: NULL.

name

the name of this model component. Default value: 'SparseLinearRegression'.

Details

The time series is defined by:

observed_time_series <- tf$matmul(design_matrix, weights)

This is identical to sts_linear_regression, except that sts_sparse_linear_regression uses a parameterization of a Horseshoe prior to encode the assumption that many of the weights are zero, i.e., many of the covariate time series are irrelevant. See the mathematical details section below for further discussion. The prior parameterization used by sts_sparse_linear_regression is more suitable for inference than that obtained by simply passing the equivalent tfd_horseshoe prior to sts_linear_regression; when sparsity is desired, sts_sparse_linear_regression will likely yield better results.

This component does not itself include observation noise; it defines a deterministic distribution with mass at the point tf$matmul(design_matrix, weights). In practice, it should be combined with observation noise from another component such as sts_sum.

Mathematical Details

The basic horseshoe prior (Carvalho et al., 2009) is defined as a Cauchy-normal scale mixture:

scales[i] ~ HalfCauchy(loc=0, scale=1)
weights[i] ~ Normal(loc=0., scale=scales[i] * global_scale)

The Cauchy scale parameters put substantial mass near zero, encouraging weights to be sparse, but their heavy tails allow weights far from zero to be estimated without excessive shrinkage. The horseshoe can be thought of as a continuous relaxation of a traditional 'spike-and-slab' discrete sparsity prior, in which the latent Cauchy scale mixes between 'spike' (scales[i] ~= 0) and 'slab' (scales[i] >> 0) regimes.

Following the recommendations in Piironen et al. (2017), SparseLinearRegression implements a horseshoe with the following adaptations:

  • The Cauchy prior on scales[i] is represented as an InverseGamma-Normal compound.

  • The global_scale parameter is integrated out following a Cauchy(0., scale=weights_prior_scale) hyperprior, which is also represented as an InverseGamma-Normal compound.

  • All compound distributions are implemented using a non-centered parameterization. The compound, non-centered representation defines the same marginal prior as the original horseshoe (up to integrating out the global scale), but allows samplers to mix more efficiently through the heavy tails; for variational inference, the compound representation implicitly expands the representational power of the variational model.

Note that we do not yet implement the regularized ('Finnish') horseshoe, proposed in Piironen et al. (2017) for models with weak likelihoods, because the likelihood in STS models is typically Gaussian, where it's not clear that additional regularization is appropriate. If you need this functionality, please email [email protected].

The full prior parameterization implemented in SparseLinearRegression is as follows:

# Sample global_scale from Cauchy(0, scale=weights_prior_scale).
global_scale_variance ~ InverseGamma(alpha=0.5, beta=0.5)
global_scale_noncentered ~ HalfNormal(loc=0, scale=1)
global_scale = (global_scale_noncentered *
                sqrt(global_scale_variance) *
                weights_prior_scale)

# Sample local_scales from Cauchy(0, 1).
local_scale_variances[i] ~ InverseGamma(alpha=0.5, beta=0.5)
local_scales_noncentered[i] ~ HalfNormal(loc=0, scale=1)
local_scales[i] = local_scales_noncentered[i] * sqrt(local_scale_variances[i])

weights[i] ~ Normal(loc=0., scale=local_scales[i] * global_scale)
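The sampling scheme above can be sketched with Python's standard `random` module. This is an illustrative draw from the prior only, not how the package performs inference; the helper names are hypothetical, and beta is assumed to be the rate of the underlying Gamma distribution.

```python
import math
import random

def sample_inverse_gamma(rng, alpha, beta):
    """Draw from InverseGamma(alpha, beta) as 1 / Gamma(shape=alpha, rate=beta)."""
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

def sample_half_cauchy(rng, scale):
    """Half-Cauchy draw built as HalfNormal * sqrt(InverseGamma(0.5, 0.5)) * scale."""
    noncentered = abs(rng.gauss(0.0, 1.0))
    variance = sample_inverse_gamma(rng, 0.5, 0.5)
    return noncentered * math.sqrt(variance) * scale

def sample_horseshoe_weights(rng, num_features, weights_prior_scale=0.1):
    """One prior draw of the regression weights."""
    global_scale = sample_half_cauchy(rng, weights_prior_scale)
    local_scales = [sample_half_cauchy(rng, 1.0) for _ in range(num_features)]
    return [rng.gauss(0.0, local * global_scale) for local in local_scales]

rng = random.Random(0)
weights = sample_horseshoe_weights(rng, num_features=5)
```

Repeated draws typically show most weights close to zero with an occasional large value, which is the sparsity behavior the prior is meant to encode.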

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sum()


Sum of structural time series components.

Description

This class enables compositional specification of a structural time series model from basic components. Given a list of component models, it represents an additive model, i.e., a model of time series that may be decomposed into a sum of terms corresponding to the component models.

Usage

sts_sum(
  observed_time_series = NULL,
  components,
  constant_offset = NULL,
  observation_noise_scale_prior = NULL,
  name = NULL
)

Arguments

observed_time_series

optional float tensor of shape ⁠batch_shape + [T, 1]⁠ (omitting the trailing unit dimension is also supported when T > 1), specifying an observed time series. Any priors not explicitly set will be given default values according to the scale of the observed time series (or batch of time series). May optionally be an instance of sts_masked_time_series, which includes a mask tensor to specify timesteps with missing observations. Default value: NULL.

components

list of one or more StructuralTimeSeries instances. These must have unique names.

constant_offset

optional scalar float tensor, or batch of scalars, specifying a constant value added to the sum of outputs from the component models. This allows the components to model the shifted series observed_time_series - constant_offset. If NULL, this is set to the mean of the provided observed_time_series. Default value: NULL.

observation_noise_scale_prior

optional tfd$Distribution instance specifying a prior on observation_noise_scale. If NULL, a heuristic default prior is constructed based on the provided observed_time_series. Default value: NULL.

name

string name of this model component; used as name_scope for ops created by this class. Default value: 'Sum'.

Details

Formally, the additive model represents a random process g[t] = f1[t] + f2[t] + ... + fN[t] + eps[t], where the f's are the random processes represented by the components, and eps[t] ~ Normal(loc=0, scale=observation_noise_scale) is an observation noise term. See the AdditiveStateSpaceModel documentation for mathematical details.

This model inherits the parameters (with priors) of its components, and adds an observation_noise_scale parameter governing the level of noise in the observed time series.
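As a rough illustration of the additive structure, the following plain-Python sketch sums component series that are already given as lists. The function name `additive_series` and the example components are hypothetical; the constant offset follows the description of the constant_offset argument.

```python
import random

def additive_series(component_series, constant_offset, observation_noise_scale, rng):
    """g[t] = constant_offset + f1[t] + ... + fN[t] + eps[t]."""
    num_timesteps = len(component_series[0])
    series = []
    for t in range(num_timesteps):
        total = constant_offset + sum(component[t] for component in component_series)
        eps = rng.gauss(0.0, observation_noise_scale)  # observation noise term
        series.append(total + eps)
    return series

trend = [0.1 * t for t in range(6)]
seasonal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise_free = additive_series([trend, seasonal], 2.0, 0.0, random.Random(0))
noisy = additive_series([trend, seasonal], 2.0, 0.5, random.Random(0))
```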

Value

an instance of StructuralTimeSeries.

See Also

For usage examples see sts_fit_with_hmc(), sts_forecast(), sts_decompose_by_component().

Other sts: sts_additive_state_space_model(), sts_autoregressive_state_space_model(), sts_autoregressive(), sts_constrained_seasonal_state_space_model(), sts_dynamic_linear_regression_state_space_model(), sts_dynamic_linear_regression(), sts_linear_regression(), sts_local_level_state_space_model(), sts_local_level(), sts_local_linear_trend_state_space_model(), sts_local_linear_trend(), sts_seasonal_state_space_model(), sts_seasonal(), sts_semi_local_linear_trend_state_space_model(), sts_semi_local_linear_trend(), sts_smooth_seasonal_state_space_model(), sts_smooth_seasonal(), sts_sparse_linear_regression()


Computes Y = g(X) = Abs(X), element-wise

Description

This non-injective bijector allows for transformations of scalar distributions with the absolute value function, which maps ⁠(-inf, inf)⁠ to ⁠[0, inf)⁠.

  • For y in (0, inf), tfb_absolute_value$inverse(y) returns the set inverse {x in (-inf, inf) : |x| = y} as a tuple, -y, y.

  • tfb_absolute_value$inverse(0) returns 0, 0, which is not the set inverse (the set inverse is the singleton {0}), but "works" in conjunction with TransformedDistribution to produce a left semi-continuous pdf.

  • For y < 0, tfb_absolute_value$inverse(y) happily returns the wrong thing, -y, y. This is done for efficiency. If validate_args == TRUE, y < 0 will raise an exception.
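The three cases can be mirrored in a few lines of scalar Python. This is a sketch of the described behavior, not the bijector itself; `abs_forward` and `abs_inverse` are hypothetical names.

```python
def abs_forward(x):
    """Y = |X|."""
    return abs(x)

def abs_inverse(y, validate_args=False):
    """Return the pair (-y, y); for y > 0 this is the set inverse of abs."""
    if validate_args and y < 0:
        raise ValueError("y must be non-negative")
    # No check by default: a negative y yields a pair that is not a valid preimage.
    return (-y, y)
```

For example, `abs_inverse(2.0)` gives `(-2.0, 2.0)`, while `abs_inverse(-1.0)` gives `(1.0, -1.0)` unless `validate_args=True`, in which case it raises.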

Usage

tfb_absolute_value(validate_args = FALSE, name = "absolute_value")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Affine bijector

Description

This Bijector is initialized with a shift Tensor and scale arguments, giving the forward operation Y = g(X) = scale @ X + shift, where the scale term is logically equivalent to:

scale = (scale_identity_multiplier * tf.diag(tf.ones(d)) +
         tf.diag(scale_diag) +
         scale_tril +
         scale_perturb_factor @ diag(scale_perturb_diag) @ tf.transpose([scale_perturb_factor]))

Usage

tfb_affine(
  shift = NULL,
  scale_identity_multiplier = NULL,
  scale_diag = NULL,
  scale_tril = NULL,
  scale_perturb_factor = NULL,
  scale_perturb_diag = NULL,
  adjoint = FALSE,
  validate_args = FALSE,
  name = "affine",
  dtype = NULL
)

Arguments

shift

Floating-point Tensor. If this is set to NULL, no shift is applied.

scale_identity_multiplier

floating point rank 0 Tensor representing a scaling done to the identity matrix. When scale_identity_multiplier = scale_diag = scale_tril = NULL then ⁠scale += IdentityMatrix⁠. Otherwise no scaled-identity-matrix is added to scale.

scale_diag

Floating-point Tensor representing the diagonal matrix. scale_diag has shape ⁠[N1, N2, ... k]⁠, which represents a k x k diagonal matrix. When NULL no diagonal term is added to scale.

scale_tril

Floating-point Tensor representing the lower triangular matrix. scale_tril has shape ⁠[N1, N2, ... k, k]⁠, which represents a k x k lower triangular matrix. When NULL no scale_tril term is added to scale. The upper triangular elements above the diagonal are ignored.

scale_perturb_factor

Floating-point Tensor representing a factor matrix whose last two dimensions have shape (k, r). When NULL, no rank-r update is added to scale.

scale_perturb_diag

Floating-point Tensor representing the diagonal matrix. scale_perturb_diag has shape ⁠[N1, N2, ... r]⁠, which represents an r x r diagonal matrix. When NULL low rank updates will take the form scale_perturb_factor * scale_perturb_factor.T.

adjoint

Logical indicating whether to use the scale matrix as specified or its adjoint. Default value: FALSE.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

dtype

tf$DType to prefer when converting args to Tensors. Else, we fall back to a common dtype inferred from the args, finally falling back to float32.

Details

If none of scale_identity_multiplier, scale_diag, or scale_tril are specified, then scale += IdentityMatrix. Otherwise, specifying a scale argument has the semantics of scale += Expand(arg), i.e., scale_diag != NULL means scale += tf$diag(scale_diag).
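The way the scale terms accumulate can be sketched for a single unbatched k x k matrix in plain Python. The helper names are hypothetical, and the low-rank perturbation terms (scale_perturb_factor, scale_perturb_diag) are deliberately left out to keep the example short.

```python
def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

def build_scale(k, scale_identity_multiplier=None, scale_diag=None, scale_tril=None):
    """Sum the specified terms; fall back to the identity if none are given."""
    if scale_identity_multiplier is None and scale_diag is None and scale_tril is None:
        return identity(k)
    scale = [[0.0] * k for _ in range(k)]
    for i in range(k):
        if scale_identity_multiplier is not None:
            scale[i][i] += scale_identity_multiplier
        if scale_diag is not None:
            scale[i][i] += scale_diag[i]
        if scale_tril is not None:
            for j in range(i + 1):  # entries above the diagonal are ignored
                scale[i][j] += scale_tril[i][j]
    return scale

def affine_forward(x, scale, shift=None):
    """Y = scale @ X + shift for a single vector X."""
    y = [sum(row[j] * x[j] for j in range(len(x))) for row in scale]
    if shift is not None:
        y = [yi + si for yi, si in zip(y, shift)]
    return y

scale = build_scale(2, scale_identity_multiplier=1.0, scale_tril=[[1.0, 9.0], [2.0, 1.0]])
y = affine_forward([1.0, 1.0], scale, shift=[0.5, 0.5])
```

Here the 9.0 above the diagonal of scale_tril has no effect, and the identity multiplier is added on top of the triangular term rather than replacing it.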

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X; shift, scale) = scale @ X + shift

Description

shift is a numeric Tensor and scale is a LinearOperator. If X is a scalar then the forward transformation is: scale * X + shift where * denotes broadcasted elementwise product.

Usage

tfb_affine_linear_operator(
  shift = NULL,
  scale = NULL,
  adjoint = FALSE,
  validate_args = FALSE,
  name = "affine_linear_operator"
)

Arguments

shift

Floating-point Tensor.

scale

Subclass of LinearOperator. Represents the (batch) positive definite matrix M in ⁠R^{k x k}⁠.

adjoint

Logical indicating whether to use the scale matrix as specified or its adjoint. Default value: FALSE.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Maps unconstrained R^n to R^n in ascending order.

Description

Both the domain and the codomain of the mapping are [-inf, inf]^n; however, the input of the inverse mapping must be strictly increasing. On the last dimension of the tensor, the Ascending bijector performs:

y = tf$cumsum([x[0], tf$exp(x[1]), tf$exp(x[2]), ..., tf$exp(x[-1])])
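The mapping and its inverse can be sketched for a single vector in plain Python; `ascending_forward` and `ascending_inverse` are hypothetical names used only for illustration.

```python
import math

def ascending_forward(x):
    """y = cumsum([x[0], exp(x[1]), ..., exp(x[-1])])."""
    y, total = [], 0.0
    for i, value in enumerate(x):
        total += value if i == 0 else math.exp(value)
        y.append(total)
    return y

def ascending_inverse(y):
    """Undo the cumulative sum, then take logs of the positive increments."""
    x = [y[0]]
    for previous, current in zip(y, y[1:]):
        # Requires current > previous, i.e. a strictly increasing input.
        x.append(math.log(current - previous))
    return x
```

Because every increment after the first is an exponential, the forward output is strictly increasing for any real input, which is why the inverse is only defined on strictly increasing vectors.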

Usage

tfb_ascending(validate_args = FALSE, name = "ascending")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) s.t. X = g^-1(Y) = (Y - mean(Y)) / std(Y)

Description

Applies Batch Normalization (Ioffe and Szegedy, 2015) to samples from a data distribution. This can be used to stabilize training of normalizing flows (Papamakarios et al., 2016; Dinh et al., 2017).

Usage

tfb_batch_normalization(
  batchnorm_layer = NULL,
  training = TRUE,
  validate_args = FALSE,
  name = "batch_normalization"
)

Arguments

batchnorm_layer

tf$layers$BatchNormalization layer object. If NULL, defaults to tf$layers$BatchNormalization(gamma_constraint = function(x) tf$nn$relu(x) + 1e-6). This ensures positivity of the scale variable.

training

If TRUE, updates running-average statistics during call to inverse().

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

When training Deep Neural Networks (DNNs), it is common practice to normalize or whiten features by shifting them to have zero mean and scaling them to have unit variance.

The inverse() method of the BatchNormalization bijector, which is used in the log-likelihood computation of data samples, implements the normalization procedure (shift-and-scale) using the mean and standard deviation of the current minibatch.

Conversely, the forward() method of the bijector de-normalizes samples (e.g. X*std(Y) + mean(Y)) with the running-average mean and standard deviation computed at training time. De-normalization is useful for sampling.

During training time, BatchNormalization.inverse and BatchNormalization.forward are not guaranteed to be inverses of each other because inverse(y) uses statistics of the current minibatch, while forward(x) uses running-average statistics accumulated from training. In other words, tfb_batch_normalization()$inverse(tfb_batch_normalization()$forward(...)) and tfb_batch_normalization()$forward(tfb_batch_normalization()$inverse(...)) will be identical when training=FALSE but may be different when training=TRUE.
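The asymmetry between inverse() and forward() during training can be seen in a small scalar sketch. The class `BatchNormSketch` is hypothetical: it omits the learned scale and offset of the real layer, and the momentum and epsilon values are assumptions chosen for illustration.

```python
import math

class BatchNormSketch:
    """Scalar batch normalization without learned scale/offset parameters."""

    def __init__(self, momentum=0.9, epsilon=1e-6):
        self.momentum = momentum
        self.epsilon = epsilon
        self.running_mean = 0.0
        self.running_var = 1.0

    def inverse(self, y_batch, training=True):
        """Normalize; when training, use and accumulate minibatch statistics."""
        if training:
            mean = sum(y_batch) / len(y_batch)
            var = sum((y - mean) ** 2 for y in y_batch) / len(y_batch)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            mean, var = self.running_mean, self.running_var
        std = math.sqrt(var + self.epsilon)
        return [(y - mean) / std for y in y_batch]

    def forward(self, x_batch):
        """De-normalize with the running-average statistics."""
        std = math.sqrt(self.running_var + self.epsilon)
        return [x * std + self.running_mean for x in x_batch]

bn = BatchNormSketch()
batch = [1.0, 2.0, 3.0, 4.0]
round_trip_training = bn.forward(bn.inverse(batch, training=True))
round_trip_inference = bn.forward(bn.inverse(batch, training=False))
```

In this sketch the inference round trip reproduces the batch, while the training round trip does not, because the two directions use different statistics.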

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Bijector which applies a list of bijectors to blocks of a Tensor

Description

More specifically, given ⁠[F_0, F_1, ... F_n]⁠ which are scalar or vector bijectors this bijector creates a transformation which operates on the vector ⁠[x_0, ... x_n]⁠ with the transformation ⁠[F_0(x_0), F_1(x_1) ..., F_n(x_n)]⁠ where ⁠x_0, ..., x_n⁠ are blocks (partitions) of the vector.
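A plain-Python sketch of the blockwise idea follows, with ordinary functions standing in for bijectors. The names `blockwise_forward`, `exp_block`, and `negate_block` are hypothetical; the default of one element per block follows the block_sizes argument described below.

```python
import math

def blockwise_forward(functions, x, block_sizes=None):
    """Apply functions[i] to the i-th contiguous block of x and concatenate."""
    if block_sizes is None:
        block_sizes = [1] * len(functions)  # default: one element per function
    if len(block_sizes) != len(functions) or sum(block_sizes) != len(x):
        raise ValueError("block_sizes must match the functions and cover x exactly")
    out, start = [], 0
    for fn, size in zip(functions, block_sizes):
        out.extend(fn(x[start:start + size]))
        start += size
    return out

def exp_block(block):
    return [math.exp(v) for v in block]

def negate_block(block):
    return [-v for v in block]

result = blockwise_forward([exp_block, negate_block], [0.0, 1.0, 2.0], block_sizes=[1, 2])
```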

Usage

tfb_blockwise(
  bijectors,
  block_sizes = NULL,
  validate_args = FALSE,
  name = NULL
)

Arguments

bijectors

A non-empty list of bijectors.

block_sizes

A 1-D integer Tensor with each element signifying the length of the block of the input vector to pass to the corresponding bijector. The length of block_sizes must be equal to the length of bijectors. If left as NULL, a vector of 1's is used.

validate_args

Logical indicating whether arguments should be checked for correctness.

name

String, name given to ops managed by this object. By default, the name is derived from the component bijectors, e.g., tfb_blockwise(list(tfb_exp(), tfb_softplus()))$name == 'blockwise_of_exp_and_softplus'.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Bijector which applies a sequence of bijectors

Description

Bijector which applies a sequence of bijectors

Usage

tfb_chain(
  bijectors = NULL,
  validate_args = FALSE,
  validate_event_size = TRUE,
  parameters = NULL,
  name = NULL
)

Arguments

bijectors

list of bijector instances. An empty list makes this bijector equivalent to the Identity bijector.

validate_args

Logical indicating whether arguments should be checked for correctness.

validate_event_size

Checks that bijectors are not applied to inputs with incomplete support (that is, inputs where one or more elements are a deterministic transformation of the others). For example, the following log-det-Jacobian (LDJ) would be incorrect: tfb_chain(list(tfb_scale(), tfb_softmax_centered()))$forward_log_det_jacobian(matrix(1:2, ncol = 2)). The Jacobian contribution from tfb_scale() applies to a 2-dimensional input, but the output from tfb_softmax_centered() is a 1-dimensional input embedded in a 2-dimensional space. Setting validate_event_size=TRUE (default) prints warnings in these cases. When validate_args is also TRUE, the warning is promoted to an exception.

parameters

Locals dict captured by subclass constructor, to be used for copy/slice re-instantiation operators.

name

String, name given to ops managed by this object. By default, the name is derived from the component bijectors, e.g., tfb_chain(list(tfb_exp(), tfb_softplus()))$name == "chain_of_exp_of_softplus".

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes g(X) = X @ X.T where X is a lower-triangular, positive-diagonal matrix

Description

Note: the upper-triangular part of X is ignored (whether or not it is zero).

Usage

tfb_cholesky_outer_product(
  validate_args = FALSE,
  name = "cholesky_outer_product"
)

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

The surjectivity of g as a map from the set of n x n positive-diagonal lower-triangular matrices to the set of SPD matrices follows immediately from executing the Cholesky factorization algorithm on an SPD matrix A to produce a positive-diagonal lower-triangular matrix L such that A = L @ L.T.

To prove the injectivity of g, suppose that L_1 and L_2 are lower-triangular with positive diagonals and satisfy A = L_1 @ L_1.T = L_2 @ L_2.T. Then inv(L_1) @ A @ inv(L_1).T = [inv(L_1) @ L_2] @ [inv(L_1) @ L_2].T = I. Setting L_3 := inv(L_1) @ L_2, that L_3 is a positive-diagonal lower-triangular matrix follows from inv(L_1) being positive-diagonal lower-triangular (which follows from the diagonal of a triangular matrix being its spectrum), and that the product of two positive-diagonal lower-triangular matrices is another positive-diagonal lower-triangular matrix. A simple inductive argument (proceeding one column of L_3 at a time) shows that, if I = L_3 @ L_3.T, with L_3 being lower-triangular with positive diagonal, then L_3 = I. Thus, L_1 = L_2, proving injectivity of g.
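The bijection can be checked numerically on a small matrix. The sketch below uses plain Python lists; `outer_product` and `cholesky` are hypothetical helper names, and the Cholesky routine is the textbook algorithm rather than the package's implementation.

```python
import math

def outer_product(lower):
    """g(L) = L @ L.T, reading only the lower triangle of L."""
    n = len(lower)
    return [[sum(lower[i][k] * lower[j][k] for k in range(min(i, j) + 1))
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Inverse of g: the lower-triangular, positive-diagonal L with A = L @ L.T."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower

factor = [[2.0, 0.0], [1.0, 3.0]]
spd = outer_product(factor)      # symmetric positive definite matrix
recovered = cholesky(spd)        # the unique positive-diagonal factor
```

Passing a matrix with non-zero entries above the diagonal to `outer_product` gives the same result, matching the note that the upper-triangular part is ignored.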

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Maps the Cholesky factor of M to the Cholesky factor of M^{-1}

Description

The forward and inverse calculations are conceptually identical to:

forward <- function(x) tf$linalg$cholesky(tf$linalg$inv(tf$matmul(x, x, adjoint_b = TRUE)))
inverse <- forward

However, the actual calculations exploit the triangular structure of the matrices.
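As a rough check of the conceptual definition, the same map can be written in base R. This sketch does not use tfprobability and does not exploit the triangular structure; it only illustrates that applying the map twice returns the input.

```r
# Base R sketch: Cholesky factor of M -> Cholesky factor of M^{-1}.
forward <- function(x) {
  m <- x %*% t(x)        # M, rebuilt from its lower Cholesky factor
  t(chol(solve(m)))      # lower Cholesky factor of M^{-1}
}

x <- matrix(c(2, 0,
              1, 3), nrow = 2, byrow = TRUE)
y <- forward(x)          # factor of the inverse
x_again <- forward(y)    # applying the map twice recovers x
```

This is why the inverse can be defined as the same function as the forward map.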

Usage

tfb_cholesky_to_inv_cholesky(
  validate_args = FALSE,
  name = "cholesky_to_inv_cholesky"
)

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Maps unconstrained reals to Cholesky-space correlation matrices.

Description

This bijector is a mapping between R^{n} and the n-dimensional manifold of Cholesky-space correlation matrices embedded in R^{m^2}, where n is the (m - 1)th triangular number; i.e. n = 1 + 2 + ... + (m - 1).

Usage

tfb_correlation_cholesky(validate_args = FALSE, name = "correlation_cholesky")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Mathematical Details

The image of unconstrained reals under the CorrelationCholesky bijector is the set of correlation matrices which are positive definite. A correlation matrix can be characterized as a symmetric positive semidefinite matrix with 1s on the main diagonal. However, the correlation matrix is positive definite if no component can be expressed as a linear combination of the other components. For a lower triangular matrix L to be a valid Cholesky-factor of a positive definite correlation matrix, it is necessary and sufficient that each row of L have unit Euclidean norm. To see this, observe that if L_i is the ith row of the Cholesky factor corresponding to the correlation matrix R, then the ith diagonal entry of R satisfies:

1 = R_i,i = L_i . L_i = ||L_i||^2

where '.' is the dot product of vectors and ⁠||...||⁠ denotes the Euclidean norm. Furthermore, observe that ⁠R_i,j⁠ lies in the interval ⁠[-1, 1]⁠. By the Cauchy-Schwarz inequality:

|R_i,j| = |L_i . L_j| <= ||L_i|| ||L_j|| = 1

This is a consequence of the fact that R is symmetric positive definite with 1s on the main diagonal. The LKJ distribution with input_output_cholesky=TRUE generates samples from (and computes log-densities on) the set of Cholesky factors of positive definite correlation matrices. The CorrelationCholesky bijector provides a bijective mapping from unconstrained reals to the support of the LKJ distribution.
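The unit-row-norm characterization can be illustrated in base R. This is a sketch of the math only, assuming an arbitrary random lower-triangular matrix as a starting point; it is not how the bijector itself parameterizes L.

```r
# Base R sketch: rows of unit Euclidean norm give a valid correlation matrix.
set.seed(1)
m <- 4
L <- matrix(rnorm(m * m), m, m)
L[upper.tri(L)] <- 0               # keep the lower triangle only
diag(L) <- abs(diag(L)) + 0.1      # positive diagonal
L <- L / sqrt(rowSums(L^2))        # scale each row to unit Euclidean norm

R <- L %*% t(L)                    # unit diagonal, off-diagonal entries in [-1, 1]
```

The diagonal of R is 1 because each entry is the squared norm of a row of L, and the off-diagonal bound follows from the Cauchy-Schwarz inequality.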

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes the cumulative sum of a tensor along a specified axis.

Description

Computes the cumulative sum of a tensor along a specified axis.
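For a single vector, the forward and inverse maps can be sketched in base R (no tfprobability call is made here). The forward map is a cumulative sum and the inverse takes successive differences.

```r
# Base R sketch of the cumulative-sum bijection on one vector.
x <- c(1, 2, 3, 4)
y <- cumsum(x)             # forward: 1 3 6 10
x_back <- diff(c(0, y))    # inverse: successive differences

# The Jacobian is lower-triangular with ones on the diagonal,
# so its determinant is 1 and the log-determinant is 0.
J <- lower.tri(diag(length(x)), diag = TRUE) * 1
log_det <- log(det(J))
```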

Usage

tfb_cumsum(axis = -1, validate_args = FALSE, name = "cumsum")

Arguments

axis

Integer indicating the axis along which to compute the cumulative sum. Note that positive (and zero) values are not supported.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = DCT(X), where the DCT type is indicated by the dct_type argument

Description

The discrete cosine transform efficiently applies a unitary DCT operator. This can be useful for mixing and decorrelating across the innermost event dimension. The inverse is X = g^{-1}(Y) = IDCT(Y), where IDCT is DCT-III for dct_type == 2. This bijector can be interleaved with Affine bijectors to build a cascade of structured efficient linear layers as in Moczulski et al., 2016. Note that the operator applied is orthonormal (i.e. norm='ortho').
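The orthonormal DCT-II and its DCT-III inverse can be written out as an explicit matrix in base R. This is a sketch under the standard orthonormal DCT-II definition; dct2_matrix is a hypothetical helper, and the bijector itself does not build a dense matrix.

```r
# Base R sketch: orthonormal DCT-II as a matrix; its transpose is DCT-III.
dct2_matrix <- function(n) {
  k <- 0:(n - 1)
  # rows index the frequency k, columns index the sample position
  C <- sqrt(2 / n) * cos(pi * outer(k, k + 0.5) / n)
  C[1, ] <- C[1, ] / sqrt(2)   # extra scaling of the k = 0 row
  C
}

C <- dct2_matrix(4)
x <- c(1, 2, 3, 4)
y <- drop(C %*% x)             # forward: orthonormal DCT-II
x_back <- drop(t(C) %*% y)     # inverse: orthonormal DCT-III
```

Since C is orthogonal, the absolute value of its determinant is 1, so the transform leaves the log-density unchanged.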

Usage

tfb_discrete_cosine_transform(
  validate_args = FALSE,
  dct_type = 2,
  name = "dct"
)

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

dct_type

integer, the DCT type performed by the forward transformation. Currently, only 2 and 3 are supported.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = exp(X)

Description

Computes Y = g(X) = exp(X)

Usage

tfb_exp(validate_args = FALSE, name = "exp")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = exp(X) - 1

Description

This bijector is mathematically equivalent to tfb_chain(list(tfb_affine_scalar(shift=-1), tfb_exp())). However, it makes use of the more numerically stable routines tf$math$expm1 and tf$math$log1p.
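The stability benefit can be seen with the base R functions expm1() and log1p(), which play the same role as the TensorFlow routines named above. This is a small illustration in base R, not a call into tfprobability.

```r
# Base R sketch: why expm1/log1p are preferred for small inputs.
x <- 1e-10
naive  <- exp(x) - 1     # loses precision through cancellation
stable <- expm1(x)       # accurate for small x
back   <- log1p(stable)  # stable inverse, recovers x

err_naive  <- abs(naive - x)
err_stable <- abs(stable - x)
```

For small x the true value of exp(x) - 1 is very close to x, so err_stable is far smaller than err_naive.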

Usage

tfb_expm1(validate_args = FALSE, name = "expm1")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Note: the expm1(.) is applied element-wise but the Jacobian is a reduction over the event space.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Implements a continuous normalizing flow X->Y defined via an ODE.

Description

This bijector implements a continuous dynamics transformation parameterized by a differential equation, where initial and terminal conditions correspond to domain (X) and image (Y); the defining equations are given under Details.

Usage

tfb_ffjord(
  state_time_derivative_fn,
  ode_solve_fn = NULL,
  trace_augmentation_fn = tfp$bijectors$ffjord$trace_jacobian_hutchinson,
  initial_time = 0,
  final_time = 1,
  validate_args = FALSE,
  dtype = tf$float32,
  name = "ffjord"
)

Arguments

state_time_derivative_fn

function taking arguments time (a scalar representing time) and state (a Tensor representing the state at given time) returning the time derivative of the state at given time.

ode_solve_fn

function taking arguments ode_fn (same as state_time_derivative_fn above), initial_time (a scalar representing the initial time of integration), initial_state (a Tensor of floating dtype representing the initial state) and solution_times (1D Tensor of floating dtype representing the times at which to obtain the solution), and returning a Tensor of shape [time_axis, initial_state$shape]. It will be called with [final_time] as the solution_times argument and state_time_derivative_fn as the ode_fn argument. If NULL, a DormandPrince solver from tfp$math$ode is used. Default value: NULL

trace_augmentation_fn

function taking arguments ode_fn (function, same as state_time_derivative_fn above), state_shape (TensorShape of the state), dtype (same as dtype of the state) and returning a function taking arguments time (a scalar representing the time at which the function is evaluated) and state (a Tensor representing the state at the given time) that computes a tuple (ode_fn(time, state), jacobian_trace_estimation). jacobian_trace_estimation should represent the trace of the Jacobian of ode_fn with respect to state. state_time_derivative_fn will be passed as the ode_fn argument. Default value: tfp$bijectors$ffjord$trace_jacobian_hutchinson

initial_time

Scalar float representing the time to which the x value of the bijector corresponds. Passed as initial_time to ode_solve_fn. For the default solver it can be a float or a floating scalar Tensor. Default value: 0.

final_time

Scalar float representing the time to which the y value of the bijector corresponds. Passed as solution_times to ode_solve_fn. For the default solver it can be a float or a floating scalar Tensor. Default value: 1.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

dtype

tf$DType to prefer when converting args to Tensors. Else, we fall back to a common dtype inferred from the args, finally falling back to float32.

name

name prefixed to Ops created by this class.

Details

d/dt[state(t)] = state_time_derivative_fn(t, state(t))
state(initial_time) = X
state(final_time) = Y

For this transformation the value of log_det_jacobian follows another differential equation, reducing it to computation of the trace of the jacobian along the trajectory

state_time_derivative = state_time_derivative_fn(t, state(t))
d/dt[log_det_jac(t)] = Tr(jacobian(state_time_derivative, state(t)))
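The two coupled equations above can be illustrated with a fixed-step Euler integration in base R. This is only a sketch: it assumes a linear vector field with a hypothetical matrix A (so the Jacobian trace is exact and constant), and it stands in for neither the DormandPrince solver nor the Hutchinson trace estimator used by default.

```r
# Base R sketch: integrate the state and log_det_jac together with Euler steps.
A <- matrix(c(0.5,  0.0,
              0.0, -0.2), nrow = 2, byrow = TRUE)   # hypothetical linear dynamics

state_time_derivative_fn <- function(time, state) drop(A %*% state)
jacobian_trace <- function(time, state) sum(diag(A))  # Jacobian of a linear field is A

n_steps <- 1000
dt <- 1 / n_steps        # initial_time = 0, final_time = 1
state <- c(1, 2)         # X, the state at initial_time
log_det_jac <- 0

for (i in seq_len(n_steps)) {
  time <- (i - 1) * dt
  log_det_jac <- log_det_jac + dt * jacobian_trace(time, state)
  state <- state + dt * state_time_derivative_fn(time, state)
}
# state now approximates Y; log_det_jac approximates Tr(A) * (final_time - initial_time)
```

For this diagonal A the exact solution is Y = (exp(0.5) * 1, exp(-0.2) * 2) with log-determinant 0.3, which the Euler result approaches as n_steps grows.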

FFJORD constructor takes two functions ode_solve_fn and trace_augmentation_fn arguments that customize integration of the differential equation and trace estimation.

Differential equation integration is performed by a call to ode_solve_fn.

Custom ode_solve_fn must accept the following arguments:

  • ode_fn(time, state): Differential equation to be solved.

  • initial_time: Scalar float or floating Tensor representing the initial time.

  • initial_state: Floating Tensor representing the initial state.

  • solution_times: 1D floating Tensor of solution times.

And return a Tensor of shape ⁠[solution_times$shape, initial_state$shape]⁠ representing state values evaluated at solution_times. In addition ode_solve_fn must support nested structures. For more details see the interface of tfp$math$ode$Solver$solve().

Trace estimation is computed simultaneously with state_time_derivative using augmented_state_time_derivative_fn, which is generated by trace_augmentation_fn. trace_augmentation_fn takes state_time_derivative_fn, state$shape and state$dtype as arguments and returns an augmented_state_time_derivative_fn callable that computes both state_time_derivative and the unreduced trace_estimation.

Custom ode_solve_fn and trace_augmentation_fn examples:

# custom_solver_fn: `function(f, t_initial, t_solutions, y_initial, ...)`
# ... : Additional arguments to pass to custom_solver_fn.
ode_solve_fn <- function(ode_fn, initial_time, initial_state, solution_times) {
  custom_solver_fn(ode_fn, initial_time, solution_times, initial_state, ...)
}
ffjord <- tfb_ffjord(state_time_derivative_fn, ode_solve_fn = ode_solve_fn)

# state_time_derivative_fn: `function(time, state)`
# trace_jac_fn: `function(time, state)` unreduced jacobian trace function
trace_augmentation_fn <- function(ode_fn, state_shape, state_dtype) {
  augmented_ode_fn <- function(time, state) {
    list(ode_fn(time, state), trace_jac_fn(time, state))
  }
  augmented_ode_fn
}
ffjord <- tfb_ffjord(state_time_derivative_fn, trace_augmentation_fn = trace_augmentation_fn)

For more details on FFJORD and continuous normalizing flows see Chen et al. (2018), Grathwohl et al. (2018).

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Transforms unconstrained vectors to TriL matrices with positive diagonal

Description

This is implemented as a simple tfb_chain of tfb_fill_triangular followed by tfb_transform_diagonal, and provided mostly as a convenience. The default setup is somewhat opinionated, using a Softplus transformation followed by a small shift (1e-5) which attempts to avoid numerical issues from zeros on the diagonal.
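The diagonal step of the default setup can be sketched in base R. The sketch assumes the defaults described here (a softplus transformation followed by a shift of 1e-5) and applies them to an already-filled lower-triangular matrix; softplus is defined locally and is not a tfprobability call.

```r
# Base R sketch: softplus plus a small shift applied to the diagonal only.
softplus <- function(x) log1p(exp(x))
diag_shift <- 1e-5

M <- matrix(c(-3, 0.0,
               2, 0.5), nrow = 2, byrow = TRUE)  # lower-triangular, arbitrary diagonal
M_scale <- M
diag(M_scale) <- softplus(diag(M)) + diag_shift  # diagonal is now strictly positive
```

Off-diagonal entries are left untouched; only the diagonal is pushed above diag_shift.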

Usage

tfb_fill_scale_tri_l(
  diag_bijector = NULL,
  diag_shift = 1e-05,
  validate_args = FALSE,
  name = "fill_scale_tril"
)

Arguments

diag_bijector

Bijector instance, used to transform the output diagonal to be positive. Default value: NULL (i.e., tfb_softplus()).

diag_shift

Float value broadcastable and added to all diagonal entries after applying the diag_bijector. Setting a positive value forces the output diagonal entries to be positive, but prevents inverting the transformation for matrices with diagonal entries less than this value. Default value: 1e-5.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Transforms vectors to triangular matrices

Description

Triangular matrix elements are filled in a clockwise spiral. Given input with shape ⁠batch_shape + [d]⁠, produces output with shape ⁠batch_shape + [n, n]⁠, where n = (-1 + sqrt(1 + 8 * d))/2. This follows by solving the quadratic equation d = 1 + 2 + ... + n = n * (n + 1)/2.
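The relation between the vector length d and the matrix size n can be computed directly. The helper below is a hypothetical base R function for illustration only; it checks that d is a triangular number and returns n.

```r
# Base R sketch: matrix size n implied by an input vector of length d.
tril_size <- function(d) {
  n <- (-1 + sqrt(1 + 8 * d)) / 2
  if (n != round(n)) stop("d is not a triangular number")
  as.integer(round(n))
}

tril_size(6)    # 3, since 1 + 2 + 3 = 6
tril_size(10)   # 4, since 1 + 2 + 3 + 4 = 10
```

A length such as d = 7 has no valid n, so the helper raises an error in that case.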

Usage

tfb_fill_triangular(
  upper = FALSE,
  validate_args = FALSE,
  name = "fill_triangular"
)

Arguments

upper

Logical representing whether output matrix should be upper triangular (TRUE) or lower triangular (FALSE, default).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Returns the forward Bijector evaluation, i.e., Y = g(X).

Description

Returns the forward Bijector evaluation, i.e., Y = g(X).

Usage

tfb_forward(bijector, x, name = "forward")

Arguments

bijector

The bijector to apply

x

Tensor. The input to the "forward" evaluation.

name

name of the operation

Value

a tensor

See Also

Other bijector_methods: tfb_forward_log_det_jacobian(), tfb_inverse_log_det_jacobian(), tfb_inverse()

Examples

library(tfprobability)

b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
b %>% tfb_forward(x)

Returns the result of the forward evaluation of the log determinant of the Jacobian

Description

Returns the result of the forward evaluation of the log determinant of the Jacobian

Usage

tfb_forward_log_det_jacobian(
  bijector,
  x,
  event_ndims,
  name = "forward_log_det_jacobian"
)

Arguments

bijector

The bijector to apply

x

Tensor. The input to the "forward" Jacobian determinant evaluation.

event_ndims

Number of dimensions in the probabilistic events being transformed. Must be greater than or equal to bijector$forward_min_event_ndims. The result is summed over the final dimensions to produce a scalar Jacobian determinant for each event, i.e. the result has x$shape$ndims - event_ndims dimensions.

name

name of the operation

Value

a tensor

See Also

Other bijector_methods: tfb_forward(), tfb_inverse_log_det_jacobian(), tfb_inverse()

Examples

library(tfprobability)

b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
b %>% tfb_forward_log_det_jacobian(x, event_ndims = 0)

Implements the Glow Bijector from Kingma & Dhariwal (2018).

Description

Overview: Glow is a chain of bijectors which transforms a rank-1 tensor (vector) into a rank-3 tensor (e.g. an RGB image). Glow does this by chaining together an alternating series of "Blocks," "Squeezes," and "Exits" which are each themselves special chains of other bijectors. The intended use of Glow is as part of a tfd_transformed_distribution, in which the base distribution over the vector space is used to generate samples in the image space. In the paper, an Independent Normal distribution is used as the base distribution.

Usage

tfb_glow(
  output_shape = c(32, 32, 3),
  num_glow_blocks = 3,
  num_steps_per_block = 32,
  coupling_bijector_fn = NULL,
  exit_bijector_fn = NULL,
  grab_after_block = NULL,
  use_actnorm = TRUE,
  seed = NULL,
  validate_args = FALSE,
  name = "glow"
)

Arguments

output_shape

A list of integers specifying the event shape of the output of the bijector's forward pass (the image). Specified as [H, W, C]. Default value: c(32, 32, 3)

num_glow_blocks

An integer, specifying how many downsampling levels to include in the model. This must divide equally into both H and W, otherwise the bijector would not be invertible. Default Value: 3

num_steps_per_block

An integer specifying how many Affine Coupling and 1x1 convolution layers to include at each level of the spatial hierarchy. Default Value: 32 (i.e. the value used in the original glow paper).

coupling_bijector_fn

A function which takes the argument input_shape and returns a callable neural network (e.g. a keras_model_sequential()). The network should either return a tensor with the same event shape as input_shape (this will employ additive coupling), a tensor with the same height and width as input_shape but twice the number of channels (this will employ affine coupling), or a bijector which takes in a tensor with event shape input_shape, and returns a tensor with shape input_shape.

exit_bijector_fn

Similar to coupling_bijector_fn, exit_bijector_fn is a function which takes the argument input_shape and output_chan and returns a callable neural network. The neural network it returns should take a tensor of shape input_shape as the input, and return one of three options: A tensor with output_chan channels, a tensor with 2 * output_chan channels, or a bijector. Additional details can be found in the documentation for ExitBijector.

grab_after_block

A tuple of floats, specifying what fraction of the remaining channels to remove following each glow block. Glow will take the integer floor of this number multiplied by the remaining number of channels. Default value: NULL (this will take out half of the channels after each block).

use_actnorm

A boolean deciding whether or not to use actnorm. Data-dependent initialization is used to initialize this layer. Default value: TRUE

seed

A seed to control randomness in the 1x1 convolution initialization. Default value: NULL (i.e., non-reproducible sampling).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

A "Block" (implemented as the GlowBlock Bijector) performs much of the transformations which allow glow to produce sophisticated and complex mappings between the image space and the latent space and therefore achieve rich image generation performance. A Block is composed of num_steps_per_block steps, which are each implemented as a Chain containing an ActivationNormalization (ActNorm) bijector, followed by an (invertible) OneByOneConv bijector, and finally a coupling bijector. The coupling bijector is an instance of a RealNVP bijector, and uses the coupling_bijector_fn function to instantiate the coupling bijector function which is given to the RealNVP. This function returns a bijector which defines the coupling (e.g. Shift(Scale) for affine coupling or Shift for additive coupling).

A "Squeeze" converts spatial features into channel features. It is implemented using the Expand bijector. The difference in names is due to the fact that the forward function from glow is meant to ultimately correspond to sampling from a tfp$util$TransformedDistribution object, which would use Expand (Squeeze is just Invert(Expand)). The Expand bijector takes a tensor with shape ⁠[H, W, C]⁠ and returns a tensor with shape ⁠[2H, 2W, C / 4]⁠, such that each 2x2x1 spatial tile in the output is composed from a single 1x1x4 tile in the input tensor, as depicted in the figure below.

Forward pass (Expand)

\     \       \    \    \
\\     \ ----> \  1 \  2 \
\\\__1__\       \____\____\
\\\__2__\        \    \    \
\\__3__\  <----  \  3 \  4 \
\__4__\          \____\____\

Inverse pass (Squeeze)

This is implemented using a chain of Reshape -> Transpose -> Reshape bijectors. Note that on an inverse pass through the bijector, each Squeeze will cause the width/height of the image to decrease by a factor of 2. Therefore, the height and width of the input image must each be evenly divisible by 2 at least num_glow_blocks times, since the image will pass through a Squeeze step that many times.
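
The shape bookkeeping of Expand and Squeeze can be illustrated on plain R arrays. This is only a sketch: it uses explicit indexing rather than the Reshape -> Transpose -> Reshape chain, the helper names expand_sketch() and squeeze_sketch() are hypothetical, and the order in which the four channels fill a 2x2 tile is an assumption made for illustration.

```r
# Expand: [H, W, C] -> [2H, 2W, C / 4]; each 1x1x4 tile becomes a 2x2x1 tile.
# Assumption: channel k of a group of four lands at tile offset (di, dj)
# with k = 1 + 2 * di + dj.
expand_sketch <- function(x) {
  d <- dim(x); H <- d[1]; W <- d[2]; C <- d[3]
  stopifnot(C %% 4 == 0)
  out <- array(0, dim = c(2 * H, 2 * W, C / 4))
  for (di in 0:1) for (dj in 0:1) {
    rows  <- seq(1 + di, 2 * H, by = 2)
    cols  <- seq(1 + dj, 2 * W, by = 2)
    chans <- seq(1 + 2 * di + dj, C, by = 4)
    out[rows, cols, ] <- x[, , chans, drop = FALSE]
  }
  out
}

# Squeeze: the exact inverse, [H, W, C] -> [H / 2, W / 2, 4C].
squeeze_sketch <- function(y) {
  d <- dim(y); H <- d[1]; W <- d[2]; C <- d[3]
  stopifnot(H %% 2 == 0, W %% 2 == 0)
  out <- array(0, dim = c(H / 2, W / 2, 4 * C))
  for (di in 0:1) for (dj in 0:1) {
    rows  <- seq(1 + di, H, by = 2)
    cols  <- seq(1 + dj, W, by = 2)
    chans <- seq(1 + 2 * di + dj, 4 * C, by = 4)
    out[, , chans] <- y[rows, cols, , drop = FALSE]
  }
  out
}

# An 8 x 8 x 3 image survives two Squeeze steps because 8 %% 2^2 == 0.
img <- array(as.numeric(seq_len(8 * 8 * 3)), dim = c(8, 8, 3))
z <- squeeze_sketch(squeeze_sketch(img))
dim(z)
round_trip <- expand_sketch(expand_sketch(z))
isTRUE(all.equal(round_trip, img))
```

Each Squeeze halves the height and width and quadruples the channel count, so the number of elements never changes; only the layout does.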

An "Exit" is simply a junction at which some of the tensor "exits" from the glow bijector and therefore avoids any further alteration. Each exit is implemented as a Blockwise bijector, where some channels are given to the rest of the glow model, and the rest are given to a bypass implemented using the Identity bijector. The fraction of channels to be removed at each exit is determined by the grab_after_block arg, which indicates the fraction of remaining channels which join the identity bypass. The fraction is converted to an integer number of channels by multiplying by the remaining number of channels and rounding.

Additionally, at each exit, glow couples the tensor exiting the highway to the tensor continuing onward. This makes small scale features in the image dependent on larger scale features, since the larger scale features dictate the mean and scale of the distribution over the smaller scale features. This coupling is done similarly to the Coupling bijector in each step of the flow (i.e. using a RealNVP bijector). However, for the exit bijector, the coupling is instantiated using exit_bijector_fn rather than coupling_bijector_fn, allowing for different behaviors between standard coupling and exit coupling. Also note that because the exit utilizes a coupling bijector, there are two special cases (all channels exiting and no channels exiting).

The full Glow bijector consists of num_glow_blocks Blocks, each of which contains num_steps_per_block steps. Each step implements a coupling using coupling_bijector_fn. Between blocks, glow converts between spatial pixels and channels using the Expand Bijector, and splits channels out of the bijector using the Exit Bijector. The channels which have exited continue onward through Identity bijectors and those which have not exited are given to the next block. After passing through all Blocks, the tensor is reshaped to a rank-1 tensor with the same number of elements. This is where the distribution will be defined.

A schematic diagram of Glow is shown below. The forward function of the bijector starts from the bottom and goes upward, while the inverse function starts from the top and proceeds downward.
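
The channel arithmetic described above can be traced with a few lines of base R. This is a rough sketch under stated assumptions, not the bijector's implementation: the helper glow_shape_sketch() is hypothetical, every block is assumed to squeeze first and then remove floor(fraction * channels) at its exit, and an exit is assumed after every block including the last.

```r
# Follow the tensor shape on an inverse pass (image -> latent).
glow_shape_sketch <- function(H, W, C, grab_after_block) {
  n_blocks <- length(grab_after_block)
  # H and W must survive one halving per block.
  stopifnot(H %% 2^n_blocks == 0, W %% 2^n_blocks == 0)
  exited <- 0
  for (frac in grab_after_block) {
    H <- H / 2; W <- W / 2; C <- 4 * C   # Squeeze: space -> channels
    n_exit <- floor(frac * C)            # channels joining the identity bypass
    exited <- exited + H * W * n_exit
    C <- C - n_exit                      # channels handed to the next block
  }
  list(shape = c(H, W, C), remaining = H * W * C, exited = exited)
}

res <- glow_shape_sketch(H = 32, W = 32, C = 3,
                         grab_after_block = c(0.5, 0.5, 0.5))
res$shape
# Exited plus remaining elements add up to the size of the input image.
res$remaining + res$exited == 32 * 32 * 3
```

The final check reflects the statement that the output is a rank-1 tensor with the same number of elements as the input.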

Value

a bijector instance.

                       Glow Schematic Diagram

Input Image        ########################      shape = [H, W, C]
                   \                      /  <-- Expand Bijector turns spatial
                    \                    /       dimensions into channels.
             |       XXXXXXXXXXXXXXXXXXXX
             |       XXXXXXXXXXXXXXXXXXXX
             |       XXXXXXXXXXXXXXXXXXXX        A single step of the flow consists
Glow Block - |       XXXXXXXXXXXXXXXXXXXX    <-- of ActNorm -> 1x1Conv -> Coupling.
             |       XXXXXXXXXXXXXXXXXXXX        There are num_steps_per_block
             |       XXXXXXXXXXXXXXXXXXXX        steps of the flow in each block.
             |_      XXXXXXXXXXXXXXXXXXXX
                     \                  /    <-- Expand bijectors follow each glow
                      \                /         block.
                       XXXXXXXX\\\\          <-- Exit Bijector removes channels
              _                     _            from additional alteration.
             |         XXXXXXXX   ! | !
             |         XXXXXXXX   ! | !
             |         XXXXXXXX   ! | !          After exiting, channels are passed
Glow Block - |         XXXXXXXX   ! | !      <-- downward using the Blockwise and
             |         XXXXXXXX   ! | !          Identity bijectors.
             |         XXXXXXXX   ! | !
             |_        XXXXXXXX   ! | !
                       \      /              <-- Expand Bijector
                        \    /
                         XXX\\      | !      <-- Exit Bijector
              _
             |           XXX ! |    | !
             |           XXX ! |    | !
             |           XXX ! |    | !
Glow Block - |           XXX ! |    | !
             |           XXX ! |    | !
             |           XXX ! |    | !
             |_          XXX ! |    | !
                         XX\ ! |    | !      <-- (Optional) Exit Bijector
                          |    |      |
                          v    v      v
Output Distribution      ##########              shape = [H * W * C]

             Legend
 | XX  = Step of flow      |
 | X\  = Exit bijector     |
 | \/  = Expand bijector   |
 | !|! = Identity bijector |
 |                         |
 | up  = Forward pass      |
 | dn  = Inverse pass      |
 |_________________________|

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute Y = g(X) = 1 - exp(-c * (exp(rate * X) - 1)), the Gompertz CDF.

Description

This bijector maps inputs from [0, inf] to [0, 1]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Gompertz distribution:

Y ~ GompertzCDF(concentration, rate)
pdf(y; c, r) = r * c * exp(r * y + c - c * exp(r * y))

Note: Because the Gompertz distribution concentrates its mass close to zero, for larger rates or larger concentrations, bijector.forward will quickly saturate to 1.
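
The transform and its use for sampling can be sketched in base R on plain numeric vectors, without tensors. The helper names gompertz_forward(), gompertz_inverse() and gompertz_pdf() are hypothetical; the density is the one obtained by differentiating the forward transform with respect to its input.

```r
# Forward transform: the Gompertz CDF, written with expm1() for accuracy.
gompertz_forward <- function(x, concentration, rate) {
  1 - exp(-concentration * expm1(rate * x))
}
# Inverse transform: solves y = forward(x) for x.
gompertz_inverse <- function(y, concentration, rate) {
  log1p(-log1p(-y) / concentration) / rate
}
# Density implied by differentiating the forward transform.
gompertz_pdf <- function(y, concentration, rate) {
  rate * concentration *
    exp(rate * y + concentration - concentration * exp(rate * y))
}

conc <- 0.5; rate <- 2
set.seed(1)
u <- runif(5)
x <- gompertz_inverse(u, conc, rate)     # Gompertz-distributed draws
isTRUE(all.equal(gompertz_forward(x, conc, rate), u))

# Central finite difference of the CDF agrees with the density.
h <- 1e-6
slope <- (gompertz_forward(x + h, conc, rate) -
          gompertz_forward(x - h, conc, rate)) / (2 * h)
isTRUE(all.equal(slope, gompertz_pdf(x, conc, rate), tolerance = 1e-6))

# Saturation: for larger inputs the forward value is numerically 1.
gompertz_forward(5, conc, rate)
```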

Usage

tfb_gompertz_cdf(
  concentration,
  rate,
  validate_args = FALSE,
  name = "gompertz_cdf"
)

Arguments

concentration

Positive Float-like Tensor that is the same dtype and is broadcastable with rate. This is c in Y = g(X) = 1 - exp(-c * (exp(rate * X) - 1)).

rate

Positive Float-like Tensor that is the same dtype and is broadcastable with concentration. This is rate in Y = g(X) = 1 - exp(-c * (exp(rate * X) - 1)).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = exp(-exp(-(X - loc) / scale))

Description

This bijector maps inputs from [-inf, inf] to [0, 1]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Gumbel distribution, whose density is given under Details.

Usage

tfb_gumbel(loc = 0, scale = 1, validate_args = FALSE, name = "gumbel")

Arguments

loc

Float-like Tensor that is the same dtype and is broadcastable with scale. This is loc in Y = g(X) = exp(-exp(-(X - loc) / scale)).

scale

Positive Float-like Tensor that is the same dtype and is broadcastable with loc. This is scale in Y = g(X) = exp(-exp(-(X - loc) / scale)).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Y ~ Gumbel(loc, scale)
pdf(y; loc, scale) = exp(-( (y - loc) / scale + exp(- (y - loc) / scale) ) ) / scale
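
The sampling statement can be checked in base R by pushing uniform draws through the inverse transform. This is a sketch on plain numeric vectors; gumbel_forward() and gumbel_inverse() are hypothetical helper names, and the comparison uses the known Gumbel mean, loc + scale * gamma, where gamma is the Euler-Mascheroni constant.

```r
gumbel_forward <- function(x, loc = 0, scale = 1) {
  exp(-exp(-(x - loc) / scale))
}
gumbel_inverse <- function(y, loc = 0, scale = 1) {
  loc - scale * log(-log(y))
}

set.seed(42)
u <- runif(1e5)
draws <- gumbel_inverse(u, loc = 1, scale = 2)   # Gumbel(1, 2) draws

# The transform round-trips back to the uniform draws.
isTRUE(all.equal(gumbel_forward(draws, loc = 1, scale = 2), u))

# Sample mean versus the theoretical mean.
euler_gamma <- -digamma(1)
c(sample_mean = mean(draws), theoretical = 1 + 2 * euler_gamma)
```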

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute Y = g(X) = exp(-exp(-(X - loc) / scale)), the Gumbel CDF.

Description

This bijector maps inputs from [-inf, inf] to [0, 1]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Gumbel distribution, whose density is given under Details.

Usage

tfb_gumbel_cdf(loc = 0, scale = 1, validate_args = FALSE, name = "gumbel_cdf")

Arguments

loc

Float-like Tensor that is the same dtype and is broadcastable with scale. This is loc in Y = g(X) = exp(-exp(-(X - loc) / scale)).

scale

Positive Float-like Tensor that is the same dtype and is broadcastable with loc. This is scale in Y = g(X) = exp(-exp(-(X - loc) / scale)).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Y ~ GumbelCDF(loc, scale)
pdf(y; loc, scale) = exp(-( (y - loc) / scale + exp(- (y - loc) / scale) ) ) / scale
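
Because the forward transform is a CDF, its log-derivative (the forward log det Jacobian for scalar events) should equal the log of the density above. A small base R check, with hypothetical helper names and arbitrarily chosen parameter values:

```r
gumbel_cdf <- function(x, loc, scale) exp(-exp(-(x - loc) / scale))
gumbel_log_pdf <- function(y, loc, scale) {
  z <- (y - loc) / scale
  -(z + exp(-z)) - log(scale)
}

x <- c(-1, 0, 0.5, 3)
loc <- 0.5; scale <- 1.5

# Central finite difference of the CDF, then compare on the log scale.
h <- 1e-6
slope <- (gumbel_cdf(x + h, loc, scale) - gumbel_cdf(x - h, loc, scale)) / (2 * h)
isTRUE(all.equal(log(slope), gumbel_log_pdf(x, loc, scale), tolerance = 1e-6))
```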

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = X

Description

Computes Y = g(X) = X

Usage

tfb_identity(validate_args = FALSE, name = "identity")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Bijector constructed from custom functions

Description

Bijector constructed from custom functions

Usage

tfb_inline(
  forward_fn = NULL,
  inverse_fn = NULL,
  inverse_log_det_jacobian_fn = NULL,
  forward_log_det_jacobian_fn = NULL,
  forward_event_shape_fn = NULL,
  forward_event_shape_tensor_fn = NULL,
  inverse_event_shape_fn = NULL,
  inverse_event_shape_tensor_fn = NULL,
  is_constant_jacobian = NULL,
  validate_args = FALSE,
  forward_min_event_ndims = NULL,
  inverse_min_event_ndims = NULL,
  name = "inline"
)

Arguments

forward_fn

Function implementing the forward transformation.

inverse_fn

Function implementing the inverse transformation.

inverse_log_det_jacobian_fn

Function implementing the log_det_jacobian of the inverse transformation.

forward_log_det_jacobian_fn

Function implementing the log_det_jacobian of the forward transformation.

forward_event_shape_fn

Function implementing non-identical static event shape changes. Default: shape is assumed unchanged.

forward_event_shape_tensor_fn

Function implementing non-identical event shape changes. Default: shape is assumed unchanged.

inverse_event_shape_fn

Function implementing non-identical static event shape changes. Default: shape is assumed unchanged.

inverse_event_shape_tensor_fn

Function implementing non-identical event shape changes. Default: shape is assumed unchanged.

is_constant_jacobian

Logical indicating that the Jacobian is constant for all input arguments.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

forward_min_event_ndims

Integer indicating the minimal dimensionality this bijector acts on.

inverse_min_event_ndims

Integer indicating the minimal dimensionality this bijector acts on.

name

name prefixed to Ops created by this class.
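
Before wiring custom functions into a bijector, it can help to check that they are mutually consistent. The sketch below does this for an exp() transform using plain R functions on numeric vectors; it only illustrates the contract between the callables and does not call tfb_inline(), where tensor versions of the same functions would be supplied instead.

```r
# Plain R stand-ins for the four core callables of Y = exp(X).
forward_fn <- function(x) exp(x)
inverse_fn <- function(y) log(y)
forward_log_det_jacobian_fn <- function(x) x        # log |d exp(x) / dx|
inverse_log_det_jacobian_fn <- function(y) -log(y)  # log |d log(y) / dy|

x <- c(-2, 0, 1.5)
y <- forward_fn(x)

# inverse_fn undoes forward_fn.
isTRUE(all.equal(inverse_fn(y), x))

# The two log det Jacobians are negatives of each other at matching points.
isTRUE(all.equal(forward_log_det_jacobian_fn(x),
                 -inverse_log_det_jacobian_fn(y)))
```

Since the two Jacobian terms determine each other, it is usually enough to derive one of them by hand and use a check like this to catch sign errors.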

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Returns the inverse Bijector evaluation, i.e., X = g^{-1}(Y).

Description

Returns the inverse Bijector evaluation, i.e., X = g^{-1}(Y).

Usage

tfb_inverse(bijector, y, name = "inverse")

Arguments

bijector

The bijector to apply

y

Tensor. The input to the "inverse" evaluation.

name

name of the operation

Value

a tensor

See Also

Other bijector_methods: tfb_forward_log_det_jacobian(), tfb_forward(), tfb_inverse_log_det_jacobian()

Examples

b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
y <- b %>% tfb_forward(x)
b %>% tfb_inverse(y)

Returns the result of the inverse evaluation of the log determinant of the Jacobian

Description

Returns the result of the inverse evaluation of the log determinant of the Jacobian

Usage

tfb_inverse_log_det_jacobian(
  bijector,
  y,
  event_ndims,
  name = "inverse_log_det_jacobian"
)

Arguments

bijector

The bijector to apply

y

Tensor. The input to the "inverse" Jacobian determinant evaluation.

event_ndims

Number of dimensions in the probabilistic events being transformed. Must be greater than or equal to bijector$inverse_min_event_ndims. The result is summed over the final event_ndims dimensions to produce a scalar Jacobian determinant for each event, i.e. the result has y$shape$ndims - event_ndims dimensions.

name

name of the operation

Value

a tensor

See Also

Other bijector_methods: tfb_forward_log_det_jacobian(), tfb_forward(), tfb_inverse()

Examples

b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
y <- b %>% tfb_forward(x)
b %>% tfb_inverse_log_det_jacobian(y, event_ndims = 0)
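
For an affine map y = shift + scale * x the value can be worked out by hand, which gives a reference point for the call above. The following base R arithmetic is a sketch of that calculation and of how event_ndims affects the result; it does not use tensors.

```r
shift <- 1; scale <- 2
y <- shift + scale * 10            # forward(10) = 21

# The inverse is x = (y - shift) / scale, so dx/dy = 1 / scale and the
# inverse log det Jacobian is -log|scale|, independent of y.
ildj_scalar <- -log(abs(scale))
ildj_scalar                        # about -0.693

# With event_ndims = 1 the per-element terms are summed over the last
# dimension: a length-3 event contributes 3 * -log(2).
y_vec <- shift + scale * c(10, 20, 30)
ildj_event <- sum(rep(ildj_scalar, length(y_vec)))
ildj_event                         # about -2.079
```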

Bijector which inverts another Bijector

Description

Creates a Bijector which swaps the meaning of inverse and forward.

Note: An inverted bijector's inverse_log_det_jacobian is often more efficient if the base bijector implements _forward_log_det_jacobian. If _forward_log_det_jacobian is not implemented then the following code is used:

y = b$inverse(x)
-b$inverse_log_det_jacobian(y)
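
The swap can be pictured with a bijector represented as a plain list of R functions. This is a conceptual sketch only: exp_bijector and invert_sketch() are hypothetical names, and real bijectors are objects created by the tfb_* constructors rather than lists.

```r
# A toy "bijector" for Y = exp(X), as a list of plain R functions.
exp_bijector <- list(
  forward = function(x) exp(x),
  inverse = function(y) log(y),
  forward_log_det_jacobian = function(x) x,
  inverse_log_det_jacobian = function(y) -log(y)
)

# Inverting swaps each forward member with its inverse counterpart.
invert_sketch <- function(b) {
  list(
    forward = b$inverse,
    inverse = b$forward,
    forward_log_det_jacobian = b$inverse_log_det_jacobian,
    inverse_log_det_jacobian = b$forward_log_det_jacobian
  )
}

log_bijector <- invert_sketch(exp_bijector)
log_bijector$forward(exp(2))                  # behaves like log()
log_bijector$inverse(2)                       # behaves like exp()
log_bijector$inverse_log_det_jacobian(0.3)    # the base forward term
```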

Usage

tfb_invert(bijector, validate_args = FALSE, name = NULL)

Arguments

bijector

Bijector instance.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Bijector which applies a Stick Breaking procedure.

Description

Bijector which applies a Stick Breaking procedure.

Usage

tfb_iterated_sigmoid_centered(validate_args = FALSE, name = "iterated_sigmoid")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a), with X in [0, 1]

Description

This bijector maps inputs from [0, 1] to [0, 1]. Because g is the inverse of the Kumaraswamy CDF, applying the bijector to a uniform random variable X ~ U(0, 1) gives a random variable with the Kumaraswamy distribution:

Y ~ Kumaraswamy(a, b)
pdf(y; a, b, 0 <= y <= 1) = a * b * y ** (a - 1) * (1 - y**a) ** (b - 1)
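
The relationship between g, the Kumaraswamy CDF and uniform draws can be checked numerically in base R. In this sketch kumaraswamy_forward() is the formula from the title, kumaraswamy_cdf() is the standard CDF 1 - (1 - y^a)^b, and both helper names are hypothetical.

```r
kumaraswamy_forward <- function(x, a, b) (1 - (1 - x)^(1 / b))^(1 / a)
kumaraswamy_cdf     <- function(y, a, b) 1 - (1 - y^a)^b

a <- 2; b <- 5
set.seed(7)
u <- runif(1e5)
draws <- kumaraswamy_forward(u, a, b)

# g and the CDF are inverses of each other.
isTRUE(all.equal(kumaraswamy_cdf(draws, a, b), u))

# Empirical CDF of the transformed draws versus the analytic CDF.
q <- c(0.2, 0.4, 0.6)
cbind(empirical = ecdf(draws)(q), analytic = kumaraswamy_cdf(q, a, b))
```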

Usage

tfb_kumaraswamy(
  concentration1 = NULL,
  concentration0 = NULL,
  validate_args = FALSE,
  name = "kumaraswamy"
)

Arguments

concentration1

float scalar indicating the transform power, i.e., ⁠Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a) where a is concentration1.⁠

concentration0

float scalar indicating the transform power, i.e., Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a) where b is concentration0.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a), with X in [0, 1]

Description

This bijector maps inputs from [0, 1] to [0, 1]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Kumaraswamy distribution:

Y ~ Kumaraswamy(a, b)
pdf(y; a, b, 0 <= y <= 1) = a * b * y ** (a - 1) * (1 - y**a) ** (b - 1)
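
As a sanity check on the density, it should integrate to one, and its mean should match the known closed form b * B(1 + 1 / a, b), where B is the beta function. A base R sketch with a hypothetical helper name and arbitrarily chosen parameters:

```r
kumaraswamy_pdf <- function(y, a, b) a * b * y^(a - 1) * (1 - y^a)^(b - 1)

a <- 2; b <- 3

# Total probability mass on [0, 1].
total_mass <- integrate(kumaraswamy_pdf, 0, 1, a = a, b = b)$value
total_mass

# Mean by numerical integration versus the closed form.
numeric_mean <- integrate(function(y) y * kumaraswamy_pdf(y, a, b), 0, 1)$value
closed_form  <- b * beta(1 + 1 / a, b)
c(numeric = numeric_mean, closed_form = closed_form)
```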

Usage

tfb_kumaraswamy_cdf(
  concentration1 = 1,
  concentration0 = 1,
  validate_args = FALSE,
  name = "kumaraswamy_cdf"
)

Arguments

concentration1

float scalar indicating the transform power, i.e., ⁠Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a) where a is concentration1.⁠

concentration0

float scalar indicating the transform power, i.e., Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a) where b is concentration0.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


LambertWTail transformation for heavy-tail Lambert W x F random variables.

Description

A random variable Y has a Lambert W x F distribution if W_tau(Y) = X has distribution F, where tau = (shift, scale, tail) parameterizes the inverse transformation.

Usage

tfb_lambert_w_tail(
  shift = NULL,
  scale = NULL,
  tailweight = NULL,
  validate_args = FALSE,
  name = "lambertw_tail"
)

Arguments

shift

Floating point tensor; the shift for centering (uncentering) the input (output) random variable(s).

scale

Floating point tensor; the scaling (unscaling) of the input (output) random variable(s). Must contain only positive values.

tailweight

Floating point tensor; the tail behaviors of the output random variable(s). Must contain only non-negative values.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

This bijector defines the transformation underlying Lambert W x F distributions that transform an input random variable to an output random variable with heavier tails. It is defined as

Y = (U * exp(0.5 * tail * U^2)) * scale + shift

where U = (X - shift) / scale is a shifted/scaled input random variable, and tail >= 0 is the tail parameter.

Attributes:

  • shift: shift to center (uncenter) the input data.

  • scale: scale to normalize (de-normalize) the input data.

  • tailweight: Tail parameter delta of heavy-tail transformation; must be >= 0.
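
The effect of the tail parameter can be seen by applying the formula to standard normal draws in base R. This is a sketch on plain numeric vectors with hypothetical helper names; the inverse here uses uniroot() as a numerical stand-in, since base R has no Lambert W function for a closed-form inverse.

```r
lambert_w_tail_forward <- function(x, shift, scale, tailweight) {
  u <- (x - shift) / scale
  (u * exp(0.5 * tailweight * u^2)) * scale + shift
}

# Numerical inverse: solve u * exp(0.5 * tailweight * u^2) = z for each z.
lambert_w_tail_inverse <- function(y, shift, scale, tailweight) {
  z <- (y - shift) / scale
  u <- vapply(z, function(zi) {
    bound <- abs(zi) + 1   # the root always lies inside (-bound, bound)
    uniroot(function(u) u * exp(0.5 * tailweight * u^2) - zi,
            interval = c(-bound, bound), tol = 1e-12)$root
  }, numeric(1))
  u * scale + shift
}

set.seed(3)
x <- rnorm(1e4)
y <- lambert_w_tail_forward(x, shift = 0, scale = 1, tailweight = 0.2)

# Every value is pushed away from the centre, so the tails get heavier.
all(abs(y) >= abs(x))
c(max_abs_input = max(abs(x)), max_abs_output = max(abs(y)))

# The numerical inverse recovers the original draws.
x_back <- lambert_w_tail_inverse(y[1:10], shift = 0, scale = 1, tailweight = 0.2)
isTRUE(all.equal(x_back, x[1:10], tolerance = 1e-8))
```

With tailweight = 0 the exponential factor is 1 and the transform reduces to the identity.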

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Masked Autoregressive Density Estimator

Description

This will be wrapped in a make_template to ensure the variables are only created once. It takes the input and returns the loc ("mu" in Germain et al. (2015)) and log_scale ("alpha" in Germain et al. (2015)) from the MADE network.

Usage

tfb_masked_autoregressive_default_template(
  hidden_layers,
  shift_only = FALSE,
  activation = tf$nn$relu,
  log_scale_min_clip = -5,
  log_scale_max_clip = 3,
  log_scale_clip_gradient = FALSE,
  name = NULL,
  ...
)

Arguments

hidden_layers

List-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: list(512, 512).

shift_only

logical indicating if only the shift term shall be computed. Default: FALSE.

activation

Activation function (callable). Explicitly setting to NULL implies a linear activation.

log_scale_min_clip

float-like scalar Tensor, or a Tensor with the same shape as log_scale. The minimum value to clip by. Default: -5.

log_scale_max_clip

float-like scalar Tensor, or a Tensor with the same shape as log_scale. The maximum value to clip by. Default: 3.

log_scale_clip_gradient

logical indicating that the gradient of tf$clip_by_value should be preserved. Default: FALSE.

name

A name for ops managed by this function. Default: "tfb_masked_autoregressive_default_template".

...

tf$layers$dense arguments

Details

Warning: This function uses masked_dense to create randomly initialized tf$Variables. It is presumed that these will be fit, just as you would any other neural architecture which uses tf$layers$dense.

About Hidden Layers: Each element of hidden_layers should be greater than the input_depth (i.e., input_depth = tf$shape(input)[-1] where input is the input to the neural network). This is necessary to ensure the autoregressivity property.

About Clipping: This function also optionally clips the log_scale (but possibly not its gradient). This is useful because if log_scale is too small/large it might underflow/overflow, making it impossible for the MaskedAutoregressiveFlow bijector to implement a bijection. Additionally, the log_scale_clip_gradient bool indicates whether the gradient should also be clipped. The default does not clip the gradient; this is useful because it still provides gradient information (for fitting) yet solves the numerical stability problem. I.e., log_scale_clip_gradient = FALSE means grad[exp(clip(x))] = grad[x] exp(clip(x)) rather than the usual grad[clip(x)] exp(clip(x)).
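The difference between the two gradient conventions can be sketched for a single scalar in plain Python. The function name exp_clipped and its arguments are hypothetical; only the clip bounds -5 and 3 are taken from the defaults documented above.

```python
import math

def exp_clipped(x, lo=-5.0, hi=3.0, clip_gradient=False):
    """Return (value, derivative with respect to x) of exp(clip(x, lo, hi))."""
    clipped = min(max(x, lo), hi)
    value = math.exp(clipped)
    if clip_gradient:
        # Usual chain rule: clip has zero slope outside [lo, hi].
        d_clip = 1.0 if lo <= x <= hi else 0.0
    else:
        # The gradient passes through the clip as if it were the identity.
        d_clip = 1.0
    return value, d_clip * value

value, grad = exp_clipped(10.0)                              # grad equals exp(3)
value_c, grad_c = exp_clipped(10.0, clip_gradient=True)      # grad is 0
```

In both cases the value is bounded by exp(3), but only the pass-through convention still reports a non-zero gradient once x leaves the clip range.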

Value

list of:

  • shift: Float-like Tensor of shift terms

  • log_scale: Float-like Tensor of log(scale) terms

References

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Affine MaskedAutoregressiveFlow bijector

Description

The affine autoregressive flow (Papamakarios et al., 2016) provides a relatively simple framework for user-specified (deep) architectures to learn a distribution over continuous events.

Usage

tfb_masked_autoregressive_flow(
  shift_and_log_scale_fn,
  is_constant_jacobian = FALSE,
  unroll_loop = FALSE,
  event_ndims = 1L,
  validate_args = FALSE,
  name = NULL
)

Arguments

shift_and_log_scale_fn

Function which computes shift and log_scale from both the forward domain (x) and the inverse domain (y). Calculation must respect the "autoregressive property". Suggested default: tfb_masked_autoregressive_default_template(hidden_layers=...). Typically the function contains tf$Variables and is wrapped using tf$make_template. Returning NULL for either (both) shift, log_scale is equivalent to (but more efficient than) returning zero.

is_constant_jacobian

Logical, default: FALSE. When TRUE the implementation assumes log_scale does not depend on the forward domain (x) or inverse domain (y) values. (No validation is made; is_constant_jacobian=FALSE is always safe but possibly computationally inefficient.)

unroll_loop

Logical indicating whether the tf$while_loop in _forward should be replaced with a static for loop. Requires that the final dimension of x be known at graph construction time. Defaults to FALSE.

event_ndims

integer, the intrinsic dimensionality of this bijector. 1 corresponds to a simple vector autoregressive bijector as implemented by the tfb_masked_autoregressive_default_template, 2 might be useful for a 2D convolutional shift_and_log_scale_fn and so on.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

"Autoregressive models decompose the joint density as a product of conditionals, and model each conditional in turn. Normalizing flows transform a base density (e.g. a standard Gaussian) into the target density by an invertible transformation with tractable Jacobian." (Papamakarios et al., 2016)

In other words, the "autoregressive property" is equivalent to the decomposition p(x) = prod{ p(x[perm[i]] | x[perm[0:i]]) : i=0, ..., d }, where perm is some permutation of {0, ..., d}. In the simple case where the permutation is identity this reduces to:

p(x) = prod{ p(x[i] | x[0:i]) : i=0, ..., d }.

The provided shift_and_log_scale_fn, tfb_masked_autoregressive_default_template, achieves this property by zeroing out weights in its masked_dense layers.

In TensorFlow Probability, "normalizing flows" are implemented as tfp.bijectors.Bijectors. The forward "autoregression" is implemented using a tf.while_loop and a deep neural network (DNN) with masked weights such that the autoregressive property is automatically met in the inverse.

A TransformedDistribution using MaskedAutoregressiveFlow(...) uses the (expensive) forward-mode calculation to draw samples and the (cheap) reverse-mode calculation to compute log-probabilities. Conversely, a TransformedDistribution using Invert(MaskedAutoregressiveFlow(...)) uses the (expensive) forward-mode calculation to compute log-probabilities and the (cheap) reverse-mode calculation to compute samples.

Given a shift_and_log_scale_fn, the forward and inverse transformations are (a sequence of) affine transformations. A "valid" shift_and_log_scale_fn must compute each shift (aka loc or "mu" in Germain et al. (2015)) and log(scale) (aka "alpha" in Germain et al. (2015)) such that each is broadcastable with the arguments to forward and inverse, i.e., such that the calculations in forward, inverse below are possible.

For convenience, tfb_masked_autoregressive_default_template is offered as a possible shift_and_log_scale_fn function. It implements the MADE architecture (Germain et al., 2015). MADE is a feed-forward network that computes a shift and log(scale) using masked_dense layers in a deep neural network. Weights are masked to ensure the autoregressive property. It is possible that this architecture is suboptimal for your task. To build alternative networks, either change the arguments to tfb_masked_autoregressive_default_template, use the masked_dense function to roll your own, or use some other architecture, e.g., using tf.layers.

Warning: no attempt is made to validate that the shift_and_log_scale_fn enforces the "autoregressive property".

Assuming shift_and_log_scale_fn has valid shape and autoregressive semantics, the forward transformation is

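import tensorflow as tf

def forward(x):
  # Start from zeros; each pass makes one more coordinate of y final.
  y = tf.zeros_like(x)
  # event_ndims is the bijector argument of the same name.
  event_size = x.shape[-event_ndims:].num_elements()
  for _ in range(event_size):
    shift, log_scale = shift_and_log_scale_fn(y)
    y = x * tf.exp(log_scale) + shift
  return y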

and the inverse transformation is

def inverse(y):
  shift, log_scale = shift_and_log_scale_fn(y)
  return (y - shift) / tf.exp(log_scale)

Notice that the inverse does not need a for-loop. This is because in the forward pass each calculation of shift and log_scale is based on the y calculated so far (not x). In the inverse, the y is fully known, thus is equivalent to the scaling used in forward after event_size passes, i.e., the "last" y used to compute shift, log_scale. (Roughly speaking, this also proves the transform is bijective.)
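The loop argument above can be checked on a small vector with plain Python lists. This is only a sketch: the shift_and_log_scale_fn used here is a hypothetical hand-written function whose i-th outputs depend only on y[:i], standing in for a MADE network, and no TensorFlow is involved.

```python
import math

def shift_and_log_scale_fn(y):
    # Hypothetical autoregressive function: output i uses only y[0], ..., y[i-1].
    shift = [0.5 * sum(y[:i]) for i in range(len(y))]
    log_scale = [0.1 * sum(y[:i]) for i in range(len(y))]
    return shift, log_scale

def forward(x):
    y = [0.0] * len(x)
    # After pass k, the first k coordinates of y have their final values.
    for _ in range(len(x)):
        shift, log_scale = shift_and_log_scale_fn(y)
        y = [xi * math.exp(ls) + s for xi, ls, s in zip(x, log_scale, shift)]
    return y

def inverse(y):
    # y is fully known, so a single pass is enough.
    shift, log_scale = shift_and_log_scale_fn(y)
    return [(yi - s) / math.exp(ls) for yi, s, ls in zip(y, shift, log_scale)]

x = [1.0, -2.0, 0.5]
y = forward(x)
x_back = inverse(y)  # matches x up to floating point rounding
```

The forward pass calls the function once per event coordinate, while the inverse calls it once, which is the cost asymmetry described above.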

Value

a bijector instance.

References

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Autoregressively masked dense layer

Description

Analogous to tf$layers$dense.

Usage

tfb_masked_dense(
  inputs,
  units,
  num_blocks = NULL,
  exclusive = FALSE,
  kernel_initializer = NULL,
  reuse = NULL,
  name = NULL,
  ...
)

Arguments

inputs

Tensor input.

units

integer scalar representing the dimensionality of the output space.

num_blocks

integer scalar representing the number of blocks for the MADE masks.

exclusive

logical scalar representing whether to zero the diagonal of the mask, used for the first layer of a MADE.

kernel_initializer

Initializer function for the weight matrix. If NULL (default), weights are initialized using the tf$glorot_random_initializer

reuse

logical scalar representing whether to reuse the weights of a previous layer by the same name.

name

string used to describe ops managed by this function.

...

tf$layers$dense arguments

Details

See Germain et al. (2015) for a detailed explanation.
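As a rough illustration of what the num_blocks and exclusive arguments control, the following plain-Python sketch builds a block lower-triangular 0/1 mask with shape (n_out, n_in). The function made_mask is hypothetical and assumes that n_in and n_out are both divisible by num_blocks; it is meant to convey the idea, not to reproduce the package's internal code.

```python
def made_mask(num_blocks, n_in, n_out, exclusive=False):
    """Entry [r][c] is 1 when output unit r may depend on input unit c."""
    d_in = n_in // num_blocks
    d_out = n_out // num_blocks
    mask = [[0] * n_in for _ in range(n_out)]
    for r in range(n_out):
        out_block = r // d_out
        for c in range(n_in):
            in_block = c // d_in
            if exclusive:
                allowed = in_block < out_block   # diagonal blocks are zeroed
            else:
                allowed = in_block <= out_block  # diagonal blocks are kept
            mask[r][c] = 1 if allowed else 0
    return mask

first_layer = made_mask(3, 3, 3, exclusive=True)
# [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
later_layer = made_mask(3, 3, 3)
# [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

Multiplying a dense kernel elementwise by such a mask removes every connection from a later input block to an earlier output block.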

Value

tensor

References

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes g(L) = inv(L), where L is a lower-triangular matrix

Description

L must be nonsingular; equivalently, all diagonal entries of L must be nonzero. The input must have rank >= 2. The input is treated as a batch of matrices with batch shape input.shape[:-2], where each matrix has dimensions input.shape[-2] by input.shape[-1] (hence input.shape[-2] must equal input.shape[-1]).
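For a single matrix, the map g(L) = inv(L) can be sketched in plain Python with forward substitution. The function name is hypothetical and the sketch handles one matrix stored as nested lists, not a batch.

```python
def inverse_lower_triangular(L):
    """Invert a nonsingular lower-triangular matrix by forward substitution."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):  # column j of the inverse solves L @ col = e_j
        for i in range(j, n):
            rhs = 1.0 if i == j else 0.0
            acc = sum(L[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = (rhs - acc) / L[i][i]  # needs a nonzero diagonal entry
    return inv

L = [[2.0, 0.0], [1.0, 4.0]]
L_inv = inverse_lower_triangular(L)
# [[0.5, 0.0], [-0.125, 0.25]]
```

Applying the function twice returns the original matrix, so the same computation serves as both the forward and the inverse map.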

Usage

tfb_matrix_inverse_tri_l(validate_args = FALSE, name = "matrix_inverse_tril")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Matrix-vector multiply using LU decomposition

Description

This bijector is identical to the "Convolution1x1" used in Glow (Kingma and Dhariwal, 2018).

Usage

tfb_matvec_lu(lower_upper, permutation, validate_args = FALSE, name = NULL)

Arguments

lower_upper

The LU factorization as returned by tf$linalg$lu.

permutation

The LU factorization permutation as returned by tf$linalg$lu.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Warning: this bijector never verifies that the scale matrix (as parameterized by LU decomposition) is invertible. Ensuring this is the case is the caller's responsibility.

Value

a bijector instance.

References

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = NormalCDF(x)

Description

This bijector maps inputs from [-inf, inf] to [0, 1]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Normal distribution.

Usage

tfb_normal_cdf(validate_args = FALSE, name = "normal")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Y ~ Normal(0, 1)

pdf(y; 0., 1.) = 1 / sqrt(2 * pi) * exp(-y ** 2 / 2)
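The forward and inverse maps can be sketched for scalars with the Python standard library alone. The function names forward and inverse are illustrative, not part of the package.

```python
import math
from statistics import NormalDist

def forward(x):
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse(y):
    # Standard normal quantile function; y must lie strictly between 0 and 1.
    return NormalDist().inv_cdf(y)

p = forward(1.3)     # a value in (0, 1)
x_back = inverse(p)  # approximately 1.3
```

Feeding uniform draws on (0, 1) through inverse is the usual inverse-CDF way of producing standard normal samples.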

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Bijector which maps a tensor x_k that has increasing elements in the last dimension to an unconstrained tensor y_k

Description

Both the domain and the codomain of the mapping are [-inf, inf]; however, the input of the forward mapping must be strictly increasing. The inverse of the bijector applied to a normal random vector y ~ N(0, 1) gives back a sorted random vector with the same distribution x ~ N(0, 1) where x = sort(y).

Usage

tfb_ordered(validate_args = FALSE, name = "ordered")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

On the last dimension of the tensor, the Ordered bijector performs:

y[0] = x[0]

y[1:] = tf$log(x[1:] - x[:-1])
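For a single vector, the two directions can be sketched in plain Python. The function names are illustrative; the input to forward is assumed to be strictly increasing.

```python
import math

def forward(x):
    # y[0] = x[0]; y[i] = log(x[i] - x[i-1]) for i >= 1
    return [x[0]] + [math.log(b - a) for a, b in zip(x, x[1:])]

def inverse(y):
    # Undo the map by accumulating positive increments exp(y[i]).
    x = [y[0]]
    for v in y[1:]:
        x.append(x[-1] + math.exp(v))
    return x

y = forward([1.0, 2.0, 4.0])      # [1.0, 0.0, log(2)]
x = inverse([0.3, -1.0, 2.0])     # strictly increasing for any real input
```

Because exp is always positive, inverse turns any unconstrained vector into a strictly increasing one.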

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Pads a value to the event_shape of a Tensor.

Description

The semantics of tfb_pad generally follow those of tf$pad(), except that tfb_pad's paddings argument applies to the rightmost dimensions. Additionally, the new argument axis enables overriding the dimensions to which paddings is applied. Like paddings, the axis argument is also relative to the rightmost dimension and must therefore be negative. The argument paddings is a vector of integer pairs, each representing the number of left and/or right constant_values to pad to the corresponding rightmost dimensions. That is, unless axis is specified, specifying k different paddings means the rightmost k dimensions will be "grown" by the sum of the respective paddings row. When axis is specified, it indicates the dimension to which the corresponding paddings element is applied. By default axis is NULL, which means it is logically equivalent to range(start=-len(paddings), limit=0), i.e., the rightmost dimensions.
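For the simplest case, a single paddings pair applied to a one-dimensional event in "CONSTANT" mode, the forward and inverse maps can be sketched in plain Python. The function names and the left/right argument names are illustrative assumptions.

```python
def forward(x, left=0, right=1, constant_value=0.0):
    # Grow the event by left + right entries filled with constant_value.
    return [constant_value] * left + list(x) + [constant_value] * right

def inverse(y, left=0, right=1):
    # Strip the padded entries again.
    return list(y[left:len(y) - right])

padded = forward([1.0, 2.0])        # [1.0, 2.0, 0.0]
original = inverse(padded)          # [1.0, 2.0]
```

The event grows by the sum of the pair, here 0 + 1 = 1 element, matching the "grown by the sum of the respective paddings row" rule.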

Usage

tfb_pad(
  paddings = list(c(0, 1)),
  mode = "CONSTANT",
  constant_values = 0,
  axis = NULL,
  validate_args = FALSE,
  name = NULL
)

Arguments

paddings

A vector-shaped Tensor of integer pairs representing the number of elements to pad on the left and right, respectively. Default value: list(reticulate::tuple(0L, 1L)).

mode

One of 'CONSTANT', 'REFLECT', or 'SYMMETRIC' (case-insensitive). For more details, see tf$pad.

constant_values

In "CONSTANT" mode, the scalar pad value to use. Must be same type as tensor. For more details, see tf$pad.

axis

The dimensions for which paddings are applied. Must be 1:1 with paddings or NULL. Default value: NULL (i.e., tf$range(start = -length(paddings), limit = 0)).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Permutes the rightmost dimension of a Tensor

Description

Permutes the rightmost dimension of a Tensor
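For a single vector the operation can be sketched in plain Python. The sketch assumes the gather convention y[i] = x[permutation[i]] with 0-based indices; the function names are illustrative.

```python
def forward(x, permutation):
    # Gather: position i of the output takes the element at index permutation[i].
    return [x[p] for p in permutation]

def inverse(y, permutation):
    # Scatter each element back to the index it was gathered from.
    x = [None] * len(y)
    for i, p in enumerate(permutation):
        x[p] = y[i]
    return x

y = forward([10, 20, 30], [1, 2, 0])   # [20, 30, 10]
x = inverse(y, [1, 2, 0])              # [10, 20, 30]
```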

Usage

tfb_permute(permutation, axis = -1L, validate_args = FALSE, name = NULL)

Arguments

permutation

An integer-like vector-shaped Tensor representing the permutation to apply to the axis dimension of the transformed Tensor.

axis

Scalar integer Tensor representing the dimension over which to tf$gather. axis must be relative to the end (reading left to right) thus must be negative. Default value: -1 (i.e., right-most).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = (1 + X * c)**(1 / c), where X >= -1 / c

Description

The power transform maps inputs from [0, inf] to [-1/c, inf]; this is equivalent to the inverse of this bijector. This bijector is equivalent to the Exp bijector when c=0.
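A scalar sketch in plain Python shows both directions and the c = 0 special case. The function names are illustrative, and c is assumed to be non-negative.

```python
import math

def forward(x, c):
    # Y = (1 + X * c) ** (1 / c), read as exp(X) when c == 0.
    if c == 0.0:
        return math.exp(x)
    return (1.0 + x * c) ** (1.0 / c)

def inverse(y, c):
    # X = (Y ** c - 1) / c, read as log(Y) when c == 0.
    if c == 0.0:
        return math.log(y)
    return (y ** c - 1.0) / c

y = forward(3.0, 0.5)    # (1 + 1.5) ** 2 = 6.25
x = inverse(y, 0.5)      # 3.0
```

For small positive c the forward value approaches exp(x), which is why the c = 0 case coincides with the Exp bijector.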

Usage

tfb_power_transform(power, validate_args = FALSE, name = "power_transform")

Arguments

power

float scalar indicating the transform power, i.e., Y = g(X) = (1 + X * c)**(1 / c) where c is the power.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


A piecewise rational quadratic spline, as developed in Durkan et al. (2019).

Description

This transformation represents a monotonically increasing piecewise rational quadratic function. Outside of the bounds of knot_x/knot_y, the transform behaves as an identity function.

Usage

tfb_rational_quadratic_spline(
  bin_widths,
  bin_heights,
  knot_slopes,
  range_min = -1,
  validate_args = FALSE,
  name = NULL
)

Arguments

bin_widths

The widths of the spans between subsequent knot x positions, a floating point Tensor. Must be positive, and at least 1-D. Innermost axis must sum to the same value as bin_heights. The first knot x position is at range_min, followed by knots at range_min + cumsum(bin_widths, axis=-1).

bin_heights

The heights of the spans between subsequent knot y positions, a floating point Tensor. Must be positive, and at least 1-D. Innermost axis must sum to the same value as bin_widths. The first knot y position is at range_min, followed by knots at range_min + cumsum(bin_heights, axis=-1).

knot_slopes

The slope of the spline at each knot, a floating point Tensor. Must be positive. 1s are implicitly padded for the first and last implicit knots corresponding to range_min and range_min + sum(bin_widths, axis=-1). Innermost axis size should be 1 less than that of bin_widths/bin_heights, or 1 for broadcasting.

range_min

The x/y position of the first knot, which has implicit slope 1. range_max is implicit, and can be computed as range_min + sum(bin_widths, axis=-1). Scalar floating point Tensor.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Typically this bijector will be used as part of a chain, with splines for trailing x dimensions conditioned on some of the earlier x dimensions, and with the inverse then solved first for unconditioned dimensions, then using conditioning derived from those inverses, and so forth.

For each argument, the innermost axis indexes bins/knots and batch axes index axes of x/y spaces. A RationalQuadraticSpline with a separate transform for each of three dimensions might have bin_widths shaped ⁠[3, 32]⁠. To use the same spline for each of x's three dimensions we may broadcast against x and use a bin_widths parameter shaped ⁠[32]⁠.

Parameters will be broadcast against each other and against the input x/ys, so if we want fixed slopes, we can pass knot_slopes = 1. A typical recipe for acquiring compatible bin widths and heights would be:

nbins <- unconstrained_vector$shape[-1]
range_min <- -1  # matches the default range_min of the bijector
range_max <- 1
min_bin_size <- 1e-2
scale <- range_max - range_min - nbins * min_bin_size
bin_widths <- tf$math$softmax(unconstrained_vector) * scale + min_bin_size
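
The arithmetic behind this recipe can be checked without TensorFlow. The following is a minimal plain-R sketch, assuming a range of [-1, 1] and a made-up unconstrained vector; the softmax helper is hypothetical and only stands in for tf$math$softmax.

```r
# Plain-R sketch (no TensorFlow): turn an unconstrained vector into bin widths.
# `softmax` is a hypothetical helper standing in for tf$math$softmax.
softmax <- function(v) {
  e <- exp(v - max(v))  # subtract the max for numerical stability
  e / sum(e)
}

unconstrained_vector <- c(0.3, -1.2, 2.0, 0.5)  # illustrative values only
nbins <- length(unconstrained_vector)
range_min <- -1
range_max <- 1
min_bin_size <- 1e-2

# Reserve min_bin_size per bin, then distribute the rest via softmax.
scale <- range_max - range_min - nbins * min_bin_size
bin_widths <- softmax(unconstrained_vector) * scale + min_bin_size

# Knot x positions: first knot at range_min, then the cumulative sums.
knot_x <- c(range_min, range_min + cumsum(bin_widths))
```

Because the softmax output sums to 1, the widths sum to range_max - range_min, so the last knot lands on range_max. Applying the same recipe to a second unconstrained vector gives bin heights with the same total, as the bin_widths and bin_heights arguments require.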

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute ⁠Y = g(X) = 1 - exp( -(X/scale)**2 / 2 ), X >= 0⁠.

Description

This bijector maps inputs from ⁠[0, inf]⁠ to ⁠[0, 1]⁠. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Rayleigh distribution:

Y ~ Rayleigh(scale)
pdf(y; scale, y >= 0) = (1 / scale) * (y / scale) * exp(-(y / scale)**2 / 2)

Usage

tfb_rayleigh_cdf(scale, validate_args = FALSE, name = "rayleigh_cdf")

Arguments

scale

Positive floating-point tensor. This is l in ⁠Y = g(X) = 1 - exp( -(X/l)**2 / 2 ), X >= 0⁠.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

The forward transformation of this bijector is the Rayleigh distribution CDF.
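
As an illustration of the forward/inverse relationship described above, here is a minimal plain-R sketch of the two formulas. The function names are hypothetical and this is not the library's implementation; with the bijector itself one would call tfb_forward() and tfb_inverse().

```r
# Plain-R sketch of the Rayleigh CDF transform and its inverse.
# Function names are hypothetical, chosen for this example.
rayleigh_cdf_forward <- function(x, scale) 1 - exp(-(x / scale)^2 / 2)
rayleigh_cdf_inverse <- function(y, scale) scale * sqrt(-2 * log1p(-y))

scale <- 2
x <- c(0, 0.5, 1, 3)
y <- rayleigh_cdf_forward(x, scale)       # values in [0, 1]
x_back <- rayleigh_cdf_inverse(y, scale)  # recovers x

# Inverse-CDF sampling: push uniform draws through the inverse.
set.seed(1)
u <- runif(1e5)
samples <- rayleigh_cdf_inverse(u, scale)
# The sample mean should be close to the Rayleigh mean, scale * sqrt(pi / 2).
```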

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


RealNVP affine coupling layer for vector-valued events

Description

Real NVP models a normalizing flow on a D-dimensional distribution via a single D-d-dimensional conditional distribution (Dinh et al., 2017):

y[d:D] = x[d:D] * tf.exp(log_scale_fn(x[0:d])) + shift_fn(x[0:d])
y[0:d] = x[0:d]

The last D-d units are scaled and shifted based on the first d units only, while the first d units are 'masked' and left unchanged. Real NVP's shift_and_log_scale_fn computes vector-valued quantities. For scale-and-shift transforms that do not depend on any masked units, i.e. d=0, use the tfb_affine bijector with learned parameters instead.

Masking is currently only supported for base distributions with event_ndims=1. For more sophisticated masking schemes like checkerboard or channel-wise masking (Papamakarios et al., 2016), use the tfb_permute bijector to re-order desired masked units into the first d units. For base distributions with event_ndims > 1, use the tfb_reshape bijector to flatten the event shape.

Usage

tfb_real_nvp(
  num_masked,
  shift_and_log_scale_fn,
  is_constant_jacobian = FALSE,
  validate_args = FALSE,
  name = NULL
)

Arguments

num_masked

integer indicating that the first d units of the event should be masked. Must be in the closed interval ⁠[1, D-1]⁠, where D is the event size of the base distribution.

shift_and_log_scale_fn

Function which computes shift and log_scale from both the forward domain (x) and the inverse domain (y). Calculation must respect the "autoregressive property". Suggested default: tfb_real_nvp_default_template(hidden_layers=...). Typically the function contains tf$Variables and is wrapped using tf$make_template. Returning NULL for either (or both) of shift and log_scale is equivalent to (but more efficient than) returning zero.

is_constant_jacobian

Logical, default: FALSE. When TRUE the implementation assumes log_scale does not depend on the forward domain (x) or inverse domain (y) values. (No validation is made; is_constant_jacobian=FALSE is always safe but possibly computationally inefficient.)

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Recall that the MAF bijector (Papamakarios et al., 2016) implements a normalizing flow via an autoregressive transformation. MAF and IAF (inverse autoregressive flow) have opposite computational tradeoffs: MAF can train all units in parallel but must sample units sequentially, while IAF must train units sequentially but can sample in parallel. In contrast, Real NVP can compute both forward and inverse computations in parallel. However, the lack of an autoregressive transformation makes it less expressive on a per-bijector basis.

A "valid" shift_and_log_scale_fn must compute each shift (aka loc or "mu" in Papamakarios et al. (2016)) and log(scale) (aka "alpha" in Papamakarios et al. (2016)) such that each is broadcastable with the arguments to forward and inverse, i.e., such that the forward and inverse calculations are possible. For convenience, tfb_real_nvp_default_template is offered as a possible shift_and_log_scale_fn function.

NICE (Dinh et al., 2014) is a special case of the Real NVP bijector which discards the scale transformation, resulting in a constant-time inverse-log-determinant-Jacobian. To use a NICE bijector instead of Real NVP, shift_and_log_scale_fn should return (shift, NULL), and is_constant_jacobian should be set to TRUE in the RealNVP constructor. Calling tfb_real_nvp_default_template with shift_only=TRUE returns one such NICE-compatible shift_and_log_scale_fn.

Caching: the scalar input depth D of the base distribution is not known at construction time. The first call to any of forward(x), inverse(x), inverse_log_det_jacobian(x), or forward_log_det_jacobian(x) memoizes D, which is re-used in subsequent calls. This shape must be known prior to graph execution (which is the case if using tf$layers).
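
To make the coupling transform concrete, the following plain-R sketch implements the forward and inverse maps for a single event vector. It is an illustration only: the conditioner below is a hand-written toy (a real one would be a learned network such as the one built by tfb_real_nvp_default_template), all function names are hypothetical, and R's 1-based indexing replaces the 0-based slices used in the formulas.

```r
# Toy conditioner (hypothetical): maps the masked units to shift and
# log_scale vectors of length D - d. A real one would be a neural network.
shift_and_log_scale_fn <- function(x_masked, output_units) {
  list(
    shift = rep(sum(x_masked), output_units),
    log_scale = rep(0.1 * mean(x_masked), output_units)
  )
}

real_nvp_forward <- function(x, num_masked) {
  D <- length(x)
  stopifnot(num_masked >= 1, num_masked <= D - 1)
  masked <- x[seq_len(num_masked)]   # passed through unchanged
  rest <- x[(num_masked + 1):D]      # scaled and shifted
  p <- shift_and_log_scale_fn(masked, D - num_masked)
  c(masked, rest * exp(p$log_scale) + p$shift)
}

real_nvp_inverse <- function(y, num_masked) {
  D <- length(y)
  stopifnot(num_masked >= 1, num_masked <= D - 1)
  masked <- y[seq_len(num_masked)]   # identical to the masked part of x
  rest <- y[(num_masked + 1):D]
  p <- shift_and_log_scale_fn(masked, D - num_masked)
  c(masked, (rest - p$shift) * exp(-p$log_scale))
}

x <- c(0.5, -1, 2, 0.25)
y <- real_nvp_forward(x, num_masked = 2)
x_back <- real_nvp_inverse(y, num_masked = 2)
```

Because the masked units pass through unchanged, the inverse can recompute exactly the same shift and log_scale from y, which is why both directions are a single parallel pass. Returning a zero log_scale from the conditioner would give the shift-only NICE special case.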

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Build a scale-and-shift function using a multi-layer neural network

Description

This will be wrapped in a make_template to ensure the variables are only created once. It takes the d-dimensional input x[0:d] and returns the D-d dimensional outputs loc ("mu") and log_scale ("alpha").

Usage

tfb_real_nvp_default_template(
  hidden_layers,
  shift_only = FALSE,
  activation = tf$nn$relu,
  name = NULL,
  ...
)

Arguments

hidden_layers

List-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: list(512, 512).

shift_only

logical indicating if only the shift term shall be computed (i.e. NICE bijector). Default: FALSE.

activation

Activation function (callable). Explicitly setting to NULL implies a linear activation.

name

A name for ops managed by this function. Default: "tfb_real_nvp_default_template".

...

tf$layers$dense arguments

Details

The default template does not support conditioning and will raise an exception if condition_kwargs are passed to it. To use conditioning in the Real NVP bijector, implement a conditioned shift/scale template that handles the condition_kwargs.

Value

list of:

  • shift: Float-like Tensor of shift terms

  • log_scale: Float-like Tensor of log(scale) terms

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


A Bijector that computes b(x) = 1. / x

Description

A Bijector that computes b(x) = 1. / x

Usage

tfb_reciprocal(validate_args = FALSE, name = "reciprocal")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Reshapes the event_shape of a Tensor

Description

The semantics generally follow those of tf$reshape(), with a few differences:

  • The user must provide both the input and output shape, so that the transformation can be inverted. If an input shape is not specified, the default assumes a vector-shaped input, i.e., event_shape_in = list(-1).

  • The Reshape bijector automatically broadcasts over the leftmost dimensions of its input (sample_shape and batch_shape); only the rightmost event_ndims_in dimensions are reshaped. The number of dimensions to reshape is inferred from the provided event_shape_in (⁠event_ndims_in = length(event_shape_in))⁠.
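
The need for both shapes can be seen in a small plain-R sketch of an invertible reshape on a single event. This is an illustration under stated assumptions, not the bijector's implementation: the helper names are hypothetical, and because R arrays are column-major while tf$reshape() is row-major, the helpers reverse the dimensions and transpose to mimic row-major order.

```r
# Hypothetical helpers mimicking row-major (TensorFlow-style) reshaping in R.
reshape_row_major <- function(x, shape) aperm(array(x, dim = rev(shape)))
flatten_row_major <- function(a) as.vector(aperm(a))

event_shape_in <- 6          # a vector-shaped event of length 6
event_shape_out <- c(2, 3)   # reshaped to a 2 x 3 event

x <- 1:6
y <- reshape_row_major(x, event_shape_out)  # forward: vector -> 2 x 3
x_back <- flatten_row_major(y)              # inverse: 2 x 3 -> vector
```

In this sketch the rows of y are (1, 2, 3) and (4, 5, 6), matching what a row-major reshape produces.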

Usage

tfb_reshape(
  event_shape_out,
  event_shape_in = c(-1),
  validate_args = FALSE,
  name = NULL
)

Arguments

event_shape_out

An integer-like vector-shaped Tensor representing the event shape of the transformed output.

event_shape_in

An optional integer-like vector-shaped Tensor representing the event shape of the input. This is required in order to define inverse operations; the default of list(-1) assumes a vector-shaped input.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute ⁠Y = g(X; scale) = scale * X⁠.

Description

Examples:

# Y = 2 * X
b <- tfb_scale(scale = 2)

Usage

tfb_scale(
  scale = NULL,
  log_scale = NULL,
  validate_args = FALSE,
  name = "scale"
)

Arguments

scale

Floating-point Tensor.

log_scale

Floating-point Tensor. Logarithm of the scale. If this is set to NULL, no scale is applied. This should not be set if scale is set.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute ⁠Y = g(X; scale) = scale @ X⁠

Description

In TF parlance, the scale term is logically equivalent to:

scale <- tf$linalg$diag(scale_diag)

The scale term is applied without materializing a full dense matrix.
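
The equivalence between the dense and the diagonal-only computation can be sketched in plain R. The values below are made up for illustration, and this is not the library's implementation.

```r
# Plain-R sketch: a diagonal scale applied with and without a dense matrix.
scale_diag <- c(2, 0.5, -3)  # illustrative values only
x <- c(1, 4, 2)

y_dense <- as.vector(diag(scale_diag) %*% x)  # materializes a 3 x 3 matrix
y_cheap <- scale_diag * x                     # elementwise, no matrix needed

# log|det| of a diagonal matrix is the sum of log|diagonal entries|.
log_abs_det <- sum(log(abs(scale_diag)))

# The inverse is an elementwise division.
x_back <- y_cheap / scale_diag
```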

Usage

tfb_scale_matvec_diag(
  scale_diag,
  adjoint = FALSE,
  validate_args = FALSE,
  name = "scale_matvec_diag",
  dtype = NULL
)

Arguments

scale_diag

Floating-point Tensor representing the diagonal matrix. scale_diag has shape ⁠[N1, N2, ... k]⁠, which represents a k x k diagonal matrix.

adjoint

logical indicating whether to use the scale matrix as specified or its adjoint. Default value: FALSE.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

dtype

tf$DType to prefer when converting args to Tensors. Else, we fall back to a common dtype inferred from the args, finally falling back to float32.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute ⁠Y = g(X; scale) = scale @ X⁠.

Description

scale is a LinearOperator. If X is a scalar then the forward transformation is: scale * X where * denotes broadcasted elementwise product.

Usage

tfb_scale_matvec_linear_operator(
  scale,
  adjoint = FALSE,
  validate_args = FALSE,
  name = "scale_matvec_linear_operator"
)

Arguments

scale

Subclass of LinearOperator. Represents the (batch, non-singular) linear transformation by which the Bijector transforms inputs.

adjoint

logical indicating whether to use the scale matrix as specified or its adjoint. Default value: FALSE.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Matrix-vector multiply using LU decomposition.

Description

This bijector is identical to the "Convolution1x1" used in Glow (Kingma and Dhariwal, 2018).

Usage

tfb_scale_matvec_lu(
  lower_upper,
  permutation,
  validate_args = FALSE,
  name = NULL
)

Arguments

lower_upper

The LU factorization as returned by tf$linalg$lu.

permutation

The LU factorization permutation as returned by tf$linalg$lu.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute ⁠Y = g(X; scale) = scale @ X⁠.

Description

The scale term is presumed lower-triangular and non-singular (i.e., no zeros on the diagonal), which permits efficient determinant calculation (linear in matrix dimension, instead of cubic).
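
A plain-R sketch of why triangularity helps: the log-determinant only needs the diagonal, and the inverse is a forward substitution. The matrix below is made up for illustration; this is not the library's implementation.

```r
# Plain-R sketch of a lower-triangular matvec transform.
scale_tril <- matrix(c( 2, 0,   0,
                        1, 3,   0,
                       -1, 0.5, 4), nrow = 3, byrow = TRUE)
x <- c(1, -2, 0.5)

y <- as.vector(scale_tril %*% x)  # forward: scale @ X

# log|det| from the diagonal alone: linear in the matrix dimension.
log_abs_det <- sum(log(abs(diag(scale_tril))))

# Inverse via forward substitution instead of a general solve.
x_back <- forwardsolve(scale_tril, y)

# With the adjoint, the transposed (upper-triangular) matrix is applied.
y_adjoint <- as.vector(t(scale_tril) %*% x)
```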

Usage

tfb_scale_matvec_tri_l(
  scale_tril,
  adjoint = FALSE,
  validate_args = FALSE,
  name = "scale_matvec_tril",
  dtype = NULL
)

Arguments

scale_tril

Floating-point Tensor representing the lower triangular matrix. scale_tril has shape ⁠[N1, N2, ... k, k]⁠, which represents a k x k lower triangular matrix. When NULL no scale_tril term is added to scale. The upper triangular elements above the diagonal are ignored.

adjoint

logical indicating whether to use the scale matrix as specified or its adjoint. Note that lower-triangularity is taken into account first: the region above the diagonal of scale_tril is treated as zero (irrespective of the adjoint setting). A lower-triangular input with adjoint=TRUE will behave like an upper triangular transform. Default value: FALSE.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

dtype

tf$DType to prefer when converting args to Tensors. Else, we fall back to a common dtype inferred from the args, finally falling back to float32.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Transforms unconstrained vectors to TriL matrices with positive diagonal

Description

This is implemented as a simple tfb_chain of tfb_fill_triangular followed by tfb_transform_diagonal, and provided mostly as a convenience. The default setup is somewhat opinionated, using a Softplus transformation followed by a small shift (1e-5) which attempts to avoid numerical issues from zeros on the diagonal.
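
The two-step construction can be sketched in plain R as follows. This is an illustration only: the helper names are hypothetical, and the fill order used here (R's column-wise lower.tri order) is an assumption for the example and is not claimed to match the ordering used by tfb_fill_triangular.

```r
# Plain-R sketch: unconstrained vector -> lower-triangular matrix with a
# positive diagonal. Helper names are hypothetical.
softplus <- function(v) log1p(exp(v))

to_tril_positive_diag <- function(v, k, diag_shift = 1e-5) {
  stopifnot(length(v) == k * (k + 1) / 2)
  m <- matrix(0, k, k)
  # Step 1: fill the lower triangle (column-wise order; NOT TFP's ordering).
  m[lower.tri(m, diag = TRUE)] <- v
  # Step 2: make the diagonal positive with softplus plus a small shift.
  diag(m) <- softplus(diag(m)) + diag_shift
  m
}

v <- c(-3, 0.2, 1.5, 0, -0.7, 4)  # illustrative values only
m <- to_tril_positive_diag(v, k = 3)
```

Even a strongly negative input such as -3 ends up as a small positive diagonal entry, which is what keeps the resulting matrix non-singular.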

Usage

tfb_scale_tri_l(
  diag_bijector = NULL,
  diag_shift = 1e-05,
  validate_args = FALSE,
  name = "scale_tril"
)

Arguments

diag_bijector

Bijector instance, used to transform the output diagonal to be positive. Default value: NULL (i.e., tfb_softplus()).

diag_shift

Float value broadcastable and added to all diagonal entries after applying the diag_bijector. Setting a positive value forces the output diagonal entries to be positive, but prevents inverting the transformation for matrices with diagonal entries less than this value. Default value: 1e-5.

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute Y = g(X; shift) = X + shift.

Description

Computes Y = g(X; shift) = X + shift, where shift is a numeric Tensor.
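The transformation can be sketched in plain R without calling the package. The values and the helper names shift_forward() and shift_inverse() below are illustrative assumptions, not part of tfprobability; with the package, the corresponding calls would go through tfb_forward() and tfb_inverse() on the bijector.

```r
# Plain-R sketch of the shift transformation (not the library implementation).
shift <- c(1, -2, 0.5)  # illustrative values

shift_forward <- function(x) x + shift  # Y = X + shift
shift_inverse <- function(y) y - shift  # X = Y - shift

x <- c(0, 10, 3)
y <- shift_forward(x)       # 1, 8, 3.5
x_back <- shift_inverse(y)  # recovers x

# dY/dX = 1 everywhere, so the log-determinant of the Jacobian is zero.
shift_log_det_jacobian <- 0
```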

Usage

tfb_shift(shift, validate_args = FALSE, name = "shift")

Arguments

shift

floating-point tensor

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Compute Y = g(X) = (1 - exp(-rate * X)) * exp(-c * exp(-rate * X))

Description

This bijector maps inputs from [0, inf] to [0, 1]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Shifted Gompertz distribution:

Y ~ ShiftedGompertzCDF(concentration, rate)
pdf(y; c, r) = r * exp(-r * y - exp(-r * y) / c) * (1 + (1 - exp(-r * y)) / c)

Usage

tfb_shifted_gompertz_cdf(
  concentration,
  rate,
  validate_args = FALSE,
  name = "shifted_gompertz_cdf"
)

Arguments

concentration

Positive Float-like Tensor that is the same dtype and is broadcastable with rate. This is c in Y = g(X) = (1 - exp(-rate * X)) * exp(-c * exp(-rate * X)).

rate

Positive Float-like Tensor that is the same dtype and is broadcastable with concentration. This is rate in Y = g(X) = (1 - exp(-rate * X)) * exp(-c * exp(-rate * X)).

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Details

Note: Even though this is called ShiftedGompertzCDF, when applied to the Uniform distribution, this is not the same as applying a GompertzCDF with a Shift bijector (i.e. the Shifted Gompertz distribution is not the same as a Gompertz distribution with a location parameter).

Note: Because the Shifted Gompertz distribution concentrates its mass close to zero, for larger rates or larger concentrations, bijector$forward will quickly saturate to 1.
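The saturation with increasing rate can be illustrated with a plain-R sketch of the forward map. It takes the formula in the title of this entry at face value; the pdf in the Description uses exp(-r * y) / c in the exponent, so the library's parameterization of concentration may differ and should be checked before relying on specific numbers. The helper name shifted_gompertz_forward() is hypothetical.

```r
# Plain-R sketch of the forward map as written in the title of this entry.
# `concentration` plays the role of c. Not the library implementation.
shifted_gompertz_forward <- function(x, concentration, rate) {
  (1 - exp(-rate * x)) * exp(-concentration * exp(-rate * x))
}

x <- 1
rates <- c(0.5, 1, 5, 20)
y <- shifted_gompertz_forward(x, concentration = 1, rate = rates)

# The output stays in [0, 1] and approaches 1 as the rate grows.
print(y)
```

For a fixed input, the output is already indistinguishable from 1 at moderate rates, so the inverse becomes numerically ill-conditioned in that regime.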

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = 1 / (1 + exp(-X))

Description

Computes Y = g(X) = 1 / (1 + exp(-X))
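A plain-R sketch of the forward map, its inverse (the logit), and the forward log-determinant of the Jacobian follows. The helper names are hypothetical and the code does not call tfprobability; it only illustrates the formula above.

```r
# Plain-R sketch of the sigmoid transformation (not the library implementation).
sigmoid_forward <- function(x) 1 / (1 + exp(-x))

# Inverse: X = log(Y / (1 - Y)), written with log1p for accuracy near 0.
sigmoid_inverse <- function(y) log(y) - log1p(-y)

# Forward log|dY/dX| = log(Y * (1 - Y)).
sigmoid_forward_log_det_jacobian <- function(x) {
  y <- sigmoid_forward(x)
  log(y) + log1p(-y)
}

x <- c(-3, 0, 2)
y <- sigmoid_forward(x)       # values in (0, 1)
x_back <- sigmoid_inverse(y)  # recovers x
fldj_at_zero <- sigmoid_forward_log_det_jacobian(0)  # log(0.25)
```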

Usage

tfb_sigmoid(validate_args = FALSE, name = "sigmoid")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Bijector that computes Y = sinh(X).

Description

Bijector that computes Y = sinh(X).

Usage

tfb_sinh(validate_args = FALSE, name = "sinh")

Arguments

validate_args

Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed.

name

name prefixed to Ops created by this class.

Value

a bijector instance.

See Also

For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors: tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()


Computes Y = g(X) = Sinh( (Arcsinh(X) + skewness) * tailweight )

Description

For skewness in (-inf, inf) and tailweight in (0, inf), this transformation is a diffeomorphism of the real line (-inf, inf). The inverse transform is X = g^{-1}(Y) = Sinh( Arcsinh(Y) / tailweight - skewness ). The SinhArcsinh transformation of the Normal is described in Sinh-arcsinh distributions.
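The forward and inverse formulas above can be checked with a plain-R sketch. The helper names are hypothetical, and the defaults of 0 for skewness and 1 for tailweight are a choice made for this sketch so that the default case is the identity; the code does not call tfprobability.

```r
# Plain-R sketch of the SinhArcsinh transformation (not the library implementation).
sinh_arcsinh_forward <- function(x, skewness = 0, tailweight = 1) {
  sinh((asinh(x) + skewness) * tailweight)
}

sinh_arcsinh_inverse <- function(y, skewness = 0, tailweight = 1) {
  sinh(asinh(y) / tailweight - skewness)
}

x <- c(-2, -0.5, 0, 0.5, 2)

# Skewed, heavier-tailed case: the map stays strictly increasing.
y <- sinh_arcsinh_forward(x, skewness = 0.3, tailweight = 1.5)
x_back <- sinh_arcsinh_inverse(y, skewness = 0.3, tailweight = 1.5)

# With skewness = 0 and tailweight = 1 the map reduces to the identity.
identity_case <- sinh_arcsinh_forward(x)
```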

Usage

tfb_sinh_arcsinh(
  skewness = NULL,
  tailweight = NULL,
  validate_args = FALSE,
  name = "SinhArcsinh"
)

Arguments

skewness

Skewness parameter. Float-type Ten