Title: Interface to 'TensorFlow Probability'
Description: Interface to 'TensorFlow Probability', a 'Python' library built on 'TensorFlow' that makes it easy to combine probabilistic models and deep learning on modern hardware ('TPU', 'GPU'). 'TensorFlow Probability' includes a wide selection of probability distributions and bijectors, probabilistic layers, variational inference, Markov chain Monte Carlo, and optimizers such as Nelder-Mead, BFGS, and SGLD.
Authors: Tomasz Kalinowski [ctb, cre], Sigrid Keydana [aut], Daniel Falbel [ctb], Kevin Kuo [ctb], RStudio [cph]
Maintainer: Tomasz Kalinowski <[email protected]>
License: Apache License (>= 2.0)
Version: 0.15.1.9000
Built: 2024-10-06 05:50:09 UTC
Source: https://github.com/rstudio/tfprobability
A list of models that can be used as the model argument in glm_fit():

Bernoulli
: Bernoulli(probs=mean), where mean = sigmoid(matmul(X, weights))

BernoulliNormalCDF
: Bernoulli(probs=mean), where mean = Normal(0, 1).cdf(matmul(X, weights))

GammaExp
: Gamma(concentration=1, rate=1 / mean), where mean = exp(matmul(X, weights))

GammaSoftplus
: Gamma(concentration=1, rate=1 / mean), where mean = softplus(matmul(X, weights))

LogNormal
: LogNormal(loc=log(mean) - log(2) / 2, scale=sqrt(log(2))), where mean = exp(matmul(X, weights))

LogNormalSoftplus
: LogNormal(loc=log(mean) - log(2) / 2, scale=sqrt(log(2))), where mean = softplus(matmul(X, weights))

Normal
: Normal(loc=mean, scale=1), where mean = matmul(X, weights)

NormalReciprocal
: Normal(loc=mean, scale=1), where mean = 1 / matmul(X, weights)

Poisson
: Poisson(rate=mean), where mean = exp(matmul(X, weights))

PoissonSoftplus
: Poisson(rate=mean), where mean = softplus(matmul(X, weights))
A list of models that can be used as the model argument in glm_fit().
Other glm_fit: glm_fit.tensorflow.tensor(), glm_fit_one_step.tensorflow.tensor()
Runs multiple Fisher scoring steps
glm_fit(x, ...)
x |
float-like, matrix-shaped Tensor where each row represents a sample's features. |
... |
other arguments passed to specific methods. |
A glm_fit
object with parameter estimates, number of iterations,
etc.
Runs one Fisher scoring step
glm_fit_one_step(x, ...)
x |
float-like, matrix-shaped Tensor where each row represents a sample's features. |
... |
other arguments passed to specific methods. |
A glm_fit
object with parameter estimates, number of iterations,
etc.
glm_fit_one_step.tensorflow.tensor()
Runs one Fisher Scoring step
## S3 method for class 'tensorflow.tensor' glm_fit_one_step( x, response, model, model_coefficients_start = NULL, predicted_linear_response_start = NULL, l2_regularizer = NULL, dispersion = NULL, offset = NULL, learning_rate = NULL, fast_unsafe_numerics = TRUE, name = NULL, ... )
x |
float-like, matrix-shaped Tensor where each row represents a sample's features. |
response |
vector-shaped Tensor where each element represents a sample's
observed response (to the corresponding row of features). Must have the same dtype as x. |
model |
a string naming the model (see glm_families) or a tfp$glm$ExponentialFamily instance. |
model_coefficients_start |
Optional (batch of) vector-shaped Tensor representing
the initial model coefficients, one for each column in x. Default: NULL. |
predicted_linear_response_start |
Optional Tensor with shape and dtype matching response, representing the initial predicted linear response. Default: NULL. |
l2_regularizer |
Optional scalar Tensor representing L2 regularization penalty.
Default: NULL. |
dispersion |
Optional (batch of) Tensor representing response dispersion. |
offset |
Optional Tensor representing a constant shift applied to the predicted linear response. |
learning_rate |
Optional (batch of) scalar Tensor used to dampen iterative progress.
Typically only needed if optimization diverges, should be no larger than 1 and typically
very close to 1. Default value: NULL. |
fast_unsafe_numerics |
Optional Python bool indicating if faster, less numerically accurate methods can be employed for computing the weighted least-squares solution. Default value: TRUE (i.e., "fast but possibly diminished accuracy"). |
name |
used as a name prefix for ops created by this function. Default value: "fit". |
... |
other arguments passed to specific methods. |
A glm_fit
object with parameter estimates, and
number of required steps.
Other glm_fit: glm_families, glm_fit.tensorflow.tensor()
Runs multiple Fisher scoring steps
## S3 method for class 'tensorflow.tensor' glm_fit( x, response, model, model_coefficients_start = NULL, predicted_linear_response_start = NULL, l2_regularizer = NULL, dispersion = NULL, offset = NULL, convergence_criteria_fn = NULL, learning_rate = NULL, fast_unsafe_numerics = TRUE, maximum_iterations = NULL, name = NULL, ... )
x |
float-like, matrix-shaped Tensor where each row represents a sample's features. |
response |
vector-shaped Tensor where each element represents a sample's
observed response (to the corresponding row of features). Must have the same dtype as x. |
model |
a string naming the model (see glm_families) or a tfp$glm$ExponentialFamily instance. |
model_coefficients_start |
Optional (batch of) vector-shaped Tensor representing
the initial model coefficients, one for each column in x. Default: NULL. |
predicted_linear_response_start |
Optional Tensor with shape and dtype matching response, representing the initial predicted linear response. Default: NULL. |
l2_regularizer |
Optional scalar Tensor representing L2 regularization penalty.
Default: NULL. |
dispersion |
Optional (batch of) Tensor representing response dispersion. |
offset |
Optional Tensor representing a constant shift applied to the predicted linear response. |
convergence_criteria_fn |
callable determining whether Fisher scoring has converged, evaluated after each iteration. Default: NULL (a criterion based on a small relative change in the coefficient norm). |
learning_rate |
Optional (batch of) scalar Tensor used to dampen iterative progress.
Typically only needed if optimization diverges, should be no larger than 1 and typically
very close to 1. Default value: NULL. |
fast_unsafe_numerics |
Optional Python bool indicating if faster, less numerically accurate methods can be employed for computing the weighted least-squares solution. Default value: TRUE (i.e., "fast but possibly diminished accuracy"). |
maximum_iterations |
Optional maximum number of iterations of Fisher scoring to run; "and-ed" with the result of convergence_criteria_fn. Default: NULL. |
name |
used as a name prefix for ops created by this function. Default value: "fit". |
... |
other arguments passed to specific methods. |
A glm_fit
object with parameter estimates, and
number of required steps.
Other glm_fit: glm_families, glm_fit_one_step.tensorflow.tensor()
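For orientation, here is a minimal sketch of calling glm_fit() on tensors with a model string from glm_families. The simulated data, shapes, and coefficient values are illustrative assumptions, not taken from the package documentation.
library(tensorflow)
library(tfprobability)

# simulated design matrix (100 samples, 3 features) and a Poisson response
x <- tf$random$normal(shape = shape(100L, 3L))
true_coefficients <- tf$constant(c(0.5, -0.25, 1.0), dtype = tf$float32)
rate <- tf$exp(tf$linalg$matvec(x, true_coefficients))
y <- tf$random$poisson(shape = list(), lam = rate)

# run Fisher scoring until convergence; "Poisson" refers to the glm_families
# entry Poisson(rate = mean), where mean = exp(matmul(X, weights))
fit <- glm_fit(x, response = y, model = "Poisson", maximum_iterations = 20L)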
Initializer which concatenates other initializers
initializer_blockwise(initializers, sizes, validate_args = FALSE)
initializers |
list of Keras initializers, e.g. initializer_zeros() or initializer_glorot_uniform(). |
sizes |
list of integers scalars representing the number of elements associated
with each initializer in initializers. |
validate_args |
bool indicating whether we should do (possibly expensive) graph-time assertions, if necessary. |
An initializer which concatenates the given initializers.
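A minimal sketch of composing two Keras initializers blockwise; the sizes and the choice of initializers are illustrative assumptions.
library(keras)
library(tfprobability)

# the first 3 elements are initialized to zero, the next 4 to one
init <- initializer_blockwise(
  initializers = list(initializer_zeros(), initializer_ones()),
  sizes = list(3L, 4L)
)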
Installs TensorFlow Probability
install_tfprobability( method = c("auto", "virtualenv", "conda"), conda = "auto", version = "default", tensorflow = "default", extra_packages = NULL, ..., pip_ignore_installed = TRUE )
method |
Installation method. By default, "auto" automatically finds a method that will work in the local environment. Change the default to force a specific installation method. Note that the "virtualenv" method is not available on Windows. |
conda |
The path to a conda executable. Use "auto" to allow reticulate to automatically find an appropriate conda binary. |
version |
TensorFlow version to install. Valid values include:
|
tensorflow |
Synonym for |
extra_packages |
Additional Python packages to install along with TensorFlow. |
... |
other arguments passed to |
pip_ignore_installed |
Whether pip should ignore installed python
packages and reinstall all already installed python packages. This defaults
to TRUE. |
invisible
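A minimal sketch of a typical installation, with all arguments left at their defaults:
library(tfprobability)
install_tfprobability()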
layer_autoregressive takes as input a Tensor of shape [..., event_size] and returns a Tensor of shape [..., event_size, params]. The output satisfies the autoregressive property. That is, the layer is configured with some permutation ord of {0, ..., event_size-1} (i.e., an ordering of the input dimensions), and the output output[batch_idx, i, ...] for input dimension i depends only on inputs x[batch_idx, j] where ord(j) < ord(i).
layer_autoregressive( object, params, event_shape = NULL, hidden_units = NULL, input_order = "left-to-right", hidden_degrees = "equal", activation = NULL, use_bias = TRUE, kernel_initializer = "glorot_uniform", validate_args = FALSE, ... )
layer_autoregressive( object, params, event_shape = NULL, hidden_units = NULL, input_order = "left-to-right", hidden_degrees = "equal", activation = NULL, use_bias = TRUE, kernel_initializer = "glorot_uniform", validate_args = FALSE, ... )
object |
What to compose the new
|
params |
integer specifying the number of parameters to output per input. |
event_shape |
Shape of a single draw from this layer's output (also the event shape of the distribution parameterized by the layer). Currently only rank-1 (single integer) shapes are supported; if not specified, it is inferred when the layer is first built. |
hidden_units |
List of integers giving the number of units in each hidden layer. |
input_order |
Order of degrees to the input units: 'random', 'left-to-right', 'right-to-left', or an array of an explicit order. For example, 'left-to-right' builds an autoregressive model p(x) = p(x1) p(x2 | x1) ... p(xD | x<D). Default: 'left-to-right'. |
hidden_degrees |
Method for assigning degrees to the hidden units: 'equal', 'random'. If 'equal', hidden units in each layer are allocated equally (up to a remainder term) to each degree. Default: 'equal'. |
activation |
An activation function. See |
use_bias |
Whether or not the dense layers constructed in this layer
should have a bias term. See |
kernel_initializer |
Initializer for the kernel weights matrix. Default: 'glorot_uniform'. |
validate_args |
|
... |
Additional keyword arguments passed to the |
The autoregressive property allows us to use output[batch_idx, i] to parameterize conditional distributions p(x[batch_idx, i] | x[batch_idx, j] for ord(j) < ord(i)), which gives us a tractable distribution over the input x[batch_idx]:
p(x[batch_idx]) = prod_i p(x[batch_idx, ord(i)] | x[batch_idx, ord(0:i)])
For example, when params is 2, the output of the layer can parameterize the location and log-scale of an autoregressive Gaussian distribution.
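A minimal sketch of a standalone MADE network; the event size, hidden units, and activation below are illustrative assumptions.
library(keras)
library(tfprobability)

# a MADE network over 3-dimensional events that outputs params = 2 values
# (e.g. location and log-scale) per input dimension
made <- layer_autoregressive(
  params = 2L,
  event_shape = 3L,
  hidden_units = list(16L, 16L),
  activation = "relu"
)

x <- k_ones(c(5L, 3L))   # a batch of 5 events
out <- made(x)           # shape: (5, 3, 2)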
a Keras layer
Other layers: layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
An autoregressive normalizing flow layer, given a layer_autoregressive. Following Papamakarios et al. (2017), given an autoregressive model p(x) with conditional distributions in the location-scale family, we can construct a normalizing flow for p(x).
layer_autoregressive_transform(object, made, ...)
object |
What to compose the new
|
made |
A MADE layer (e.g. layer_autoregressive()) that outputs two parameters (shift and log-scale) per input dimension. |
... |
Additional parameters passed to Keras Layer. |
Specifically, suppose made is a layer_autoregressive() – a layer implementing a Masked Autoencoder for Distribution Estimation (MADE) – that computes location and log-scale parameters made(x)[i] for each input x[i]. Then we can represent the autoregressive model p(x) as x = f(u), where u is drawn from some base distribution and where f is an invertible and differentiable function (i.e., a Bijector) whose inverse f^(-1) is defined by:
library(tensorflow)
library(zeallot)
f_inverse <- function(x) {
  c(shift, log_scale) %<-% tf$unstack(made(x), 2, axis = -1L)
  (x - shift) * tf$math$exp(-log_scale)
}
Given a layer_autoregressive() made, a layer_autoregressive_transform() transforms an input tfd_* distribution p(u) to an output tfd_* distribution p(x), where x = f(u).
a Keras layer
tfb_masked_autoregressive_flow()
and layer_autoregressive()
A OneHotCategorical mixture Keras layer from k * (1 + d) params. k (i.e., num_components) represents the number of component OneHotCategorical distributions and d (i.e., event_size) represents the number of categories within each OneHotCategorical distribution.
layer_categorical_mixture_of_one_hot_categorical( object, event_size, num_components, convert_to_tensor_fn = tfp$distributions$Distribution$sample, sample_dtype = NULL, validate_args = FALSE, ... )
object |
What to compose the new
|
event_size |
Scalar integer representing the size of a single draw from this distribution. |
num_components |
Scalar integer representing the number of mixture components. |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: tfp$distributions$Distribution$sample. |
sample_dtype |
dtype of samples produced by this distribution. Default value: NULL (i.e., previous layer's dtype). |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE invalid inputs may silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
Typical choices for convert_to_tensor_fn
include:
tfp$distributions$Distribution$sample
tfp$distributions$Distribution$mean
tfp$distributions$Distribution$mode
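For instance, here is a minimal sketch (the event_size and num_components values are illustrative assumptions) of making the layer emit the mixture's mean rather than a random sample whenever it is used as an ordinary tensor downstream:
library(tfprobability)

layer <- layer_categorical_mixture_of_one_hot_categorical(
  event_size = 4L,
  num_components = 3L,
  convert_to_tensor_fn = tfp$distributions$Distribution$mean
)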
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal().
Other distribution_layers: layer_distribution_lambda(), layer_independent_bernoulli(), layer_independent_logistic(), layer_independent_normal(), layer_independent_poisson(), layer_kl_divergence_add_loss(), layer_kl_divergence_regularizer(), layer_mixture_logistic(), layer_mixture_normal(), layer_mixture_same_family(), layer_multivariate_normal_tri_l(), layer_one_hot_categorical()
This layer creates a convolution kernel that is convolved
(actually cross-correlated) with the layer input to produce a tensor of
outputs. It may also include a bias addition and activation function
on the outputs. It assumes the kernel
and/or bias
are drawn from distributions.
layer_conv_1d_flipout( object, filters, kernel_size, strides = 1, padding = "valid", data_format = "channels_last", dilation_rate = 1, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
filters |
Integer, the dimensionality of the output space (i.e. the number of filters in the convolution). |
kernel_size |
An integer or list of a single integer, specifying the length of the 1D convolution window. |
strides |
An integer or list of a single integer,
specifying the stride length of the convolution.
Specifying any stride value != 1 is incompatible with specifying
any |
padding |
One of |
data_format |
A string, one of |
dilation_rate |
An integer or tuple/list of a single integer, specifying
the dilation rate to use for dilated convolution.
Currently, specifying any |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
outputs = f(inputs; kernel, bias), kernel, bias ~ posterior
where f denotes the layer's calculation. It uses the Flipout
estimator (Wen et al., 2018), which performs a Monte Carlo approximation
of the distribution integrating over the kernel
and bias
. Flipout uses
roughly twice as many floating point operations as the reparameterization
estimator but has the advantage of significantly lower variance.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
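As a sketch of that scaling (n_train is an assumed training-set size, not something the layer provides), the divergence functions can divide the KL term by the number of training examples:
library(keras)
library(tfprobability)

n_train <- 60000

conv <- layer_conv_1d_flipout(
  filters = 16,
  kernel_size = 5,
  activation = "relu",
  kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p) / n_train,
  bias_divergence_fn   = function(q, p, ignore) tfd_kl_divergence(q, p) / n_train
)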
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer creates a convolution kernel that is convolved
(actually cross-correlated) with the layer input to produce a tensor of
outputs. It may also include a bias addition and activation function
on the outputs. It assumes the kernel
and/or bias
are drawn from distributions.
layer_conv_1d_reparameterization( object, filters, kernel_size, strides = 1, padding = "valid", data_format = "channels_last", dilation_rate = 1, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
filters |
Integer, the dimensionality of the output space (i.e. the number of filters in the convolution). |
kernel_size |
An integer or list of a single integer, specifying the length of the 1D convolution window. |
strides |
An integer or list of a single integer,
specifying the stride length of the convolution.
Specifying any stride value != 1 is incompatible with specifying
any |
padding |
One of |
data_format |
A string, one of |
dilation_rate |
An integer or tuple/list of a single integer, specifying
the dilation rate to use for dilated convolution.
Currently, specifying any |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
outputs = f(inputs; kernel, bias), kernel, bias ~ posterior
where f denotes the layer's calculation. It uses the reparameterization
estimator (Kingma and Welling, 2014), which performs a Monte Carlo
approximation of the distribution integrating over the kernel
and bias
.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer creates a convolution kernel that is convolved
(actually cross-correlated) with the layer input to produce a tensor of
outputs. It may also include a bias addition and activation function
on the outputs. It assumes the kernel
and/or bias
are drawn from distributions.
layer_conv_2d_flipout( object, filters, kernel_size, strides = 1, padding = "valid", data_format = "channels_last", dilation_rate = 1, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
filters |
Integer, the dimensionality of the output space (i.e. the number of filters in the convolution). |
kernel_size |
An integer or list of 2 integers, specifying the height and width of the 2D convolution window. |
strides |
An integer or list of a single integer,
specifying the stride length of the convolution.
Specifying any stride value != 1 is incompatible with specifying
any |
padding |
One of |
data_format |
A string, one of |
dilation_rate |
An integer or tuple/list of a single integer, specifying
the dilation rate to use for dilated convolution.
Currently, specifying any |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
outputs = f(inputs; kernel, bias), kernel, bias ~ posterior
where f denotes the layer's calculation. It uses the Flipout
estimator (Wen et al., 2018), which performs a Monte Carlo approximation
of the distribution integrating over the kernel
and bias
. Flipout uses
roughly twice as many floating point operations as the reparameterization
estimator but has the advantage of significantly lower variance.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer creates a convolution kernel that is convolved
(actually cross-correlated) with the layer input to produce a tensor of
outputs. It may also include a bias addition and activation function
on the outputs. It assumes the kernel
and/or bias
are drawn from distributions.
layer_conv_2d_reparameterization( object, filters, kernel_size, strides = 1, padding = "valid", data_format = "channels_last", dilation_rate = 1, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
filters |
Integer, the dimensionality of the output space (i.e. the number of filters in the convolution). |
kernel_size |
An integer or list of 2 integers, specifying the height and width of the 2D convolution window. |
strides |
An integer or list of a single integer,
specifying the stride length of the convolution.
Specifying any stride value != 1 is incompatible with specifying
any |
padding |
One of |
data_format |
A string, one of |
dilation_rate |
An integer or tuple/list of a single integer, specifying
the dilation rate to use for dilated convolution.
Currently, specifying any |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
outputs = f(inputs; kernel, bias), kernel, bias ~ posterior
where f denotes the layer's calculation. It uses the reparameterization
estimator (Kingma and Welling, 2014), which performs a Monte Carlo
approximation of the distribution integrating over the kernel
and bias
.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer creates a convolution kernel that is convolved
(actually cross-correlated) with the layer input to produce a tensor of
outputs. It may also include a bias addition and activation function
on the outputs. It assumes the kernel
and/or bias
are drawn from distributions.
layer_conv_3d_flipout( object, filters, kernel_size, strides = 1, padding = "valid", data_format = "channels_last", dilation_rate = 1, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
filters |
Integer, the dimensionality of the output space (i.e. the number of filters in the convolution). |
kernel_size |
An integer or list of 3 integers, specifying the depth, height and width of the 3D convolution window. |
strides |
An integer or list of a single integer,
specifying the stride length of the convolution.
Specifying any stride value != 1 is incompatible with specifying
any |
padding |
One of |
data_format |
A string, one of |
dilation_rate |
An integer or tuple/list of a single integer, specifying
the dilation rate to use for dilated convolution.
Currently, specifying any |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
outputs = f(inputs; kernel, bias), kernel, bias ~ posterior
where f denotes the layer's calculation. It uses the Flipout
estimator (Wen et al., 2018), which performs a Monte Carlo approximation
of the distribution integrating over the kernel
and bias
. Flipout uses
roughly twice as many floating point operations as the reparameterization
estimator but has the advantage of significantly lower variance.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer creates a convolution kernel that is convolved
(actually cross-correlated) with the layer input to produce a tensor of
outputs. It may also include a bias addition and activation function
on the outputs. It assumes the kernel
and/or bias
are drawn from distributions.
layer_conv_3d_reparameterization( object, filters, kernel_size, strides = 1, padding = "valid", data_format = "channels_last", dilation_rate = 1, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
filters |
Integer, the dimensionality of the output space (i.e. the number of filters in the convolution). |
kernel_size |
An integer or list of 3 integers, specifying the depth, height and width of the 3D convolution window. |
strides |
An integer or list of a single integer,
specifying the stride length of the convolution.
Specifying any stride value != 1 is incompatible with specifying
any |
padding |
One of |
data_format |
A string, one of |
dilation_rate |
An integer or tuple/list of a single integer, specifying
the dilation rate to use for dilated convolution.
Currently, specifying any |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
outputs = f(inputs; kernel, bias), kernel, bias ~ posterior
where f denotes the layer's calculation. It uses the reparameterization
estimator (Kingma and Welling, 2014), which performs a Monte Carlo
approximation of the distribution integrating over the kernel
and bias
.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
layer_dense_flipout( object, units, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), seed = NULL, ... )
object |
What to compose the new
|
units |
integer dimensionality of the output space |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
seed |
scalar integer which initializes the random number generator. Default value: NULL (i.e., use the global seed settings). |
... |
Additional keyword arguments passed to the |
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
kernel, bias ~ posterior outputs = activation(matmul(inputs, kernel) + bias)
It uses the Flipout estimator (Wen et al., 2018), which performs a Monte
Carlo approximation of the distribution integrating over the kernel
and
bias
. Flipout uses roughly twice as many floating point operations as the
reparameterization estimator but has the advantage of significantly lower
variance.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
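A minimal sketch of a small Bayesian classifier built from these layers; the layer sizes, input dimensionality, and n_train are illustrative assumptions.
library(keras)
library(tfprobability)

n_train <- 10000
scaled_kl <- function(q, p, ignore) tfd_kl_divergence(q, p) / n_train

model <- keras_model_sequential() %>%
  layer_dense_flipout(units = 32, activation = "relu", input_shape = c(20),
                      kernel_divergence_fn = scaled_kl) %>%
  layer_dense_flipout(units = 10, activation = "softmax",
                      kernel_divergence_fn = scaled_kl)

model %>% compile(
  optimizer = "adam",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)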
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_local_reparameterization(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
layer_dense_local_reparameterization( object, units, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
units |
integer dimensionality of the output space |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
kernel, bias ~ posterior outputs = activation(matmul(inputs, kernel) + bias)
It uses the local reparameterization estimator (Kingma et al., 2015),
which performs a Monte Carlo approximation of the distribution on the hidden
units induced by the kernel
and bias
. The default kernel_posterior_fn
is a normal distribution which factorizes across all elements of the weight
matrix and bias vector. Unlike that paper's multiplicative parameterization, this
distribution has trainable location and scale parameters which is known as
an additive noise parameterization (Molchanov et al., 2017).
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_reparameterization(), layer_dense_variational(), layer_variable()
This layer implements the Bayesian variational inference analogue to
a dense layer by assuming the kernel
and/or the bias
are drawn
from distributions.
layer_dense_reparameterization( object, units, activation = NULL, activity_regularizer = NULL, trainable = TRUE, kernel_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(), kernel_posterior_tensor_fn = function(d) d %>% tfd_sample(), kernel_prior_fn = tfp$layers$util$default_multivariate_normal_fn, kernel_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), bias_posterior_fn = tfp$layers$util$default_mean_field_normal_fn(is_singular = TRUE), bias_posterior_tensor_fn = function(d) d %>% tfd_sample(), bias_prior_fn = NULL, bias_divergence_fn = function(q, p, ignore) tfd_kl_divergence(q, p), ... )
object |
What to compose the new
|
units |
integer dimensionality of the output space |
activation |
Activation function. Set it to None to maintain a linear activation. |
activity_regularizer |
Regularizer function for the output. |
trainable |
Whether the layer weights will be updated during training. |
kernel_posterior_fn |
Function which creates |
kernel_posterior_tensor_fn |
Function which takes a |
kernel_prior_fn |
Function which creates |
kernel_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate
sample(s) from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
bias_posterior_fn |
Function which creates a |
bias_posterior_tensor_fn |
Function which takes a |
bias_prior_fn |
Function which creates |
bias_divergence_fn |
Function which takes the surrogate posterior distribution, prior distribution and random variate sample(s)
from the surrogate posterior and computes or approximates the KL divergence. The
distributions are |
... |
Additional keyword arguments passed to the |
By default, the layer implements a stochastic forward pass via sampling from the kernel and bias posteriors,
kernel, bias ~ posterior outputs = activation(matmul(inputs, kernel) + bias)
It uses the reparameterization estimator (Kingma and Welling, 2014)
which performs a Monte Carlo approximation of the distribution integrating
over the kernel
and bias
.
The arguments permit separate specification of the surrogate posterior
(q(W|x)
), prior (p(W)
), and divergence for both the kernel
and bias
distributions.
Upon being built, this layer adds losses (accessible via the losses
property) representing the divergences of kernel
and/or bias
surrogate
posteriors and their respective priors. When doing minibatch stochastic
optimization, make sure to scale this loss such that it is applied just once
per epoch (e.g. if kl
is the sum of losses
for each element of the batch,
you should pass kl / num_examples_per_epoch
to your optimizer).
You can access the kernel
and/or bias
posterior and prior distributions
after the layer is built via the kernel_posterior
, kernel_prior
,
bias_posterior
and bias_prior
properties.
a Keras layer
Other layers: layer_autoregressive(), layer_conv_1d_flipout(), layer_conv_1d_reparameterization(), layer_conv_2d_flipout(), layer_conv_2d_reparameterization(), layer_conv_3d_flipout(), layer_conv_3d_reparameterization(), layer_dense_flipout(), layer_dense_local_reparameterization(), layer_dense_variational(), layer_variable()
This layer uses variational inference to fit a "surrogate" posterior to the
distribution over both the kernel
matrix and the bias
terms which are
otherwise used in a manner similar to layer_dense()
.
This layer fits the "weights posterior" according to the following generative
process:
[K, b] ~ Prior()
M = matmul(X, K) + b
Y ~ Likelihood(M)
layer_dense_variational( object, units, make_posterior_fn, make_prior_fn, kl_weight = NULL, kl_use_exact = FALSE, activation = NULL, use_bias = TRUE, ... )
object |
What to compose the new
|
units |
Positive integer, dimensionality of the output space. |
make_posterior_fn |
function taking |
make_prior_fn |
function taking |
kl_weight |
Amount by which to scale the KL divergence loss between prior and posterior. |
kl_use_exact |
Logical indicating that the analytical KL divergence should be used rather than a Monte Carlo approximation. |
activation |
An activation function. See |
use_bias |
Whether or not the dense layers constructed in this layer
should have a bias term. See |
... |
Additional keyword arguments passed to the |
a Keras layer
Other layers:
layer_autoregressive()
,
layer_conv_1d_flipout()
,
layer_conv_1d_reparameterization()
,
layer_conv_2d_flipout()
,
layer_conv_2d_reparameterization()
,
layer_conv_3d_flipout()
,
layer_conv_3d_reparameterization()
,
layer_dense_flipout()
,
layer_dense_local_reparameterization()
,
layer_dense_reparameterization()
,
layer_variable()
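A rough sketch of posterior- and prior-generating functions that could be passed as make_posterior_fn and make_prior_fn (the names posterior_mean_field and prior_trainable, the mean-field parameterization, and the softplus scaling constants are illustrative assumptions, not package defaults):

library(keras)
library(tensorflow)
library(tfprobability)

# mean-field normal surrogate posterior over all weights of the layer
posterior_mean_field <- function(kernel_size, bias_size = 0L, dtype = NULL) {
  n <- kernel_size + bias_size
  keras_model_sequential() %>%
    layer_variable(shape = 2 * n, dtype = dtype) %>%
    layer_distribution_lambda(function(t)
      tfd_independent(
        tfd_normal(loc = t[1:n],
                   scale = 1e-5 + tf$math$softplus(0.05 * t[(n + 1):(2 * n)])),
        reinterpreted_batch_ndims = 1
      ))
}

# normal prior with trainable location and unit scale
prior_trainable <- function(kernel_size, bias_size = 0L, dtype = NULL) {
  n <- kernel_size + bias_size
  keras_model_sequential() %>%
    layer_variable(shape = n, dtype = dtype) %>%
    layer_distribution_lambda(function(t)
      tfd_independent(tfd_normal(loc = t, scale = 1),
                      reinterpreted_batch_ndims = 1))
}

model <- keras_model_sequential() %>%
  layer_dense_variational(
    units = 1,
    make_posterior_fn = posterior_mean_field,
    make_prior_fn = prior_trainable,
    kl_weight = 1 / 1000  # assumed number of training examples
  )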
Keras layer enabling plumbing TFP distributions through Keras models
layer_distribution_lambda( object, make_distribution_fn, convert_to_tensor_fn = tfp$distributions$Distribution$sample, ... )
object |
What to compose the new
|
make_distribution_fn |
A callable that takes previous layer outputs and returns a |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
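In addition to that example, here is a minimal regression sketch (the two-unit dense head and the softplus link are illustrative choices): the last dense layer emits two numbers per observation, which the lambda layer turns into a Normal distribution, so the model can be fit by minimizing the negative log-likelihood.

library(keras)
library(tensorflow)
library(tfprobability)

model <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 2) %>%
  layer_distribution_lambda(function(t)
    tfd_normal(loc = t[, 1, drop = FALSE],
               scale = 1e-3 + tf$math$softplus(t[, 2, drop = FALSE]))
  )

# the model output is a distribution, so the loss can be a negative log-likelihood
negloglik <- function(y, rv_y) -(rv_y %>% tfd_log_prob(y))
model %>% compile(optimizer = "adam", loss = negloglik)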
An Independent-Bernoulli Keras layer from prod(event_shape) params
layer_independent_bernoulli( object, event_shape, convert_to_tensor_fn = tfp$distributions$Distribution$sample, sample_dtype = NULL, validate_args = FALSE, ... )
object |
What to compose the new
|
event_shape |
Scalar integer representing the size of single draw from this distribution. |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
sample_dtype |
dtype of samples produced by this distribution. Default value: NULL (i.e., previous layer's dtype). |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked
for validity despite possibly degrading runtime performance. When FALSE invalid inputs may
silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
An independent Logistic Keras layer.
layer_independent_logistic( object, event_shape, convert_to_tensor_fn = tfp$distributions$Distribution$sample, validate_args = FALSE, ... )
object |
What to compose the new
|
event_shape |
Scalar integer representing the size of single draw from this distribution. |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked
for validity despite possibly degrading runtime performance. When FALSE invalid inputs may
silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
An independent Normal Keras layer.
layer_independent_normal( object, event_shape, convert_to_tensor_fn = tfp$distributions$Distribution$sample, validate_args = FALSE, ... )
object |
What to compose the new
|
event_shape |
Scalar integer representing the size of single draw from this distribution. |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked
for validity despite possibly degrading runtime performance. When FALSE invalid inputs may
silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
library(keras)

input_shape <- c(28, 28, 1)
encoded_shape <- 2
n <- 2

model <- keras_model_sequential(
  list(
    layer_input(shape = input_shape),
    layer_flatten(),
    layer_dense(units = n),
    layer_dense(units = params_size_independent_normal(encoded_shape)),
    layer_independent_normal(event_shape = encoded_shape)
  )
)
An independent Poisson Keras layer.
layer_independent_poisson( object, event_shape, convert_to_tensor_fn = tfp$distributions$Distribution$sample, validate_args = FALSE, ... )
object |
What to compose the new
|
event_shape |
Scalar integer representing the size of single draw from this distribution. |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked
for validity despite possibly degrading runtime performance. When FALSE invalid inputs may
silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
Pass-through layer that adds a KL divergence penalty to the model loss
layer_kl_divergence_add_loss( object, distribution_b, use_exact_kl = FALSE, test_points_reduce_axis = NULL, test_points_fn = tf$convert_to_tensor, weight = NULL, ... )
object |
What to compose the new
|
distribution_b |
Distribution instance corresponding to b as in |
use_exact_kl |
Logical indicating if KL divergence should be
calculated exactly via |
test_points_reduce_axis |
Integer vector or scalar representing dimensions over which to reduce_mean while calculating the Monte Carlo approximation of the KL divergence. As is with all tf$reduce_* ops, NULL means reduce over all dimensions; () means reduce over none of them. Default value: () (i.e., no reduction). |
test_points_fn |
A callable taking a |
weight |
Multiplier applied to the calculated KL divergence for each Keras batch member. Default value: NULL (i.e., do not weight each batch member). |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
When using Monte Carlo approximation (e.g., use_exact_kl = FALSE
), it is presumed that the input
distribution's concretization (i.e., tf$convert_to_tensor(distribution)
) corresponds to a random
sample. To override this behavior, set test_points_fn.
layer_kl_divergence_regularizer( object, distribution_b, use_exact_kl = FALSE, test_points_reduce_axis = NULL, test_points_fn = tf$convert_to_tensor, weight = NULL, ... )
object |
What to compose the new
|
distribution_b |
Distribution instance corresponding to b as in |
use_exact_kl |
Logical indicating if KL divergence should be
calculated exactly via |
test_points_reduce_axis |
Integer vector or scalar representing dimensions over which to reduce_mean while calculating the Monte Carlo approximation of the KL divergence. As is with all tf$reduce_* ops, NULL means reduce over all dimensions; () means reduce over none of them. Default value: () (i.e., no reduction). |
test_points_fn |
A callable taking a |
weight |
Multiplier applied to the calculated KL divergence for each Keras batch member. Default value: NULL (i.e., do not weight each batch member). |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
A mixture distribution Keras layer, with independent logistic components.
layer_mixture_logistic( object, num_components, event_shape = list(), convert_to_tensor_fn = tfp$distributions$Distribution$sample, validate_args = FALSE, ... )
object |
What to compose the new
|
num_components |
Number of component distributions in the mixture distribution. |
event_shape |
integer vector |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked
for validity despite possibly degrading runtime performance. When FALSE invalid inputs may
silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
A mixture distribution Keras layer, with independent normal components.
layer_mixture_normal( object, num_components, event_shape = list(), convert_to_tensor_fn = tfp$distributions$Distribution$sample, validate_args = FALSE, ... )
object |
What to compose the new
|
num_components |
Number of component distributions in the mixture distribution. |
event_shape |
integer vector |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked
for validity despite possibly degrading runtime performance. When FALSE invalid inputs may
silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
A mixture (same-family) Keras layer.
layer_mixture_same_family( object, num_components, component_layer, convert_to_tensor_fn = tfp$distributions$Distribution$sample, validate_args = FALSE, ... )
object |
What to compose the new
|
num_components |
Number of component distributions in the mixture distribution. |
component_layer |
Function that, given a tensor of shape
|
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked
for validity despite possibly degrading runtime performance. When FALSE invalid inputs may
silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_multivariate_normal_tri_l()
,
layer_one_hot_categorical()
A d-variate Multivariate Normal TriL Keras layer from d + d*(d+1)/2 params
layer_multivariate_normal_tri_l( object, event_size, convert_to_tensor_fn = tfp$distributions$Distribution$sample, validate_args = FALSE, ... )
object |
What to compose the new
|
event_size |
Integer vector tensor representing the shape of single draw from this distribution. |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE invalid inputs may silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_one_hot_categorical()
A d-variate OneHotCategorical Keras layer from d params.
Typical choices for convert_to_tensor_fn
include:
tfp$distributions$Distribution$sample
tfp$distributions$Distribution$mean
tfp$distributions$Distribution$mode
tfp$distributions$OneHotCategorical$logits
layer_one_hot_categorical( object, event_size, convert_to_tensor_fn = tfp$distributions$Distribution$sample, sample_dtype = NULL, validate_args = FALSE, ... )
object |
What to compose the new
|
event_size |
Scalar |
convert_to_tensor_fn |
A callable that takes a tfd$Distribution instance and returns a
tf$Tensor-like object. Default value: |
sample_dtype |
|
validate_args |
Logical, default FALSE. When TRUE distribution parameters are checked for validity despite possibly degrading runtime performance. When FALSE invalid inputs may silently render incorrect outputs. Default value: FALSE. |
... |
Additional arguments passed to |
a Keras layer
For an example of how to use this layer in a Keras model, see layer_independent_normal()
.
Other distribution_layers:
layer_categorical_mixture_of_one_hot_categorical()
,
layer_distribution_lambda()
,
layer_independent_bernoulli()
,
layer_independent_logistic()
,
layer_independent_normal()
,
layer_independent_poisson()
,
layer_kl_divergence_add_loss()
,
layer_kl_divergence_regularizer()
,
layer_mixture_logistic()
,
layer_mixture_normal()
,
layer_mixture_same_family()
,
layer_multivariate_normal_tri_l()
Simply returns a (trainable) variable, regardless of input.
This layer implements the mathematical function f(x) = c
where c
is a
constant, i.e., unchanged for all x
. Like other Keras layers, the constant
is trainable
. This layer can also be interpreted as the special case of
layer_dense()
when the kernel
is forced to be the zero matrix
(tf$zeros
).
layer_variable( object, shape, dtype = NULL, activation = NULL, initializer = "zeros", regularizer = NULL, constraint = NULL, ... )
object |
What to compose the new
|
shape |
integer or integer vector specifying the shape of the output of this layer. |
dtype |
TensorFlow |
activation |
An activation function. See |
initializer |
Initializer for the |
regularizer |
Regularizer function applied to the |
constraint |
Constraint function applied to the |
... |
Additional keyword arguments passed to the |
a Keras layer
Other layers:
layer_autoregressive()
,
layer_conv_1d_flipout()
,
layer_conv_1d_reparameterization()
,
layer_conv_2d_flipout()
,
layer_conv_2d_reparameterization()
,
layer_conv_3d_flipout()
,
layer_conv_3d_reparameterization()
,
layer_dense_flipout()
,
layer_dense_local_reparameterization()
,
layer_dense_reparameterization()
,
layer_dense_variational()
Create a Variational Gaussian Process distribution whose index_points
are
the inputs to the layer. Parameterized by number of inducing points and a
kernel_provider
, which should be a tf.keras.Layer
with an @property that
late-binds variable parameters to a tfp.positive_semidefinite_kernel.PositiveSemidefiniteKernel
instance (this requirement has to do with the way that variables must be created
in a keras model). The mean_fn is an optional argument which, if omitted, will
be automatically configured to be a constant function with trainable variable
output.
layer_variational_gaussian_process( object, num_inducing_points, kernel_provider, event_shape = 1, inducing_index_points_initializer = NULL, unconstrained_observation_noise_variance_initializer = NULL, mean_fn = NULL, jitter = 1e-06, name = NULL )
object |
What to compose the new
|
num_inducing_points |
number of inducing points in the Variational Gaussian Process distribution. |
kernel_provider |
a |
event_shape |
the shape of the output of the layer. This translates to a
batch of underlying Variational Gaussian Process distributions. For example,
|
inducing_index_points_initializer |
a |
unconstrained_observation_noise_variance_initializer |
a |
mean_fn |
a callable that maps layer inputs to mean function values. Passed to the mean_fn parameter of Variational Gaussian Process distribution. If omitted, defaults to a constant function with trainable variable value. |
jitter |
a small term added to the diagonal of various kernel matrices for numerical stability. |
name |
name to give to this layer and the scope of ops and variables it contains. |
a Keras layer
Adapts the inner kernel's step_size based on log_accept_prob.
The dual averaging policy uses a noisy step size for exploration, while
averaging over tuning steps to provide a smoothed estimate of an optimal
value. It is based on section 3.2 of Hoffman and Gelman (2013), which
modifies the stochastic convex optimization scheme of Nesterov (2009).
The modified algorithm applies extra weight to recent iterations while
keeping the convergence guarantees of Robbins-Monro, and takes care not
to make the step size too small too quickly when maintaining a constant
trajectory length, to avoid expensive early iterations. A good target
acceptance probability depends on the inner kernel. If this kernel is
HamiltonianMonteCarlo
, then 0.6-0.9 is a good range to aim for. For
RandomWalkMetropolis
this should be closer to 0.25. See the individual
kernels' docstrings for guidance.
mcmc_dual_averaging_step_size_adaptation( inner_kernel, num_adaptation_steps, target_accept_prob = 0.75, exploration_shrinkage = 0.05, step_count_smoothing = 10, decay_rate = 0.75, step_size_setter_fn = NULL, step_size_getter_fn = NULL, log_accept_prob_getter_fn = NULL, validate_args = FALSE, name = NULL )
inner_kernel |
|
num_adaptation_steps |
Scalar |
target_accept_prob |
A floating point |
exploration_shrinkage |
Floating point scalar |
step_count_smoothing |
Int32 scalar |
decay_rate |
Floating point scalar |
step_size_setter_fn |
A function with the signature
|
step_size_getter_fn |
A callable with the signature
|
log_accept_prob_getter_fn |
A callable with the signature
|
validate_args |
|
name |
name prefixed to Ops created by this function.
Default value: |
In general, adaptation prevents the chain from reaching a stationary
distribution, so obtaining consistent samples requires num_adaptation_steps
be set to a value somewhat smaller than the number of burnin steps.
However, it may sometimes be helpful to set num_adaptation_steps
to a larger
value during development in order to inspect the behavior of the chain during
adaptation.
The step size is assumed to broadcast with the chain state, potentially having
leading dimensions corresponding to multiple chains. When there are fewer of
those leading dimensions than there are chain dimensions, the corresponding
dimensions in the log_accept_prob
are averaged (in the direct space, rather
than the log space) before being used to adjust the step size. This means that
this kernel can do both cross-chain adaptation, or per-chain step size
adaptation, depending on the shape of the step size.
For example, if your problem has a state with shape [S]
, your chain state
has shape [C0, C1, S]
(meaning that there are C0 * C1
total chains) and
log_accept_prob
has shape [C0, C1]
(one acceptance probability per chain),
then depending on the shape of the step size, the following will happen:
Step size has shape []
, [S]
or [1]
, the log_accept_prob
will be averaged
across its C0
and C1
dimensions. This means that you will learn a shared
step size based on the mean acceptance probability across all chains. This
can be useful if you don't have a lot of steps to adapt and want to average
away the noise.
Step size has shape [C1, 1]
or [C1, S]
, the log_accept_prob
will be
averaged across its C0
dimension. This means that you will learn a shared
step size based on the mean acceptance probability across chains that share
the coordinate across the C1
dimension. This can be useful when the C1
dimension indexes different distributions, while C0
indexes replicas of a
single distribution, all sampled in parallel.
Step size has shape [C0, C1, 1]
or [C0, C1, S]
, then no averaging will
happen. This means that each chain will learn its own step size. This can be
useful when all chains are sampling from different distributions. Even when
all chains are for the same distribution, this can help during the initial
warmup period.
Step size has shape [C0, 1, 1]
or [C0, 1, S]
, the log_accept_prob
will be
averaged across its C1
dimension. This means that you will learn a shared
step size based on the mean acceptance probability across chains that share
the coordinate across the C0
dimension. This can be useful when the C0
dimension indexes different distributions, while C1
indexes replicas of a
single distribution, all sampled in parallel.
a Monte Carlo sampling kernel
For an example of how to use this kernel, see mcmc_no_u_turn_sampler()
.
Other mcmc_kernels:
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
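In addition, a minimal sketch of wrapping a Hamiltonian Monte Carlo kernel in the dual-averaging adapter (the standard-normal target and the tuning constants are illustrative assumptions):

library(tfprobability)

target_log_prob_fn <- function(x) tfd_normal(loc = 0, scale = 1) %>% tfd_log_prob(x)

adaptive_hmc <- mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn = target_log_prob_fn,
  step_size = 0.1,
  num_leapfrog_steps = 3
) %>%
  mcmc_dual_averaging_step_size_adaptation(
    # somewhat smaller than the intended number of burn-in steps
    num_adaptation_steps = 400,
    target_accept_prob = 0.75
  )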
Roughly speaking, "effective sample size" (ESS) is the size of an iid sample
with the same variance as state
.
mcmc_effective_sample_size( states, filter_threshold = 0, filter_beyond_lag = NULL, name = NULL )
states |
|
filter_threshold |
|
filter_beyond_lag |
|
name |
name to prepend to created ops. |
More precisely, given a stationary sequence of possibly correlated random
variables X_1, X_2,...,X_N
, each identically distributed, ESS is the number
such that
Variance{ N**-1 * Sum{X_i} } = ESS**-1 * Variance{ X_1 }.
If the sequence is uncorrelated, ESS = N
. In general, one should expect
ESS <= N
, with more highly correlated sequences having smaller ESS
.
Tensor
or list of Tensor
objects. The effective sample size of
each component of states
. Shape will be states$shape[1:]
.
Other mcmc_functions:
mcmc_potential_scale_reduction()
,
mcmc_sample_annealed_importance_chain()
,
mcmc_sample_chain()
,
mcmc_sample_halton_sequence()
Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm
that takes a series of gradient-informed steps to produce a Metropolis
proposal. This class implements one random HMC step from a given
current_state
. Mathematical details and derivations can be found in
Neal (2011).
mcmc_hamiltonian_monte_carlo( target_log_prob_fn, step_size, num_leapfrog_steps, state_gradients_are_stopped = FALSE, step_size_update_fn = NULL, seed = NULL, store_parameters_in_results = FALSE, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
step_size |
|
num_leapfrog_steps |
Integer number of steps to run the leapfrog integrator
for. Total progress per HMC step is roughly proportional to
|
state_gradients_are_stopped |
|
step_size_update_fn |
Function taking current |
seed |
integer to seed the random number generator. |
store_parameters_in_results |
If |
name |
string prefixed to Ops created by this function.
Default value: |
The one_step
function can update multiple chains in parallel. It assumes
that all leftmost dimensions of current_state
index independent chain states
(and are therefore updated independently). The output of
target_log_prob_fn(current_state)
should sum log-probabilities across all
event dimensions. Slices along the rightmost dimensions may have different
target distributions; for example, current_state[0, :]
could have a
different target distribution from current_state[1, :]
. These semantics are
governed by target_log_prob_fn(current_state)
. (The number of independent
chains is tf$size(target_log_prob_fn(current_state))
.)
a Monte Carlo sampling kernel
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
Metropolis-adjusted Langevin algorithm (MALA) is a Markov chain Monte Carlo
(MCMC) algorithm that takes a step of a discretised Langevin diffusion as a
proposal. This class implements one step of MALA using Euler-Maruyama method
for a given current_state
and diagonal preconditioning volatility
matrix.
mcmc_metropolis_adjusted_langevin_algorithm( target_log_prob_fn, step_size, volatility_fn = NULL, seed = NULL, parallel_iterations = 10, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
step_size |
|
volatility_fn |
function which takes an argument like
|
seed |
integer to seed the random number generator. |
parallel_iterations |
the number of coordinates for which the gradients of
the volatility matrix |
name |
String prefixed to Ops created by this function.
Default value: |
Mathematical details and derivations can be found in Roberts and Rosenthal (1998) and Xifara et al. (2013).
The one_step
function can update multiple chains in parallel. It assumes
that all leftmost dimensions of current_state
index independent chain states
(and are therefore updated independently). The output of
target_log_prob_fn(current_state)
should reduce log-probabilities across
all event dimensions. Slices along the rightmost dimensions may have different
target distributions; for example, current_state[0, :]
could have a
different target distribution from current_state[1, :]
. These semantics are
governed by target_log_prob_fn(current_state)
. (The number of independent
chains is tf.size(target_log_prob_fn(current_state))
.)
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
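A minimal usage sketch under an illustrative standard-normal target (step size and chain lengths are arbitrary):

library(tfprobability)

target <- tfd_normal(loc = 0, scale = 1)

mala <- mcmc_metropolis_adjusted_langevin_algorithm(
  target_log_prob_fn = function(x) target %>% tfd_log_prob(x),
  step_size = 0.75
)

samples <- mala %>% mcmc_sample_chain(
  num_results = 1000,
  num_burnin_steps = 500,
  current_state = 0.1,
  trace_fn = NULL
)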
The Metropolis-Hastings algorithm is a Markov chain Monte Carlo (MCMC) technique which uses a proposal distribution to eventually sample from a target distribution.
mcmc_metropolis_hastings(inner_kernel, seed = NULL, name = NULL)
inner_kernel |
|
seed |
integer to seed the random number generator. |
name |
string prefixed to Ops created by this function. Default value: |
Note: inner_kernel$one_step
must return kernel_results
as a collections$namedtuple
which must:
have a target_log_prob
field,
optionally have a log_acceptance_correction
field, and,
have only fields which are Tensor
-valued.
The Metropolis-Hastings log acceptance-probability is computed as:
log_accept_ratio = (current_kernel_results.target_log_prob
                    - previous_kernel_results.target_log_prob
                    + current_kernel_results.log_acceptance_correction)
If current_kernel_results$log_acceptance_correction
does not exist, it is
presumed 0
(i.e., that the proposal distribution is symmetric).
The most common use-case for log_acceptance_correction
is in the
Metropolis-Hastings algorithm, i.e.,
accept_prob(x' | x) = p(x') / p(x) * (g(x | x') / g(x' | x))
where p represents the target distribution, g represents the proposal (conditional)
distribution, x' is the proposed state, and x is the current state
The log of the parenthetical term is the log_acceptance_correction
.
The log_acceptance_correction
may not necessarily correspond to the ratio of
proposal distributions, e.g, log_acceptance_correction
has a different
interpretation in Hamiltonian Monte Carlo.
a Monte Carlo sampling kernel
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
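A minimal sketch of wrapping an uncalibrated proposal kernel so that its proposals are accepted or rejected according to the Metropolis-Hastings criterion (the uncalibrated HMC inner kernel and its settings are illustrative assumptions):

library(tfprobability)

target <- tfd_normal(loc = 0, scale = 1)

mh_kernel <- mcmc_metropolis_hastings(
  inner_kernel = mcmc_uncalibrated_hamiltonian_monte_carlo(
    target_log_prob_fn = function(x) target %>% tfd_log_prob(x),
    step_size = 0.1,
    num_leapfrog_steps = 3
  )
)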
The No U-Turn Sampler (NUTS) is an adaptive variant of the Hamiltonian Monte
Carlo (HMC) method for MCMC. NUTS adapts the distance traveled in response to
the curvature of the target density. Conceptually, one proposal consists of
reversibly evolving a trajectory through the sample space, continuing until
that trajectory turns back on itself (hence the name, 'No U-Turn').
This class implements one random NUTS step from a given
current_state
. Mathematical details and derivations can be found in
Hoffman & Gelman (2011).
mcmc_no_u_turn_sampler( target_log_prob_fn, step_size, max_tree_depth = 10, max_energy_diff = 1000, unrolled_leapfrog_steps = 1, seed = NULL, name = NULL )
target_log_prob_fn |
function which takes an argument like
|
step_size |
|
max_tree_depth |
Maximum depth of the tree implicitly built by NUTS. The
maximum number of leapfrog steps is bounded by |
max_energy_diff |
Scalar threshold of energy differences at each leapfrog step; divergent samples are defined as leapfrog steps that exceed this threshold. Defaults to 1000. |
unrolled_leapfrog_steps |
The number of leapfrog steps to unroll per tree expansion step. Applies a direct linear multiplier to the maximum trajectory length implied by max_tree_depth. Defaults to 1. |
seed |
integer to seed the random number generator. |
name |
name prefixed to Ops created by this function.
Default value: |
The one_step
function can update multiple chains in parallel. It assumes
that a prefix of leftmost dimensions of current_state
index independent
chain states (and are therefore updated independently). The output of
target_log_prob_fn(current_state)
should sum log-probabilities across all
event dimensions. Slices along the rightmost dimensions may have different
target distributions; for example, current_state[0][0, ...]
could have a
different target distribution from current_state[0][1, ...]
. These
semantics are governed by target_log_prob_fn(*current_state)
.
(The number of independent chains is tf$size(target_log_prob_fn(current_state))
.)
a Monte Carlo sampling kernel
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
predictors <- tf$cast(
  c(201, 244, 47, 287, 203, 58, 210, 202, 198, 158, 165, 201, 157,
    131, 166, 160, 186, 125, 218, 146), tf$float32)
obs <- tf$cast(
  c(592, 401, 583, 402, 495, 173, 479, 504, 510, 416, 393, 442, 317, 311, 400,
    337, 423, 334, 533, 344), tf$float32)
y_sigma <- tf$cast(
  c(61, 25, 38, 15, 21, 15, 27, 14, 30, 16, 14, 25, 52, 16, 34, 31, 42, 26,
    16, 22), tf$float32)

# Robust linear regression model
robust_lm <- tfd_joint_distribution_sequential(
  list(
    tfd_normal(loc = 0, scale = 1, name = "b0"),
    tfd_normal(loc = 0, scale = 1, name = "b1"),
    tfd_half_normal(5, name = "df"),
    function(df, b1, b0)
      tfd_independent(
        tfd_student_t( # Likelihood
          df = tf$expand_dims(df, axis = -1L),
          loc = tf$expand_dims(b0, axis = -1L) +
            tf$expand_dims(b1, axis = -1L) * predictors[tf$newaxis, ],
          scale = y_sigma,
          name = "st"
        ),
        name = "ind"
      )
  ),
  validate_args = TRUE
)

log_prob <- function(b0, b1, df) {
  robust_lm %>% tfd_log_prob(list(b0, b1, df, obs))
}

step_size0 <- Map(function(x) tf$cast(x, tf$float32), c(1, .2, .5))

number_of_steps <- 10
burnin <- 5
nchain <- 50

run_chain <- function() {
  # random initialization of the starting position of each chain
  samples <- robust_lm %>% tfd_sample(nchain)
  b0 <- samples[[1]]
  b1 <- samples[[2]]
  df <- samples[[3]]

  # bijectors to map constrained parameters to the real line
  unconstraining_bijectors <- list(
    tfb_identity(), tfb_identity(), tfb_exp()
  )

  trace_fn <- function(x, pkr) {
    list(
      pkr$inner_results$inner_results$step_size,
      pkr$inner_results$inner_results$log_accept_ratio
    )
  }

  nuts <- mcmc_no_u_turn_sampler(
    target_log_prob_fn = log_prob,
    step_size = step_size0
  ) %>%
    mcmc_transformed_transition_kernel(bijector = unconstraining_bijectors) %>%
    mcmc_dual_averaging_step_size_adaptation(
      num_adaptation_steps = burnin,
      step_size_setter_fn = function(pkr, new_step_size)
        pkr$`_replace`(
          inner_results = pkr$inner_results$`_replace`(step_size = new_step_size)
        ),
      step_size_getter_fn = function(pkr) pkr$inner_results$step_size,
      log_accept_prob_getter_fn = function(pkr) pkr$inner_results$log_accept_ratio
    )

  nuts %>% mcmc_sample_chain(
    num_results = number_of_steps,
    num_burnin_steps = burnin,
    current_state = list(b0, b1, df),
    trace_fn = trace_fn
  )
}

run_chain <- tensorflow::tf_function(run_chain)

res <- run_chain()
Given N > 1
states from each of C > 1
independent chains, the potential
scale reduction factor, commonly referred to as R-hat, measures convergence of
the chains (to the same target) by testing for equality of means.
mcmc_potential_scale_reduction( chains_states, independent_chain_ndims = 1, name = NULL )
chains_states |
|
independent_chain_ndims |
Integer type |
name |
name to prepend to created tf. Default: |
Specifically, R-hat measures the degree to which variance (of the means) between chains exceeds what one would expect if the chains were identically distributed. See Gelman and Rubin (1992) and Brooks and Gelman (1998).
Some guidelines:
The initial state of the chains should be drawn from a distribution overdispersed with respect to the target.
If all chains converge to the target, then as N --> infinity
, R-hat --> 1.
Before that, R-hat > 1 (except in pathological cases, e.g. if the chain paths were identical).
The above holds for any number of chains C > 1
. Increasing C
improves effectiveness of the diagnostic.
Sometimes, R-hat < 1.2 is used to indicate approximate convergence, but of course this is problem dependent. See Brooks and Gelman (1998).
R-hat only measures non-convergence of the mean. If higher moments, or other statistics are desired, a different diagnostic should be used. See Brooks and Gelman (1998).
To see why R-hat is reasonable, let X
be a random variable drawn uniformly
from the combined states (combined over all chains). Then, in the limit
N, C --> infinity
, with E
, Var
denoting expectation and variance,
R-hat = ( E[Var[X | chain]] + Var[E[X | chain]] ) / E[Var[X | chain]].
Using the law of total variance, the numerator is the variance of the combined
states, and the denominator is the total variance minus the variance of the
individual chain means. If the chains are all drawing from the same
distribution, they will have the same mean, and thus the ratio should be one.
Tensor
or list
of Tensor
s representing the R-hat statistic for
the state(s). Same dtype
as state
, and shape equal to
state$shape[1 + independent_chain_ndims:]
.
Stephen P. Brooks and Andrew Gelman. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics, 7(4), 1998.
Andrew Gelman and Donald B. Rubin. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4):457-472, 1992.
Other mcmc_functions:
mcmc_effective_sample_size()
,
mcmc_sample_annealed_importance_chain()
,
mcmc_sample_chain()
,
mcmc_sample_halton_sequence()
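A minimal sketch (target, number of chains, and initialization scale are illustrative) of running several chains in parallel and computing R-hat from the resulting [num_results, num_chains] state tensor:

library(tensorflow)
library(tfprobability)

target <- tfd_normal(loc = 0, scale = 1)

# 10 chains, initialized overdispersed relative to the target
initial_state <- tf$random$normal(shape = list(10L)) * 5

chains_states <- mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn = function(x) target %>% tfd_log_prob(x),
  step_size = 0.5,
  num_leapfrog_steps = 2
) %>%
  mcmc_sample_chain(
    num_results = 500,
    current_state = initial_state,
    trace_fn = NULL
  )

rhat <- mcmc_potential_scale_reduction(chains_states, independent_chain_ndims = 1)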
Random Walk Metropolis is a gradient-free Markov chain Monte Carlo
(MCMC) algorithm. The algorithm involves a proposal generating step
proposal_state = current_state + perturb
by a random
perturbation, followed by Metropolis-Hastings accept/reject step. For more
details see Section 2.1 of Roberts and Rosenthal (2004).
mcmc_random_walk_metropolis( target_log_prob_fn, new_state_fn = NULL, seed = NULL, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
new_state_fn |
Function which takes a list of state parts and a
seed; returns a same-type |
seed |
integer to seed the random number generator. |
name |
String name prefixed to Ops created by this function.
Default value: |
The current class implements RWM for normal and uniform proposals. Alternatively,
the user can supply any custom proposal generating function.
The function one_step
can update multiple chains in parallel. It assumes
that all leftmost dimensions of current_state
index independent chain states
(and are therefore updated independently). The output of
target_log_prob_fn(current_state)
should sum log-probabilities across all
event dimensions. Slices along the rightmost dimensions may have different
target distributions; for example, current_state[0, :]
could have a
different target distribution from current_state[1, :]
. These semantics
are governed by target_log_prob_fn(current_state)
. (The number of
independent chains is tf$size(target_log_prob_fn(current_state))
.)
a Monte Carlo sampling kernel
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
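A minimal usage sketch with an illustrative standard-normal target and the default normal proposal:

library(tfprobability)

target <- tfd_normal(loc = 0, scale = 1)

rwm <- mcmc_random_walk_metropolis(
  target_log_prob_fn = function(x) target %>% tfd_log_prob(x)
)

samples <- rwm %>% mcmc_sample_chain(
  num_results = 1000,
  num_burnin_steps = 500,
  current_state = 0,
  trace_fn = NULL
)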
Replica Exchange Monte Carlo
is a Markov chain Monte Carlo (MCMC) algorithm that is also known as Parallel Tempering.
This algorithm runs multiple sampling chains at different temperatures in parallel,
and exchanges their states according to the Metropolis-Hastings criterion.
The K
replicas are parameterized in terms of inverse_temperature
's,
(beta[0], beta[1], ..., beta[K-1])
. If the target distribution has
probability density p(x)
, the kth
replica has density p(x)**beta_k
.
mcmc_replica_exchange_mc( target_log_prob_fn, inverse_temperatures, make_kernel_fn, swap_proposal_fn = tfp$mcmc$replica_exchange_mc$default_swap_proposal_fn(1), state_includes_replicas = FALSE, seed = NULL, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
inverse_temperatures |
|
make_kernel_fn |
Function which takes target_log_prob_fn and seed args and returns a TransitionKernel instance. |
swap_proposal_fn |
function which take a number of replicas, and return combinations of replicas for exchange. |
state_includes_replicas |
Boolean indicating whether the leftmost dimension
of each state sample should index replicas. If |
seed |
integer to seed the random number generator. |
name |
string prefixed to Ops created by this function.
Default value: |
Typically beta[0] = 1.0
, and 1.0 > beta[1] > beta[2] > ... > 0.0
.
beta[0] == 1
==> The first replica samples from the target density, p
.
beta[k] < 1
, for k = 1, ..., K-1
==> Other replicas sample from
"flattened" versions of p
(peak is less high, valley less low). These
distributions are somewhat closer to a uniform on the support of p
.
Samples from adjacent replicas i
, i + 1
are used as proposals for each
other in a Metropolis step. This allows the lower beta
samples, which
explore less dense areas of p
, to occasionally be used to help the
beta == 1
chain explore new regions of the support.
Samples from replica 0 are returned, and the others are discarded.
list of
next_state
(Tensor or Python list of Tensor
s representing the state(s)
of the Markov chain(s) at each result step. Has same shape as input
current_state
.) and
kernel_results
(collections$namedtuple
of internal calculations used to
advance the chain).
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
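A minimal sketch showing how make_kernel_fn builds the per-replica kernels (the standard-normal target, the geometric temperature ladder, and the HMC settings are illustrative assumptions):

library(tensorflow)
library(tfprobability)

target <- tfd_normal(loc = 0, scale = 1)

remc <- mcmc_replica_exchange_mc(
  target_log_prob_fn = function(x) target %>% tfd_log_prob(x),
  # inverse temperatures 1, 0.5, 0.25, 0.125
  inverse_temperatures = tf$cast(0.5 ^ (0:3), tf$float32),
  make_kernel_fn = function(target_log_prob_fn, seed = NULL) {
    # seed is accepted for compatibility with the documented signature
    mcmc_hamiltonian_monte_carlo(
      target_log_prob_fn = target_log_prob_fn,
      step_size = 0.5,
      num_leapfrog_steps = 3
    )
  }
)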
This function uses an MCMC transition operator (e.g., Hamiltonian Monte Carlo)
to sample from a series of distributions that slowly interpolates between
an initial "proposal" distribution:
exp(proposal_log_prob_fn(x) - proposal_log_normalizer)
and the target distribution:
exp(target_log_prob_fn(x) - target_log_normalizer)
,
accumulating importance weights along the way. The product of these
importance weights gives an unbiased estimate of the ratio of the
normalizing constants of the initial distribution and the target
distribution:
E[exp(ais_weights)] = exp(target_log_normalizer - proposal_log_normalizer)
.
mcmc_sample_annealed_importance_chain( num_steps, proposal_log_prob_fn, target_log_prob_fn, current_state, make_kernel_fn, parallel_iterations = 10, name = NULL )
num_steps |
Integer number of Markov chain updates to run. More iterations means more expense, but smoother annealing between q and p, which in turn means exponentially lower variance for the normalizing constant estimator. |
proposal_log_prob_fn |
function that returns the log density of the initial distribution. |
target_log_prob_fn |
function which takes an argument like
|
current_state |
|
make_kernel_fn |
function which returns a |
parallel_iterations |
The number of iterations allowed to run in parallel.
It must be a positive integer. See |
name |
string prefixed to Ops created by this function.
Default value: |
Note: When running in graph mode, proposal_log_prob_fn
and
target_log_prob_fn
are called exactly three times (although this may be
reduced to two times in the future).
list of
next_state
(Tensor
or Python list of Tensor
s representing the
state(s) of the Markov chain(s) at the final iteration. Has same shape as
input current_state
),
ais_weights
(Tensor with the estimated weight(s). Has shape matching
target_log_prob_fn(current_state)
), and
kernel_results
(collections.namedtuple
of internal calculations used to
advance the chain).
For an example of how to use this function, see mcmc_sample_chain()
.
Other mcmc_functions:
mcmc_effective_sample_size()
,
mcmc_potential_scale_reduction()
,
mcmc_sample_chain()
,
mcmc_sample_halton_sequence()
Implements Markov chain Monte Carlo via repeated TransitionKernel
steps. This function samples from a Markov chain at current_state
and whose
stationary distribution is governed by the supplied TransitionKernel
instance (kernel
).
mcmc_sample_chain( kernel = NULL, num_results, current_state, previous_kernel_results = NULL, num_burnin_steps = 0, num_steps_between_results = 0, trace_fn = NULL, return_final_kernel_results = FALSE, parallel_iterations = 10, seed = NULL, name = NULL )
kernel |
An instance of |
num_results |
Integer number of Markov chain draws. |
current_state |
|
previous_kernel_results |
A |
num_burnin_steps |
Integer number of chain steps to take before starting to collect results. Default value: 0 (i.e., no burn-in). |
num_steps_between_results |
Integer number of chain steps between collecting
a result. Only one out of every |
trace_fn |
A function that takes in the current chain state and the previous
kernel results and return a |
return_final_kernel_results |
If |
parallel_iterations |
The number of iterations allowed to run in parallel. It
must be a positive integer. See |
seed |
Optional, a seed for reproducible sampling. |
name |
string prefixed to Ops created by this function. Default value: |
This function can sample from multiple chains, in parallel. (Whether or not
there are multiple chains is dictated by the kernel
.)
The current_state
can be represented as a single Tensor
or a list
of
Tensors
which collectively represent the current state.
Since MCMC states are correlated, it is sometimes desirable to produce
additional intermediate states, and then discard them, ending up with a set of
states with decreased autocorrelation. See Owen (2017). Such "thinning"
is made possible by setting num_steps_between_results > 0
. The chain then
takes num_steps_between_results
extra steps between the steps that make it
into the results. The extra steps are never materialized (in calls to
sess$run
), and thus do not increase memory requirements.
Warning: when setting a seed
in the kernel
, ensure that sample_chain
's
parallel_iterations=1
, otherwise results will not be reproducible.
In addition to returning the chain state, this function supports tracing of
auxiliary variables used by the kernel. The traced values are selected by
specifying trace_fn
. By default, all kernel results are traced but in the
future the default will be changed to no results being traced, so plan
accordingly. See below for some examples of this feature.
list of:
checkpointable_states_and_trace: if return_final_kernel_results
is
TRUE
. The return value is an instance of CheckpointableStatesAndTrace
.
all_states: if return_final_kernel_results
is FALSE
and trace_fn
is
NULL
. The return value is a Tensor
or Python list of Tensor
s
representing the state(s) of the Markov chain(s) at each result step. Has
same shape as input current_state
but with a prepended
num_results
-size dimension.
states_and_trace: if return_final_kernel_results
is FALSE
and
trace_fn
is not NULL
. The return value is an instance of
StatesAndTrace
.
Other mcmc_functions:
mcmc_effective_sample_size()
,
mcmc_potential_scale_reduction()
,
mcmc_sample_annealed_importance_chain()
,
mcmc_sample_halton_sequence()
dims <- 10
true_stddev <- sqrt(seq(1, 3, length.out = dims))
likelihood <- tfd_multivariate_normal_diag(scale_diag = true_stddev)

kernel <- mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn = likelihood$log_prob,
  step_size = 0.5,
  num_leapfrog_steps = 2
)

states <- kernel %>% mcmc_sample_chain(
  num_results = 1000,
  num_burnin_steps = 500,
  current_state = rep(0, dims),
  trace_fn = NULL
)

sample_mean <- tf$reduce_mean(states, axis = 0L)
sample_stddev <- tf$sqrt(
  tf$reduce_mean(tf$math$squared_difference(states, sample_mean), axis = 0L))
Returns a sample from the dim-dimensional Halton sequence.
Warning: The sequence elements take values only between 0 and 1. Care must be taken to appropriately transform the domain of a function if it differs from the unit cube before evaluating integrals using Halton samples. It is also important to remember that quasi-random numbers without randomization are not a replacement for pseudo-random numbers in every context. Quasi-random numbers are completely deterministic and typically have significant negative autocorrelation unless randomization is used.
mcmc_sample_halton_sequence( dim, num_results = NULL, sequence_indices = NULL, dtype = tf$float32, randomized = TRUE, seed = NULL, name = NULL )
dim |
Positive |
num_results |
(Optional) Positive scalar |
sequence_indices |
(Optional) |
dtype |
(Optional) The dtype of the sample. One of: |
randomized |
(Optional) bool indicating whether to produce a randomized
Halton sequence. If TRUE, applies the randomization described in
Owen (2017). Default value: |
seed |
(Optional) integer to seed the random number generator. Only
used if |
name |
(Optional) string describing ops managed by this function. If not supplied the name of this function is used. Default value: "sample_halton_sequence". |
Computes the members of the low discrepancy Halton sequence in dimension
`dim`. The `dim`-dimensional sequence takes values in the unit hypercube in
`dim` dimensions. Currently, only dimensions up to 1000 are supported. The
prime base for the k-th axis is the k-th prime starting from 2. For example,
if `dim` = 3, then the bases will be `[2, 3, 5]` respectively and the first
element of the non-randomized sequence will be `[0.5, 0.333, 0.2]`.

If `randomized` is true, this function produces a scrambled version of the
Halton sequence introduced by Owen (2017).
The number of samples produced is controlled by the `num_results` and
`sequence_indices` parameters. The user must supply either `num_results` or
`sequence_indices`, but not both. The former is the number of samples to
produce starting from the first element. If `sequence_indices` is given
instead, the specified elements of the sequence are generated. For example,
`sequence_indices = tf$range(10)` is equivalent to specifying
`num_results = 10`.
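A minimal sketch of the two ways to request elements (the dimension and counts
are arbitrary; `randomized = FALSE` is used here so the calls are fully
deterministic):

# first ten elements of the 3-dimensional (non-randomized) Halton sequence
sample1 <- mcmc_sample_halton_sequence(dim = 3, num_results = 10, randomized = FALSE)

# the same elements requested by index rather than by count
sample2 <- mcmc_sample_halton_sequence(
  dim = 3,
  sequence_indices = tf$range(10L),
  randomized = FALSE
)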
halton_elements: Elements of the Halton sequence. `Tensor` of supplied `dtype`
and shape `[num_results, dim]` if `num_results` was specified, or shape
`[s, dim]` where `s` is the size of `sequence_indices` if `sequence_indices`
was specified.
For an example of how to use this function, see `mcmc_sample_chain()`.
Other mcmc_functions:
mcmc_effective_sample_size()
,
mcmc_potential_scale_reduction()
,
mcmc_sample_annealed_importance_chain()
,
mcmc_sample_chain()
`step_size` based on `log_accept_prob`.

The simple policy multiplicatively increases or decreases the `step_size` of
the inner kernel based on the value of `log_accept_prob`. It is based on
equation 19 of Andrieu and Thoms (2008). Given enough steps and a small
enough `adaptation_rate`, the median of the distribution of the acceptance
probability will converge to the `target_accept_prob`. A good target
acceptance probability depends on the inner kernel. If this kernel is
`HamiltonianMonteCarlo`, then 0.6-0.9 is a good range to aim for. For
`RandomWalkMetropolis` this should be closer to 0.25. See the individual
kernels' docstrings for guidance.
mcmc_simple_step_size_adaptation( inner_kernel, num_adaptation_steps, target_accept_prob = 0.75, adaptation_rate = 0.01, step_size_setter_fn = NULL, step_size_getter_fn = NULL, log_accept_prob_getter_fn = NULL, validate_args = FALSE, name = NULL )
inner_kernel |
|
num_adaptation_steps |
Scalar |
target_accept_prob |
A floating point |
adaptation_rate |
|
step_size_setter_fn |
A function with the signature
|
step_size_getter_fn |
A function with the signature
|
log_accept_prob_getter_fn |
A function with the signature
|
validate_args |
|
name |
string prefixed to Ops created by this class. Default: "simple_step_size_adaptation". |
In general, adaptation prevents the chain from reaching a stationary
distribution, so obtaining consistent samples requires that
`num_adaptation_steps` be set to a value somewhat smaller than the number of
burnin steps. However, it may sometimes be helpful to set
`num_adaptation_steps` to a larger value during development in order to
inspect the behavior of the chain during adaptation.

The step size is assumed to broadcast with the chain state, potentially having
leading dimensions corresponding to multiple chains. When there are fewer of
those leading dimensions than there are chain dimensions, the corresponding
dimensions in `log_accept_prob` are averaged (in the direct space, rather than
the log space) before being used to adjust the step size. This means that this
kernel can do either cross-chain adaptation or per-chain step size adaptation,
depending on the shape of the step size.
For example, if your problem has a state with shape `[S]`, your chain state
has shape `[C0, C1, S]` (meaning that there are `C0 * C1` total chains) and
`log_accept_prob` has shape `[C0, C1]` (one acceptance probability per chain),
then depending on the shape of the step size, the following will happen:

Step size has shape `[]`, `[S]` or `[1]`: the `log_accept_prob` will be
averaged across its `C0` and `C1` dimensions. This means that you will learn a
shared step size based on the mean acceptance probability across all chains.
This can be useful if you don't have a lot of steps to adapt and want to
average away the noise.

Step size has shape `[C1, 1]` or `[C1, S]`: the `log_accept_prob` will be
averaged across its `C0` dimension. This means that you will learn a shared
step size based on the mean acceptance probability across chains that share
the coordinate across the `C1` dimension. This can be useful when the `C1`
dimension indexes different distributions, while `C0` indexes replicas of a
single distribution, all sampled in parallel.

Step size has shape `[C0, C1, 1]` or `[C0, C1, S]`: no averaging will happen.
This means that each chain will learn its own step size. This can be useful
when all chains are sampling from different distributions. Even when all
chains are for the same distribution, this can help during the initial warmup
period.

Step size has shape `[C0, 1, 1]` or `[C0, 1, S]`: the `log_accept_prob` will
be averaged across its `C1` dimension. This means that you will learn a shared
step size based on the mean acceptance probability across chains that share
the coordinate across the `C0` dimension. This can be useful when the `C0`
dimension indexes different distributions, while `C1` indexes replicas of a
single distribution, all sampled in parallel.
a Monte Carlo sampling kernel
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
target_log_prob_fn <- tfd_normal(loc = 0, scale = 1)$log_prob
num_burnin_steps <- 500
num_results <- 500
num_chains <- 64L
step_size <- tf$fill(list(num_chains), 0.1)

kernel <- mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn = target_log_prob_fn,
  num_leapfrog_steps = 2,
  step_size = step_size
) %>%
  mcmc_simple_step_size_adaptation(num_adaptation_steps = round(num_burnin_steps * 0.8))

res <- kernel %>% mcmc_sample_chain(
  num_results = num_results,
  num_burnin_steps = num_burnin_steps,
  current_state = rep(0, num_chains),
  trace_fn = function(x, pkr) {
    list(
      pkr$inner_results$accepted_results$step_size,
      pkr$inner_results$log_accept_ratio
    )
  }
)

samples <- res$all_states
step_size <- res$trace[[1]]
log_accept_ratio <- res$trace[[2]]
Slice Sampling is a Markov Chain Monte Carlo (MCMC) algorithm based, as stated
by Neal (2003), on the observation that "...one can sample from a
distribution by sampling uniformly from the region under the plot of its
density function. A Markov chain that converges to this uniform distribution
can be constructed by alternately uniform sampling in the vertical direction
with uniform sampling from the horizontal slice
defined by the current
vertical position, or more generally, with some update that leaves the uniform
distribution over this slice invariant". Mathematical details and derivations
can be found in Neal (2003). The one dimensional slice sampler is
extended to n-dimensions through use of a hit-and-run approach: choose a
random direction in n-dimensional space and take a step, as determined by the
one-dimensional slice sampling algorithm, along that direction
(Belisle et al. 1993).
mcmc_slice_sampler( target_log_prob_fn, step_size, max_doublings, seed = NULL, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
step_size |
|
max_doublings |
Scalar positive int32 |
seed |
integer to seed the random number generator. |
name |
string prefixed to Ops created by this function.
Default value: |
The `one_step` function can update multiple chains in parallel. It assumes
that all leftmost dimensions of `current_state` index independent chain states
(and are therefore updated independently). The output of
`target_log_prob_fn(*current_state)` should sum log-probabilities across all
event dimensions. Slices along the rightmost dimensions may have different
target distributions; for example, `current_state[0, :]` could have a
different target distribution from `current_state[1, :]`. These semantics are
governed by `target_log_prob_fn(*current_state)`. (The number of independent
chains is `tf$size(target_log_prob_fn(*current_state))`.)

Note that the sampler only supports states where all components have a common dtype.
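A minimal sketch of drawing samples with this kernel (the standard-normal
target and the tuning values are arbitrary illustrations):

# sample from a standard normal with the slice sampler
kernel <- mcmc_slice_sampler(
  target_log_prob_fn = tfd_normal(loc = 0, scale = 1)$log_prob,
  step_size = 1,
  max_doublings = 5L
)
states <- kernel %>% mcmc_sample_chain(
  num_results = 500,
  num_burnin_steps = 100,
  current_state = 0,
  trace_fn = NULL
)
sample_mean <- tf$reduce_mean(states, axis = 0L)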
list of:

next_state: Tensor or Python list of `Tensor`s representing the state(s) of
the Markov chain(s) at each result step. Has the same shape as `current_state`.

kernel_results: `collections$namedtuple` of internal calculations used to
advance the chain.
Radford M. Neal. Slice Sampling. The Annals of Statistics. 2003, Vol 31, No. 3 , 705-767.
C.J.P. Belisle, H.E. Romeijn, R.L. Smith. Hit-and-run algorithms for generating multivariate distributions. Math. Oper. Res., 18(1993), 225-266.
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
The transformed transition kernel enables fitting
a bijector which serves to decorrelate the Markov chain Monte Carlo (MCMC)
event dimensions thus making the chain mix faster. This is
particularly useful when the geometry of the target distribution is
unfavorable. In such cases it may take many evaluations of the
target_log_prob_fn
for the chain to mix between faraway states.
mcmc_transformed_transition_kernel(inner_kernel, bijector, name = NULL)
inner_kernel |
|
bijector |
bijector or list of bijectors. These bijectors use |
name |
string prefixed to Ops created by this function.
Default value: |
The idea of training an affine function to decorrelate chain event dims was presented in Parno and Marzouk (2014). Used in conjunction with the Hamiltonian Monte Carlo transition kernel, the Parno and Marzouk (2014) idea is an instance of Riemannian manifold HMC (Girolami and Calderhead, 2011).
The transformed transition kernel enables arbitrary bijective transformations
of arbitrary transition kernels, e.g., one could use bijectors such as
`tfb_affine` or `tfb_real_nvp` with transition kernels such as
`mcmc_hamiltonian_monte_carlo` or `mcmc_random_walk_metropolis`.
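A minimal sketch (the Gamma target and the `tfb_exp()` bijector are
illustrative choices) of running HMC in an unconstrained space for a
positive-valued target:

# sample a positive-valued target by running HMC in log-space
kernel <- mcmc_hamiltonian_monte_carlo(
  target_log_prob_fn = tfd_gamma(concentration = 2, rate = 3)$log_prob,
  step_size = 0.1,
  num_leapfrog_steps = 3
) %>%
  mcmc_transformed_transition_kernel(bijector = tfb_exp())

states <- kernel %>% mcmc_sample_chain(
  num_results = 500,
  num_burnin_steps = 100,
  current_state = 1,
  trace_fn = NULL
)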
a Monte Carlo sampling kernel
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
Warning: this kernel will not result in a chain which converges to the
target_log_prob
. To get a convergent MCMC, use mcmc_hamiltonian_monte_carlo(...)
or mcmc_metropolis_hastings(mcmc_uncalibrated_hamiltonian_monte_carlo(...))
.
For more details on UncalibratedHamiltonianMonteCarlo
, see HamiltonianMonteCarlo
.
mcmc_uncalibrated_hamiltonian_monte_carlo( target_log_prob_fn, step_size, num_leapfrog_steps, state_gradients_are_stopped = FALSE, seed = NULL, store_parameters_in_results = FALSE, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
step_size |
|
num_leapfrog_steps |
Integer number of steps to run the leapfrog integrator
for. Total progress per HMC step is roughly proportional to
|
state_gradients_are_stopped |
|
seed |
integer to seed the random number generator. |
store_parameters_in_results |
If |
name |
string prefixed to Ops created by this function.
Default value: |
a Monte Carlo sampling kernel
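As the warning above suggests, this kernel is typically wrapped in
`mcmc_metropolis_hastings()` to obtain a calibrated sampler; a minimal sketch
(the target and tuning values are arbitrary):

# wrap the uncalibrated kernel in a Metropolis-Hastings correction
kernel <- mcmc_uncalibrated_hamiltonian_monte_carlo(
  target_log_prob_fn = tfd_normal(loc = 0, scale = 1)$log_prob,
  step_size = 0.5,
  num_leapfrog_steps = 2
) %>%
  mcmc_metropolis_hastings()

states <- kernel %>% mcmc_sample_chain(
  num_results = 500,
  num_burnin_steps = 100,
  current_state = 0,
  trace_fn = NULL
)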
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_langevin()
,
mcmc_uncalibrated_random_walk()
The class generates a Langevin proposal using the `_euler_method` function and
also computes helper `UncalibratedLangevinKernelResults` for the next
iteration.

Warning: this kernel will not result in a chain which converges to the
`target_log_prob`. To get a convergent MCMC, use
`mcmc_metropolis_adjusted_langevin_algorithm(...)` or
`mcmc_metropolis_hastings(mcmc_uncalibrated_langevin(...))`.
mcmc_uncalibrated_langevin( target_log_prob_fn, step_size, volatility_fn = NULL, parallel_iterations = 10, compute_acceptance = TRUE, seed = NULL, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
step_size |
|
volatility_fn |
function which takes an argument like
|
parallel_iterations |
the number of coordinates for which the gradients of
the volatility matrix |
compute_acceptance |
logical indicating whether to compute the
Metropolis log-acceptance ratio used to construct |
seed |
integer to seed the random number generator. |
name |
String prefixed to Ops created by this function.
Default value: |
list of:

next_state: Tensor or Python list of `Tensor`s representing the state(s) of
the Markov chain(s) at each result step. Has the same shape as `current_state`.

kernel_results: `collections$namedtuple` of internal calculations used to
advance the chain.
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_random_walk()
Warning: this kernel will not result in a chain which converges to the
target_log_prob
. To get a convergent MCMC, use
mcmc_random_walk_metropolis(...)
or
mcmc_metropolis_hastings(mcmc_uncalibrated_random_walk(...))
.
mcmc_uncalibrated_random_walk( target_log_prob_fn, new_state_fn = NULL, seed = NULL, name = NULL )
target_log_prob_fn |
Function which takes an argument like
|
new_state_fn |
Function which takes a list of state parts and a
seed; returns a same-type |
seed |
integer to seed the random number generator. |
name |
String name prefixed to Ops created by this function.
Default value: |
a Monte Carlo sampling kernel
Other mcmc_kernels:
mcmc_dual_averaging_step_size_adaptation()
,
mcmc_hamiltonian_monte_carlo()
,
mcmc_metropolis_adjusted_langevin_algorithm()
,
mcmc_metropolis_hastings()
,
mcmc_no_u_turn_sampler()
,
mcmc_random_walk_metropolis()
,
mcmc_replica_exchange_mc()
,
mcmc_simple_step_size_adaptation()
,
mcmc_slice_sampler()
,
mcmc_transformed_transition_kernel()
,
mcmc_uncalibrated_hamiltonian_monte_carlo()
,
mcmc_uncalibrated_langevin()
Number of `params` needed to create a CategoricalMixtureOfOneHotCategorical distribution.
params_size_categorical_mixture_of_one_hot_categorical( event_size, num_components )
event_size |
event size of this distribution |
num_components |
number of components in the mixture |
a scalar
Number of `params` needed to create an IndependentBernoulli distribution.
params_size_independent_bernoulli(event_size)
event_size |
event size of this distribution |
a scalar
Number of `params` needed to create an IndependentLogistic distribution.
params_size_independent_logistic(event_size)
event_size |
event size of this distribution |
a scalar
Number of `params` needed to create an IndependentNormal distribution.
params_size_independent_normal(event_size)
event_size |
event size of this distribution |
a scalar
Number of `params` needed to create an IndependentPoisson distribution.
params_size_independent_poisson(event_size)
event_size |
event size of this distribution |
a scalar
Number of `params` needed to create a MixtureLogistic distribution.
params_size_mixture_logistic(num_components, event_shape)
num_components |
Number of component distributions in the mixture distribution. |
event_shape |
Number of parameters needed to create a single component distribution. |
a scalar
Number of `params` needed to create a MixtureNormal distribution.
params_size_mixture_normal(num_components, event_shape)
num_components |
Number of component distributions in the mixture distribution. |
event_shape |
Number of parameters needed to create a single component distribution. |
a scalar
Number of `params` needed to create a MixtureSameFamily distribution.
params_size_mixture_same_family(num_components, component_params_size)
num_components |
Number of component distributions in the mixture distribution. |
component_params_size |
Number of parameters needed to create a single component distribution. |
a scalar
Number of `params` needed to create a MultivariateNormalTriL distribution.
params_size_multivariate_normal_tri_l(event_size)
event_size |
event size of this distribution |
a scalar
Number of `params` needed to create a OneHotCategorical distribution.
params_size_one_hot_categorical(event_size)
event_size |
event size of this distribution |
a scalar
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfd_linear_gaussian_state_space_model
for details.
sts_additive_state_space_model( component_ssms, constant_offset = 0, observation_noise_scale = NULL, initial_state_prior = NULL, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL )
component_ssms |
|
constant_offset |
scalar |
observation_noise_scale |
Optional scalar |
initial_state_prior |
instance of |
initial_step |
Optional scalar |
validate_args |
|
allow_nan_stats |
|
name |
string prefixed to ops created by this class. Default value: "AdditiveStateSpaceModel". |
The sts_additive_state_space_model
represents a sum of component state space
models. Each of the N
components describes a random process
generating a distribution on observed time series x1[t], x2[t], ..., xN[t]
.
The additive model represents the sum of these
processes, y[t] = x1[t] + x2[t] + ... + xN[t] + eps[t]
, where
eps[t] ~ N(0, observation_noise_scale)
is an observation noise term.
Mathematical Details
The additive model concatenates the latent states of its component models. The generative process runs each component's dynamics in its own subspace of latent space, and then observes the sum of the observation models from the components.
Formally, the transition model is linear Gaussian:
p(z[t+1] | z[t]) ~ Normal(loc = transition_matrix.matmul(z[t]), cov = transition_cov)
where each z[t]
is a latent state vector concatenating the component
state vectors, z[t] = [z1[t], z2[t], ..., zN[t]]
, so it has size
latent_size = sum([c.latent_size for c in components])
.
The transition matrix is the block-diagonal composition of transition matrices from the component processes:
transition_matrix =
  [[ c0.transition_matrix, 0.,                   ..., 0.                   ],
   [ 0.,                   c1.transition_matrix, ..., 0.                   ],
   [ ...,                  ...,                  ..., ...                  ],
   [ 0.,                   0.,                   ..., cN.transition_matrix ]]
and the noise covariance is similarly the block-diagonal composition of component noise covariances:
transition_cov =
  [[ c0.transition_cov, 0.,                ..., 0.                ],
   [ 0.,                c1.transition_cov, ..., 0.                ],
   [ ...,               ...,               ..., ...               ],
   [ 0.,                0.,                ..., cN.transition_cov ]]
The observation model is also linear Gaussian,
p(y[t] | z[t]) ~ Normal(loc = observation_matrix.matmul(z[t]), stddev = observation_noise_scale)
This implementation assumes scalar observations, so observation_matrix
has shape [1, latent_size]
.
The additive observation matrix simply concatenates the observation matrices from each component:
observation_matrix = concat([c0.obs_matrix, c1.obs_matrix, ..., cN.obs_matrix], axis=-1)
The effect is that each component observation matrix acts on the dimensions of latent state corresponding to that component, and the overall expected observation is the sum of the expected observations from each component.
If observation_noise_scale
is not explicitly specified, it is also computed
by summing the noise variances of the component processes:
observation_noise_scale = sqrt(sum([c.observation_noise_scale**2 for c in components]))
an instance of `LinearGaussianStateSpaceModel`.
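A minimal sketch of composing two component state space models into an
additive model (all numeric values are illustrative):

# two local-level components, each with latent dimension 1
ssm1 <- sts_local_level_state_space_model(
  num_timesteps = 30,
  level_scale = 0.5,
  initial_state_prior = tfd_multivariate_normal_diag(scale_diag = c(1))
)
ssm2 <- sts_local_level_state_space_model(
  num_timesteps = 30,
  level_scale = 0.1,
  initial_state_prior = tfd_multivariate_normal_diag(scale_diag = c(2))
)

# the additive model runs both processes and observes their sum
additive_ssm <- sts_additive_state_space_model(
  component_ssms = list(ssm1, ssm2),
  observation_noise_scale = 0.3
)
y <- additive_ssm %>% tfd_sample()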
Other sts:
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
An autoregressive (AR) model posits a latent level
whose value at each step
is a noisy linear combination of previous steps:
level[t+1] = (sum(coefficients * levels[t:t-order:-1]) + Normal(0., level_scale))
sts_autoregressive( observed_time_series = NULL, order, coefficients_prior = NULL, level_scale_prior = NULL, initial_state_prior = NULL, coefficient_constraining_bijector = NULL, name = NULL )
observed_time_series |
optional |
order |
scalar positive |
coefficients_prior |
optional |
level_scale_prior |
optional |
initial_state_prior |
optional |
coefficient_constraining_bijector |
optional |
name |
the name of this model component. Default value: 'Autoregressive'. |
The latent state is levels[t:t-order:-1]
. We observe a noisy realization of
the current level: f[t] = level[t] + Normal(0., observation_noise_scale)
at
each timestep.
If coefficients=[1.]
, the AR process is a simple random walk, equivalent to
a LocalLevel
model. However, a random walk's variance increases with time,
while many AR processes (in particular, any first-order process with
abs(coefficient) < 1
) are stationary, i.e., they maintain a constant
variance over time. This makes AR processes useful models of uncertainty.
an instance of StructuralTimeSeries
.
For usage examples see `sts_fit_with_hmc()`, `sts_forecast()`, `sts_decompose_by_component()`.
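A minimal construction sketch (the simulated series and `order = 2` are
illustrative choices):

# a short simulated series, modeled with a second-order AR component
observed_time_series <- rnorm(50)
ar_component <- observed_time_series %>% sts_autoregressive(order = 2)
model <- observed_time_series %>% sts_sum(components = list(ar_component))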
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfd_linear_gaussian_state_space_model
for
details.
sts_autoregressive_state_space_model( num_timesteps, coefficients, level_scale, initial_state_prior, observation_noise_scale = 0, initial_step = 0, validate_args = FALSE, name = NULL )
num_timesteps |
Scalar |
coefficients |
|
level_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
initial_step |
Optional scalar |
validate_args |
|
name |
name prefixed to ops created by this class. Default value: "AutoregressiveStateSpaceModel". |
In an autoregressive process, the expected level at each timestep is a linear function of previous levels, with added Gaussian noise:
level[t+1] = (sum(coefficients * levels[t:t-order:-1]) + Normal(0., level_scale))
The process is characterized by a vector coefficients
whose size determines
the order of the process (how many previous values it looks at), and by
level_scale
, the standard deviation of the noise added at each step.
This is formulated as a state space model by letting the latent state encode
the most recent values; see 'Mathematical Details' below.
The parameters level_scale
and observation_noise_scale
are each (a batch
of) scalars, and coefficients
is a (batch) vector of size list(order)
. The
batch shape of this Distribution
is the broadcast batch
shape of these parameters and of the initial_state_prior
.
Mathematical Details
The autoregressive model implements a
tfd_linear_gaussian_state_space_model
with latent_size = order
and observation_size = 1
. The latent state vector encodes the recent history
of the process, with the current value in the topmost dimension. At each
timestep, the transition sums the previous values to produce the new expected
value, shifts all other values down by a dimension, and adds noise to the
current value. This is formally encoded by the transition model:
transition_matrix = [ coefs[0], coefs[1], ..., coefs[order]
                      1.,       0.,       ..., 0.
                      0.,       1.,       ..., 0.
                      ...
                      0.,       0.,       ..., 1.,          0. ]
transition_noise ~ N(loc=0., scale=diag([level_scale, 0., 0., ..., 0.]))
The observation model simply extracts the current (topmost) value, and optionally adds independent noise at each step:
observation_matrix = [[1., 0., ..., 0.]]
observation_noise ~ N(loc=0, scale=observation_noise_scale)
Models with observation_noise_scale = 0
are AR processes in the formal
sense. Setting observation_noise_scale
to a nonzero value corresponds to a
latent AR process observed under an iid noise model.
an instance of `LinearGaussianStateSpaceModel`.
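A minimal sketch of an order-1 AR state space model (numeric values are
illustrative; the initial state prior's event size must equal the order):

ar1_ssm <- sts_autoregressive_state_space_model(
  num_timesteps = 50,
  coefficients = c(0.8),
  level_scale = 0.1,
  initial_state_prior = tfd_multivariate_normal_diag(scale_diag = c(1))
)
y <- ar1_ssm %>% tfd_sample()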
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
The surrogate posterior consists of independent Normal distributions for
each parameter with trainable loc
and scale
, transformed using the
parameter's bijector
to the appropriate support space for that parameter.
sts_build_factored_surrogate_posterior( model, batch_shape = list(), seed = NULL, name = NULL )
model |
An instance of |
batch_shape |
Batch shape ( |
seed |
integer to seed the random number generator. |
name |
string prefixed to ops created by this function.
Default value: |
variational_posterior: `tfd_joint_distribution_named` defining a trainable
surrogate posterior over model parameters. Samples from this distribution are
named lists with `character` parameter names as keys.
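A minimal sketch of building and sampling the surrogate posterior for a toy
model (the series and components are illustrative):

observed_time_series <- array(rnorm(2 * 1 * 12), dim = c(2, 1, 12))
day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7)
local_linear_trend <- observed_time_series %>% sts_local_linear_trend()
model <- observed_time_series %>%
  sts_sum(components = list(day_of_week, local_linear_trend))

# trainable surrogate posterior over the model's parameters
surrogate_posterior <- model %>% sts_build_factored_surrogate_posterior()
posterior_samples <- surrogate_posterior %>% tfd_sample(10)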
Other sts-functions:
sts_build_factored_variational_loss()
,
sts_decompose_by_component()
,
sts_decompose_forecast_by_component()
,
sts_fit_with_hmc()
,
sts_forecast()
,
sts_one_step_predictive()
,
sts_sample_uniform_initial_state()
Variational inference searches for the distribution within some family of
approximate posteriors that minimizes a divergence between the approximate
posterior q(z)
and true posterior p(z|observed_time_series)
. By converting
inference to optimization, it's generally much faster than sampling-based
inference algorithms such as HMC. The tradeoff is that the approximating
family rarely contains the true posterior, so it may miss important aspects of
posterior structure (in particular, dependence between variables) and should
not be blindly trusted. Results may vary; it's generally wise to compare to
HMC to evaluate whether inference quality is sufficient for your task at hand.
sts_build_factored_variational_loss( observed_time_series, model, init_batch_shape = list(), seed = NULL, name = NULL )
observed_time_series |
|
model |
An instance of |
init_batch_shape |
Batch shape ( |
seed |
integer to seed the random number generator. |
name |
name prefixed to ops created by this function. Default value: |
This method constructs a loss function for variational inference using the
Kullback-Leibler divergence `KL[q(z) || p(z|observed_time_series)]`, with an
approximating family given by independent Normal distributions transformed to
the appropriate parameter space for each parameter. Minimizing this loss (the
negative ELBO) maximizes a lower bound on the log model evidence
`log p(observed_time_series)`. This is equivalent to the 'mean-field' method
implemented in Kucukelbir et al. (2017) and is a standard approach.
The resulting posterior approximations are unimodal; they will tend to underestimate posterior
uncertainty when the true posterior contains multiple modes
(the KL[q||p]
divergence encourages choosing a single mode) or dependence between variables.
list of:
variational_loss: float
Tensor
of shape
tf$concat([init_batch_shape, model$batch_shape])
, encoding a stochastic
estimate of an upper bound on the negative model evidence -log p(y)
.
Minimizing this loss performs variational inference; the gap between the
variational bound and the true (generally unknown) model evidence
corresponds to the divergence KL[q||p]
between the approximate and true
posterior.
variational_distributions: a named list giving
the approximate posterior for each model parameter. The keys are
character
parameter names in order, corresponding to
[param.name for param in model.parameters]
. The values are
tfd$Distribution
instances with batch shape
tf$concat([init_batch_shape, model$batch_shape])
; these will typically be
of the form tfd$TransformedDistribution(tfd.Normal(...), bijector=param.bijector)
.
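A minimal sketch of building the loss for a toy model (the series and
component are illustrative; the returned loss would then be minimized with a
TensorFlow optimizer):

observed_time_series <- rnorm(12)
day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7)
model <- observed_time_series %>% sts_sum(components = list(day_of_week))

# returns the negative-ELBO loss together with the per-parameter
# approximate posteriors; minimizing the loss performs variational inference
loss_and_dists <- observed_time_series %>%
  sts_build_factored_variational_loss(model = model)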
Other sts-functions:
sts_build_factored_surrogate_posterior()
,
sts_decompose_by_component()
,
sts_decompose_forecast_by_component()
,
sts_fit_with_hmc()
,
sts_forecast()
,
sts_one_step_predictive()
,
sts_sample_uniform_initial_state()
Seasonal state space model with effects constrained to sum to zero.
sts_constrained_seasonal_state_space_model( num_timesteps, num_seasons, drift_scale, initial_state_prior, observation_noise_scale = 1e-04, num_steps_per_season = 1, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL )
num_timesteps |
Scalar |
num_seasons |
Scalar |
drift_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
num_steps_per_season |
|
initial_step |
Optional scalar |
validate_args |
|
allow_nan_stats |
|
name |
string prefixed to ops created by this class. Default value: "SeasonalStateSpaceModel". |
an instance of LinearGaussianStateSpaceModel
.
sts_seasonal_state_space_model()
.
Mathematical details
The constrained model implements a reparameterization of the
naive SeasonalStateSpaceModel
. Instead of directly representing the
seasonal effects in the latent space, the latent space of the constrained
model represents the difference between each effect and the mean effect.
The following discussion assumes familiarity with the mathematical details
of SeasonalStateSpaceModel
.
Reparameterization and constraints: let the seasonal effects at a given
timestep be E = [e_1, ..., e_N]
. The difference between each effect e_i
and the mean effect is z_i = e_i - sum_i(e_i)/N
. By itself, this
transformation is not invertible because recovering the absolute effects
requires that we know the mean as well. To fix this, we'll define
z_N = sum_i(e_i)/N
as the mean effect. It's easy to see that this is
invertible: given the mean effect and the differences of the first N - 1
effects from the mean, it's easy to solve for all N
effects. Formally,
we've defined the invertible linear reparameterization Z = R E
, where
R = [1 - 1/N, -1/N,    ..., -1/N
     -1/N,    1 - 1/N, ..., -1/N
     ...
     1/N,     1/N,     ..., 1/N ]
represents the change of basis from 'effect coordinates' E to
'residual coordinates' Z. The Z
s form the latent space of the
ConstrainedSeasonalStateSpaceModel
.
To constrain the mean effect z_N
to zero, we fix the prior to zero,
p(z_N) ~ N(0., 0)
, and after the transition at each timestep we project
z_N
back to zero. Note that this projection is linear: to set the Nth
dimension to zero, we simply multiply by the identity matrix with a missing
element in the bottom right, i.e., Z_constrained = P Z
,
where P = eye(N) - scatter((N-1, N-1), 1)
.
Model: concretely, suppose a naive seasonal effect model has initial state
prior N(m, S)
, transition matrix F
and noise covariance
Q
, and observation matrix H
. Then the corresponding constrained seasonal
effect model has initial state prior N(P R m, P R S R' P')
,
transition matrix P R F R^-1
and noise covariance F R Q R' F'
, and
observation matrix H R^-1
, where the change-of-basis matrix R
and
constraint projection matrix P
are as defined above. This follows
directly from applying the reparameterization Z = R E
, and then enforcing
the zero-sum constraint on the prior and transition noise covariances.
In practice, because the sum of effects z_N
is constrained to be zero, it
will never contribute a term to any linear operation on the latent space,
so we can drop that dimension from the model entirely.
ConstrainedSeasonalStateSpaceModel
does this, so that it implements the
N - 1
dimension latent space z_1, ..., z_[N-1]
.
Note that since we constrained the mean effect to be zero, the latent
`z_i`'s now recover their interpretation as the actual effects,
`z_i = e_i` for `i = 1, ..., N - 1`, even though they were originally defined
as residuals. The Nth effect is represented only implicitly, as the nonzero
mean of the first `N - 1` effects. Although the computational representation
is not symmetric across all `N` effects, we derived the
`ConstrainedSeasonalStateSpaceModel` by starting with a symmetric
representation and imposing only a symmetric constraint (the zero-sum
constraint), so the probability model remains symmetric over all `N`
seasonal effects.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
This method decomposes a time series according to the posterior representation of a structural time series model. In particular, it:
Computes the posterior marginal mean and covariances over the additive model's latent space.
Decomposes the latent posterior into the marginal blocks for each model component.
Maps the per-component latent posteriors back through each component's observation model, to generate the time series modeled by that component.
sts_decompose_by_component(observed_time_series, model, parameter_samples)
observed_time_series |
|
model |
An instance of |
parameter_samples |
|
component_dists A named list mapping
component StructuralTimeSeries instances (elements of model$components
)
to Distribution
instances representing the posterior marginal
distributions on the process modeled by each component. Each distribution
has batch shape matching that of posterior_means
/posterior_covs
, and
event shape of list(num_timesteps)
.
Other sts-functions:
sts_build_factored_surrogate_posterior()
,
sts_build_factored_variational_loss()
,
sts_decompose_forecast_by_component()
,
sts_fit_with_hmc()
,
sts_forecast()
,
sts_one_step_predictive()
,
sts_sample_uniform_initial_state()
observed_time_series <- array(rnorm(2 * 1 * 12), dim = c(2, 1, 12))

day_of_week <- observed_time_series %>%
  sts_seasonal(num_seasons = 7, name = "seasonal")
local_linear_trend <- observed_time_series %>%
  sts_local_linear_trend(name = "local_linear")
model <- observed_time_series %>%
  sts_sum(components = list(day_of_week, local_linear_trend))

states_and_results <- observed_time_series %>%
  sts_fit_with_hmc(
    model,
    num_results = 10,
    num_warmup_steps = 5,
    num_variational_steps = 15
  )
samples <- states_and_results[[1]]

component_dists <- observed_time_series %>%
  sts_decompose_by_component(model = model, parameter_samples = samples)
Decompose a forecast distribution into contributions from each component.
sts_decompose_forecast_by_component(model, forecast_dist, parameter_samples)
model |
An instance of |
forecast_dist |
A |
parameter_samples |
|
component_dists A named list mapping
component StructuralTimeSeries instances (elements of model$components
)
to Distribution
instances representing the marginal forecast for each component.
Each distribution has batch shape matching forecast_dist
(specifically,
the event shape is [num_steps_forecast]
).
Other sts-functions:
sts_build_factored_surrogate_posterior()
,
sts_build_factored_variational_loss()
,
sts_decompose_by_component()
,
sts_fit_with_hmc()
,
sts_forecast()
,
sts_one_step_predictive()
,
sts_sample_uniform_initial_state()
The dynamic linear regression model is a special case of a linear Gaussian SSM
and a generalization of typical (static) linear regression. The model
represents regression weights
with a latent state which evolves via a
Gaussian random walk:
sts_dynamic_linear_regression( observed_time_series = NULL, design_matrix, drift_scale_prior = NULL, initial_weights_prior = NULL, name = NULL )
observed_time_series |
optional |
design_matrix |
float |
drift_scale_prior |
instance of |
initial_weights_prior |
instance of |
name |
the name of this component. Default value: 'DynamicLinearRegression'. |
weights[t] ~ Normal(weights[t-1], drift_scale)
The latent state has dimension num_features
, while the parameters
drift_scale
and observation_noise_scale
are each (a batch of) scalars. The
batch shape of this distribution is the broadcast batch shape of these
parameters, the initial_state_prior
, and the design_matrix
.
num_features
is determined from the last dimension of design_matrix
(equivalent to the
number of columns in the design matrix in linear regression).
an instance of StructuralTimeSeries
.
For usage examples see `sts_fit_with_hmc()`, `sts_forecast()`, `sts_decompose_by_component()`.
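A minimal construction sketch (the design matrix and series are simulated
illustrations):

# regression onto two covariates whose weights drift over time
num_timesteps <- 30
design_matrix <- matrix(rnorm(num_timesteps * 2), ncol = 2)
observed_time_series <- rnorm(num_timesteps)

dlr_component <- observed_time_series %>%
  sts_dynamic_linear_regression(design_matrix = design_matrix)
model <- observed_time_series %>% sts_sum(components = list(dlr_component))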
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfd_linear_gaussian_state_space_model
for details.
sts_dynamic_linear_regression_state_space_model( num_timesteps, design_matrix, drift_scale, initial_state_prior, observation_noise_scale = 0, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL )
num_timesteps |
Scalar |
design_matrix |
float |
drift_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
initial_step |
scalar |
validate_args |
|
allow_nan_stats |
|
name |
name prefixed to ops created by this class. Default value: 'DynamicLinearRegressionStateSpaceModel'. |
The dynamic linear regression model is a special case of a linear Gaussian SSM
and a generalization of typical (static) linear regression. The model
represents regression weights
with a latent state which evolves via a
Gaussian random walk:
weights[t] ~ Normal(weights[t-1], drift_scale)
The latent state (the weights) has dimension num_features
, while the
parameters drift_scale
and observation_noise_scale
are each (a batch of)
scalars. The batch shape of this Distribution
is the broadcast batch shape
of these parameters, the initial_state_prior
, and the
design_matrix
. num_features
is determined from the last dimension of
design_matrix
(equivalent to the number of columns in the design matrix in
linear regression).
Mathematical Details
The dynamic linear regression model implements a
tfd_linear_gaussian_state_space_model
with latent_size = num_features
and
observation_size = 1
following the transition model:
transition_matrix = eye(num_features)
transition_noise ~ Normal(0, diag([drift_scale]))
which implements the evolution of weights
described above. The observation
model is:
observation_matrix[t] = design_matrix[t]
observation_noise ~ Normal(0, observation_noise_scale)
an instance of LinearGaussianStateSpaceModel
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
Markov chain Monte Carlo (MCMC) methods are considered the gold standard of Bayesian inference; under suitable conditions and in the limit of infinitely many draws they generate samples from the true posterior distribution. HMC (Neal, 2011) uses gradients of the model's log-density function to propose samples, allowing it to exploit posterior geometry. However, it is computationally more expensive than variational inference and relatively sensitive to tuning.
sts_fit_with_hmc( observed_time_series, model, num_results = 100, num_warmup_steps = 50, num_leapfrog_steps = 15, initial_state = NULL, initial_step_size = NULL, chain_batch_shape = list(), num_variational_steps = 150, variational_optimizer = NULL, variational_sample_size = 5, seed = NULL, name = NULL )
observed_time_series |
|
model |
An instance of |
num_results |
Integer number of Markov chain draws. Default value: |
num_warmup_steps |
Integer number of steps to take before starting to
collect results. The warmup steps are also used to adapt the step size
towards a target acceptance rate of 0.75. Default value: |
num_leapfrog_steps |
Integer number of steps to run the leapfrog integrator
for. Total progress per HMC step is roughly proportional to |
initial_state |
Optional Python |
initial_step_size |
|
chain_batch_shape |
Batch shape ( |
num_variational_steps |
|
variational_optimizer |
Optional |
variational_sample_size |
integer number of Monte Carlo samples to use
in estimating the variational divergence. Larger values may stabilize
the optimization, but at higher cost per step in time and memory.
Default value: |
seed |
integer to seed the random number generator. |
name |
name prefixed to ops created by this function. Default value: |
This method attempts to provide a sensible default approach for fitting StructuralTimeSeries models using HMC. It first runs variational inference as a fast posterior approximation, and initializes the HMC sampler from the variational posterior, using the posterior standard deviations to set per-variable step sizes (equivalently, a diagonal mass matrix). During the warmup phase, it adapts the step size to target an acceptance rate of 0.75, which is thought to be in the desirable range for optimal mixing (Betancourt et al., 2014).
list of:
samples: list
of Tensors
representing posterior samples of model
parameters, with shapes [concat([[num_results], chain_batch_shape, param.prior.batch_shape, param.prior.event_shape]) for param in model.parameters]
.
kernel_results: A (possibly nested) list
of Tensor
s representing
internal calculations made within the HMC sampler.
Other sts-functions:
sts_build_factored_surrogate_posterior()
,
sts_build_factored_variational_loss()
,
sts_decompose_by_component()
,
sts_decompose_forecast_by_component()
,
sts_forecast()
,
sts_one_step_predictive()
,
sts_sample_uniform_initial_state()
observed_time_series <-
  (rep(c(3.5, 4.1, 4.5, 3.9, 2.4, 2.1, 1.2), 5) +
     rep(c(1.1, 1.5, 2.4, 3.1, 4.0), each = 7)) %>%
  tensorflow::tf$convert_to_tensor(dtype = tensorflow::tf$float64)

day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7)
local_linear_trend <- observed_time_series %>% sts_local_linear_trend()
model <- observed_time_series %>%
  sts_sum(components = list(day_of_week, local_linear_trend))

states_and_results <- observed_time_series %>%
  sts_fit_with_hmc(
    model,
    num_results = 10,
    num_warmup_steps = 5,
    num_variational_steps = 15)
Given samples from the posterior over parameters, return the predictive distribution over future observations for num_steps_forecast timesteps.
sts_forecast( observed_time_series, model, parameter_samples, num_steps_forecast )
observed_time_series |
|
model |
An instance of |
parameter_samples |
|
num_steps_forecast |
scalar |
forecast_dist a tfd_mixture_same_family
instance with event shape
list(num_steps_forecast, 1)
and batch shape tf$concat(list(sample_shape, model$batch_shape))
, with
num_posterior_draws
mixture components.
Other sts-functions:
sts_build_factored_surrogate_posterior()
,
sts_build_factored_variational_loss()
,
sts_decompose_by_component()
,
sts_decompose_forecast_by_component()
,
sts_fit_with_hmc()
,
sts_one_step_predictive()
,
sts_sample_uniform_initial_state()
observed_time_series <-
  (rep(c(3.5, 4.1, 4.5, 3.9, 2.4, 2.1, 1.2), 5) +
     rep(c(1.1, 1.5, 2.4, 3.1, 4.0), each = 7)) %>%
  tensorflow::tf$convert_to_tensor(dtype = tensorflow::tf$float64)

day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7)
local_linear_trend <- observed_time_series %>% sts_local_linear_trend()
model <- observed_time_series %>%
  sts_sum(components = list(day_of_week, local_linear_trend))

states_and_results <- observed_time_series %>%
  sts_fit_with_hmc(
    model,
    num_results = 10,
    num_warmup_steps = 5,
    num_variational_steps = 15)
samples <- states_and_results[[1]]

preds <- observed_time_series %>%
  sts_forecast(model, parameter_samples = samples, num_steps_forecast = 50)
predictions <- preds %>% tfd_sample(10)
This model defines a time series given by a linear combination of covariate time series provided in a design matrix:
observed_time_series <- tf$matmul(design_matrix, weights)
sts_linear_regression(design_matrix, weights_prior = NULL, name = NULL)
design_matrix |
float |
weights_prior |
|
name |
the name of this model component. Default value: 'LinearRegression'. |
The design matrix has shape list(num_timesteps, num_features)
.
The weights are treated as an unknown random variable of size list(num_features)
(both components also support batch shape), and are integrated over using the same
approximate inference tools as other model parameters, i.e., generally HMC or
variational inference.
This component does not itself include observation noise; it defines a
deterministic distribution with mass at the point
`tf$matmul(design_matrix, weights)`. In practice, it should be combined with
observation noise from another component such as `sts_sum`, as demonstrated
below.
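A minimal sketch of that combination (the simulated covariates and series are
illustrative):

# known covariates enter through the design matrix; sts_sum adds
# the observation noise that the regression component itself omits
num_timesteps <- 40
design_matrix <- matrix(rnorm(num_timesteps * 3), ncol = 3)
observed_time_series <- rnorm(num_timesteps)

regression <- sts_linear_regression(design_matrix = design_matrix)
model <- observed_time_series %>% sts_sum(components = list(regression))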
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
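A minimal sketch of sts_linear_regression (the design matrix here is hypothetical random data, sized to match the 35-point series used in the examples above). Because this component carries no observation noise of its own, it is combined with other components via sts_sum:
# Hypothetical design matrix: two covariates observed at 35 timesteps.
design_matrix <- matrix(rnorm(35 * 2), ncol = 2) %>%
  tensorflow::tf$convert_to_tensor(dtype = tensorflow::tf$float64)
regression <- sts_linear_regression(design_matrix = design_matrix)
level <- observed_time_series %>% sts_local_level()
model <- observed_time_series %>%
  sts_sum(components = list(regression, level))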
The local level model posits a level
evolving via a Gaussian random walk:
level[t] = level[t-1] + Normal(0., level_scale)
sts_local_level(observed_time_series = NULL, level_scale_prior = NULL, initial_level_prior = NULL, name = NULL)
observed_time_series |
optional |
level_scale_prior |
optional |
initial_level_prior |
optional |
name |
the name of this model component. Default value: 'LocalLevel'. |
The latent state is [level]
. We observe a noisy realization of the current
level: f[t] = level[t] + Normal(0., observation_noise_scale)
at each timestep.
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
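A minimal sketch of sts_local_level, reusing the observed_time_series from the examples above; fitting proceeds exactly as for the other components:
level_model <- observed_time_series %>% sts_local_level()
fit <- observed_time_series %>%
  sts_fit_with_hmc(level_model,
                   num_results = 10,
                   num_warmup_steps = 5,
                   num_variational_steps = 15)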
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfd_linear_gaussian_state_space_model
for
details.
The local level model is a special case of a linear Gaussian SSM, in which the
latent state posits a level
evolving via a Gaussian random walk:
level[t] = level[t-1] + Normal(0., level_scale)
sts_local_level_state_space_model(num_timesteps, level_scale, initial_state_prior, observation_noise_scale = 0, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL)
num_timesteps |
Scalar |
level_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
initial_step |
Optional scalar |
validate_args |
|
allow_nan_stats |
|
name |
string name prefixed to ops created by this class. Default value: "LocalLevelStateSpaceModel". |
The latent state is [level]
and [level]
is observed (with noise) at each timestep.
The parameters level_scale
and observation_noise_scale
are each (a batch
of) scalars. The batch shape of this Distribution
is the broadcast batch
shape of these parameters and of the initial_state_prior
.
Mathematical Details
The local level model implements a tfp$distributions$LinearGaussianStateSpaceModel
with
latent_size = 1
and observation_size = 1
, following the transition model:
transition_matrix = [[1]]
transition_noise ~ N(loc = 0, scale = diag([level_scale]))
which implements the evolution of level
described above, and the observation model:
observation_matrix = [[1]]
observation_noise ~ N(loc = 0, scale = observation_noise_scale)
an instance of LinearGaussianStateSpaceModel
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
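A minimal sketch of sts_local_level_state_space_model: the returned object is a distribution over series of shape list(num_timesteps, 1), so it can be sampled from and scored directly (the parameter values below are illustrative only):
ssm <- sts_local_level_state_space_model(
  num_timesteps = 30,
  level_scale = 0.5,
  initial_state_prior = tfd_multivariate_normal_diag(scale_diag = list(1)),
  observation_noise_scale = 0.1
)
y <- ssm %>% tfd_sample()        # one series of shape [30, 1]
lp <- ssm %>% tfd_log_prob(y)    # scalar log-density of that series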
The local linear trend model posits a level
and slope
, each
evolving via a Gaussian random walk:
level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
slope[t] = slope[t-1] + Normal(0., slope_scale)
sts_local_linear_trend(observed_time_series = NULL, level_scale_prior = NULL, slope_scale_prior = NULL, initial_level_prior = NULL, initial_slope_prior = NULL, name = NULL)
observed_time_series |
optional |
level_scale_prior |
optional |
slope_scale_prior |
optional |
initial_level_prior |
optional |
initial_slope_prior |
optional |
name |
the name of this model component. Default value: 'LocalLinearTrend'. |
The latent state is the two-dimensional tuple [level, slope]
. At each
timestep we observe a noisy realization of the current level:
f[t] = level[t] + Normal(0., observation_noise_scale)
.
This model is appropriate for data where the trend direction and magnitude (latent
slope
) is consistent within short periods but may evolve over time.
Note that this model can produce very high uncertainty forecasts, as
uncertainty over the slope compounds quickly. If you expect your data to
have nonzero long-term trend, i.e. that slopes tend to revert to some mean,
then the SemiLocalLinearTrend
model may produce sharper forecasts.
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
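A minimal sketch of sts_local_linear_trend combined with a seasonal component, mirroring the examples above:
trend <- observed_time_series %>% sts_local_linear_trend()
day_of_week <- observed_time_series %>% sts_seasonal(num_seasons = 7)
model <- observed_time_series %>%
  sts_sum(components = list(trend, day_of_week))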
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfd_linear_gaussian_state_space_model
for details.
sts_local_linear_trend_state_space_model(num_timesteps, level_scale, slope_scale, initial_state_prior, observation_noise_scale = 0, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL)
num_timesteps |
Scalar |
level_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
slope_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
initial_step |
Optional scalar |
validate_args |
|
allow_nan_stats |
|
name |
string prefixed to ops created by this class. Default value: "LocalLinearTrendStateSpaceModel". |
The local linear trend model is a special case of a linear Gaussian SSM, in
which the latent state posits a level
and slope
, each evolving via a
Gaussian random walk:
level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
slope[t] = slope[t-1] + Normal(0., slope_scale)
The latent state is the two-dimensional tuple [level, slope]
. The
level
is observed at each timestep.
The parameters level_scale
, slope_scale
, and observation_noise_scale
are each (a batch of) scalars. The batch shape of this Distribution
is the
broadcast batch shape of these parameters and of the initial_state_prior
.
Mathematical Details
The linear trend model implements a tfd_linear_gaussian_state_space_model
with latent_size = 2
and observation_size = 1
, following the transition model:
transition_matrix = [[1., 1.]
                     [0., 1.]]
transition_noise ~ N(loc = 0, scale = diag([level_scale, slope_scale]))
which implements the evolution of [level, slope]
described above, and the observation model:
observation_matrix = [[1., 0.]]
observation_noise ~ N(loc = 0, scale = observation_noise_scale)
which picks out the first latent component, i.e., the level
, as the
observation at each timestep.
an instance of LinearGaussianStateSpaceModel
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
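A minimal sketch of sts_local_linear_trend_state_space_model; because the latent state is [level, slope], the initial state prior is two-dimensional (parameter values are illustrative only):
ssm <- sts_local_linear_trend_state_space_model(
  num_timesteps = 30,
  level_scale = 0.5,
  slope_scale = 0.1,
  initial_state_prior = tfd_multivariate_normal_diag(scale_diag = list(1, 1)),
  observation_noise_scale = 0.2
)
y <- ssm %>% tfd_sample(5)    # 5 series, each of shape [30, 1]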
Given samples from the posterior over parameters, return the predictive
distribution over observations at each time T
, given observations up
through time T-1
.
sts_one_step_predictive(observed_time_series, model, parameter_samples, timesteps_are_event_shape = TRUE)
observed_time_series |
|
model |
An instance of |
parameter_samples |
|
timesteps_are_event_shape |
Deprecated, for backwards compatibility only. If FALSE, the predictive distribution will return per-timestep probabilities. Default value: TRUE. |
forecast_dist: a tfd_mixture_same_family instance with event shape list(num_timesteps) and batch shape tf$concat(list(sample_shape, model$batch_shape)), with num_posterior_draws mixture components. The t-th step represents the forecast distribution p(observed_time_series[t] | observed_time_series[0:t-1], parameter_samples).
Other sts-functions:
sts_build_factored_surrogate_posterior()
,
sts_build_factored_variational_loss()
,
sts_decompose_by_component()
,
sts_decompose_forecast_by_component()
,
sts_fit_with_hmc()
,
sts_forecast()
,
sts_sample_uniform_initial_state()
Initialize from a uniform [-2, 2] distribution in unconstrained space.
sts_sample_uniform_initial_state(parameter, return_constrained = TRUE, init_sample_shape = list(), seed = NULL)
parameter |
|
return_constrained |
if |
init_sample_shape |
|
seed |
integer to seed the random number generator. |
uniform_initializer: Tensor of shape concat([init_sample_shape, parameter$prior$batch_shape, transformed_event_shape]), where transformed_event_shape is parameter$prior$event_shape if return_constrained=TRUE, and otherwise it is parameter$bijector$inverse_event_shape(parameter$prior$event_shape).
Other sts-functions:
sts_build_factored_surrogate_posterior()
,
sts_build_factored_variational_loss()
,
sts_decompose_by_component()
,
sts_decompose_forecast_by_component()
,
sts_fit_with_hmc()
,
sts_forecast()
,
sts_one_step_predictive()
A seasonal effect model posits a fixed set of recurring, discrete 'seasons', each of which is active for a fixed number of timesteps and, while active, contributes a different effect to the time series. These are generally not meteorological seasons, but represent regular recurring patterns such as hour-of-day or day-of-week effects. Each season lasts for a fixed number of timesteps. The effect of each season drifts from one occurrence to the next following a Gaussian random walk:
sts_seasonal(observed_time_series = NULL, num_seasons, num_steps_per_season = 1, drift_scale_prior = NULL, initial_effect_prior = NULL, constrain_mean_effect_to_zero = TRUE, name = NULL)
observed_time_series |
optional |
num_seasons |
Scalar |
num_steps_per_season |
|
drift_scale_prior |
optional |
initial_effect_prior |
optional |
constrain_mean_effect_to_zero |
if |
name |
the name of this model component. Default value: 'Seasonal'. |
effects[season, occurrence[i]] = ( effects[season, occurrence[i-1]] + Normal(loc=0., scale=drift_scale))
The drift_scale
parameter governs the standard deviation of the random walk;
for example, in a day-of-week model it governs the change in effect from this
Monday to next Monday.
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
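A minimal sketch of sts_seasonal using num_steps_per_season: a day-of-week effect for hourly data, where each of the 7 seasons stays active for 24 consecutive timesteps (hourly_series is a hypothetical hourly tensor, not defined above):
day_of_week <- sts_seasonal(
  observed_time_series = hourly_series,   # hypothetical hourly series
  num_seasons = 7,
  num_steps_per_season = 24,
  name = "day_of_week"
)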
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfd_linear_gaussian_state_space_model
for
details.
sts_seasonal_state_space_model(num_timesteps, num_seasons, drift_scale, initial_state_prior, observation_noise_scale = 0, num_steps_per_season = 1, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL)
num_timesteps |
Scalar |
num_seasons |
Scalar |
drift_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
num_steps_per_season |
|
initial_step |
Optional scalar |
validate_args |
|
allow_nan_stats |
|
name |
string prefixed to ops created by this class. Default value: "SeasonalStateSpaceModel". |
A seasonal effect model is a special case of a linear Gaussian SSM. The latent states represent an unknown effect from each of several 'seasons'; these are generally not meteorological seasons, but represent regular recurring patterns such as hour-of-day or day-of-week effects. The effect of each season drifts from one occurrence to the next, following a Gaussian random walk:
effects[season, occurrence[i]] = (effects[season, occurrence[i-1]] + Normal(loc=0., scale=drift_scale))
The latent state has dimension num_seasons
, containing one effect for each
seasonal component. The parameters drift_scale
and
observation_noise_scale
are each (a batch of) scalars. The batch shape of
this Distribution
is the broadcast batch shape of these parameters and of
the initial_state_prior
.
Note: there is no requirement that the effects sum to zero.
Mathematical Details
The seasonal effect model implements a tfd_linear_gaussian_state_space_model
with
latent_size = num_seasons
and observation_size = 1
. The latent state
is organized so that the current seasonal effect is always in the first
(zeroth) dimension. The transition model rotates the latent state to shift
to a new effect at the end of each season:
transition_matrix[t] = (permutation_matrix([1, 2, ..., num_seasons-1, 0])
                        if season_is_changing(t)
                        else eye(num_seasons))
transition_noise[t] ~ Normal(loc=0., scale_diag=(
                        [drift_scale, 0, ..., 0] if season_is_changing(t)
                        else [0, 0, ..., 0]))
where season_is_changing(t)
is True
if t `mod` sum(num_steps_per_season)
is in
the set of final days for each season, given by cumsum(num_steps_per_season) - 1
.
The observation model always picks out the effect for the current season, i.e.,
the first element of the latent state:
observation_matrix = [[1., 0., ..., 0.]]
observation_noise ~ Normal(loc=0, scale=observation_noise_scale)
an instance of LinearGaussianStateSpaceModel
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
Like the sts_local_linear_trend
model, a semi-local linear trend posits a
latent level
and slope
, with the level component updated according to
the current slope plus a random walk:
sts_semi_local_linear_trend(observed_time_series = NULL, level_scale_prior = NULL, slope_mean_prior = NULL, slope_scale_prior = NULL, autoregressive_coef_prior = NULL, initial_level_prior = NULL, initial_slope_prior = NULL, constrain_ar_coef_stationary = TRUE, constrain_ar_coef_positive = FALSE, name = NULL)
observed_time_series |
optional |
level_scale_prior |
optional |
slope_mean_prior |
optional |
slope_scale_prior |
optional |
autoregressive_coef_prior |
optional |
initial_level_prior |
optional |
initial_slope_prior |
optional |
constrain_ar_coef_stationary |
if |
constrain_ar_coef_positive |
if |
name |
the name of this model component. Default value: 'SemiLocalLinearTrend'. |
level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
The slope component in a sts_semi_local_linear_trend
model evolves according to
a first-order autoregressive (AR1) process with potentially nonzero mean:
slope[t] = (slope_mean + autoregressive_coef * (slope[t-1] - slope_mean) + Normal(0., slope_scale))
Unlike the random walk used in LocalLinearTrend
, a stationary
AR1 process (coefficient in (-1, 1)
) maintains bounded variance over time,
so a SemiLocalLinearTrend
model will often produce more reasonable
uncertainties when forecasting over long timescales.
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
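A minimal sketch of sts_semi_local_linear_trend; usage mirrors sts_local_linear_trend, and the component is typically wrapped in sts_sum to add observation noise:
trend <- observed_time_series %>% sts_semi_local_linear_trend()
model <- observed_time_series %>% sts_sum(components = list(trend))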
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfd_linear_gaussian_state_space_model
for details.
sts_semi_local_linear_trend_state_space_model(num_timesteps, level_scale, slope_mean, slope_scale, autoregressive_coef, initial_state_prior, observation_noise_scale = 0, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL)
num_timesteps |
Scalar |
level_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
slope_mean |
Scalar (any additional dimensions are treated as batch
dimensions) |
slope_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
autoregressive_coef |
Scalar (any additional dimensions are treated as
batch dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
initial_step |
Optional scalar |
validate_args |
|
allow_nan_stats |
|
name |
string prefixed to ops created by this class. Default value: "SemiLocalLinearTrendStateSpaceModel". |
The semi-local linear trend model is a special case of a linear Gaussian
SSM, in which the latent state posits a level
and slope
. The level
evolves via a Gaussian random walk centered at the current slope
, while
the slope
follows a first-order autoregressive (AR1) process with
mean slope_mean
:
level[t] = level[t-1] + slope[t-1] + Normal(0., level_scale)
slope[t] = (slope_mean +
            autoregressive_coef * (slope[t-1] - slope_mean) +
            Normal(0., slope_scale))
The latent state is the two-dimensional tuple [level, slope]
. The
level
is observed at each timestep.
The parameters level_scale
, slope_mean
, slope_scale
,
autoregressive_coef
, and observation_noise_scale
are each (a batch of)
scalars. The batch shape of this Distribution
is the broadcast batch shape
of these parameters and of the initial_state_prior
.
Mathematical Details
The semi-local linear trend model implements a
tfp.distributions.LinearGaussianStateSpaceModel
with latent_size = 2
and observation_size = 1
, following the transition model:
transition_matrix = [[1., 1.]
                     [0., autoregressive_coef]]
transition_noise ~ N(loc=slope_mean - autoregressive_coef * slope_mean,
                     scale=diag([level_scale, slope_scale]))
which implements the evolution of [level, slope]
described above, and
the observation model:
observation_matrix = [[1., 0.]]
observation_noise ~ N(loc=0, scale=observation_noise_scale)
which picks out the first latent component, i.e., the level
, as the
observation at each timestep.
an instance of LinearGaussianStateSpaceModel
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
The smooth seasonal model uses a set of trigonometric terms in order to
capture a recurring pattern whereby adjacent (in time) effects are
similar. The model uses frequencies
calculated via:
sts_smooth_seasonal(period, frequency_multipliers, allow_drift = TRUE, drift_scale_prior = NULL, initial_state_prior = NULL, observed_time_series = NULL, name = NULL)
period |
positive scalar |
frequency_multipliers |
One-dimensional |
allow_drift |
optional |
drift_scale_prior |
optional |
initial_state_prior |
instance of |
observed_time_series |
optional |
name |
the name of this model component. Default value: 'SmoothSeasonal'. |
frequencies[j] = 2. * pi * frequency_multipliers[j] / period
and then posits two latent states for each frequency
. The two latent states
associated with frequency j
drift over time via:
effect[t] = (effect[t-1] * cos(frequencies[j]) +
             auxiliary[t-1] * sin(frequencies[j]) +
             Normal(0., drift_scale))
auxiliary[t] = (-effect[t-1] * sin(frequencies[j]) +
                auxiliary[t-1] * cos(frequencies[j]) +
                Normal(0., drift_scale))
where effect
is the smooth seasonal effect and auxiliary
only appears as a
matter of construction. The interpretation of auxiliary
is thus not
particularly important.
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_sparse_linear_regression()
,
sts_sum()
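A minimal sketch of sts_smooth_seasonal: a smooth weekly pattern built from the first two harmonics of a period-7 cycle. Note that period is the first argument, so the observed series is passed by name rather than piped:
weekly <- sts_smooth_seasonal(
  period = 7,
  frequency_multipliers = c(1, 2),
  observed_time_series = observed_time_series
)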
A state space model (SSM) posits a set of latent (unobserved) variables that
evolve over time with dynamics specified by a probabilistic transition model
p(z[t+1] | z[t])
. At each timestep, we observe a value sampled from an
observation model conditioned on the current state, p(x[t] | z[t])
. The
special case where both the transition and observation models are Gaussians
with mean specified as a linear function of the inputs, is known as a linear
Gaussian state space model and supports tractable exact probabilistic
calculations; see tfp$distributions$LinearGaussianStateSpaceModel
for
details.
A smooth seasonal effect model is a special case of a linear Gaussian SSM. It
is the sum of a set of "cyclic" components, with one component for each
frequency:
frequencies[j] = 2. * pi * frequency_multipliers[j] / period
Each cyclic component contains two latent states which we denote effect
and
auxiliary
. The two latent states for component j
drift over time via:
effect[t] = (effect[t-1] * cos(frequencies[j]) +
             auxiliary[t-1] * sin(frequencies[j]) +
             Normal(0., drift_scale))
auxiliary[t] = (-effect[t-1] * sin(frequencies[j]) +
                auxiliary[t-1] * cos(frequencies[j]) +
                Normal(0., drift_scale))
sts_smooth_seasonal_state_space_model(num_timesteps, period, frequency_multipliers, drift_scale, initial_state_prior, observation_noise_scale = 0, initial_step = 0, validate_args = FALSE, allow_nan_stats = TRUE, name = NULL)
num_timesteps |
Scalar |
period |
positive scalar |
frequency_multipliers |
One-dimensional |
drift_scale |
Scalar (any additional dimensions are treated as batch
dimensions) |
initial_state_prior |
instance of |
observation_noise_scale |
Scalar (any additional dimensions are
treated as batch dimensions) |
initial_step |
scalar |
validate_args |
|
allow_nan_stats |
|
name |
string prefixed to ops created by this class. Default value: "LocalLinearTrendStateSpaceModel". |
The auxiliary
latent state only appears as a matter of construction and thus
its interpretation is not particularly important. The total smooth seasonal
effect is the sum of the effect
values from each of the cyclic components.
The parameters drift_scale
and observation_noise_scale
are each (a batch
of) scalars. The batch shape of this Distribution
is the broadcast batch
shape of these parameters and of the initial_state_prior
.
Mathematical Details
The smooth seasonal effect model implements a
tfp$distributions$LinearGaussianStateSpaceModel
with
latent_size = 2 * len(frequency_multipliers)
and observation_size = 1
.
The latent state is the concatenation of the cyclic latent states which themselves
comprise an effect
and an auxiliary
state. The transition matrix is a block diagonal
matrix where block j
is:
transition_matrix[j] = [[cos(frequencies[j]), sin(frequencies[j])], [-sin(frequencies[j]), cos(frequencies[j])]]
The observation model picks out the cyclic effect
values from the latent state:
observation_matrix = [[1., 0., 1., 0., ..., 1., 0.]]
observation_noise ~ Normal(loc=0, scale=observation_noise_scale)
For further mathematical details please see Harvey (1990).
an instance of LinearGaussianStateSpaceModel
.
Harvey, A. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press, 1990.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
,
sts_sum()
This model defines a time series given by a sparse linear combination of covariate time series provided in a design matrix:
sts_sparse_linear_regression(design_matrix, weights_prior_scale = 0.1, weights_batch_shape = NULL, name = NULL)
design_matrix |
float |
weights_prior_scale |
float |
weights_batch_shape |
if |
name |
the name of this model component. Default value: 'SparseLinearRegression'. |
observed_time_series <- tf$matmul(design_matrix, weights)
This is identical to sts_linear_regression
, except that
sts_sparse_linear_regression
uses a parameterization of a Horseshoe
prior to encode the assumption that many of the weights
are zero,
i.e., many of the covariate time series are irrelevant. See the mathematical
details section below for further discussion. The prior parameterization used
by sts_sparse_linear_regression
is more suitable for inference than that
obtained by simply passing the equivalent tfd_horseshoe
prior to
sts_linear_regression
; when sparsity is desired, sts_sparse_linear_regression
will
likely yield better results.
This component does not itself include observation noise; it defines a
deterministic distribution with mass at the point
tf$matmul(design_matrix, weights)
. In practice, it should be combined with
observation noise from another component such as sts_sum
.
Mathematical Details
The basic horseshoe prior (Carvalho et al. 2009) is defined as a Cauchy-normal scale mixture:
scales[i] ~ HalfCauchy(loc=0, scale=1)
weights[i] ~ Normal(loc=0., scale=scales[i] * global_scale)
The Cauchy scale parameters put substantial mass near zero, encouraging
weights to be sparse, but their heavy tails allow weights far from zero to be
estimated without excessive shrinkage. The horseshoe can be thought of as a
continuous relaxation of a traditional 'spike-and-slab' discrete sparsity
prior, in which the latent Cauchy scale mixes between 'spike'
(scales[i] ~= 0
) and 'slab' (scales[i] >> 0
) regimes.
Following the recommendations in Piironen et al. (2017), SparseLinearRegression
implements
a horseshoe with the following adaptations:
The Cauchy prior on scales[i]
is represented as an InverseGamma-Normal
compound.
The global_scale
parameter is integrated out following a Cauchy(0., scale=weights_prior_scale)
hyperprior, which is also represented as an
InverseGamma-Normal compound.
All compound distributions are implemented using a non-centered parameterization. The compound, non-centered representation defines the same marginal prior as the original horseshoe (up to integrating out the global scale), but allows samplers to mix more efficiently through the heavy tails; for variational inference, the compound representation implicitly expands the representational power of the variational model.
Note that we do not yet implement the regularized ('Finnish') horseshoe, proposed in Piironen et al. (2017) for models with weak likelihoods, because the likelihood in STS models is typically Gaussian, where it's not clear that additional regularization is appropriate. If you need this functionality, please email [email protected].
The full prior parameterization implemented in SparseLinearRegression
is
as follows:
# Sample global_scale from Cauchy(0, scale=weights_prior_scale).
global_scale_variance ~ InverseGamma(alpha=0.5, beta=0.5)
global_scale_noncentered ~ HalfNormal(loc=0, scale=1)
global_scale = (global_scale_noncentered *
                sqrt(global_scale_variance) *
                weights_prior_scale)
# Sample local_scales from Cauchy(0, 1).
local_scale_variances[i] ~ InverseGamma(alpha=0.5, beta=0.5)
local_scales_noncentered[i] ~ HalfNormal(loc=0, scale=1)
local_scales[i] = local_scales_noncentered[i] * sqrt(local_scale_variances[i])
weights[i] ~ Normal(loc=0., scale=local_scales[i] * global_scale)
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sum()
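A minimal sketch of sts_sparse_linear_regression with a hypothetical design matrix of many candidate covariates, most of which are expected to carry zero weight:
design_matrix <- matrix(rnorm(35 * 10), ncol = 10) %>%
  tensorflow::tf$convert_to_tensor(dtype = tensorflow::tf$float64)
sparse_regression <- sts_sparse_linear_regression(
  design_matrix = design_matrix,
  weights_prior_scale = 0.1
)
model <- observed_time_series %>%
  sts_sum(components = list(sparse_regression))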
This class enables compositional specification of a structural time series model from basic components. Given a list of component models, it represents an additive model, i.e., a model of time series that may be decomposed into a sum of terms corresponding to the component models.
sts_sum(observed_time_series = NULL, components, constant_offset = NULL, observation_noise_scale_prior = NULL, name = NULL)
observed_time_series |
optional |
components |
|
constant_offset |
optional scalar |
observation_noise_scale_prior |
optional |
name |
string name of this model component; used as |
Formally, the additive model represents a random process
g[t] = f1[t] + f2[t] + ... + fN[t] + eps[t]
, where the f
's are the
random processes represented by the components, and
eps[t] ~ Normal(loc=0, scale=observation_noise_scale)
is an observation
noise term. See the AdditiveStateSpaceModel
documentation for mathematical details.
This model inherits the parameters (with priors) of its components, and
adds an observation_noise_scale
parameter governing the level of noise in
the observed time series.
an instance of StructuralTimeSeries
.
For usage examples see sts_fit_with_hmc()
, sts_forecast()
, sts_decompose_by_component()
.
Other sts:
sts_additive_state_space_model()
,
sts_autoregressive_state_space_model()
,
sts_autoregressive()
,
sts_constrained_seasonal_state_space_model()
,
sts_dynamic_linear_regression_state_space_model()
,
sts_dynamic_linear_regression()
,
sts_linear_regression()
,
sts_local_level_state_space_model()
,
sts_local_level()
,
sts_local_linear_trend_state_space_model()
,
sts_local_linear_trend()
,
sts_seasonal_state_space_model()
,
sts_seasonal()
,
sts_semi_local_linear_trend_state_space_model()
,
sts_semi_local_linear_trend()
,
sts_smooth_seasonal_state_space_model()
,
sts_smooth_seasonal()
,
sts_sparse_linear_regression()
Y = g(X) = Abs(X), element-wise.
This non-injective bijector allows for transformations of scalar distributions
with the absolute value function, which maps (-inf, inf)
to [0, inf)
.
For y
in (0, inf)
, tfb_absolute_value$inverse(y)
returns the set inverse
{x in (-inf, inf) : |x| = y}
as a tuple, -y, y
.
tfb_absolute_value$inverse(0)
returns 0, 0
, which is not the set inverse
(the set inverse is the singleton {0}
), but "works" in conjunction with
TransformedDistribution
to produce a left semi-continuous pdf.
For y < 0, tfb_absolute_value$inverse(y) happily returns the wrong thing, -y, y. This is done for efficiency. If validate_args == TRUE, y < 0 will raise an exception.
tfb_absolute_value(validate_args = FALSE, name = "absolute_value")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
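A minimal sketch of tfb_absolute_value: forward maps x to |x|, and inverse returns both preimages:
b <- tfb_absolute_value()
b %>% tfb_forward(c(-1, 0.5, 2))   # 1, 0.5, 2
b %>% tfb_inverse(2)               # the two preimages, -2 and 2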
This Bijector is initialized with shift Tensor and scale arguments,
giving the forward operation: Y = g(X) = scale @ X + shift
where the scale term is logically equivalent to:
scale = (scale_identity_multiplier * tf.diag(tf.ones(d)) +
         tf.diag(scale_diag) +
         scale_tril +
         scale_perturb_factor @ diag(scale_perturb_diag) @ tf.transpose([scale_perturb_factor]))
tfb_affine(shift = NULL, scale_identity_multiplier = NULL, scale_diag = NULL, scale_tril = NULL, scale_perturb_factor = NULL, scale_perturb_diag = NULL, adjoint = FALSE, validate_args = FALSE, name = "affine", dtype = NULL)
shift |
Floating-point Tensor. If this is set to NULL, no shift is applied. |
scale_identity_multiplier |
floating point rank 0 Tensor representing a scaling done
to the identity matrix. When |
scale_diag |
Floating-point Tensor representing the diagonal matrix.
|
scale_tril |
Floating-point Tensor representing the lower triangular matrix.
|
scale_perturb_factor |
Floating-point Tensor representing factor matrix with last
two dimensions of shape |
scale_perturb_diag |
Floating-point Tensor representing the diagonal matrix.
|
adjoint |
Logical indicating whether to use the scale matrix as specified or its adjoint. Default value: FALSE. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
dtype |
|
If none of scale_identity_multiplier
, scale_diag
, or scale_tril
are specified then
scale += IdentityMatrix
Otherwise specifying a scale argument has the semantics of
scale += Expand(arg)
, i.e., scale_diag != NULL
means scale += tf$diag(scale_diag)
.
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X; shift, scale) = scale @ X + shift
shift
is a numeric Tensor and scale is a LinearOperator.
If X
is a scalar then the forward transformation is: scale * X + shift
where *
denotes broadcasted elementwise product.
tfb_affine_linear_operator(shift = NULL, scale = NULL, adjoint = FALSE, validate_args = FALSE, name = "affine_linear_operator")
shift |
Floating-point Tensor. |
scale |
Subclass of LinearOperator. Represents the (batch) positive definite matrix |
adjoint |
Logical indicating whether to use the scale matrix as specified or its adjoint. Default value: FALSE. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Both the domain and the codomain of the mapping are [-inf, inf]^n
, however,
the input of the inverse mapping must be strictly increasing.
On the last dimension of the tensor, the Ascending bijector performs:
y = tf$cumsum([x[0], tf$exp(x[1]), tf$exp(x[2]), ..., tf$exp(x[-1])])
tfb_ascending(validate_args = FALSE, name = "ascending")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
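A minimal sketch of tfb_ascending: unconstrained inputs map to a strictly increasing vector, and strictly increasing inputs can be mapped back:
b <- tfb_ascending()
b %>% tfb_forward(c(2, -1, 0.5))   # 2, 2 + exp(-1), 2 + exp(-1) + exp(0.5)
b %>% tfb_inverse(c(1, 2, 4))      # recovers the unconstrained representation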
Y = g(X)
s.t. X = g^-1(Y) = (Y - mean(Y)) / std(Y)
Applies Batch Normalization (Ioffe and Szegedy, 2015) to samples from a data distribution. This can be used to stabilize training of normalizing flows (Papamakarios et al., 2016; Dinh et al., 2017)
tfb_batch_normalization(batchnorm_layer = NULL, training = TRUE, validate_args = FALSE, name = "batch_normalization")
batchnorm_layer |
|
training |
If TRUE, updates running-average statistics during call to inverse(). |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
When training Deep Neural Networks (DNNs), it is common practice to normalize or whiten features by shifting them to have zero mean and scaling them to have unit variance.
The inverse()
method of the BatchNormalization bijector, which is used in
the log-likelihood computation of data samples, implements the normalization
procedure (shift-and-scale) using the mean and standard deviation of the
current minibatch.
Conversely, the forward() method of the bijector de-normalizes samples (e.g. X*std(Y) + mean(Y)) with the running-average mean and standard deviation computed at training time. De-normalization is useful for sampling.
During training time, BatchNormalization.inverse and BatchNormalization.forward are not
guaranteed to be inverses of each other because inverse(y)
uses statistics of the current minibatch,
while forward(x)
uses running-average statistics accumulated from training.
In other words, tfb_batch_normalization()$inverse(tfb_batch_normalization()$forward(...))
and
tfb_batch_normalization()$forward(tfb_batch_normalization()$inverse(...))
will be identical when
training=FALSE but may be different when training=TRUE.
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
More specifically, given [F_0, F_1, ... F_n]
which are scalar or vector
bijectors this bijector creates a transformation which operates on the vector
[x_0, ... x_n]
with the transformation [F_0(x_0), F_1(x_1) ..., F_n(x_n)]
where x_0, ..., x_n
are blocks (partitions) of the vector.
tfb_blockwise(bijectors, block_sizes = NULL, validate_args = FALSE, name = NULL)
bijectors |
A non-empty list of bijectors. |
block_sizes |
A 1-D integer Tensor with each element signifying the length of the block of the input vector to pass to the corresponding bijector. The length of block_sizes must be equal to the length of bijectors. If left as NULL, a vector of 1's is used. |
validate_args |
Logical indicating whether arguments should be checked for correctness. |
name |
String, name given to ops managed by this object. Default:
E.g., |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
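A minimal sketch of tfb_blockwise: apply Exp to the first two elements of a length-3 vector and Softplus to the remaining element:
b <- tfb_blockwise(
  bijectors = list(tfb_exp(), tfb_softplus()),
  block_sizes = c(2L, 1L)
)
b %>% tfb_forward(c(0, 1, -1))   # exp(0), exp(1), softplus(-1)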
Bijector which applies a sequence of bijectors
tfb_chain(bijectors = NULL, validate_args = FALSE, validate_event_size = TRUE, parameters = NULL, name = NULL)
bijectors |
list of bijector instances. An empty list makes this bijector equivalent to the Identity bijector. |
validate_args |
Logical indicating whether arguments should be checked for correctness. |
validate_event_size |
Checks that bijectors are not applied to inputs with
incomplete support (that is, inputs where one or more elements are a
deterministic transformation of the others). For example, the following
LDJ would be incorrect:
|
parameters |
Locals dict captured by subclass constructor, to be used for copy/slice re-instantiation operators. |
name |
String, name given to ops managed by this object. Default:
E.g., |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
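A minimal sketch (assuming tfprobability and a working TensorFlow installation); chained bijectors are applied right to left, so this chain scales first and then shifts:
b <- tfb_chain(list(tfb_shift(3), tfb_scale(2)))
x <- 1
b %>% tfb_forward(x)   # scale then shift: 2 * 1 + 3 = 5
b %>% tfb_inverse(5)   # recovers 1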
Computes g(X) = X @ X.T where X is a lower-triangular, positive-diagonal matrix.
Note: the upper-triangular part of X is ignored (whether or not it is zero).
tfb_cholesky_outer_product( validate_args = FALSE, name = "cholesky_outer_product" )
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
The surjectivity of g as a map from the set of n x n positive-diagonal
lower-triangular matrices to the set of SPD matrices follows immediately from
executing the Cholesky factorization algorithm on an SPD matrix A
to produce a
positive-diagonal lower-triangular matrix L
such that A = L @ L.T
.
To prove the injectivity of g, suppose that L_1
and L_2
are lower-triangular
with positive diagonals and satisfy A = L_1 @ L_1.T = L_2 @ L_2.T
. Then
inv(L_1) @ A @ inv(L_1).T = [inv(L_1) @ L_2] @ [inv(L_1) @ L_2].T = I
.
Setting L_3 := inv(L_1) @ L_2
, that L_3
is a positive-diagonal
lower-triangular matrix follows from inv(L_1)
being positive-diagonal
lower-triangular (which follows from the diagonal of a triangular matrix being
its spectrum), and that the product of two positive-diagonal lower-triangular
matrices is another positive-diagonal lower-triangular matrix.
A simple inductive argument (proceeding one column of L_3
at a time) shows
that, if I = L_3 @ L_3.T
, with L_3
being lower-triangular with positive-
diagonal, then L_3 = I
. Thus, L_1 = L_2
, proving injectivity of g.
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
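A minimal sketch with an illustrative 2 x 2 input; the forward pass maps a lower-triangular factor to the corresponding symmetric positive-definite matrix:
b <- tfb_cholesky_outer_product()
l <- matrix(c(1, 0,
              2, 3), nrow = 2, byrow = TRUE)   # lower triangular with positive diagonal
m <- b %>% tfb_forward(l)                      # l %*% t(l), a symmetric positive-definite matrix
b %>% tfb_inverse(m)                           # recovers the Cholesky factor l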
Maps the Cholesky factor of M to the Cholesky factor of M^{-1}.
The forward and inverse calculations are conceptually identical to:
forward <- function(x) tf$cholesky(tf$linalg$inv(tf$matmul(x, x, adjoint_b=TRUE)))
inverse = forward
However, the actual calculations exploit the triangular structure of the matrices.
tfb_cholesky_to_inv_cholesky( validate_args = FALSE, name = "cholesky_to_inv_cholesky" )
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
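A minimal sketch with an illustrative 2 x 2 Cholesky factor:
b <- tfb_cholesky_to_inv_cholesky()
l <- matrix(c(2, 0,
              1, 3), nrow = 2, byrow = TRUE)   # Cholesky factor of M = l %*% t(l)
b %>% tfb_forward(l)                           # Cholesky factor of solve(M)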
This bijector is a mapping between R^{n}
and the n
-dimensional manifold of
Cholesky-space correlation matrices embedded in R^{m^2}
, where n
is the
(m - 1)
th triangular number; i.e. n = 1 + 2 + ... + (m - 1)
.
tfb_correlation_cholesky(validate_args = FALSE, name = "correlation_cholesky")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Mathematical Details
The image of unconstrained reals under the CorrelationCholesky
bijector is
the set of correlation matrices which are positive definite.
A correlation matrix
can be characterized as a symmetric positive semidefinite matrix with 1s on
the main diagonal. However, the correlation matrix is positive definite if no
component can be expressed as a linear combination of the other components.
For a lower triangular matrix L
to be a valid Cholesky-factor of a positive
definite correlation matrix, it is necessary and sufficient that each row of
L
have unit Euclidean norm. To see this, observe that if L_i
is the
i
th row of the Cholesky factor corresponding to the correlation matrix R
,
then the i
th diagonal entry of R
satisfies:
1 = R_i,i = L_i . L_i = ||L_i||^2
where '.' is the dot product of vectors and ||...||
denotes the Euclidean
norm. Furthermore, observe that R_i,j
lies in the interval [-1, 1]
. By the
Cauchy-Schwarz inequality:
|R_i,j| = |L_i . L_j| <= ||L_i|| ||L_j|| = 1
This is a consequence of the fact that R
is symmetric positive definite with
1s on the main diagonal.
The LKJ distribution with input_output_cholesky=TRUE
generates samples from
(and computes log-densities on) the set of Cholesky factors of positive
definite correlation matrices. The CorrelationCholesky
bijector provides
a bijective mapping from unconstrained reals to the support of the LKJ
distribution.
a bijector instance.
Stan Manual. Section 24.2. Cholesky LKJ Correlation Distribution.
Daniel Lewandowski, Dorota Kurowicka, and Harry Joe, "Generating random correlation matrices based on vines and extended onion method," Journal of Multivariate Analysis 100 (2009), pp 1989-2001.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
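A minimal sketch; three unconstrained values map to the Cholesky factor of a 3 x 3 correlation matrix:
b <- tfb_correlation_cholesky()
x <- c(0.5, -1, 2)      # n = 3 unconstrained values, so m = 3
b %>% tfb_forward(x)    # 3 x 3 lower-triangular factor whose rows have unit Euclidean norm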
Computes the cumulative sum of a tensor along a specified axis.
tfb_cumsum(axis = -1, validate_args = FALSE, name = "cumsum")
axis |
|
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
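A minimal sketch:
b <- tfb_cumsum()
b %>% tfb_forward(c(1, 2, 3))   # cumulative sums: 1, 3, 6
b %>% tfb_inverse(c(1, 3, 6))   # adjacent differences: 1, 2, 3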
Computes Y = g(X) = DCT(X), where the DCT type is indicated by the dct_type argument.
The discrete cosine transform efficiently applies a unitary DCT operator. This can be useful
for mixing and decorrelating across the innermost event dimension.
The inverse X = g^{-1}(Y) = IDCT(Y)
, where IDCT is DCT-III for type==2.
This bijector can be interleaved with Affine bijectors to build a cascade of
structured efficient linear layers as in Moczulski et al., 2016.
Note that the operator applied is orthonormal (i.e. norm='ortho').
tfb_discrete_cosine_transform( validate_args = FALSE, dct_type = 2, name = "dct" )
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
dct_type |
integer, the DCT type performed by the forward transformation. Currently, only 2 and 3 are supported. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
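A minimal sketch using the default DCT-II type:
b <- tfb_discrete_cosine_transform()
x <- c(1, 0, -1, 0)
y <- b %>% tfb_forward(x)   # orthonormal DCT-II of x
b %>% tfb_inverse(y)        # DCT-III, recovering x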
Computes Y = g(X) = exp(X)
tfb_exp(validate_args = FALSE, name = "exp")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
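A minimal sketch:
b <- tfb_exp()
b %>% tfb_forward(1)          # exp(1), roughly 2.718
b %>% tfb_inverse(exp(1))     # 1
b %>% tfb_forward_log_det_jacobian(1, event_ndims = 0)   # log |dy/dx| = x = 1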
Y = g(X) = exp(X) - 1
This Bijector is no different from tfb_chain(list(tfb_affine_scalar(shift=-1), tfb_exp()))
.
However, this makes use of the more numerically stable routines
tf$math$expm1
and tf$log1p
.
tfb_expm1(validate_args = FALSE, name = "expm1")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Note: the expm1(.) is applied element-wise but the Jacobian is a reduction over the event space.
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
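A minimal sketch illustrating the numerically stable regime near zero:
b <- tfb_expm1()
b %>% tfb_forward(1e-8)   # expm1(1e-8), computed without catastrophic cancellation
b %>% tfb_inverse(1e-8)   # log1p(1e-8)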
This bijector implements a continuous dynamics transformation parameterized by a differential equation, where the initial and terminal conditions correspond to the domain (X) and image (Y), i.e.
tfb_ffjord( state_time_derivative_fn, ode_solve_fn = NULL, trace_augmentation_fn = tfp$bijectors$ffjord$trace_jacobian_hutchinson, initial_time = 0, final_time = 1, validate_args = FALSE, dtype = tf$float32, name = "ffjord" )
state_time_derivative_fn |
|
ode_solve_fn |
|
trace_augmentation_fn |
|
initial_time |
Scalar float representing time to which the |
final_time |
Scalar float representing time to which the |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
dtype |
|
name |
name prefixed to Ops created by this class. |
d/dt[state(t)] = state_time_derivative_fn(t, state(t))
state(initial_time) = X
state(final_time) = Y
For this transformation the value of log_det_jacobian
follows another
differential equation, reducing it to computation of the trace of the jacobian
along the trajectory
state_time_derivative = state_time_derivative_fn(t, state(t))
d/dt[log_det_jac(t)] = Tr(jacobian(state_time_derivative, state(t)))
The FFJORD constructor takes two function arguments, ode_solve_fn and trace_augmentation_fn, that customize integration of the differential equation and trace estimation.
Differential equation integration is performed by a call to ode_solve_fn
.
Custom ode_solve_fn
must accept the following arguments:
ode_fn(time, state): Differential equation to be solved.
initial_time: Scalar float or floating Tensor representing the initial time.
initial_state: Floating Tensor representing the initial state.
solution_times: 1D floating Tensor of solution times.
And return a Tensor of shape [solution_times$shape, initial_state$shape]
representing state values evaluated at solution_times
. In addition
ode_solve_fn
must support nested structures. For more details see the
interface of tfp$math$ode$Solver$solve()
.
Trace estimation is computed simultaneously with state_time_derivative
using augmented_state_time_derivative_fn
that is generated by
trace_augmentation_fn
. trace_augmentation_fn
takes
state_time_derivative_fn
, state.shape
and state.dtype
arguments and
returns an augmented_state_time_derivative_fn
callable that computes both
state_time_derivative
and unreduced trace_estimation
.
Custom ode_solve_fn
and trace_augmentation_fn
examples:
# custom_solver_fn: `function(f, t_initial, t_solutions, y_initial, ...)`
# ...: additional arguments to pass to custom_solver_fn
ode_solve_fn <- function(ode_fn, initial_time, initial_state, solution_times) {
  custom_solver_fn(ode_fn, initial_time, solution_times, initial_state, ...)
}
ffjord <- tfb_ffjord(state_time_derivative_fn, ode_solve_fn = ode_solve_fn)
# state_time_derivative_fn: `function(time, state)`
# trace_jac_fn: `function(time, state)`, unreduced Jacobian trace function
trace_augmentation_fn <- function(ode_fn, state_shape, state_dtype) {
  augmented_ode_fn <- function(time, state) {
    list(ode_fn(time, state), trace_jac_fn(time, state))
  }
  augmented_ode_fn
}
ffjord <- tfb_ffjord(state_time_derivative_fn, trace_augmentation_fn = trace_augmentation_fn)
For more details on FFJORD and continuous normalizing flows see Chen et al. (2018) and Grathwohl et al. (2018).
a bijector instance.
Chen, T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. K. (2018). Neural ordinary differential equations. In Advances in neural information processing systems (pp. 6571-6583)
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
This is implemented as a simple tfb_chain of tfb_fill_triangular followed by tfb_transform_diagonal, and provided mostly as a convenience. The default setup is somewhat opinionated, using a Softplus transformation followed by a small shift (1e-5) which attempts to avoid numerical issues from zeros on the diagonal.
tfb_fill_scale_tri_l( diag_bijector = NULL, diag_shift = 1e-05, validate_args = FALSE, name = "fill_scale_tril" )
diag_bijector |
Bijector instance, used to transform the output diagonal to be positive.
Default value: NULL (i.e., |
diag_shift |
Float value broadcastable and added to all diagonal entries after applying the diag_bijector. Setting a positive value forces the output diagonal entries to be positive, but prevents inverting the transformation for matrices with diagonal entries less than this value. Default value: 1e-5. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
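A minimal sketch; a length-3 vector becomes a 2 x 2 lower-triangular scale matrix:
b <- tfb_fill_scale_tri_l()
b %>% tfb_forward(c(0, 0, 0))
# 2 x 2 lower-triangular matrix; with the defaults the diagonal entries are
# softplus(0) + 1e-5, roughly 0.693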
Triangular matrix elements are filled in a clockwise spiral.
Given input with shape batch_shape + [d]
, produces output with
shape batch_shape + [n, n]
, where n = (-1 + sqrt(1 + 8 * d))/2
.
This follows by solving the quadratic equation d = 1 + 2 + ... + n = n * (n + 1)/2
.
tfb_fill_triangular( upper = FALSE, validate_args = FALSE, name = "fill_triangular" )
upper |
Logical representing whether output matrix should be upper triangular (TRUE) or lower triangular (FALSE, default). |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
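A minimal sketch; d = 6 input elements give n = 3:
b <- tfb_fill_triangular()
b %>% tfb_forward(c(1, 2, 3, 4, 5, 6))   # 3 x 3 lower-triangular matrix
b %>% tfb_inverse(diag(3))               # length-6 vector of the triangular elements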
Returns the forward Bijector evaluation, i.e., Y = g(X).
tfb_forward(bijector, x, name = "forward")
bijector |
The bijector to apply |
x |
Tensor. The input to the "forward" evaluation. |
name |
name of the operation |
a tensor
Other bijector_methods:
tfb_forward_log_det_jacobian()
,
tfb_inverse_log_det_jacobian()
,
tfb_inverse()
b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
b %>% tfb_forward(x)
Returns the result of the forward evaluation of the log determinant of the Jacobian
tfb_forward_log_det_jacobian( bijector, x, event_ndims, name = "forward_log_det_jacobian" )
bijector |
The bijector to apply |
x |
Tensor. The input to the "forward" Jacobian determinant evaluation. |
event_ndims |
Number of dimensions in the probabilistic events being transformed. Must be greater than or equal to bijector$forward_min_event_ndims. The result is summed over the final dimensions to produce a scalar Jacobian determinant for each event, i.e. it has shape x$shape$ndims - event_ndims dimensions. |
name |
name of the operation |
a tensor
Other bijector_methods:
tfb_forward()
,
tfb_inverse_log_det_jacobian()
,
tfb_inverse()
b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
b %>% tfb_forward_log_det_jacobian(x, event_ndims = 0)
Overview: Glow
is a chain of bijectors which transforms a rank-1 tensor
(vector) into a rank-3 tensor (e.g. an RGB image). Glow
does this by
chaining together an alternating series of "Blocks," "Squeezes," and "Exits"
which are each themselves special chains of other bijectors. The intended use
of Glow
is as part of a tfd_transformed_distribution
, in
which the base distribution over the vector space is used to generate samples
in the image space. In the paper, an Independent Normal distribution is used
as the base distribution.
tfb_glow( output_shape = c(32, 32, 3), num_glow_blocks = 3, num_steps_per_block = 32, coupling_bijector_fn = NULL, exit_bijector_fn = NULL, grab_after_block = NULL, use_actnorm = TRUE, seed = NULL, validate_args = FALSE, name = "glow" )
output_shape |
A list of integers, specifying the event shape of the
output of the bijector's forward pass (the image). Specified as
|
num_glow_blocks |
An integer, specifying how many downsampling levels to include in the model. This must divide equally into both H and W, otherwise the bijector would not be invertible. Default Value: 3 |
num_steps_per_block |
An integer specifying how many Affine Coupling and 1x1 convolution layers to include at each level of the spatial hierarchy. Default Value: 32 (i.e. the value used in the original glow paper). |
coupling_bijector_fn |
A function which takes the argument |
exit_bijector_fn |
Similar to coupling_bijector_fn, exit_bijector_fn is
a function which takes the argument |
grab_after_block |
A tuple of floats, specifying what fraction of the remaining channels to remove following each glow block. Glow will take the integer floor of this number multiplied by the remaining number of channels. The default is half at each spatial hierarchy. Default value: NULL (this will take out half of the channels after each block). |
use_actnorm |
A boolean deciding whether or not to use actnorm. Data-dependent
initialization is used to initialize this layer. Default value: |
seed |
A seed to control randomness in the 1x1 convolution initialization.
Default value: |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
A "Block" (implemented as the GlowBlock
Bijector) performs much of the
transformations which allow glow to produce sophisticated and complex mappings
between the image space and the latent space and therefore achieve rich image
generation performance. A Block is composed of num_steps_per_block
steps,
which are each implemented as a Chain
containing an
ActivationNormalization
(ActNorm) bijector, followed by an (invertible)
OneByOneConv
bijector, and finally a coupling bijector. The coupling
bijector is an instance of a RealNVP
bijector, and uses the
coupling_bijector_fn
function to instantiate the coupling bijector function
which is given to the RealNVP
. This function returns a bijector which
defines the coupling (e.g. Shift(Scale)
for affine coupling or Shift
for
additive coupling).
A "Squeeze" converts spatial features into channel features. It is
implemented using the Expand
bijector. The difference in names is
due to the fact that the forward
function from glow is meant to ultimately
correspond to sampling from a tfp$util$TransformedDistribution
object,
which would use Expand
(Squeeze is just Invert(Expand)). The Expand
bijector takes a tensor with shape [H, W, C]
and returns a tensor with shape
[2H, 2W, C / 4]
, such that each 2x2x1 spatial tile in the output is composed
from a single 1x1x4 tile in the input tensor, as depicted in the figure below.
Forward pass (Expand): each 1x1x4 tile of the input becomes a 2x2x1 spatial tile of the output.
Inverse pass (Squeeze): the reverse, collapsing each 2x2x1 spatial tile into a 1x1x4 tile.
This is implemented using a chain of Reshape
-> Transpose
-> Reshape
bijectors. Note that on an inverse pass through the bijector, each Squeeze
will cause the width/height of the image to decrease by a factor of 2.
Therefore, the input image must be evenly divisible by 2 at least
num_glow_blocks
times, since it will pass through a Squeeze step that many
times.
An "Exit" is simply a junction at which some of the tensor "exits" from the
glow bijector and therefore avoids any further alteration. Each exit is
implemented as a Blockwise
bijector, where some channels are given to the
rest of the glow model, and the rest are given to a bypass implemented using
the Identity
bijector. The fraction of channels to be removed at each exit
is determined by the grab_after_block
arg, which indicates the fraction of
remaining channels which join the identity bypass. The fraction is
converted to an integer number of channels by multiplying by the remaining
number of channels and rounding.
Additionally, at each exit, glow couples the tensor exiting the highway to
the tensor continuing onward. This makes small scale features in the image
dependent on larger scale features, since the larger scale features dictate
the mean and scale of the distribution over the smaller scale features.
This coupling is done similarly to the Coupling bijector in each step of the
flow (i.e. using a RealNVP bijector). However for the exit bijector, the
coupling is instantiated using exit_bijector_fn
rather than coupling
bijector fn, allowing for different behaviors between standard coupling and
exit coupling. Also note that because the exit utilizes a coupling bijector,
there are two special cases (all channels exiting and no channels exiting).
The full Glow bijector consists of num_glow_blocks
Blocks each of which
contains num_steps_per_block
steps. Each step implements a coupling using
bijector_coupling_fn
. Between blocks, glow converts between spatial pixels
and channels using the Expand Bijector, and splits channels out of the
bijector using the Exit Bijector. The channels which have exited continue
onward through Identity bijectors and those which have not exited are given
to the next block. After passing through all Blocks, the tensor is reshaped
to a rank-1 tensor with the same number of elements. This is where the
distribution will be defined.
A schematic diagram of Glow is shown below. The forward
function of the
bijector starts from the bottom and goes upward, while the inverse
function
starts from the top and proceeds downward.
a bijector instance.
Glow schematic diagram: the image of shape [H, W, C] (top) is connected to the output distribution over a vector of shape [H * W * C] (bottom) by an alternating stack of Expand bijectors (turning spatial dimensions into channels), Glow blocks (each containing num_steps_per_block flow steps of ActNorm -> 1x1Conv -> Coupling), and Exit bijectors (which remove channels from additional alteration and pass them onward through Blockwise and Identity bijectors).
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X) = 1 - exp(-c * (exp(rate * X) - 1)), the Gompertz CDF.
, the Gompertz CDF.This bijector maps inputs from [-inf, inf]
to [0, inf]
. The inverse of the
bijector applied to a uniform random variable X ~ U(0, 1)
gives back a
random variable with the
Gompertz distribution:
Y ~ Gompertz(concentration, rate)
pdf(y; c, r) = r * c * exp(r * y + c - c * exp(r * y))
Note: Because the Gompertz distribution concentrates its mass close to zero,
for larger rates or larger concentrations, bijector.forward
will quickly
saturate to 1.
tfb_gompertz_cdf( concentration, rate, validate_args = FALSE, name = "gompertz_cdf" )
concentration |
Positive Float-like |
rate |
Positive Float-like |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
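A minimal sketch; as noted above, the inverse transforms uniform draws into Gompertz samples (parameter values are illustrative):
b <- tfb_gompertz_cdf(concentration = 0.5, rate = 2)
u <- runif(5)
b %>% tfb_inverse(u)   # five draws from a Gompertz(0.5, 2) distribution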
Y = g(X) = exp(-exp(-(X - loc) / scale))
This bijector maps inputs from [-inf, inf]
to [0, 1]
. The inverse of the
bijector applied to a uniform random variable X ~ U(0, 1)
gives back a
random variable with the Gumbel distribution:
tfb_gumbel(loc = 0, scale = 1, validate_args = FALSE, name = "gumbel")
loc |
Float-like Tensor that is the same dtype and is broadcastable with scale.
This is loc in |
scale |
Positive Float-like Tensor that is the same dtype and is broadcastable with loc.
This is scale in |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Y ~ Gumbel(loc, scale)
pdf(y; loc, scale) = exp(-( (y - loc) / scale + exp(- (y - loc) / scale) ) ) / scale
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X) = exp(-exp(-(X - loc) / scale))
, the Gumbel CDF.This bijector maps inputs from [-inf, inf]
to [0, 1]
. The inverse of the
bijector applied to a uniform random variable X ~ U(0, 1)
gives back a
random variable with the Gumbel distribution:
tfb_gumbel_cdf(loc = 0, scale = 1, validate_args = FALSE, name = "gumbel_cdf")
loc |
Float-like |
scale |
Positive Float-like |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Y ~ Gumbel(loc, scale)
pdf(y; loc, scale) = exp(-( (y - loc) / scale + exp(- (y - loc) / scale) ) ) / scale
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
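A minimal sketch:
b <- tfb_gumbel_cdf(loc = 0, scale = 1)
b %>% tfb_forward(0)          # exp(-exp(0)) = exp(-1), roughly 0.368
b %>% tfb_inverse(runif(3))   # three Gumbel(0, 1) samples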
Computes Y = g(X) = X
tfb_identity(validate_args = FALSE, name = "identity")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Bijector constructed from custom functions
tfb_inline( forward_fn = NULL, inverse_fn = NULL, inverse_log_det_jacobian_fn = NULL, forward_log_det_jacobian_fn = NULL, forward_event_shape_fn = NULL, forward_event_shape_tensor_fn = NULL, inverse_event_shape_fn = NULL, inverse_event_shape_tensor_fn = NULL, is_constant_jacobian = NULL, validate_args = FALSE, forward_min_event_ndims = NULL, inverse_min_event_ndims = NULL, name = "inline" )
forward_fn |
Function implementing the forward transformation. |
inverse_fn |
Function implementing the inverse transformation. |
inverse_log_det_jacobian_fn |
Function implementing the log_det_jacobian of the inverse transformation. |
forward_log_det_jacobian_fn |
Function implementing the log_det_jacobian of the forward transformation. |
forward_event_shape_fn |
Function implementing non-identical static event shape changes. Default: shape is assumed unchanged. |
forward_event_shape_tensor_fn |
Function implementing non-identical event shape changes. Default: shape is assumed unchanged. |
inverse_event_shape_fn |
Function implementing non-identical static event shape changes. Default: shape is assumed unchanged. |
inverse_event_shape_tensor_fn |
Function implementing non-identical event shape changes. Default: shape is assumed unchanged. |
is_constant_jacobian |
Logical indicating that the Jacobian is constant for all input arguments. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
forward_min_event_ndims |
Integer indicating the minimal dimensionality this bijector acts on. |
inverse_min_event_ndims |
Integer indicating the minimal dimensionality this bijector acts on. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
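A sketch (not from the package documentation) that rebuilds the exponential bijector from plain functions; it assumes the tensorflow package is attached so that tf is available:
library(tensorflow)
exp_bijector <- tfb_inline(
  forward_fn = function(x) tf$exp(x),
  inverse_fn = function(y) tf$math$log(y),
  inverse_log_det_jacobian_fn = function(y) -tf$math$log(y),
  forward_min_event_ndims = 0L
)
exp_bijector %>% tfb_forward(1)   # behaves like tfb_exp()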
Returns the inverse Bijector evaluation, i.e., X = g^{-1}(Y).
tfb_inverse(bijector, y, name = "inverse")
bijector |
The bijector to apply |
y |
Tensor. The input to the "inverse" evaluation. |
name |
name of the operation |
a tensor
Other bijector_methods:
tfb_forward_log_det_jacobian()
,
tfb_forward()
,
tfb_inverse_log_det_jacobian()
b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
y <- b %>% tfb_forward(x)
b %>% tfb_inverse(y)
Returns the result of the inverse evaluation of the log determinant of the Jacobian
tfb_inverse_log_det_jacobian( bijector, y, event_ndims, name = "inverse_log_det_jacobian" )
bijector |
The bijector to apply |
y |
Tensor. The input to the "inverse" Jacobian determinant evaluation. |
event_ndims |
Number of dimensions in the probabilistic events being transformed. Must be greater than or equal to bijector$inverse_min_event_ndims. The result is summed over the final dimensions to produce a scalar Jacobian determinant for each event, i.e. it has shape x$shape$ndims - event_ndims dimensions. |
name |
name of the operation |
a tensor
Other bijector_methods:
tfb_forward_log_det_jacobian()
,
tfb_forward()
,
tfb_inverse()
b <- tfb_affine_scalar(shift = 1, scale = 2)
x <- 10
y <- b %>% tfb_forward(x)
b %>% tfb_inverse_log_det_jacobian(y, event_ndims = 0)
Creates a Bijector which swaps the meaning of inverse and forward.
Note: An inverted bijector's inverse_log_det_jacobian is often more
efficient if the base bijector implements _forward_log_det_jacobian. If
_forward_log_det_jacobian is not implemented then the following code is
used:
y = b$inverse(x)
-b$inverse_log_det_jacobian(y)
tfb_invert(bijector, validate_args = FALSE, name = NULL)
bijector |
Bijector instance. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
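A minimal sketch; inverting tfb_exp() yields a log bijector:
log_bijector <- tfb_invert(tfb_exp())
log_bijector %>% tfb_forward(exp(2))   # the forward pass is now log: returns 2
log_bijector %>% tfb_inverse(2)        # exp(2)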
Bijector which applies a Stick Breaking procedure.
tfb_iterated_sigmoid_centered(validate_args = FALSE, name = "iterated_sigmoid")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a)
, with X in [0, 1]
This bijector maps inputs from [0, 1]
to [0, 1]
. The inverse of the
bijector applied to a uniform random variable X ~ U(0, 1) gives back a
random variable with the Kumaraswamy distribution:
Y ~ Kumaraswamy(a, b)
pdf(y; a, b, 0 <= y <= 1) = a * b * y ** (a - 1) * (1 - y**a) ** (b - 1)
tfb_kumaraswamy( concentration1 = NULL, concentration0 = NULL, validate_args = FALSE, name = "kumaraswamy" )
concentration1 |
float scalar indicating the transform power, i.e.,
|
concentration0 |
float scalar indicating the transform power,
i.e., |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X) = (1 - (1 - X)**(1 / b))**(1 / a)
, with X in [0, 1]
This bijector maps inputs from [0, 1]
to [0, 1]
. The inverse of the
bijector applied to a uniform random variable X ~ U(0, 1) gives back a
random variable with the Kumaraswamy distribution:
Y ~ Kumaraswamy(a, b)
pdf(y; a, b, 0 <= y <= 1) = a * b * y ** (a - 1) * (1 - y**a) ** (b - 1)
tfb_kumaraswamy_cdf( concentration1 = 1, concentration0 = 1, validate_args = FALSE, name = "kumaraswamy_cdf" )
concentration1 |
float scalar indicating the transform power, i.e.,
|
concentration0 |
float scalar indicating the transform power,
i.e., |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
A random variable Y has a Lambert W x F distribution if W_tau(Y) = X has distribution F, where tau = (shift, scale, tail) parameterizes the inverse transformation.
tfb_lambert_w_tail( shift = NULL, scale = NULL, tailweight = NULL, validate_args = FALSE, name = "lambertw_tail" )
shift |
Floating point tensor; the shift for centering (uncentering) the input (output) random variable(s). |
scale |
Floating point tensor; the scaling (unscaling) of the input (output) random variable(s). Must contain only positive values. |
tailweight |
Floating point tensor; the tail behaviors of the output random variable(s). Must contain only non-negative values. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
This bijector defines the transformation underlying Lambert W x F distributions that transform an input random variable to an output random variable with heavier tails. It is defined as
Y = (U * exp(0.5 * tail * U^2)) * scale + shift, tail >= 0
where U = (X - shift) / scale is a shifted/scaled input random variable, and tail >= 0 is the tail parameter.
Attributes:
shift: shift to center (uncenter) the input data.
scale: scale to normalize (de-normalize) the input data.
tailweight: Tail parameter delta of heavy-tail transformation; must be >= 0.
a bijector instance.
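A hedged illustration of the formula above (parameter values chosen arbitrarily); with shift = 0 and scale = 1 the forward pass reduces to U * exp(0.5 * tail * U^2):
b <- tfb_lambert_w_tail(shift = 0, scale = 1, tailweight = 0.5)
b %>% tfb_forward(1)   # exp(0.5 * 0.5 * 1^2) = exp(0.25), approximately 1.284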
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
This will be wrapped in a make_template to ensure the variables are only created once. It takes the input and returns the loc ("mu" in Germain et al. (2015)) and log_scale ("alpha" in Germain et al. (2015)) from the MADE network.
tfb_masked_autoregressive_default_template( hidden_layers, shift_only = FALSE, activation = tf$nn$relu, log_scale_min_clip = -5, log_scale_max_clip = 3, log_scale_clip_gradient = FALSE, name = NULL, ... )
hidden_layers |
list-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: |
shift_only |
logical indicating if only the shift term shall be computed. Default: FALSE. |
activation |
Activation function (callable). Explicitly setting to NULL implies a linear activation. |
log_scale_min_clip |
float-like scalar Tensor, or a Tensor with the same shape as log_scale. The minimum value to clip by. Default: -5. |
log_scale_max_clip |
float-like scalar Tensor, or a Tensor with the same shape as log_scale. The maximum value to clip by. Default: 3. |
log_scale_clip_gradient |
logical indicating that the gradient of tf$clip_by_value should be preserved. Default: FALSE. |
name |
A name for ops managed by this function. Default: "tfb_masked_autoregressive_default_template". |
... |
|
Warning: This function uses masked_dense to create randomly initialized
tf$Variables
. It is presumed that these will be fit, just as you would any
other neural architecture which uses tf$layers$dense
.
About Hidden Layers
Each element of hidden_layers should be greater than the input_depth
(i.e., input_depth = tf$shape(input)[-1]
where input is the input to the
neural network). This is necessary to ensure the autoregressivity property.
About Clipping
This function also optionally clips the log_scale (but possibly not its
gradient). This is useful because if log_scale is too small/large it might
underflow/overflow making it impossible for the MaskedAutoregressiveFlow
bijector to implement a bijection. Additionally, the log_scale_clip_gradient
bool indicates whether the gradient should also be clipped. The default does
not clip the gradient; this is useful because it still provides gradient
information (for fitting) yet solves the numerical stability problem. I.e.,
log_scale_clip_gradient = FALSE means grad[exp(clip(x))] = grad[x] exp(clip(x))
rather than the usual grad[clip(x)] exp(clip(x))
.
list of:
shift: Float-like Tensor of shift terms
log_scale: Float-like Tensor of log(scale) terms
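A minimal construction sketch (layer sizes are illustrative, and the template-based helpers presuppose a graph-mode, tf$layers-style setup):
made <- tfb_masked_autoregressive_default_template(hidden_layers = c(512, 512))
maf <- tfb_masked_autoregressive_flow(shift_and_log_scale_fn = made)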
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
The affine autoregressive flow (Papamakarios et al., 2016) provides a relatively simple framework for user-specified (deep) architectures to learn a distribution over continuous events. Regarding terminology,
tfb_masked_autoregressive_flow( shift_and_log_scale_fn, is_constant_jacobian = FALSE, unroll_loop = FALSE, event_ndims = 1L, validate_args = FALSE, name = NULL )
shift_and_log_scale_fn |
Function which computes shift and log_scale from both the
forward domain (x) and the inverse domain (y).
Calculation must respect the "autoregressive property". Suggested default:
tfb_masked_autoregressive_default_template(hidden_layers=...).
Typically the function contains |
is_constant_jacobian |
Logical, default: FALSE. When TRUE the implementation assumes log_scale does not depend on the forward domain (x) or inverse domain (y) values. (No validation is made; is_constant_jacobian=FALSE is always safe but possibly computationally inefficient.) |
unroll_loop |
Logical indicating whether the |
event_ndims |
integer, the intrinsic dimensionality of this bijector.
1 corresponds to a simple vector autoregressive bijector as implemented by the
|
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
"Autoregressive models decompose the joint density as a product of conditionals, and model each conditional in turn. Normalizing flows transform a base density (e.g. a standard Gaussian) into the target density by an invertible transformation with tractable Jacobian." (Papamakarios et al., 2016)
In other words, the "autoregressive property" is equivalent to the
decomposition, p(x) = prod{ p(x[perm[i]] | x[perm[0:i]]) : i=0, ..., d }
where perm is some permutation of {0, ..., d}
. In the simple case where
the permutation is identity this reduces to:
p(x) = prod{ p(x[i] | x[0:i]) : i=0, ..., d }
. The provided
shift_and_log_scale_fn, tfb_masked_autoregressive_default_template, achieves
this property by zeroing out weights in its masked_dense layers.
In TensorFlow Probability, "normalizing flows" are implemented as
tfp.bijectors.Bijectors. The forward "autoregression" is implemented
using a tf.while_loop and a deep neural network (DNN) with masked weights
such that the autoregressive property is automatically met in the inverse.
A TransformedDistribution using MaskedAutoregressiveFlow(...) uses the
(expensive) forward-mode calculation to draw samples and the (cheap)
reverse-mode calculation to compute log-probabilities. Conversely, a
TransformedDistribution using Invert(MaskedAutoregressiveFlow(...)) uses
the (expensive) forward-mode calculation to compute log-probabilities and the
(cheap) reverse-mode calculation to compute samples.
Given a shift_and_log_scale_fn, the forward and inverse transformations are (a sequence of) affine transformations. A "valid" shift_and_log_scale_fn must compute each shift (aka loc or "mu" in Germain et al. (2015)) and log(scale) (aka "alpha" in Germain et al. (2015)) such that each are broadcastable with the arguments to forward and inverse, i.e., such that the calculations in forward, inverse below are possible.
For convenience, tfb_masked_autoregressive_default_template is offered as a possible shift_and_log_scale_fn function. It implements the MADE architecture (Germain et al., 2015). MADE is a feed-forward network that computes a shift and log(scale) using masked_dense layers in a deep neural network. Weights are masked to ensure the autoregressive property. It is possible that this architecture is suboptimal for your task. To build alternative networks, either change the arguments to tfb_masked_autoregressive_default_template, use the masked_dense function to roll-out your own, or use some other architecture, e.g., using tf.layers. Warning: no attempt is made to validate that the shift_and_log_scale_fn enforces the "autoregressive property".
Assuming shift_and_log_scale_fn has valid shape and autoregressive semantics, the forward transformation is
def forward(x):
  y = zeros_like(x)
  event_size = x.shape[-event_dims:].num_elements()
  for _ in range(event_size):
    shift, log_scale = shift_and_log_scale_fn(y)
    y = x * tf.exp(log_scale) + shift
  return y
and the inverse transformation is
def inverse(y):
  shift, log_scale = shift_and_log_scale_fn(y)
  return (y - shift) / tf.exp(log_scale)
Notice that the inverse does not need a for-loop. This is because in the forward pass each calculation of shift and log_scale is based on the y calculated so far (not x). In the inverse, the y is fully known, thus is equivalent to the scaling used in forward after event_size passes, i.e., the "last" y used to compute shift, log_scale. (Roughly speaking, this also proves the transform is bijective.)
a bijector instance.
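A hedged sketch of the TransformedDistribution pattern described above; tfd_transformed_distribution(), tfd_multivariate_normal_diag(), tfd_sample() and tfd_log_prob() come from elsewhere in this package, and the hidden-layer sizes are illustrative:
maf <- tfb_masked_autoregressive_flow(
  shift_and_log_scale_fn = tfb_masked_autoregressive_default_template(
    hidden_layers = c(32, 32)
  )
)
dist <- tfd_transformed_distribution(
  distribution = tfd_multivariate_normal_diag(loc = c(0, 0)),
  bijector = maf
)
x <- dist %>% tfd_sample(5)   # sampling uses the (expensive, sequential) forward pass
dist %>% tfd_log_prob(x)      # log-probabilities use the (cheap, parallel) inverse pass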
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Analogous to tf$layers$dense
.
tfb_masked_dense( inputs, units, num_blocks = NULL, exclusive = FALSE, kernel_initializer = NULL, reuse = NULL, name = NULL, ... )
inputs |
Tensor input. |
units |
integer scalar representing the dimensionality of the output space. |
num_blocks |
integer scalar representing the number of blocks for the MADE masks. |
exclusive |
logical scalar representing whether to zero the diagonal of the mask, used for the first layer of a MADE. |
kernel_initializer |
Initializer function for the weight matrix.
If NULL (default), weights are initialized using the |
reuse |
logical scalar representing whether to reuse the weights of a previous layer by the same name. |
name |
string used to describe ops managed by this function. |
... |
|
See Germain et al. (2015) for a detailed explanation.
a tensor
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
g(L) = inv(L), where L is a lower-triangular matrix.
L must be nonsingular; equivalently, all diagonal entries of L must be nonzero.
The input must have rank >= 2. The input is treated as a batch of matrices
with batch shape input.shape[:-2]
, where each matrix has dimensions
input.shape[-2]
by input.shape[-1]
(hence input.shape[-2]
must equal input.shape[-1]
).
tfb_matrix_inverse_tri_l(validate_args = FALSE, name = "matrix_inverse_tril")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
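A minimal sketch, using a small lower-triangular matrix chosen for illustration:
b <- tfb_matrix_inverse_tri_l()
L <- matrix(c(0.5, 0, 2, 1), nrow = 2, byrow = TRUE)   # lower triangular, nonsingular
b %>% tfb_forward(L)   # the matrix inverse of L, i.e. solve(L)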
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
This bijector is identical to the "Convolution1x1" used in Glow (Kingma and Dhariwal, 2018).
tfb_matvec_lu(lower_upper, permutation, validate_args = FALSE, name = NULL)
lower_upper |
The LU factorization as returned by |
permutation |
The LU factorization permutation as returned by |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Warning: this bijector never verifies that the scale matrix (as parameterized by the LU decomposition) is invertible. Ensuring this is the case is the caller's responsibility.
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X) = NormalCDF(x)
This bijector maps inputs from [-inf, inf]
to [0, 1]
. The inverse of the
bijector applied to a uniform random variable X ~ U(0, 1)
gives back a
random variable with the Normal distribution:
tfb_normal_cdf(validate_args = FALSE, name = "normal")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Y ~ Normal(0, 1)
pdf(y; 0., 1.) = 1 / sqrt(2 * pi) * exp(-y ** 2 / 2)
a bijector instance.
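A minimal sketch; the forward pass is the standard normal CDF and the inverse is its quantile function:
b <- tfb_normal_cdf()
b %>% tfb_forward(0)       # 0.5
b %>% tfb_inverse(0.975)   # approximately 1.96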
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Both the domain and the codomain of the mapping are [-inf, inf]; however, the input of the forward mapping must be strictly increasing.
The inverse of the bijector applied to a normal random vector y ~ N(0, 1) gives back a sorted random vector with the same distribution x ~ N(0, 1), where x = sort(y).
tfb_ordered(validate_args = FALSE, name = "ordered")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
On the last dimension of the tensor, Ordered bijector performs:
y[0] = x[0]
y[1:] = tf$log(x[1:] - x[:-1])
a bijector instance.
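A minimal sketch of the mapping above; the forward pass takes a strictly increasing vector to an unconstrained one, and the inverse undoes it:
b <- tfb_ordered()
x <- c(2, 3, 4)             # strictly increasing
y <- b %>% tfb_forward(x)   # c(2, log(1), log(1)) = c(2, 0, 0)
b %>% tfb_inverse(y)        # recovers c(2, 3, 4)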
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Pads at least one of the event_shape dimensions of a Tensor.
The semantics of tfb_pad generally follow those of tf$pad(), except that tfb_pad's paddings argument applies to the rightmost dimensions. Additionally, the new argument axis enables overriding the dimensions to which paddings is applied. Like paddings, the axis argument is also relative to the rightmost dimension and must therefore be negative.
The argument paddings is a vector of integer pairs, each representing the number of left and/or right constant_values to pad to the corresponding rightmost dimensions. That is, unless axis is specified, specifying k different paddings means the rightmost k dimensions will be "grown" by the sum of the respective paddings row. When axis is specified, it indicates the dimension to which the corresponding paddings element is applied. By default axis is NULL, which means it is logically equivalent to range(start = -len(paddings), limit = 0), i.e., the rightmost dimensions.
tfb_pad( paddings = list(c(0, 1)), mode = "CONSTANT", constant_values = 0, axis = NULL, validate_args = FALSE, name = NULL )
paddings |
A vector-shaped |
mode |
One of |
constant_values |
In "CONSTANT" mode, the scalar pad value to use. Must be
same type as |
axis |
The dimensions for which |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
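A minimal sketch using the default paddings = list(c(0, 1)), which appends a single constant value on the right of the rightmost dimension:
b <- tfb_pad()
b %>% tfb_forward(c(1, 2))   # c(1, 2, 0)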
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Permutes the rightmost dimension of a Tensor
tfb_permute(permutation, axis = -1L, validate_args = FALSE, name = NULL)
permutation |
An integer-like vector-shaped Tensor representing the permutation to apply to the axis dimension of the transformed Tensor. |
axis |
Scalar integer Tensor representing the dimension over which to tf$gather. axis must be relative to the end (reading left to right) thus must be negative. Default value: -1 (i.e., right-most). |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
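A minimal sketch; note that the permutation uses 0-based indices, as it is passed through to TensorFlow:
b <- tfb_permute(permutation = c(1L, 2L, 0L))
b %>% tfb_forward(c(10, 20, 30))   # c(20, 30, 10)
b %>% tfb_inverse(c(20, 30, 10))   # c(10, 20, 30)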
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X) = (1 + X * c)**(1 / c)
, where X >= -1 / c
The power transform maps
inputs from [0, inf]
to [-1/c, inf]
; this is equivalent to the inverse of this bijector.
This bijector is equivalent to the Exp bijector when c=0.
tfb_power_transform(power, validate_args = FALSE, name = "power_transform")
power |
float scalar indicating the transform power, i.e.,
|
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
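A minimal sketch of the formula above with c = 0.5 (value chosen for illustration):
b <- tfb_power_transform(power = 0.5)
b %>% tfb_forward(2)   # (1 + 2 * 0.5)^(1 / 0.5) = 4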
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
This transformation represents a monotonically increasing piecewise rational
quadratic function. Outside of the bounds of knot_x
/knot_y
, the transform
behaves as an identity function.
tfb_rational_quadratic_spline( bin_widths, bin_heights, knot_slopes, range_min = -1, validate_args = FALSE, name = NULL )
bin_widths |
The widths of the spans between subsequent knot |
bin_heights |
The heights of the spans between subsequent knot |
knot_slopes |
The slope of the spline at each knot, a floating point
|
range_min |
The |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Typically this bijector will be used as part of a chain, with splines for
trailing x
dimensions conditioned on some of the earlier x
dimensions, and
with the inverse then solved first for unconditioned dimensions, then using
conditioning derived from those inverses, and so forth.
For each argument, the innermost axis indexes bins/knots and batch axes
index axes of x
/y
spaces. A RationalQuadraticSpline
with a separate
transform for each of three dimensions might have bin_widths
shaped
[3, 32]
. To use the same spline for each of x
's three dimensions we may
broadcast against x
and use a bin_widths
parameter shaped [32]
.
Parameters will be broadcast against each other and against the input
x
/y
s, so if we want fixed slopes, we can use kwarg knot_slopes=1
.
A typical recipe for acquiring compatible bin widths and heights would be:
nbins <- unconstrained_vector$shape[-1]
range_min <- -1
range_max <- 1
min_bin_size <- 1e-2
scale <- range_max - range_min - nbins * min_bin_size
bin_widths <- tf$math$softmax(unconstrained_vector) * scale + min_bin_size
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Y = g(X) = 1 - exp( -(X/scale)**2 / 2 ), X >= 0.
This bijector maps inputs from [0, inf] to [0, 1]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Rayleigh distribution:
Y ~ Rayleigh(scale)
pdf(y; scale, y >= 0) = (1 / scale) * (y / scale) * exp(-(y / scale)**2 / 2)
tfb_rayleigh_cdf(scale, validate_args = FALSE, name = "rayleigh_cdf")
scale |
Positive floating-point tensor.
This is |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Likewise, the forward of this bijector is the Rayleigh distribution CDF.
a bijector instance.
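A minimal sketch of the CDF above with scale = 1:
b <- tfb_rayleigh_cdf(scale = 1)
b %>% tfb_forward(1)   # 1 - exp(-0.5), approximately 0.393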
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
Real NVP models a normalizing flow on a D-dimensional distribution via a
single (D-d)-dimensional conditional distribution (Dinh et al., 2017):
y[d:D] = x[d:D] * tf.exp(log_scale_fn(x[0:d])) + shift_fn(x[0:d])
y[0:d] = x[0:d]
The last D-d units are scaled and shifted based on the first d units only,
while the first d units are 'masked' and left unchanged. Real NVP's
shift_and_log_scale_fn computes vector-valued quantities.
For scale-and-shift transforms that do not depend on any masked units, i.e.
d=0, use the tfb_affine bijector with learned parameters instead.
Masking is currently only supported for base distributions with
event_ndims=1. For more sophisticated masking schemes like checkerboard or
channel-wise masking (Papamakarios et al., 2016), use the tfb_permute
bijector to re-order desired masked units into the first d units. For base
distributions with event_ndims > 1, use the tfb_reshape bijector to
flatten the event shape.
tfb_real_nvp( num_masked, shift_and_log_scale_fn, is_constant_jacobian = FALSE, validate_args = FALSE, name = NULL )
num_masked |
integer indicating that the first d units of the event
should be masked. Must be in the closed interval |
shift_and_log_scale_fn |
Function which computes shift and log_scale from both the
forward domain (x) and the inverse domain (y).
Calculation must respect the "autoregressive property". Suggested default:
|
is_constant_jacobian |
Logical, default: FALSE. When TRUE the implementation assumes log_scale does not depend on the forward domain (x) or inverse domain (y) values. (No validation is made; is_constant_jacobian=FALSE is always safe but possibly computationally inefficient.) |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Recall that the MAF bijector (Papamakarios et al., 2016) implements a normalizing flow via an autoregressive transformation. MAF and IAF have opposite computational tradeoffs: MAF can train all units in parallel but must sample units sequentially, while IAF must train units sequentially but can sample in parallel. In contrast, Real NVP can compute both forward and inverse computations in parallel. However, the lack of an autoregressive transformation makes it less expressive on a per-bijector basis.
A "valid" shift_and_log_scale_fn must compute each shift (aka loc or "mu" in Papamakarios et al. (2016) and log(scale) (aka "alpha" in Papamakarios et al. (2016)) such that each are broadcastable with the arguments to forward and inverse, i.e., such that the calculations in forward, inverse below are possible. For convenience, real_nvp_default_nvp is offered as a possible shift_and_log_scale_fn function.
NICE (Dinh et al., 2014) is a special case of the Real NVP bijector which discards the scale transformation, resulting in a constant-time inverse-log-determinant-Jacobian. To use a NICE bijector instead of Real NVP, shift_and_log_scale_fn should return (shift, NULL), and is_constant_jacobian should be set to TRUE in the RealNVP constructor. Calling tfb_real_nvp_default_template with shift_only=TRUE returns one such NICE-compatible shift_and_log_scale_fn.
Caching: the scalar input depth D of the base distribution is not known at
construction time. The first call to any of forward(x), inverse(x),
inverse_log_det_jacobian(x), or forward_log_det_jacobian(x) memoizes
D, which is re-used in subsequent calls. This shape must be known prior to
graph execution (which is the case if using tf$layers
).
a bijector instance.
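A hedged sketch of the pattern above (hidden-layer sizes and the event dimension are illustrative; tfd_transformed_distribution(), tfd_multivariate_normal_diag() and tfd_sample() come from elsewhere in this package):
nvp <- tfb_real_nvp(
  num_masked = 2,
  shift_and_log_scale_fn = tfb_real_nvp_default_template(hidden_layers = c(256, 256))
)
dist <- tfd_transformed_distribution(
  distribution = tfd_multivariate_normal_diag(loc = c(0, 0, 0)),
  bijector = nvp
)
dist %>% tfd_sample(3)   # the first 2 of the 3 event dimensions are masked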
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
This will be wrapped in a make_template to ensure the variables are only
created once. It takes the d-dimensional input x[0:d]
and returns the (D-d)-dimensional outputs loc ("mu") and log_scale ("alpha").
tfb_real_nvp_default_template( hidden_layers, shift_only = FALSE, activation = tf$nn$relu, name = NULL, ... )
hidden_layers |
list-like of non-negative integer scalars indicating the number of units in each hidden layer. Default: |
shift_only |
logical indicating if only the shift term shall be computed (i.e. NICE bijector). Default: FALSE. |
activation |
Activation function (callable). Explicitly setting to NULL implies a linear activation. |
name |
A name for ops managed by this function. Default: "tfb_real_nvp_default_template". |
... |
tf$layers$dense arguments |
The default template does not support conditioning and will raise an exception if condition_kwargs are passed to it. To use conditioning with the Real NVP bijector, implement a conditioned shift/scale template that handles the condition_kwargs.
list of:
shift: Float-like Tensor of shift terms
log_scale: Float-like Tensor of log(scale) terms
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp()
,
tfb_reciprocal()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
b(x) = 1. / x
A Bijector that computes b(x) = 1. / x
tfb_reciprocal(validate_args = FALSE, name = "reciprocal")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward()
, tfb_inverse()
, tfb_inverse_log_det_jacobian()
.
Other bijectors:
tfb_absolute_value()
,
tfb_affine_linear_operator()
,
tfb_affine_scalar()
,
tfb_affine()
,
tfb_ascending()
,
tfb_batch_normalization()
,
tfb_blockwise()
,
tfb_chain()
,
tfb_cholesky_outer_product()
,
tfb_cholesky_to_inv_cholesky()
,
tfb_correlation_cholesky()
,
tfb_cumsum()
,
tfb_discrete_cosine_transform()
,
tfb_expm1()
,
tfb_exp()
,
tfb_ffjord()
,
tfb_fill_scale_tri_l()
,
tfb_fill_triangular()
,
tfb_glow()
,
tfb_gompertz_cdf()
,
tfb_gumbel_cdf()
,
tfb_gumbel()
,
tfb_identity()
,
tfb_inline()
,
tfb_invert()
,
tfb_iterated_sigmoid_centered()
,
tfb_kumaraswamy_cdf()
,
tfb_kumaraswamy()
,
tfb_lambert_w_tail()
,
tfb_masked_autoregressive_default_template()
,
tfb_masked_autoregressive_flow()
,
tfb_masked_dense()
,
tfb_matrix_inverse_tri_l()
,
tfb_matvec_lu()
,
tfb_normal_cdf()
,
tfb_ordered()
,
tfb_pad()
,
tfb_permute()
,
tfb_power_transform()
,
tfb_rational_quadratic_spline()
,
tfb_rayleigh_cdf()
,
tfb_real_nvp_default_template()
,
tfb_real_nvp()
,
tfb_reshape()
,
tfb_scale_matvec_diag()
,
tfb_scale_matvec_linear_operator()
,
tfb_scale_matvec_lu()
,
tfb_scale_matvec_tri_l()
,
tfb_scale_tri_l()
,
tfb_scale()
,
tfb_shifted_gompertz_cdf()
,
tfb_shift()
,
tfb_sigmoid()
,
tfb_sinh_arcsinh()
,
tfb_sinh()
,
tfb_softmax_centered()
,
tfb_softplus()
,
tfb_softsign()
,
tfb_split()
,
tfb_square()
,
tfb_tanh()
,
tfb_transform_diagonal()
,
tfb_transpose()
,
tfb_weibull_cdf()
,
tfb_weibull()
The semantics generally follow that of tf$reshape()
, with a few differences:
The user must provide both the input and output shape, so that
the transformation can be inverted. If an input shape is not
specified, the default assumes a vector-shaped input, i.e.,
event_shape_in = list(-1)
.
The Reshape bijector automatically broadcasts over the leftmost
dimensions of its input (sample_shape and batch_shape); only
the rightmost event_ndims_in dimensions are reshaped. The
number of dimensions to reshape is inferred from the provided
event_shape_in (event_ndims_in = length(event_shape_in))
.
tfb_reshape( event_shape_out, event_shape_in = c(-1), validate_args = FALSE, name = NULL )
event_shape_out |
An integer-like vector-shaped Tensor representing the event shape of the transformed output. |
event_shape_in |
An optional integer-like vector-shape Tensor representing the event shape of the input. This is required in order to define inverse operations; the default of list(-1) assumes a vector-shaped input. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
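A minimal sketch; with the default vector-shaped input, the forward pass reshapes a length-4 vector into a 2 x 2 matrix:
b <- tfb_reshape(event_shape_out = c(2, 2))
b %>% tfb_forward(c(1, 2, 3, 4))   # rbind(c(1, 2), c(3, 4))
b %>% tfb_inverse(matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE))   # back to c(1, 2, 3, 4)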
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Y = g(X; scale) = scale * X.

Examples:

# Y = 2 * X
b <- tfb_scale(scale = 2)
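Expanding on that, a minimal sketch of forward and inverse application (not from the reference page; the input dtype is made explicit so that it matches the scalar scale parameter):

library(tensorflow)
library(tfprobability)
b <- tfb_scale(scale = 2)                         # Y = 2 * X
x <- tf$constant(c(1, 2, 3), dtype = tf$float32)
tfb_forward(b, x)                                 # 2, 4, 6
tfb_inverse(b, tfb_forward(b, x))                 # recovers 1, 2, 3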
tfb_scale( scale = NULL, log_scale = NULL, validate_args = FALSE, name = "scale" )
scale |
Floating-point |
log_scale |
Floating-point |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Y = g(X; scale) = scale @ X

In TF parlance, the scale term is logically equivalent to scale = tf$diag(scale_diag). The scale term is applied without materializing a full dense matrix.
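For illustration, a small sketch of assumed usage (not taken from the reference page):

library(tfprobability)
# diagonal scale with entries 2 and 3
b <- tfb_scale_matvec_diag(scale_diag = c(2, 3))
tfb_forward(b, c(1, 1))   # matvec with diag(2, 3): c(2, 3)
tfb_inverse(b, c(2, 3))   # back to c(1, 1)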
tfb_scale_matvec_diag( scale_diag, adjoint = FALSE, validate_args = FALSE, name = "scale_matvec_diag", dtype = NULL )
scale_diag |
Floating-point |
adjoint |
|
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
dtype |
|
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Y = g(X; scale) = scale @ X, where scale is a LinearOperator. If X is a scalar then the forward transformation is scale * X, where * denotes broadcasted elementwise product.
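A hedged sketch using a diagonal LinearOperator (constructing the operator via tf$linalg$LinearOperatorDiag is an assumption about typical usage, not taken from this page):

library(tensorflow)
library(tfprobability)
op <- tf$linalg$LinearOperatorDiag(diag = c(2, 3))
b <- tfb_scale_matvec_linear_operator(scale = op)
tfb_forward(b, c(1, 1))   # c(2, 3)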
tfb_scale_matvec_linear_operator( scale, adjoint = FALSE, validate_args = FALSE, name = "scale_matvec_linear_operator" )
scale |
Subclass of |
adjoint |
|
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
This bijector is identical to the "Convolution1x1" used in Glow (Kingma and Dhariwal, 2018).
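A sketch of the intended workflow, assuming the LU factors come from tf$linalg$lu (indexing the returned pair with [[1]] and [[2]] is an assumption about how the Python tuple converts to an R list):

library(tensorflow)
library(tfprobability)
m  <- tf$constant(matrix(c(2, 1, 0, 3), nrow = 2, byrow = TRUE))
lu <- tf$linalg$lu(m)   # packed LU factors and the permutation
b  <- tfb_scale_matvec_lu(lower_upper = lu[[1]], permutation = lu[[2]])
tfb_forward(b, c(1, 1)) # same as m %*% c(1, 1), i.e. c(3, 3)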
tfb_scale_matvec_lu( lower_upper, permutation, validate_args = FALSE, name = NULL )
lower_upper |
The LU factorization as returned by |
permutation |
The LU factorization permutation as returned by |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Y = g(X; scale) = scale @ X. The scale term is presumed lower-triangular and non-singular (i.e., no zeros on the diagonal), which permits efficient determinant calculation (linear in the matrix dimension, instead of cubic).
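A small illustrative sketch of assumed usage (not from the reference page):

library(tfprobability)
# lower-triangular scale [[2, 0], [1, 3]]
b <- tfb_scale_matvec_tri_l(scale_tril = matrix(c(2, 0, 1, 3), nrow = 2, byrow = TRUE))
tfb_forward(b, c(1, 1))   # lower-triangular matvec: c(2, 4)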
tfb_scale_matvec_tri_l( scale_tril, adjoint = FALSE, validate_args = FALSE, name = "scale_matvec_tril", dtype = NULL )
scale_tril |
Floating-point |
adjoint |
|
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
dtype |
|
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
This is implemented as a simple tfb_chain of tfb_fill_triangular followed by tfb_transform_diagonal, and provided mostly as a convenience. The default setup is somewhat opinionated, using a Softplus transformation followed by a small shift (1e-5) which attempts to avoid numerical issues from zeros on the diagonal.
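A minimal sketch of assumed usage (the input is given as float32 so that it matches the dtype of the default 1e-5 diagonal shift):

library(tensorflow)
library(tfprobability)
b <- tfb_scale_tri_l()
x <- tf$constant(c(-0.5, 0, 0.5), dtype = tf$float32)  # 3 = 2 * (2 + 1) / 2 entries
tfb_forward(b, x)  # a 2 x 2 lower-triangular matrix with a positive (softplus-transformed) diagonal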
tfb_scale_tri_l( diag_bijector = NULL, diag_shift = 1e-05, validate_args = FALSE, name = "scale_tril" )
diag_bijector |
Bijector instance, used to transform the output diagonal to be positive.
Default value: NULL (i.e., |
diag_shift |
Float value broadcastable and added to all diagonal entries after applying the diag_bijector. Setting a positive value forces the output diagonal entries to be positive, but prevents inverting the transformation for matrices with diagonal entries less than this value. Default value: 1e-5. |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Y = g(X; shift) = X + shift, where shift is a numeric Tensor.
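For example (an illustrative sketch, not from the reference page):

library(tfprobability)
b <- tfb_shift(shift = c(1, 10))
tfb_forward(b, c(0, 0))    # c(1, 10)
tfb_inverse(b, c(1, 10))   # c(0, 0)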
tfb_shift(shift, validate_args = FALSE, name = "shift")
shift |
floating-point tensor |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Y = g(X) = (1 - exp(-rate * X)) * exp(-c * exp(-rate * X))

This bijector maps inputs from [-inf, inf] to [0, inf]. The inverse of the bijector applied to a uniform random variable X ~ U(0, 1) gives back a random variable with the Shifted Gompertz distribution:

Y ~ ShiftedGompertzCDF(concentration, rate)
pdf(y; c, r) = r * exp(-r * y - exp(-r * y) / c) * (1 + (1 - exp(-r * y)) / c)
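A small sketch of forward evaluation (assumed usage; the input is given as float32 so that it matches the scalar parameters):

library(tensorflow)
library(tfprobability)
b <- tfb_shifted_gompertz_cdf(concentration = 0.5, rate = 1)
x <- tf$constant(c(0.1, 1, 10), dtype = tf$float32)
tfb_forward(b, x)   # values increase towards 1 as x grows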
tfb_shifted_gompertz_cdf( concentration, rate, validate_args = FALSE, name = "shifted_gompertz_cdf" )
concentration |
Positive Float-like |
rate |
Positive Float-like |
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
Note: Even though this is called ShiftedGompertzCDF, when applied to the Uniform distribution, this is not the same as applying a GompertzCDF with a Shift bijector (i.e. the Shifted Gompertz distribution is not the same as a Gompertz distribution with a location parameter).

Note: Because the Shifted Gompertz distribution concentrates its mass close to zero, for larger rates or larger concentrations, bijector$forward will quickly saturate to 1.
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Computes Y = g(X) = 1 / (1 + exp(-X)).
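For example (an illustrative sketch, not from the reference page):

library(tfprobability)
b <- tfb_sigmoid()
tfb_forward(b, c(-2, 0, 2))          # approx 0.12, 0.50, 0.88
tfb_inverse(b, c(0.12, 0.5, 0.88))   # approximately recovers -2, 0, 2 (the logit)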
tfb_sigmoid(validate_args = FALSE, name = "sigmoid")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sinh_arcsinh(), tfb_sinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Bijector that computes Y = sinh(X).
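For example (an illustrative sketch, not from the reference page):

library(tfprobability)
b <- tfb_sinh()
tfb_forward(b, c(-1, 0, 1))                  # approx -1.175, 0, 1.175
tfb_inverse(b, tfb_forward(b, c(-1, 0, 1)))  # recovers -1, 0, 1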
tfb_sinh(validate_args = FALSE, name = "sinh")
validate_args |
Logical, default FALSE. Whether to validate input with asserts. If validate_args is FALSE, and the inputs are invalid, correct behavior is not guaranteed. |
name |
name prefixed to Ops created by this class. |
a bijector instance.
For usage examples see tfb_forward(), tfb_inverse(), tfb_inverse_log_det_jacobian().

Other bijectors:
tfb_absolute_value(), tfb_affine_linear_operator(), tfb_affine_scalar(), tfb_affine(), tfb_ascending(), tfb_batch_normalization(), tfb_blockwise(), tfb_chain(), tfb_cholesky_outer_product(), tfb_cholesky_to_inv_cholesky(), tfb_correlation_cholesky(), tfb_cumsum(), tfb_discrete_cosine_transform(), tfb_expm1(), tfb_exp(), tfb_ffjord(), tfb_fill_scale_tri_l(), tfb_fill_triangular(), tfb_glow(), tfb_gompertz_cdf(), tfb_gumbel_cdf(), tfb_gumbel(), tfb_identity(), tfb_inline(), tfb_invert(), tfb_iterated_sigmoid_centered(), tfb_kumaraswamy_cdf(), tfb_kumaraswamy(), tfb_lambert_w_tail(), tfb_masked_autoregressive_default_template(), tfb_masked_autoregressive_flow(), tfb_masked_dense(), tfb_matrix_inverse_tri_l(), tfb_matvec_lu(), tfb_normal_cdf(), tfb_ordered(), tfb_pad(), tfb_permute(), tfb_power_transform(), tfb_rational_quadratic_spline(), tfb_rayleigh_cdf(), tfb_real_nvp_default_template(), tfb_real_nvp(), tfb_reciprocal(), tfb_reshape(), tfb_scale_matvec_diag(), tfb_scale_matvec_linear_operator(), tfb_scale_matvec_lu(), tfb_scale_matvec_tri_l(), tfb_scale_tri_l(), tfb_scale(), tfb_shifted_gompertz_cdf(), tfb_shift(), tfb_sigmoid(), tfb_sinh_arcsinh(), tfb_softmax_centered(), tfb_softplus(), tfb_softsign(), tfb_split(), tfb_square(), tfb_tanh(), tfb_transform_diagonal(), tfb_transpose(), tfb_weibull_cdf(), tfb_weibull()
Y = g(X) = Sinh( (Arcsinh(X) + skewness) * tailweight )

For skewness in (-inf, inf) and tailweight in (0, inf), this transformation is a diffeomorphism of the real line (-inf, inf). The inverse transform is X = g^{-1}(Y) = Sinh( Arcsinh(Y) / tailweight - skewness ).

The SinhArcsinh transformation of the Normal is described in Sinh-arcsinh distributions.
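A minimal sketch of assumed usage (the input is given as float32 so that it matches the scalar skewness and tailweight parameters):

library(tensorflow)
library(tfprobability)
b <- tfb_sinh_arcsinh(skewness = 0.5, tailweight = 1.5)
x <- tf$constant(c(-1, 0, 1), dtype = tf$float32)
tfb_forward(b, x)                  # skewed, heavier-tailed transform of x
tfb_inverse(b, tfb_forward(b, x))  # recovers x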
tfb_sinh_arcsinh( skewness = NULL, tailweight = NULL, validate_args = FALSE, name = "SinhArcsinh" )
skewness |
Skewness parameter. Float-type Ten |