Basic Layers

Linear Layer

class TFNetworkLayer.LinearLayer(activation, with_bias=True, grad_filter=None, forward_weights_init='glorot_uniform', bias_init=0.0, use_transposed_weights=False, **kwargs)[source]

Linear/forward/fully-connected/1x1-conv layer. Does a linear transformation on the feature-dimension of the input with an optional bias term and an optional activation function. See also DotLayer, ElemwiseProdLayer, WeightedSumLayer.

Parameters:
  • activation (str|None) – e.g. “relu”, or None
  • with_bias (bool) –
  • grad_filter (float|None) – if grad norm is higher than this threshold (before activation), the grad is removed
  • forward_weights_init (str) – see TFUtil.get_initializer()
  • recurrent_weights_init (str) – see TFUtil.get_initializer()
  • bias_init (str|float) – see TFUtil.get_initializer()
  • use_transposed_weights (bool) – If True, define the weight matrix with transposed dimensions (n_out, n_in).
layer_class = 'linear'[source]
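
A minimal network-config sketch (the layer name and the "from" source are illustrative):

    "hidden": {"class": "linear", "activation": "relu", "n_out": 512, "from": ["data"]}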

Copy Layer

class TFNetworkLayer.CopyLayer(extra_deps=(), **kwargs)[source]

This layer does nothing except copy its input. If multiple sources are provided, they are concatenated in the feature-dim.

Parameters:extra_deps (list[LayerBase]) – Just add as an additional dependency, without really using it. This can have an effect though on the search beam, via SelectSearchSourcesLayer. We only have this here for the CopyLayer because the get_out_data_from_opts() must know about it and define the right beam. Also see the option collocate_with, which is different in that it does not add a dependency.
layer_class = 'copy'[source]
get_dep_layers(self)[source]
Return type:list[LayerBase]
classmethod get_out_data_from_opts(name, sources=(), extra_deps=(), out_type=None, n_out=<class 'Util.NotSpecified'>, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • extra_deps (list[LayerBase]) –
  • out_type (dict[str]|None) –
  • n_out (int|None|NotSpecified) –
Return type:Data

classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (TFNetwork.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
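
A minimal sketch using the copy layer to concatenate two sources in the feature-dim (layer names are illustrative):

    "encoder": {"class": "copy", "from": ["lstm_fwd", "lstm_bwd"]}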

Combine Layer

class TFNetworkLayer.CombineLayer(kind, sources, activation=None, with_bias=False, eval=None, eval_locals=None, eval_for_output_loss=False, **kwargs)[source]

Applies some binary operation on all sources, such as addition. Also see ActivationLayer.

Parameters:
  • kind (str) – e.g. “average” or “add”, or “eval”
  • sources (list[LayerBase]) –
  • activation (str|None) – if provided, activation function to apply, e.g. “tanh” or “relu”
  • with_bias (bool) – if True, will add a bias
  • eval (str|callable) – for kind=”eval”, will eval this string or call this function; see _op_kind_eval()
  • eval_locals (dict[str]|None) – locals for eval
  • eval_for_output_loss (bool) – will do the same eval on layer.output_loss
layer_class = 'combine'[source]
classmethod get_out_data_from_opts(n_out=<class 'Util.NotSpecified'>, out_type=None, sources=(), **kwargs)[source]
Parameters:
  • n_out (int|None|NotSpecified) –
  • out_type (dict[str]|None) –
  • sources (list[LayerBase]) –
Return type:Data
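
Two illustrative config sketches, one with kind="add" and one with kind="eval" (layer names and sources are assumptions):

    "residual": {"class": "combine", "kind": "add", "from": ["ff_in", "ff_out"]},
    "scaled": {"class": "combine", "kind": "eval", "eval": "0.5 * source(0) + source(1)", "from": ["a", "b"]}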

Convolution Layer

class TFNetworkLayer.ConvLayer(n_out, filter_size, padding, strides=1, dilation_rate=1, input_expand_dims=0, input_add_feature_dim=False, input_split_feature_dim=None, auto_use_channel_first=False, with_bias=False, activation=None, forward_weights_init='glorot_uniform', bias_init=0.0, **kwargs)[source]

A generic convolution layer which supports 1D, 2D and 3D convolution. Pooling can be done in the separate “pool” layer.

Parameters:
  • n_out (int) – number of outgoing features
  • filter_size (tuple[int]) – (width,), (height,width) or (depth,height,width) for 1D/2D/3D conv. The input data ndim must match, or you can add dimensions via input_expand_dims or input_add_feature_dim. It will automatically swap the batch-dim to the first axis of the input data.
  • padding (str) – “same” or “valid”
  • strides (int|tuple[int]) – strides for the spatial dims, i.e. length of this tuple should be the same as filter_size, or a single int.
  • dilation_rate (int|tuple[int]) – dilation for the spatial dims
  • input_expand_dims (int) – number of dynamic dims to add to the input
  • input_add_feature_dim (bool) – will add a dim at the end and use input-feature-dim == 1, and use the original input feature-dim as a spatial dim.
  • auto_use_channel_first (bool) – convert the input to NCHW or not
  • input_split_feature_dim (None|int) – if set, like input_add_feature_dim it will add a new feature dim which is of value input_split_feature_dim, and the original input feature dim will be divided by input_split_feature_dim, thus it must be a multiple of that value.
  • with_bias (bool) – if True, will add a bias to the output features
  • activation (None|str) – if set, will apply this function at the end
layer_class = 'conv'[source]
recurrent = True[source]
classmethod calc_out_dim(in_dim, filter_size, stride, padding, dilation_rate=1)[source]
Parameters:
  • in_dim (int|tf.Tensor|T) – dimension in some axis
  • filter_size (int) – e.g. 2, for the corresponding axis
  • stride (int) – e.g. 1, for the corresponding axis
  • dilation_rate (int) – e.g. 1
  • padding (str) – “valid” or “same”
Returns:the output dimension
Return type:T

classmethod get_out_data_from_opts(**kwargs)[source]

Via _get_out_type_from_opts().

Return type:Data
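
A minimal 2D convolution sketch (layer name and source are illustrative):

    "conv1": {"class": "conv", "n_out": 32, "filter_size": (3, 3), "padding": "same",
              "activation": "relu", "with_bias": True, "from": ["data"]}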

Pooling Layer

class TFNetworkLayer.PoolLayer(mode, pool_size, padding='VALID', dilation_rate=1, strides=None, use_channel_first=False, **kwargs)[source]

A generic N-D pooling layer. This would usually be done after a convolution for down-sampling.

Parameters:
  • mode (str) – “max” or “avg”
  • pool_size (tuple[int]) – shape of the window of each reduce
  • padding (str) – “valid” or “same”
  • dilation_rate (tuple[int]|int) –
  • strides (tuple[int]|int|None) – in contrast to tf.nn.pool, the default (if it is None) will be set to pool_size
  • use_channel_first (bool) – if set, will transform input to NCHW format
layer_class = 'pool'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, pool_size, strides=None, dilation_rate=1, sources=(), padding='VALID', use_channel_first=False, **kwargs)[source]
Parameters:
  • name (str) –
  • pool_size (tuple[int]|list[int]) –
  • strides (tuple[int]|list[int]|int) –
  • dilation_rate (int|tuple[int]|list[int]) –
  • sources (list[LayerBase]) –
  • padding (str) –
  • use_channel_first (bool) –
Return type:Data
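
A minimal max-pooling sketch, typically placed after a conv layer (names are illustrative):

    "pool1": {"class": "pool", "mode": "max", "pool_size": (2, 2), "from": ["conv1"]}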

Reduce Layer

class TFNetworkLayer.ReduceLayer(mode, axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, use_time_mask=None, **kwargs)[source]

This reduces some axis by using “sum” or “max”. It’s basically a wrapper around tf.reduce_sum or tf.reduce_max.

Parameters:
  • mode (str) – “sum” or “max”, “argmin”, “min”, “argmax”, or “mean”
  • axes (int|list[int]|str) – One axis or multiple axes to reduce. It accepts the special tokens “B”|”batch”, “spatial”, “spatial_except_time”, or “F”|”feature”, and it is strongly recommended to use some of these symbolic names. See Data.get_axes_from_description().
  • axis (int|list[int]|str) – for compatibility, can be used instead of axes
  • keep_dims (bool) – if dimensions should be kept (will be 1)
  • enforce_batch_dim_axis (int) – will swap the batch-dim-axis of the input with the given axis. e.g. 0: will convert the input into batch-major format if not already like that. Note that this is still not enough in some cases, e.g. when the other axes are also not as expected. The strong recommendation is to use a symbolic axis description.
  • use_time_mask (bool) – if we reduce over the time-dim axis, use the seq len info. By default, in that case, it will be True.
layer_class = 'reduce'[source]
classmethod need_enforce_batch_dim_axis(axes)[source]
Parameters:axes (int|list[int]|str) –
Returns:whether any integer is in axes, in which case a fixed dimension layout is required
Return type:bool
classmethod get_axes(axis, input_data)[source]
Parameters:
  • axis – see self.__init__()
  • input_data (Data) –
Returns:list of axes
Return type:list[int]

classmethod get_out_data_from_opts(name, sources, mode='', axes=None, axis=None, keep_dims=False, enforce_batch_dim_axis=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • mode (str) – (default here “” because other code uses this function)
  • axes (str|list[str]|None) –
  • axis (str|None) –
  • keep_dims (bool) –
  • enforce_batch_dim_axis (int|None) –
Return type:Data
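
A minimal sketch that averages over the time axis (names are illustrative):

    "enc_mean": {"class": "reduce", "mode": "mean", "axes": "T", "from": ["encoder"]}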

Reduce-Out Layer

class TFNetworkLayer.ReduceOutLayer(mode, num_pieces, **kwargs)[source]

Combination of SplitDimsLayer applied to the feature dim and ReduceLayer applied to the resulting feature dim. This can e.g. be used to do maxout.

Parameters:
  • mode (str) – “sum” or “max” or “mean”
  • num_pieces (int) – how many elements to reduce. The output dimension will be input.dim // num_pieces.
layer_class = 'reduce_out'[source]
classmethod get_out_data_from_opts(num_pieces, sources, name, **kwargs)[source]
Parameters:
  • num_pieces (int) –
  • sources (list[LayerBase]) –
  • name (str) –
Return type:Data
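
A minimal maxout sketch; with num_pieces=2 the output dim is half the input dim (names are illustrative):

    "maxout": {"class": "reduce_out", "mode": "max", "num_pieces": 2, "from": ["linear1"]}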

Dot Layer

class TFNetworkLayer.DotLayer(red1=-1, red2=-2, var1=-2, var2=-1, add_var2_if_empty=True, debug=False, **kwargs)[source]

This performs a dot-product of two sources. The underlying matmul expects shapes (shared…, I, J) * (shared…, J, K) -> (shared…, I, K). We say that J is the axis to be reduced, I is the var-dim of source 1, and K is the var-dim of source 2. I, J, K can also be multiple axes from the sources. The var-dims don’t need to exist. All other axes (shared…) are expected to match.

Parameters:
  • red1 (str|int|tuple[str|int]|list[str|int]) – reduce axes of first source
  • red2 (str|int|tuple[str|int]|list[str|int]) – reduce axes of second source
  • var1 (str|int|tuple[str|int]|list[str|int]|None) – var axes of first source
  • var2 (str|int|tuple[str|int]|list[str|int]|None) – var axes of second source
  • add_var2_if_empty (bool) – if var2=None, add dim=1 at the end
  • debug (bool) – will print debug shapes, etc.
layer_class = 'dot'[source]
classmethod get_out_data_from_opts(name, sources, red1=-1, red2=-2, var1=-2, var2=-1, add_var2_if_empty=True, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • red1 (str|int|tuple[str|int]|list[str|int]) – reduce axes of first source
  • red2 (str|int|tuple[str|int]|list[str|int]) – reduce axes of second source
  • var1 (str|int|tuple[str|int]|list[str|int]|None) – var axes of first source
  • var2 (str|int|tuple[str|int]|list[str|int]|None) – var axes of second source
  • add_var2_if_empty (bool) –
Return type:Data
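
A minimal sketch relying on the defaults (red1=-1, red2=-2, var1=-2, var2=-1), i.e. a batched matrix product over the last two axes of both sources (names are illustrative):

    "matmul": {"class": "dot", "from": ["a", "b"]}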

Constant Layer

class TFNetworkLayer.ConstantLayer(sources, value=0.0, dtype=None, **kwargs)[source]

Output is a constant value.

Parameters:
  • sources (list[LayerBase]) –
  • value (int|float|bool) –
  • dtype (str|None) –
layer_class = 'constant'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (TFNetwork.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(name, value=0.0, dtype=None, **kwargs)[source]
Parameters:
  • name (str) –
  • value (int|float|bool) –
  • dtype (str|None) –
Return type:Data
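
A minimal sketch; "from": [] is given because the sources are not actually used (name and value are illustrative):

    "zero": {"class": "constant", "value": 0.0, "dtype": "float32", "from": []}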

Variable Layer

class TFNetworkLayer.VariableLayer(shape, dtype='float32', add_batch_axis=True, add_time_axis=False, trainable=True, init=0, **kwargs)[source]

Represents a variable. Can add batch/time dimension if wanted. Can be trainable. See defaults.

Parameters:
  • shape (tuple[int]|list[int]) –
  • dtype (str) –
  • add_batch_axis (bool) –
  • add_time_axis (bool) –
  • trainable (bool) –
  • init (str|float|int) – see TFUtil.get_initializer()
layer_class = 'variable'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (TFNetwork.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(name, shape, dtype='float32', add_batch_axis=True, add_time_axis=False, **kwargs)[source]
Parameters:
  • name (str) –
  • shape (tuple[int]|list[int]) –
  • dtype (str) –
  • add_batch_axis (bool) –
  • add_time_axis (bool) –
Return type:Data
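
A minimal sketch of a trainable parameter vector (name, shape and init are illustrative; "from": [] because the layer has no real inputs):

    "my_param": {"class": "variable", "shape": (512,), "init": "glorot_uniform", "from": []}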

Activation Layer

class TFNetworkLayer.ActivationLayer(activation, **kwargs)[source]

This layer just applies an activation function. See TFUtil.get_activation_function() about supported functions. Also see EvalLayer and CombineLayer for similar layers.

Parameters:activation (str) – e.g. “relu”, “tanh”, etc.
layer_class = 'activation'[source]
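
A minimal sketch (names are illustrative):

    "act": {"class": "activation", "activation": "tanh", "from": ["linear1"]}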

Gating Layer

class TFNetworkLayer.GatingLayer(activation, gate_activation='sigmoid', **kwargs)[source]

Splits the input into two equal parts, applies the gate_activation (sigmoid by default) to one part and some other activation (e.g. tanh) to the other part, and then multiplies them element-wise. Thus, the output dimension is input-dimension / 2.

layer_class = 'gating'[source]
classmethod get_out_data_from_opts(name, sources, n_out=<class 'Util.NotSpecified'>, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • n_out (int|None|NotSpecified) –
Return type:Data
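
A minimal GLU-style sketch; the input feature dim must be even, and the output dim is half of it (names are illustrative):

    "gated": {"class": "gating", "activation": "tanh", "from": ["linear1"]}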

Window Layer

class TFNetworkLayer.WindowLayer(window_size, window_left=None, window_right=None, axis='T', padding='same', **kwargs)[source]

Adds a window dimension. By default, uses the time axis and goes over it with a sliding window. The new axis for the window is created right after the time axis. Will always return as batch major mode. E.g. if the input is (batch, time, dim), the output is (batch, time, window_size, dim). If you want to merge the (window_size, dim) together to (window_size * dim,), you can use the MergeDimsLayer, e.g. {“class”: “merge_dims”, “axes”: “except_time”}.

This is not intended for taking a window out of the time dimension; see SliceLayer or SliceNdLayer for that.

Parameters:
  • window_size (int) –
  • window_left (int|None) –
  • window_right (int|None) –
  • axis (str|int) – see Data.get_axis_from_description()
  • padding (str) – “same” or “valid”
  • kwargs
layer_class = 'window'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, window_size, axis='T', sources=(), **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • window_size (int) –
  • axis (str) –
Return type:Data

classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, window_size, axis='T', sources=(), **kwargs)[source]
Parameters:
  • batch_dim (tf.Tensor) –
  • rec_layer (TFNetworkRecLayer.RecLayer|LayerBase) –
  • window_size (int) –
  • axis (str) –
  • sources (list[LayerBase]) –
Return type:dict[str,tf.Tensor]
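
A minimal sketch of a sliding window over time, followed by the MergeDimsLayer combination mentioned above (names are illustrative):

    "win": {"class": "window", "window_size": 5, "from": ["data"]},
    "win_flat": {"class": "merge_dims", "axes": "except_time", "from": ["win"]}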

Length Layer

class TFNetworkLayer.LengthLayer(add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]

Returns the length of sources as (B,), via input size_placeholder.

Parameters:
  • add_time_axis (bool) –
  • dtype (str) –
  • sparse (bool) –
layer_class = 'length'[source]
classmethod get_out_data_from_opts(name, sources, add_time_axis=False, dtype='int32', sparse=False, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • add_time_axis (bool) –
  • dtype (str) –
  • sparse (bool) –
Return type:Data
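
A minimal sketch returning the sequence lengths of the input (names are illustrative):

    "seq_len": {"class": "length", "from": ["data"]}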

Weighted Sum Layer

class TFNetworkLayer.WeightedSumLayer(axes, padding=None, size=None, keep_dims=None, **kwargs)[source]

Calculates a weighted sum, either over a complete axis of fixed dimension, or over some window. Can also do that for multiple axes. The weights are a trainable parameter matrix. Similar would be to use ElemwiseProdLayer and ReduceLayer, or just a DotLayer with a VariableLayer. See also LinearLayer.

Parameters:
  • axes (str|list[str]) – the axes to do the weighted-sum over
  • padding (str) – “valid” or “same”, in case of keep_dims=True
  • size (None|tuple[int]) – the kernel size. If omitted, the axes must be of fixed dimension, and keep_dims=False and padding=”valid” are used by default. If given, you must also provide padding, and keep_dims=True is the default.
  • keep_dims (bool) – if False, the axes will be squeezed away. see also size.
layer_class = 'weighted_sum'[source]
classmethod get_out_data_from_opts(name, sources, axes, padding=None, size=None, keep_dims=None, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axes (str|list[str]) –
  • padding (str|None) –
  • size (None|tuple[int]) –
  • keep_dims (bool|None) –
Return type:Data
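
A minimal sketch of a trainable weighted sum over a sliding window of 5 frames along the time axis (names are illustrative):

    "wsum": {"class": "weighted_sum", "axes": "T", "size": (5,), "padding": "same",
             "keep_dims": True, "from": ["encoder"]}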

Cumulative Sum Layer

class TFNetworkLayer.CumsumLayer(axis='T', additional_left_summand_per_element=None, reverse=False, **kwargs)[source]

Basically wraps tf.cumsum. This is also supported inside the RecLayer.

Parameters:
  • axis (str) – see Data.get_axis_from_description()
  • additional_left_summand_per_element (str|int|float|None) – the order matters for tf.string
  • reverse (bool) –
layer_class = 'cumsum'[source]
recurrent = True[source]
classmethod get_out_data_from_opts(name, sources, axis='T', **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
  • axis (str) –
Return type:Data

classmethod get_rec_initial_extra_outputs(batch_dim, rec_layer, axis='T', sources=(), **kwargs)[source]
Parameters:
  • batch_dim (tf.Tensor) –
  • rec_layer (TFNetworkRecLayer.RecLayer|LayerBase) –
  • axis (str) –
  • sources (list[LayerBase]) –
Return type:dict[str,tf.Tensor]
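
A minimal sketch of a cumulative sum along the time axis (names are illustrative):

    "cum": {"class": "cumsum", "axis": "T", "from": ["att_weights"]}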

Elementwise Product Layer

class TFNetworkLayer.ElemwiseProdLayer(axes, size=None, **kwargs)[source]

Element-wise product in some axes. Microsoft calls this “static attention”, in Deep Conv. NN with Layer-wise Context Expansion and Attention (LACE). The matrix/tensor used for the product is a trainable parameter. See also LinearLayer.

Parameters:
  • axes (str|list[str]) – e.g. “spatial”, but all those axes must be of fixed dimension
  • size (tuple[int]) – for double-checking, you can explicitly provide the size
layer_class = 'elemwise_prod'[source]
classmethod get_out_data_from_opts(name, sources, **kwargs)[source]
Parameters:
  • name (str) –
  • sources (list[LayerBase]) –
Return type:Data
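
A minimal sketch that scales the feature axis with a trainable vector (names are illustrative; the chosen axes must have fixed dimensions):

    "scaled": {"class": "elemwise_prod", "axes": "F", "from": ["encoder"]}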

Accumulate Mean Layer

class TFNetworkLayer.AccumulateMeanLayer(exp_average, axes='bt', initial_value=None, is_prob_distribution=None, **kwargs)[source]

Accumulates the mean of the input during training (over batch-dim and time-dim by default). It is similar to ReduceLayer.

Parameters:
  • exp_average (float) – momentum in exponential average calculation
  • axes (int|list[str]|str) – the axes to reduce. must contain batch and time.
  • initial_value (float) – how to initialize the variable which accumulates the mean
  • is_prob_distribution (bool) – if provided, better default for initial_value
layer_class = 'accumulate_mean'[source]
classmethod get_out_data_from_opts(axes='bt', **kwargs)[source]
Parameters:axes (str) –
Return type:Data
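
A minimal sketch, e.g. for accumulating a label prior from softmax outputs during training (names and values are illustrative):

    "prior": {"class": "accumulate_mean", "exp_average": 0.001,
              "is_prob_distribution": True, "from": ["output_prob"]}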

Switch Layer

class TFNetworkLayer.SwitchLayer(condition, true_from, false_from, **kwargs)[source]

Wrapper around tf.where() (or more generically TFUtil.where_bc()), or statically chooses a single source if the condition is a callable (…)->bool. (tf.cond is not useful here, as the sources would have been already constructed and computed.) See also CondLayer.

Parameters:
  • condition (LayerBase|bool) – if callable, expected to be (…)->bool, and called in transform_config_dict
  • true_from (LayerBase|None) –
  • false_from (LayerBase|None) –
layer_class = 'switch'[source]
classmethod transform_config_dict(d, network, get_layer)[source]
Parameters:
  • d (dict[str]) – will modify inplace
  • network (TFNetwork.TFNetwork) –
  • get_layer ((str) -> LayerBase) – function to get or construct another layer
classmethod get_out_data_from_opts(name, condition, true_from, false_from, **kwargs)[source]
Parameters:
  • name (str) –
  • condition (LayerBase|bool) –
  • true_from (LayerBase|None) –
  • false_from (LayerBase|None) –
Return type:Data

get_dep_layers(self)[source]
Return type:list[LayerBase]
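
A minimal sketch where "mask" is some boolean-valued layer (names are illustrative):

    "out": {"class": "switch", "condition": "mask", "true_from": "a", "false_from": "b"}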

Compare Layer

class TFNetworkLayer.CompareLayer(kind='equal', value=None, **kwargs)[source]

Compares (e.g. equality check) all the sources element-wise.

Parameters:
  • kind (str) – e.g. “equal”
  • value (float|int|None) – if specified, will also compare to this
layer_class = 'compare'[source]
classmethod get_out_data_from_opts(n_out=<class 'Util.NotSpecified'>, out_type=None, sources=(), **kwargs)[source]
Parameters:
  • n_out (int|None|NotSpecified) –
  • out_type (dict[str]|None) –
  • sources (list[LayerBase]) –
Return type:Data
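
A minimal sketch of an element-wise equality check against a constant (name and value are illustrative):

    "is_zero": {"class": "compare", "kind": "equal", "value": 0, "from": ["output"]}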