NetworkHiddenLayer

class NetworkHiddenLayer.AdaptiveDepthLayer(eps=0.01, tau=0.01, bias=-1.0, damping='graves', **kwargs)[source]
cost()[source]
cost_scale()[source]
layer_class = 'adaptive_depth'[source]
class NetworkHiddenLayer.AddZeroRowsLayer(row_index, number=1, **kwargs)[source]
layer_class = 'add_zero_rows'[source]
class NetworkHiddenLayer.AlignmentLayer(direction='inv', tdps=None, nstates=1, nstep=1, min_skip=0, max_skip=30, search='align', train_skips=False, base=None, output_attention=False, output_z=False, reduce_output=True, blank=False, **kwargs)[source]
cost()[source]
errors()[source]
layer_class = 'align'[source]
class NetworkHiddenLayer.AttentionLayer(base, conv_x=None, conv_y=None, **kwargs)[source]
layer_class = 'attention'[source]
class NetworkHiddenLayer.AttentionReshapeLayer(conf=0.3, pad=1, cap=1, **kwargs)[source]
layer_class = 'attention_reshape'[source]
class NetworkHiddenLayer.AttentionVectorLayer(base, template, **kwargs)[source]
cost()[source]
layer_class = 'attention_vector'[source]
class NetworkHiddenLayer.BaseInterpolationLayer(base=None, method='softmax', output_weights=False, **kwargs)[source]
layer_class = 'base'[source]
class NetworkHiddenLayer.BatchToTimeLayer(base, **kwargs)[source]
layer_class = 'batch_to_time'[source]
class NetworkHiddenLayer.BinOpLayer(op=None, n_out=None, **kwargs)[source]
static get_bin_op(op)[source]
Return type:theano.Op
layer_class = 'bin_op'[source]
class NetworkHiddenLayer.BlurLayer(ctx=5, p=1.0, **kwargs)[source]
RandomStreams[source]

alias of MRG_RandomStreams

layer_class = 'blur'[source]
rng = <theano.sandbox.rng_mrg.MRG_RandomStreams object>[source]
class NetworkHiddenLayer.CAlignmentLayer(direction='inv', tdps=None, nstates=1, nstep=1, min_skip=1, max_skip=30, search='align', train_skips=False, train_emission=False, clip_emission=1.0, base=None, coverage=0, output_z=False, reduce_output=True, blank=None, nil=None, focus='last', mode='viterbi', **kwargs)[source]
cost()[source]
errors()[source]
layer_class = 'calign'[source]
class NetworkHiddenLayer.CalcStepLayer(n_out=None, from_prev='', apply=False, step=None, initial='zero', **kwargs)[source]
layer_class = 'calc_step'[source]
class NetworkHiddenLayer.ChunkingLayer(chunk_size=1, method='concat', **kwargs)[source]
layer_class = 'chunking'[source]
class NetworkHiddenLayer.ChunkingSublayer(n_out, sublayer, chunk_size, chunk_step, chunk_distribution='uniform', add_left_context=0, add_right_context=0, normalize_output=True, trainable=False, **kwargs)[source]
cost()[source]
layer_class = 'chunking_sublayer'[source]
make_constraints()[source]
recurrent = True[source]
class NetworkHiddenLayer.ClippingLayer(sparse_window=1, **kwargs)[source]
layer_class = 'clip'[source]
class NetworkHiddenLayer.ClusterDependentSubnetworkLayer(n_out, subnetwork, n_clusters, load='<random>', data_map=None, trainable=True, concat_sources=True, **kwargs)[source]
Parameters:
  • n_out (int) – output dimension of output layer
  • subnetwork (dict[str,dict]) – subnetwork as dict (JSON content)
  • data_map (list[str]) – maps the sources (“from”) of the layer to the data inputs of the subnetwork. The list should be as long as the sources. Default is [“data”], i.e. it expects one source and maps it as “data” in the subnetwork.
  • load (str) – load string: a filename, which can contain placeholders via str.format, or “<random>” to not load anything.
  • trainable (bool) – whether we take over all params from the subnetwork
cost()[source]
layer_class = 'clustersubnet'[source]
make_constraints()[source]
recurrent = True[source]
update_cluster_target(seq_tag)[source]
class NetworkHiddenLayer.CollapseLayer(axis=0, **kwargs)[source]
layer_class = 'collapse'[source]
class NetworkHiddenLayer.ConcatBatchLayer(**kwargs)[source]
layer_class = 'concat_batch'[source]
class NetworkHiddenLayer.ConstantLayer(value, n_out, dtype='float32', **kwargs)[source]
layer_class = 'constant'[source]
class NetworkHiddenLayer.ConvPoolLayer(dx, dy, fx, fy, **kwargs)[source]
layer_class = 'convpool'[source]
class NetworkHiddenLayer.CopyLayer(activation=None, **kwargs)[source]

It is mostly the identity function, but it will convert sparse input to non-sparse (dense) output.

layer_class = 'copy'[source]
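The sparse-to-dense behavior can be illustrated in plain NumPy (an illustrative sketch of the documented behavior, not the layer’s Theano implementation; `n_out` plays the role of the number of classes):

```python
import numpy as np

def sparse_to_dense(labels, n_out):
    """Convert sparse class indices of shape (time,) into one-hot
    vectors of shape (time, n_out)."""
    dense = np.zeros((len(labels), n_out), dtype="float32")
    dense[np.arange(len(labels)), labels] = 1.0
    return dense

y = sparse_to_dense(np.array([0, 2, 1]), n_out=3)  # shape (3, 3)
```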
class NetworkHiddenLayer.CorruptionLayer(noise='gaussian', p=0.0, clip=False, **kwargs)[source]
RandomStreams[source]

alias of MRG_RandomStreams

layer_class = 'corruption'[source]
rng = <theano.sandbox.rng_mrg.MRG_RandomStreams object>[source]
class NetworkHiddenLayer.DetectionLayer(label_idx, **kwargs)[source]
cost()[source]
layer_class = 'detection'[source]
class NetworkHiddenLayer.DftLayer(dftLength=512, windowName='hamming', flag_useSqrtWindow=False, **kwargs)[source]

This layer applies the DFT to the input vector. The input is expected to be a segment of the time signal cut out with the rectangular function (i.e. no windowing has been done). The output of the layer is the absolute values of the complex DFT coefficients. Only the non-negative-frequency coefficients are returned, because the spectrum of a real signal is symmetric.

layer_class = 'dft_layer_abs'[source]
recurrent = True[source]
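The documented output (absolute values of the non-negative DFT coefficients of a real frame) corresponds to the magnitude of a one-sided FFT; a NumPy sketch, not the layer’s Theano implementation:

```python
import numpy as np

def dft_abs(frame):
    """Absolute DFT coefficients of one real time-signal frame.
    rfft keeps only the non-negative frequencies, since the spectrum
    of a real signal is symmetric."""
    return np.abs(np.fft.rfft(frame))

frame = np.sin(2 * np.pi * 4 * np.arange(64) / 64)  # 4 cycles in 64 samples
spec = dft_abs(frame)                               # length 64 // 2 + 1 == 33
```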
class NetworkHiddenLayer.DiscriminatorLayer(base=None, pgen=0.5, alpha=1, forge=False, ncritic=0, n_tmp=1, dynamic_scaling=False, error_scaling=False, loss='ce', **kwargs)[source]
cost()[source]
cost_scale()[source]
errors()[source]
layer_class = 'disc'[source]
class NetworkHiddenLayer.DownsampleLayer(factor, axis, method='average', padding=False, sample_target=False, base=None, **kwargs)[source]

E.g. method == “average”, axis == 0, factor == 2 -> every 2 time-frames are averaged. See TheanoUtil.downsample. You can also use method == “max”.

layer_class = 'downsample'[source]
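The “average” case described above can be sketched in NumPy (assumes the time length is divisible by the factor; the layer’s `padding` option handles the remainder):

```python
import numpy as np

def downsample_average(x, factor):
    """Average every `factor` consecutive time-frames along axis 0.
    x has shape (time, dim); the result has shape (time // factor, dim)."""
    time, dim = x.shape
    return x[: time - time % factor].reshape(-1, factor, dim).mean(axis=1)

x = np.arange(8, dtype="float32").reshape(4, 2)  # 4 time-frames, dim 2
y = downsample_average(x, factor=2)              # [[1., 2.], [5., 6.]]
```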
class NetworkHiddenLayer.DualStateLayer(acts='relu', acth='tanh', **kwargs)[source]
layer_class = 'dual'[source]
class NetworkHiddenLayer.DumpLayer(filename, with_grad=True, n_out=None, **kwargs)[source]
global_debug_container = None[source]
layer_class = 'dump'[source]
class NetworkHiddenLayer.DuplicateIndexBatchLayer(**kwargs)[source]
layer_class = 'duplicate_index_batch'[source]
class NetworkHiddenLayer.EmbeddingLayer(**kwargs)[source]
layer_class = 'embedding'[source]
class NetworkHiddenLayer.EnergyNormalization(**kwargs)[source]

This layer expects a (chunked) time signal at the input. It normalizes the signal energy of the input chunk.

layer_class = 'energy_normalization_layer'[source]
recurrent = True[source]
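Energy normalization of a chunk can be sketched as follows (illustrative only; the `eps` guard against all-zero chunks is an assumption, not taken from the layer):

```python
import numpy as np

def normalize_energy(chunk, eps=1e-8):
    """Scale a time-signal chunk so its mean energy
    (mean squared sample value) is 1."""
    energy = np.mean(chunk ** 2)
    return chunk / np.sqrt(energy + eps)

x = np.array([1.0, -2.0, 3.0, -4.0])
y = normalize_energy(x)  # mean(y**2) is (approximately) 1
```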
class NetworkHiddenLayer.ErrorsLayer(target, **kwargs)[source]
errors()[source]
Return type:theano.Variable
layer_class = 'errors'[source]
class NetworkHiddenLayer.FAlignmentLayer(direction='inv', tdps=None, nstates=1, nstep=1, min_skip=1, max_skip=10, search='align', train_skips=False, base=None, output_attention=False, output_z=False, reduce_output=True, blank=False, focus='last', mode='viterbi', **kwargs)[source]
layer_class = 'falign'[source]
make_tdps(tdps, max_skip)[source]
class NetworkHiddenLayer.FStdAlignmentLayer(direction='inv', base=None, nstates=3, skip_tdp=0, **kwargs)[source]
cost()[source]
errors()[source]
layer_class = 'fstdalign'[source]
class NetworkHiddenLayer.ForwardLayer(sparse_window=1, **kwargs)[source]
layer_class = 'hidden'[source]
class NetworkHiddenLayer.FrameConcatZeroLayer(num_frames, left=True, **kwargs)[source]

Concatenates zero frames at the start (left=True) or at the end in the time dimension. I.e. you can e.g. delay the input by N frames. See also FrameCutoffLayer (frame_cutoff).

layer_class = 'frame_concat_zero'[source]
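The delay-by-zero-padding idea can be sketched in NumPy (illustrative, not the Theano implementation):

```python
import numpy as np

def concat_zero_frames(x, num_frames, left=True):
    """Prepend (left=True) or append zero frames in the time dimension,
    e.g. to delay the input by `num_frames` frames."""
    zeros = np.zeros((num_frames,) + x.shape[1:], dtype=x.dtype)
    return np.concatenate([zeros, x] if left else [x, zeros], axis=0)

x = np.ones((3, 2), dtype="float32")
y = concat_zero_frames(x, num_frames=2)  # shape (5, 2); first 2 frames zero
```

FrameCutoffLayer (frame_cutoff) performs the inverse operation, removing such frames again.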
class NetworkHiddenLayer.FrameCutoffLayer(num_frames, left=True, **kwargs)[source]

Cuts off frames at the start (left=True) or at the end in the time dimension. You should use this when you have used FrameConcatZeroLayer (frame_concat_zero).

layer_class = 'frame_cutoff'[source]
class NetworkHiddenLayer.GaussianFilter1DLayer(sigma, axis, window_radius=40, **kwargs)[source]
layer_class = 'gaussian_filter_1d'[source]
recurrent = True[source]
class NetworkHiddenLayer.GenericCodeLayer(code, n_out, **kwargs)[source]
Parameters:code (str) – generic Python code, used for eval(); must return some output
layer_class = 'generic_code'[source]
class NetworkHiddenLayer.HDF5DataLayer(filename, dset, **kwargs)[source]
layer_class = 'hdf5'[source]
recurrent = True[source]
class NetworkHiddenLayer.HiddenLayer(activation='sigmoid', **kwargs)[source]
get_linear_forward_output(with_bias=True, sources=None)[source]
class NetworkHiddenLayer.IndexToVecLayer(n_out, **kwargs)[source]
layer_class = 'idx_to_vec'[source]
class NetworkHiddenLayer.InputBase(**kwargs)[source]
layer_class = 'input_base'[source]
class NetworkHiddenLayer.InterpolationLayer(n_out, **kwargs)[source]
layer_class = 'interp'[source]
class NetworkHiddenLayer.InvAlignSegmentationLayer(window=0, win=20, base=None, **kwargs)[source]
layer_class = 'invalignsegment'[source]
class NetworkHiddenLayer.InvAlignSegmentationLayer2(window=0, win=20, base=None, join_states=False, **kwargs)[source]
find_diff_array(att)[source]
layer_class = 'invalignsegment2'[source]
set_yout(y_out, diff, maxdiff)[source]
class NetworkHiddenLayer.InvBacktrackLayer(direction='inv', tdps=None, nstates=1, nstep=1, min_skip=1, max_skip=30, search='align', train_skips=False, train_emission=False, clip_emission=1.0, base=None, coverage=0, output_z=False, reduce_output=True, blank=None, nil=None, focus='last', mode='viterbi', **kwargs)[source]
cost()[source]
errors()[source]
layer_class = 'ibt'[source]
class NetworkHiddenLayer.KernelLayer(kernel='gauss', base=None, sigma=4.0, **kwargs)[source]
layer_class = 'kernel'[source]
class NetworkHiddenLayer.LengthLayer(min_len=0.0, max_len=1.0, use_real=0.0, err='ce', oracle=False, pad=0, **kwargs)[source]
cost()[source]
cost_scale()[source]
layer_class = 'length'[source]
class NetworkHiddenLayer.LengthProjectionLayer(use_real=1.0, oracle=True, eval_oracle=False, pad=0, smo=0.0, avg=10.0, method='mapq', **kwargs)[source]
cost()[source]
cost_scale()[source]
errors()[source]
layer_class = 'length_projection'[source]
class NetworkHiddenLayer.LengthUnitLayer(min_len=1, max_len=32, **kwargs)[source]
layer_class = 'length_unit'[source]
class NetworkHiddenLayer.LinearCombLayer(n_out, n_comb, activation=None, **kwargs)[source]

Linear combination of each group of n_comb elements, with bias.

layer_class = 'linear_comb'[source]
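The presumed grouping can be sketched in NumPy (an assumption about how the groups of `n_comb` elements are formed; the actual layer is Theano-based and learns `weights` and `bias`):

```python
import numpy as np

def linear_comb(x, weights, bias):
    """Combine each group of n_comb consecutive features linearly, plus bias.
    x: (time, n_out * n_comb), weights: (n_out, n_comb), bias: (n_out,)."""
    time = x.shape[0]
    n_out, n_comb = weights.shape
    groups = x.reshape(time, n_out, n_comb)       # split features into groups
    return (groups * weights).sum(axis=2) + bias  # weighted sum per group

x = np.arange(6, dtype="float32").reshape(1, 6)  # one frame, 3 groups of 2
w = np.ones((3, 2), dtype="float32")
b = np.zeros(3, dtype="float32")
y = linear_comb(x, w, b)  # [[0+1, 2+3, 4+5]] == [[1., 5., 9.]]
```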
class NetworkHiddenLayer.LossLayer(loss, copy_input=None, **kwargs)[source]
Parameters:
  • index (theano.Variable) – index for batches
  • loss (str) – e.g. ‘ce’
layer_class = 'loss'[source]
class NetworkHiddenLayer.MfccLayer(dftSize=512, samplingFrequency=16000.0, fl=0, fh=None, nrOfFilters=40, nrOfMfccCoefficients=None, **kwargs)[source]

The source layer of this layer should be the DftLayer

batch_norm(h, dim, use_shift=False, use_std=False, use_sample=0.0, force_sample=True, index=None, **kwargs)[source]

Overrides the function from Layer to change the default parameters of batch_norm: use_shift, use_std and force_sample.

getMfccFilterMatrix(samplingFrequency, fl, fh, dftSize, nrOfFilters, flag_areaNormalized=0)[source]

Returns the filter bank matrix used for the MFCCs. For mathematical details see the book “Spoken Language Processing” by Huang et al., p. 314.

Parameters:
  • dftSize (int) – size of the DFT
  • nrOfFilters (int) – the number of filters used for the filter bank
  • flag_areaNormalized (int) – specifies which filter bank is returned: 0 – not normalized; 1 – normalized, where each filter covers an area of 1
invMelScale(melVal)[source]

returns the respective value in the frequency domain

Parameters:melVal (float) – value in mel domain
Return type:float
layer_class = 'mfcc_layer'[source]
melScale(freq)[source]

returns the respective value on the mel scale

Parameters:freq (float) – frequency value to transform onto mel scale
Return type:float
recurrent = True[source]
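The melScale/invMelScale pair described above is presumably the standard mel-scale mapping; a sketch using the common constants (the layer’s exact constants are an assumption):

```python
import math

def mel_scale(freq):
    """Map a frequency in Hz onto the mel scale (standard formula)."""
    return 2595.0 * math.log10(1.0 + freq / 700.0)

def inv_mel_scale(mel_val):
    """Map a mel-scale value back to the frequency domain in Hz."""
    return 700.0 * (10.0 ** (mel_val / 2595.0) - 1.0)

m = mel_scale(440.0)      # mel value for 440 Hz
f = inv_mel_scale(m)      # round-trips back to 440.0 Hz
```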
class NetworkHiddenLayer.NativeLayer(n_out, native_class, params, **kwargs)[source]
layer_class = 'native'[source]
recurrent = True[source]
class NetworkHiddenLayer.PolynomialExpansionLayer(n_degree, n_out=None, **kwargs)[source]
layer_class = 'polynomial_expansion'[source]
class NetworkHiddenLayer.Preemphasis(alpha=1.0, **kwargs)[source]

This layer expects a time signal as input and applies preemphasis to the segment. (This is not a completely correct application of preemphasis, since the first element of the segment does not know its predecessor in the time signal; the effect therefore differs from applying preemphasis to the complete signal beforehand.)

layer_class = 'preemphasis_layer'[source]
recurrent = True[source]
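The segment-local preemphasis described above amounts to a first-order difference filter; a NumPy sketch, not the Theano implementation:

```python
import numpy as np

def preemphasis(x, alpha=1.0):
    """Apply preemphasis within one segment: y[t] = x[t] - alpha * x[t-1].
    The first sample has no predecessor inside the segment, matching the
    caveat in the docstring above."""
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
y = preemphasis(x, alpha=1.0)  # [1., 1., 1., 1.]
```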
class NetworkHiddenLayer.RBFLayer(n_out, **kwargs)[source]

Uses a radial basis function.

layer_class = 'rbf'[source]
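A generic Gaussian radial-basis-function activation can be sketched as follows (the layer’s exact parametrization is not documented here; centers, sigma, and the Gaussian kernel are assumptions for illustration):

```python
import numpy as np

def rbf(x, centers, sigma=1.0):
    """Gaussian RBF activations: exp(-||x - c||^2 / (2 * sigma^2)).
    x: (batch, dim), centers: (n_out, dim) -> result: (batch, n_out)."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

x = np.zeros((1, 2))
centers = np.array([[0.0, 0.0], [3.0, 4.0]])
act = rbf(x, centers, sigma=1.0)  # exact match with the first center -> 1.0
```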
class NetworkHiddenLayer.RNNBlockLayer(num_layers=1, direction=0, **kwargs)[source]
cost()[source]
errors()[source]
layer_class = 'rnnblock'[source]
recurrent = True[source]
class NetworkHiddenLayer.RandomRouteLayer(p=None, test_route=-1, n_out=None, **kwargs)[source]
layer_class = 'random_route'[source]
class NetworkHiddenLayer.RandomSelectionLayer(n_out, **kwargs)[source]
layer_class = 'random_selection'[source]
class NetworkHiddenLayer.RemoveRowsLayer(row_index, number=1, **kwargs)[source]
layer_class = 'remove_rows'[source]
class NetworkHiddenLayer.ReshapeLayer(base=None, **kwargs)[source]
layer_class = 'reshape'[source]
class NetworkHiddenLayer.ReverseAttentionLayer(base=None, **kwargs)[source]
layer_class = 'reverse_attention'[source]
class NetworkHiddenLayer.ReverseLayer(**kwargs)[source]

Reverses the time-dimension.

layer_class = 'reverse'[source]
class NetworkHiddenLayer.RoutingLayer(base, p=0.5, oracle=False, **kwargs)[source]
cost()[source]
cost_scale()[source]
layer_class = 'signal_router'[source]
class NetworkHiddenLayer.ScaleGradLayer(scale=1.0, disconnect=False, **kwargs)[source]
layer_class = 'scale_grad'[source]
class NetworkHiddenLayer.ScaleGradientOp(scale)[source]
grad(input, output_gradients)[source]
make_node(x)[source]
perform(node, inputs, output_storage)[source]
view_map = {0: [0]}[source]
class NetworkHiddenLayer.SegmentClassTargets(num_classes, window=15, **kwargs)[source]
class BuildClassesOp[source]
itypes = (TensorType(int32, scalar), TensorType(int32, scalar), TensorType(int32, matrix), TensorType(int8, matrix))[source]
otypes = (TensorType(float32, 3D), TensorType(int8, matrix))[source]
perform(node, inputs, output_storage)[source]
SegmentClassTargets.layer_class = 'segment_class_targets'[source]
class NetworkHiddenLayer.SegmentFinalStateLayer(base=None, use_full_label=False, **kwargs)[source]
layer_class = 'segfinal'[source]
class NetworkHiddenLayer.SegmentInputLayer(window=15, **kwargs)[source]
class ReinterpretCastOp[source]
itypes = (TensorType(int32, matrix),)[source]
otypes = (TensorType(float32, matrix),)[source]
perform(node, inputs, output_storage)[source]
SegmentInputLayer.layer_class = 'segment_input'[source]
class NetworkHiddenLayer.SegmentLayer(**kwargs)[source]
layer_class = 'segment'[source]
class NetworkHiddenLayer.SharedForwardLayer(base=None, sparse_window=1, **kwargs)[source]
layer_class = 'hidden_shared'[source]
class NetworkHiddenLayer.SigmoidToTanhLayer(**kwargs)[source]
layer_class = 'sigmoid_to_tanh'[source]
class NetworkHiddenLayer.SignalSplittingLayer(base, p=0.5, oracle=False, **kwargs)[source]
cost()[source]
cost_scale()[source]
layer_class = 'signal_splitter'[source]
class NetworkHiddenLayer.SignalValue(begin=0, sidx=0, risk=0.1, margin=0.0, copy_output=None, **kwargs)[source]
cost()[source]
cost_scale()[source]
errors()[source]
layer_class = 'sigval'[source]
class NetworkHiddenLayer.SourceAttentionLayer(base, n_tmp=64, **kwargs)[source]
layer_class = 'source_attention'[source]
class NetworkHiddenLayer.SplitBatchLayer(n_parts=1, part=0, **kwargs)[source]
layer_class = 'split_batch'[source]
class NetworkHiddenLayer.StateAlignmentLayer(target, prior_scale=0.0, **kwargs)[source]
layer_class = 'state_alignment'[source]
class NetworkHiddenLayer.StateToAct(dual=False, **kwargs)[source]
layer_class = 'state_to_act'[source]
class NetworkHiddenLayer.StateVector(output_activation='identity', idx=-1, **kwargs)[source]
layer_class = 'state_vector'[source]
class NetworkHiddenLayer.SubnetworkLayer(n_out, subnetwork, load='<random>', data_map=None, trainable=True, concat_sources=True, **kwargs)[source]
Parameters:
  • n_out (int) – output dimension of output layer
  • subnetwork (dict[str,dict]) – subnetwork as dict (JSON content)
  • data_map (list[str]) – maps the sources (“from”) of the layer to the data inputs of the subnetwork. The list should be as long as the sources. Default is [“data”], i.e. it expects one source and maps it as “data” in the subnetwork.
  • concat_sources (bool) – whether we concatenate all sources into one, as is standard for most other layers
  • load (str) – load string: a filename, which can contain placeholders via str.format, or “<random>” to not load anything.
  • trainable (bool) – whether we take over all params from the subnetwork
cost()[source]
layer_class = 'subnetwork'[source]
make_constraints()[source]
recurrent = True[source]
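The parameters above can be made concrete with a hypothetical configuration sketch (all layer names, dimensions, and the subnetwork contents are invented for illustration; only the documented keys are taken from the parameter list):

```python
# Hypothetical subnetwork as dict (JSON content); names and sizes invented.
subnet = {
    "ff1": {"class": "hidden", "activation": "tanh", "n_out": 100},
    "output": {"class": "softmax", "from": ["ff1"], "n_out": 50},
}

# Hypothetical layer entry using the documented SubnetworkLayer parameters.
layer = {
    "class": "subnetwork",   # layer_class of SubnetworkLayer
    "subnetwork": subnet,    # subnetwork as dict (JSON content)
    "n_out": 50,             # output dim of the subnetwork's output layer
    "load": "<random>",      # do not load pretrained params
    "trainable": True,       # take over all params from the subnetwork
    "data_map": ["data"],    # one source, mapped as "data" in the subnet
}
```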
class NetworkHiddenLayer.SumLayer(**kwargs)[source]
layer_class = 'sum'[source]
class NetworkHiddenLayer.TanhToSigmoidLayer(**kwargs)[source]
layer_class = 'tanh_to_sigmoid'[source]
class NetworkHiddenLayer.TimeBlurLayer(t_start, t_end, t_step, distribution, **kwargs)[source]
layer_class = 'time_blur'[source]
recurrent = True[source]
class NetworkHiddenLayer.TimeChunkingLayer(n_out, chunk_size, chunk_step, **kwargs)[source]
layer_class = 'time_chunking'[source]
class NetworkHiddenLayer.TimeConcatLayer(**kwargs)[source]
layer_class = 'time_concat'[source]
class NetworkHiddenLayer.TimeFlatLayer(chunk_size, chunk_step, **kwargs)[source]
layer_class = 'time_flat'[source]
class NetworkHiddenLayer.TimeShift(base=None, n_shift=1, **kwargs)[source]
layer_class = 'time_shift'[source]
class NetworkHiddenLayer.TimeToBatchLayer(**kwargs)[source]
layer_class = 'time_to_batch'[source]
class NetworkHiddenLayer.TimeUnChunkingLayer(n_out, chunking_layer, **kwargs)[source]
layer_class = 'time_unchunking'[source]
class NetworkHiddenLayer.TimeWarpGlobalLayer(n_out=None, renorm_time=True, window_size=30, sigma2=0.5, **kwargs)[source]

Similar to TimeWarpLayer, but this warp is cumulative and applied globally.

add_var_random_mat(n, m, name)[source]
layer_class = 'time_warp_global'[source]
recurrent = True[source]
class NetworkHiddenLayer.TimeWarpLayer(t_start, t_end, t_step, sigma, input_window, input_proj=None, **kwargs)[source]

Like https://en.wikipedia.org/wiki/Image_warping, controlled by a NN. A bit like simple local feed-forward attention, where the attention is controlled by the input (encoder) and not the output (decoder). Maybe similar: “A Hybrid Dynamic Time Warping-Deep Neural Network Architecture for Unsupervised Acoustic Modeling”, http://ewan.website/interspeech_2015_dnn_dtw.pdf. The implementation is very similar to TimeBlurLayer, except that the weight distribution is different for every time frame and controlled by a NN. Note that this warp is applied locally. See also TimeWarpGlobalLayer.

layer_class = 'time_warp'[source]
recurrent = True[source]
class NetworkHiddenLayer.TorchLayer(n_out, lua_fw_func, lua_bw_func, params, lua_file=None, **kwargs)[source]
layer_class = 'torch'[source]
recurrent = True[source]
class NetworkHiddenLayer.TruncationLayer(n_trunc, **kwargs)[source]
layer_class = 'trunc'[source]
class NetworkHiddenLayer.UnsegmentInputLayer(original_output, **kwargs)[source]
class UnsegmentInputOp[source]
itypes = (TensorType(float32, 3D), TensorType(int8, matrix))[source]
otypes = (TensorType(float32, 3D),)[source]
perform(node, inputs, output_storage)[source]
UnsegmentInputLayer.layer_class = 'unsegment_input'[source]
class NetworkHiddenLayer.UpsampleLayer(factor, axis, time_like_last_source=False, method='nearest-neighbor', **kwargs)[source]
layer_class = 'upsample'[source]
class NetworkHiddenLayer.WindowContextLayer(window, average='uniform', scan=False, n_out=None, **kwargs)[source]
layer_class = 'window_context'[source]
class NetworkHiddenLayer.WindowLayer(window, delta=0, delta_delta=0, **kwargs)[source]
layer_class = 'window'[source]
NetworkHiddenLayer.concat_sources(sources, masks=None, mass=None, unsparse=False, expect_source=True)[source]
Parameters:
  • unsparse (bool) – whether to make sparse sources into 1-of-k
  • expect_source (bool) – whether to throw an exception if there is no source

Returns:(concatenated sources, out dim)
Return type:(theano.Variable, int)
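The documented behavior (feature-axis concatenation, with sparse index sources optionally expanded to 1-of-k) can be sketched in NumPy; this is illustrative only, not the actual Theano implementation, and the handling of un-expanded sparse sources as a single index feature is an assumption:

```python
import numpy as np

def concat_sources_np(sources, n_outs, sparse_flags, unsparse=False):
    """Concatenate source outputs along the feature axis and return
    (concatenated sources, out dim). With unsparse=True, sparse (index)
    sources are first expanded into 1-of-k vectors."""
    outs, out_dim = [], 0
    for src, n_out, sparse in zip(sources, n_outs, sparse_flags):
        if sparse and unsparse:
            src = np.eye(n_out, dtype="float32")[src]  # (time,) -> (time, n_out)
        elif sparse:
            src = src[:, None].astype("float32")       # keep raw indices, dim 1
            n_out = 1
        outs.append(src)
        out_dim += n_out
    return np.concatenate(outs, axis=-1), out_dim

dense = np.ones((4, 3), dtype="float32")
sparse = np.array([0, 2, 1, 0])
out, dim = concat_sources_np([dense, sparse], [3, 3], [False, True], unsparse=True)
```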