returnn.frontend.init

Common parameter initialization functions.

https://github.com/rwth-i6/returnn/wiki/Parameter-initialization

class returnn.frontend.init.ParamInit[source]

Base API for parameter initialization.

class returnn.frontend.init.Normal(stddev: float, *, truncated: bool = True, dtype: str | None = None)[source]

Initialization by normal distribution (truncated by default), independent of the dimensions (fan in/out).

See VarianceScaling and derivatives for variants which depend on fan in/out.
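To illustrate what a fixed-stddev (truncated) normal init does, here is a standalone NumPy sketch. The function name `normal_init` and the resampling loop are illustrative, not the RETURNN implementation; truncation at 2 standard deviations is the usual convention for truncated-normal initializers.

```python
import numpy as np

def normal_init(shape, stddev, truncated=True, rng=None):
    """Illustrative sketch (not the RETURNN code): normal init with a
    fixed stddev, independent of fan in/out.

    With truncated=True, samples outside +/- 2*stddev are redrawn,
    the usual truncated-normal convention.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = rng.normal(0.0, stddev, size=shape)
    if not truncated:
        return values
    # Redraw out-of-range samples until all lie within 2 stddevs.
    mask = np.abs(values) > 2.0 * stddev
    while mask.any():
        values[mask] = rng.normal(0.0, stddev, size=int(mask.sum()))
        mask = np.abs(values) > 2.0 * stddev
    return values

w = normal_init((512, 256), stddev=0.1, rng=np.random.default_rng(0))
# All entries lie within +/- 0.2; the resulting stddev is slightly
# below 0.1 because of the truncation.
```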

class returnn.frontend.init.VarianceScaling(scale: float | None = None, mode: str | None = None, distribution: str | None = None, dtype: str | None = None)[source]

Provides a generalized way to initialize weights; common initialization schemes such as Xavier Glorot and Kaiming He are special cases.

Code adapted from TensorFlow's VarianceScaling.

scale = 1.0[source]
mode = 'fan_in'[source]
distribution = 'truncated_normal'[source]
dtype: str[source]
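The general recipe behind variance scaling can be sketched as follows. This is an illustrative standalone function, not the RETURNN code: it assumes a plain 2-D weight matrix whose last two axes give fan_in and fan_out (convolution kernels would also fold in the receptive-field size), and uses the defaults listed above. The constant 0.8796... is the stddev of a standard normal truncated at 2 stddevs, used to correct the variance lost by truncation.

```python
import math
import numpy as np

def variance_scaling(shape, scale=1.0, mode="fan_in",
                     distribution="truncated_normal", rng=None):
    """Illustrative sketch of variance scaling (not the RETURNN code).

    Draws weights with variance scale / fan, where fan is chosen by
    `mode` from fan_in, fan_out, or their average.
    """
    rng = np.random.default_rng() if rng is None else rng
    fan_in, fan_out = shape[-2], shape[-1]
    fan = {"fan_in": fan_in, "fan_out": fan_out,
           "fan_avg": (fan_in + fan_out) / 2.0}[mode]
    if distribution == "truncated_normal":
        # Scale up the sampling stddev to compensate for truncation
        # at 2 stddevs, so the final variance is scale / fan.
        stddev = math.sqrt(scale / fan) / 0.87962566103423978
        v = rng.normal(0.0, stddev, size=shape)
        mask = np.abs(v) > 2.0 * stddev
        while mask.any():
            v[mask] = rng.normal(0.0, stddev, size=int(mask.sum()))
            mask = np.abs(v) > 2.0 * stddev
        return v
    if distribution == "normal":
        return rng.normal(0.0, math.sqrt(scale / fan), size=shape)
    if distribution == "uniform":
        # Uniform on [-limit, limit] has variance limit**2 / 3.
        limit = math.sqrt(3.0 * scale / fan)
        return rng.uniform(-limit, limit, size=shape)
    raise ValueError(f"unknown distribution {distribution!r}")

w = variance_scaling((1024, 1024), rng=np.random.default_rng(1))
# With the defaults, the resulting stddev is sqrt(1 / 1024) ~ 0.031.
```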
class returnn.frontend.init.Glorot(scale: float | None = None, mode: str | None = None, distribution: str | None = None, dtype: str | None = None)[source]

Xavier Glorot initialization (http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf). Defaults: scale 1.0, mode 'fan_avg', distribution 'uniform'.

scale = 1.0[source]
mode = 'fan_avg'[source]
distribution = 'uniform'[source]
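With scale 1.0, fan_avg = (fan_in + fan_out) / 2, and a uniform distribution, the Glorot limit works out to sqrt(6 / (fan_in + fan_out)). A hedged standalone sketch (the function name is illustrative, not the RETURNN API):

```python
import math
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Illustrative Glorot/Xavier uniform init:
    scale=1.0, mode='fan_avg', distribution='uniform'.

    Variance of uniform on [-limit, limit] is limit**2 / 3, so matching
    variance 1 / fan_avg gives limit = sqrt(6 / (fan_in + fan_out)).
    """
    rng = np.random.default_rng() if rng is None else rng
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

w = glorot_uniform(300, 100, rng=np.random.default_rng(2))
# Every entry lies within +/- sqrt(6 / 400) ~ 0.1225.
```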
class returnn.frontend.init.He(scale: float | None = None, mode: str | None = None, distribution: str | None = None, dtype: str | None = None)[source]

Kaiming He initialization (https://arxiv.org/pdf/1502.01852.pdf). Defaults: scale 2.0, mode 'fan_in', distribution 'normal'.

scale = 2.0[source]
mode = 'fan_in'[source]
distribution = 'normal'[source]
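With scale 2.0 and fan_in, the He stddev is sqrt(2 / fan_in); the factor 2 compensates for ReLU activations zeroing out roughly half of the variance. A hedged standalone sketch (the function name is illustrative, not the RETURNN API):

```python
import math
import numpy as np

def he_normal(fan_in, fan_out, rng=None):
    """Illustrative He init: scale=2.0, mode='fan_in',
    distribution='normal', i.e. stddev = sqrt(2 / fan_in)."""
    rng = np.random.default_rng() if rng is None else rng
    stddev = math.sqrt(2.0 / fan_in)
    return rng.normal(0.0, stddev, size=(fan_in, fan_out))

w = he_normal(512, 512, rng=np.random.default_rng(3))
# Resulting stddev is sqrt(2 / 512) = 0.0625.
```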
returnn.frontend.init.HeNormal[source]

alias of He

class returnn.frontend.init.HeUniform(scale: float | None = None, mode: str | None = None, distribution: str | None = None, dtype: str | None = None)[source]

He initialization (He), but with a uniform distribution. Defaults: scale 2.0, mode 'fan_in', distribution 'uniform'.

distribution = 'uniform'[source]
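The uniform variant keeps the He variance 2 / fan_in but draws from a uniform distribution, which gives limit = sqrt(6 / fan_in). A hedged standalone sketch (the function name is illustrative, not the RETURNN API):

```python
import math
import numpy as np

def he_uniform(fan_in, fan_out, rng=None):
    """Illustrative He uniform init: scale=2.0, mode='fan_in',
    distribution='uniform'.

    Uniform on [-limit, limit] has variance limit**2 / 3, so matching
    variance 2 / fan_in gives limit = sqrt(6 / fan_in).
    """
    rng = np.random.default_rng() if rng is None else rng
    limit = math.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

w = he_uniform(600, 100, rng=np.random.default_rng(4))
# Every entry lies within +/- sqrt(6 / 600) = +/- 0.1.
```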