returnn.tf.engine

TensorFlow engine

The basic engine for the TensorFlow backend is implemented here, i.e. the high-level logic for training: looping over epochs, holding the network instance, creating the TensorFlow session, managing the data pipeline, etc.

See Technological Overview for an overview of how it all fits together.

exception returnn.tf.engine.CancelTrainingException[source]

Training was cancelled.

class returnn.tf.engine.Runner(engine, dataset_name=None, dataset=None, batches=None, train=False, eval=True, train_flag=None, extra_fetches=None, extra_fetches_callback=None)[source]

This encapsulates the logic around TF session.run, i.e. iterating over the dataset.

Parameters:
  • engine (Engine)

  • dataset_name (str|None) – “train”, “dev” or so

  • dataset (Dataset.Dataset|None)

  • batches (BatchSetGenerator|None)

  • train (bool) – whether to do updates on the model

  • train_flag (bool|None) – normally the same as train, but you might, e.g., want train_flag enabled without train

  • eval (bool) – whether to evaluate (i.e. calculate loss/error)

  • extra_fetches (dict[str,tf.Tensor|Data|LayerBase|(()->tf.Tensor)]|None) – additional fetches per step. extra_fetches_callback will be called with these. In case of Data/LayerBase, it will return a list, where each item corresponds to the batch-seq. It might also be useful to add network.get_extern_data("seq_idx") and network.get_extern_data("seq_tag").

  • extra_fetches_callback ((**dict[str,numpy.ndarray|str|list[numpy.ndarray|str]])->None) – called if extra_fetches is set
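
As a rough illustration of the callback contract described above, the following sketch collects per-sequence outputs keyed by sequence tag. The fetch names `seq_tag` and `output` and the helper `make_collecting_callback` are assumptions chosen for this example. The callback is invoked by hand with plain Python lists standing in for the fetched NumPy arrays, not by a real Runner.

```python
from typing import Any, Callable, Dict, List


def make_collecting_callback(store: Dict[str, Any]) -> Callable[..., None]:
    """Return a callback that files each sequence's output under its tag."""

    def callback(**fetches: List[Any]) -> None:
        # Each keyword holds one entry per sequence in the current batch.
        seq_tags = fetches["seq_tag"]  # hypothetical fetch name
        outputs = fetches["output"]  # hypothetical fetch name
        assert len(seq_tags) == len(outputs)
        for tag, out in zip(seq_tags, outputs):
            store[tag] = out

    return callback


# Simulate what one step might pass to the callback (not a real Runner call).
collected: Dict[str, Any] = {}
on_step = make_collecting_callback(collected)
on_step(seq_tag=["seq-0", "seq-1"], output=[[0.1, 0.9], [0.7, 0.3]])
```

Accepting `**fetches` rather than fixed positional arguments keeps the callback tolerant of additional fetch keys being added later.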

run(report_prefix: str, *, raise_exception: bool = False)[source]
Parameters:
  • report_prefix – prefix for logging, e.g. “train”

  • raise_exception – if True, directly raise any exception; otherwise, will store in

exit_due_to_error()[source]

Exit due to a previous error.

class returnn.tf.engine.Engine(config=None)[source]

TF backend engine.

Parameters:

config (returnn.config.Config|None)

finalize(error_occurred=False)[source]

Finalizes the datasets, TF session, network, graph.

get_const_tensor(key, value)[source]
Parameters:
  • key

  • value

Returns:

tf.constant(value)

Return type:

tf.Tensor

is_requesting_for_gpu()[source]
Return type:

bool

make_tf_session()[source]

Initializes self.tf_session.

get_eval_datasets()[source]
Returns:

dict of datasets used for eval (dev, eval)

Return type:

dict[str,Dataset]

property dev_data[source]
Return type:

Dataset|None

property eval_data[source]
Return type:

Dataset|None

load_model(epoch=None, filename=None)[source]
Parameters:
  • epoch (int)

  • filename (str)

save_model(filename=None)[source]
Parameters:

filename (str) – full filename for model

static delete_model(filename)[source]
Parameters:

filename (str)

Returns:

accumulated file-size in bytes of deleted files

Return type:

int
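
A TensorFlow checkpoint is typically stored as several files sharing a common prefix (for example `.index`, `.meta` and `.data-…` files). The sketch below shows one way an accumulated file size could be computed while deleting such a group of files. It is not the actual implementation of delete_model; the helper name `delete_files_with_prefix` and the demo file names are made up for this example.

```python
import glob
import os
import tempfile


def delete_files_with_prefix(filename: str) -> int:
    """Delete every file named `filename` + '.*' and return the bytes freed."""
    total_size = 0
    for path in glob.glob(glob.escape(filename) + ".*"):
        if os.path.isfile(path):
            total_size += os.path.getsize(path)
            os.remove(path)
    return total_size


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = os.path.join(tmp_dir, "model.001")
    for suffix, payload in [(".index", b"abc"), (".meta", b"defgh")]:
        with open(prefix + suffix, "wb") as f:
            f.write(payload)
    freed_bytes = delete_files_with_prefix(prefix)
    remaining_files = os.listdir(tmp_dir)
```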

init_train_from_config(config=None, train_data=None, dev_data=None, eval_data=None)[source]
Parameters:
get_net_dict_for_epoch(*, epoch: int, step: int | None = None, config=None)[source]
Parameters:
Return type:

dict[str]

init_network_from_config(config=None, *, net_dict_post_proc=None)[source]
Parameters:
classmethod create_network(config, rnd_seed, train_flag, eval_flag, search_flag, net_dict, extern_data=None, initial_learning_rate=1.0)[source]
Parameters:
  • config (returnn.config.Config)

  • rnd_seed (int)

  • train_flag (bool|tf.Tensor)

  • initial_learning_rate (float)

  • eval_flag (bool)

  • search_flag (bool)

  • extern_data (ExternData|None)

  • net_dict (dict[str,dict[str]])

Returns:

network, updater

Return type:

(TFNetwork, Updater|None)

need_init_new_network(net_desc=None)[source]
Parameters:

net_desc (dict[str,dict[str]]|None) – layer name -> layer description dict

Return type:

bool

init_new_network(net_desc=None)[source]

Reinitializes the network, and copies over the parameters from the current network.

Parameters:

net_desc (dict[str,dict[str]]|None) – layer name -> layer description dict. use existing by default

train()[source]

Does the whole training, i.e. the loop over all the epochs.

init_train_epoch()[source]

Init for the current train epoch.

train_epoch()[source]

Train a single epoch (self.epoch).
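
The relationship between train(), init_train_epoch() and train_epoch() can be pictured with a toy epoch loop. This is only a sketch under assumptions: `ToyEngine`, `CancelTraining`, the `cancel_at` parameter and the way cancellation is handled are invented for illustration and do not reflect how the real Engine is implemented.

```python
class CancelTraining(Exception):
    """Stand-in for a cancellation signal (hypothetical, for this sketch only)."""


class ToyEngine:
    """Toy model of an epoch loop; not the real Engine."""

    def __init__(self, start_epoch: int, final_epoch: int, cancel_at=None):
        self.start_epoch = start_epoch
        self.final_epoch = final_epoch
        self.cancel_at = cancel_at  # epoch at which to simulate a cancel
        self.epoch = None
        self.log = []

    def init_train_epoch(self) -> None:
        self.log.append(("init", self.epoch))

    def train_epoch(self) -> None:
        if self.epoch == self.cancel_at:
            raise CancelTraining()
        self.log.append(("train", self.epoch))

    def train(self) -> None:
        # Loop over all epochs; stop early if training is cancelled.
        try:
            for epoch in range(self.start_epoch, self.final_epoch + 1):
                self.epoch = epoch
                self.init_train_epoch()
                self.train_epoch()
        except CancelTraining:
            self.log.append(("cancelled", self.epoch))


engine = ToyEngine(start_epoch=1, final_epoch=3, cancel_at=3)
engine.train()
```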

format_score(score)[source]
Parameters:

score (dict[str,float])

Returns:

score(s) as str

Return type:

str
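
To make the dict-to-string conversion concrete, here is a minimal sketch of how a score dict might be rendered. The exact output format of format_score is not specified above, so the function `format_score_sketch`, the example keys and the chosen layout are assumptions.

```python
from typing import Dict


def format_score_sketch(score: Dict[str, float]) -> str:
    """Render a score dict as a single log-friendly string."""
    if not score:
        return "None"
    if len(score) == 1:
        # A single score needs no key to disambiguate it.
        return str(next(iter(score.values())))
    # Several scores: emit "key value" pairs in a stable (sorted) order.
    return " ".join("%s %s" % (key, score[key]) for key in sorted(score))


single = format_score_sketch({"cost:output": 1.5})
multi = format_score_sketch({"cost:output": 1.5, "cost:aux": 0.25})
```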

eval_model(output_file=None, output_per_seq_file=None, loss_name=None, output_per_seq_format=None, output_per_seq_file_format='txt', skip_already_evaluated=False, init_seq_order=True, lr_control_update_scores=True)[source]

Eval the current model on the eval datasets (dev + eval, whatever is set). See also self.search() for performing beam search.

Parameters:
  • output_file (str|None) – if given, will save the results to this file (total err/score for each dataset)

  • output_per_seq_file (str|None) – if given, will save the err/score for each sequence

  • loss_name (str|None) – specifies the loss which will be written to output_file

  • output_per_seq_format (list[str]|tuple[str]|None) – which properties of loss_name should be written to output_per_seq_file. allowed_outputs = {“seq_tag”, “seq_len”, “score”, “error”, “pos_score”, “pos_error”}.

  • skip_already_evaluated (bool)

  • init_seq_order (bool)

  • output_per_seq_file_format (str) – “txt” or “py”

  • lr_control_update_scores (bool) – update and save scores in learning rate control

Returns:

nothing
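
Since output_per_seq_format only accepts the properties listed above, a caller may want to validate a requested format before starting a long evaluation. The following sketch does that. The helper `check_output_per_seq_format` and its fallback value for None are hypothetical and not part of the Engine API.

```python
from typing import Optional, Sequence, Tuple

# Property names taken from the parameter description above.
ALLOWED_OUTPUTS = {"seq_tag", "seq_len", "score", "error", "pos_score", "pos_error"}


def check_output_per_seq_format(fmt: Optional[Sequence[str]]) -> Tuple[str, ...]:
    """Validate the requested properties; fall back to a hypothetical default."""
    if fmt is None:
        return ("score",)  # assumed default, for this sketch only
    unknown = [name for name in fmt if name not in ALLOWED_OUTPUTS]
    if unknown:
        raise ValueError("unsupported output_per_seq_format entries: %r" % unknown)
    return tuple(fmt)


valid = check_output_per_seq_format(["seq_tag", "score", "error"])
try:
    check_output_per_seq_format(["seq_tag", "wer"])
    rejected = False
except ValueError:
    rejected = True
```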

check_last_epoch()[source]

Checks if there are outstanding tasks (eval_model) for the last epoch, and executes them.

check_uninitialized_vars()[source]

All vars in TF which are controlled by us should also have been initialized by us. We also take care of the optimizer slot variables. However, TF can still create other vars which we do not know about. E.g. the Adam optimizer creates the beta1_power/beta2_power vars (which are not slot vars). Here, we find all remaining uninitialized vars, report them, and initialize them.

get_specific_feed_dict(dataset, seq_idx)[source]
Parameters:
  • dataset (Dataset.Dataset)

  • seq_idx (int) – index of sequence, -1 for all sequences in dataset

Returns:

feed_dict for self.tf_session.run()

Return type:

dict[tf.Tensor,numpy.ndarray]

run_single(dataset, seq_idx, output_dict, ext_feed_dict=None)[source]
Parameters:
  • dataset (Dataset)

  • seq_idx (int) – index of sequence, -1 for all sequences in dataset

  • output_dict (dict[str,tf.Tensor]) – key -> tf.Tensor

  • ext_feed_dict (dict[tf.Tensor,numpy.ndarray])

Returns:

output_dict but values evaluated

Return type:

dict[str,numpy.ndarray]

forward_single(dataset, seq_idx, output_layer_name=None)[source]

Forwards a single sequence. If you want to perform search, and get a number of hyps out, use search_single().

Parameters:
  • dataset (Dataset.Dataset)

  • seq_idx (int)

  • output_layer_name (str|None) – e.g. “output”. if not set, will read from config “forward_output_layer”

Returns:

numpy array, output in time major format (time,dim)

Return type:

numpy.ndarray

forward_to_hdf(data, output_file, combine_labels='', batch_size=0, output_layer=None)[source]

Aims to recreate the same interface and output as Engine.forward_to_hdf(). See also EngineTask.HDFForwardTaskThread() and hdf_dump_from_dataset() in the hdf_dump.py tool.

Parameters:
  • data (Dataset)

  • output_file (str)

  • combine_labels (str) – ignored at the moment

  • batch_size (int)

  • output_layer (LayerBase)

forward_with_callback(*, dataset: Dataset, callback: ForwardCallbackIface)[source]

forward

analyze(data, statistics)[source]
Parameters:
  • data (Dataset.Dataset)

  • statistics (list[str]|None) – ignored at the moment

Returns:

print everything to log.v1, and return the Runner instance to get access to all the stats

Return type:

Runner

search(dataset, do_eval=True, output_layer_names='output', output_file=None, output_file_format='txt')[source]
Parameters:
  • dataset (Dataset)

  • do_eval (bool) – calculate errors and print reference. can only be done if we have the reference target

  • output_layer_names (str|list[str])

  • output_file (str)

  • output_file_format (str) – “txt” or “py”

search_single(dataset, seq_idx, output_layer_name=None)[source]

Performs search. See also forward_single().

Parameters:
  • dataset (Dataset.Dataset)

  • seq_idx (int) – index of sequence, -1 for all sequences in dataset

  • output_layer_name (str|None) – e.g. “output”. if not set, will read from config “search_output_layer”

Returns:

list of (score, numpy array) tuples, each numpy array in format (time,dim)

Return type:

list[(float,numpy.ndarray)]

search_single_seq(sources, output_layer_name=None)[source]
Parameters:
  • sources (list[numpy.ndarray|list[int]]) – source sequences as a list of indices

  • output_layer_name (str|None) – e.g. “output”. if not set, will read from config “search_output_layer”

Returns:

list of all hyps, each of which is a tuple of score and string

Return type:

list[(float,str)]
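
Given a result of the shape list[(float,str)], picking a single hypothesis is a one-liner. The sketch below assumes that a higher score is better (e.g. the scores are log-probabilities), which is not stated above; the helper `best_hypothesis` and the example hypotheses are made up.

```python
from typing import List, Tuple


def best_hypothesis(hyps: List[Tuple[float, str]]) -> Tuple[float, str]:
    """Return the (score, text) pair with the highest score."""
    if not hyps:
        raise ValueError("no hypotheses given")
    # Assumption: larger score means better hypothesis.
    return max(hyps, key=lambda hyp: hyp[0])


# Made-up hypotheses in the list[(float, str)] shape described above.
example_hyps = [(-4.2, "hello word"), (-1.3, "hello world"), (-7.9, "hollow world")]
best_score, best_text = best_hypothesis(example_hyps)
```

If the scores were costs instead (lower is better), `min` would take the place of `max`.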

search_single_string_to_string_seq(sources, output_layer_name=None)[source]
Parameters:
  • sources (str|list[str]) – source text as a string (list for batch translation)

  • output_layer_name (str|None) – e.g. “output”. if not set, will read from config “search_output_layer”

Returns:

list of all hyps, each of which is a tuple of score and string

Return type:

list[(float,str)]

compute_priors(dataset, config=None)[source]
Parameters:
web_server(port)[source]

Starts a web-server with a simple API to forward data through the network (or search if the flag is set).

Parameters:

port (int) – for the http server

Returns:

returnn.tf.engine.get_global_engine()[source]

Similar to Config.get_global_config().

Return type:

Engine