returnn.engine.batch#

Defines BatchSeqCopyPart and other batch related helpers. This is shared across different backends.

class returnn.engine.batch.BatchSeqCopyPart(seq_idx, seq_start_frame, seq_end_frame, batch_slice, batch_frame_offset)[source]#
A batch used for training in RETURNN can consist of several parts from sequences, ordered in various ways. The dataset, depending on the configuration, can generate these. For the non-recurrent case, we usually concatenate them together into one slice. For the recurrent case, we have a single slice per sequence, or even multiple slices per sequence in case of chunking.

This class represents one single such part and where it is going to be stored in the batch.

property frame_length[source]#
Return type:

NumbersDict
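The bookkeeping a part carries can be illustrated with a simplified stand-in. This is not the RETURNN implementation: the real class stores frame counts as NumbersDict instances (one value per data key), not plain ints, and the class name below is invented for this sketch.

```python
from dataclasses import dataclass


# Simplified stand-in for illustration only; the real BatchSeqCopyPart
# uses NumbersDict frame counts per data key.
@dataclass
class SeqCopyPartSketch:
    seq_idx: int             # which sequence in the dataset
    seq_start_frame: int     # first frame of the part within that sequence
    seq_end_frame: int       # one past the last frame within that sequence
    batch_slice: int         # which slice (batch-dim index) the part goes into
    batch_frame_offset: int  # time offset inside that slice

    @property
    def frame_length(self) -> int:
        # number of frames this part contributes to the batch
        return self.seq_end_frame - self.seq_start_frame


# e.g. frames 50..150 of sequence 3, stored at time offset 0 of slice 1
part = SeqCopyPartSketch(seq_idx=3, seq_start_frame=50, seq_end_frame=150,
                         batch_slice=1, batch_frame_offset=0)
print(part.frame_length)  # 100
```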

class returnn.engine.batch.Batch[source]#

A batch can consist of several sequences (= segments). This is basically just a list of BatchSeqCopyPart.

try_sequence_as_slice(length)[source]#
Parameters:

length (NumbersDict) – number of (time) frames

Returns:

new shape which covers the old shape and one more data-batch, format (time,batch)

Return type:

(NumbersDict,int)
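The shape arithmetic behind this method can be sketched with plain ints (the real method works per data key via NumbersDict): adding a sequence as a new slice keeps the time dimension at the maximum of the old time and the new length, and grows the batch dimension by one.

```python
# Illustrative sketch only, assuming plain int frame counts instead of
# NumbersDict; the function name is invented for this sketch.
def try_sequence_as_slice_sketch(shape, length):
    """shape: (time, batch); length: number of frames of the new sequence."""
    time, batch = shape
    # new shape covers the old shape plus one more data-batch
    return max(time, length), batch + 1


# batch currently covers (100 frames, 2 slices); adding an 80-frame sequence
print(try_sequence_as_slice_sketch((100, 2), 80))   # (100, 3)
# a 120-frame sequence grows the time dimension as well
print(try_sequence_as_slice_sketch((100, 2), 120))  # (120, 3)
```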

add_sequence_as_slice(seq_idx, seq_start_frame, length)[source]#

Adds one data-batch in an additional slice.

Parameters:
  • seq_idx (int) –

  • seq_start_frame (NumbersDict|int) –

  • length (NumbersDict) – number of (time) frames
add_frames(seq_idx, seq_start_frame, length, frame_dim_corresponds=True)[source]#

Adds frames to all data-batches. Will add one data-batch if we don’t have one yet.

Parameters:
  • seq_idx (int) –

  • seq_start_frame (NumbersDict|int) –

  • length (NumbersDict) – number of (time) frames

  • frame_dim_corresponds (bool) – if the batch frame offset should always be the same (max value) for all keys
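The contrast between the two append modes can be sketched with plain ints (not the RETURNN implementation, and the function names are invented): add_frames grows the current slice along the time axis, opening a first slice if none exists yet, while add_sequence_as_slice always opens a new slice.

```python
# Illustrative contrast of the two ways a batch shape (time, batch) grows.
def add_frames_sketch(shape, length):
    # extend the current slice along the time axis;
    # ensure at least one slice exists ("add one data-batch if we
    # don't have one yet")
    time, batch = shape
    return time + length, max(batch, 1)


def add_sequence_as_slice_sketch(shape, length):
    # open a new slice; time covers the longest slice
    time, batch = shape
    return max(time, length), batch + 1


shape = (0, 0)
shape = add_frames_sketch(shape, 100)             # -> (100, 1)
shape = add_frames_sketch(shape, 50)              # -> (150, 1), concatenated
print(shape)
shape = add_sequence_as_slice_sketch(shape, 120)  # -> (150, 2), new slice
print(shape)
```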

init_with_one_full_sequence(seq_idx, dataset)[source]#
Parameters:
  • seq_idx (int) –

  • dataset (Dataset.Dataset) –

get_all_slices_num_frames()[source]#

Note that this is only an upper limit in case of data_shape[1] > 1 because data_shape[0] is the max frame len of all seqs.

Returns:

related to the data-key with max length

Return type:

NumbersDict
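The "upper limit" note above can be made concrete with plain ints (a sketch, not the RETURNN implementation): the padded batch covers (max slice length) × (number of slices) frames, which overcounts whenever the slices have different lengths. The comparison with the actual frame total is an assumption based on the method names.

```python
# Why the all-slices frame count is only an upper bound when there is
# more than one slice: padding fills every slice up to the max length.
slice_lengths = [100, 80, 60]  # actual frames per slice

all_slices_frames = max(slice_lengths) * len(slice_lengths)  # padded shape
total_frames = sum(slice_lengths)                            # no padding

print(all_slices_frames)  # 300
print(total_frames)       # 240
assert all_slices_frames >= total_frames
```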

get_total_num_frames()[source]#
Return type:

NumbersDict

property start_seq[source]#
Return type:

int|None

property end_seq[source]#
Return type:

int|None

get_num_seqs()[source]#
Return type:

int

class returnn.engine.batch.BatchSetGenerator(dataset, generator, shuffle_batches=False, cache_whole_epoch=True)[source]#

This will give you the next batches (list[Batch]) such that you can use them for assign_dev_data(). We get those batches from a generator, i.e. lazily on-the-fly. This is the whole point of BatchSetGenerator: we do not need to know the whole list of batches in advance. As assign_dev_data() can fail for various reasons, we buffer the list of batches, and you call self.advance() explicitly to go forward to the next batches.

Parameters:
  • shuffle_batches (bool) –

  • cache_whole_epoch (bool) –
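The buffered peek/advance protocol described above can be sketched with a simplified stand-in (the class name is invented, the real class pulls Batch objects from the dataset's generator, and the assignment step is only mimicked here):

```python
# Simplified stand-in for the peek/advance buffering protocol.
class BatchBufferSketch:
    def __init__(self, generator):
        self._generator = generator
        self._buffer = []  # batches peeked but not yet advanced over

    def peek_next_n(self, n):
        # fill the buffer lazily from the generator; may return fewer than n
        while len(self._buffer) < n:
            try:
                self._buffer.append(next(self._generator))
            except StopIteration:
                break
        return self._buffer[:n]

    def advance(self, n):
        # called only after the caller consumed the batches successfully,
        # so a failed assignment can retry the same batches later
        self._buffer = self._buffer[n:]

    def has_more(self):
        return bool(self.peek_next_n(1))


gen = BatchBufferSketch(iter(["batch0", "batch1", "batch2"]))
consumed = []
while gen.has_more():
    batches = gen.peek_next_n(2)
    # ... try to assign the batches; on failure, skip advance() and retry ...
    consumed.extend(batches)
    gen.advance(len(batches))
print(consumed)  # ['batch0', 'batch1', 'batch2']
```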

reset()[source]#

Call this after one epoch to reuse the previously cached batches.

peek_next_n(n)[source]#
Returns:

the next (up to) n batches; it might return fewer, and there is no way to know in advance. If self.has_more() is True, it will return at least one.

Return type:

list[Batch]

advance(n)[source]#

Advances by n batches, i.e. goes forward to the next batches (see peek_next_n()).
completed_frac()[source]#
Returns:

fraction of completion, in the range 0-1, always > 0

Return type:

float

has_more()[source]#

This would also try to advance further in the dataset, thus it might block. If it returns False, no more data is available in the dataset.

Return type:

bool

get_current_batch_idx()[source]#
Return type:

int