Defines BatchSeqCopyPart and other batch-related helpers. This module is shared across different backends.

class EngineBatch.BatchSeqCopyPart(seq_idx, seq_start_frame, seq_end_frame, batch_slice, batch_frame_offset)[source]
A batch used for training in CRNN can consist of several parts from sequences, ordered in various ways. The dataset, depending on its configuration, generates these parts. In the non-recurrent case, we usually concatenate them into one slice. In the recurrent case, we have one slice per sequence, or even multiple slices per sequence in case of chunking. This class represents one such part and where it is going to be stored in the batch.
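As an illustration of what the constructor fields mean, here is a self-contained sketch (not the actual EngineBatch code; the `Part` namedtuple and `chunk_sequence` helper are hypothetical stand-ins) showing how chunking one sequence produces several such parts, each landing in its own batch slice:

```python
from collections import namedtuple

# Hypothetical stand-in with the same fields as EngineBatch.BatchSeqCopyPart.
Part = namedtuple(
    "Part",
    ["seq_idx", "seq_start_frame", "seq_end_frame", "batch_slice", "batch_frame_offset"],
)

def chunk_sequence(seq_idx, seq_len, chunk_size):
    """Split one sequence into chunk-sized parts, one batch slice per chunk."""
    parts = []
    for slice_idx, start in enumerate(range(0, seq_len, chunk_size)):
        end = min(start + chunk_size, seq_len)
        # batch_frame_offset is 0 here because each chunk starts its own slice.
        parts.append(Part(seq_idx, start, end, slice_idx, 0))
    return parts

# One sequence of 25 frames, chunked into parts of at most 10 frames.
parts = chunk_sequence(seq_idx=0, seq_len=25, chunk_size=10)
```

In the non-recurrent case described above, all parts would instead share `batch_slice` 0 and differ in `batch_frame_offset`.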
class EngineBatch.Batch[source]

A batch can consist of several sequences (= segments). It is basically just a list of BatchSeqCopyPart.

try_sequence_as_slice(self, length)[source]
Parameters: length (NumbersDict) – number of (time) frames
Returns: new shape which covers the old shape and one more data-batch, format (time, batch)
Return type: (NumbersDict, int)
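The shape-growing logic can be sketched as follows (a minimal illustration, not the actual implementation; it uses plain ints where the real method uses a NumbersDict of per-data-key lengths):

```python
def try_sequence_as_slice(shape, length):
    """shape is (time, batch). Adding a sequence of `length` frames as a new
    slice grows the batch dim by one; the time dim must cover the longest slice."""
    time, batch = shape
    return (max(time, length), batch + 1)
```

For example, adding sequences of lengths 7, 5 and 9 in turn grows the shape from (0, 0) to (7, 1), (7, 2) and finally (9, 3).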
add_sequence_as_slice(self, seq_idx, seq_start_frame, length)[source]

Adds one data-batch in an additional slice.

Parameters:
  • seq_idx (int) –
  • seq_start_frame (NumbersDict|int) –
  • length (NumbersDict) – number of (time) frames
add_frames(self, seq_idx, seq_start_frame, length, frame_dim_corresponds=True)[source]

Adds frames to all data-batches. Will add one data-batch if we don’t have one yet.

Parameters:
  • seq_idx (int) –
  • seq_start_frame (NumbersDict|int) –
  • length (NumbersDict) – number of (time) frames
  • frame_dim_corresponds (bool) – if the batch frame offset should always be the same (max value) for all keys
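To contrast add_frames with add_sequence_as_slice: in the non-recurrent case, frames from several sequences are appended into a single slice, with the frame offset tracking where each part starts. A self-contained sketch of that bookkeeping (the `concat_into_one_slice` helper is hypothetical, not part of EngineBatch):

```python
def concat_into_one_slice(seq_lens):
    """Non-recurrent case: concatenate all sequences into batch slice 0.
    Returns (total_frames, parts), where each part records its frame offset."""
    parts = []
    offset = 0
    for seq_idx, seq_len in enumerate(seq_lens):
        parts.append({"seq_idx": seq_idx,
                      "batch_slice": 0,
                      "batch_frame_offset": offset})
        offset += seq_len
    return offset, parts
```

Three sequences of lengths 3, 5 and 2 thus occupy one slice of 10 frames, starting at offsets 0, 3 and 8.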
init_with_one_full_sequence(self, seq_idx, dataset)[source]

Note that this is only an upper limit in case of data_shape[1] > 1, because data_shape[0] is the max frame length of all seqs.

Returns: related to the data-key with max length
Return type: NumbersDict
class EngineBatch.BatchSetGenerator(dataset, generator, shuffle_batches=False, cache_whole_epoch=True)[source]

This will give you the next batches (list[Batch]) so that you can use them for assign_dev_data(). We get these batches from a generator, i.e. lazily on-the-fly. This is the whole point of BatchSetGenerator: we must not need to know the whole list of batches in advance. Because assign_dev_data() can fail for various reasons, we buffer the list of batches, and you call self.advance() explicitly to move forward to the next batches.

Parameters:
  • shuffle_batches (bool) –
  • cache_whole_epoch (bool) –
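The peek/advance contract described above can be sketched with a hypothetical buffered wrapper around a lazy generator (a simplified illustration, not the actual BatchSetGenerator; shuffling and epoch caching are omitted):

```python
class BufferedBatches:
    """Sketch: batches come from a lazy generator; peek_next_n() buffers
    without consuming, advance() commits and drops consumed batches."""

    def __init__(self, generator):
        self._gen = generator
        self._buffer = []

    def peek_next_n(self, n):
        # Fill the buffer lazily; may return fewer than n batches.
        while len(self._buffer) < n:
            try:
                self._buffer.append(next(self._gen))
            except StopIteration:
                break
        return self._buffer[:n]

    def advance(self, n):
        # Only call this once the peeked batches were consumed successfully.
        assert len(self._buffer) >= n
        del self._buffer[:n]

    def has_more(self):
        # May pull from the generator, so this can block on the dataset.
        return bool(self.peek_next_n(1))
```

Peeking is idempotent until advance() is called, which is what lets the caller retry assign_dev_data() on failure without losing batches.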

Call this after one epoch to reuse the previously cached batches.

peek_next_n(self, n)[source]
Returns: up to n batches; it might return fewer, and there is no way to know in advance. If self.has_more() is True, it will return at least one.
Return type: list[Batch]

advance(self, n)[source]
Returns: a fraction between 0 and 1, always > 0
Return type: float


has_more(self)[source]
This would also try to advance further in the dataset, thus it might block. If it returns False, no more data is available in the dataset.
Return type: bool