CachedDataset2

Provides CachedDataset2.

class CachedDataset2.CachedDataset2(**kwargs)[source]

Similar to CachedDataset, but simpler and more generic. Its caching might be less efficient.

If you derive from this class:
  • you must override _collect_single_seq
  • you must set num_inputs (dense-dim of “data” key) and num_outputs (dict key -> dim, ndim-1)
  • you should set labels
  • handle seq ordering by overriding init_seq_order
  • you can set _estimated_num_seqs
  • you can set _num_seqs or _num_timesteps if you know them in advance
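
The subclassing requirements above can be sketched roughly as follows. So that the example runs without the library installed, it derives from a small stand-in class (StubCachedDataset2, hypothetical) instead of the real CachedDataset2. The seq_idx argument of _collect_single_seq, the plain dict it returns, and the reading of the second num_outputs entry as the number of axes of the per-sequence array are all assumptions, since this page does not specify them.

```python
class StubCachedDataset2:
    """Hypothetical stand-in for CachedDataset2, only so the sketch is runnable."""

    def __init__(self):
        self.num_inputs = 0
        self.num_outputs = {}
        self.labels = {}
        self._num_seqs = None

    def init_seq_order(self, epoch=None, seq_list=None):
        return True


class ToyDataset(StubCachedDataset2):
    """Three fixed sequences: dense 2-dim "data" and sparse "classes"."""

    _SEQS = [
        {"data": [[0.0, 1.0], [1.0, 0.0]], "classes": [0, 1]},
        {"data": [[0.5, 0.5]], "classes": [2]},
        {"data": [[1.0, 1.0], [0.0, 0.0], [0.2, 0.8]], "classes": [1, 0, 2]},
    ]

    def __init__(self):
        super().__init__()
        # Must set: dense dimension of the "data" key.
        self.num_inputs = 2
        # Must set: key -> (dim, ndim-1); second entry interpreted here (assumption)
        # as the axis count of one sequence: 2 for dense, 1 for sparse.
        self.num_outputs = {"data": (2, 2), "classes": (3, 1)}
        # Should set: human-readable labels per key.
        self.labels = {"classes": ["a", "b", "c"]}

    def init_seq_order(self, epoch=None, seq_list=None):
        super().init_seq_order(epoch=epoch, seq_list=seq_list)
        # The number of sequences is known in advance here, so set it.
        self._num_seqs = len(self._SEQS)
        return True  # True is always safe to return

    def _collect_single_seq(self, seq_idx):
        # Placeholder return value (a dict); the real return type is not given here.
        if seq_idx >= self._num_seqs:
            return None
        return {"seq_idx": seq_idx, "seq_tag": "seq-%i" % seq_idx,
                "features": self._SEQS[seq_idx]}
```

With the real base class, only the import and the return value of _collect_single_seq would need to change; the attributes set in __init__ are the ones the list above names.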

init_seq_order(self, epoch=None, seq_list=None)[source]
Parameters:
  • epoch (int|None) –
  • seq_list (list[str]|None) – In case we want to set a predefined order.
Return type:bool
Returns:whether the order changed (True is always safe to return)

This is called when we start a new epoch, or at initialization. Call this when you reset the seq list.
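
As an illustration of the contract described here, the following standalone sketch recomputes a sequence order on each call and reports whether it changed. The class SeqOrderSketch, its _tags and _order attributes, and the epoch-seeded shuffle are hypothetical choices for this example, not the library's implementation.

```python
import random


class SeqOrderSketch:
    """Hypothetical helper that mimics the init_seq_order contract."""

    def __init__(self, tags):
        self._tags = list(tags)
        self._order = list(range(len(self._tags)))

    def init_seq_order(self, epoch=None, seq_list=None):
        if seq_list is not None:
            # Predefined order: map each requested tag to its index.
            index_of = {tag: i for i, tag in enumerate(self._tags)}
            new_order = [index_of[tag] for tag in seq_list]
        else:
            # Assumption: shuffle deterministically per epoch.
            new_order = list(range(len(self._tags)))
            random.Random(epoch or 1).shuffle(new_order)
        changed = new_order != self._order
        self._order = new_order
        # Returning True unconditionally would also be valid.
        return changed
```

Returning the actual comparison is an optimization; a subclass that cannot cheaply tell whether the order changed can simply return True.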

is_cached(self, start, end)[source]
Parameters:
  • start (int) –
  • end (int) –
Return type:

bool

num_seqs[source]
Return type:int
is_less_than_num_seqs(self, n)[source]
Parameters:n (int) –
Return type:int
get_num_timesteps(self)[source]
Return type:int
get_seq_length(self, sorted_seq_idx)[source]
Return type:Util.NumbersDict
get_data(self, seq_idx, key)[source]
Parameters:
  • seq_idx (int) –
  • key (str) –
Return type:

numpy.ndarray

get_input_data(self, seq_idx)[source]
Parameters:seq_idx (int) –
Return type:numpy.ndarray
get_targets(self, target, seq_idx)[source]
Parameters:
  • target (str) –
  • seq_idx (int) –
Return type:

numpy.ndarray

get_ctc_targets(self, sorted_seq_idx)[source]
Parameters:sorted_seq_idx (int) –
Return type:numpy.ndarray|None
get_tag(self, sorted_seq_idx)[source]
Parameters:sorted_seq_idx (int) –
Return type:str
get_data_keys(self)[source]
Return type:list[str]
get_target_list(self)[source]

Target data keys are usually not available during inference. Override this if your dataset needs custom behavior.

is_data_sparse(self, key)[source]
Parameters:key (str) – e.g. “data” or “classes”
Return type:bool
get_data_dim(self, key)[source]
Parameters:key (str) – e.g. “data” or “classes”
Return type:int
Returns:number of classes, no matter if sparse or not
get_data_dtype(self, key)[source]
Parameters:key (str) –
Return type:str
class CachedDataset2.SingleStreamPipeDataset(dim, ndim, sparse=False, dtype='float32')[source]

Producer: Gets data from an external source, running in some thread.
Consumer: The thread or code which calls load_seqs and get_data here.

Parameters:
  • dim (int) –
  • ndim (int) –
  • sparse (bool) –
  • dtype (str) –
is_data_sparse(self, key)[source]
Parameters:key (str) –
Return type:bool
get_data_dtype(self, key)[source]
Parameters:key (str) –
Return type:str
init_seq_order(self, epoch=None, seq_list=None)[source]
Parameters:
  • epoch (int) –
  • seq_list (list[str]|None) –
Return type:

bool

producer_add_data(self, data, seq_tag=None)[source]
Parameters:
  • data (numpy.ndarray) –
  • seq_tag (str|None) –
producer_set_finished(self)[source]

Mark finished.
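
The producer/consumer split described for SingleStreamPipeDataset can be sketched with a thread-safe queue. The class PipeSketch below is a hypothetical stand-in: it reuses the method names producer_add_data and producer_set_finished from this page, but the queue.Queue, the sentinel object, and the consume_all helper are assumptions about one possible design, and plain lists stand in for numpy.ndarray so the example needs only the standard library.

```python
import queue
import threading


class PipeSketch:
    """Hypothetical single-stream pipe: one producer thread, one consumer."""

    _FINISHED = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._queue = queue.Queue()

    def producer_add_data(self, data, seq_tag=None):
        # Called from the producer thread for each new sequence.
        self._queue.put((seq_tag, data))

    def producer_set_finished(self):
        # Called once by the producer when no more data will come.
        self._queue.put(self._FINISHED)

    def consume_all(self):
        # Consumer side: block on the queue until the stream is finished.
        items = []
        while True:
            item = self._queue.get()
            if item is self._FINISHED:
                return items
            items.append(item)


def run_demo():
    pipe = PipeSketch()

    def produce():
        for i in range(3):
            pipe.producer_add_data([float(i)] * 2, seq_tag="seq-%i" % i)
        pipe.producer_set_finished()

    thread = threading.Thread(target=produce)
    thread.start()
    result = pipe.consume_all()
    thread.join()
    return result
```

Calling run_demo() returns the (seq_tag, data) pairs in the order the producer added them, because a single queue preserves insertion order.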