returnn.util.fsa
Utility functions to generate FSAs (or FSTs).
- class returnn.util.fsa.Edge(source_state_idx, target_state_idx, label, weight=0.0)
class to represent an edge
- Parameters:
source_state_idx (int) – the starting node of the edge
target_state_idx (int) – the ending node of the edge
label (int|str|None) – the label of the edge (normally a letter or a phoneme …)
weight (float) – probability of the word/phoneme in -log space
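Because weights are stored in -log space, probabilities along a path multiply while the weights add. The following is a minimal sketch of that convention, not the library's implementation: `EdgeSketch` and `neg_log` are hypothetical stand-ins that only mirror the documented fields.

```python
import math
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class EdgeSketch:
    """Hypothetical stand-in mirroring the documented Edge fields."""
    source_state_idx: int
    target_state_idx: int
    label: Optional[Union[int, str]]
    weight: float = 0.0  # cost in -log space; 0.0 corresponds to probability 1


def neg_log(prob: float) -> float:
    """Convert a probability in (0, 1] to a -log-space weight."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log(prob)


# Two consecutive edges spelling "a" then "b" with probabilities 0.5 and 0.25.
path = [
    EdgeSketch(0, 1, "a", neg_log(0.5)),
    EdgeSketch(1, 2, "b", neg_log(0.25)),
]

# Weights along a path add up where the probabilities would multiply.
path_weight = sum(edge.weight for edge in path)
path_prob = math.exp(-path_weight)
```

Here `path_prob` recovers the product of the two edge probabilities, which is why the default weight of 0.0 leaves a path score unchanged.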
- class returnn.util.fsa.Graph(lemma)
Holds the graph representing the finite state automaton: the input and the created output (ASG, CTC, HMM). Intermediate states between input and output may be held if necessary.
- Parameters:
lemma (str|list[str]|list[Edge]|None) – a sentence or word
list[str] lem_list: the lemma transformed into a list of strings
list[Edge] lem_edges: the lemma provided as a list of edges, i.e. already an FSA
- static make_single_state_graph(num_states, edges)
Takes a graph with several states and transforms it into a single-state graph.
- Parameters:
num_states (int) – number of states
edges (list[Edge]) – list of edges representing the graph
- Returns:
the transformed list of edges with one state
- Return type:
list[Edge]
- class returnn.util.fsa.Asg(fsa, num_labels=256, asg_repetition=2, label_conversion=False)
class to create ASG FSA
- Parameters:
fsa (Graph) – represents the Graph on which the class operates
num_labels (int) – number of labels without blank, silence, eps and repetitions where num_labels > 0
asg_repetition (int) – asg repeat symbol which stands for x repetitions where asg_repetition > 1
label_conversion (bool) – shall the labels be converted into numbers (only ASG and CTC)
- class returnn.util.fsa.Ctc(fsa, num_labels=256, label_conversion=False)
class to create CTC FSA
- Parameters:
fsa (Graph) – represents the Graph on which the class operates
num_labels (int) – number of labels without blank, silence, eps and repetitions
label_conversion (bool) – shall the labels be converted into numbers (only ASG and CTC)
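For orientation, the textbook CTC topology for one label sequence can be sketched as below. This is an illustrative sketch under stated assumptions, not the edge set the `Ctc` class actually produces: `ctc_topology`, `accepts`, the integer labels and the state numbering are all hypothetical, and `blank_idx` is passed in explicitly.

```python
from typing import List, Set, Tuple

EdgeTuple = Tuple[int, int, int]  # (source_state, target_state, emission label)


def ctc_topology(labels: List[int], blank_idx: int) -> Tuple[List[EdgeTuple], Set[int]]:
    """Build textbook CTC edges for one label sequence.

    Even states 2*i wait between labels; odd states 2*i + 1 have just emitted labels[i].
    State 0 is the start state.
    """
    edges: List[EdgeTuple] = [(0, 0, blank_idx)]  # optional leading blanks
    for i, label in enumerate(labels):
        before, at, after = 2 * i, 2 * i + 1, 2 * i + 2
        edges.append((before, at, label))        # first frame of the label
        edges.append((at, at, label))            # label repeated over frames
        edges.append((at, after, blank_idx))     # blank after the label
        edges.append((after, after, blank_idx))  # blank repeated over frames
        # Skipping the blank is only allowed between two different labels.
        if i + 1 < len(labels) and labels[i + 1] != label:
            edges.append((at, at + 2, labels[i + 1]))
    n = len(labels)
    final_states = {2 * n} | ({2 * n - 1} if n > 0 else set())
    return edges, final_states


def accepts(edges: List[EdgeTuple], final_states: Set[int], frames: List[int]) -> bool:
    """Check whether a frame-wise emission sequence is a valid path from state 0."""
    active = {0}
    for emission in frames:
        active = {dst for src, dst, lab in edges if src in active and lab == emission}
    return bool(active & final_states)


edges, finals = ctc_topology([1, 1, 2], blank_idx=0)
```

With the repeated label in `[1, 1, 2]`, a blank is required between the two 1s, so the frames `[1, 0, 1, 2]` are accepted while `[1, 1, 2]` are not.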
- class returnn.util.fsa.Hmm(fsa, depth=6, allo_num_states=3, state_tying_conversion=False)
class to create HMM FSA
- Parameters:
fsa (Graph) – represents the Graph on which the class operates
depth (int) – the depth of the HMM FSA process
allo_num_states (int) – number of allophone states where allo_num_states > 0
state_tying_conversion (bool) – flag for state tying conversion
- class returnn.util.fsa.AllPossibleWordsFsa(fsa)
Constructs an FSA from all words in a lexicon.
Takes a lexicon file and constructs an FSA over all words.
- Parameters:
fsa (Graph) – the graph which holds the constructed fsa
- class returnn.util.fsa.Ngram(n)
Constructs an FSA with an n-gram LM.
Constructs an FSA over a lexicon with n-grams.
- Parameters:
n (int) – order of the n-gram (1, 2 or 3)
- returnn.util.fsa.load_lexicon(lexicon_name='recog.150k.final.lex.gz', pickleflag=False)
Loads a lexicon: takes a file, loads the XML and returns it as a Lexicon. A pickled file can be loaded for a speed improvement.
In the returned lexicon, lex.lemmas and lex.phonemes are the important attributes.
- Parameters:
lexicon_name (str) – holds the path and name of the lexicon file
pickleflag (bool) – flag to indicate whether the lexicon data structure is to be pickled
- Return lexicon:
lexicon data structure
- Return type:
- returnn.util.fsa.load_state_tying(state_tying_name='state-tying.txt')
Loads a state tying map from a file and returns its content. State tying is slower with pickling.
In the returned structure, statetying.allo_map is the important attribute.
- Parameters:
state_tying_name (str) – holds the path and name of the state tying file
- Return state_tying:
state tying data structure
- Return type:
- class returnn.util.fsa.Store(num_states, edges, filename='edges', path='./tmp/', file_format='svg')
Conversion and save class for FSA
- Parameters:
num_states (int) – number of states of FSA
edges (list[Edge]) – list of edges representing FSA
filename (str) – name of the output file
path (str) – location
file_format (str) – format in which to save the file
- save_to_file()
Saves the dot graph to a file, using the filename and path settings. Caution: overwrites files that are already present.
- static label_conversion(edges)
Converts the string labels to int labels.
- Parameters:
edges (list[Edge]) – list of edges describing the FSA graph
- Return edges:
- Return type:
list[Edge]
- class returnn.util.fsa.FastBaumWelchBatchFsa(edges, weights, start_end_states)
FSA(s) in representation format for FastBaumWelchOp.
- Parameters:
edges (numpy.ndarray) – (4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
weights (numpy.ndarray) – (num_edges,), weights of the edges
start_end_states (numpy.ndarray) – (2, batch), (start,end) state idx in automaton.
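The documented layout stores one column per edge and one column per sequence. The sketch below illustrates that layout only: plain Python lists stand in for the `numpy.ndarray` arguments, `pack_batch` is a hypothetical helper, and state indices are assumed to be local to each sequence, which the parameter descriptions above do not specify.

```python
from typing import List, Tuple

# (source_state, target_state, emission_idx, weight); states local to a sequence (assumption)
LocalEdge = Tuple[int, int, int, float]


def pack_batch(
    per_seq_edges: List[List[LocalEdge]],
    per_seq_start_end: List[Tuple[int, int]],
):
    """Flatten per-sequence edge lists into the documented row layout."""
    edges: List[List[int]] = [[], [], [], []]  # rows: from, to, emission_idx, sequence_idx
    weights: List[float] = []                  # one weight per edge column
    for seq_idx, seq_edges in enumerate(per_seq_edges):
        for src, dst, emission_idx, weight in seq_edges:
            for row, value in zip(edges, (src, dst, emission_idx, seq_idx)):
                row.append(value)
            weights.append(weight)
    # Two rows (start, end), one column per sequence in the batch.
    start_end_states = [
        [start for start, _ in per_seq_start_end],
        [end for _, end in per_seq_start_end],
    ]
    return edges, weights, start_end_states


seq0 = [(0, 1, 5, 0.0), (1, 1, 5, 0.0)]
seq1 = [(0, 1, 7, 0.5)]
edges, weights, start_end_states = pack_batch([seq0, seq1], [(0, 1), (0, 1)])
```

The result has four edge rows of length `num_edges`, a weight list of the same length, and two start/end rows of length `batch`, matching the shapes `(4,num_edges)`, `(num_edges,)` and `(2, batch)`.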
One FSA shared for all the seqs in one batch (i.e. across batch-dim). This is a simplistic class which provides the necessary functions to add edges, and simple conversion to FastBaumWelchBatchFsa.
- Parameters:
source_state_idx (int)
target_state_idx (int)
emission_idx (int)
weight (float)
- Parameters:
state_idx (int)
num_emission_labels (int)
- Parameters:
n_batch (int)
- Return type:
int
- Parameters:
n_batch (int)
- Return edges:
(4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
- Return type:
numpy.ndarray
- Parameters:
n_batch (int)
- Return weights:
(num_edges,), weights of the edges
- Return type:
numpy.ndarray
- Parameters:
n_batch (int)
- Return start_end_states:
(2, batch), (start,end) state idx in automaton. There is only one single automaton.
- Return type:
numpy.ndarray
- Parameters:
n_batch (int)
- Return type:
- returnn.util.fsa.get_ctc_fsa_fast_bw(targets, seq_lens, blank_idx)
- Parameters:
targets (numpy.ndarray) – shape (batch,time)
seq_lens (numpy.ndarray) – shape (batch)
blank_idx (int)
- Return type:
- returnn.util.fsa.fast_bw_fsa_staircase(seq_lens, with_loop=False, max_skip=None, start_max_skip=None, end_max_skip=None)
Builds up a staircase FSA, returns a FastBaumWelchBatchFsa. The emissions are indices [0, …, seq_len - 1].
- Parameters:
seq_lens (list[int]|numpy.ndarray)
with_loop (bool)
max_skip (int|list[int]) – per batch if a list
start_max_skip (int|list[int]) – per batch if a list
end_max_skip (int|list[int]) – per batch if a list
- Return type:
FastBaumWelchBatchFsa
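One plausible reading of the staircase shape, ignoring the skip options, is a chain of states in which step i emits index i. The sketch below is an assumption-laden illustration, not the function's actual behavior: `staircase_edges` is hypothetical, edges are plain tuples in the (from, to, emission_idx, sequence_idx) order used on this page, and the placement of the loop edge is a guess.

```python
from typing import List, Tuple

BatchEdge = Tuple[int, int, int, int]  # (from, to, emission_idx, sequence_idx)


def staircase_edges(
    seq_lens: List[int], with_loop: bool = False
) -> Tuple[List[BatchEdge], List[Tuple[int, int]]]:
    """Build a simple staircase per sequence: state i -> i + 1 emits index i."""
    edges: List[BatchEdge] = []
    start_end: List[Tuple[int, int]] = []
    for seq_idx, seq_len in enumerate(seq_lens):
        for i in range(seq_len):
            edges.append((i, i + 1, i, seq_idx))  # step up, emitting index i
            if with_loop:
                # Assumption: the loop sits on the state just reached and repeats emission i.
                edges.append((i + 1, i + 1, i, seq_idx))
        start_end.append((0, seq_len))  # start at state 0, end after the last step
    return edges, start_end


edges, start_end = staircase_edges([2, 1])
looped_edges, _ = staircase_edges([2, 1], with_loop=True)
```

Without loops, each sequence contributes exactly `seq_len` edges, so every emission index in [0, …, seq_len - 1] is visited once and in order.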