Fsa

class Fsa.Edge(source_state_idx, target_state_idx, label, weight=0.0)[source]

class to represent an edge

Parameters:
  • source_state_idx (int) – the starting node of the edge
  • target_state_idx (int) – the ending node of the edge
  • label (int|str|None) – the label of the edge (normally a letter or a phoneme …)
  • weight (float) – probability of the word/phoneme in -log space
SIL = '_'[source]
EPS = '*'[source]
BLANK = '%'[source]
as_tuple()[source]
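
Because edge weights live in -log space, adding the weights along a path corresponds to multiplying the underlying probabilities. The following is a minimal sketch of that arithmetic. `SketchEdge` and `neg_log` are hypothetical stand-ins introduced only for illustration; they are not part of the Fsa module, and only the field names are taken from the parameter list above.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for Fsa.Edge; field names follow the parameter list above.
SketchEdge = namedtuple(
    "SketchEdge", ["source_state_idx", "target_state_idx", "label", "weight"]
)


def neg_log(prob):
    """Convert a probability in (0, 1] to a weight in -log space."""
    return -math.log(prob)


# Two consecutive edges: probability 0.5 for "a", then 0.25 for "b".
path = [
    SketchEdge(0, 1, "a", neg_log(0.5)),
    SketchEdge(1, 2, "b", neg_log(0.25)),
]

# Summing -log weights along the path multiplies the probabilities.
path_weight = sum(edge.weight for edge in path)
path_prob = math.exp(-path_weight)
```

With these numbers, `path_prob` comes out as 0.5 * 0.25 up to floating-point rounding.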
class Fsa.Graph(lemma)[source]

Holds the graph representing the finite state automaton, i.e. the input and the created output (ASG, CTC, HMM). States between input and output may be held if necessary.

Parameters:lemma (str|list[str]|list[Edge]|None) – a sentence or word

list[str] lem_list: the lemma transformed into a list of strings
list[Edge] lem_edges: the lemma provided as a list of edges, i.e. it is already an FSA

is_empty()[source]
static make_single_state_graph(num_states, edges)[source]

takes a graph with several states and transforms it into a single-state graph

Parameters:
  • num_states (int) – number of states
  • edges (list[Edge]) – list of edges symbolizing the graph
Returns:the transformed list of edges with one state
Return type:list[Edge]

class Fsa.Asg(fsa, num_labels=256, asg_repetition=2, label_conversion=False)[source]

class to create ASG FSA

Parameters:
  • fsa (Graph) – represents the Graph on which the class operates
  • num_labels (int) – number of labels without blank, silence, eps and repetitions where num_labels > 0
  • asg_repetition (int) – asg repeat symbol which stands for x repetitions where asg_repetition > 1
  • label_conversion (bool) – shall the labels be converted into numbers (only ASG and CTC)
run()[source]

creates the ASG FSA

class Fsa.Ctc(fsa, num_labels=256, label_conversion=False)[source]

class to create CTC FSA

Parameters:
  • fsa (Graph) – represents the Graph on which the class operates
  • num_labels (int) – number of labels without blank, silence, eps and repetitions
  • label_conversion (bool) – shall the labels be converted into numbers (only ASG and CTC)
run()[source]

creates the CTC FSA
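
For orientation, the textbook CTC topology for a label sequence can be sketched as below. This is an illustration under stated assumptions, not the construction that `Ctc.run()` necessarily performs: `ctc_topology` and `accepts` are hypothetical helper names, edges are plain `(source, target, label)` tuples without weights, and only the blank symbol `'%'` is taken from `Edge.BLANK` above.

```python
BLANK = "%"  # same symbol as Edge.BLANK above


def ctc_topology(labels, blank=BLANK):
    """Return (num_states, edges, final_states) for a textbook CTC acceptor.

    State 0 is the start state; state k >= 1 means "the last emitted symbol
    was ext[k - 1]", where ext is the label sequence interleaved with blanks.
    Edges are (source_state_idx, target_state_idx, label) tuples.
    """
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext) + 1
    edges = []
    for k in range(1, num_states):
        symbol = ext[k - 1]
        edges.append((k - 1, k, symbol))  # advance by one position
        edges.append((k, k, symbol))      # repeat the same symbol
        # A blank may be skipped at the start or between two different labels.
        if symbol != blank and (k == 2 or symbol != ext[k - 3]):
            edges.append((k - 2, k, symbol))
    # A path may end on the last label or on the trailing blank.
    final_states = {num_states - 1, max(num_states - 2, 0)}
    return num_states, edges, final_states


def accepts(edges, final_states, frames, start=0):
    """Check whether a frame-wise symbol sequence is a path through the FSA."""
    states = {start}
    for symbol in frames:
        states = {dst for (src, dst, label) in edges
                  if src in states and label == symbol}
    return bool(states & set(final_states))


num_states, edges, final_states = ctc_topology(["a", "b"])
```

In this sketch a repeated label such as `["a", "a"]` can only be reached through an intermediate blank, which is the usual CTC constraint.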

class Fsa.Hmm(fsa, depth=6, allo_num_states=3, state_tying_conversion=False)[source]

class to create HMM FSA

Parameters:
  • fsa (Graph) – represents the Graph on which the class operates
  • depth (int) – the depth of the HMM FSA process
  • allo_num_states (int) – number of allophone states where allo_num_states > 0
  • state_tying_conversion (bool) – flag for state tying conversion
run()[source]

creates the HMM FSA
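
The role of `allo_num_states` can be illustrated with a generic left-to-right HMM expansion, in which every phoneme becomes a chain of states that each allow a forward and a loop transition. This is only a sketch under assumptions: `expand_to_hmm_states` is a hypothetical helper, the `(phoneme, sub_state)` labels are invented for illustration, and neither `depth` nor state tying is modelled.

```python
def expand_to_hmm_states(phonemes, allo_num_states=3):
    """Expand each phoneme into a left-to-right chain of HMM states.

    Returns (num_states, edges). Edges are
    (source_state_idx, target_state_idx, label) tuples, and each label is the
    hypothetical pair (phoneme, allophone_state_index).
    """
    assert allo_num_states > 0
    edges = []
    state = 0
    for phoneme in phonemes:
        for sub_state in range(allo_num_states):
            label = (phoneme, sub_state)
            edges.append((state, state + 1, label))      # enter this HMM state
            edges.append((state + 1, state + 1, label))  # loop: stay in it
            state += 1
    return state + 1, edges


num_states, edges = expand_to_hmm_states(["k", "ae", "t"], allo_num_states=3)
```

Three phonemes with three allophone states each give nine emitting states plus the start state.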

class Fsa.AllPossibleWordsFsa(fsa)[source]

constructs an FSA from all words in a lexicon

takes a lexicon file and constructs an FSA over all words

Parameters:fsa (Graph) – the graph which holds the constructed FSA

run()[source]
class Fsa.Ngram(n)[source]

constructs an FSA with an n-gram LM

constructs an FSA over a lexicon with n-grams

Parameters:n (int) – size of the gram (1, 2, 3)

run()[source]
Fsa.load_lexicon(lexicon_name='recog.150k.final.lex.gz', pickleflag=False)[source]

Loads a lexicon: takes a file, loads the XML and returns it as a Lexicon. A pickled file can be loaded for a speed improvement. The important attributes are lex.lemmas and lex.phonemes.
Parameters:
  • lexicon_name (str) – holds the path and name of the lexicon file
  • pickleflag (bool) – flag to indicate if the lexicon datastructure is to be pickled
Return lexicon:lexicon data structure
Return type:Lexicon
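
The pickle speed-up mentioned above follows a common caching pattern: parse the gzipped XML once, then reload the parsed structure from a pickle on later calls. The sketch below shows that pattern only. `load_lexicon_sketch`, the `.pickle` cache suffix, the dictionary return value, and the XML element names (`phoneme`, `symbol`, `lemma`, `orth`, `phon`) are all assumptions made for illustration and may differ from what `Fsa.load_lexicon` and the `Lexicon` class actually do.

```python
import gzip
import os
import pickle
import tempfile
import xml.etree.ElementTree as ET

# Tiny demo lexicon; the element names are assumed, not taken from the module.
DEMO_XML = (
    "<lexicon><phoneme-inventory>"
    "<phoneme><symbol>k</symbol></phoneme>"
    "<phoneme><symbol>ae</symbol></phoneme>"
    "<phoneme><symbol>t</symbol></phoneme>"
    "</phoneme-inventory>"
    "<lemma><orth>cat</orth><phon>k ae t</phon></lemma>"
    "</lexicon>"
)


def load_lexicon_sketch(lexicon_name, pickleflag=False):
    """Parse a gzipped XML lexicon, optionally caching the result as a pickle."""
    cache_name = lexicon_name + ".pickle"  # hypothetical cache location
    if pickleflag and os.path.exists(cache_name):
        with open(cache_name, "rb") as cache_file:
            return pickle.load(cache_file)
    with gzip.open(lexicon_name, "rb") as xml_file:
        root = ET.parse(xml_file).getroot()
    lexicon = {
        "phonemes": [node.findtext("symbol").strip() for node in root.iter("phoneme")],
        "lemmas": {
            node.findtext("orth").strip(): [p.text.split() for p in node.iter("phon")]
            for node in root.iter("lemma")
        },
    }
    if pickleflag:
        with open(cache_name, "wb") as cache_file:
            pickle.dump(lexicon, cache_file)
    return lexicon


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.lex.gz")
    with gzip.open(demo_path, "wb") as demo_file:
        demo_file.write(DEMO_XML.encode("utf-8"))
    first = load_lexicon_sketch(demo_path, pickleflag=True)   # parses XML, writes cache
    cache_written = os.path.exists(demo_path + ".pickle")
    second = load_lexicon_sketch(demo_path, pickleflag=True)  # served from the cache
```

Note that a cache of this kind goes stale if the lexicon file changes, so a real implementation would likely need some invalidation rule.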

Fsa.load_state_tying(state_tying_name='state-tying.txt')[source]

Loads a state tying map from a file and returns its content. Loading the state tying is slower with pickling. The important attribute is statetying.allo_map.
Parameters:state_tying_name – holds the path and name of the state tying file
Return state_tying:
 state tying datastructure
Return type:StateTying
class Fsa.Store(num_states, edges, filename='edges', path='./tmp/', file_format='svg')[source]

Conversion and save class for FSA

Parameters:
  • num_states (int) – number of states of FSA
  • edges (list[Edge]) – list of edges representing FSA
  • filename (str) – name of the output file
  • path (str) – location
  • file_format (str) – format in which to save the file
fsa_to_dot_format()[source]

converts num_states and edges within the graph to dot format

save_to_file()[source]

Saves the dot graph to a file. Settings: filename, path. Caution: overwrites files that are already present.

static label_conversion(edges)[source]

converts the string labels to int labels

Parameters:edges (list[Edge]) – list of edges describing the FSA graph
Return edges:list of edges with int labels
Return type:list[Edge]

static add_nodes(graph, num_states)[source]

adds nodes to the dot graph

Parameters:
  • graph (Digraph) – add nodes to this graph
  • num_states (int) – number of states, equal to the number of nodes

static add_edges(graph, edges)[source]

adds edges to the dot graph

Parameters:
  • graph (Digraph) – add edges to this graph
  • edges (list[Edge]) – list of edges
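
To make the dot conversion concrete, the sketch below writes DOT text directly from `num_states` and a list of edge tuples. It is an illustration only: the `Store` class appears to work on a `Digraph` object, whereas the hypothetical `edges_to_dot` helper here builds a plain string, and the `label/weight` edge caption is an assumed convention.

```python
def edges_to_dot(num_states, edges, name="fsa"):
    """Render states and (source, target, label, weight) edges as DOT text."""
    lines = ["digraph {} {{".format(name), "  rankdir=LR;"]
    for state_idx in range(num_states):
        lines.append("  {} [shape=circle];".format(state_idx))
    for source, target, label, weight in edges:
        lines.append(
            '  {} -> {} [label="{}/{}"];'.format(source, target, label, weight)
        )
    lines.append("}")
    return "\n".join(lines)


dot_text = edges_to_dot(2, [(0, 1, "a", 0.0), (1, 1, "%", 0.5)])
```

The resulting text could be written to a `.dot` file and rendered with the Graphviz `dot` command-line tool, for example to the `svg` format that `Store` uses by default.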

class Fsa.BuildSimpleFsaOp(state_models=None)[source]
itypes = (TensorType(int32, matrix),)[source]
otypes = (TensorType(float32, matrix), TensorType(float32, vector), TensorType(float32, matrix))[source]
perform(node, inputs, output_storage, params=None)[source]

Required: Calculate the function on the inputs and put the variables in the output storage. Return None.

Parameters:
node : Apply instance

Contains the symbolic inputs and outputs.

inputs : list

Sequence of inputs (immutable).

output_storage : list

List of mutable 1-element lists (do not change the length of these lists)

Raises:
MethodNotDefined

The subclass does not override this method.

Notes

The output_storage list might contain data. If an element of output_storage is not None, it has to be of the right type, for instance, for a TensorVariable, it has to be a NumPy ndarray, with the right number of dimensions, and the correct dtype. Its shape and stride pattern can be arbitrary. It is not guaranteed that it was produced by a previous call to impl. It could be allocated by another Op. impl is free to reuse it as it sees fit, or to discard it and allocate new memory.

class Fsa.FastBaumWelchBatchFsa(edges, weights, start_end_states)[source]

FSA(s) in representation format for FastBaumWelchOp.

Parameters:
  • edges (numpy.ndarray) – (4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
  • weights (numpy.ndarray) – (num_edges,), weights of the edges
  • start_end_states (numpy.ndarray) – (2, batch), (start,end) state idx in automaton.
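
The three array shapes above can be illustrated with plain nested lists, which could then be wrapped with `numpy.array`. This is a sketch of the layout only: `check_layout` is a hypothetical helper, and the choice to number states from 0 within each sequence is an assumption rather than something the documentation states.

```python
# One tuple per edge: (from, to, emission_idx, sequence_idx).
edge_tuples = [
    (0, 1, 0, 0),  # sequence 0: state 0 -> 1, emits label 0
    (1, 2, 1, 0),  # sequence 0: state 1 -> 2, emits label 1
    (0, 1, 0, 1),  # sequence 1: state 0 -> 1, emits label 0
]
edge_weights = [0.0, 0.0, 0.0]

# Transpose to the (4, num_edges) layout: one row per field.
edges = [list(row) for row in zip(*edge_tuples)]

# (2, batch) layout: row 0 holds the start states, row 1 the end states.
start_end_states = [[0, 0], [2, 1]]


def check_layout(edges, weights, start_end_states):
    """Validate the shapes described above and return (num_edges, n_batch)."""
    num_edges = len(weights)
    assert len(edges) == 4 and all(len(row) == num_edges for row in edges)
    assert len(start_end_states) == 2
    n_batch = len(start_end_states[0])
    assert len(start_end_states[1]) == n_batch
    # Every edge must belong to a sequence that exists in the batch.
    assert all(0 <= seq_idx < n_batch for seq_idx in edges[3])
    return num_edges, n_batch


num_edges, n_batch = check_layout(edges, edge_weights, start_end_states)
```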
class Fsa.FastBwFsaShared[source]

One FSA shared for all the seqs in one batch (i.e. across batch-dim). This is a simplistic class which provides the necessary functions to

add_edge(source_state_idx, target_state_idx, emission_idx, weight=0.0)[source]
Parameters:
  • source_state_idx (int) –
  • target_state_idx (int) –
  • emission_idx (int) –
  • weight (float) –
add_inf_loop(state_idx, num_emission_labels)[source]
Parameters:
  • state_idx (int) –
  • num_emission_labels (int) –
get_num_edges(n_batch)[source]
Parameters:n_batch (int) –
Return type:int
get_edges(n_batch)[source]
Parameters:n_batch (int) –
Return edges:(4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
Return type:numpy.ndarray
get_weights(n_batch)[source]
Parameters:n_batch (int) –
Return weights:(num_edges,), weights of the edges
Return type:numpy.ndarray
get_start_end_states(n_batch)[source]
Parameters:n_batch (int) –
Return start_end_states:
 (2, batch), (start,end) state idx in automaton. There is only one single automaton.
Return type:numpy.ndarray
get_fast_bw_fsa(n_batch)[source]
Parameters:n_batch (int) –
Return type:FastBaumWelchBatchFsa
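
One way to read "one FSA shared for all the seqs in one batch" is that the same edge list is repeated once per sequence, with only the `sequence_idx` row changing. The sketch below implements that reading with plain lists. `SharedFsaSketch` is a hypothetical stand-in: the behaviour of `add_inf_loop` (one self-loop per emission label) and the choice of start and end state (0 and the highest state index) are assumptions.

```python
class SharedFsaSketch:
    """Hypothetical, simplified stand-in for FastBwFsaShared."""

    def __init__(self):
        self.num_states = 1
        self.edges = []  # (source, target, emission_idx, weight)

    def add_edge(self, source_state_idx, target_state_idx, emission_idx, weight=0.0):
        self.edges.append((source_state_idx, target_state_idx, emission_idx, weight))
        self.num_states = max(
            self.num_states, source_state_idx + 1, target_state_idx + 1
        )

    def add_inf_loop(self, state_idx, num_emission_labels):
        # Assumed meaning: the state may emit any label any number of times.
        for emission_idx in range(num_emission_labels):
            self.add_edge(state_idx, state_idx, emission_idx)

    def get_num_edges(self, n_batch):
        return len(self.edges) * n_batch

    def get_edges(self, n_batch):
        # (4, num_edges) layout: from, to, emission_idx, sequence_idx.
        rows = [[], [], [], []]
        for sequence_idx in range(n_batch):
            for source, target, emission_idx, _weight in self.edges:
                values = (source, target, emission_idx, sequence_idx)
                for row, value in zip(rows, values):
                    row.append(value)
        return rows

    def get_weights(self, n_batch):
        return [edge[3] for _ in range(n_batch) for edge in self.edges]

    def get_start_end_states(self, n_batch):
        # Assumed: start at state 0, end at the highest state index.
        return [[0] * n_batch, [self.num_states - 1] * n_batch]


fsa = SharedFsaSketch()
fsa.add_inf_loop(0, 3)
fsa.add_edge(0, 1, 0)
batch_edges = fsa.get_edges(2)
```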
Fsa.fast_bw_fsa_staircase(seq_lens, with_loop=False, max_skip=None, start_max_skip=None, end_max_skip=None)[source]

Builds up a staircase FSA, returns a FastBaumWelchBatchFsa. The emissions are indices [0, …, seq_len - 1].

Parameters:
  • seq_lens (list[int]) –
  • with_loop (bool) –
  • max_skip (int|list[int]) – per batch if a list
  • start_max_skip (int|list[int]) – per batch if a list
  • end_max_skip (int|list[int]) – per batch if a list
Return type:

FastBaumWelchBatchFsa
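
A plausible reading of the staircase shape, for a single sequence, is a chain of states in which step i consumes emission index i, optionally with a loop that repeats the same emission. The sketch below shows only that reading. `staircase_edges` is a hypothetical helper, the placement of the loop on the target state is an assumption, and the `max_skip`, `start_max_skip` and `end_max_skip` options are not modelled.

```python
def staircase_edges(seq_len, with_loop=False):
    """Build a single-sequence staircase as (num_states, edges, (start, end)).

    Edges are (source_state_idx, target_state_idx, emission_idx) tuples and
    the emission indices run over [0, ..., seq_len - 1].
    """
    edges = []
    for i in range(seq_len):
        edges.append((i, i + 1, i))          # step up: consume emission i
        if with_loop:
            edges.append((i + 1, i + 1, i))  # stay on the step: repeat emission i
    return seq_len + 1, edges, (0, seq_len)


num_states, edges, start_end = staircase_edges(3)
_, looped_edges, _ = staircase_edges(3, with_loop=True)
```

Without loops the only accepted path has exactly `seq_len` frames; with loops, longer frame sequences can be aligned to the same emissions.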

class Fsa.LoadWfstOp(filename)[source]

Op: maps segment names (tags) to FSAs (loaded from disk) that can be used to compute a BW-alignment

make_node(tags)[source]

Create an “apply” node for the inputs, in that order.

perform(node, inputs, output_storage, params=None)[source]

Required: Calculate the function on the inputs and put the variables in the output storage. Return None.

Parameters:
node : Apply instance

Contains the symbolic inputs and outputs.

inputs : list

Sequence of inputs (immutable).

output_storage : list

List of mutable 1-element lists (do not change the length of these lists)

Raises:
MethodNotDefined

The subclass does not override this method.

Notes

The output_storage list might contain data. If an element of output_storage is not None, it has to be of the right type, for instance, for a TensorVariable, it has to be a NumPy ndarray, with the right number of dimensions, and the correct dtype. Its shape and stride pattern can be arbitrary. It is not guaranteed that it was produced by a previous call to impl. It could be allocated by another Op. impl is free to reuse it as it sees fit, or to discard it and allocate new memory.

Fsa.main()[source]