Fsa

class Fsa.Edge(source_state_idx, target_state_idx, label, weight=0.0)[source]

class to represent an edge

Parameters:
  • source_state_idx (int) – the starting node of the edge
  • target_state_idx (int) – the ending node of the edge
  • label (int|str|None) – the label of the edge (normally a letter or a phoneme ...)
  • weight (float) – probability of the word/phon in -log space
weight = None[source]

Attributes:
  • idx_word_in_sentence (int|None) – index of the word in the given sentence
  • idx_phon_in_word (int|None) – index of the phon in a word
  • idx (int|None) – label index within the sentence/word/phon
  • phon_at_word_begin (bool) – flag indicating whether the phon is at the beginning of a word
  • phon_at_word_end (bool) – flag indicating whether the phon is at the end of a word
  • score (float|None) – score of the edge
  • is_loop (bool) – whether the edge is a loop within the graph
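Since edge weights are probabilities in -log space, multiplying probabilities along a path corresponds to adding weights. The following is a minimal sketch of that relationship, assuming the natural logarithm; the helpers `prob_to_weight` and `path_weight` are hypothetical and not part of this module.

```python
import math


def prob_to_weight(prob):
    """Convert a probability in (0, 1] to a weight in -log space (hypothetical helper)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log(prob)  # assumption: natural logarithm


def path_weight(edge_weights):
    """Weight of a path: the sum of its edge weights, i.e. -log of the probability product."""
    return sum(edge_weights)


# The default weight=0.0 of an edge corresponds to probability 1.
default_weight = prob_to_weight(1.0)
two_step = path_weight([prob_to_weight(0.5), prob_to_weight(0.5)])
```

Here `two_step` equals `prob_to_weight(0.25)` up to floating-point rounding.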

class Fsa.Graph(lemma)[source]

class holding the graph that represents the Finite State Automaton. It holds the input and the created output (ASG, CTC, HMM); states between input and output may be held if necessary.

Parameters:lemma (str|None) – a sentence or word

Attributes:
  • lemma_list (list[str]) – input transformed into a list if necessary

set_filename(name)[source]

sets the filename, for use with saving
Parameters:name (str) – the filename, to which further parts get appended

make_single_state_graph()[source]
save()[source]
class Fsa.Asg(graph, num_labels, asg_repetition=2, label_conversion=False)[source]

class to create ASG FSA

Parameters:
  • graph (Graph) – represents the Graph on which the class operates
  • num_labels (int) – number of labels without blank, silence, eps and repetitions
  • asg_repetition (int) – asg repeat symbol which stands for x repetitions
  • label_conversion (bool) – shall the labels be converted into numbers (only ASG and CTC)
set_asg_rep(reps)[source]

sets the asg repeat symbol
Parameters:reps (int) – the asg repeat

set_num_labels(numlab)[source]

sets the number of labels
Parameters:numlab (int) – the number of labels

set_label_conversion(onoff)[source]

sets label conversion on or off
Parameters:onoff (bool) – flag to set label conversion on/off

run()[source]
class Fsa.Ctc(graph, num_labels, label_conversion=False)[source]

class to create CTC FSA

Parameters:
  • graph (Graph) – represents the Graph on which the class operates
  • num_labels (int) – number of labels without blank, silence, eps and repetitions
  • label_conversion (bool) – shall the labels be converted into numbers (only ASG and CTC)
set_num_labels(numlab)[source]

sets the number of labels
Parameters:numlab (int) – the number of labels

set_label_conversion(onoff)[source]

sets label conversion on or off
Parameters:onoff (bool) – flag to set label conversion on/off

run()[source]
class Fsa.Hmm(graph, depth=6, allo_num_states=3)[source]

class to create HMM FSA

Parameters:
  • graph (Graph) – represents the Graph on which the class operates
  • depth (int) – the depth of the HMM FSA process
  • allo_num_states (int) – number of allophone states
set_depth(depth)[source]

sets the depth for the HMM FSA process
Parameters:depth (int) – the depth of the HMM FSA process

load_lexicon(lexicon_name)[source]

loads the lexicon
Parameters:lexicon_name (str) – holds the path and name of the lexicon file

load_state_tying(state_tying_name)[source]

loads the state tying
Parameters:state_tying_name (str) – holds the path and name of the state tying file

class Fsa.Fsa[source]

class to create Finite State Automaton

Parameters:
  • lemma (str|list[str]) – word or sentence
  • fsa_type (str) – determines finite state automaton type: asg, ctc, hmm
  • num_states (int) – number of states
  • edges (list) – list of edges
where:
  • num_states: int, number of states. Per convention, state 0 is the start state and state (num_states - 1) is the single final state.
  • edges: list[(from,to,label_idx,weight)]. from and to are state indices with 0 <= state_idx < num_states; label_idx satisfies 0 <= label_idx < num_labels, or label_idx == num_labels for the blank symbol; weight is a float, in -log space.
Parameters:
  • filename (str) – name of file to store graph
  • asg_repetition (int) – repetition symbols for asg
  • num_labels (int) – number of labels
  • label_conversion (bool) – use chars or indexes
  • final_states (list[int]) – list of final states
  • depth (int) – depth / level of hmm
  • allo_num_states (int) – number of allophone states
  • lexicon (str) – lexicon file name
  • state_tying (str) – state tying file name
  • phon_dict (dict) – dictionary of phonemes, loaded from lexicon file
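The conventions stated for num_states and edges can be checked mechanically. The following is a small sketch under those stated conventions only; `check_fsa` is a hypothetical helper and not a function of this module.

```python
def check_fsa(num_states, edges, num_labels):
    """Check (from, to, label_idx, weight) tuples against the documented conventions.

    Hypothetical helper; returns (start_state, final_state) when all checks pass.
    """
    if num_states < 1:
        raise ValueError("need at least one state")
    for from_idx, to_idx, label_idx, weight in edges:
        if not (0 <= from_idx < num_states and 0 <= to_idx < num_states):
            raise ValueError("state index out of range: %r" % ((from_idx, to_idx),))
        # label_idx == num_labels is reserved for the blank symbol.
        if not 0 <= label_idx <= num_labels:
            raise ValueError("label index out of range: %r" % (label_idx,))
        float(weight)  # weights are floats in -log space
    # Per convention: state 0 is the start, the last state is the single final state.
    return 0, num_states - 1


# A three-state chain emitting labels 0 and 1, each with weight 0.0.
edges = [(0, 1, 0, 0.0), (1, 2, 1, 0.0)]
start, final = check_fsa(num_states=3, edges=edges, num_labels=2)
```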
set_params(asg_repetition=2, num_labels=256, label_conversion=False, depth=6, allo_num_states=3, lexicon_name='', state_tying_name='', single_state=False)[source]

sets the parameters for the FSA generator; checks whether the params needed for the fsa type are available and otherwise requests user input
Parameters:
  • filename (str) – sets the output file name
  • asg_repetition (int) – if a label is repeated within the lemma, how many repetitions will be substituted with a specific repetition symbol
  • num_labels (int) – total number of labels
  • label_conversion (bool) – true: each label converted to index of its label false: no conversion
  • depth (int) – depth of the hmm acceptor
  • allo_num_states (int) – number of allophone states
  • lexicon (str) – lexicon file name
  • state_tying (str) – state tying file name
  • single_state (bool) – produce additional fsa: single node
Returns:

set_lemma(lemma)[source]
Parameters:lemma (str) – word or sentence
set_fsa_type(fsa_type)[source]
Parameters:fsa_type (str) – determines finite state automaton type: asg, ctc, hmm
set_filename(filename)[source]
Parameters:filename (str) – name of file to store graph
set_hmm_depth(depth)[source]
set_lexicon(lexicon_name=None)[source]

sets a new lexicon
Parameters:lexicon_name (str) – lexicon path

set_state_tying(state_tying=None)[source]

sets a new state tying file
Parameters:state_tying (str) – state tying file/path

run()[source]

runs the FSA

convert_label_seq_to_indices()[source]

takes label sequence of chars and converts to indices (ascii numbering)
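As a rough illustration of such a conversion, the sketch below assumes that "ascii numbering" means mapping each character to its ASCII code point; `label_seq_to_indices` is a hypothetical stand-alone helper, not the method itself.

```python
def label_seq_to_indices(label_seq):
    """Map each character of a label sequence to its ASCII code point.

    Hypothetical helper; assumes "ascii numbering" means ord(ch).
    """
    indices = []
    for ch in label_seq:
        idx = ord(ch)
        if idx >= 128:
            raise ValueError("non-ASCII label: %r" % ch)
        indices.append(idx)
    return indices


indices = label_seq_to_indices("abc")  # [97, 98, 99]
```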

reduce_node_num()[source]

takes the edges and nodes, then reduces all to one node

Fsa.fsa_to_dot_format(file, num_states, edges)[source]
Parameters:
  • num_states
  • edges
Returns:

converts num_states and edges to a dot file, and that to an svg file, via graphviz
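To show what such a conversion involves, here is a minimal sketch that only produces the dot text from num_states and edges. The function `edges_to_dot` and its exact output layout are assumptions for illustration; the file this module actually writes may look different.

```python
def edges_to_dot(num_states, edges):
    """Render states and (from, to, label, weight) edges as Graphviz dot text.

    Hypothetical helper; the layout choices are illustrative only.
    """
    lines = ["digraph fsa {", "  rankdir=LR;"]
    for state_idx in range(num_states):
        # Following the module's convention, the last state is the single final state.
        shape = "doublecircle" if state_idx == num_states - 1 else "circle"
        lines.append("  %d [shape=%s];" % (state_idx, shape))
    for from_idx, to_idx, label, weight in edges:
        lines.append('  %d -> %d [label="%s/%s"];' % (from_idx, to_idx, label, weight))
    lines.append("}")
    return "\n".join(lines)


dot_text = edges_to_dot(2, [(0, 1, 7, 0.0)])
```

Text of this form can be rendered to svg with the Graphviz command line tool, for example `dot -Tsvg`.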

class Fsa.BuildSimpleFsaOp(loop_emission_idxs=(), loop_scores=(0.0, 0.0))[source]
itypes = (TensorType(int32, matrix),)[source]
otypes = (TensorType(float32, matrix), TensorType(float32, vector), TensorType(float32, matrix))[source]
perform(node, inputs, output_storage, params=None)[source]
class Fsa.FastBaumWelchBatchFsa(edges, weights, start_end_states)[source]

FSA(s) in representation format for FastBaumWelchOp.

Parameters:
  • edges (numpy.ndarray) – (4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
  • weights (numpy.ndarray) – (num_edges,), weights of the edges
  • start_end_states (numpy.ndarray) – (2, batch), (start,end) state idx in automaton.
class Fsa.FastBwFsaShared[source]

One FSA shared for all the seqs in one batch (i.e. across batch-dim). This is a simplistic class which provides the necessary functions to

add_edge(source_state_idx, target_state_idx, emission_idx, weight=0.0)[source]
Parameters:
  • source_state_idx (int) –
  • target_state_idx (int) –
  • emission_idx (int) –
  • weight (float) –
add_inf_loop(state_idx, num_emission_labels)[source]
Parameters:
  • state_idx (int) –
  • num_emission_labels (int) –
get_num_edges(n_batch)[source]
Parameters:n_batch (int) –
Return type:int
get_edges(n_batch)[source]
Parameters:n_batch (int) –
Return edges:(4,num_edges), edges of the graph (from,to,emission_idx,sequence_idx)
Return type:numpy.ndarray
get_weights(n_batch)[source]
Parameters:n_batch (int) –
Return weights:(num_edges,), weights of the edges
Return type:numpy.ndarray
get_start_end_states(n_batch)[source]
Parameters:n_batch (int) –
Return start_end_states:(2, batch), (start,end) state idx in automaton. There is only one single automaton.
Return type:numpy.ndarray
get_fast_bw_fsa(n_batch)[source]
Parameters:n_batch (int) –
Return type:FastBaumWelchBatchFsa
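The idea of one FSA shared across the batch can be sketched as follows. This is not the module's implementation: `SharedFsaSketch` is a hypothetical stand-in that uses plain nested lists instead of numpy arrays, and both the ordering of the replicated edges and the choice of the last state as end state are assumptions.

```python
class SharedFsaSketch:
    """Hypothetical stand-in: one FSA reused for every sequence in a batch."""

    def __init__(self):
        self.num_states = 1
        self.edges = []  # tuples (from, to, emission_idx, weight)

    def add_edge(self, source_state_idx, target_state_idx, emission_idx, weight=0.0):
        self.edges.append((source_state_idx, target_state_idx, emission_idx, weight))
        self.num_states = max(self.num_states, source_state_idx + 1, target_state_idx + 1)

    def add_inf_loop(self, state_idx, num_emission_labels):
        # One self-loop per emission label, so any label may repeat indefinitely.
        for emission_idx in range(num_emission_labels):
            self.add_edge(state_idx, state_idx, emission_idx)

    def get_num_edges(self, n_batch):
        return len(self.edges) * n_batch

    def get_edges(self, n_batch):
        # Four rows (from, to, emission_idx, sequence_idx), one column per edge copy.
        rows = [[], [], [], []]
        for seq_idx in range(n_batch):
            for from_idx, to_idx, emission_idx, _weight in self.edges:
                for row, value in zip(rows, (from_idx, to_idx, emission_idx, seq_idx)):
                    row.append(value)
        return rows

    def get_weights(self, n_batch):
        # Same ordering as get_edges: all edges for sequence 0, then sequence 1, ...
        return [edge[3] for _seq_idx in range(n_batch) for edge in self.edges]

    def get_start_end_states(self, n_batch):
        # Assumption: state 0 is the start and the last state is the end.
        return [[0] * n_batch, [self.num_states - 1] * n_batch]


fsa = SharedFsaSketch()
fsa.add_inf_loop(state_idx=0, num_emission_labels=2)
fsa.add_edge(0, 1, emission_idx=2)
edges = fsa.get_edges(n_batch=2)
```

With two sequences in the batch, each of the three edges appears twice, once per sequence index in the fourth row.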
Fsa.main()[source]