SprintCache

This module reads the Sprint archive format. Support for writing may be added later.

class SprintCache.FileInfo(name, pos, size, compressed, index)[source]

File info.

Parameters:
  • name (str) –
  • pos (int) –
  • size (int) –
  • compressed (bool|int) –
  • index (int) –
class SprintCache.FileArchive(filename, must_exists=True)[source]

File archive.

read_u32(self)[source]
Return type:int
read_U32(self)[source]
Return type:int
read_u64(self)[source]
Return type:int
read_char(self)[source]
Return type:int
read_bytes(self, l)[source]
Return type:bytes
read_str(self, l, enc='ascii')[source]
Return type:str
read_f32(self)[source]
Return type:float
read_f64(self)[source]
Return type:float
read_v(self, typ, size)[source]
Parameters:
  • typ (str) – “f” for float (float32) or “d” for double (float64)
  • size (int) – number of elements to return
Returns:

numpy array of shape (size,) of dtype depending on typ

Return type:

numpy.ndarray
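
As a rough illustration of what read_v returns, the sketch below decodes size consecutive values from a byte stream according to typ. It uses only the standard struct module and returns a plain list, whereas the method above returns a numpy.ndarray. The helper name read_vector and the little-endian byte order are assumptions made for this example.

```python
import io
import struct


def read_vector(stream, typ, size):
    """Hypothetical stand-in for read_v: decode `size` values of type `typ`."""
    if typ not in ("f", "d"):
        raise ValueError("typ must be 'f' (float32) or 'd' (float64)")
    fmt = "<%d%s" % (size, typ)  # assumption: little-endian layout
    expected = struct.calcsize(fmt)
    raw = stream.read(expected)
    if len(raw) != expected:
        raise EOFError("stream ended before %d values were read" % size)
    return list(struct.unpack(fmt, raw))


# Three float32 values packed back to back, then decoded again.
buf = io.BytesIO(struct.pack("<3f", 0.5, 1.5, -2.0))
values = read_vector(buf, "f", 3)
```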

write_str(self, s)[source]
Parameters:s (str) –
Return type:int
write_char(self, i)[source]
Parameters:i (int) –
Return type:int
write_u32(self, i)[source]
Parameters:i (int) –
Return type:int
write_U32(self, i)[source]
Parameters:i (int) –
Return type:int
write_u64(self, i)[source]
Parameters:i (int) –
Return type:int
write_f32(self, i)[source]
Parameters:i (float) –
Return type:int
write_f64(self, i)[source]
Parameters:i (float) –
Return type:int
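
The read_* and write_* methods above are typed accessors over a binary file. The following is a minimal sketch of that pattern, assuming struct-based packing, little-endian byte order, and that the int returned by a write is the number of bytes written. None of these details are stated above, and the class name TinyBinaryIO is hypothetical.

```python
import io
import struct


class TinyBinaryIO:
    """Hypothetical helper mirroring a few of the accessors listed above."""

    def __init__(self, stream):
        self.stream = stream

    def _write(self, fmt, value):
        # Assumption: the returned int is the number of bytes written.
        return self.stream.write(struct.pack(fmt, value))

    def _read(self, fmt):
        data = self.stream.read(struct.calcsize(fmt))
        return struct.unpack(fmt, data)[0]

    def write_u32(self, i):
        return self._write("<I", i)

    def write_f64(self, x):
        return self._write("<d", x)

    def read_u32(self):
        return self._read("<I")

    def read_f64(self):
        return self._read("<d")


# Write two values, rewind, and read them back in the same order.
io_obj = TinyBinaryIO(io.BytesIO())
n_written = io_obj.write_u32(42) + io_obj.write_f64(0.25)
io_obj.stream.seek(0)
roundtrip = (io_obj.read_u32(), io_obj.read_f64())
```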
SprintCacheHeader = 'SP_ARC1\x00'[source]
start_recovery_tag = 2857740885[source]
end_recovery_tag = 1437226410[source]
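
Written in hexadecimal, the two recovery tags are alternating byte patterns that complement each other, and the header is 8 characters long. The snippet below only restates the constants above in another form; the upper-case variable names are chosen for this example.

```python
SPRINT_CACHE_HEADER = "SP_ARC1\x00"
START_RECOVERY_TAG = 2857740885
END_RECOVERY_TAG = 1437226410

start_hex = "0x%08X" % START_RECOVERY_TAG  # "0xAA55AA55"
end_hex = "0x%08X" % END_RECOVERY_TAG      # "0x55AA55AA"
header_len = len(SPRINT_CACHE_HEADER)      # 8

# Each tag is the bitwise complement of the other within 32 bits.
tags_are_complements = (START_RECOVERY_TAG ^ END_RECOVERY_TAG) == 0xFFFFFFFF
```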
file_list(self)[source]
Return type:list[str]
finalize(self)[source]

Finalize.

read_file_info_table(self)[source]

Read file info table.

write_file_info_table(self)[source]

Write file info table.

scan_archive(self)[source]

Scan archive.

has_entry(self, filename)[source]
Parameters:filename (str) – argument for self.read()
Returns:True if we have this entry
read(self, filename, typ)[source]
Parameters:
  • filename (str) – the entry-name in the archive
  • typ (str) – “str”, “feat” or “align”
Returns:

depending on typ:

  • “str” -> string, where string is a str
  • “feat” -> (time, data), where time is a list of time-stamp tuples (start-time, end-time) in milliseconds and data is a list of features, each a numpy vector
  • “align” -> align, where align is a list of (time, allophone, state), time is an int from 0 to the length of align, allophone is some int, and state is e.g. in [0,1,2]

Return type:

str|(list[numpy.ndarray],list[numpy.ndarray])|list[(int,int,int)]
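
To make the “align” return shape concrete, the sketch below takes a hand-written list of (time, allophone, state) tuples in the layout described above and merges consecutive frames that share the same allophone and state. The sample values and the function name collapse_alignment are made up for illustration and do not come from a real archive.

```python
from itertools import groupby


def collapse_alignment(align):
    """Merge consecutive frames with equal (allophone, state).

    Returns a list of (start_time, end_time_exclusive, allophone, state).
    """
    segments = []
    for (allo, state), frames in groupby(align, key=lambda item: (item[1], item[2])):
        times = [t for t, _, _ in frames]
        segments.append((times[0], times[-1] + 1, allo, state))
    return segments


# Hypothetical alignment: (frame index, allophone index, state index).
example_align = [(0, 7, 0), (1, 7, 0), (2, 7, 1), (3, 7, 2), (4, 12, 0), (5, 12, 0)]
segments = collapse_alignment(example_align)
```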

get_state(self, mix)[source]
Parameters:mix (int) –
Returns:(mix, state)
Return type:(int,int)
set_allophones(self, f)[source]
Parameters:f (str) – allophone filename. The file is line-separated; lines starting with “#” are ignored
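
The file handling described for set_allophones can be sketched as follows, assuming one allophone per line and that blank lines are skipped as well (the latter is an assumption; only the “#” rule is stated above). The function name and the sample entries are hypothetical.

```python
import io


def parse_allophone_lines(lines):
    """Return allophone names, skipping '#' comment lines and blank lines."""
    allophones = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        allophones.append(line)
    return allophones


# Illustrative file content; real allophone names will look different.
sample = io.StringIO("# comment line\nsil\na\nb\n")
allophones = parse_allophone_lines(sample)
```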
add_feature_cache(self, filename, features, times)[source]
Parameters:
  • filename (str) –
  • features
  • times
add_attributes(self, filename, dim, duration)[source]
Parameters:
  • filename (str) –
  • dim (int) –
  • duration (float) –
class SprintCache.FileArchiveBundle(filename=None)[source]

File archive bundle.

Parameters:filename (str|None) – .bundle file
add_bundle(self, filename)[source]
Parameters:filename (str) – bundle
add_archive(self, filename)[source]
Parameters:filename (str) – single archive
add_bundle_or_archive(self, filename)[source]
Parameters:filename (str) –
file_list(self)[source]
Return type:list[str]
Returns:list of content-filenames (which can be used for self.read())
has_entry(self, filename)[source]
Parameters:filename (str) – argument for self.read()
Returns:True if we have this entry
read(self, filename, typ)[source]
Parameters:
  • filename (str) – the entry-name in the archive
  • typ (str) – “str”, “feat” or “align”
Returns:

depending on typ:

  • “str” -> string, where string is a str
  • “feat” -> (time, data), where time is a list of time-stamp tuples (start-time, end-time) in milliseconds and data is a list of features, each a numpy vector
  • “align” -> align, where align is a list of (time, allophone, state), time is an int from 0 to the length of align, allophone is some int, and state is e.g. in [0,1,2]

Return type:

str|(list[numpy.ndarray],list[numpy.ndarray])|list[(int,int,int)]

Uses FileArchive.read().
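
The delegation to FileArchive.read() could look roughly like the sketch below, where a bundle keeps a mapping from entry name to the archive that holds it. Both classes are simplified, hypothetical stand-ins: they use in-memory dictionaries instead of files, and add_archive takes an archive object rather than a filename. They are not the module's implementation.

```python
class FakeArchive:
    """Hypothetical stand-in for a single archive holding named entries."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def file_list(self):
        return list(self._entries)

    def read(self, filename, typ):
        # A real archive would decode the entry according to `typ`.
        return self._entries[filename]


class FakeBundle:
    """Hypothetical bundle that forwards lookups to the owning archive."""

    def __init__(self):
        self._owner = {}  # entry name -> archive that contains it

    def add_archive(self, archive):
        for name in archive.file_list():
            self._owner[name] = archive

    def file_list(self):
        return list(self._owner)

    def has_entry(self, filename):
        return filename in self._owner

    def read(self, filename, typ):
        return self._owner[filename].read(filename, typ)


bundle = FakeBundle()
bundle.add_archive(FakeArchive({"corpus/rec1/seg1": "hello"}))
bundle.add_archive(FakeArchive({"corpus/rec2/seg1": "world"}))
text = bundle.read("corpus/rec2/seg1", "str")
```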

set_allophones(self, filename)[source]
Parameters:filename (str) – allophone filename
SprintCache.open_file_archive(archive_filename, must_exists=True)[source]
Parameters:
  • archive_filename (str) –
  • must_exists (bool) –
Return type:

FileArchiveBundle|FileArchive

SprintCache.is_sprint_cache_file(filename)[source]
Parameters:filename (str) – file to check; it must exist
Returns:True if and only if this is a Sprint cache file (which can be loaded with open_file_archive())
Return type:bool
class SprintCache.AllophoneLabeling(silence_phone, allophone_file, phoneme_file=None, state_tying_file=None, verbose_out=None)[source]

Allophone labeling.

Parameters:
  • silence_phone (str) – e.g. “si”
  • allophone_file (str) – list of allophones
  • phoneme_file (str|None) – list of phonemes
  • state_tying_file (str|None) – allophone state tying (e.g. via CART). maps each allophone state to a class label
  • verbose_out (file) – stream to dump log messages
get_label_idx_by_allo_state_idx(self, allo_state_idx)[source]
Parameters:allo_state_idx (int) –
Return type:int
get_label_idx(self, allo_idx, state_idx)[source]
Parameters:
  • allo_idx (int) –
  • state_idx (int) –
Return type:

int
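
Conceptually, a state tying maps each (allophone, state) pair to a class label, as noted for state_tying_file above. The sketch below illustrates that lookup with a small hand-written table. The table contents, the data layout, and the function name build_label_lookup are assumptions for this example and are not taken from AllophoneLabeling.

```python
def build_label_lookup(allophones, state_tying):
    """Build a function mapping (allo_idx, state_idx) to a label index.

    allophones: list of allophone names; the list position is allo_idx.
    state_tying: dict mapping (allophone name, state_idx) -> label index.
    """
    def get_label_idx(allo_idx, state_idx):
        return state_tying[(allophones[allo_idx], state_idx)]

    return get_label_idx


# Hypothetical data: two allophones with three states each.
# Several states may share one label, which is the point of tying.
allophones = ["sil", "a"]
state_tying = {
    ("sil", 0): 0, ("sil", 1): 0, ("sil", 2): 0,
    ("a", 0): 1, ("a", 1): 2, ("a", 2): 2,
}
get_label_idx = build_label_lookup(allophones, state_tying)
label = get_label_idx(1, 2)
```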

class SprintCache.MixtureSet(filename)[source]

Mixture set.

Parameters:filename (str) –
read_u32(self)[source]
Return type:int
read_U32(self)[source]
Return type:int
read_u64(self)[source]
Return type:int
read_char(self)[source]
Return type:int
read_str(self, l, enc='ascii')[source]
Parameters:
  • l (int) –
  • enc (str) –
Return type:

str

read_f32(self)[source]
Return type:float
read_f64(self)[source]
Return type:float
read_v(self, size, a)[source]
Parameters:
  • size (int) –
  • a (array.array) –
Return type:

array.array

write(self, filename)[source]
Parameters:filename (str) –
get_mean_by_idx(self, idx)[source]
Parameters:idx (int) –
Return type:numpy.ndarray
get_cov_by_idx(self, idx)[source]
Parameters:idx (int) –
Return type:numpy.ndarray
get_number_mixtures(self)[source]
Return type:int
class SprintCache.WordBoundaries(filename)[source]

Word boundaries.

Parameters:filename (str) –
read_u16(self)[source]
Return type:int
read_u32(self)[source]
Return type:int
read_str(self, l, enc='ascii')[source]
Return type:str
SprintCache.main()[source]

Main entry for usage as a tool.