NAME

pocketsphinx_batch - Run speech recognition in batch mode

SYNOPSIS

pocketsphinx_batch -ctl ctlfile -cepdir cepdir -cepext .mfc [ options ]...

DESCRIPTION

Run speech recognition over a list of utterances in batch mode. A list of arguments follows:

-adchdr: Size of audio file header in bytes (headers are ignored)
-adcin: Input is raw audio data
-agc: Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh: Initial threshold for automatic gain control
-allphone: Perform phoneme decoding with phonetic LM
-allphone_ci: Perform phoneme decoding with phonetic LM and context-independent units only
-alpha: Preemphasis parameter
-argfile: Argument file giving extra arguments.
-ascale: Inverse of acoustic model scale for confidence score calculation
-aw: Inverse weight applied to acoustic scores.
-backtrace: Print results and backtraces to log file.
-beam: Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
-bestpath: Run bestpath (Dijkstra) search over word lattice (3rd pass)
-bestpathlw: Language model probability weight for bestpath search
-build_outdirs: Create missing subdirectories in output directory
-cepdir: Input files directory (prefixed to filespecs in control file)
-cepext: Input files extension (suffixed to filespecs in control file)
-ceplen: Number of components in the input feature vector
-cmn: Cepstral mean normalization scheme ('current', 'prior', or 'none')
-cmninit: Initial values (comma-separated) for cepstral mean when 'prior' is used
-compallsen: Compute all senone scores in every frame (can be faster when there are many senones)
-ctl: Control file listing utterances to be processed
-ctlcount: No. of utterances to be processed (after skipping -ctloffset entries)
-ctlincr: Do every Nth line in the control file
-ctloffset: No. of utterances at the beginning of -ctl file to be skipped
-ctm: Recognition output in CTM file format (may require post-sorting)
-debug: Verbosity level for debugging messages
-dict: Main pronunciation dictionary (lexicon) input file
-dictcase: Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
-dither: Add 1/2-bit noise
-doublebw: Use double bandwidth filters (same center freq)
-ds: Frame GMM computation downsampling ratio
-fdict: Noise word pronunciation dictionary input file
-feat: Feature stream type, depends on the acoustic model
-featparams: File containing feature extraction parameters.
-fillprob: Filler word transition probability
-frate: Frame rate
-fsg: Sphinx format finite state grammar file
-fsgctl: Control file listing FSG file to use for each utterance
-fsgdir: Base directory for FSG files
-fsgext: File extension for FSG files (including leading dot)
-fsgusealtpron: Add alternate pronunciations to FSG
-fsgusefiller: Insert filler words at each state.
-fwdflat: Run forward flat-lexicon search over word lattice (2nd pass)
-fwdflatbeam: Beam width applied to every frame in second-pass flat search
-fwdflatefwid: Minimum number of end frames for a word to be searched in fwdflat search
-fwdflatlw: Language model probability weight for flat lexicon (2nd pass) decoding
-fwdflatsfwin: Window of frames in lattice to search for successor words in fwdflat search
-fwdflatwbeam: Beam width applied to word exits in second-pass flat search
-fwdtree: Run forward lexicon-tree search (1st pass)
-hmm: Directory containing acoustic model files.
-hyp: Recognition output file name
-hypseg: Recognition output with segmentation file name
-input_endian: Endianness of input data, big or little, ignored if NIST or MS Wav
-jsgf: JSGF grammar file
-keyphrase: Keyphrase to spot
-kws: File with keyphrases to spot, one per line
-kws_delay: Delay to wait for best detection score
-kws_plp: Phone loop probability for keyword spotting
-kws_threshold: Threshold for p(hyp)/p(alternatives) ratio
-latsize: Initial backpointer table size
-lda: File containing transformation matrix to be applied to features (single-stream features only)
-ldadim: Dimensionality of output of feature transformation (0 to use entire matrix)
-lifter: Length of sin-curve for liftering, or 0 for no liftering.
-lm: Word trigram language model input file
-lmctl: Specify a set of language models
-lmname: Which language model in -lmctl to use by default
-lmnamectl: Control file listing LM name to use for each utterance
-logbase: Base in which all log-likelihoods are calculated
-logfn: File to write log messages in
-logspec: Write out logspectral files instead of cepstra
-lowerf: Lower edge of filters
-lpbeam: Beam width applied to last phone in words
-lponlybeam: Beam width applied to last phone in single-phone words
-lw: Language model probability weight
-maxhmmpf: Maximum number of active HMMs to maintain at each frame (or -1 for no pruning)
-maxwpf: Maximum number of distinct word exits at each frame (or -1 for no pruning)
-mdef: Model definition input file
-mean: Mixture gaussian means input file
-mfclogdir: Directory to log feature files to
-min_endfr: Nodes ignored in lattice construction if they persist for fewer than N frames
-mixw: Senone mixture weights input file (uncompressed)
-mixwfloor: Senone mixture weights floor (applied to data from -mixw file)
-mllr: MLLR transformation to apply to means and variances
-mllrctl: Control file listing MLLR transforms to use for each utterance
-mllrdir: Base directory for MLLR transforms
-mllrext: File extension for MLLR transforms (including leading dot)
-mmap: Use memory-mapped I/O (if possible) for model files
-nbest: Number of N-best hypotheses to write to -nbestdir (0 for no N-best)
-nbestdir: Directory for writing N-best hypothesis lists
-nbestext: Extension for N-best hypothesis list files
-ncep: Number of cep coefficients
-nfft: Size of FFT
-nfilt: Number of filter banks
-nwpen: New word transition penalty
-outlatbeam: Minimum posterior probability for output lattice nodes
-outlatdir: Directory for dumping word lattices
-outlatext: Filename extension for dumping word lattices
-outlatfmt: Format for dumping word lattices (s3 or htk)
-pbeam: Beam width applied to phone transitions
-pip: Phone insertion penalty
-pl_beam: Beam width applied to phone loop search for lookahead
-pl_pbeam: Beam width applied to phone loop transitions for lookahead
-pl_pip: Phone insertion penalty for phone loop
-pl_weight: Weight for phoneme lookahead penalties
-pl_window: Phoneme lookahead window size, in frames
-rawlogdir: Directory to log raw audio files to
-remove_dc: Remove DC offset from each frame
-remove_noise: Remove noise with spectral subtraction in mel-energies
-remove_silence: Enables VAD, removes silence frames from processing
-round_filters: Round mel filter frequencies to DFT points
-samprate: Sampling rate
-seed: Seed for random number generator; if less than zero, pick our own
-sendump: Senone dump (compressed mixture weights) input file
-senin: Input is senone score dump files
-senlogdir: Directory to log senone score files to
-senmgau: Senone to codebook mapping input file (usually not needed)
-silprob: Silence word transition probability
-smoothspec: Write out cepstral-smoothed logspectral files
-svspec: Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
-tmat: HMM state transition matrix input file
-tmatfloor: HMM state transition probability floor (applied to -tmat file)
-topn: Maximum number of top Gaussians to use in scoring.
-topn_beam: Beam width used to determine top-N Gaussians (or a list, per-feature)
-toprule: Start rule for JSGF (first public rule is default)
-transform: Which type of transform to use to calculate cepstra (legacy, dct, or htk)
-unit_area: Normalize mel filters to unit area
-upperf: Upper edge of filters
-uw: Unigram weight
-vad_postspeech: Number of silence frames to keep after the transition from speech to silence.
-vad_prespeech: Number of speech frames to keep before the transition from silence to speech.
-vad_startspeech: Number of speech frames needed to trigger VAD from silence to speech.
-vad_threshold: Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
-var: Mixture gaussian variances input file
-varfloor: Mixture gaussian variance floor (applied to data from -var file)
-varnorm: Variance normalize each utterance (only if CMN == current)
-verbose: Show input filenames
-warp_params: Parameters string defining the warping function
-warp_type: Warping function type (or shape)
-wbeam: Beam width applied to word exits
-wip: Word insertion penalty
-wlen: Hamming window length

To do batch mode recognition, you will need to specify a control file using -ctl. This is a simple text file containing one entry per line. Each entry is the name of an input file relative to the -cepdir directory, without the filename extension (which is given by the -cepext argument).
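
As an illustration only, a control file of this shape could be generated with a short Python script. This is a minimal sketch, not part of pocketsphinx: the function names control_entries and write_control_file are hypothetical, and the script assumes that entries use forward slashes and that every file under the directory with the given extension should be listed.

```python
from pathlib import Path


def control_entries(cepdir, cepext):
    """List control-file entries for files under cepdir that end in cepext.

    Each entry is the file's path relative to cepdir with the extension
    removed, which is the form the -ctl file expects.
    """
    root = Path(cepdir)
    entries = []
    for path in root.rglob("*" + cepext):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            # Strip the extension; slicing by length also works when
            # cepext is the empty string.
            entries.append(rel[: len(rel) - len(cepext)])
    return sorted(entries)


def write_control_file(cepdir, cepext, ctl_path):
    """Write one entry per line to ctl_path and return the entries."""
    entries = control_entries(cepdir, cepext)
    Path(ctl_path).write_text("".join(entry + "\n" for entry in entries))
    return entries
```

The resulting file would then be passed with -ctl, together with the same directory as -cepdir and the same extension as -cepext.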

If you are using acoustic feature files as input (see sphinx_fe(1) for information on how to generate these), you can also specify a subpart of a file, using the following format:

FILENAME START-FRAME END-FRAME UTTERANCE-ID
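
To make the field layout concrete, the following Python sketch splits one control-file line into its parts. It is an illustration under stated assumptions, not the parser pocketsphinx_batch itself uses: fields are assumed to be separated by whitespace, filenames are assumed to contain no spaces, and treating a missing UTTERANCE-ID as None is a choice made here. The names ControlEntry and parse_control_line are hypothetical.

```python
from typing import NamedTuple, Optional


class ControlEntry(NamedTuple):
    filename: str
    start_frame: Optional[int]
    end_frame: Optional[int]
    utterance_id: Optional[str]


def parse_control_line(line):
    """Split a control line into filename, frame range, and utterance ID.

    A line holding only a filename refers to the whole file, so the
    frame fields and the utterance ID are returned as None.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty control line")
    if len(fields) == 1:
        return ControlEntry(fields[0], None, None, None)
    if len(fields) < 3:
        raise ValueError("a start frame needs a matching end frame")
    # int() raises ValueError if a frame field is not an integer.
    start_frame = int(fields[1])
    end_frame = int(fields[2])
    # Assumption: the utterance ID is optional in this sketch.
    utterance_id = fields[3] if len(fields) > 3 else None
    return ControlEntry(fields[0], start_frame, end_frame, utterance_id)
```

For example, parse_control_line("speaker1/utt01 0 250 utt01_part1") yields an entry whose frame range is 0 to 250, while a line containing only "speaker1/utt02" yields an entry with no frame range.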

AUTHOR

Written by numerous people at CMU from 1994 onwards. This manual page by David Huggins-Daines <dhuggins@cs.cmu.edu>
