pocketsphinx_batch - Run speech recognition in batch mode
pocketsphinx_batch -ctl ctlfile -cepdir cepdir -cepext .mfc [ options ]...
Run speech recognition over a list of utterances in batch mode. A
list of arguments follows:
- -adchdr
- Size of audio file header in bytes (headers are ignored)
- -adcin
- Input is raw audio data
- -agc
- Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
- -agcthresh
- Initial threshold for automatic gain control
- -allphone
- Perform phoneme decoding with phonetic lm
- -allphone_ci
- Perform phoneme decoding with phonetic lm and context-independent units
only
- -alpha
- Preemphasis parameter
- -argfile
- Argument file giving extra arguments.
- -ascale
- Inverse of acoustic model scale for confidence score calculation
- -aw
- Inverse weight applied to acoustic scores.
- -backtrace
- Print results and backtraces to log file.
- -beam
- Beam width applied to every frame in Viterbi search (smaller values mean
wider beam)
- -bestpath
- Run bestpath (Dijkstra) search over word lattice (3rd pass)
- -bestpathlw
- Language model probability weight for bestpath search
- -build_outdirs
- Create missing subdirectories in output directory
- -cepdir
- Input files directory (prefixed to filespecs in control file)
- -cepext
- Input files extension (suffixed to filespecs in control file)
- -ceplen
- Number of components in the input feature vector
- -cmn
- Cepstral mean normalization scheme ('current', 'prior', or 'none')
- -cmninit
- Initial values (comma-separated) for cepstral mean when 'prior' is
used
- -compallsen
- Compute all senone scores in every frame (can be faster when there are
many senones)
- -ctl
- Control file listing utterances to be processed
- -ctlcount
- No. of utterances to be processed (after skipping -ctloffset
entries)
- -ctlincr
- Do every Nth line in the control file
- -ctloffset
- No. of utterances at the beginning of -ctl file to be skipped
- -ctm
- Recognition output in CTM file format (may require post-sorting)
- -debug
- Verbosity level for debugging messages
- -dict
- Main pronunciation dictionary (lexicon) input file
- -dictcase
- Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII
characters only)
- -dither
- Add 1/2-bit noise
- -doublebw
- Use double bandwidth filters (same center freq)
- -ds
- Frame GMM computation downsampling ratio
- -fdict
- Noise word pronunciation dictionary input file
- -feat
- Feature stream type, depends on the acoustic model
- -featparams
- File containing feature extraction parameters.
- -fillprob
- Filler word transition probability
- -frate
- Frame rate
- -fsg
- Sphinx format finite state grammar file
- -fsgctl
- Control file listing FSG file to use for each utterance
- -fsgdir
- Base directory for FSG files
- -fsgext
- File extension for FSG files (including leading dot)
- -fsgusealtpron
- Add alternate pronunciations to FSG
- -fsgusefiller
- Insert filler words at each state.
- -fwdflat
- Run forward flat-lexicon search over word lattice (2nd pass)
- -fwdflatbeam
- Beam width applied to every frame in second-pass flat search
- -fwdflatefwid
- Minimum number of end frames for a word to be searched in fwdflat
search
- -fwdflatlw
- Language model probability weight for flat lexicon (2nd pass)
decoding
- -fwdflatsfwin
- Window of frames in lattice to search for successor words in fwdflat
search
- -fwdflatwbeam
- Beam width applied to word exits in second-pass flat search
- -fwdtree
- Run forward lexicon-tree search (1st pass)
- -hmm
- Directory containing acoustic model files.
- -hyp
- Recognition output file name
- -hypseg
- Recognition output with segmentation file name
- -input_endian
- Endianness of input data, big or little, ignored if NIST or MS Wav
- -jsgf
- JSGF grammar file
- -keyphrase
- Keyphrase to spot
- -kws
- File with keyphrases to spot, one per line
- -kws_delay
- Delay to wait for best detection score
- -kws_plp
- Phone loop probability for keyword spotting
- -kws_threshold
- Threshold for p(hyp)/p(alternatives) ratio
- -latsize
- Initial backpointer table size
- -lda
- File containing transformation matrix to be applied to features (single-stream
features only)
- -ldadim
- Dimensionality of output of feature transformation (0 to use entire
matrix)
- -lifter
- Length of sin-curve for liftering, or 0 for no liftering.
- -lm
- Word trigram language model input file
- -lmctl
- Specify a set of language models
- -lmname
- Which language model in -lmctl to use by default
- -lmnamectl
- Control file listing LM name to use for each utterance
- -logbase
- Base in which all log-likelihoods are calculated
- -logfn
- File to write log messages in
- -logspec
- Write out logspectral files instead of cepstra
- -lowerf
- Lower edge of filters
- -lpbeam
- Beam width applied to last phone in words
- -lponlybeam
- Beam width applied to last phone in single-phone words
- -lw
- Language model probability weight
- -maxhmmpf
- Maximum number of active HMMs to maintain at each frame (or -1 for
no pruning)
- -maxwpf
- Maximum number of distinct word exits at each frame (or -1 for no
pruning)
- -mdef
- Model definition input file
- -mean
- Mixture gaussian means input file
- -mfclogdir
- Directory to log feature files to
- -min_endfr
- Nodes ignored in lattice construction if they persist for fewer than N
frames
- -mixw
- Senone mixture weights input file (uncompressed)
- -mixwfloor
- Senone mixture weights floor (applied to data from -mixw file)
- -mllr
- MLLR transformation to apply to means and variances
- -mllrctl
- Control file listing MLLR transforms to use for each utterance
- -mllrdir
- Base directory for MLLR transforms
- -mllrext
- File extension for MLLR transforms (including leading dot)
- -mmap
- Use memory-mapped I/O (if possible) for model files
- -nbest
- Number of N-best hypotheses to write to -nbestdir (0 for no
N-best)
- -nbestdir
- Directory for writing N-best hypothesis lists
- -nbestext
- Extension for N-best hypothesis list files
- -ncep
- Number of cep coefficients
- -nfft
- Size of FFT
- -nfilt
- Number of filter banks
- -nwpen
- New word transition penalty
- -outlatbeam
- Minimum posterior probability for output lattice nodes
- -outlatdir
- Directory for dumping word lattices
- -outlatext
- Filename extension for dumping word lattices
- -outlatfmt
- Format for dumping word lattices (s3 or htk)
- -pbeam
- Beam width applied to phone transitions
- -pip
- Phone insertion penalty
- -pl_beam
- Beam width applied to phone loop search for lookahead
- -pl_pbeam
- Beam width applied to phone loop transitions for lookahead
- -pl_pip
- Phone insertion penalty for phone loop
- -pl_weight
- Weight for phoneme lookahead penalties
- -pl_window
- Phoneme lookahead window size, in frames
- -rawlogdir
- Directory to log raw audio files to
- -remove_dc
- Remove DC offset from each frame
- -remove_noise
- Remove noise with spectral subtraction in mel-energies
- -remove_silence
- Enables VAD, removes silence frames from processing
- -round_filters
- Round mel filter frequencies to DFT points
- -samprate
- Sampling rate
- -seed
- Seed for random number generator; if less than zero, pick our own
- -sendump
- Senone dump (compressed mixture weights) input file
- -senin
- Input is senone score dump files
- -senlogdir
- Directory to log senone score files to
- -senmgau
- Senone to codebook mapping input file (usually not needed)
- -silprob
- Silence word transition probability
- -smoothspec
- Write out cepstral-smoothed logspectral files
- -svspec
- Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
- -tmat
- HMM state transition matrix input file
- -tmatfloor
- HMM state transition probability floor (applied to -tmat file)
- -topn
- Maximum number of top Gaussians to use in scoring.
- -topn_beam
- Beam width used to determine top-N Gaussians (or a list, per-feature)
- -toprule
- Start rule for JSGF (first public rule is default)
- -transform
- Which type of transform to use to calculate cepstra (legacy, dct, or
htk)
- -unit_area
- Normalize mel filters to unit area
- -upperf
- Upper edge of filters
- -uw
- Unigram weight
- -vad_postspeech
- Number of silence frames to keep after the transition from speech to silence.
- -vad_prespeech
- Number of speech frames to keep before the transition from silence to speech.
- -vad_startspeech
- Number of speech frames needed to trigger VAD from silence to speech.
- -vad_threshold
- Threshold for decision between noise and silence frames. Log-ratio between
signal level and noise level.
- -var
- Mixture gaussian variances input file
- -varfloor
- Mixture gaussian variance floor (applied to data from -var
file)
- -varnorm
- Variance normalize each utterance (only if CMN == current)
- -verbose
- Show input filenames
- -warp_params
- Parameter string defining the warping function
- -warp_type
- Warping function type (or shape)
- -wbeam
- Beam width applied to word exits
- -wip
- Word insertion penalty
- -wlen
- Hamming window length
To do batch mode recognition, you will need to specify a control
file, using -ctl. This is a simple text file containing one entry per
line. Each entry is the name of an input file relative to the -cepdir
directory, and without the filename extension (which is given in the
-cepext argument).
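As an illustration of the control file layout described above, the following sketch builds one from the feature files found under a directory. The directory layout and file names (spk1/utt001.mfc and so on) are hypothetical stand-ins created only so the example runs on its own; in practice the directory would be your real -cepdir.

```shell
set -eu

# Hypothetical stand-in for the real -cepdir directory, with two empty
# placeholder feature files so the example is self-contained.
cepdir=$(mktemp -d)
mkdir -p "$cepdir/spk1"
: > "$cepdir/spk1/utt001.mfc"
: > "$cepdir/spk1/utt002.mfc"

ctlfile="$cepdir/test.ctl"

# One entry per line: the path relative to -cepdir, with the -cepext
# suffix (.mfc here) removed.
(cd "$cepdir" && find . -type f -name '*.mfc' \
    | sed -e 's|^\./||' -e 's|\.mfc$||' \
    | sort) > "$ctlfile"

cat "$ctlfile"
```

The resulting file could then be passed to the decoder along the lines of `pocketsphinx_batch -ctl "$ctlfile" -cepdir "$cepdir" -cepext .mfc -hmm MODELDIR -lm LMFILE -dict DICTFILE -hyp out.hyp`, where MODELDIR, LMFILE, DICTFILE, and out.hyp are placeholders for your own model, language model, dictionary, and output paths.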
If you are using acoustic feature files as input (see
sphinx_fe(1) for information on how to generate these), you can also
specify a subpart of a file, using the following format:
FILENAME START-FRAME END-FRAME UTTERANCE-ID
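A minimal sketch of a control file that uses this four-field form, with a basic sanity check over it. The file name spk1/session1, the frame numbers, and the utterance IDs are invented for illustration, and the check only covers the four-field form (plain one-field entries, as described earlier, would be counted as malformed by it).

```shell
set -eu

ctl=$(mktemp)

# Two hypothetical sub-parts of the same feature file, each given its own
# utterance ID: FILENAME START-FRAME END-FRAME UTTERANCE-ID
cat > "$ctl" <<'EOF'
spk1/session1 0 299 spk1_utt001
spk1/session1 300 749 spk1_utt002
EOF

# Count entries that do not have exactly four fields, whose frame fields
# are not non-negative integers, or whose start frame exceeds the end frame.
bad=$(awk 'NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 { n++ }
           END { print n + 0 }' "$ctl")

echo "malformed entries: $bad"
```

Running a check like this before a long batch job is one way to catch a mistyped frame range early; it does not verify that the frame numbers actually fall within the feature file.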
Written by numerous people at CMU from 1994 onwards. This manual
page by David Huggins-Daines <dhuggins@cs.cmu.edu>.
Copyright © 1994-2016 Carnegie Mellon University. See the
file LICENSE included with this package for more information.