pocketsphinx_continuous - Run speech recognition in continuous
listening mode
pocketsphinx_continuous [ -infile filename.wav ] [ -inmic yes ] [ options ]...
This program opens the audio device or a file and waits for
speech. When it detects an utterance, it performs speech recognition on
it.
To record from the microphone and decode, use
- -inmic yes
To decode a 16 kHz, 16-bit mono WAV file, use
- -infile filename.wav
You can also specify -lm, -fsg, or -kws, depending on whether you are
using a statistical language model, using a finite-state grammar, or
looking for a keyphrase.
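As an illustration of how the input and search options combine, the following Python sketch assembles an argument list from options documented on this page. The helper name build_command and all file names are hypothetical. The sketch assumes exactly one input source and at most one of -lm, -fsg, and -kws per run; that is a simplification for the example, not a documented constraint. It only builds the command and does not run the program.

```python
import shlex


def build_command(infile=None, inmic=False, lm=None, fsg=None, kws=None):
    """Assemble a pocketsphinx_continuous argument list (hypothetical helper)."""
    # Assumption for this sketch: exactly one input source per run.
    if bool(infile) == bool(inmic):
        raise ValueError("choose either infile or inmic")
    # Assumption for this sketch: at most one search mode per run.
    searches = [("-lm", lm), ("-fsg", fsg), ("-kws", kws)]
    chosen = [(flag, path) for flag, path in searches if path]
    if len(chosen) > 1:
        raise ValueError("choose only one of -lm, -fsg, -kws")

    cmd = ["pocketsphinx_continuous"]
    cmd += ["-inmic", "yes"] if inmic else ["-infile", infile]
    for flag, path in chosen:
        cmd += [flag, path]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders, not files shipped with the package.
    print(shlex.join(build_command(infile="filename.wav", lm="model.lm")))
    print(shlex.join(build_command(inmic=True, kws="keyphrases.list")))
```

The resulting list can be passed to subprocess.run once the model and audio files exist on your system.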
- -adcdev
- Name of audio device to use for input.
- -agc
- Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
- -agcthresh
- Initial threshold for automatic gain control
- -allphone
- Perform phoneme decoding with phonetic lm
- -allphone_ci
- Perform phoneme decoding with phonetic lm and context-independent units
only
- -alpha
- Preemphasis parameter
- -argfile
- Argument file giving extra arguments.
- -ascale
- Inverse of acoustic model scale for confidence score calculation
- -aw
- Inverse weight applied to acoustic scores.
- -backtrace
- Print results and backtraces to log file.
- -beam
- Beam width applied to every frame in Viterbi search (smaller values mean
wider beam)
- -bestpath
- Run bestpath (Dijkstra) search over word lattice (3rd pass)
- -bestpathlw
- Language model probability weight for bestpath search
- -ceplen
- Number of components in the input feature vector
- -cmn
- Cepstral mean normalization scheme ('current', 'prior', or 'none')
- -cmninit
- Initial values (comma-separated) for cepstral mean when 'prior' is
used
- -compallsen
- Compute all senone scores in every frame (can be faster when there are
many senones)
- -debug
- Verbosity level for debugging messages
- -dict
- Main pronunciation dictionary (lexicon) input file
- -dictcase
- Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII
characters only)
- -dither
- Add 1/2-bit noise
- -doublebw
- Use double bandwidth filters (same center freq)
- -ds
- Frame GMM computation downsampling ratio
- -fdict
- Noise word pronunciation dictionary input file
- -feat
- Feature stream type, depends on the acoustic model
- -featparams
- File containing feature extraction parameters.
- -fillprob
- Filler word transition probability
- -frate
- Frame rate
- -fsg
- Sphinx format finite state grammar file
- -fsgusealtpron
- Add alternate pronunciations to FSG
- -fsgusefiller
- Insert filler words at each state.
- -fwdflat
- Run forward flat-lexicon search over word lattice (2nd pass)
- -fwdflatbeam
- Beam width applied to every frame in second-pass flat search
- -fwdflatefwid
- Minimum number of end frames for a word to be searched in fwdflat
search
- -fwdflatlw
- Language model probability weight for flat lexicon (2nd pass)
decoding
- -fwdflatsfwin
- Window of frames in lattice to search for successor words in fwdflat
search
- -fwdflatwbeam
- Beam width applied to word exits in second-pass flat search
- -fwdtree
- Run forward lexicon-tree search (1st pass)
- -hmm
- Directory containing acoustic model files.
- -infile
- Audio file to transcribe.
- -inmic
- Transcribe audio from microphone.
- -input_endian
- Endianness of input data, big or little, ignored if NIST or MS Wav
- -jsgf
- JSGF grammar file
- -keyphrase
- Keyphrase to spot
- -kws
- File with keyphrases to spot, one per line
- -kws_delay
- Delay to wait for best detection score
- -kws_plp
- Phone loop probability for keyword spotting
- -kws_threshold
- Threshold for p(hyp)/p(alternatives) ratio
- -latsize
- Initial backpointer table size
- -lda
- File containing transformation matrix to be applied to features
(single-stream features only)
- -ldadim
- Dimensionality of output of feature transformation (0 to use entire
matrix)
- -lifter
- Length of sin-curve for liftering, or 0 for no liftering.
- -lm
- Word trigram language model input file
- -lmctl
- Specify a set of language models
- -lmname
- Which language model in -lmctl to use by default
- -logbase
- Base in which all log-likelihoods are calculated
- -logfn
- File to write log messages in
- -logspec
- Write out logspectral files instead of cepstra
- -lowerf
- Lower edge of filters
- -lpbeam
- Beam width applied to last phone in words
- -lponlybeam
- Beam width applied to last phone in single-phone words
- -lw
- Language model probability weight
- -maxhmmpf
- Maximum number of active HMMs to maintain at each frame (or -1 for
no pruning)
- -maxwpf
- Maximum number of distinct word exits at each frame (or -1 for no
pruning)
- -mdef
- Model definition input file
- -mean
- Mixture gaussian means input file
- -mfclogdir
- Directory to log feature files to
- -min_endfr
- Nodes ignored in lattice construction if they persist for fewer than N
frames
- -mixw
- Senone mixture weights input file (uncompressed)
- -mixwfloor
- Senone mixture weights floor (applied to data from -mixw file)
- -mllr
- MLLR transformation to apply to means and variances
- -mmap
- Use memory-mapped I/O (if possible) for model files
- -ncep
- Number of cepstral coefficients
- -nfft
- Size of FFT
- -nfilt
- Number of filter banks
- -nwpen
- New word transition penalty
- -pbeam
- Beam width applied to phone transitions
- -pip
- Phone insertion penalty
- -pl_beam
- Beam width applied to phone loop search for lookahead
- -pl_pbeam
- Beam width applied to phone loop transitions for lookahead
- -pl_pip
- Phone insertion penalty for phone loop
- -pl_weight
- Weight for phoneme lookahead penalties
- -pl_window
- Phoneme lookahead window size, in frames
- -rawlogdir
- Directory to log raw audio files to
- -remove_dc
- Remove DC offset from each frame
- -remove_noise
- Remove noise with spectral subtraction in mel-energies
- -remove_silence
- Enables VAD, removes silence frames from processing
- -round_filters
- Round mel filter frequencies to DFT points
- -samprate
- Sampling rate
- -seed
- Seed for random number generator; if less than zero, pick our own
- -sendump
- Senone dump (compressed mixture weights) input file
- -senlogdir
- Directory to log senone score files to
- -senmgau
- Senone to codebook mapping input file (usually not needed)
- -silprob
- Silence word transition probability
- -smoothspec
- Write out cepstral-smoothed logspectral files
- -svspec
- Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
- -time
- Print word times in file transcription.
- -tmat
- HMM state transition matrix input file
- -tmatfloor
- HMM state transition probability floor (applied to -tmat file)
- -topn
- Maximum number of top Gaussians to use in scoring.
- -topn_beam
- Beam width used to determine top-N Gaussians (or a list, per-feature)
- -toprule
- Start rule for JSGF (first public rule is default)
- -transform
- Which type of transform to use to calculate cepstra (legacy, dct, or
htk)
- -unit_area
- Normalize mel filters to unit area
- -upperf
- Upper edge of filters
- -uw
- Unigram weight
- -vad_postspeech
- Number of silence frames to keep after the transition from speech to
silence.
- -vad_prespeech
- Number of speech frames to keep before the transition from silence to
speech.
- -vad_startspeech
- Number of speech frames needed to trigger the VAD transition from silence
to speech.
- -vad_threshold
- Threshold for decision between noise and silence frames. Log-ratio between
signal level and noise level.
- -var
- gaussian variances input file
- -varfloor
- Mixture gaussian variance floor (applied to data from -var
file)
- -varnorm
- Variance normalize each utterance (only if CMN == current)
- -verbose
- Show input filenames
- -warp_params
- Parameters defining the warping function
- -warp_type
- Warping function type (or shape)
- -wbeam
- Beam width applied to word exits
- -wip
- Word insertion penalty
- -wlen
- Hamming window length
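Several of the options above are expressed in frames or tied to the sampling rate (-frate, -samprate, -wlen, -nfft, and the -vad_* frame counts). The following Python sketch shows the arithmetic that relates them. The helper names are hypothetical, and the numeric values are illustrative assumptions rather than the defaults of any particular build. It also assumes that -frate is in frames per second, that -wlen is in seconds, and that the FFT must be at least as long as the analysis window.

```python
def frames_to_seconds(frames, frate):
    """Duration covered by `frames` frames at `frate` frames per second."""
    return frames / frate


def window_samples(wlen, samprate):
    """Number of samples in one analysis window of `wlen` seconds."""
    return round(wlen * samprate)


def min_fft_size(n_samples):
    """Smallest power of two that can hold `n_samples` samples."""
    size = 1
    while size < n_samples:
        size *= 2
    return size


if __name__ == "__main__":
    frate = 100        # -frate, frames per second (assumed value)
    samprate = 16000   # -samprate in Hz, matching the 16 kHz input above
    wlen = 0.025625    # -wlen in seconds (assumed value)

    # Illustrative frame counts for the VAD options.
    for name, frames in [("-vad_prespeech", 20), ("-vad_postspeech", 50)]:
        print(name, frames, "frames =", frames_to_seconds(frames, frate), "s")

    n = window_samples(wlen, samprate)
    print("window:", n, "samples; -nfft should be at least", min_fft_size(n))
```

Under these assumptions, changing -samprate or -wlen changes the window size in samples, so -nfft may need to be raised to match.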
Written by numerous people at CMU from 1994 onwards. This manual
page was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.
Copyright © 1994-2016 Carnegie Mellon University. See the
file LICENSE included with this package for more information.