COMBINE_LANG_MODEL(1)
combine_lang_model - generate starter traineddata
combine_lang_model --input_unicharset filename --script_dir dirname --output_dir rootdir --lang lang [--lang_is_rtl] [--pass_through_recoder] [--words file --puncs file --numbers file]
combine_lang_model(1) generates a starter traineddata file that can be used to train an LSTM-based neural network model. It takes as input a unicharset and an optional set of wordlists. It replaces the separate steps of running set_unicharset_properties(1), wordlist2dawg(1), a recoder (unicode compressor) generator for which no standalone binary exists, and finally combine_tessdata(1).
--lang lang
--script_dir PATH
--input_unicharset FILE
--lang_is_rtl BOOL
--pass_through_recoder BOOL
--version_str STRING
--words FILE
--numbers FILE
--puncs FILE
--output_dir PATH
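As an illustration of how the required options fit together, the following is a minimal sketch, not a verified recipe. The language code and all paths (eng, langdata, eng.unicharset, starter) are placeholders assumed for the example and do not come from this page. The script only assembles and prints the command so that it can be reviewed before anything is run.

```shell
# Placeholder inputs (assumptions for illustration); replace with your own.
lang=eng
script_dir=langdata          # directory passed to --script_dir (assumed name)
unicharset=eng.unicharset    # input unicharset, assumed to exist already
output_dir=starter           # root directory for output (assumed name)

# Collect the program name and the four required options
# as positional parameters.
set -- combine_lang_model \
  --input_unicharset "$unicharset" \
  --script_dir "$script_dir" \
  --output_dir "$output_dir" \
  --lang "$lang"

# Dry run: print the assembled command instead of executing it.
echo "$@"
```

Once the paths are correct, replacing the final echo "$@" with "$@" runs the command. The optional wordlist group from the synopsis (--words, --puncs, --numbers) can be appended to the argument list in the same way.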
combine_lang_model(1) was first made available for tesseract 4.00.00alpha.
Main web site: https://github.com/tesseract-ocr
Information on training tesseract LSTM: https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00.html
Copyright (C) 2012 Google, Inc. Licensed under the Apache License, Version 2.0
The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett Packard (1985-1995) and Google (2006-present).
01/11/2023