Set Tesseract font for OCR


I use Tesseract for serial number recognition. I want to recognize single characters only: no words, no dictionary. Therefore I want to use only one of the trained Tesseract font types for the serial numbers, to achieve better recognition results.
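For context, here is a minimal sketch of how such a single-character, dictionary-free run could be started from Python through the `tesseract` command line. The helper names, the whitelist and the image file name are assumptions made for illustration. `--psm 10` asks Tesseract to treat the image as a single character, and the `load_system_dawg` / `load_freq_dawg` variables switch off the built-in word lists. Older 3.x builds may expect `-psm` instead of `--psm`.

```python
import subprocess

# Characters expected in a serial number (assumed for illustration).
WHITELIST = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def build_tesseract_command(image_path, whitelist=WHITELIST):
    """Build a tesseract CLI call for one character, without dictionaries."""
    return [
        "tesseract", image_path, "stdout",
        "--psm", "10",                                  # single-character mode
        "-c", f"tessedit_char_whitelist={whitelist}",   # allowed characters
        "-c", "load_system_dawg=0",                     # no main word list
        "-c", "load_freq_dawg=0",                       # no frequent-word list
    ]


def recognize_character(image_path):
    """Run tesseract on a cropped character image and return its text output."""
    cmd = build_tesseract_command(image_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling `recognize_character("serial_char.png")` on a cropped image of one character (the file name is hypothetical) would return whatever Tesseract prints for it. This only removes the dictionary influence; it does not restrict the fonts used.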

These are the trained Tesseract font types:

andale_mono.ttf arial_black.ttf arial_bold.ttf arial.ttf comic_sans_ms_bold.ttf comic_sans_ms.ttf courier_new_bold.ttf courier_new.ttf georgia_bold.ttf georgia.ttf gottf impact.ttf times_new_roman_bold.ttf times_new_roman.ttf trebuchet_ms_bold.ttf trebuchet_ms.ttf verdana_bold.ttf verdana.ttf 

Since the trained font types have different design styles, there are problems in distinguishing, for example, the characters "z" and "2". Times New Roman has a more rounded design, while Arial has more straight lines.

font-type design differences

My experience is that Tesseract has problems distinguishing "z" and "2" because the similarity between the two characters changes from one font design to another.

Therefore I think I can achieve better recognition results if only one font type (for example Arial) is used by Tesseract for character recognition.

Question:

Is there a possibility to specify the font type in Tesseract?

A similar, older topic (October 2012): link

