Set Tesseract font for OCR
I use Tesseract for serial number recognition. I want to recognize single characters only: no words, no dictionary. I would therefore like to use just one of the trained Tesseract font types for the serial numbers, to achieve better recognition results.
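To make the setup concrete, here is a minimal sketch of how such a single-character, no-dictionary call could look from Python. It only uses the `tesseract` command line tool through `subprocess`. The function names, the image path and the whitelist are hypothetical placeholders, and the flags assume a reasonably recent Tesseract (older 3.x releases spell the page segmentation option `-psm` instead of `--psm`). It does not select a font; it only covers the "single character, no dictionary" part.

```python
import subprocess


def build_tesseract_command(image_path, whitelist, lang="eng"):
    """Build a tesseract CLI call for one character with dictionaries disabled.

    image_path and whitelist are placeholders chosen for this sketch.
    """
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,                                  # traineddata to load
        "--psm", "10",                               # treat the image as a single character
        "-c", f"tessedit_char_whitelist={whitelist}",  # allowed characters only
        "-c", "load_system_dawg=0",                  # do not load the main dictionary
        "-c", "load_freq_dawg=0",                    # do not load the frequent-word list
    ]


def recognize_character(image_path, whitelist):
    """Run tesseract on one cropped character image and return its text output."""
    result = subprocess.run(
        build_tesseract_command(image_path, whitelist),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

A call such as `recognize_character("char_03.png", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")` would then return whatever Tesseract prints for that cropped image, assuming the `tesseract` binary is on the PATH.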
These are the trained Tesseract font types:
andale_mono.ttf arial_black.ttf arial_bold.ttf arial.ttf comic_sans_ms_bold.ttf comic_sans_ms.ttf courier_new_bold.ttf courier_new.ttf georgia_bold.ttf georgia.ttf gottf impact.ttf times_new_roman_bold.ttf times_new_roman.ttf trebuchet_ms_bold.ttf trebuchet_ms.ttf verdana_bold.ttf verdana.ttf
Since the trained font types have different design styles, there are problems distinguishing, for example, the characters "z" and "2". Times New Roman has a more rounded design, while Arial has more straight lines.
In my experience, Tesseract has trouble telling "z" and "2" apart because how similar the two characters look changes from one font design to another.
I therefore think I could achieve better recognition results if only one font type (for example Arial) were used for character recognition in Tesseract.
Question:
Is it possible to specify the font type in Tesseract?
A similar, older topic (October 2012): link