
Suggestions for how to handle fused labels

joshuahoran

Hi all,

I'm just diving into the imaging nodes and I am blown away at how powerful they are. I am currently processing images of vials with circular labels on the caps. My goal is to isolate the digits shown on the label and then perform OCR on them. I have learned that Tess4J does a poor job when there is a border around the text, so I go to great lengths to remove as much of the peripheral (non-text) image as possible before performing OCR. To do this I use:

Global Thresholder -> Connected Component Analysis -> Feature calculator

I use the calculated features to filter out anything large or circular. Overall this works very well except in cases like the one shown in the attached image. Notice that the top '7' digit is fused with the border segment, which results in its removal along with the border. I am looking for suggestions on ways I can try to split the '7' from the circular border. So far I have tried the Waehlby Splitter node, but that seems to segment everything into a million smaller pieces and I am no longer able to identify which pieces are part of the border and which are part of the text.
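For anyone following along outside KNIME, here is a minimal pure-Python sketch of the threshold -> connected components -> feature filter idea, and of why a fused digit disappears with the border. It is only an illustration, not what the KNIME nodes do internally: the function names, the toy 7x7 image, the cutoff of 128, and the `max_area` limit are all made up for this example.

```python
from collections import deque

def threshold(gray, cutoff):
    """Global threshold: pixels darker than `cutoff` become foreground (1)."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]

def connected_components(binary):
    """Group foreground pixels into 8-connected components via flood fill."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def features(pixels):
    """A few simple per-component features (area and bounding-box size)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return {"area": len(pixels),
            "height": max(ys) - min(ys) + 1,
            "width": max(xs) - min(xs) + 1}

def drop_large(components, max_area):
    """Keep only components small enough to plausibly be text."""
    return [c for c in components if features(c)["area"] <= max_area]

# Toy image (hypothetical): a dark square "border" with a small dark "digit" inside.
W = 255
gray = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, W, W, W, W, W, 0],
    [0, W, 0, 0, W, W, 0],
    [0, W, W, 0, W, W, 0],
    [0, W, W, 0, W, W, 0],
    [0, W, W, W, W, W, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
separate = drop_large(connected_components(threshold(gray, 128)), max_area=10)

# Same image, but one dark pixel now bridges the digit to the border.
fused_gray = [row[:] for row in gray]
fused_gray[1][2] = 0
fused = drop_large(connected_components(threshold(fused_gray, 128)), max_area=10)
```

In the first case the digit survives the filter as its own small component. In the second, the single bridging pixel merges digit and border into one large component, so the area filter throws both away, which is the behaviour I am seeing with the '7'.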

Any suggestions are welcome!

 

 

Comments
Sun, 02/18/2018 - 07:27

gab1one

Hi Joshua,

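As a starting point, try the Thinning node, followed by a Morphological Operations node to increase the thickness of the lines again. Which option thickens them depends on polarity: Dilate grows the foreground, so use Erode only if your lines are the dark (background-valued) pixels. This should let you split the "7" from the rest of the border.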
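To make the morphological step concrete, here is a small pure-Python sketch of 3x3 erosion and dilation on a binary grid, assuming foreground pixels are 1 (with inverted polarity the two operations swap roles). It is not the Thinning node and not the KNIME implementation. It shows a related trick, an opening (erode, then dilate), removing a one-pixel bridge between two blobs; the grid and function names are hypothetical.

```python
def dilate(img):
    """3x3 dilation: a pixel turns on if it or any of its neighbours is on."""
    h, w = len(img), len(img[0])
    return [[int(any(img[ny][nx]
                     for ny in range(max(0, y - 1), min(h, y + 2))
                     for nx in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)] for y in range(h)]

def erode(img):
    """3x3 erosion: a pixel stays on only if its whole 3x3 window is on.
    Positions outside the image are treated as background."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= ny < h and 0 <= nx < w and img[ny][nx]
                     for ny in (y - 1, y, y + 1)
                     for nx in (x - 1, x, x + 1)))
             for x in range(w)] for y in range(h)]

# Two 3x3 blobs joined by a one-pixel-thick bridge (toy data).
rows = [
    "00000000000",
    "01110001110",
    "01111111110",
    "01110001110",
    "00000000000",
]
img = [[int(c) for c in row] for row in rows]

# Opening: erosion deletes anything thinner than the 3x3 window (the bridge),
# then dilation grows the surviving blob cores back to their original size.
opened = dilate(erode(img))
```

Here the erosion leaves only the two blob centres, and the dilation restores both blobs without the bridge. On real images the window size has to be chosen so that it is wider than the unwanted connection but narrower than the digit strokes, otherwise the strokes are eroded away too.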

best,

Gabriel

Tue, 02/20/2018 - 09:23

joshuahoran

Thanks Gabriel. This helps me move in the right direction.
