Introduction: Course project for Image Analysis and Understanding
Introduction to Chinese Text Detection and Recognition
Text detection in natural scenes plays an important role in many computer vision applications. While existing text detection methods focus on English characters, there is strong demand for text detection in other languages, such as Chinese.
As Chinese characters are much more complex than English characters, innovative and more efficient text detection techniques are required for Chinese text.
This week, our group created the website and read some relevant papers to gain a better understanding of our group project.
Overview of Chinese Text Detection and Recognition
This week, we read several papers on Chinese text detection. The one that interested us most is titled "A Convolutional Neural Network Based Chinese Text Detection Algorithm via Text Structure Modeling" [1]. This paper presents a novel text detection algorithm for Chinese characters based on a specifically designed convolutional neural network (CNN).
The CNN contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN). The CNN is pre-trained via a convolutional sparse auto-encoder, specifically designed for extracting complex features from Chinese characters. In particular, the text structure component detectors enhance the accuracy and uniqueness of feature descriptors by extracting multiple text structure components in various ways. The spatial pyramid layer enhances the scale invariance of the CNN for detecting text at multiple scales. Finally, the multi-input-layer DBN replaces the fully connected layers in the CNN to ensure that features from multiple scales are comparable.
The first novel point is the Chinese text structure feature extractor, a special layer in the CNN called the text structure component detector (TSCD) layer. By analyzing the structures of Chinese characters, the Chinese text structure component types can be classified into several easily distinguishable groups based on their aspect ratios. For each text structure component group, a special TSCD is designed to extract its features, and each TSCD has its own feature map shape.
The second contribution is an unsupervised learning method, named convolutional sparse auto-encoder (CSAE), designed for complex and abstract Chinese text. As the availability of public scene Chinese text datasets is very limited, applying an unsupervised learning method to pretrain the CNN model is important for avoiding overfitting.
The third contribution is the application of a spatial pyramid layer (SPL) and the design of a multi-input-layer deep belief network (DBN) as the fully connected layer in the model. The SPL improves the scale invariance of the CNN, which is vital for detecting text of various scales in natural scenes. With the multi-input-layer DBN, the scale features extracted by the SPL and the text features extracted by the TSCD can be combined effectively.
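To make the idea of a spatial pyramid layer more concrete, here is a minimal NumPy sketch of generic spatial pyramid pooling. It is not the paper's implementation: the function name `spatial_pyramid_pool`, the use of max pooling, and the pyramid levels (1x1, 2x2, 4x4) are our own assumptions for illustration. The point it shows is that feature maps of different sizes are reduced to a vector of the same length, which is what makes features from different scales comparable.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map over n x n grids and concatenate.

    Hypothetical helper for illustration; levels and max pooling are assumptions.
    The output length is C * sum(n * n for n in levels), independent of H and W.
    """
    h, w, c = feature_map.shape
    if h < max(levels) or w < max(levels):
        raise ValueError("feature map is smaller than the finest pyramid level")
    pooled = []
    for n in levels:
        # Integer bin edges that cover the whole map, even if H or W is not divisible by n.
        row_edges = [(k * h) // n for k in range(n + 1)]
        col_edges = [(k * w) // n for k in range(n + 1)]
        for i in range(n):
            for j in range(n):
                cell = feature_map[row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                # Keep the strongest response per channel inside this cell.
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)

# Two inputs of different spatial size produce vectors of the same length.
small = spatial_pyramid_pool(np.random.rand(8, 8, 3))
large = spatial_pyramid_pool(np.random.rand(13, 20, 3))
print(small.shape, large.shape)
```

Because the output length depends only on the number of channels and the pyramid levels, a later fully connected stage can accept inputs of any size.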
HDevelop-based Chinese Text Detection and Recognition
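This week, we used the HDevelop software to recognize nearly one thousand Chinese characters. Each character in the training library appears in 10 fonts to improve recognition accuracy. Our processing pipeline mainly consists of grayscale conversion, binarization, dilation, and erosion.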
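The preprocessing steps listed above (grayscale conversion, binarization, dilation, erosion) can be sketched in plain NumPy as follows. This is only an illustration and does not use HDevelop operators: the function names, the fixed threshold of 128, the 3x3 square structuring element, and the assumption of dark text on a light background are all our own choices for the example.

```python
import numpy as np

def to_gray(rgb):
    # Weighted sum of the R, G, B channels (ITU-R BT.601 luma weights).
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def binarize(gray, threshold=128):
    # Assumes dark text on a light background: text pixels become True.
    return gray < threshold

def dilate(mask):
    # 3x3 dilation: a pixel is set if any pixel in its neighbourhood is set.
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask):
    # 3x3 erosion: a pixel stays set only if its whole neighbourhood is set.
    # Pixels outside the image count as background, so borders are eroded.
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def preprocess(rgb):
    # Grayscale -> binary mask -> dilation followed by erosion (a morphological closing).
    mask = binarize(to_gray(rgb))
    return erode(dilate(mask))

# Synthetic example: a black 3x3 square on a white 7x7 image.
image = np.full((7, 7, 3), 255, dtype=np.uint8)
image[2:5, 2:5] = 0
print(preprocess(image).astype(int))
```

Dilation followed by erosion fills small gaps inside strokes while keeping the overall stroke size roughly unchanged, which is why the two operations are usually applied as a pair.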
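Our experiments show that recognition accuracy is high for plain-text images but very poor for images that contain text against a natural background, because the dilation and erosion operations do not extract text regions accurately when background noise is present.
As a next step, we plan to first extract the text regions from the image and then apply dilation and erosion to them.
The figure below shows our recognition result on a plain-text image; all four Chinese characters in the image are recognized.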

Because of differences between fonts, some characters were misrecognized; for example, the character "一" was recognized as "人".

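We suspect that the finishing stroke at the end of "一" may affect recognition. We therefore added several fonts to the training library, after which "一" was recognized correctly. Our results suggest that increasing the number of fonts in the training library improves recognition accuracy.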

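This week, we expanded the training library to 2,500 Chinese characters. The figure below shows the recognition result of the SVM classification method we used previously.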

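Because the training library contains only 6 of the Chinese characters in the image, the recognition result is poor. The figure below shows the recognition result for the text in another image.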

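Because we did not train on English, all English characters were misrecognized. For Chinese, owing to the structural complexity of the characters, a character with a left-right or top-bottom structure is sometimes recognized as multiple characters, such as "你" and "总" in the figure above. Visually similar characters are also easily confused; for example, "全" was recognized as "金".
Later, we used the KNN method to recognize the Chinese characters again, which improved recognition accuracy to some extent. As shown in the figure below, 7 of the 8 characters in the first image were recognized.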
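For readers unfamiliar with KNN, the following is a minimal sketch of k-nearest-neighbour classification in NumPy. It is not the HDevelop classifier we used: the function name `knn_predict`, the Euclidean distance, the choice of k = 3, and the two-dimensional toy feature vectors are assumptions made purely for illustration. In practice each character image would first be converted to a feature vector.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training samples closest to query.

    Hypothetical helper: train_x is (N, D), train_y is a list of N labels.
    """
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training sample.
    dists = np.linalg.norm(train_x - query, axis=1)
    # Indices of the k smallest distances (stable sort keeps ties deterministic).
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two clusters of made-up feature vectors with character labels.
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["一", "一", "一", "人", "人", "人"]
print(knn_predict(features, labels, [0.2, 0.1]))
```

Since KNN compares a query directly against stored samples, adding more fonts per character to the training library gives it more neighbours to match against.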
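Summary: Our Chinese text detection and recognition works only on images with black text on a white background, which is still far from practical use. Nevertheless, by recognizing the characters in these simple images, we became familiar with the basic image processing workflow and some related algorithms. Being able to apply what we learned in class to practice is probably our biggest takeaway from studying image analysis and understanding!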
References:
[1] A Convolutional Neural Network Based Chinese Text Detection Algorithm via Text Structure Modeling, In TMM 2016.
[2] A CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data Engine, In cs.CV, 2016.
Group Member:
周健 杨田野 罗曼 易俊
Group Project:
Chinese Text Detection and Recognition