Introduction: Image Analysis and Understanding Course Assignment

Introduction to Chinese Text Detection and Recognition

2017.03.27

Text detection in natural scene environments plays an important role in many computer vision applications. While existing text detection methods focus on English characters, there is strong application demand for text detection in other languages, such as Chinese.

As Chinese characters are much more complex than English characters, innovative and more efficient text detection techniques are required for Chinese text.

This week, our group created the website and read some relevant papers to gain a better understanding of our group project.

Overview of Chinese Text Detection and Recognition

This week, we read several papers on Chinese text detection. The one that interested us most is a paper titled "A Convolutional Neural Network-Based Chinese Text Detection Algorithm via Text Structure Modeling". This paper presents a novel text detection algorithm for Chinese characters based on a specially designed convolutional neural network (CNN).

Fig. 1: Flowchart

The CNN contains a text structure component detector layer, a spatial pyramid layer, and a multi-input-layer deep belief network (DBN). The CNN is pre-trained via a convolutional sparse auto-encoder, specifically designed for extracting complex features from Chinese characters. In particular, the text structure component detectors enhance the accuracy and uniqueness of feature descriptors by extracting multiple text structure components in various ways. The spatial pyramid layer enhances the scale invariance of the CNN for detecting texts at multiple scales. Finally, the multi-input-layer DBN replaces the fully connected layers in the CNN to ensure that features from multiple scales are comparable.
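One way to picture the multi-input-layer DBN's role is a single hidden layer fed by several visible input groups (for example, one per pyramid scale), where each group has its own weight matrix and the groups' contributions are summed before the sigmoid. Groups of different lengths are thus mapped into one shared hidden space. The sketch below is an illustration under that assumption, not the paper's implementation: it shows only the upward inference pass, the names `multi_input_hidden`, `inputs`, and `weights` are hypothetical, and the layer-wise RBM pre-training a real DBN uses is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def multi_input_hidden(inputs: Sequence[Vector],
                       weights: Sequence[Matrix],
                       bias: Vector) -> Vector:
    """Hidden-unit probabilities for a layer fed by several visible groups.

    weights[s] has one row per hidden unit and len(inputs[s]) columns, so each
    input group (e.g. one pyramid scale) may have a different length.
    """
    if len(inputs) != len(weights):
        raise ValueError("need one weight matrix per input group")
    hidden = []
    for k, b in enumerate(bias):
        z = b
        for v, w in zip(inputs, weights):
            if len(w[k]) != len(v):
                raise ValueError("weight row length must match its input group")
            # Each group adds its own weighted contribution to hidden unit k.
            z += sum(wi * vi for wi, vi in zip(w[k], v))
        hidden.append(sigmoid(z))
    return hidden


if __name__ == "__main__":
    coarse = [0.9]                 # e.g. a 1-value scale
    fine = [0.1, 0.4, 0.2, 0.7]    # e.g. a 4-value scale
    w_coarse = [[0.5], [-0.3]]     # 2 hidden units x 1 input
    w_fine = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]  # 2 x 4
    print(multi_input_hidden([coarse, fine], [w_coarse, w_fine], [0.0, 0.0]))
```

Giving each scale its own weights, rather than concatenating everything into one vector, keeps the per-scale parameters separate while still producing a single joint representation.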

Fig. 2

The first contribution is a Chinese text structure feature extractor, a special CNN layer called the text structure component detector (TSCD) layer. By analyzing the structures of Chinese characters, the Chinese text structure component types can be effectively classified into several easily distinguishable groups based on their aspect ratios. For each text structure component group, a dedicated TSCD is designed to extract its features, and each TSCD has its own unique feature map shape.
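To make the aspect-ratio idea concrete, here is a minimal Python sketch, assuming plain single-channel valid convolution and three hypothetical component groups (horizontal, vertical, square) with made-up kernel sizes; the paper's actual groups and kernel shapes are not given here. Because each detector's kernel has a different height and width, each one produces a differently shaped feature map from the same input.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# Hypothetical component groups keyed by kernel (height, width).
# These sizes are illustrative assumptions, not the paper's values.
TSCD_KERNEL_SHAPES: Dict[str, Tuple[int, int]] = {
    "horizontal": (2, 6),  # wide components
    "vertical": (6, 2),    # tall components
    "square": (4, 4),      # roughly square components
}


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid-mode 2D cross-correlation, as computed by CNN convolution layers."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(iw - kw + 1)
        ]
        for r in range(ih - kh + 1)
    ]


def tscd_layer(image: Matrix, kernels: Dict[str, Matrix]) -> Dict[str, Matrix]:
    """Apply one detector per component group; output shapes differ per group."""
    return {name: conv2d_valid(image, k) for name, k in kernels.items()}


if __name__ == "__main__":
    image = [[float((r + c) % 2) for c in range(16)] for r in range(16)]
    # Averaging kernels stand in for learned weights.
    kernels = {
        name: [[1.0 / (h * w)] * w for _ in range(h)]
        for name, (h, w) in TSCD_KERNEL_SHAPES.items()
    }
    for name, fmap in tscd_layer(image, kernels).items():
        print(name, len(fmap), "x", len(fmap[0]))
```

On a 16 × 16 input this prints map sizes of 15 x 11, 11 x 15 and 13 x 13 for the horizontal, vertical and square detectors respectively, which is the "unique feature map shape" per group described above.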

Fig. 3: Algorithm

The second contribution is an unsupervised learning method, named the convolutional sparse auto-encoder (CSAE), for complex and abstract Chinese texts. As the availability of public scene Chinese text datasets is very limited, applying an unsupervised learning method to pre-train a CNN model is important for avoiding overfitting.
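A sparse auto-encoder is commonly trained to minimize reconstruction error plus a penalty that keeps hidden activations sparse. The sketch below computes such an objective for a convolutional variant in plain Python, assuming a sigmoid encoder, a tied-weight transposed-convolution decoder, and the KL-divergence sparsity penalty from the standard sparse auto-encoder formulation. The target sparsity `rho`, the weight `beta`, and all function names are illustrative assumptions; the paper's exact CSAE objective may differ.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def encode(x: Matrix, w: Matrix, b: float) -> Matrix:
    """Valid cross-correlation plus bias, then sigmoid: one hidden feature map."""
    kh, kw = len(w), len(w[0])
    rows, cols = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sigmoid(b + sum(x[r + i][c + j] * w[i][j]
                             for i in range(kh) for j in range(kw)))
             for c in range(cols)] for r in range(rows)]


def decode(hs: Sequence[Matrix], ws: Sequence[Matrix],
           shape: Tuple[int, int]) -> Matrix:
    """Tied-weight transposed convolution: scatter each activation through its filter."""
    out = [[0.0] * shape[1] for _ in range(shape[0])]
    for h, w in zip(hs, ws):
        for r, row in enumerate(h):
            for c, a in enumerate(row):
                for i, wrow in enumerate(w):
                    for j, wij in enumerate(wrow):
                        out[r + i][c + j] += a * wij
    return out


def kl(rho: float, rho_hat: float) -> float:
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def csae_loss(x: Matrix, ws: Sequence[Matrix], bs: Sequence[float],
              rho: float = 0.05, beta: float = 3.0) -> float:
    """Mean squared reconstruction error plus a KL sparsity penalty per map."""
    hs = [encode(x, w, b) for w, b in zip(ws, bs)]
    x_hat = decode(hs, ws, (len(x), len(x[0])))
    n = len(x) * len(x[0])
    recon = 0.5 * sum((x[r][c] - x_hat[r][c]) ** 2
                      for r in range(len(x)) for c in range(len(x[0]))) / n
    sparsity = 0.0
    for h in hs:
        rho_hat = sum(map(sum, h)) / (len(h) * len(h[0]))  # mean activation
        sparsity += kl(rho, rho_hat)
    return recon + beta * sparsity
```

During pre-training, the gradient of this loss with respect to the filters would be used to learn them, and the learned filters would then initialize the CNN's convolutional layers; the gradient step itself is omitted here. The penalty pushes each map's mean activation toward `rho`, which is what makes the learned features sparse.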

The third contribution is the application of a spatial pyramid layer (SPL) and the design of a multi-input-layer deep belief network (DBN) as the fully connected layer in the model. The SPL improves the scale invariance of the CNN, which is vital for detecting texts of various scales in natural scenes. With the multi-input-layer DBN, the scale features extracted by the SPL and the text features extracted by the TSCD can be combined effectively.
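Spatial pyramid pooling turns a feature map of any size into a fixed-length vector by max-pooling over an n × n grid of bins at several levels n and concatenating the results. The sketch below assumes levels (1, 2, 4) and max pooling, which are common choices but not necessarily the paper's configuration; `spatial_pyramid_pool` is a hypothetical helper name.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def spatial_pyramid_pool(fmap: Matrix, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool fmap over n x n bins for each level n and concatenate the results.

    The output length is sum(n * n for n in levels), independent of fmap's size.
    """
    h, w = len(fmap), len(fmap[0])
    features: List[float] = []
    for n in levels:
        for by in range(n):
            # floor/ceil bin edges guarantee every bin is non-empty
            r0, r1 = math.floor(by * h / n), math.ceil((by + 1) * h / n)
            for bx in range(n):
                c0, c1 = math.floor(bx * w / n), math.ceil((bx + 1) * w / n)
                features.append(max(fmap[r][c]
                                    for r in range(r0, r1)
                                    for c in range(c0, c1)))
    return features


if __name__ == "__main__":
    small = [[float(r * 10 + c) for c in range(10)] for r in range(7)]
    large = [[float(r * 13 + c) for c in range(13)] for r in range(9)]
    print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

With levels (1, 2, 4) the output always holds 1 + 4 + 16 = 21 values per feature map, so a 7 × 10 map and a 9 × 13 map yield vectors of the same length. This fixed length is what lets the following fully connected (here, DBN) layers accept texts at different scales.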