First Workshop on Analysis and Understanding of Document Images in Network Media (AUDINM)

Beijing, May 12, 2015

Co-Chairs: Cheng-Lin Liu, Jean-Marc Ogier


• Introduction

In the networked era, people face huge volumes of multimedia data, including text, images, video and audio. Document images are an important category of network media data. This category is growing rapidly, particularly because of the increasing use of smartphones and social media sites (Facebook, Twitter, Weibo, etc.). To understand and exploit media data effectively, the text in images needs to be analyzed and recognized. Potential applications include commercial data mining, information extraction and cyber security.

Document images can be categorized into four broad classes: scene text images, scanned documents, camera-based paper documents, and synthesized (born-digital) documents. Scanned documents have long been the focus of the pattern recognition and document analysis community, but camera-based and Web-based documents have received significant attention only in the last 10 years. Several competitions and workshops on robust reading (targeting scene text images and Web document images) have been held, and the number of publications in this field is growing quickly.

To push forward research in this field and strengthen collaboration between Chinese and French researchers, the National Natural Science Foundation of China (NSFC) and the French National Research Agency (ANR) co-funded the project Analysis and Understanding of Document Images in Network Media (AUDINM), which is carried out by the Institute of Automation of Chinese Academy of Sciences (PI: Cheng-Lin Liu) and the University of La Rochelle (PI: Jean-Marc Ogier). As part of the project's commitments, this workshop invites researchers from China and France to exchange progress in the field of document image analysis. We are also happy to welcome Dr. Josep Llados from Spain.


• Program

Place: Meeting Room #1, 3rd floor, Intelligence Building, Institute of Automation of Chinese Academy of Sciences, No.95, Zhongguancun East Road, Beijing

北京中关村东路95号,中国科学院自动化研究所 智能化大厦3层第一会议室

A map for the venue can be found at







Cheng-Lin Liu


Document Engineering: From Image Processing to Document Analysis and Indexing (Jean-Marc Ogier)

Historical Chinese Document Recognition for Digital Library (Liangrui Peng)


Word Spotting Tools in Genealogy Retrieval from Historical Handwritten Records (Josep Llados)

What Is Text? Detecting Text in Scene Imagery (Qixiang Ye)


Smartphone-Based Acquisition of Document Images (Muzzamil Luqman)

Scene Text Detection in Images and Video (Xu-Cheng Yin)

Representation in Scene Text Detection and Recognition (Cong Yao)


Segmentation and Indexation of Complex Objects in Comic Book Images (Jean-Christophe Burie)

Text Line Recognition in Chinese Document Images (Cheng-Lin Liu)


Laboratory Tour to NLPR

• Abstracts and Speaker Biographies

1. Jean-Marc Ogier (University of La Rochelle, France)

Title: Document Engineering: From Image Processing to Document Analysis and Indexing

Abstract: Document engineering is the area of knowledge concerned with principles, tools and processes that improve our ability to create, manage, store, compact, access, and maintain documents.  The fields of document recognition and retrieval have grown rapidly in recent years. Such development has been fueled by the emergence of new application areas such as the World Wide Web (WWW), digital libraries, and video- and camera-based OCR. This talk will address some recent developments in the area of Document Processing.

Biography: Jean-Marc Ogier received his PhD degree in computer science from the University of Rouen, France, in 1994. During this period (1991-1994), he worked on graphic recognition for Matra Ms&I Company. He was then an associate professor, first at the University of Rennes 1 (1994-1998) and then at the University of Rouen (1998-2001). Now a full professor at the University of La Rochelle, Prof. Ogier is the head of the URL laboratory, which gathers more than 100 members and works mainly on Document Analysis and Content Management. Author of more than 160 publications and communications, he has managed several French and European projects dealing with historical document analysis, with both public institutions and private companies. Prof. Ogier is a Deputy Director of the GDR I3 of the French National Research Centre (CNRS). He is also Chair of the Technical Committee 10 (Graphic Recognition) of the International Association for Pattern Recognition (IAPR), and is the representative member of France on the governing board of the IAPR. Finally, he is also Vice Rector of the University of La Rochelle.


2. Liangrui Peng (Tsinghua University, China)

Title: Historical Chinese Document Recognition for Digital Library

Abstract: Historical Chinese documents, which reflect Chinese civilization, are invaluable collections in many libraries both in China and abroad. Historical Chinese document recognition technology is very important for facilitating the related full-text digitization projects for digital libraries. However, it is a very challenging problem in the field of OCR (Optical Character Recognition) research due to the large character set of historical Chinese characters, varied font types, diverse document layout styles, the degradation of image quality, and the lack of labelled training samples. In this talk, I will present our recent progress on key technologies including historical Chinese character recognition, character segmentation, and document image preprocessing. In particular, several transfer learning methods for historical Chinese character recognition are explored.

Biography: Liangrui Peng is an associate professor at the Department of Electronic Engineering, Tsinghua University. She received the BS degree and the MS degree from Huazhong University of Science and Technology, Wuhan, China, in 1994 and 1997 respectively. She received the PhD degree from Tsinghua University in 2010. Since 1997, she has been a faculty member at the Department of Electronic Engineering, Tsinghua University. Her research interests include multilingual document recognition, pattern recognition, and machine learning. She was one of the recipients of the National Awards for Science and Technology Progress (Second Class) in China in 1999, 2003 and 2008. She is a member of the IEEE and of the ACM.


3. Josep Llados (Universitat Autònoma de Barcelona, Spain)

Title: Word Spotting Tools in Genealogy Retrieval from Historical Handwritten Records

Abstract: People-centered search is very important in historical research, including historical demography, reconstruction of people's trajectories, and genealogical research. Queries about a person and his or her connections to other people make it possible to build a picture of a historical context: a person’s life, an event, a location at some period of time. For this purpose, scholars use documents like birth, marriage, or census records.

From a technical point of view, word spotting plays a central role in searching historical records of people. Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. We have proposed different word spotting approaches for historical manuscript retrieval. In particular, we have evaluated their performance within the EU-ERC project Five Centuries of Marriages (5CofM), which involves the analysis of marriage license records from the Barcelona Cathedral.

We have made some contributions in context-aware word spotting. Usually word spotting is based solely on the statistics of local terms. The use of correlative semantic labels between codewords adds more discriminability to the process. Three levels of context can be defined in a word spotting scenario: first, the joint occurrence of words in a given image segment; second, the geometric context, involving a language model of the relative 1D or 2D position of objects; third, the semantic context defined by the topic of the document. A number of document collections convey an underlying structure. We take advantage of this structure to boost the search for words, with a joint search of the query word and its context.

Biography: Josep Lladós received the degree in Computer Sciences in 1991 from the Universitat Politècnica de Catalunya and the PhD degree in Computer Sciences in 1997 from the Universitat Autònoma de Barcelona (Spain) and the Université Paris 8 (France). He is currently an Associate Professor at the Computer Sciences Department of the Universitat Autònoma de Barcelona and a staff researcher of the Computer Vision Center, where he has also been the director since January 2009. He is a visiting researcher at the IDAKS Lab of the Osaka Prefecture University (Japan). He is the chair holder of Knowledge Transfer of the UAB Research Park and Santander Bank. He is the head of the Pattern Recognition and Document Analysis Group (2014SGR-1436). His current research fields are document analysis, structural and syntactic pattern recognition and computer vision. He has been the head of a number of Computer Vision R&D projects and has published more than 200 papers in national and international conferences and journals. J. Lladós is an active member of the Image Analysis and Pattern Recognition Spanish Association (AERFAI), a member society of the IAPR. He is currently the chairman of the IAPR Educational Committee. Formerly he served as chairman of the IAPR Industrial Liaison Committee and of the IAPR TC-10, the Technical Committee on Graphics Recognition. He is chief editor of the ELCVIA (Electronic Letters on Computer Vision and Image Analysis), serves on the Editorial Boards of the IJDAR (International Journal on Document Analysis and Recognition) and of Cultural Heritage Digitization (a specialty section of Frontiers in Digital Humanities), and is also a PC member of a number of international conferences. He was the recipient of the IAPR-ICDAR Young Investigator Award in 2007.
He was the general chair of the International Conference on Document Analysis and Recognition (ICDAR’2009) held in Barcelona in July 2009, and co-chair of the IAPR TC-10 Graphics Recognition Workshop of 2003 (Barcelona), 2005 (Hong Kong), 2007 (Curitiba) and 2009 (La Rochelle). Josep Lladós also has experience in technology transfer, and in 2002 he created the company ICAR Vision Systems, a spin-off of the CVC/UAB.


4. Qixiang Ye (University of Chinese Academy of Sciences, China)

Title: What Is Text? Detecting Text in Scene Imagery

Abstract: The problems of text detection and recognition in images and video have received increased attention in recent years, as indicated by the “Robust Reading” competitions held in 2003, 2005, 2011, and 2013, along with the biennial international workshops on camera-based document analysis and recognition (CBDAR) from 2005 to 2013. The emergence of applications on mobile devices, including the iPhone and Android platforms, which translate text into other languages in real time, has stimulated renewed interest in these problems.

This report analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems. Existing techniques are categorized as either stepwise or integrated and sub-problems are highlighted including text localization, verification, segmentation and recognition. Special issues associated with the enhancement of degraded text and the processing of video text, multi-oriented, perspectively distorted and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.

Biography: Qixiang Ye received his B.S. and M.S. degrees in mechanical & electrical engineering from Harbin Institute of Technology (HIT) in 1999 and 2001 respectively, and a Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences in 2006. From 2006 to 2009, he was an assistant professor, and since 2009 he has been an associate professor at the University of the Chinese Academy of Sciences. He was a visiting assistant professor at the University of Maryland Institute for Advanced Computer Studies (UMIACS) until Jan. 2014. His research interests include image processing, image-based object detection and machine learning. He was a recipient of the Sony Outstanding Paper Award and the LuJiaXi Young Researcher Award. He is a Senior Member of IEEE.


5. Muhammad Muzzamil Luqman (University of La Rochelle, France)

Title: Smartphone-Based Acquisition of Document Images

Abstract: In this talk I will first present an overview of the evolution of document scanners. In the second part I will give a detailed presentation of the various works that have been carried out at the L3i laboratory of the University of La Rochelle on developing algorithms for converting modern smartphones into intelligent document scanners. In the last part of the presentation I will show some interesting demos, including a demo on scanning posters in very high resolution with smartphones and a demo on virtual reality in document images with smart glasses.

Biography: Muhammad Muzzamil Luqman has been a postdoctoral researcher at the L3i Laboratory, University of La Rochelle (France), since September 2012. Luqman holds a PhD in Computer Science from François Rabelais University of Tours (France) and Autonoma University of Barcelona (Spain). He successfully defended his PhD thesis, titled "Fuzzy Multilevel Graph Embedding for Recognition, Indexing and Retrieval of Graphic Document Images", with distinction "très honorable (magna cum laude)", on Friday 2nd of March 2012 at François Rabelais University of Tours (France). His thesis supervisors were Professor Jean-Yves Ramel, Dr. Thierry Brouard and Dr. Josep Llados.


6. Xu-Cheng Yin (University of Science and Technology Beijing, China)

Title: Scene Text Detection in Images and Video

Abstract: In this talk, we first present an adaptive hierarchical clustering algorithm, which can simultaneously learn similarity weights (to adaptively combine different feature similarities) and the clustering threshold (to automatically determine the number of clusters). Then, we propose a robust method for detecting horizontal (or near-horizontal) text in scene images, where character candidates are extracted with pruned MSERs, text candidates are constructed by grouping characters using the adaptive hierarchical clustering, text candidates are filtered with a Bayesian classifier, and text candidates are classified with an AdaBoost learner. Next, we extend our system to multi-orientation scene text detection, where the text candidate construction process consists of several sequential coarse-to-fine grouping steps: morphology-based, orientation-based and projection-based grouping. Finally, we also present a novel multi-strategy tracking-based text detection method for scene videos, which uses tracking by detection, spatio-temporal context learning, and linear prediction to predict the candidate text location sequentially, and adaptively integrates them and selects the best matching text block. Our proposed system won first place in both “Text Localization in Real Scenes” and “Text Localization in Born-Digital Images” in the ICDAR 2013 Robust Reading Competition.

Biography: Dr. Xu-Cheng Yin received the B.Sc. and M.Sc. degrees, both in computer science, from the University of Science and Technology Beijing, Beijing, China, in 1999 and 2002, respectively, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, China, in 2006. He is an Associate Professor with the Department of Computer Science and Technology, University of Science and Technology Beijing. He was a Visiting Researcher in the School of Computer Science, University of Massachusetts Amherst, MA, USA, with the Center for Intelligent Information Retrieval from January 2013 to January 2014 and with the Computer Vision Lab from July 2014 to August 2014. From 2006 to 2008, he was a Researcher at IT Lab, Fujitsu R&D Center. From 2002 to 2006, he was an R&D engineer at the R&D center, Hanwang Technology Co. Ltd. His pattern recognition and computer vision team won first place in both “Text Localization in Real Scenes” and “Text Localization in Born-Digital Images” in the ICDAR 2013 Robust Reading Competition. His information retrieval and recommendation systems team won first place in the INEX 2014 Social Book Search Track (Suggestion Task). His current research interests include pattern recognition and computer vision, information retrieval and recommendation systems, and document analysis and recognition.


7. Cong Yao (Huazhong University of Science and Technology, China)

Title: Representation in Scene Text Detection and Recognition

Abstract: Text, as one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is very useful in a wide range of vision-based applications; therefore, text detection and recognition in natural scenes have become active research topics in computer vision. Especially in recent years, the community has seen a surge of research efforts and substantial progress in these fields. In this presentation, I will first introduce recent work on scene text detection and recognition, mainly from the perspective of representation. I will then present the main ideas and details of three of our own works. Finally, potential directions for future research are discussed.

Biography: Cong Yao received the B.S. and Ph.D. degrees in electronics and information engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2008 and 2014, respectively. He was a Visiting Research Scholar with Temple University, Philadelphia, PA, USA, in 2013. His research has focused on computer vision and machine learning, in particular, the area of text detection and recognition in natural images.


8. Jean-Christophe Burie (University of La Rochelle, France)

Title: Segmentation and Indexation of Complex Objects in Comic Book Images

Abstract: Born in the 19th century, comics is a visual medium used to express ideas via images, often combined with text or visual information. It is considered a sequential art, and initially spread worldwide through newspapers, books and magazines. Nowadays, the development of new technologies and the World Wide Web is giving birth to a new form of paperless comics that takes advantage of the freedom of the virtual world. However, traditional comics still represent an important cultural heritage in many countries. They have not yet received the same level of attention as music, cinema or literature regarding their adaptation to the digital format. Using information technologies with classic comics would facilitate the exploration of digital libraries, speed up their translation, and allow augmented reading, speech playback for the visually impaired, etc.

The design process of comics is so typical that their automated analysis may be seen as a niche research field within document analysis, at the intersection of complex background, semi-structured and mixed content documents.

In this work, three different approaches for comic book image analysis have been proposed. The first approach is called “sequential” because the image content is described in an intuitive way, from simple to complex elements, using previously extracted elements to guide further processing. The second method is called “independent” because it is composed of several specific extractors, one for each element of the image content. The third approach introduces a knowledge-driven system that combines low- and high-level processing to build a scalable system for comics image understanding.

Biography: Jean-Christophe Burie received his Ph.D. degree in automatic control engineering and industrial data processing from University of Lille, France, in 1995. During his thesis (1993-1995), he worked on stereovision algorithms for obstacle detection in the framework of the European Project EUREKA-Prometheus. He was a research fellow in the Department of Mechanical Engineering for Computer-Controlled Machinery, Osaka University, Japan, from 1995 to 1997 in the framework of the Lavoisier Program of the French Foreign Office. From 1998 to 2014, Jean-Christophe Burie was an associate professor at the Computer Science Department of La Rochelle University. He is now Full Professor and deputy director of the L3i Lab. He is also a member of the governing board of the Technical Committee 10 (Graphic Recognition) of the International Association for Pattern Recognition (IAPR). His research interests include computer vision, color image processing, pattern recognition, and document image analysis. Since 2011, he has been the co-leader of the E-BDtheque research program dedicated to the indexing of comic books, and he has developed relations with Osaka University (Japan) for the indexing of Manga.


9. Cheng-Lin Liu (Institute of Automation of Chinese Academy of Sciences, China)

Title: Text Line Recognition in Chinese Document Images

Abstract: Text line recognition is at the core of document image recognition, and its major difficulty lies in character segmentation. To overcome the ambiguity of the boundaries between characters, a promising approach is to perform character segmentation and recognition simultaneously while incorporating contextual information, much as humans do when reading. The methods can be categorized into three classes: holistic, implicit-segmentation-based and explicit-segmentation-based approaches. The holistic approach applies only to small lexicons. HMM and RNN (such as BLSTM) are typical implicit segmentation methods. Chinese text line recognition has mostly taken the explicit segmentation approach, which generates character candidates by merging primitive segments based on over-segmentation. In this talk, I report some recent work of my group on Chinese text line recognition, focusing on the techniques of over-segmentation, classification, and context modeling and fusion. We applied the approach to handwriting recognition and scene text recognition. Last, I discuss the remaining problems and prospects.

Biography: Cheng-Lin Liu is a Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China, and is now the director of the laboratory. He received the B.S. degree in electronic engineering from Wuhan University, Wuhan, China, the M.E. degree in electronic engineering from Beijing Polytechnic University, Beijing, China, and the Ph.D. degree in pattern recognition and intelligent control from the Chinese Academy of Sciences, Beijing, China, in 1989, 1992 and 1995, respectively. He was a postdoctoral fellow at Korea Advanced Institute of Science and Technology (KAIST) and later at Tokyo University of Agriculture and Technology from March 1996 to March 1999. From 1999 to 2004, he was a research staff member and later a senior researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. His research interests include pattern recognition, image processing, neural networks, machine learning, and especially their applications to character recognition and document analysis. He has published over 200 technical papers in prestigious international journals and conferences. He is on the editorial boards of the journals Pattern Recognition, Image and Vision Computing, and International Journal on Document Analysis and Recognition. He is a Fellow of the IAPR and a senior member of the IEEE.