Segmentation and indexation of complex objects in comic book

Christophe Rigaud


Born in the 19th century, comics is a visual medium used to express ideas via images, often combined with text or visual information.
It is an art form that uses images deployed in sequence for graphic storytelling (sequential art), and it initially spread worldwide through newspapers, books and magazines.
Nowadays, the development of new technologies and the World Wide Web is giving birth to a new form of paperless comics that takes advantage of the freedom of the virtual world.
However, traditional comics still represent an important cultural heritage in many countries.
They have not yet received the same level of attention as music, cinema or literature regarding their adaptation to the digital format.
Applying information technologies to digitized comic books would facilitate the exploration of digital libraries, accelerate translation, and enable augmented reading and speech playback for the visually impaired, among other uses.

Heritage museums such as the CIBDI (French acronym for International City of Comic books and Images), the Kyoto International Manga Museum and The Digital Comic Museum have already digitized several thousand comic albums, some of which are now in the public domain.
Despite the growing market for digital comics, little research has been carried out to take advantage of the added value provided by these new media.
A particularity of document analysis is its dependence on the type of document, which often requires specific processing.
The challenge of document analysis systems is to propose generic solutions for specific problems.
The design process of comics is so specific that their automated analysis may be seen as a niche research field within document analysis, at the intersection of complex background, semi-structured and mixed content documents.
Being at the intersection of several fields combines their difficulties.
In this thesis, we review, highlight and illustrate the challenges related to comic book image analysis in order to provide a good overview of recent research progress in this field and of the current issues.
In order to cover the widest possible scope of study, we propose three different approaches for comic book image analysis.
The three approaches aim to provide an automatic description of the image content.
Different levels of description are discussed, from spatial positions (low level) to semantic information (high level).

The first approach describes the image in an intuitive way, from simple to complex elements, using previously extracted elements to guide further processing.

Simple elements such as panel, text and balloon regions are extracted first, followed by balloon tails and then comic character positions, which are inferred from the direction indicated by the tails.
The second approach addresses independent information extraction to overcome the main drawback of the first approach: error propagation.
This second method is composed of several specific extractors, one for each type of content, that are independent of each other.
Those extractors can be used in parallel without requiring previously extracted information, which eliminates the error propagation effect.
Additional processing steps, such as balloon type classification and text recognition, are also covered.
The third approach introduces a knowledge-driven system that combines low and high level processing to build a scalable system for comics image understanding.
This approach is intended to improve the overall precision of content extraction methods.
We built an expert system composed of an inference engine and two models, one for the comics domain and another for image processing, stored in an ontology.
The first model embeds the knowledge about comic books and the second models the image processing part.
These two models allow consistency analysis of extracted information and inference of the relationships between all the extracted elements such as the reading order, the type of text (e.g. spoken, onomatopoeic, illustrative) and the relations between speech balloons and speaking characters.
The expert system combines the benefits of the first two approaches and enables high level semantic description such as the reading order, the semantics of the balloon shapes, the relations between the speech balloons and their speakers, and the interaction between the comic characters.
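
To make two of the relations named above more concrete (reading order, and the link between a speech balloon and its speaker), here is a minimal sketch in Python. It is not the method of the thesis: the names Box, reading_order and nearest_speaker are hypothetical, the reading order assumes a left-to-right, top-to-bottom page layout, and coordinates are assumed to be image coordinates with the y axis pointing down.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned region (hypothetical type): top-left corner plus size."""
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def reading_order(panels, row_tolerance=0.5):
    """Order panels row by row, left to right inside each row.

    Assumption: a panel belongs to the current row when its top edge lies
    within row_tolerance * height of the topmost remaining panel.
    """
    remaining = sorted(panels, key=lambda p: p.y)
    ordered = []
    while remaining:
        first = remaining[0]
        row = [p for p in remaining if p.y - first.y <= row_tolerance * first.h]
        ordered.extend(sorted(row, key=lambda p: p.x))
        remaining = [p for p in remaining if p not in row]
    return ordered


def nearest_speaker(tail_tip, tail_direction, characters):
    """Return the closest character lying on the side the tail points to.

    tail_tip is an (x, y) point and tail_direction an (dx, dy) vector.
    Returns None when no character lies in that half-plane.
    """
    tx, ty = tail_tip
    dx, dy = tail_direction
    best, best_dist = None, float("inf")
    for character in characters:
        cx, cy = character.center
        vx, vy = cx - tx, cy - ty
        if vx * dx + vy * dy <= 0:
            continue  # character is behind the tail, skip it
        dist = hypot(vx, vy)
        if dist < best_dist:
            best, best_dist = character, dist
    return best
```

In a knowledge-driven setting, heuristics of this kind would typically be expressed as rules over the extracted regions rather than hard-coded functions; the sketch only illustrates the geometric intuition.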

In addition, this thesis provides the community with the first public comic book image dataset and ground truth, along with an overall experimental comparison of all the proposed methods and some of the state-of-the-art methods.

Copyright (c) 2016 Christophe Rigaud