Towards an interactive index structuring system for content-based image retrieval in large image databases

Hien Phuong Lai, Muriel Visani, Alain Boucher, Jean-Marc Ogier

Abstract

In recent years, the proliferation of acquisition devices such as digital cameras, the development of storage and
transmission techniques, and the success of tablet computers have facilitated the development of many large image
databases as well as of interactions with their users. This thesis [1] deals with the problem of Content-Based
Image Retrieval (CBIR) over these huge masses of data. Traditional CBIR systems generally rely on three phases:
feature extraction, feature space structuring and retrieval. In this thesis, we are particularly interested in the
structuring phase (usually called the indexing phase), which plays a very important role in finding information in
large databases. This phase aims at organizing the visual feature descriptors of all images into an efficient data
structure in order to facilitate, accelerate and improve further retrieval. We assume that the feature extraction
phase has been completed and that the feature descriptors of all images, usually low-level features describing the
color, shape, texture, etc., are available. Instead of traditional structuring methods, we study clustering methods,
which organize image descriptors into groups of similar objects (clusters) without any constraint on the cluster
size. The aim is to obtain an index structure better adapted to the retrieval of high-dimensional and unbalanced
data. The clustering process can be carried out without prior knowledge (unsupervised clustering) or with a
limited amount of prior knowledge (semi-supervised clustering).

Due to the “semantic gap” between the high-level semantic concepts expressed by the user via the query and
the low-level features automatically extracted from the images, the clustering results, and therefore the retrieval
results, generally differ from the user's expectations. In this thesis, we propose to involve the user in the
clustering phase, so that he/she can interact with the system in order to improve the clustering results, and thus
the performance of the further retrieval. The idea is as follows. Firstly, images are organized into clusters by
an initial clustering. Then, the user visualizes the clustering result and provides feedback to the system in order
to guide the re-clustering phase. The system re-organizes the dataset using not only the similarity between
objects, but also the feedback given by the user, in order to reduce the semantic gap. The interactive loop is
iterated until the clustering result satisfies the user. In the case of large database indexing, we assume that the
user has no prior knowledge about the image database. Therefore, an unsupervised clustering method is suitable
for the initial clustering, when no supervised information is yet available. Then, after receiving user feedback at
each interactive iteration, a semi-supervised clustering method can be used for the re-clustering process.
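
To make the flow of this interactive scheme concrete, the following Python sketch outlines the loop. The function
names and the callback-based structure are illustrative assumptions, not the actual implementation of the thesis;
in our setting, the initial clustering callback corresponds to an unsupervised method such as BIRCH, and the
re-clustering callback to the semi-supervised method introduced below.

    from typing import Callable, List, Sequence, Tuple

    def interactive_indexing(
        descriptors: Sequence,                  # visual feature descriptors of all images
        initial_clustering: Callable,           # unsupervised clustering (no prior knowledge)
        semi_supervised_clustering: Callable,   # re-clustering guided by supervised information
        collect_feedback: Callable,             # returns user feedback, or None when satisfied
        deduce_constraints: Callable,           # turns feedback into pairwise constraints
        max_iterations: int = 10,               # safety bound on the interactive loop
    ):
        # Initial organization: unsupervised clustering, since no supervised
        # information is available yet.
        clusters = initial_clustering(descriptors)
        constraints: List[Tuple] = []           # supervised information accumulated over iterations
        for _ in range(max_iterations):
            feedback = collect_feedback(clusters)
            if feedback is None:                # the user is satisfied with the current result
                break
            constraints += deduce_constraints(feedback, clusters)
            # Re-organize the dataset using both object similarity and user feedback.
            clusters = semi_supervised_clustering(descriptors, constraints)
        return clusters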

Based on an in-depth study of the state of the art of unsupervised clustering methods [4] as well as semi-
supervised clustering approaches [2, 3], we propose in this thesis a new interactive semi-supervised clustering
model [3] that involves the user in the clustering phase in order to improve the clustering results. From the formal
analysis of different unsupervised clustering methods [4], we selected for experimentation the methods that appear
most suitable for an incremental context involving the user in the clustering stage. The hierarchical BIRCH
unsupervised clustering method (Zhang et al., 1996), which gives the best performance in these experiments [4],
is chosen as the initial clustering in our model. Then, an interactive loop, in which the user provides feedback to
the system and the system re-organizes the image database using the new semi-supervised clustering method
proposed in this thesis, is iterated until the clustering result satisfies the user.
As the user has no prior knowledge about the image database, it is difficult for him/her to label the clusters,
or the images within them, with class names. Therefore, we provide the user with an interactive interface allowing
him/her to easily visualize the clustering result and give feedback to the system. Based on the majority of
the images displayed for each cluster, the user can specify, with a few simple clicks, which images are relevant
or non-relevant for that cluster. The user can also drag and drop images between clusters in order to change the
cluster assignment of some images. Supervised information is then deduced from the user feedback and used in
the re-clustering phase by the proposed semi-supervised clustering method. According to our study of the state
of the art of semi-supervised clustering methods, supervised information may consist of class labels for some
objects or of pairwise constraints (must-link or cannot-link) between objects. The experimental analysis of
different semi-supervised clustering methods in the interactive context [2, 3] shows that HMRF-kmeans (Basu
et al., 2004), which uses pairwise constraints, outperforms the other methods. Inspired by HMRF-kmeans, we
propose a new semi-supervised clustering method [3] for the re-clustering process. Instead of using pairwise
constraints between images, our method uses pairwise constraints between the leaf entries (CF entries) of the
BIRCH tree as supervised information for guiding the re-organization of the CF entries in the re-clustering
phase. As each CF entry groups a set of similar images, pairwise constraints between images can be replaced
by a smaller number of pairwise constraints between CF entries without reducing the quality of the supervised
information; the processing time can therefore be reduced without decreasing the performance. In our model,
after receiving user feedback at each interactive iteration, pairwise constraints are deduced not only from the
user feedback but also from neighbourhood information, which groups images according to the user's willingness
to place them in the same cluster (as expressed through the feedback of all interactive iterations). This kind of
information helps to maximize the supervised information (pairwise constraints) gained from the same number
of user clicks.
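
As an illustration, the sketch below shows a simplified way of turning one round of feedback into must-link and
cannot-link constraints between CF entries. The CFEntry structure follows the standard BIRCH clustering feature
(N, LS, SS), while the feedback encoding and the exact derivation rules are simplifying assumptions that do not
reproduce the thesis implementation; in particular, the accumulation of neighbourhood information across
iterations is omitted.

    from dataclasses import dataclass
    from itertools import combinations, product
    from typing import Dict, Set, Tuple

    import numpy as np

    @dataclass
    class CFEntry:
        # Leaf entry of the BIRCH tree: a compact summary of a group of similar images.
        n: int                  # number of images summarized by this entry (N)
        linear_sum: np.ndarray  # sum of their feature vectors (LS)
        squared_sum: float      # sum of their squared norms (SS)

        def centroid(self) -> np.ndarray:
            return self.linear_sum / self.n

    def constraints_from_feedback(
        relevant: Dict[int, Set[int]],      # cluster id -> ids of CF entries marked relevant
        non_relevant: Dict[int, Set[int]],  # cluster id -> ids of CF entries marked non-relevant
    ) -> Tuple[Set[Tuple[int, int]], Set[Tuple[int, int]]]:
        must_link: Set[Tuple[int, int]] = set()
        cannot_link: Set[Tuple[int, int]] = set()
        for cluster_id, kept in relevant.items():
            # Entries confirmed as belonging together should remain in the same cluster.
            for a, b in combinations(sorted(kept), 2):
                must_link.add((a, b))
            # Entries rejected from this cluster must not be grouped with the kept ones.
            for a, b in product(kept, non_relevant.get(cluster_id, set())):
                cannot_link.add((min(a, b), max(a, b)))
        return must_link, cannot_link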

In order to avoid any subjective dependence of the clustering results on a particular human user, a software agent
simulating the behaviour of a human user providing feedback to the system is used for the experimental
analysis of our system on different image databases of increasing sizes (Wang, PascalVoc2006, Caltech101,
Corel30k). Moreover, different strategies for deducing pairwise constraints from user feedback and neighbour-
hood information were investigated. Among these strategies, the one that keeps only the most “difficult”
constraints (must-link constraints between the most distant objects and cannot-link constraints between the
closest objects) was shown to give the best trade-off between performance and processing time. Furthermore,
the experimental results show that our model helps to improve the clustering results by involving the user, and
that our semi-supervised clustering method outperforms HMRF-kmeans in terms of both clustering quality and
processing time. Note that our clustering structure can be used not only to facilitate further image retrieval, but
also to help navigation in large image databases. Moreover, in this thesis we propose a 2D interface for visualizing
the group structure of high-dimensional image databases.
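
A minimal sketch of this selection strategy is given below. The distance between the two CF entries involved in
a constraint is measured here between their centroids, and the parameter k (number of constraints kept per type)
is an illustrative assumption rather than a value used in the thesis.

    import numpy as np

    def hardest_constraints(must_link, cannot_link, centroids, k):
        # Distance between the two CF entries involved in a constraint,
        # measured between their centroids.
        def dist(pair):
            a, b = pair
            return np.linalg.norm(centroids[a] - centroids[b])

        # Must-link constraints are "difficult" when the linked entries are far apart;
        # cannot-link constraints are "difficult" when the separated entries are close.
        hardest_ml = sorted(must_link, key=dist, reverse=True)[:k]
        hardest_cl = sorted(cannot_link, key=dist)[:k]
        return hardest_ml, hardest_cl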

Keywords

semi-supervised clustering; interactive learning; image indexing; classification and clustering