On Clustering Images of Objects
Jongwoo Lim
Ph.D. Thesis, University of Illinois at Urbana Champaign
The images of an object may look very different under different illumination
conditions or viewing directions.
This thesis considers the problem of clustering images of various objects
into disjoint subsets according to object identity.
The clustering problem has been studied for decades and currently it is
one of the most active research fields in machine learning.
Compared to other data types considered in the clustering research, images
are practically very imporant, but they are very difficult to be handled
due to lack of understandings on their intrinsic properties.
First, the hypergraph partitioning problem, i.e., the problem of clustering
in domains where the orders of affinity relations are higher than pairwise,
is addressed, and a new two-step algorithm for solving this problem is
proposed.
The proposed algorithm first builds a graph approximation of the hypergraph
with a novel approximation scheme, then a spectral graph partitioning
algorithm
is applied to generate the final clustering result.
In addition to the thorough experiments, a theoretical analysis relates
the proposed algorithm to existing hypergraph partitioning algorithms
and presents the reasons for its superior performance.
Given powerful general clustering algorithms, the key step for solving
the image clustering problem is to design an effective similarity measure
between images.
In this thesis, according to the conditions under which images are taken,
three different similarity measures are presented.
When the relative position between objects and the camera is fixed but
the illumination in the environment changes, the conic affinity and
the gradient affinity gives excellent clustering result.
For each image in the clustering dataset, the conic affinity assigns
non-negative weights to other images in its neighborhood, which represents
the likelihood of being from the same object.
The weights represent the global geometric relations among images
in the image space.
In contrast, the gradient affinity works in a pairwise manner.
It directly compares two images based on the probabilistic distribution of
the image gradient vectors of an object under different illumination.
Dealing with viewing direction changes is harder since these images do not
have a simple global structure in the image space.
Instead of considering global relation between images, the local linear
structure between nearby images is exploited to determine the closeness
between images.
By considering affine warps of the images, small 3D pose variation can be
handled robustly, which enhances the clustering result over pose variaition.
When the lighting and pose variation occurs simultaneously,
we do not have an algorithm that presents reasonably good clustering
performance yet.
A few candidate algorithms are tested extensively and the result are given
in this thesis.
Finding an effective similarity measure for both changes still remains as
an open problem.