Research interests: Computer Vision, Robotics, Machine Learning
![]() |
In this paper, we propose a robust visual odometry (VO) system for a wide-baseline camera rig with wide field-of-view (FOV) fisheye lenses, which provides full omnidirectional stereo observations of the environment. For more robust and accurate ego-motion estimation, we add three components to the standard VO pipeline: 1) a hybrid projection model for improved feature matching, 2) a multi-view P3P RANSAC algorithm for pose estimation, and 3) online update of the rig extrinsic parameters. The proposed system is extensively evaluated on synthetic datasets with ground truth and on real sequences of highly dynamic environments, where it demonstrates superior performance. |
![]() |
Building a map of the environment for localization and navigation is critical for scene understanding and robot operation. We propose a metric-topological mapping method that combines the benefits of both metric maps and topological maps. |
![]() |
We propose a real-time approach for monocular visual simultaneous localization and mapping (SLAM) in large-scale environments. From a monocular video sequence, the proposed method continuously computes the current 6-DOF camera pose and the 3D landmark positions. It successfully builds consistent maps from challenging outdoor sequences using a monocular camera as the only sensor, whereas existing approaches rely on additional structural information such as the camera height above the ground. |
![]() |
![]() |
Points are commonly used for structure from motion and ego-motion estimation. We investigated faster and more robust ways to use line features for motion estimation of a stereo camera rig. |
![]() |
Visual inertial odometry (VIO) has recently gained considerable interest for efficient and accurate ego-motion estimation of robots and automobiles. With a monocular camera and a rigidly attached inertial measurement unit (IMU), VIO aims to estimate the 3D pose trajectory of the device in a global metric space. We propose a novel visual inertial odometry algorithm that directly optimizes the camera poses using noisy IMU data and visual feature locations. |
![]() |
In this paper, we propose a robust dense stereo reconstruction algorithm using a random walk with restart. The pixel-wise matching costs are aggregated into superpixels and the modified random walk with restart algorithm updates the matching cost for all possible disparities between the superpixels. |
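As a rough illustration of the propagation step, the sketch below runs the standard random-walk-with-restart update on a per-superpixel cost matrix. It is not the modified algorithm from the paper: the function name `random_walk_with_restart`, the row-normalized transition matrix, the restart probability, and the toy three-superpixel chain are all assumptions made for this example.

```python
import numpy as np

def random_walk_with_restart(initial_cost, adjacency, restart_prob=0.15, num_iters=50):
    """Propagate matching costs between neighboring superpixels.

    initial_cost: (num_superpixels, num_disparities) aggregated matching costs.
    adjacency:    (num_superpixels, num_superpixels) non-negative edge weights.
    This is the plain RWR update, not the paper's modified version.
    """
    # Row-normalize so each superpixel takes a weighted average of its neighbors.
    row_sums = adjacency.sum(axis=1, keepdims=True)
    transition = adjacency / np.maximum(row_sums, 1e-12)

    cost = initial_cost.copy()
    for _ in range(num_iters):
        # Walk step mixes in neighbor costs; restart step pulls back to the data term.
        cost = (1.0 - restart_prob) * transition @ cost + restart_prob * initial_cost
    return cost

# Toy example (assumed data): 3 superpixels in a chain, 4 candidate disparities.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
initial_cost = np.array([[0.9, 0.2, 0.8, 0.7],
                         [0.5, 0.4, 0.5, 0.6],
                         [0.8, 0.1, 0.9, 0.7]])

final_cost = random_walk_with_restart(initial_cost, adjacency)
disparities = final_cost.argmin(axis=1)  # lowest-cost disparity per superpixel
print(disparities)
```

The restart term keeps every superpixel anchored to its own measured cost, so the propagation smooths ambiguous regions without discarding the original evidence.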
![]() |
For decades many visual trackers have been proposed, but there has been little effort to quantitatively measure and compare their performance. In this work, we provide a dataset that contains common test videos with hand-labeled ground truth. A tracker library with a standardized interface for large-scale evaluation enables researchers to easily test and compare their trackers against state-of-the-art trackers. |
![]() |
Multi-face tracking in unconstrained videos is a challenging problem, as the faces of one person often appear drastically different across shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multi-target tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). |
![]() | We propose a DFT-based pooling layer for convolutional neural networks. The proposed DFT magnitude pooling is translation invariant and shape preserving. It pools the DFT magnitude of the last convolutional feature map, relying on the shift theorem. Convolutional neural networks with the proposed method achieve improved performance on various visual classification tasks. We validate the invariance property through experiments in the paper. |
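The shift theorem says that translating a signal only changes the phase of its DFT, so the magnitude is unchanged. The sketch below checks this on a random feature map with NumPy. The helper `dft_magnitude_pool` and its low-frequency cropping are assumptions for illustration, not the layer from the paper, and the invariance shown is exact only for circular shifts.

```python
import numpy as np

def dft_magnitude_pool(feature_map, out_size):
    """Hypothetical pooling: keep the central out_size x out_size block
    of the centered 2D DFT magnitude (the low frequencies)."""
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))
    magnitude = np.abs(spectrum)  # the phase, which encodes the shift, is discarded
    height, width = magnitude.shape
    top = height // 2 - out_size // 2
    left = width // 2 - out_size // 2
    return magnitude[top:top + out_size, left:left + out_size]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8))
# Circularly translate the feature map by 2 rows and 3 columns.
shifted = np.roll(fmap, shift=(2, 3), axis=(0, 1))

pooled = dft_magnitude_pool(fmap, 4)
pooled_shifted = dft_magnitude_pool(shifted, 4)
print(np.allclose(pooled, pooled_shifted))  # True
```

Cropping the spectrum rather than averaging spatial windows is one way to reduce resolution while keeping the pooled output a 2D map; the crop size here is arbitrary.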
![]() |
Google Business Photos is a service that brings Street View inside local businesses. It requires automatic panorama stitching, panorama localization using structure from motion, and manual editing for misplaced panoramas. |
![]() |
At Honda Research Institute USA, Inc., I worked on a human-robot interaction project. We built a system that lets ASIMO play a memory game (card matching) with a child. All sensing is done using the onboard stereo camera. |
![]() |
Image clustering groups images according to the identity or class of the objects they contain. Ideally, the affinity measure should be insensitive to changes in illumination or viewing direction. We have proposed several affinity measures for this purpose, as well as a general framework for hypergraph approximation.