Lecture: 10 AM, this Sunday, Beijing Time.
Computer Vision – Object Representation Learning from Static to Dynamic Visual Context
As an interdisciplinary scientific field, computer vision deals with how computers can acquire a high-level understanding of digital images and videos. In recent decades, notably after the widespread integration of deep learning into visual understanding, several computer vision problems appear to have been solved; some state-of-the-art methods even outperform human beings. From an engineering perspective, computer vision seeks to automate tasks that the human visual system can do. Meanwhile, new questions arise: can computer vision techniques solve visual tasks in the same way people do? Do good results on a standard benchmark indicate a good model, or simply overfitting? How do we learn task-specific knowledge from large amounts of data?
The field of computer vision covers a tremendous range of appealing topics, terminology, and methods. As a doctoral student, I humbly believe I can only discuss computer vision from a narrow, one-sided perspective. I will go through a few projects I have participated in, which may present typical scenarios in solving computer vision problems. During this lecture, I will also share my personal perspectives behind these projects.
- An overview of solving a learning-based computer vision problem
- Problem setting: What is the task?
- Test cases: Given the input, what would the expected outcome be? What is the baseline model? How do we build the test cases?
- What is your dataset? What does your data look like? Can we make any assumptions about the underlying distribution of the data?
- What kind of representation best describes the data? The trade-off between generalization & discrimination
- Will it be a regression model or a classification model? What kind of regressor/classifier can best meet the expected outcome?
- How to train the model? Is the data sufficient? Is the learning method good enough to reach a good local minimum?
- How to evaluate the model? What kind of evaluation metrics can you design? How can the evaluation metrics reflect the task requirements?
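The workflow above can be sketched end to end on a toy problem. This is a minimal, hypothetical illustration (synthetic data standing in for a real dataset, logistic regression as a baseline model, accuracy as the metric), not a claim about any particular project:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1) Dataset: samples drawn from an assumed (here: synthetic) distribution.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 2) Test cases: hold out data the model never sees during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 3) Model: a simple classifier serves as the baseline.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# 4) Evaluation: a metric that should reflect the task requirement.
accuracy = accuracy_score(y_te, clf.predict(X_te))
print(accuracy)
```

Each numbered step maps to one of the questions in the list: the dataset and its distribution, the held-out test cases, the choice of classifier, and the evaluation metric.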
- Pedestrian detection in nighttime driving assistance system based on infrared videos
- Methodology: Cascading model: 1) HOG features + AdaBoost 2) SIFT features + deformable part-based model + latent SVM
- Distinctive Image Features from Scale-Invariant Keypoints
- Object Detection with Discriminatively Trained Part Based Models
- A Short Introduction to Boosting
- Deformable Part Models are Convolutional Neural Networks
- Lessons learned: 1) Hand-crafted features are purely heuristic 2) Non-deep-learning model designs are more delicate and more human-interpretable
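The first cascade stage can be illustrated with a toy sketch: a simplified HOG-like descriptor (a global histogram of gradient orientations; real HOG additionally uses local cells and block normalization) fed to an AdaBoost classifier. The data here is synthetic and hypothetical, not the infrared pedestrian dataset:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def hog_like_descriptor(img, n_bins=9):
    """Simplified HOG-style descriptor: one global histogram of gradient
    orientations weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

# Toy patches: "positives" contain a strong vertical edge, negatives are noise.
rng = np.random.default_rng(0)
def make_patch(positive):
    img = rng.normal(size=(32, 16))
    if positive:
        img[:, 8:] += 4.0  # vertical edge -> a distinctive orientation peak
    return img

X = np.array([hog_like_descriptor(make_patch(i % 2 == 0)) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)], dtype=int)

# AdaBoost combines weak learners into a strong classifier over the descriptors.
clf = AdaBoostClassifier(n_estimators=50).fit(X[:150], y[:150])
score = clf.score(X[150:], y[150:])
print(score)
```

The point of the sketch is the pipeline shape (hand-crafted feature, then boosted classifier), which is what makes this stage interpretable: each descriptor bin has a geometric meaning.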
- Online-learning of task dynamics for general object tracking
- Methodology: Deep learning with recurrent neural network
- Intuition: Encode all past evidence causally and recursively in a recurrent neural network architecture.
- The system is able to maintain an abstract memory: the motion representation
- Fully-Convolutional Siamese Networks for Object Tracking
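The recurrent update behind this intuition can be written in a few lines: each new per-frame observation x_t is folded causally into a hidden state h_t, which serves as the abstract memory of the target's motion. This is a generic vanilla-RNN sketch with random weights, not the tracker's actual architecture; in the real system the weights would be learned online:

```python
import numpy as np

rng = np.random.default_rng(0)
dim_x, dim_h = 4, 8  # e.g. x_t could be a box (cx, cy, w, h); dims are illustrative
W_x = rng.normal(scale=0.5, size=(dim_h, dim_x))
W_h = rng.normal(scale=0.5, size=(dim_h, dim_h))
b = np.zeros(dim_h)

def step(h_prev, x_t):
    """One recurrent step: h_t depends only on h_{t-1} and x_t (causal)."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

h = np.zeros(dim_h)              # empty memory before the first frame
for t in range(10):              # a short sequence of frames
    x_t = rng.normal(size=dim_x) # stand-in for per-frame evidence
    h = step(h, x_t)             # all past evidence is summarized in h
print(h.shape)
```

Because h is updated recursively, the tracker never needs to revisit old frames: the motion representation is whatever the network has compressed into the hidden state.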