Pavel Tokmakov

I am currently a postdoc at CMU working with Martial Hebert and Deva Ramanan. My main research interest can be summarized as integrating external knowledge and structure into deep learning models.
I completed my PhD at Inria, France, under the supervision of Cordelia Schmid and Karteek Alahari, studying the role of motion in object recognition. Prior to my PhD, I worked on statistical relational learning and interactive knowledge discovery. You can find my CV here. A full list of publications is available on Google Scholar.


Selected publications

Learning compositional representations for few-shot recognition

Deep learning representations lack compositionality, a property that is instrumental for the human ability to learn novel concepts from a few examples. In this work we investigate several approaches to enforcing this property during training. The resulting models demonstrate significant improvements in the few-shot setting.
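One way to encourage compositionality, sketched below, is to regularize each image embedding toward the sum of the embeddings of its annotated attributes. This is only an illustration under the assumption that attribute annotations are available; the function and tensor names are ours, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def compositional_penalty(image_emb, attr_emb, attr_mask):
    """Encourage an image embedding to decompose into the sum of the
    embeddings of its attributes, one way of pushing a representation
    toward compositionality (illustrative, not the paper's exact loss).

    image_emb: (B, D) image embeddings from the backbone
    attr_emb:  (A, D) learned embedding per attribute
    attr_mask: (B, A) float 0/1 matrix; attr_mask[i, j] = 1 if image i
               is annotated with attribute j
    """
    composed = attr_mask @ attr_emb  # (B, D) sum of attribute embeddings
    return F.mse_loss(image_emb, composed)
```

Such a penalty would be added to the usual classification loss with a small weight.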

A study on action detection in the wild

In this project we address the problem of long-tail category distributions in action detection datasets, both in the training and in the test set. For the former, we introduce a simple but effective approach for transferring knowledge from head to tail classes. For the latter, we propose a new evaluation metric that is not biased by the distribution of examples.
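To give a flavor of what a metric that is robust to the class distribution can look like (an illustration, not the exact metric from the paper): averaging average precision uniformly over classes gives rare tail classes the same weight as frequent head classes, so the score is no longer dominated by the head.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_per_class_ap(scores, labels):
    """Average AP uniformly over classes, so that tail classes carry
    the same weight as head classes (illustrative sketch).

    scores: (num_examples, num_classes) predicted confidences
    labels: (num_examples, num_classes) binary ground-truth matrix
    """
    aps = []
    for c in range(labels.shape[1]):
        if labels[:, c].sum() == 0:  # skip classes absent from the test set
            continue
        aps.append(average_precision_score(labels[:, c], scores[:, c]))
    return float(np.mean(aps))
```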

A structured model for action detection

A dominant paradigm for learning-based approaches in computer vision is to train generic models on large datasets and let them discover the optimal representation for the problem at hand. In this work we instead propose to integrate domain knowledge into the architecture of an action detection model. This allows us to achieve significant improvements over the state of the art without much parameter tuning.

Towards segmenting anything that moves

Detecting and segmenting all the objects in a scene is a key requirement for agents operating in the world. However, even the definition of an object is ambiguous. In this work we use motion as a bottom-up cue and propose a learning-based method for category-agnostic instance segmentation in videos.

Learning to segment moving objects

Motion segmentation is the classical problem of separating moving objects in a video from the background. In this work we propose the first learning-based approach to this problem. We then extend the model with an appearance stream and a visual memory module, allowing it to segment objects before they start moving and after they stop.
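The overall structure can be sketched as follows. The module sizes and names are placeholders rather than the actual architecture from the papers, but they show how a motion stream, an appearance stream, and a recurrent spatial memory fit together so that objects can be segmented even in frames where they are temporarily static.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU: a recurrent memory over spatial feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, dim=1)
        h_new = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_new

class TwoStreamSegmenter(nn.Module):
    """Illustrative two-stream model with a visual memory (not the exact
    architecture from the papers)."""
    def __init__(self, c=16):
        super().__init__()
        self.appearance = nn.Conv2d(3, c, 3, padding=1)  # stand-in for an appearance CNN
        self.motion = nn.Conv2d(2, c, 3, padding=1)      # stand-in for an optical-flow CNN
        self.memory = ConvGRUCell(c)
        self.head = nn.Conv2d(c, 1, 1)                   # per-pixel object/background logit

    def forward(self, frames, flows):
        # frames: (T, 3, H, W) RGB; flows: (T, 2, H, W) optical flow
        h = torch.zeros(1, self.head.in_channels, *frames.shape[-2:])
        masks = []
        for rgb, flow in zip(frames, flows):
            x = self.appearance(rgb[None]) + self.motion(flow[None])
            h = self.memory(x, h)  # the memory carries evidence across frames
            masks.append(torch.sigmoid(self.head(h)))
        return torch.stack(masks)
```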

Weakly-supervised semantic segmentation using motion cues

Semantic segmentation models require a large amount of expensive, pixel-level annotations for training. We propose to reduce the annotation burden by training the models on weakly-labeled videos, obtaining information about the precise shape of the objects from motion for free. Our model integrates motion cues into a label inference framework in a soft way, which allows the quality of the masks to improve automatically during training.
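Schematically, the soft integration can be thought of as adding a motion-based term to the per-pixel label scores, so that noisy motion estimates can be overridden by appearance instead of being imposed as hard labels. The sketch below uses our own notation and a simplified scoring rule, not the paper's exact inference framework.

```python
import numpy as np

def infer_pixel_labels(appearance_logprobs, motion_fg_prob, video_labels, w=0.5):
    """Combine per-pixel appearance scores with a soft motion prior to
    produce pseudo-labels for retraining (schematic, simplified).

    appearance_logprobs: (H, W, C) log-probabilities from the current model,
                         with class 0 as background
    motion_fg_prob:      (H, W) probability that a pixel lies on a moving object
    video_labels:        foreground class indices from the weak video-level labels
    w:                   weight of the motion cue; soft, so appearance can override it
    """
    H, W, C = appearance_logprobs.shape
    scores = appearance_logprobs.copy()
    scores[..., 1:] += w * np.log(motion_fg_prob + 1e-8)[..., None]  # foreground classes
    scores[..., 0] += w * np.log(1.0 - motion_fg_prob + 1e-8)        # background
    allowed = np.full(C, -np.inf)  # restrict to classes the weak labels allow
    allowed[0] = 0.0
    for c in video_labels:
        allowed[c] = 0.0
    return np.argmax(scores + allowed, axis=-1)
```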

Relational linear programming

We propose relational linear programming, a simple framework for combining linear programs (LPs) and logic programs. A relational linear program (RLP) is a declarative LP template defining the objective and the constraints through the logical concepts of objects, relations, and quantified variables. This allows one to express the LP objective and constraints relationally for a varying number of individuals without enumerating them.
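A toy illustration of the grounding idea, written in Python with the PuLP library rather than the actual declarative RLP language: the transportation problem below is specified once over supplier and customer relations, and the concrete LP instantiates itself for however many individuals the database happens to contain.

```python
from pulp import LpProblem, LpVariable, LpMinimize, lpSum

# A logical "database" of objects and relations, not hard-coded LP indices.
suppliers = {"s1": 30, "s2": 20}            # supply(Supplier, Amount)
customers = {"c1": 25, "c2": 25}            # demand(Customer, Amount)
cost = {("s1", "c1"): 1, ("s1", "c2"): 3,   # cost(Supplier, Customer, Value)
        ("s2", "c1"): 2, ("s2", "c2"): 1}

# The LP "template": written once over the relations and grounded
# automatically for the individuals in the database.
prob = LpProblem("transport", LpMinimize)
ship = {(s, c): LpVariable(f"ship_{s}_{c}", lowBound=0) for s, c in cost}
prob += lpSum(cost[sc] * ship[sc] for sc in cost)            # minimize total cost
for s, supply in suppliers.items():
    prob += lpSum(ship[s, c] for c in customers) <= supply   # "forall suppliers"
for c, demand in customers.items():
    prob += lpSum(ship[s, c] for s in suppliers) >= demand   # "forall customers"
prob.solve()
```

Adding a third supplier to the database changes the ground LP but not the template, which is the point of the relational formulation.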