Josef Sivic

Distinguished Researcher

Intelligent Machine Perception project

ELLIS Unit Prague

Czech Institute of Informatics, Robotics, and Cybernetics

Czech Technical University in Prague

Senior researcher (currently on leave), Willow team, INRIA Paris

Address: CIIRC CVUT, Jugoslavskych partyzanu 1580/3, 160 00 Praha 6, Czech Republic

Office: Building B, 6th floor, office B-639 (Directions)

E-mail: Josef dot Sivic at cvut dot cz


I am a distinguished researcher at the Czech Institute of Informatics, Robotics and Cybernetics at the Czech Technical University, where I lead the Intelligent Machine Perception project and the ELLIS Unit Prague. I am on leave from a senior researcher position at INRIA Paris, where I remain an external collaborator of the Willow team. I completed my PhD at the University of Oxford (2006), working with Professor Andrew Zisserman, and received my habilitation degree (2014) at Ecole Normale Supérieure in Paris. After my PhD, I was a post-doc in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, working with Prof. William Freeman. My research is in computer vision, a branch of computer science and engineering that aims to extract information from images. I am particularly interested in developing intelligent systems that can automatically understand complex visual inputs and learn with only minimal supervision. From 2014 to 2019, I led the ERC project LEAP.


News

Hiring postdocs and PhD students in Machine Learning, Vision and Robotics at the Intelligent Machine Perception group in Prague. Bc./MSc/internship topics are also available. See example topics here.

I am excited to co-chair an interesting session on learning from demonstration at IROS 2022.

Our work on unsupervised semantic segmentation of urban scenes via cross-modal distillation has been accepted to ECCV for an oral presentation.

I am excited to contribute to the European Network of Excellence in Robotics euROBIN.

Our paper on learning to answer questions from millions of narrated videos has been invited to the TPAMI Special Issue on the Best Papers of ICCV 2021.

Our work on learning from Internet videos how to manipulate tools has been published in IEEE RA-L. A pre-print is available here. The work has received the best poster prize at the 2nd International Workshop on AI for Robotics organized by Naver Labs Europe.

I gave a lecture on multi-modal learning at the EEML 2021 summer school.

Our work on single-view estimation of articulated robot pose has been accepted to CVPR 2021 for oral presentation.

ELLIS Unit Prague at CIIRC CTU has been officially inaugurated.

Our CosyPose approach wins 5 awards at the 6D object pose estimation challenge at ECCV 2020.

Our work on estimating 3D motion and forces of person-object interactions was shortlisted for Best Paper at CVPR 2019.

Our work on Neighbourhood Consensus Networks was presented as a spotlight at NeurIPS 2018. An extended version was accepted to PAMI (pdf) and a new, more efficient version (pdf) appeared at ECCV 2020.

Antoine Miech (co-advised with I. Laptev) won the Google Fellowship. An interview with Antoine is here.

Two papers on visual localization have been accepted for spotlight presentations at CVPR 2018. Pre-prints are available online here and here.

Antoine Miech (co-advised with I. Laptev) has won the Google Cloud & YouTube-8M Video Understanding Challenge. The workshop paper describing the winning entry is now online. Bravo Antoine!

Our work on weakly-supervised learning of visual relations has been accepted for oral presentation at ICCV 2017. The pre-print is online.

Our work on learning geometric matching has been accepted for spotlight presentation at CVPR 2017. The pre-print is online.

Our work on weakly supervised place recognition has been accepted for oral presentation at CVPR 2016. The pre-print is online.

Our work on learning from instruction videos has been accepted for oral presentation at CVPR 2016. The pre-print is online.

Our work on weakly supervised learning with convolutional neural networks has been accepted to CVPR 2015. The pre-print is online.

Our work on place recognition across large changes in illumination has been accepted to CVPR 2015. The pre-print is online.

Our work on aligning paintings to 3D models has been published in ACM Transactions on Graphics. The pre-print is available online.

Our work on transferring mid-level image representations using convolutional neural networks has been accepted for oral presentation at CVPR 2014. The paper is available online.

Our work on 2D-3D alignment for 3D object category recognition has been accepted for oral presentation at CVPR 2014. The paper is available online.

I have been awarded an ERC Starting Grant 2013.

IJCV special issues on Visual Media Geo-Localization and Video Recognition that I have co-edited have been published.


Publications



Teaching


Charles University, Prague, Faculty of Mathematics and Physics

  • 3D Computer Vision, Winter 2022
  • 3D Computer Vision, Winter 2021
  • 3D Computer Vision, Winter 2020
  • 3D Computer Vision, Winter 2019

(co-lecturing with V. Hlavac)

Ecole Normale Supérieure de Paris / Ecole Normale Supérieure de Cachan

  • Object Recognition and Computer Vision, Fall 2019
  • Object Recognition and Computer Vision, Fall 2018
  • Object Recognition and Computer Vision, Fall 2017
  • Object Recognition and Computer Vision, Fall 2016
  • Object Recognition and Computer Vision, Fall 2015
  • Object Recognition and Computer Vision, Fall 2014
  • Object Recognition and Computer Vision, Fall 2013
  • Object Recognition and Computer Vision, Fall 2012
  • Object Recognition and Computer Vision, Fall 2011
  • Object Recognition and Computer Vision, Fall 2010
  • Object Recognition and Computer Vision, Fall 2009
  • Object Recognition and Computer Vision, Fall 2008

(co-lecturing with J. Ponce, C. Schmid and I. Laptev)

ENS/INRIA Visual Recognition and Machine Learning Summer School, Paris, 22-26 July 2013
INRIA Visual Recognition and Machine Learning Summer School, Grenoble, 9-13 July 2012
ENS/INRIA Visual Recognition and Machine Learning Summer School, Paris, 25-29 July 2011
Computer Vision and Machine Learning Winter School, ENS Lyon, 24-28 January 2011
INRIA Visual Recognition and Machine Learning Summer School, Grenoble, 26-30 July 2010



Students and Collaborators

Post-docs:

PhD students:

  • Petr Kouba (co-advised with Jiri Sedlar, Stanislav Mazurenko and Jiri Damborsky)
  • Raman Samusevich (co-advised with Tomas Pluskal)
  • Tomas Soucek
  • Louis Montaut (co-advised with V. Petrik, J. Carpentier and I. Laptev)
  • Antonin Vobecky (co-advised with D. Hurych and P. Perez)
  • Hugo Cisneros (co-advised with T. Mikolov)
  • Katerina Zorina (co-advised with V. Petrik and T. Pajdla)
  • Yann Labbe (co-advised with I. Laptev)
  • Zongmian Li (co-advised with I. Laptev and N. Mansard)

Graduated:

M.Sc. students:

  • Jan Waltl, 2022 (co-advised with T. Soucek)
  • Lucas Disson, 2021 (co-advised with J. Sedlar and T. Pajdla)
  • Jaroslav Macke, 2021 (co-advised with Jiri Sedlar)
  • Louis Montaut, 2020 (co-advised with V. Petrik)
  • Josselin Bonnevie, 2019 (co-advised with M. Cimpoi)
  • Hugo Cisneros, 2019 (co-advised with T. Mikolov)
  • Mihai Dusmanu, 2019 (co-advised with T. Pajdla, T. Sattler, A. Torii)
  • Robin Strudel, 2018 (co-advised with I. Laptev)
  • Yann Labbe, 2018 (co-advised with I. Laptev)
  • Zongmian Li, 2017 (co-advised with I. Laptev and N. Mansard)
  • Dmitry Zhukov, 2017 (co-advised with I. Laptev and J.-P. Alayrac)
  • Ignacio Rocco, 2016 (co-advised with R. Arandjelovic)
  • Antoine Miech, 2016 (co-advised with I. Laptev and P. Bojanowski)
  • Julia Peyre, 2015 (co-advised with I. Laptev and C. Schmid)
  • Theophile Dalens, 2014 (co-advised with M. Aubry)
  • Maxime Oquab, 2013 (co-advised with L. Bottou and I. Laptev)
  • Tuan-Hung Vu, 2013 (co-advised with I. Laptev and A. Oliva)
  • Piotr Bojanowski, 2012 (co-advised with I. Laptev and C. Schmid)
  • Michael Mathieu, 2011 (co-advised with I. Laptev)
  • Guillaume Seguin, 2011 (co-advised with I. Laptev)
  • Ugo Jardonnet, 2010 (co-advised with T. Cour and I. Laptev)
  • Jose Lezama, 2010 (co-advised with I. Laptev)
  • Vincent Delaitre, 2010 (co-advised with I. Laptev)
  • Warith Harchaoui, 2009 (co-advised with I. Laptev)
  • Mathieu Prevot, 2008 (co-advised with B. Russell)
  • Oliver Whyte, 2008 (co-advised with A. Zisserman)
  • Jan Knopp, visiting student, 2008 (co-advised with T. Pajdla)

Former collaborators:


Professional activities


Selected Projects

MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

Y Labbé, L Manuelli, A Mousavian, S Tyree, S Birchfield, J Tremblay, J Carpentier, M Aubry, D Fox, J Sivic

Conference on Robot Learning (CoRL) (2022)

PDF | Project page | Code | Video

Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation

A Vobecky, D Hurych, O Siméoni, S Gidaris, A Bursuc, P Pérez, J Sivic

European Conference on Computer Vision (ECCV) (2022)

PDF | Project page | Code | Video

Look for the Change:
Learning Object States and State-Modifying Actions from Untrimmed Web Videos

T Soucek, J-B Alayrac, A Miech, I Laptev, J Sivic

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

PDF | Project page | Code | Dataset | Video

Focal Length and Object Pose Estimation via Render and Compare

G Ponimatkin, Y Labbe, B Russell, M Aubry, J Sivic

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

PDF | Project page | Code

Learning to Manipulate Tools by Aligning Simulation to Video Demonstration

K Zorina, J Carpentier, J Sivic, V Petrík

IEEE Robotics and Automation Letters (RA-L) (2021)

PDF | Project page | Code | Video | Best poster prize at the 2nd International Workshop on AI for Robotics.

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

A Yang, A Miech, J Sivic, I Laptev, C Schmid

IEEE International Conference on Computer Vision (ICCV) (2021)

PDF | Project page | Code | Demo | Invited to the TPAMI Special Issue on the Best Papers of ICCV 2021.

RoboPose: Single-view robot pose and joint angle estimation via render & compare

Y Labbé, J Carpentier, M Aubry, J Sivic

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2021)

PDF | Project page | Code

Artificial Dummies for Urban Dataset Augmentation

A Vobecký, D Hurych, M Uřičář, P Pérez, J Sivic

AAAI Conference on Artificial Intelligence (2021)

PDF | Project page | Code

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos

V Petrik, M Tapaswi, I Laptev, J Sivic

Conference on Robot Learning (CoRL) (2020)

PDF | Project page | Code | Supp | Video

CosyPose: Consistent multi-view multi-object 6D pose estimation

Y Labbé, J Carpentier, M Aubry, J Sivic

In European Conference on Computer Vision (ECCV) (2020)

PDF | Project page | Code | Supp | Video | Winner of the 6D object pose estimation challenge at ECCV'20 (5/6 awards)

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

A Miech, J-B Alayrac, L Smaira, I Laptev, J Sivic, A Zisserman

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

PDF | Project page | Code

Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video

Z Li, J Sedlar, J Carpentier, I Laptev, N Mansard, J Sivic

In CVPR 2019 | PDF | Project page | Best paper finalist

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

A Miech, D Zhukov, J-B Alayrac, M Tapaswi, I Laptev, J Sivic

In ICCV 2019 | PDF | Project page | Code | Demo

Neighbourhood Consensus Networks

I Rocco, M Cimpoi, R Arandjelovic, A Torii, T Pajdla, J Sivic

In Advances in Neural Information Processing Systems (NeurIPS) (2018)

PDF | Project page | Code

InLoc: Indoor Visual Localization with Dense Matching and View Synthesis

H. Taira, M. Okutomi, T. Sattler, M. Cimpoi, M. Pollefeys, J. Sivic, T. Pajdla, A. Torii

In CVPR 2018 | PDF | Project page | Code

Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions

T Sattler, W Maddern, C Toft, A Torii, L Hammarstrand, E Stenborg, D Safari, M Okutomi, M Pollefeys, J Sivic, F Kahl, T Pajdla

In CVPR 2018 | PDF | Project page

Weakly-supervised learning of visual relations

J. Peyre, I. Laptev, C. Schmid, and J. Sivic

In ICCV 2017 | PDF | Project page

Convolutional neural network architecture for geometric matching

I. Rocco, R. Arandjelovic and J. Sivic

In CVPR 2017 | PDF | Project page

Unsupervised learning from narrated instruction videos

J.-B. Alayrac, P. Bojanowski, N. Agrawal, I. Laptev, J. Sivic and S. Lacoste-Julien

In CVPR 2016 | PDF | Project page

NetVLAD: CNN architecture for weakly supervised place recognition

R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla and J. Sivic

In CVPR 2016 | PDF | Project page

Is object localization for free? – Weakly-supervised learning with convolutional neural networks

M. Oquab, L. Bottou, I. Laptev, J. Sivic

In CVPR 2015 | PDF | Project page

Linking Past to Present: Discovering Style in Two Centuries of Architecture

Lee, S., Maisonneuve, N., Crandall, D., Efros, A. and Sivic, J.

In ICCP 2015 | PDF | Project page

24/7 place recognition by view synthesis

Torii, A., Arandjelovic, R., Sivic, J., Pajdla, T. and Okutomi, M.

In CVPR 2015 | PDF | Project page

Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models

Aubry, M., Maturana, D., Efros, A., Russell, B. and Sivic, J.

In CVPR 2014 | PDF | Project page

Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

M. Oquab, L. Bottou, I. Laptev, J. Sivic

In CVPR 2014 | PDF | Project page

Painting-to-3D Model Alignment Via Discriminative Visual Elements

M. Aubry, B. Russell and J. Sivic

In ACM Transactions on Graphics (TOG), 2013

PDF | Project page

Pose estimation and segmentation of people in 3D movies

Alahari, K., Seguin, G., Sivic, J. and Laptev, I.

In ICCV 2013 | PDF | Project page

Finding actors and actions in movies

Bojanowski, P., Laptev, I., Ponce, J., Schmid, C. and Sivic, J.

In ICCV 2013 | PDF | Project page

Visual place recognition with repetitive structures

Torii, A., Sivic, J., Pajdla, T. and Okutomi, M.

In CVPR 2013 | PDF | Project page

Learning per-location classifiers for visual place recognition

Gronat, P., Obozinski, G., Sivic, J. and Pajdla, T.

In CVPR 2013 | PDF | Project page

Scene semantics from long-term observation of people

V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Gupta and A. Efros

In ECCV 2012 | PDF | Project page

People watching: human actions as a cue for single-view geometry

D. Fouhey, V. Delaitre, A. Gupta, A. Efros, I. Laptev and J. Sivic

In ECCV 2012 | PDF | Project page

What makes Paris look like Paris?

Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic and Alexei Efros

In SIGGRAPH 2012 | PDF | Project page

Data-driven crowd analysis in videos

Mikel Rodriguez, Josef Sivic, Ivan Laptev and Jean-Yves Audibert

In ICCV 2011 | PDF | Project page

Density-aware person detection and tracking in crowds

Mikel Rodriguez, Ivan Laptev, Josef Sivic and Jean-Yves Audibert

In ICCV 2011 | PDF | Project page

Learning person-object interactions for action recognition in still images

Vincent Delaitre, Josef Sivic and Ivan Laptev

In NIPS 2011 | PDF | Project page

Automatic alignment of paintings and photographs depicting a 3D scene

Bryan Russell, Josef Sivic, Jean Ponce and Helene Dessales

In 3rd International IEEE Workshop on 3D Representation for Recognition (2011) | PDF | Project page

Visual localization by linear combination of image descriptors

Akihiko Torii, Josef Sivic and Tomas Pajdla

In 2nd IEEE Workshop on Mobile Vision (2011) | PDF

Deblurring shaken and partially saturated images

Oliver Whyte, Josef Sivic and Andrew Zisserman

In IEEE Workshop on Color and Photometry in Computer Vision (2011) | PDF | Project page

Track to the future: Spatio-temporal video segmentation with long-range motion cues

Jose Lezama, Karteek Alahari, Josef Sivic and Ivan Laptev

In CVPR 2011 | PDF | Project page

Avoiding confusing features in place recognition

Jan Knopp, Josef Sivic, and Tomas Pajdla

In ECCV 2010 | PDF | Project page

Descriptor learning for efficient retrieval

James Philbin, Michael Isard, Josef Sivic, and Andrew Zisserman

In ECCV 2010 | PDF

Non-uniform deblurring for shaken images

Oliver Whyte, Josef Sivic, Andrew Zisserman and Jean Ponce

In CVPR 2010 | PDF | Project page

Recognizing human actions in still images: a study of bag-of-features and part-based representations

Vincent Delaitre, Ivan Laptev and Josef Sivic

In BMVC 2010 | PDF

Get out of my picture! Internet-based inpainting

Oliver Whyte, Josef Sivic and Andrew Zisserman

In BMVC 2009 | PDF | Project page

Segmenting Scenes by Matching Image Composites

Bryan Russell, Alexei A. Efros, Josef Sivic, Bill Freeman and Andrew Zisserman

In NIPS 2009 | PDF | Bibtex | Project page

Automatic annotation of human actions in video

Olivier Duchenne, Ivan Laptev, Josef Sivic, Francis Bach and Jean Ponce

In ICCV 2009 | PDF | Bibtex | AVI

"Who are you?": Learning person specific classifiers from video

Josef Sivic, Mark Everingham and Andrew Zisserman

In CVPR 2009 | PDF | Bibtex | AVI

SIFT Flow: Dense Correspondence across Different Scenes

Ce Liu, Jenny Yuen, Antonio Torralba, Josef Sivic and William T. Freeman

In ECCV 2008 | PDF | Bibtex | Project page

Creating and Exploring a Large Photorealistic Virtual Space

Josef Sivic, Biliana Kaneva, Antonio Torralba, Shai Avidan and William T. Freeman

In IEEE Workshop on Internet Vision, at CVPR’08 | PDF | Bibtex | Supplementary Video

Unsupervised Discovery of Visual Object Class Hierarchies

Josef Sivic, Bryan Russell, Andrew Zisserman, William T. Freeman and Alexei A. Efros

In CVPR 2008 | PDF | Bibtex

Geometric LDA: A Generative Model for Particular Object Discovery

James Philbin, Josef Sivic, Andrew Zisserman

In BMVC 2008 | PDF | Bibtex | Extended journal version

Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman

In CVPR 2008 | PDF | Bibtex

Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval

Ondrej Chum, James Philbin, Josef Sivic, Michael Isard, Andrew Zisserman

In ICCV 2007 | PDF | Bibtex | Project page and Demo

Object retrieval with large vocabularies and fast spatial matching

James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, Andrew Zisserman

In CVPR 2007 | PDF | Bibtex | Project page and Demo

Using Multiple Segmentations to Discover Objects and their Extent in Image Collections

Bryan C. Russell, Alexei A. Efros, Josef Sivic, William T. Freeman, Andrew Zisserman

In CVPR 2006 | PDF | Bibtex | Project page and source code

Hello! My name is... Buffy -- Automatic Naming of Characters in TV Video

Mark Everingham, Josef Sivic, Andrew Zisserman

In BMVC 2006 | PDF | Bibtex | Project page | Extended Journal Version

Winner of best industrial paper prize

Finding people in repeated shots of the same scene

Josef Sivic, C. Lawrence Zitnick, Richard Szeliski

In BMVC 2006 | PDF | Bibtex | Extended Technical Report

Person spotting: video shot retrieval for face sets

Josef Sivic, Mark Everingham, Andrew Zisserman

In CIVR 2005 | PDF | Bibtex | Project page and Demo

Discovering objects and their location in images

Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

In ICCV 2005 | PDF | Bibtex | Source Code (pLSA)

Object Level Grouping for Video Shots

Josef Sivic, Frederick Schaffalitzky, Andrew Zisserman

In ECCV 2004 | PDF | Bibtex | Extended Journal Version

Video Data Mining Using Configurations of Viewpoint Invariant Regions

Josef Sivic, Andrew Zisserman

In CVPR 2004 | PDF | Bibtex

Video Google: A Text Retrieval Approach to Object Matching in Videos

Josef Sivic, Andrew Zisserman

In ICCV 2003 | PDF | Bibtex | Project page and Demo