The best paper of the workshop has been chosen.
Developing autonomous systems that are able to assist humans in everyday tasks is one of the grand challenges in modern computer science. In order to perform tasks such as navigation, recognition, and manipulation of objects, these systems should be able to efficiently extract 3D knowledge of their environment. In this talk, I'll review current techniques for 3D scene understanding from a single image as well as from RGB-D imagery. I'll then show how natural sentential descriptions can be exploited to improve 3D visual parsing. I'll conclude by reviewing how computer graphics can help 3D scene understanding.
Raquel Urtasun is an Assistant Professor in the Department of Computer Science at the University of Toronto. From 2009 to 2014 she was an Assistant Professor at TTI-Chicago, a philanthropically endowed academic institute located on the campus of the University of Chicago. She was a visiting professor at ETH Zurich during the spring semester of 2010. Previously, she was a postdoctoral research scientist at UC Berkeley and ICSI and a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. She completed her PhD at the Computer Vision Laboratory at EPFL, Switzerland in 2006, working with Pascal Fua and David Fleet (University of Toronto). She has been an area chair of multiple machine learning and vision conferences (e.g., NIPS, UAI, ICML, CVPR, ECCV, ICCV), is on the editorial board of the International Journal of Computer Vision (IJCV), and has served on the committees of numerous international conferences.
She has also won several awards, including the Connaught New Researcher Award and the Best Paper Runner-Up award at CVPR 2013.
Her major research interests are statistical machine learning, computer vision, and robotics, with a particular focus on structured prediction and its application to autonomous driving and indoor robotics.
There has been growing interest in data-driven analysis of 3D shapes. However, such analysis faces the challenge that quality 3D shapes and quality 3D shape collections are scantly available. In this talk, I will present my work in addressing these challenges by leveraging the massive amount of available image data. First, I will introduce projective analysis for the semantic segmentation and labeling of 3D shapes, which can handle non-manifold, incomplete, or self-intersecting shapes by using labeled images. Next, I will present our work on training-image construction -- image distilling -- which automatically extracts a subset of inlier images from Internet image collections. Finally, I will discuss several remaining challenges in using Internet images for 3D shape analysis.
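For intuition, here is a minimal sketch of the projective-analysis idea: project a 3D shape into several 2D views, label each view with a 2D labeler trained on labeled images, and back-project the labels onto the shape by majority vote. The view sampling, the `label_view` stand-in, and the voting scheme are illustrative assumptions, not the speaker's implementation.

```python
# Illustrative sketch only: `label_view` is a hypothetical stand-in for any
# per-pixel 2D labeling model trained on labeled images.
import numpy as np

def orthographic_views(points, n_views=8):
    """Yield 2D projections of an (N, 3) point set from evenly spaced azimuths."""
    for k in range(n_views):
        theta = 2.0 * np.pi * k / n_views
        rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                        [np.sin(theta),  np.cos(theta), 0.0],
                        [0.0,            0.0,           1.0]])
        yield (points @ rot.T)[:, :2]   # drop depth: orthographic projection

def backproject_labels(points, label_view, n_views=8, n_classes=4):
    """Vote per-point semantic labels across all views and take the majority."""
    votes = np.zeros((len(points), n_classes), dtype=int)
    for uv in orthographic_views(points, n_views):
        labels = label_view(uv)         # hypothetical: (N, 2) coords -> (N,) labels
        votes[np.arange(len(points)), labels] += 1
    return votes.argmax(axis=1)         # majority label per point
```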
Dr. Yunhai Wang is an associate researcher at the Visual Computing Center, SIAT. He received his Ph.D. from the Graduate University of the Chinese Academy of Sciences in 2011. His research deals with the development of visualization and computer graphics techniques that help people see and understand data. He is particularly interested in developing machine learning algorithms for data visualization and shape analysis.
On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes – a kitchen, an elevator, your office – and you effortlessly recognize them and perceive their 3D structure. But this one-minute scene-understanding problem has been an open challenge in computer vision for decades. Recently, researchers have come to realize that a large amount of image data is the key to several major breakthroughs in image recognition, as exemplified by face detection and deep feature learning. However, while an image is a 2D array, the world is 3D and it is not possible to bypass 3D reasoning during scene understanding. In this talk, I will advocate the use of big 3D data in all major steps of scene understanding. I will share my experience on how to use big 3D data for bottom-up object detection, top-down context reasoning, 3D feature learning and shape representation. As examples, I will present three of our recent works to demonstrate the power of big 3D data: Sliding Shapes -- a 3D object detector trained from a large amount of depth maps rendered from CAD models, PanoContext -- a data-driven non-parametric context model for panoramic scene parsing, and 3D ShapeNets -- a Convolutional Deep Belief Network learned from CAD models on the Internet. Finally, I will discuss several remaining open challenges for big 3D data.
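For intuition, here is a minimal sketch of the volumetric input representation behind 3D ShapeNets-style models: a point sampling of a CAD model mapped to a fixed-size binary occupancy grid that a 3D convolutional network can consume. The 30^3 resolution and the normalization details are assumptions for illustration, not the speaker's code.

```python
# Illustrative sketch only: voxelize a point sampling of a CAD model into a
# binary occupancy volume (3D ShapeNets-style input; details are assumptions).
import numpy as np

def voxelize(points, grid=30):
    """Map an (N, 3) point cloud to a (grid, grid, grid) binary occupancy volume."""
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()        # uniform scale keeps aspect ratio
    idx = np.floor((points - mins) / extent * (grid - 1)).astype(int)
    vol = np.zeros((grid, grid, grid), dtype=np.uint8)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vol

# Example: random surface samples standing in for a CAD mesh.
occupancy = voxelize(np.random.rand(2048, 3))
print(occupancy.shape, int(occupancy.sum()))          # (30, 30, 30), occupied count
```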
Jianxiong Xiao is an Assistant Professor in the Department of Computer Science at Princeton University. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT). His research interests are in computer vision, with a focus on data-driven scene understanding. He has been motivated by the goal of building computer systems that automatically understand visual scenes, both inferring the semantics (e.g. SUN Database) and extracting 3D structure (e.g. Big Museum). His work received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and the Google Research Best Papers Award for 2012, and has appeared in the popular press in the United States. Jianxiong was awarded the Google U.S./Canada Fellowship in Computer Vision in 2012, the MIT CSW Best Research Award in 2011, and Google Research Awards in 2014. More information can be found at his group website: http://vision.princeton.edu.
|9:00 - 9:40||[Keynote] Jianxiong Xiao: Teaching Computers to See using Big 3D Data|
|9:45 - 10:03||Hung-Kuo Chu and Yu-Shiang Wong: "Annotating RGBD Images of Indoor Scenes"|
|10:05 - 10:23||Moos Hueting, Aron Monszpart and Nicolas Mellado: "MCGraph: Multi-Criterion Representation for Scene Understanding"|
|10:25 - 10:43||Manolis Savva, Angel X. Chang, Gilbert Bernstein, Christopher D. Manning and Pat Hanrahan: "On Being the Right Scale: Sizing Large Collections of 3D Models"|
|10:45 - 10:58||Break|
|11:00 - 11:18||Kevin Karsch and David Forsyth: "Blind Recovery of Spatially Varying Reflectance from a Single Image"|
|11:20 - 12:00||Keynote by Yunhai Wang|
|12:05 - 12:45||Keynote by Raquel Urtasun|
The importance of making computers understand the scene presented to them cannot be overstated. The ability to automatically infer the semantics and geometry of any given scene would enable a variety of applications in augmented reality, robotics, image processing, and visualization. Understandably, a large amount of research effort has been directed at this problem in the computer vision and machine learning communities, with plenty of motivation and interest from computer graphics. The availability of commodity depth sensors has led to a number of breakthroughs in this space. Much of this success can be attributed to the use of computer graphics for generating realistic sensor data. We believe the time is ripe for extending this promising approach to the more challenging problem of full scene understanding. To enable this, however, we need close collaboration between researchers from machine learning, computer vision, and computer graphics. This workshop is intended to bring researchers from these communities together.
We are soliciting original contributions which employ shape analysis and image processing for abstracting, representing, and manipulating raw depth scans of indoor environments.
Specific topics include, but are not limited to:
Manuscripts should be submitted as PDF files and should be 8-10 pages in SIGGRAPH paper format. All submissions must be prepared according to the ACM SIGGRAPH publication guidelines. Each paper will be peer-reviewed by at least two reviewers. Acceptance will be based on relevance to the workshop, novelty, and technical quality. In submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another workshop or conference during the review period. The authors of the best paper will be invited to submit an extended journal version to Computers & Graphics (Elsevier). The proceedings of each workshop will also be included in the ACM Digital Library.
All workshop research papers must be original, unpublished work, written and presented in English. At least one author of each accepted workshop paper must register for SIGGRAPH Asia 2014 (full conference pass) by 15 September 2014 and give a presentation at the workshop. To recognize the contribution of workshop paper presenters, presenters can apply for a 25% registration discount per accepted paper submission.