Imagining the Unseen: Stability-based Cuboid Arrangements for Scene Understanding

1Zhejiang University     2University College London     3Yale University     4Hangzhou Normal University


* Joint first authors

Figure 1: Starting from a heavily occluded single view RGBD image (left), we extract a coarse scene structure as an arrangement of cuboids along with their inter-cuboid relations (middle) using physical stability considerations to hypothesize the missing regions/relations. The coarse structure can then be used for scene understanding and manipulation. This result was generated in the automatic mode.


Missing data due to occlusion is a key challenge in 3D acquisition, particularly in cluttered man-made scenes. Such partial information about the scenes limits our ability to analyze and understand them. In this work we abstract such environments as collections of cuboids and hallucinate geometry in the occluded regions by globally analyzing the physical stability of the resultant arrangements of the cuboids. Our algorithm extrapolates the cuboids into the unseen regions to infer both their corresponding geometric attributes (e.g., size, orientation) and how the cuboids topologically interact with each other (e.g., touch or fixed). The resultant arrangement provides an abstraction for the underlying structure of the scene that can then be used for a range of common geometry processing tasks. We evaluate our algorithm on a large number of test scenes with varying complexity, validate the results on existing benchmark datasets, and demonstrate the use of the recovered cuboid-based structures towards object retrieval, scene completion, etc.
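As a rough illustration of the abstraction described above, a cuboid arrangement can be thought of as a list of cuboids with geometric attributes plus a graph of pairwise relations. The sketch below is only an assumption about how such a structure might be stored: the names `Cuboid`, `Relation`, and `Arrangement`, and the single-angle orientation, are hypothetical and are not taken from the released code or dataset format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Relation(Enum):
    """Inter-cuboid relation types named in the abstract."""
    TOUCH = "touch"
    FIXED = "fixed"


@dataclass
class Cuboid:
    # Hypothetical attribute layout: centre and size in metres,
    # orientation simplified to one rotation about the vertical axis.
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]
    yaw: float = 0.0


@dataclass
class Arrangement:
    cuboids: List[Cuboid] = field(default_factory=list)
    # Undirected relation graph keyed by (smaller index, larger index).
    relations: Dict[Tuple[int, int], Relation] = field(default_factory=dict)

    def add_cuboid(self, cuboid: Cuboid) -> int:
        """Store a cuboid and return its index."""
        self.cuboids.append(cuboid)
        return len(self.cuboids) - 1

    def relate(self, i: int, j: int, relation: Relation) -> None:
        """Record a relation between cuboids i and j (order-independent)."""
        self.relations[(min(i, j), max(i, j))] = relation

    def neighbours(self, i: int) -> List[int]:
        """Indices of all cuboids related to cuboid i, in ascending order."""
        return sorted(b if a == i else a
                      for (a, b) in self.relations if i in (a, b))


# Usage: a table top fixed to a leg, with a box touching the table top.
scene = Arrangement()
leg = scene.add_cuboid(Cuboid((0.0, 0.0, 0.35), (0.1, 0.1, 0.7)))
top = scene.add_cuboid(Cuboid((0.0, 0.0, 0.725), (1.0, 0.6, 0.05)))
box = scene.add_cuboid(Cuboid((0.2, 0.0, 0.85), (0.2, 0.2, 0.2)))
scene.relate(top, leg, Relation.FIXED)
scene.relate(box, top, Relation.TOUCH)
```

Keying the relation graph by an ordered index pair keeps each relation undirected and stored once, which is one simple choice among several.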


Figure 3: Based on incomplete scene observation (left), we can imagine multiple completions. Top-right shows invalid completions: an unstable stack of three cuboids, or a stable stack of three cuboids that violates the visibility constraint. Bottom-right shows valid completions: two fused cuboids and a small cuboid, or a stack of three cuboids consistent with the visibility constraint.
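The difference between a stable and an unstable stack in the figure can be illustrated with a simple static test. The sketch below is a simplification and not the paper's stability analysis: it assumes axis-aligned boxes of uniform density stacked one directly on top of the next, and it checks that, at every interface, the combined centre of mass of everything above projects into the contact region. The names `Box`, `contact_region`, and `stack_is_stable` are hypothetical, and the visibility constraint is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


@dataclass
class Box:
    cx: float  # footprint centre, x
    cy: float  # footprint centre, y
    w: float   # extent along x
    d: float   # extent along y
    h: float   # extent along z (height)

    @property
    def mass(self) -> float:
        # Assumption: uniform unit density, so mass equals volume.
        return self.w * self.d * self.h

    def footprint(self) -> Rect:
        return (self.cx - self.w / 2, self.cx + self.w / 2,
                self.cy - self.d / 2, self.cy + self.d / 2)


def contact_region(lower: Box, upper: Box) -> Optional[Rect]:
    """Overlap of the two footprints, or None if they do not overlap."""
    lx0, lx1, ly0, ly1 = lower.footprint()
    ux0, ux1, uy0, uy1 = upper.footprint()
    x0, x1 = max(lx0, ux0), min(lx1, ux1)
    y0, y1 = max(ly0, uy0), min(ly1, uy1)
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, x1, y0, y1)


def stack_is_stable(stack: List[Box]) -> bool:
    """stack[0] rests on the ground; stack[k + 1] rests on stack[k]."""
    for k in range(len(stack) - 1):
        region = contact_region(stack[k], stack[k + 1])
        if region is None:
            return False  # nothing supports the upper box
        above = stack[k + 1:]
        total = sum(b.mass for b in above)
        com_x = sum(b.mass * b.cx for b in above) / total
        com_y = sum(b.mass * b.cy for b in above) / total
        x0, x1, y0, y1 = region
        if not (x0 <= com_x <= x1 and y0 <= com_y <= y1):
            return False  # the sub-stack would topple over a contact edge
    return True


aligned = [Box(0.0, 0.0, 1, 1, 1) for _ in range(3)]
leaning = [Box(0.0, 0.0, 1, 1, 1), Box(0.4, 0.0, 1, 1, 1), Box(0.8, 0.0, 1, 1, 1)]
print(stack_is_stable(aligned), stack_is_stable(leaning))
```

In the leaning example each box overhangs the one below by 0.4, so the two upper boxes together have their centre of mass outside the contact region with the bottom box, and the stack is rejected even though each adjacent pair overlaps.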

Algorithm Pipeline

Starting from a single view RGBD image (left), we extract an initial set of cuboids. The cuboids are then extended and refined subject to physical stability considerations to produce a final set of extended cuboids along with appropriate inter-cuboid connection information to fill in the occluded regions.

Application: model retrieval and scene re-arrangement

We thank the reviewers for their comments and suggestions for improving the paper. This work was supported in part by a UCL Impact award, the ERC Starting Grant SmartGeometry (StG-2013-335373), NSFC (No. 61402402), and gifts from Adobe Research.


The dataset contains 699 NYU2 scenes with:

  • RGB-D point clouds created from the colour and depth images in the NYU2 dataset
  • initial and ground truth proxies
  • initial and ground truth relationship graphs
  • optimized proxies using our technique
  • optimized relationship graphs using our technique

The downloadable code currently contains the project used for initialization (grabCut/automatic) and visualization.

  title   = "Imagining the Unseen: Stability-based Cuboid Arrangements for Scene Understanding", 
  author  = "Tianjia Shao* and Aron Monszpart* and Youyi Zheng and Bongjin Koo and Weiwei Xu and Kun Zhou 
             and Niloy Mitra",
  year    = {2014},
  journal = {ACM SIGGRAPH Asia 2014},
  note    = {* Joint first authors}

Paper (33MB)

Paper Compressed (3.9MB)

Slides (pptx, 200MB)

Slides Compressed (pdf, 2MB)

Supplementary Material (470MB)

Supplementary Material Compressed (256MB)

Dataset-NYU (4GB) (updated: 01 Oct 2014)

Dataset-Recorded (20MB) (updated: 09 Apr 2015)

Code (1MB) (updated: 05 Dec 2014)