MCGraph: Multi-Criterion Representation for Scene Understanding

University College London
SIGGRAPH Asia 2014 Workshop on Indoor Scene Understanding: Where Graphics Meets Vision

* Joint first authors

Figure 1: Left: RGBD scene and its primitive abstraction. Right: Overview of the proposed MCGraph. It contains a hierarchical knowledge graph, whose nodes describe multi-criterion prior knowledge units (rectangular nodes). An abstraction graph is built via automated processing or manual editing; its nodes represent objects (circular nodes, coloured consistently with the scene view) and (oriented) relations (square nodes). The nodes of the abstraction graph can be connected to the knowledge graph through relation edges (dashed), which together form a relation set. Each edge characterizes, labels, or abstracts objects of the scene using prior knowledge units, possibly by specifying parameters (e.g., cuboid dimensions and positions). For the sake of clarity, only a subset of the relation set is shown.


Abstract

The field of scene understanding endeavours to extract a broad range of information from 3D scenes. Current approaches exploit one, or at most a few, different criteria (e.g., spatial, semantic, or functional information) simultaneously for analysis. We argue that to take scene understanding to the next level of performance, we need to take into account many different, and possibly previously unconsidered, types of knowledge simultaneously. A unified representation for this type of processing is as yet missing. In this work we propose MCGraph: a unified multi-criterion data representation for understanding and processing large-scale 3D scenes. Scene abstraction and prior knowledge are kept separate but highly connected. For this purpose, primitives (i.e., proxies) and their relationships (e.g., contact, support, hierarchical) are stored in an abstraction graph, while the different categories of prior knowledge necessary for processing are stored separately in a knowledge graph. These graphs complement each other bidirectionally and are processed concurrently. We illustrate our approach by expressing previous techniques using our formulation, and present promising avenues of research opened up by such a representation. We also distribute a set of MCGraph annotations for a small number of NYU2 scenes, to be used as ground-truth multi-criterion abstractions.
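To make the representation concrete, the following is a minimal sketch of the two-graph structure described above: a hierarchical knowledge graph of prior knowledge units, an abstraction graph of objects and oriented relations, and a relation set linking the two, optionally carrying parameters such as cuboid dimensions. All class and field names here are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeUnit:
    """Knowledge-graph node: one unit of prior knowledge (hypothetical schema)."""
    name: str
    criterion: str                                 # e.g. "spatial", "semantic", "functional"
    children: list = field(default_factory=list)   # hierarchy within the knowledge graph

@dataclass
class ObjectNode:
    """Abstraction-graph node for a scene primitive (proxy)."""
    obj_id: int

@dataclass
class RelationNode:
    """Abstraction-graph node for an oriented relation (e.g. contact, support)."""
    source: ObjectNode
    target: ObjectNode
    kind: str

@dataclass
class RelationEdge:
    """Links an abstraction node to a knowledge unit, optionally with parameters."""
    abstraction_node: object                       # ObjectNode or RelationNode
    knowledge_unit: KnowledgeUnit
    params: dict = field(default_factory=dict)     # e.g. cuboid dimensions

# Tiny example scene: a table supporting a mug.
cuboid  = KnowledgeUnit("cuboid", "spatial")
table   = KnowledgeUnit("table", "semantic")
support = KnowledgeUnit("support", "functional")

obj_table, obj_mug = ObjectNode(0), ObjectNode(1)
rel = RelationNode(obj_mug, obj_table, "supported-by")

# The relation set: edges that label or abstract scene nodes via knowledge units.
relation_set = [
    RelationEdge(obj_table, cuboid, {"dims": (1.2, 0.8, 0.7)}),  # spatial abstraction
    RelationEdge(obj_table, table),                              # semantic label
    RelationEdge(rel, support),                                  # functional relation
]
```

Keeping `relation_set` separate from both graphs mirrors the paper's design: the abstraction and the prior knowledge stay decoupled, while the edges allow either side to be queried or edited without modifying the other.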