In this talk, I will describe a series of past, current and future projects that aim to understand 3D environments through the perspective of a human or other embodied agent acting in the scene. By learning from observations of people acting in the real world, we can obtain an agent-centric representation of the structure and semantics of 3D environments that is useful for both analysis and synthesis tasks. First, I will demonstrate how we can use this representation to analyze 3D environments and predict how likely they are to support specific human actions. Then I will show how we can use the same representation to generate 3D environments and human poses depicting common actions. Finally, I will describe ongoing work on building an embodied simulation framework to establish a common platform for research in embodied agents acting within realistic 3D environments. This platform allows us to leverage computer graphics to generate 3D environments with controlled variation, enabling systematic learning through simulation for problems in computer vision, robotics, NLP, and AI.
Human activities invariably involve movement and interactions with other objects, animate or inanimate. Reasoning about motions raises many interesting challenges, especially when we have multiple moving objects with distinct motions, yet all synergistic towards a common high-level goal. Motion estimation, representation, and segmentation from visual data are all non-trivial problems, especially since occlusions are very common in human-object interactions involving contacts. In this talk we survey a number of recent efforts in high-level motion analysis and inference from visual data, such as RGB and RGB-D videos or point clouds over time. We start by reviewing certain recent deep neural architectures for processing point cloud data. We then look at the simple problem of inferring 3D flow in dynamic point clouds, revisiting ICP from a learning perspective. We proceed to derive descriptors for human-object interactions that aim to capture certain key aspects of the geometry and dynamics of the interaction, but without being too closely tied to any particular motion representation. We finally discuss how to infer the motion patterns of multi-step human activities in desktop or table-top settings, for example setting a table for dinner. We exhibit a recurrent neural architecture that can learn the patterns of such activities from 2D videos and generate synthetic interactions that follow both physical laws and human conventions. This machinery allows us to transport interactions spatially, to new settings, as well as temporally, to produce continuations or completions of partially observed activities. This latter functionality facilitates the creation of assistive agents that can help people by inferring intent and providing them with either informational or physical help in smart environments.
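For readers unfamiliar with ICP, the following is a minimal sketch of the classical (non-learned) algorithm that the talk takes as its starting point, not the learning-based variant it describes. It is restricted to 2D with brute-force nearest-neighbour matching for brevity; the function names `best_rigid_2d`, `transform` and `icp_2d` are illustrative assumptions, not names from the talk.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired 2D points src onto dst."""
    n = len(src)
    cs = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)
    cd = (sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n)
    dot = cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - cs[0], sy - cs[1]   # centred source point
        bx, by = dx - cd[0], dy - cd[1]   # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)        # closed-form optimal rotation in 2D
    c, s = math.cos(theta), math.sin(theta)
    tx = cd[0] - (c * cs[0] - s * cs[1])
    ty = cd[1] - (s * cs[0] + c * cs[1])
    return theta, (tx, ty)

def transform(points, theta, t):
    """Rotate points by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def icp_2d(src, dst, iterations=20):
    """Alternate nearest-neighbour matching and rigid re-fitting (classical ICP)."""
    cur = list(src)
    for _ in range(iterations):
        # Match every current point to its closest target point (brute force).
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in cur]
        theta, t = best_rigid_2d(cur, matched)
        cur = transform(cur, theta, t)
    return cur
```

Like any ICP variant, this sketch only converges to the correct alignment when the initial misalignment is small relative to the spacing of the points.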
In recent years, commodity 3D sensors have become easily and widely available. These advances in sensing technology have spawned significant interest in using captured 3D data for mapping and semantic understanding of 3D environments. In this talk, I will give an overview of our latest research in the context of 3D reconstruction of indoor environments. I will further talk about the use of 3D data in the context of modern machine learning techniques. Specifically, I will highlight the importance of training data, and how we can efficiently obtain labeled and self-supervised ground truth training datasets from captured 3D content. Finally, I will show a selection of state-of-the-art deep learning approaches, including discriminative semantic labeling of 3D scenes and generative reconstruction techniques.
4D vision is an emerging area within Computer Vision addressing the capture and analysis of real-world dynamic scenes from video. This talk will review progress in 4D vision over the past decade through to current challenges. Recent advances have enabled 4D capture of natural dynamic scenes such as people, together with the use of captured human performance for video-realistic animation. The technology is currently being deployed in entertainment content production for both film and immersive content for VR.
The over-segmentation of images into atomic regions has become a standard and powerful tool in Vision and Graphics. The very popular superpixel methods, which operate at the pixel level, cannot directly capture the geometric information disseminated in the images. In this talk, we propose an alternative representation to superpixels. By operating at the level of geometric shapes, typically line-segments, one can generate geometric partitions of images as layouts of polygons. Such layouts are compact, scalable, and come with geometric guarantees. We present two different methods to generate such geometric partitions. The first method builds a Voronoi diagram that conforms to preliminarily detected geometric shapes, whereas the second one exploits a kinetic framework to locally propagate the geometric shapes. Through some applications in urban reconstruction, we show that such partitions are particularly well suited to analysing images with strong geometric signatures, such as those of man-made objects.
The goal of procedural modeling is to generate realistic content. The realism of this content is typically assessed by qualitatively evaluating a small number of results, or, less frequently, by conducting a user study. However, there is a lack of systematic treatment and understanding of what is considered realistic, both in procedural modeling and for images in general. We conduct a user study that primarily investigates the realism of procedurally generated buildings. Specifically, we investigate the role of fine and coarse details, and investigate which other factors contribute to the perception of realism. We find that realism is carried on different scales, and identify other factors that contribute to the realism of procedural and non-procedural buildings.
Digital fabrication allows for an extremely fast transition from virtual prototypes to their physical realization. In the case of deformable objects, one would like to design these prototypes with a clear idea in mind about how they should behave once they are printed. It is not easy to predict what combination of material and geometric properties will produce a specific global deformation behavior. We seek to create tools that simplify as much as possible the way a user specifies the desired behavior and automate the rest of the design process. In this talk, we take a brief look at the diversity of recent works, identify the fundamental aspects of these methods, and present computational solutions for the design, simulation, and fabrication of two interesting kinds of deformable structures: i) We first explore Flexible Rod Meshes. These are light-weight and cost-efficient physical shapes, that can be fabricated in one piece from a single base material, and yet produce deformable objects with really complex behaviors. We present a tool that takes as input a deformable surface together with a set of poses and boundary conditions and automatically computes a rod mesh ready to be printed. ii) We then study Kirchhoff-Plateau Surfaces. These are planar networks of thin elastic rods embedded in pre-tensioned membranes that deploy into complex, three-dimensional shapes, composed of minimal surface patches. We propose a tool to interactively explore this intriguing and expressive design space, using a combination of topology and geometry editing, forward simulation, sensitivity analysis and highly efficient inverse design. In the last part of the talk, we’ll briefly take a look at some new trends and promising challenges in the field.
Physics simulations for solids and fluids are today essential for the production of realistic animations and special effects in feature films, computer games and surgical simulators. The underlying mathematical models often require handling of boundaries and geometric interfaces. Phenomena modelled using interfaces include but are not limited to collision handling, two-way coupling in solid-fluid interaction, multiphase flows, and cutting and fracture of solid objects. Despite the wide range of existing approaches, an accurate and robust treatment is still difficult. In this talk I will present recent approaches that aim towards robust and accurate treatment of interfaces and boundaries in physically based animation. In the first part I will give an overview of the research that resulted from my work as a PhD student. The main part will be devoted to a recent approach for the simulation of cutting of deformable solids. A finite element discretization will be introduced that is able to capture discontinuities in the underlying partial differential equation's solution due to physical cuts. Without requiring any topological changes in the discretization mesh, basis enrichments are employed that augment the approximation basis by discontinuous functions. One key aspect of the method is the construction of specialized quadrature rules for numerical computation of integrals over piecewise polynomial but discontinuous functions arising due to dissected finite elements. The robustness and visual realism of the method will be demonstrated on the basis of several examples and comparisons. Finally, the talk will conclude with a discussion of limitations and future work.
In the past few decades, advances in digital design tools have made it possible to design highly complex 3D shapes. However, physical realization of these shapes remains a challenging task. Recently, the emergence of affordable fabrication tools such as 3D printers and laser cutters allows us to turn a digital design into a physical object. But effective use of these tools requires the design shape to satisfy specific requirements related to the fabrication technologies, which are not considered by traditional 3D design tools. We argue that these fabrication requirements can be incorporated into the design process as geometric constraints, such that the resulting designs can be realized using specific technologies and materials. We present a few fabrication-aware design tools for different applications, from freeform architectural design to cost-effective fabrication of large objects.
In this talk I will describe ongoing efforts in video recognition at Adobe. Video presents additional challenges over recognition in still images. Example challenges include the sheer volume of data, lack of annotated data across time, and presence of action categories where motion and appearance cues are critical. I’ll describe work that addresses the issue of visual representation in video and its intersection with natural language.
Physics simulations for virtual smoke, explosions or water are by now crucial tools for special effects. Despite their widespread use, it’s still very difficult to get these simulations under control, or to make them fast enough for practical use. In this talk I will present recent research projects that aim to solve or alleviate these issues. A central part of this talk will be devoted to methods for interpolating fluid simulations. I will describe a method that uses 5D optical flow to register two space-time volumes of simulations. Once the registration is computed, in-between versions can be generated very efficiently. In contrast to previous work, this approach uses a volumetric representation, which is beneficial for smooth and robust registrations without user intervention. I will show several examples of smoke and liquid animations generated with this interpolation method, and discuss limitations of the approach. The talk will conclude with an outlook on open questions in the area.
In this talk, I will give an introduction to Integer Programming (IP) and show how we used IP in recent research projects. The projects range from problem formulations in visualization to urban modeling.
I will discuss the problem of "tactile mesh saliency", where tactile salient points are those on a virtual mesh that a human is more likely to grasp, press, or touch if the mesh were a real-world object. While the concept of visual saliency has been previously explored in the areas of mesh and image processing, tactile saliency detection has not been explored. I will describe the solution to this problem that we have developed. We collect crowdsourced data of relative rankings and develop a new formulation that combines deep learning and learning-to-rank methods to compute a tactile saliency measure. The solution is demonstrated on a variety of 3D meshes and various applications including material suggestion for rendering and fabrication. Time permitting, I will also discuss other problems that I have recently worked on that use a similar learning framework.
Many applications of surface models, such as mesh processing, simulation, and manufacturing, are sensitive to the topological properties of the models. To create a surface with the desired topology, a common strategy is to first reconstruct the surface from the input data using a topology-oblivious algorithm and then fix any topological errors in a post-process. We advocate a different strategy that reconstructs the surface with topology constraints in mind. The talk reviews several recent works in this direction that revolve around reconstructing surfaces from a network of spatial curves. We will consider a variety of topological constraints, such as manifoldness, connected components, and genus.
Interactive physics-based simulations are now capable of reproducing a growing number of motion skills, often with a focus on generating agile-and-robust locomotion. In this talk, I review recent progress in simulation-based models of human and animal motion as used for computer animation, where they seek to replace simpler kinematic models based on motion-capture. We will discuss the roles of optimization, machine learning, and simplified models in these approaches, as well as what insights might be shared between robotics and our simulation-based work in animation. A wide variety of animated results will be shown to illustrate the capabilities of current methods. I'll also identify several research directions where we still need to see significant progress.
3D printers have become popular in recent years and enable fabrication of custom objects for home users. The promise of moving creations from a virtual space into reality is truly tantalizing, and its applications go far beyond basic manufacturing and rapid prototyping. However, many obstacles remain for 3D printing to be practical and commonplace. In this talk, I will review our recent works on geometric modeling and processing for 3D printing applications.
As a complete shape description, the medial axis of a geometric shape possesses a number of favorable properties: it encodes symmetry, local thickness and structural components of the shape it represents. Hence, the medial axis has been studied extensively in shape modeling and analysis since its introduction by Blum in the 1960s. However, the practical application of the medial axis is hindered by its notorious instability and lack of compact representation; that is, a primitive medial axis without proper processing is often represented as a dense discrete mesh with many spurious branches. In this talk I shall present some recent studies on computing stable and compact representations of the medial axes of 3D shapes. Techniques from mesh simplification will be employed to compute a medial axis without spurious branches, represented by a small number of mesh vertices, while meeting a specified approximation accuracy.
The success of physics sandbox applications and physics-based puzzle games is a strong indication that casual users and hobbyists enjoy designing mechanisms, for educational or entertainment purposes. In these applications, a variety of mechanisms are designed by assembling two-dimensional shapes, creating gears, cranks, cams, and racks. We propose to start from such casual designs of mechanisms and turn them into 3D models that can be printed on widely available, inexpensive filament-based 3D printers.
This talk covers two topics from computer graphics: In the first part, it is shown how to perform exact anti-aliasing in the context of rasterization by utilizing closed-form solutions of the corresponding filter convolutions. This provides a ground truth solution to edge anti-aliasing in the context of 3D-to-2D rasterization, which is made possible by an analytic visibility method. Parallel algorithms are presented for these methods and an efficient GPGPU implementation is outlined. The second part of the talk presents a reduced-order approach to shape optimization. The task of optimizing the physical behavior of fabricable models is formulated in terms of offset surfaces. This allows the associated non-linear optimization problem to be efficiently encoded in a reduced-order basis - Manifold Harmonics in our case - which significantly reduces the computational effort to find a solution.
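To make the idea of a closed-form filter convolution concrete, here is a minimal sketch for the simplest case only: a box filter one pixel wide, for which the exact filtered value of a polygon is the area of the polygon clipped to the pixel square. It uses standard Sutherland-Hodgman clipping and the shoelace formula; it is an illustration under these assumptions and does not reproduce the talk's analytic visibility method, its higher-order filters, or its GPU implementation. All function names are hypothetical.

```python
def clip_polygon(poly, inside, intersect):
    """One Sutherland-Hodgman pass: keep the part of poly inside one boundary line."""
    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))  # edge enters the half-plane
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))      # edge leaves the half-plane
    return out

def cut_x(x):
    """Intersection of segment pq with the vertical line at x."""
    def f(p, q):
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))
    return f

def cut_y(y):
    """Intersection of segment pq with the horizontal line at y."""
    def f(p, q):
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)
    return f

def area(poly):
    """Unsigned polygon area via the shoelace formula."""
    s = 0.0
    for i, (x1, y1) in enumerate(poly):
        x0, y0 = poly[i - 1]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def pixel_coverage(poly, px, py):
    """Exact box-filter coverage of the unit pixel [px, px+1] x [py, py+1]."""
    passes = [
        (lambda p: p[0] >= px, cut_x(px)),
        (lambda p: p[0] <= px + 1, cut_x(px + 1)),
        (lambda p: p[1] >= py, cut_y(py)),
        (lambda p: p[1] <= py + 1, cut_y(py + 1)),
    ]
    for inside, intersect in passes:
        poly = clip_polygon(poly, inside, intersect)
        if not poly:
            return 0.0
    return area(poly)
```

For the triangle with vertices (0, 0), (2, 0), (0, 2), the pixel at (1, 0) is cut exactly in half by the hypotenuse, so its coverage is 0.5 rather than the 0 or 1 that point sampling at the pixel centre would give.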
The past decade in computer vision research has witnessed the re-emergence of 'deep learning' and in particular, convolutional neural network techniques, which make it possible to learn task-specific features from examples and have achieved a breakthrough in performance in a wide range of applications. However, in the geometry processing and computer graphics communities, these methods are practically unknown. One of the reasons stems from the fact that 3D shapes (typically modeled as Riemannian manifolds) are not shift-invariant spaces, hence the very notion of convolution is rather elusive. In this talk, I will show some recent works from our group trying to bridge this gap. Specifically, I will show the construction of intrinsic convolutional neural networks on meshes and point clouds, with applications such as finding dense correspondence between deformable shapes and shape retrieval.
The differential structure of surfaces captured by the Laplace-Beltrami Operator (LBO) can be used to construct a space for analyzing visual and geometric information. The decomposition of the LBO at one end, and the heat operator at the other end, provide us with efficient tools for dealing with images and shapes. Denoising, matching, segmenting, filtering, and exaggerating are just a few of the problems for which the LBO provides a convenient operating environment. We will review the optimality of a truncated basis provided by the LBO, and a selection of relevant metrics by which such optimal bases are constructed. A specific example is the scale-invariant metric for surfaces, which we argue to be a natural choice for the study of articulated shapes and forms.
Finite element simulations of deformable objects are typically based on spatial discretizations using either tetrahedral or hexahedral elements. This allows for simple and efficient computations, but in turn requires complicated remeshing in case of topological changes or adaptive simulations. In this talk I will show how the use of arbitrary polyhedral elements in FEM simulations avoids the need for remeshing and thereby simplifies adaptive refinement, interactive cutting, and fracturing of the simulation domain.
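As a small illustration of what a truncated Laplacian eigenbasis does, the sketch below uses the simplest possible stand-in for a surface: a uniformly sampled closed curve, whose graph Laplacian has the discrete Fourier modes as eigenvectors. Projecting a signal onto the lowest-frequency modes then acts as a smoothing filter. This is an assumption-laden toy; on a triangle mesh one would instead discretize the LBO (for instance with cotangent weights) and compute the eigenvectors numerically, which is not shown here. The function names are hypothetical.

```python
import math

def cycle_laplacian_apply(f):
    """Graph Laplacian of a closed polyline: (Lf)_i = 2 f_i - f_{i-1} - f_{i+1}."""
    n = len(f)
    return [2 * f[i] - f[i - 1] - f[(i + 1) % n] for i in range(n)]

def fourier_mode(n, k, kind):
    """Eigenvector of the cycle Laplacian with eigenvalue 2 - 2 cos(2 pi k / n)."""
    trig = math.cos if kind == "cos" else math.sin
    return [trig(2 * math.pi * k * i / n) for i in range(n)]

def truncated_projection(f, num_freqs):
    """Project f onto the constant mode plus the num_freqs lowest frequencies.

    Assumes num_freqs < len(f) / 2 so that all modes used are non-zero
    and mutually orthogonal.
    """
    n = len(f)
    out = [sum(f) / n] * n  # constant mode (eigenvalue 0)
    for k in range(1, num_freqs + 1):
        for kind in ("cos", "sin"):
            mode = fourier_mode(n, k, kind)
            coeff = sum(a * b for a, b in zip(f, mode)) / sum(b * b for b in mode)
            out = [o + coeff * m for o, m in zip(out, mode)]
    return out
```

Keeping only the first few modes discards high-frequency content, which is the basic mechanism behind spectral denoising and filtering.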
Physics simulations are widely recognized to be crucial tools for complex special effects in feature films. In addition real-time simulations are by now often central game-play elements in modern computer games. Despite this, we are still very far from being able to accurately simulate the complexity of nature around us, and the common numerical methods are often difficult to fine-tune and control. In this talk I will focus on fluid effects, and I will explain a different take on dealing with them in virtual environments: instead of trying to calculate everything from scratch based on a physical model, I will outline a method to capture the motion of a fluid based on a sequence of density inputs. A simulation tightly coupled with optical flow is used in an optimization step to calculate the actual flow velocities. Interestingly, an accurate flow simulation turns out to be a crucial prior to constrain the optical flow reconstruction to physical motions. I will then demonstrate how the extracted velocities can be used to re-run modified setups and generate higher-resolution versions of the original flow. The talk will be concluded by giving an outlook of requirements of the visual effects industry, open challenges for capturing flows, and areas of application outside of computer graphics.
This talk will present an overview of my recent research, which revolves around discrete and computational differential geometry with applications in architecture, computational design and manufacturing. From the mathematical perspective, we are working on extensions of classical differential geometry to data and objects which frequently arise in applications, but do not satisfy the classical differentiability assumptions. On the practical side, our work aims at geometric modeling tools which include important aspects of function and fabrication already in the design phase. This interplay of theory and applications will be illustrated by means of selected recent projects on the computational design of architectural freeform structures under manufacturing and structural constraints. In particular, we will address smooth skins from simple and repetitive elements, self-supporting structures, form-finding with polyhedral meshes, optimized support structures, shading systems and the exploration of the available design space.
Visual media surrounds us, and there is growing interest in new applications such as 3D printing and collaborative virtual worlds. As more and more people engage in producing visual content, there is a demand for interfaces that help novice users carry out creative design. Such an interface should allow people to easily and intuitively express high-level design goals, such as 'create a fast airplane' or 'create a cute toy', while allowing the final product to be customized according to each person's preferences. Current interfaces require the design goal to be reached through careful planning and execution of a series of low-level drawing and editing commands -- which requires previsualization, dexterity and time -- or serendipitously through largely unstructured exploration. The gap between how a person thinks about what she wants to create, and how she can interact with a computer to get there, is a barrier for the novice. In this talk, I will present recent work on capturing design intent in high-level, linguistic terms. For example, the designer may want to make a virtual creature more 'scary', or a web page more 'artistic'. Such requirements are natural for humans, yet cannot be directly expressed in current interfaces. Our work combines crowdsourcing, machine learning and probabilistic shape analysis to create a design interface that directly supports such expression. The approach is data-driven: large repositories of existing designs are used to learn shared structure and semantics, and repurposed for synthesizing new designs. I will conclude with a discussion of directions, opportunities and challenges for new tools for high-level design that exploit the inter-relationship of semantics, function and form to aid the creative process.
Storytelling is essential for communicating ideas. When they are well told, stories help us make sense of information, appreciate cultural or societal differences, and imagine living in entirely different worlds. Audio/visual stories in the form of radio programs, books-on-tape, podcasts, television, movies and animations, are especially powerful because they provide a rich multisensory experience. Technological advances have made it easy to capture stories using the microphones and cameras that are readily available in our mobile devices. But the raw media rarely tells a compelling story. The best storytellers carefully compose, filter, edit and highlight the raw media to produce an engaging piece. Yet, the software tools they use to create and manipulate the raw audio/video media (e.g. Pro Tools, Photoshop, Premiere, Final Cut Pro, Maya etc.) force storytellers to work at a tediously low level – selecting, filtering and layering pixels or cutting and transitioning between audio/video frames. While these tools provide flexible and precise control over the look and sound of the final result, they are notoriously difficult to learn and accessible primarily to experts. In this talk I'll present a number of recent projects that aim to significantly reduce the effort required to edit and produce high-quality audio/visual stories.
The information contained across many data sets is often highly correlated. Such connections and correlations can arise because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries or other relations and self-relations that the data sources satisfy. This is particularly true for data sets of a geometric character, such as GPS traces, images, videos, 3D scans, 3D models, etc. We argue that when extracting knowledge from the data in a given data set, we can do significantly better if we exploit the wider context provided by all the relationships between this data set and a 'society' or 'social network' of other related data sets. We discuss mathematical and algorithmic issues on how to represent and compute relationships or mappings between data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the 'wisdom of the collection' in performing operations on individual data sets or in map inference between them.
We present a generalization of Algebraic Point Set Surfaces for the analysis of point clouds in scale-space, called Growing Least Squares. We will show some work in progress using this technique to decompose and interactively edit multi-scale features on large point clouds. We will also show how algebraic sphere fitting can be used to develop enhanced shading effects in the real-time ray-tracer used in Modo.
We present recent advances in Moving Least Squares surfaces. In particular, we will show how to extend the concept of Algebraic Point Set Surfaces to point clouds with non-oriented input normals. Indeed, when line-of-sight information is lacking, computing a consistent orientation is as difficult as the surface reconstruction problem itself. We will also show applications of this new technique to image abstraction and stylization.
Polygon surface meshes are preferred over triangle meshes in a number of applications related to geometric modeling and reverse engineering. Among those, anisotropic meshes are preferred over isotropic ones when seeking faithful surface approximation for a low number of elements. In this talk I will present an approach for anisotropic polygonal surface remeshing. Our algorithm takes as input a surface triangle mesh. An anisotropic rectangular metric is derived from a user-specified normal-based tolerance error and the requirement to favor rectangle-shaped polygons. Our algorithm uses a greedy refinement and relaxation procedure that adds, deletes and relocates generators so as to match two criteria related to partitioning and conformity. I will discuss several directions to generalize this metric and to consolidate it in order to optimize the complexity / distortion trade-off for effective shape approximation.
This is joint work with Bertrand Pellenard and Jean-Marie Morvan.
We present a method for reconstructing surfaces from point sets. The main novelty lies in a structure-preserving approach where the input point set is first consolidated by structuring and resampling the planar components, before reconstructing the surface from both the consolidated components and the unstructured points. The final surface is obtained through solving a graph-cut problem formulated on the 3D Delaunay triangulation of the structured point set where the tetrahedra are labeled as inside or outside cells. Structuring facilitates the surface reconstruction as the point set is substantially reduced and the points are enriched with structural meaning related to adjacency between primitives. Our approach departs from the common dichotomy between smooth/piecewise-smooth and primitive-based representations by combining canonical parts from detected primitives and free-form parts of the inferred shape. Our experiments on a variety of inputs illustrate the potential of our approach in terms of robustness, flexibility and efficiency.
This is joint work with Pierre Alliez.
Synthesizing structured content from example is challenging, since blindly introducing randomness in the process may break precise alignments between features. In this talk I will describe two techniques for synthesizing new structured images from example, which do not require a high-level description of the content. The first approach targets synthesis of textures used in architectural scenes, such as facades, control panels, doors and windows. The second approach targets structured patterns synthesized along curves. Our approaches provide convincing results at little cost, and allow for compact storage of the results. This is especially important for applications with downloadable content.
3D scanning technology has developed rapidly: shapes with rich details can now be captured well, and the resulting point clouds need to be converted to mesh models. Previous work has mainly focused on high-quality input and small-scale data, e.g., the Bunny and Dragon models. The recent trend in point cloud processing is towards large-scale data, e.g., façades and factories, and towards processing lower-quality data from consumer-level devices, e.g., Microsoft Kinect. This talk will introduce two works on recovering high-quality structure information from low-quality scan acquisitions: adaptive partitioning of urban facades and structure recovery by part assembly.