Computer vision has made impressive gains through the use of deep learning models, trained with large-scale labeled data. However, labels require expertise and curation and are expensive to collect. Can one discover useful visual representations without the use of explicitly curated labels? In this talk, I will present several case studies exploring the paradigm of self-supervised learning -- using raw data as its own supervision. Several ways of defining objective functions in high-dimensional spaces will be discussed, including the use of Generative Adversarial Networks (GANs) to learn the objective function directly from the data. Applications of self-supervised learning will be presented, including colorization, on/off-screen source separation, paired and unpaired image-to-image translation (aka pix2pix and CycleGAN), and curiosity-based exploration.
My research topic is learning invariant representations. Simply put: whereas most of deep learning is concerned with finding the important information in an input, I focus on ignoring harmful or irrelevant parts of information. This can be important to counteract biases or to better leverage structure in the data. In this talk, I will cover two ideas. (1) Creating invariances with respect to a specific feature using adversarial training (arxiv.org/pdf/1806.05502). (2) Leveraging permutation invariant neural network architectures for addressing set-based problems (arxiv.org/abs/1901.09006 & arxiv.org/pdf/1907.12887). The application domains I am embedding this research in are intuitive physics as well as pedestrian tracking and trajectory prediction with a specific focus on relational reasoning.
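The idea of a permutation-invariant architecture in (2) can be illustrated with a minimal sketch. It assumes one common construction, sum pooling of per-element features followed by a readout, which is not necessarily the exact architecture of the cited papers; the names phi, rho, and set_function and the fixed feature map are hypothetical.

```python
from itertools import permutations


def phi(x):
    # Per-element feature map (fixed here for illustration; learned in practice).
    return (x, x * x)


def rho(pooled):
    # Readout applied to the pooled set feature (also learned in practice).
    total, total_sq = pooled
    return 0.5 * total + 0.25 * total_sq


def set_function(elements):
    # Summing the per-element features discards the element order,
    # so the output depends only on the set's contents.
    feats = [phi(x) for x in elements]
    pooled = tuple(sum(f[i] for f in feats) for i in range(2))
    return rho(pooled)


values = [1.0, 2.0, 3.0]
outputs = {round(set_function(list(p)), 9) for p in permutations(values)}
print(len(outputs) == 1)  # every ordering gives the same output
```

Because the sum commutes, any reordering of the input set yields the same pooled feature, which is what makes the overall function invariant to permutations.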
In this talk, I will introduce Normalized Object Coordinate Space (NOCS) to capture category-level information about object properties such as 6D pose and shape. NOCS is a canonical space that normalizes for position, orientation, and size of instances in a certain category. It can be used to represent intra-category variation of specific object properties. I will then describe a new representation called NOCS map that can be used to learn to predict 6D pose or 3D shape from a single RGB image. NOCS maps have several advantages compared to representations such as voxel grids or point clouds. For instance, they jointly encode shape and pose, have direct 2D-3D correspondences, and allow us to use well-studied 2D CNN machinery. I will show results from both 6D pose estimation and 3D reconstruction. Finally, I will discuss opportunities to extend NOCS maps to different object perception tasks.
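As a rough illustration of normalizing for position and size, the sketch below maps a point set into a unit cube. It assumes the object is already in its canonical orientation, that the bounding box is centered at (0.5, 0.5, 0.5), and that the box diagonal is scaled to length 1; the function name normalize_to_unit_cube is hypothetical and this is not presented as the talk's implementation.

```python
import math


def normalize_to_unit_cube(points):
    # Assumes the points are already canonically oriented and not all identical.
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    diagonal = math.dist(mins, maxs)  # length of the tight bounding-box diagonal
    # Remove the position (center), remove the size (diagonal), then shift
    # so that the bounding box is centered inside the unit cube.
    return [
        tuple((p[i] - center[i]) / diagonal + 0.5 for i in range(3))
        for p in points
    ]


box = [(0.0, 0.0, 0.0), (2.0, 4.0, 4.0), (1.0, 2.0, 3.0)]
print(normalize_to_unit_cube(box))
```

Two instances that differ only by translation and uniform scale map to the same normalized coordinates, which is the sense in which such a space "normalizes for" position and size.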
Deep knowledge of the world is necessary if we are to augment real scenes with virtual entities, or to have autonomous and intelligent agents and artifacts that can assist us in everyday activities -- or even carry out tasks entirely independently. One way to factorize the complexity of the world is to associate information and knowledge with stable entities, animate or inanimate, such as a person or a vehicle, etc -- things we can generically call objects. In this talk I'll survey a number of recent efforts whose aim is to create and annotate reference representations for objects based on 3D models, with the aim of delivering information to new observations, as needed. The information may relate to object geometry, appearance, articulation, materials, physical properties, affordances, or functionality. One challenge of the 3D world is that 3D data typically come as point clouds or meshes, which do not have the regular grid structure of image or video data. This makes it challenging to apply the highly successful convolutional deep architectures (CNNs) to 3D data, as CNNs heavily depend on neighborhood regularity for weight sharing and other optimizations. The talk will illustrate deep architectures capable of processing irregular 3D geometry for tasks such as object extraction from scenes, geometric primitive fitting, inferring object function from observations, and learning to differentiate objects through language. Tools towards these goals include canonical spaces for objects and representations of their compositional structure, as well as multi-objective training and learned communication patterns in architectures.
I will discuss whether computers, using Artificial Intelligence (AI), could create art. I will cover the history of automation in art, examining the hype and reality of AI tools for art together with predictions about how they will be used. I will also discuss different scenarios for how an algorithm could be considered the author of an artwork, which comes down to questions of why we create and appreciate artwork.
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called perceptual losses? What elements are critical for their success? We introduce a new dataset of human perceptual similarity judgments and systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. Our results suggest that perceptual similarity is an emergent property shared across deep visual representations. Despite their strong transfer performance, deep convolutional representations surprisingly lack a basic low-level property -- shift-invariance, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the Nyquist sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components. We observe increased accuracy in ImageNet classification, across several commonly-used architectures. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks.
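The effect of low-pass filtering before downsampling can be seen in a small 1D sketch. The binomial kernel (0.25, 0.5, 0.25), the circular padding, and the function names are illustrative assumptions, not details taken from the talk.

```python
def downsample_naive(signal, stride=2):
    # Plain subsampling: keeps every stride-th sample with no filtering.
    return signal[::stride]


def blur(signal, kernel=(0.25, 0.5, 0.25)):
    # Low-pass filter with circular padding (chosen to keep the demo simple).
    n = len(signal)
    radius = len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            acc += weight * signal[(i + k - radius) % n]
        out.append(acc)
    return out


def downsample_antialiased(signal, stride=2):
    # Blur first, then subsample.
    return blur(signal)[::stride]


signal = [0.0, 1.0] * 4   # highest-frequency pattern the grid can hold
shifted = [1.0, 0.0] * 4  # the same pattern shifted by one sample

print(downsample_naive(signal), downsample_naive(shifted))
print(downsample_antialiased(signal), downsample_antialiased(shifted))
```

With naive subsampling, a one-sample shift flips the output from all zeros to all ones; with the blur applied first, both inputs map to the same constant output of 0.5.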
In my talk, I will describe recent work on learning to represent and generate 3D shapes. I will start by describing AtlasNet, an approach that represents a 3D shape as a collection of parametric surface elements and naturally infers a surface representation of the shape. I’ll then describe an extension, 3D-CODED, for matching deformable shapes to obtain 3D correspondences. Finally, I’ll describe an approach for representing shapes as the deformation and combination of learnable elementary 3D structures.
We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. Most generative models rely on 2D kernels to generate images and make few assumptions about the 3D world. These models therefore tend to create blurry images or artefacts in tasks that require a strong 3D understanding, such as novel-view synthesis. HoloGAN instead learns a 3D representation of the world, and learns to render this representation in a realistic manner. Unlike other GANs, HoloGAN provides explicit control over the pose of generated objects through rigid-body transformations of the learnt 3D features. Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with similar or higher visual quality than other generative models. HoloGAN can be trained end-to-end from unlabelled 2D images only. In particular, we do not require pose labels, 3D shapes, or multiple views of the same objects. This shows that HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner.
I will present two data-driven frameworks based on neural networks for interactive character control. The first approach is called a Phase-Functioned Neural Network (PFNN). In this network structure, the weights are computed via a cyclic function which uses the phase as an input. Along with the phase, our system takes as input user controls, the previous state of the character, and the geometry of the scene, and automatically produces high quality motions that achieve the desired user control. The entire network is trained in an end-to-end fashion on a large dataset composed of locomotion such as walking, running, jumping, and climbing movements fitted into virtual environments. Our system can therefore automatically produce motions where the character adapts to different geometric environments such as walking and running over rough terrain, climbing over large rocks, jumping over obstacles, and crouching under low ceilings. Our network architecture produces higher quality results than time-series autoregressive models such as LSTMs as it deals explicitly with the latent variable of motion relating to the phase. Once trained, our system is also extremely fast and compact, requiring only milliseconds of execution time and a few megabytes of memory, even when trained on gigabytes of motion data. Our work is most appropriate for controlling characters in interactive scenes such as computer games and virtual reality systems. The second approach is called Mode-Adaptive Neural Networks. This is an extension of the PFNN and has the capability to control quadruped characters, where the locomotion is multimodal. The system is composed of the motion prediction network and the gating network. At each frame, the motion prediction network computes the character state in the current frame given the state in the previous frame and the user-provided control signals.
The gating network dynamically updates the weights of the motion prediction network by selecting and blending what we call the expert weights, each of which specializes in a particular movement. Due to the increased flexibility, the system can learn consistent expert weights across a wide range of non-periodic/periodic actions, from unstructured motion capture data, in an end-to-end fashion. In addition, the users are released from performing complex labelling of phases in different gaits. We show that this architecture is suitable for encoding the multi-modality of quadruped locomotion and synthesizing responsive motion in real-time.
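The blending of expert weights by a gating network can be sketched as follows. This toy version assumes the gating network emits one score per expert and that the scores are turned into blending coefficients with a softmax; both are assumptions made for illustration, and the names softmax, blend_expert_weights, and linear are hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax: non-negative coefficients that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_expert_weights(expert_weights, gate_scores):
    # Convex combination of the expert weight matrices, entry by entry.
    alphas = softmax(gate_scores)
    rows, cols = len(expert_weights[0]), len(expert_weights[0][0])
    return [
        [sum(a * w[r][c] for a, w in zip(alphas, expert_weights))
         for c in range(cols)]
        for r in range(rows)
    ]


def linear(weights, x):
    # One layer of the prediction network using the blended weights.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0 (placeholder values)
    [[3.0, 2.0], [2.0, 3.0]],  # expert 1 (placeholder values)
]
blended = blend_expert_weights(experts, gate_scores=[0.0, 0.0])
print(linear(blended, [1.0, 1.0]))
```

Equal gate scores average the experts, while a strongly dominant score effectively selects a single expert; in between, the layer interpolates smoothly between movement-specific weights.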
Humans involuntarily move their eyes when retrieving an image from memory. This motion is often similar to actually observing the image. We suggest exploiting this behavior as a new modality in human computer interaction, using the motion of the eyes as a descriptor of the image. Interaction requires the user's eyes to be tracked but no voluntary physical activity. We perform a controlled experiment and develop matching techniques using machine learning to investigate if images can be discriminated based on the gaze patterns recorded while users merely think about an image. Our results indicate that image retrieval is possible with an accuracy significantly above chance. We also show that this result generalizes to images not used during training of the classifier and extends to uncontrolled settings in a realistic scenario.
The traditional computer graphics rendering pipeline is designed for procedurally generating high-quality 2D images from 3D shapes with high performance. The non-differentiability due to discrete operations such as visibility computation makes it hard to explicitly correlate rendering parameters and the resulting image, posing a significant challenge for inverse rendering tasks. Recent work on differentiable rendering achieves differentiability either by designing surrogate gradients for non-differentiable operations or via an approximate but differentiable renderer. These methods, however, are still limited when it comes to handling occlusion, and restricted to particular rendering effects. We present RenderNet, a differentiable rendering convolutional network with a novel projection unit that can render 2D images from 3D shapes. Spatial occlusion and shading calculation are automatically encoded in the network. Our experiments show that RenderNet can successfully learn to implement different shaders, and can be used in inverse rendering tasks to estimate shape, pose, lighting and texture from a single image.
In this talk I will focus on the possibilities that arise from recent advances in the area of deep learning for accelerating and improving physics simulations. I will focus on fluids, which encompass a large class of materials we encounter in our everyday lives. Fluids are not only ubiquitous; their underlying physical model, the Navier-Stokes equations, is also a challenging, non-linear advection-diffusion PDE that poses interesting problems for deep learning methods. I will explain and discuss several research projects from our lab that focus on temporal predictions of physical functions, temporally coherent adversarial training, and predictions of steady-state turbulence solutions. Among other things, it turns out to be useful to make the learning process aware of the underlying physical principles. Here, especially the transport component of the Navier-Stokes equations plays a crucial role. I will also give an outlook on open challenges in the area of deep learning for physical problems. Most importantly, trained models could serve as priors for a variety of inverse and control problems.
In this talk, I will describe a series of past, current and future projects that aim to understand 3D environments through the perspective of a human or other embodied agent acting in the scene. By learning from observations of people acting in the real world, we can obtain an agent-centric representation of the structure and semantics of 3D environments that is useful for both analysis and synthesis tasks. First, I will demonstrate how we can use this representation to analyze 3D environments and predict how likely they are to support specific human actions. Then I will show how we can use the same representation to generate 3D environments and human poses depicting common actions. Finally, I will describe ongoing work on building an embodied simulation framework to establish a common platform for research in embodied agents acting within realistic 3D environments. This platform allows us to leverage computer graphics to generate 3D environments with controlled variation, enabling systematic learning through simulation for problems in computer vision, robotics, NLP, and AI.
Human activities invariably involve movement and interactions with other objects, animate or inanimate. Reasoning about motions raises many interesting challenges, especially when we have multiple moving objects with distinct motions, yet all synergistic towards a common high-level goal. Motion estimation, representation, and segmentation from visual data are all non-trivial problems, especially since occlusions are very common in human-object interactions involving contacts. In this talk we survey a number of recent efforts in high-level motion analysis and inference from visual data, such as RGB and RGB-D videos or point clouds over time. We start by reviewing certain recent deep neural architectures for processing point cloud data. We then look at the simple problem of inferring 3D flow in dynamic point clouds, revisiting ICP from a learning perspective. We proceed to derive descriptors for human-object interactions that aim to capture certain key aspects of the geometry and dynamics of the interaction, but without being too closely tied to any particular motion representation. We finally discuss how to infer the motion patterns of multi-step human activities in desktop or table-top settings, such as setting a table for dinner. We exhibit a recurrent neural architecture that can learn from 2D videos the patterns of such activities and generate synthetic interactions that follow both physical laws and human conventions. This machinery allows us to transport interactions spatially, to new settings, as well as transport interactions temporally, to produce continuations or completions of partially observed activities. This latter functionality facilitates the creation of assistive agents that can help people by inferring intent and providing them with either informational or physical help in smart environments.
In recent years, commodity 3D sensors have become easily and widely available. These advances in sensing technology have spawned significant interest in using captured 3D data for mapping and semantic understanding of 3D environments. In this talk, I will give an overview of our latest research in the context of 3D reconstruction of indoor environments. I will further talk about the use of 3D data in the context of modern machine learning techniques. Specifically, I will highlight the importance of training data, and how we can efficiently obtain labeled and self-supervised ground truth training datasets from captured 3D content. Finally, I will show a selection of state-of-the-art deep learning approaches, including discriminative semantic labeling of 3D scenes and generative reconstruction techniques.
4D vision is an emerging area within Computer Vision addressing the capture and analysis of real-world dynamic scenes from video. This talk will review progress in 4D vision over the past decade through to current challenges. Recent advances have enabled 4D capture of natural dynamic scenes such as people, together with the use of captured human performance for video-realistic animation. The technology is currently being deployed in entertainment content production for both film and immersive content for VR.
The over-segmentation of images into atomic regions has become a standard and powerful tool in Vision and Graphics. The very popular superpixel methods, which operate at the pixel level, cannot directly capture the geometric information disseminated in the images. In this talk, we propose an alternative representation to superpixels. By operating at the level of geometric shapes, typically line-segments, one can generate geometric partitions of images as layouts of polygons. Such layouts are compact, scalable, and come with geometric guarantees. We present two different methods to generate such geometric partitions. The first method builds a Voronoi diagram that conforms to preliminarily detected geometric shapes, whereas the second one exploits a kinetic framework to locally propagate the geometric shapes. Through some applications in urban reconstruction, we show that such partitions are particularly well suited to analysing images with strong geometric signatures, such as those of man-made objects.
The goal of procedural modeling is to generate realistic content. The realism of this content is typically assessed by qualitatively evaluating a small number of results, or, less frequently, by conducting a user study. However, there is a lack of systematic treatment and understanding of what is considered realistic, both in procedural modeling and for images in general. We conduct a user study that primarily investigates the realism of procedurally generated buildings. Specifically, we investigate the role of fine and coarse details, and investigate which other factors contribute to the perception of realism. We find that realism is carried on different scales, and identify other factors that contribute to the realism of procedural and non-procedural buildings.
Digital fabrication allows for an extremely fast transition from virtual prototypes to their physical realization. In the case of deformable objects, one would like to design these prototypes with a clear idea in mind about how they should behave once they are printed. It is not easy to predict what combination of material and geometric properties will produce a specific global deformation behavior. We seek to create tools that simplify as much as possible the way a user specifies the desired behavior and automate the rest of the design process. In this talk, we take a brief look at the diversity of recent works, identify the fundamental aspects of these methods, and present computational solutions for the design, simulation, and fabrication of two interesting kinds of deformable structures: i) We first explore Flexible Rod Meshes. These are light-weight and cost-efficient physical shapes, that can be fabricated in one piece from a single base material, and yet produce deformable objects with really complex behaviors. We present a tool that takes as input a deformable surface together with a set of poses and boundary conditions and automatically computes a rod mesh ready to be printed. ii) We then study Kirchhoff-Plateau Surfaces. These are planar networks of thin elastic rods embedded in pre-tensioned membranes that deploy into complex, three-dimensional shapes, composed of minimal surface patches. We propose a tool to interactively explore this intriguing and expressive design space, using a combination of topology and geometry editing, forward simulation, sensitivity analysis and highly efficient inverse design. In the last part of the talk, we’ll briefly take a look at some new trends and promising challenges in the field.
Physics simulations for solids and fluids are today essential for the production of realistic animations and special effects in feature films, computer games and surgical simulators. The underlying mathematical models often require handling of boundaries and geometric interfaces. Phenomena modelled using interfaces include but are not limited to collision handling, two-way coupling in solid-fluid interaction, multiphase flows, and cutting and fracture of solid objects. Despite the wide range of existing approaches, an accurate and robust treatment is still difficult. In this talk I will present recent approaches that aim towards robust and accurate treatment of interfaces and boundaries in physically based animation. In the first part I will give an overview of the research that resulted from my work as a PhD student. The main part will be devoted to a recent approach for the simulation of cutting of deformable solids. A finite element discretization will be introduced that is able to capture discontinuities in the underlying partial differential equation's solution due to physical cuts. Without requiring any topological changes in the discretization mesh, basis enrichments are employed that augment the approximation basis by discontinuous functions. One key aspect of the method is the construction of specialized quadrature rules for numerical computation of integrals over piecewise polynomial but discontinuous functions arising due to dissected finite elements. On the basis of several examples and comparisons the robustness and visual realism of the method will be demonstrated. Finally, the talk will be concluded by a discussion of limitations and future work.
In the past few decades, advances in digital design tools have made it possible to design highly complex 3D shapes. However, physical realization of these shapes remains a challenging task. Recently, the emergence of affordable fabrication tools such as 3D printers and laser cutters allows us to turn a digital design into a physical object. But effective use of these tools requires the design shape to satisfy specific requirements related to the fabrication technologies, which are not considered by traditional 3D design tools. We argue that these fabrication requirements can be incorporated into the design process as geometric constraints, such that the resulting designs can be realized using specific technologies and materials. We present a few fabrication-aware design tools for different applications, from freeform architectural design to cost-effective fabrication of large objects.
In this talk I will describe ongoing efforts in video recognition at Adobe. Video presents additional challenges over recognition in still images. Example challenges include the sheer volume of data, lack of annotated data across time, and presence of action categories where motion and appearance cues are critical. I’ll describe work that addresses the issue of visual representation in video and its intersection with natural language.
Physics simulations for virtual smoke, explosions or water are by now crucial tools for special effects. Despite their widespread use, it’s still very difficult to get these simulations under control, or to make them fast enough for practical use. In this talk I will present recent research projects that aim to solve or alleviate these issues. A central part of this talk will be devoted to methods for interpolating fluid simulations. I will describe a method that uses 5D optical flow to register two space-time volumes of simulations. Once the registration is computed, in-between versions can be generated very efficiently. In contrast to previous work, this approach uses a volumetric representation, which is beneficial for smooth and robust registrations without user intervention. I will show several examples of smoke and liquid animations generated with this interpolation method, and discuss limitations of the approach. The talk will be concluded by giving an outlook on open questions in the area.
In this talk, I will give an introduction to Integer Programming (IP) and show how we used IP in recent research projects. The projects range from problem formulations in visualization to urban modeling.
I will discuss the problem of "tactile mesh saliency", where tactile salient points are those on a virtual mesh that a human is more likely to grasp, press, or touch if the mesh were a real-world object. While the concept of visual saliency has been previously explored in the areas of mesh and image processing, tactile saliency detection has not been explored. I will describe the solution to this problem that we have developed. We collect crowdsourced data of relative rankings and develop a new formulation to combine deep learning and learning-to-rank methods to compute a tactile saliency measure. The solution is demonstrated on a variety of 3D meshes and various applications including material suggestion for rendering and fabrication. Time permitting, I will also discuss other problems that I have recently worked on that use a similar learning framework.
Many applications of surface models, such as mesh processing, simulation, and manufacturing, are sensitive to the topological properties of the models. To create a surface with the desirable topology, a common strategy is to first reconstruct the surface from the input data using a topology-oblivious algorithm and then fix any topological errors in a post-process. We advocate a different strategy that reconstructs the surface with topology constraints in mind. The talk reviews several recent works in this direction that revolve around reconstructing surfaces from a network of spatial curves. We will consider a variety of topological constraints, such as manifoldness, connected components, and genus.
Interactive physics-based simulations are now capable of reproducing a growing number of motion skills, often with a focus on generating agile-and-robust locomotion. In this talk, I review recent progress in simulation-based models of human and animal motion as used for computer animation, where they seek to replace simpler kinematic models based on motion-capture. We will discuss the roles of optimization, machine learning, and simplified models in these approaches, as well as what insights might be shared between robotics and our simulation-based work in animation. A wide variety of animated results will be shown to illustrate the capabilities of current methods. I'll also identify several research directions where we still need to see significant progress.
3D printers have become popular in recent years and enable fabrication of custom objects for home users. The promise of moving creations from a virtual space into reality is truly tantalizing, and its applications go far beyond basic manufacturing and rapid prototyping. However, many obstacles remain for 3D printing to be practical and commonplace. In this talk, I will review our recent works on geometric modeling and processing for 3D printing applications.
As a complete shape description, the medial axis of a geometric shape possesses a number of favorable properties--it encodes symmetry, local thickness and structural components of the shape it represents. Hence, the medial axis has been studied extensively in shape modeling and analysis since its introduction by Blum in the 1960s. However, the practical application of the medial axis is hindered by its notorious instability and lack of compact representation; that is, a primitive medial axis without proper processing is often represented as a dense discrete mesh with many spurious branches. In this talk I shall present some recent studies on computing stable and compact representations of the medial axes of 3D shapes. Techniques from mesh simplification will be employed to compute a medial axis without spurious branches and represented by a small number of mesh vertices, while meeting specified approximation accuracy.
The success of physics sandbox applications and physics-based puzzle games is a strong indication that casual users and hobbyists enjoy designing mechanisms, for educational or entertainment purposes. In these applications, a variety of mechanisms are designed by assembling two-dimensional shapes, creating gears, cranks, cams, and racks. We propose to start from such casual designs of mechanisms and turn them into 3D models that can be printed on widely available, inexpensive filament-based 3D printers.
This talk covers two topics from computer graphics: In the first part, it is shown how to perform exact anti-aliasing in the context of rasterization by utilizing closed-form solutions of the corresponding filter convolutions. This provides a ground truth solution to edge anti-aliasing in the context of 3D-to-2D rasterization, which is made possible by an analytic visibility method. Parallel algorithms are presented for these methods and an efficient GPGPU implementation is outlined. The second part of the talk presents a reduced-order approach to shape optimization. The task of optimizing the physical behavior of fabricable models is formulated in terms of offset surfaces. This allows the associated non-linear optimization problem to be efficiently encoded in a reduced-order basis - Manifold Harmonics in our case - which significantly reduces the computational effort to find a solution.
The past decade in computer vision research has witnessed the re-emergence of 'deep learning' and in particular, convolutional neural network techniques, which make it possible to learn task-specific features from examples and have achieved a breakthrough in performance in a wide range of applications. However, in the geometry processing and computer graphics communities, these methods are practically unknown. One of the reasons stems from the fact that 3D shapes (typically modeled as Riemannian manifolds) are not shift-invariant spaces, hence the very notion of convolution is rather elusive. In this talk, I will show some recent works from our group trying to bridge this gap. Specifically, I will show the construction of intrinsic convolutional neural networks on meshes and point clouds, with applications such as finding dense correspondence between deformable shapes and shape retrieval.
The differential structure of surfaces captured by the Laplace-Beltrami Operator (LBO) can be used to construct a space for analyzing visual and geometric information. The decomposition of the LBO at one end, and the heat operator at the other end provide us with efficient tools for dealing with images and shapes. Denoising, matching, segmenting, filtering, and exaggerating are just a few of the problems for which the LBO provides a convenient operating environment. We will review the optimality of a truncated basis provided by the LBO, and a selection of relevant metrics by which such optimal bases are constructed. A specific example is the scale invariant metric for surfaces, which we argue to be a natural choice for the study of articulated shapes and forms.
Finite element simulations of deformable objects are typically based on spatial discretizations using either tetrahedral or hexahedral elements. This allows for simple and efficient computations, but in turn requires complicated remeshing in case of topological changes or adaptive simulations. In this talk I will show how the use of arbitrary polyhedral elements in FEM simulations avoids the need for remeshing and thereby simplifies adaptive refinement, interactive cutting, and fracturing of the simulation domain.
Physics simulations are widely recognized to be crucial tools for complex special effects in feature films. In addition, real-time simulations are by now often central game-play elements in modern computer games. Despite this, we are still very far from being able to accurately simulate the complexity of nature around us, and the common numerical methods are often difficult to fine-tune and control. In this talk I will focus on fluid effects, and I will explain a different take on dealing with them in virtual environments: instead of trying to calculate everything from scratch based on a physical model, I will outline a method to capture the motion of a fluid based on a sequence of density inputs. A simulation tightly coupled with optical flow is used in an optimization step to calculate the actual flow velocities. Interestingly, an accurate flow simulation turns out to be a crucial prior to constrain the optical flow reconstruction to physical motions. I will then demonstrate how the extracted velocities can be used to re-run modified setups and generate higher-resolution versions of the original flow. The talk will conclude with an outlook on the requirements of the visual effects industry, open challenges for capturing flows, and areas of application outside of computer graphics.
This talk will present an overview of my recent research, which revolves around discrete and computational differential geometry with applications in architecture, computational design and manufacturing. From the mathematical perspective, we are working on extensions of classical differential geometry to data and objects which frequently arise in applications, but do not satisfy the classical differentiability assumptions. On the practical side, our work aims at geometric modeling tools which include important aspects of function and fabrication already in the design phase. This interplay of theory and applications will be illustrated by means of selected recent projects on the computational design of architectural freeform structures under manufacturing and structural constraints. In particular, we will address smooth skins from simple and repetitive elements, self-supporting structures, form-finding with polyhedral meshes, optimized support structures, shading systems and the exploration of the available design space.
Visual media surrounds us, and there is growing interest in new applications such as 3D printing and collaborative virtual worlds. As more and more people engage in producing visual content, there is a demand for interfaces that help novice users carry out creative design. Such an interface should allow people to easily and intuitively express high-level design goals, such as 'create a fast airplane' or 'create a cute toy', while allowing the final product to be customized according to each person's preferences. Current interfaces require the design goal to be reached through careful planning and execution of a series of low-level drawing and editing commands -- which requires previsualization, dexterity and time -- or serendipitously through largely unstructured exploration. The gap between how a person thinks about what she wants to create, and how she can interact with a computer to get there, is a barrier for the novice. In this talk, I will present recent work on capturing design intent in high-level, linguistic terms. For example, the designer may want to make a virtual creature more 'scary', or a web page more 'artistic'. Such requirements are natural for humans, yet cannot be directly expressed in current interfaces. Our work combines crowdsourcing, machine learning and probabilistic shape analysis to create a design interface that directly supports such expression. The approach is data-driven: large repositories of existing designs are used to learn shared structure and semantics, and repurposed for synthesizing new designs. I will conclude with a discussion of directions, opportunities and challenges for new tools for high-level design that exploit the inter-relationship of semantics, function and form to aid the creative process.
Storytelling is essential for communicating ideas. When they are well told, stories help us make sense of information, appreciate cultural or societal differences, and imagine living in entirely different worlds. Audio/visual stories in the form of radio programs, books-on-tape, podcasts, television, movies and animations are especially powerful because they provide a rich multisensory experience. Technological advances have made it easy to capture stories using the microphones and cameras that are readily available in our mobile devices. But the raw media rarely tells a compelling story. The best storytellers carefully compose, filter, edit and highlight the raw media to produce an engaging piece. Yet the software tools they use to create and manipulate the raw audio/video media (e.g. Pro Tools, Photoshop, Premiere, Final Cut Pro, Maya etc.) force storytellers to work at a tediously low level: selecting, filtering and layering pixels, or cutting and transitioning between audio/video frames. While these tools provide flexible and precise control over the look and sound of the final result, they are notoriously difficult to learn and accessible primarily to experts. In this talk I'll present a number of recent projects that aim to significantly reduce the effort required to edit and produce high-quality audio/visual stories.
The information contained across many data sets is often highly correlated. Such connections and correlations can arise because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries or other relations and self-relations that the data sources satisfy. This is particularly true for data sets of a geometric character, such as GPS traces, images, videos, 3D scans, 3D models, etc. We argue that when extracting knowledge from the data in a given data set, we can do significantly better if we exploit the wider context provided by all the relationships between this data set and a 'society' or 'social network' of other related data sets. We discuss mathematical and algorithmic issues on how to represent and compute relationships or mappings between data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the 'wisdom of the collection' in performing operations on individual data sets or in map inference between them.
We present a generalization of Algebraic Point Set Surfaces for the analysis of point clouds in Scale-Space, called Growing Least Squares. We will show some work in progress using this technique to decompose and interactively edit multi-scale features on large point clouds. We will also show how algebraic sphere fitting can be used to develop enhanced shading effects in the real-time ray-tracer used in Modo.
We present recent advances in Moving Least Squares surfaces. In particular, we will show how to extend the concept of Algebraic Point Set Surfaces to point clouds with non-oriented input normals. Indeed, when line-of-sight information is lacking, computing a consistent orientation is as difficult as the surface reconstruction problem itself. We will also show applications of this new technique to image abstraction and stylization.
Polygon surface meshes are preferred over triangle meshes in a number of applications related to geometric modeling and reverse engineering. Among those, anisotropic meshes are preferred over isotropic ones when seeking faithful surface approximation for a low number of elements. In this talk I will present an approach for anisotropic polygonal surface remeshing. Our algorithm takes as input a surface triangle mesh. An anisotropic rectangular metric is derived from a user-specified normal-based tolerance error and the requirement to favor rectangle-shaped polygons. Our algorithm uses a greedy refinement and relaxation procedure that adds, deletes and relocates generators so as to match two criteria related to partitioning and conformity. I will discuss several directions to generalize this metric and to consolidate it in order to optimize the complexity / distortion trade-off for effective shape approximation.
This is joint work with Bertrand Pellenard and Jean-Marie Morvan.
We present a method for reconstructing surfaces from point sets. The main novelty lies in a structure-preserving approach where the input point set is first consolidated by structuring and resampling the planar components, before reconstructing the surface from both the consolidated components and the unstructured points. The final surface is obtained through solving a graph-cut problem formulated on the 3D Delaunay triangulation of the structured point set where the tetrahedra are labeled as inside or outside cells. Structuring facilitates the surface reconstruction as the point set is substantially reduced and the points are enriched with structural meaning related to adjacency between primitives. Our approach departs from the common dichotomy between smooth/piecewise-smooth and primitive-based representations by combining canonical parts from detected primitives and free-form parts of the inferred shape. Our experiments on a variety of inputs illustrate the potential of our approach in terms of robustness, flexibility and efficiency.
This is joint work with Pierre Alliez.
Synthesizing structured content from example is challenging, since blindly introducing randomness in the process may break precise alignments between features. In this talk I will describe two techniques for synthesizing new structured images from example, which do not require a high-level description of the content. The first approach targets synthesis of textures used in architectural scenes, such as facades, control panels, doors and windows. The second approach targets structured patterns synthesized along curves. Our approaches provide convincing results at little cost, and allow for compact storage of the results. This is especially important for applications with downloadable content.
3D scanning technology has developed rapidly: shapes with rich detail can now be captured well, and the resulting point clouds need to be converted to mesh models. Previous work has mainly focused on high-quality input and small-scale data, e.g., the Bunny and Dragon models. The recent trend in point cloud processing is toward large-scale data, e.g., façades and factories, and toward lower-quality data from consumer-level devices, e.g., Microsoft Kinect. This talk will introduce two works on recovering high-quality structural information from low-quality scans: adaptive partitioning of urban façades and structure recovery by part assembly.