Free2CAD: Parsing Freehand Drawings into CAD Commands
Changjian Li1,2 Hao Pan3 Adrien Bousseau2 Niloy J. Mitra1,4
1University College London 2Inria, Université Côte d’Azur
3 Microsoft Research Asia 4 Adobe Research
SIGGRAPH 2022
Honorable Mention Best Paper Award
Abstract
CAD modeling, despite being the industry standard, remains restricted to usage by skilled practitioners due to two key barriers. First, the user must be able to mentally parse a final shape into a valid sequence of supported CAD commands; and second, the user must be sufficiently conversant with CAD software packages to be able to execute the corresponding CAD commands. As a step towards addressing both these challenges, we present Free2CAD wherein the user can simply sketch the final shape and our system parses the input strokes into a sequence of commands expressed in a simplified CAD language. When executed, these commands reproduce the sketched object. Technically, we cast sketch-based CAD modeling as a sequence-to-sequence translation problem, for which we leverage the powerful Transformer neural network architecture. Given the sequence of pen strokes as input, we introduce the new task of grouping strokes that correspond to individual CAD operations. We combine stroke grouping with geometric fitting of the operation parameters, such that intermediate groups are geometrically corrected before being reused, as context, for subsequent steps in the sequence inference. Although trained on synthetically generated data, we demonstrate that Free2CAD generalizes to sketches created from real-world CAD models as well as to sketches drawn by novice users.
Free2CAD Algorithm
1) Overview
Free2CAD converts a user drawing, presented as an ordered set of strokes, to a sequence of CAD commands (a). Our method runs in two main phases. First, in the stroke grouping phase (b), individual strokes s_i, each encoded as an embedding (written in bold as s_i), are processed by a Transformer, comprising an encoder E^T and a decoder D^T, to produce grouping probabilities p_i^j for each of the strokes in the context of the previously encoded group information g_j. Second, in the operation reconstruction phase (c), the group probabilities are processed, conditioned on the existing geometric context, to produce geometric primitives along with their parameters. The regressed geometric primitives are used to further correct the current grouping, and the updated groups are passed back as context to the subsequent stroke grouping. Note that, in the grouping correction step, if the geometric fitting is unsatisfactory, the current strokes along with the fitted primitives may be skipped (see Section 6). The two phases are trained using ground truth information, when available, to provide context information.
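To make the role of the grouping probabilities p_i^j concrete, here is a minimal sketch of one way such probabilities could be turned into discrete stroke groups. It is an illustration only, not the procedure used in the paper: the function name `assign_groups`, the 0.5 threshold, the argmax rule, and the example values are all assumptions.

```python
from typing import Dict, List


def assign_groups(probs: List[List[float]], threshold: float = 0.5) -> Dict[int, List[int]]:
    """Assign each stroke to its most likely group.

    probs[i][j] stands in for p_i^j, the probability that stroke i belongs
    to group j. The argmax rule and the threshold are assumptions made for
    this sketch; strokes whose best probability is below the threshold are
    left unassigned.
    """
    groups: Dict[int, List[int]] = {}
    for i, row in enumerate(probs):
        # Index of the group with the highest probability for stroke i.
        j = max(range(len(row)), key=lambda k: row[k])
        if row[j] >= threshold:
            groups.setdefault(j, []).append(i)
    return groups


# Hypothetical probabilities for four strokes and two candidate groups.
probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.4, 0.45],  # no group is confident enough, so this stroke stays unassigned
]
groups = assign_groups(probs)
print(groups)
```

In the full method, the groups obtained at this stage would still be corrected by geometric fitting before being fed back as context, which this sketch does not model.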
Animated algorithm overview:
2) Processing long sequences
To process long sequences of input strokes, we propose a scheme that runs in multiple Steps and, in each Step, processes a maximum of K groups. Specifically, we train Free2CAD (see Figure 3 in the paper) to look at the current input sequence, process only K groups, and assign all the remaining strokes to a tail group. We then remove the strokes that have been successfully processed, and start the next Step with the remaining strokes. In this example, working with K=3, the entire sequence of 28 input strokes is handled in two Steps, outputting a total of 5 groups.
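The Step-wise scheme can be sketched as a simple loop. The code below is a hedged illustration, not the released implementation: `process_in_steps` and `make_oracle` are hypothetical names, the trained network is replaced by a toy oracle that reads ground-truth labels, and the per-group stroke counts are invented so that the totals match the example above (28 strokes, K=3, two Steps, 5 groups).

```python
from typing import Callable, List, Sequence, Tuple

Group = List[int]
Predictor = Callable[[List[int], int], Tuple[List[Group], List[int]]]


def process_in_steps(stroke_ids: Sequence[int], predict: Predictor, k: int) -> Tuple[List[Group], int]:
    """Run Steps until no strokes remain; each Step yields at most k groups."""
    remaining = list(stroke_ids)
    all_groups: List[Group] = []
    steps = 0
    while remaining:
        groups, tail = predict(remaining, k)
        if not groups:
            break  # nothing was processed; stop rather than loop forever
        all_groups.extend(groups)
        remaining = tail  # processed strokes are removed; the tail is carried over
        steps += 1
    return all_groups, steps


def make_oracle(labels: Sequence[int]) -> Predictor:
    """Toy stand-in for the network (assumption): group by ground-truth label."""
    def predict(remaining: List[int], k: int) -> Tuple[List[Group], List[int]]:
        order: List[int] = []
        for s in remaining:
            if labels[s] not in order:
                order.append(labels[s])
        kept = order[:k]  # the first k groups in drawing order
        groups = [[s for s in remaining if labels[s] == g] for g in kept]
        tail = [s for s in remaining if labels[s] not in kept]
        return groups, tail
    return predict


# Hypothetical split of 28 strokes into 5 groups of sizes 6, 4, 8, 5, 5.
sizes = [6, 4, 8, 5, 5]
labels = [g for g, n in enumerate(sizes) for _ in range(n)]
all_groups, steps = process_in_steps(range(28), make_oracle(labels), k=3)
print(steps, [len(g) for g in all_groups])
```

With these made-up sizes, the first Step extracts three groups and leaves ten strokes in the tail, and the second Step extracts the remaining two groups.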
Animated algorithm:
Interactive Modeling Demos
1) Demo1: camera
2) Demo2: mechanical part
Video (with voice)
Presentation Video
Results Gallery
We show a selection of modeling sessions using our method. In each case, the final model was created from a single-view drawing (except example (a), where the last step is drawn from a second view) with 10-44 strokes that were parsed into 4-5 groups, from which the corresponding CAD models were inferred. All the models were drawn against an isometric grid background.
Bibtex
@Article{Li:2022:Free2CAD,
Title = {Free2CAD: Parsing Freehand Drawings into CAD Commands},
Author = {Changjian Li and Hao Pan and Adrien Bousseau and Niloy J. Mitra},
Journal = {ACM Trans. Graph. (Proceedings of SIGGRAPH 2022)},
Year = {2022},
Number = {4},
Volume = {41},
Pages={93:1--93:16},
numpages = {16},
DOI={10.1145/3528223.3530133},
Publisher = {ACM}
}
Acknowledgements
The authors would like to thank the reviewers for their valuable suggestions, the user evaluation participants, Jian Shi, Yuxiao Guo and team members of both the GraphDeco (INRIA) and SGP (UCL) groups for the valuable discussions, and Julien Philip and George Drettakis for proofreading earlier drafts of the paper. AB was supported by ERC Starting Grant D3 (ERC-2016-STG 714221); CL and NM were supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 956585, ERC Grant (SmartGeometry 335373), and gifts from Autodesk and Adobe. NM thanks Tuhin for introducing him to the isometric grids.