Learning a Causal Transition Model for Object Cutting

Zeyu Zhang*, Muzhi Han*, Baoxiong Jia, Ziyuan Jiao, Yixin Zhu, Song-Chun Zhu, Hangxin Liu†

*denotes joint first authors, †denotes corresponding author

Planning for object cutting with a stochastic grammar of object fragmentation. The grammar reveals the underlying fluent space of object fragmentation and captures causal transitions compositionally through production rules. An observed fragmentation process is represented as a parse tree derived from the grammar; planning for object cutting amounts to inferring an optimal parse tree that describes the desired fragmentation. Observing a carrot being cut can support planning the actions for cutting a potato into the same kind of fragments, because both processes share the production rule $c_3 \rightarrow c_5 c_5$.
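To make the idea of shared production rules concrete, here is a minimal Python sketch of a toy fragmentation grammar. It is an illustration under stated assumptions, not the authors' implementation: only the rule $c_3 \rightarrow c_5 c_5$ comes from the caption above, while the `Rule` class, the rule $c_1 \rightarrow c_3 c_3$, the probabilities, and the fragment types assigned to the carrot and the potato are all hypothetical.

```python
from dataclasses import dataclass
from math import log


@dataclass(frozen=True)
class Rule:
    """A production rule: one parent fragment type splits into child types."""
    parent: str
    children: tuple
    prob: float  # hypothetical branching probability, not a learned value


def apply_rule(fragments, index, rule):
    """Return the fragment types after cutting fragments[index] with `rule`."""
    if fragments[index] != rule.parent:
        raise ValueError(
            f"rule {rule.parent} -> {rule.children} "
            f"does not apply to {fragments[index]}"
        )
    return fragments[:index] + list(rule.children) + fragments[index + 1:]


def derivation_log_prob(rules):
    """Log-probability of a derivation, treating rule choices as independent."""
    return sum(log(r.prob) for r in rules)


# Hypothetical rule set; only c3 -> c5 c5 appears in the caption.
halve = Rule("c1", ("c3", "c3"), 0.7)
split = Rule("c3", ("c5", "c5"), 1.0)

# Assumed fragment types: a carrot piece observed as c3, a whole potato as c1.
carrot_after = apply_rule(["c3"], 0, split)
potato_after = apply_rule(apply_rule(["c1"], 0, halve), 0, split)

print(carrot_after)  # ['c5', 'c5']
print(potato_after)  # ['c5', 'c5', 'c3']
```

In this toy setting, the `split` rule observed on the carrot is reused unchanged in the potato derivation, which is the kind of compositional sharing the caption describes.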

Abstract

Cutting objects into desired fragments is challenging for robots due to the spatially unstructured nature of fragments and the complex one-to-many object fragmentation caused by actions. We present a novel approach to model object fragmentation using an attributed stochastic grammar. This grammar abstracts fragment states as node variables and captures causal transitions in object fragmentation through production rules. We devise a probabilistic framework to learn this grammar from human demonstrations. Planning for object cutting involves inferring an optimal parse tree of the desired fragments using the learned grammar, with the productions in the parse tree corresponding to cutting actions. We employ Monte Carlo Tree Search (MCTS) to efficiently approximate the optimal parse tree and generate a sequence of executable cutting actions. Experiments demonstrate the efficacy of the approach in planning object-cutting tasks, both in simulation and on a physical robot. The proposed approach outperforms several baselines and generalizes better to novel setups, thanks to the compositionality of the grammar model.

Algorithm Overview

An illustration of the inference process to obtain an optimal parse tree $pt^*$ through MCTS. (a) Given fragment point clouds in the current or goal configuration, we extract a shape feature for each fragment with a pre-trained point cloud encoder and process it with an MLP to classify the fragment type $p(c|z)$ (the vector shows the probabilities in greyscale). (b) An example of a Monte Carlo search tree in which the state of each search node is a parse tree derived from the grammar. Expanding a search node applies production rules to its parse tree. The yellow region $\mathcal{H}(\mathcal{I}^t)$ is a set of search nodes whose states (i.e., parse trees) are sampled from fragments in $\mathcal{I}^t$ according to $p(c|z)$. (c-d) To evaluate rollout results, we find the best assignment that grounds each terminal node to a fragment in $\mathcal{I}^{g}$. The dotted lines in (c) represent an optimal assignment that maximizes the shape matching likelihood, which is further refined to maximize the layout grouping likelihood, shown as solid lines in (d).
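The shape-matching step in (c) can be read as a one-to-one assignment problem. The following Python sketch illustrates that reading only: the `best_assignment` function, the likelihood values, and the exhaustive search are assumptions made for illustration, and the layout-grouping refinement in (d) is not modeled. For more than a handful of fragments, a polynomial-time solver such as the Hungarian algorithm would be a natural substitute for the brute-force loop.

```python
from itertools import permutations
from math import log


def best_assignment(likelihood):
    """Ground each terminal node to a distinct goal fragment.

    likelihood[i][j] is a hypothetical shape-matching likelihood (> 0) that
    terminal node i corresponds to goal fragment j. Returns the assignment
    (fragment index per terminal node) that maximizes the summed
    log-likelihood, together with that score. Exhaustive search, so it is
    only suitable for small inputs; assumes at least as many fragments as
    terminal nodes.
    """
    n_terminals = len(likelihood)
    n_fragments = len(likelihood[0])
    best, best_score = None, float("-inf")
    for perm in permutations(range(n_fragments), n_terminals):
        score = sum(log(likelihood[i][j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score


# Made-up likelihoods for two terminal nodes and two goal fragments.
likelihood = [
    [0.6, 0.5],
    [0.9, 0.1],
]
assignment, score = best_assignment(likelihood)
print(assignment)  # (1, 0)
```

The example is chosen so that matching each terminal node to its individually best fragment conflicts (both prefer fragment 0), while the joint optimum pairs node 0 with fragment 1 and node 1 with fragment 0.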

BibTeX

@inproceedings{zhang2023learning,
  title = {Learning a Causal Transition Model for Object Cutting},
  author = {Zhang, Zeyu and Han, Muzhi and Jia, Baoxiong and Jiao, Ziyuan and Zhu, Yixin and Zhu, Song-Chun and Liu, Hangxin},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages = {1996--2003},
  year = {2023},
  organization = {IEEE},
}
