UniFolding: Towards Sample-efficient, Scalable, and Generalizable Robotic Garment Folding

Han Xue*, Yutong Li*, Wenqiang Xu, Huanyu Li, Dongzhe Zheng, Cewu Lu 

Shanghai Jiao Tong University

CoRL 2023

Abstract

This paper explores the development of UniFolding, a sample-efficient, scalable, and generalizable robotic system for unfolding and folding various garments. UniFolding employs the proposed UFONet neural network to integrate unfolding and folding decisions into a single policy model that is adaptable to different garment types and states. The design of UniFolding is based on a garment’s partial point cloud, which aids in generalization and reduces sensitivity to variations in texture and shape. The training pipeline prioritizes low-cost, sample-efficient data collection. Training data is collected via a human-centric process with offline and online stages. The offline stage involves human unfolding and folding actions via Virtual Reality, while the online stage utilizes human-in-the-loop learning to fine-tune the model in a real-world setting. The system is tested on two garment types: long-sleeve and short-sleeve shirts. Performance is evaluated on 20 shirts with significant variations in textures, shapes, and materials.

Paper & Code

You can find our paper here

You can find our code here

Challenges

Method & Advantages

Experiments Setup

Action Primitives

The pipeline of the UniFolding system. It contains two stages to fully fold a garment from a random initial state, namely Unfolding and Folding.

Method Details


Left: UFONet takes a masked point cloud of the observed garment state as the input, predicts the primitive action to be performed, and regresses the picking (placing) points.
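To make this input/output contract concrete, below is a minimal PyTorch sketch of that interface. It is not the paper's actual architecture: the per-point MLP backbone, feature dimensions, masked mean pooling, and the set of four action primitives are all assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical number of action primitives; the actual primitive set is defined in the paper.
NUM_ACTION_TYPES = 4


class UFONetSketch(nn.Module):
    """Minimal stand-in mirroring the described interface:
    masked point cloud in -> action-type logits + pick/place point coordinates out."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Per-point MLP as a stand-in for the paper's point-cloud backbone.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Classification head over action primitives.
        self.action_head = nn.Linear(feat_dim, NUM_ACTION_TYPES)
        # Regression head for two 3D picking (or placing) points.
        self.point_head = nn.Linear(feat_dim, 2 * 3)

    def forward(self, points: torch.Tensor, mask: torch.Tensor):
        # points: (B, N, 3) garment point cloud; mask: (B, N), 1 for valid points.
        feats = self.point_mlp(points)                               # (B, N, feat_dim)
        mask = mask.unsqueeze(-1)                                    # (B, N, 1)
        pooled = (feats * mask).sum(1) / mask.sum(1).clamp(min=1.0)  # masked mean pooling
        action_logits = self.action_head(pooled)                     # (B, NUM_ACTION_TYPES)
        pick_place = self.point_head(pooled).view(-1, 2, 3)          # (B, 2, 3)
        return action_logits, pick_place


if __name__ == "__main__":
    net = UFONetSketch()
    logits, keypoints = net(torch.randn(1, 4096, 3), torch.ones(1, 4096))
    print(logits.shape, keypoints.shape)  # torch.Size([1, 4]) torch.Size([1, 2, 3])
```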


Right: The training pipeline of our method. It consists of offline data collection in simulation with human demonstration and online data collection in the real world. In the offline data collection phase, we collect human demonstration data for unfolding and folding tasks through a Virtual Reality interface in a fast and low-cost manner. By leveraging human priors from the demonstrations, we can simplify the dense action space into a ranking problem with a sparse set of keypoint candidates. This substantially reduces exploration time in both simulation and the real world. After obtaining an initial policy from offline supervised learning, we perform self-supervised learning in simulation for unfolding tasks. In the online data collection phase, we adopt a human-in-the-loop learning approach to fine-tune the policy in the real world.
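The key simplification mentioned above, replacing a dense action space with a ranking over a sparse set of keypoint candidates, can be illustrated with a small sketch. This is not the paper's implementation: the scorer architecture, feature shapes, and the pairwise preference loss (mirroring the "P1 > P2" human-in-the-loop annotations shown later on this page) are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CandidatePairRanker(nn.Module):
    """Hypothetical ranking head: scores each candidate pair of pick points so that
    action selection reduces to choosing the highest-scoring pair."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, pair_feats: torch.Tensor) -> torch.Tensor:
        # pair_feats: (B, K, 2*feat_dim) features of K candidate pick-point pairs.
        return self.scorer(pair_feats).squeeze(-1)  # (B, K) scores


def preference_loss(scores: torch.Tensor, preferred: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: the human-preferred candidate should outscore the other.
    softplus(s_other - s_pref) equals -log(sigmoid(s_pref - s_other))."""
    s_pref = scores.gather(1, preferred.unsqueeze(1))   # (B, 1)
    s_other = scores.gather(1, other.unsqueeze(1))      # (B, 1)
    return F.softplus(s_other - s_pref).mean()


if __name__ == "__main__":
    ranker = CandidatePairRanker()
    pair_feats = torch.randn(2, 16, 256)  # 16 candidate pairs per example
    scores = ranker(pair_feats)
    loss = preference_loss(scores, preferred=torch.tensor([3, 7]), other=torch.tensor([0, 1]))
    print(scores.shape, float(loss))
```

In this sketch, picking points at inference time amounts to taking the argmax over candidate scores, which is what makes a sparse candidate set far cheaper to explore than a dense per-pixel action map.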

Demo

video.mp4

Garment Instances

We tested on two garment types: long-sleeve shirts and short-sleeve T-shirts, choosing 60 real-world garment instances with varying shapes, sizes, textures, and materials. Sizes ranged from 38 cm × 60 cm to 80 cm × 167 cm, aspect ratios varied between 0.2695:1 and 1.1167:1, and materials included cotton, polyester, spandex, nylon, viscose, wool, and others. The garments were divided into train/test sets at a 2:1 ratio. All garments in the testing set remain unseen in all experiments.

Unseen Testing Set (Long-sleeve Shirts)

Unseen Testing Set (Short-sleeve T-Shirts)

Training Set (Long-sleeve Shirts)

Training Set (Short-sleeve T-Shirts)

Experiments for Each Unseen Garment Instance

Long-sleeve Shirts

als.1.mp4
als.2.mp4
als.3.mp4
als.4.mp4
als.5.mp4
als.6.mp4
als.7.mp4
als.8.mp4
als.9.mp4
als.10.mp4

Short-sleeve T-Shirts

ass.11.mp4
ass.12.mp4
ass.13.mp4
ass.14.mp4
ass.15.mp4
ass.16.mp4
ass.17.mp4
ass.18.mp4
ass.19.mp4
ass.20.mp4

Comparison with Baseline

Each set of GIFs below shows a comparison between the ClothFunnels baseline (left) and our method (right), with experiments conducted on the same garments. The baseline method works relatively well on garments with solid, light colors, but it struggles with complex textures, unusual shapes, and materials. Our method exhibits strong generalization ability across various garments.

Annotation Examples for Fine-tuning

Short-sleeve T-shirts (training set)

Annotation of Best Action Type and Pick Points

Fling

Human Preference on Model Prediction

P1 (left) > P2 (right)

Human Preference on Model Prediction

P1 (left) > P2 (right)

Is human annotation (left) better than all pairs of keypoint candidate predictions (right)?

No

Annotation of Best Action Type and Pick Points

Fling

Human Preference on Model Prediction

P1 (left) < P2 (right)

Human Preference on Model Prediction

P1 (left) > P2 (right)

Is human annotation (left) better than all pairs of keypoint candidate predictions (right)?

Yes

Long-sleeve Shirts (training set)

Annotation of Best Action Type and Pick Points

Fling

Human Preference on Model Prediction

P1 (left) < P2 (right)

Human Preference on Model Prediction

P1 (left) < P2 (right)

Is human annotation (left) better than all pairs of keypoint candidate predictions (right)?

No

Annotation of Best Action Type and Pick Points

Fling

Human Preference on Model Prediction

P1 (left) < P2 (right)

Human Preference on Model Prediction

P1 (left) < P2 (right)

Is human annotation (left) better than all pairs of keypoint candidate predictions (right)?

Yes