Hybrid Diffusion Planning

Abstract

Constructing robots to accomplish long-horizon tasks is a long-standing challenge within artificial intelligence. Approaches using generative methods, particularly Diffusion Models, have gained attention due to their ability to model continuous robotic trajectories for planning and control. However, we show that these models struggle with long-horizon tasks that involve complex decision-making and, in general, are prone to confusing different modes of behavior, leading to failure. To remedy this, we propose to augment continuous trajectory generation by simultaneously generating a high-level symbolic plan. We show that this requires a novel mix of discrete variable diffusion and continuous diffusion, which dramatically outperforms the baselines. In addition, we illustrate how this hybrid diffusion process enables flexible trajectory synthesis, allowing us to condition synthesized actions on partially or fully specified discrete conditions.

Trajectory-only Diffusion Models struggle with long-horizon decision-making tasks, such as sorting. These tasks are highly multimodal: although all demonstrations terminate in a sorted state, the trained model mixes behaviours from the dataset, resulting in flawed plans. Hybrid Diffusion Planning (HDP), which pairs trajectory generation with the diffusion of a discrete symbolic plan, solves these tasks far more reliably.
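To make the idea of mixing discrete and continuous diffusion concrete, the following is a minimal sketch of a hybrid forward (noising) process. It is not the authors' implementation. It assumes a linear noise schedule, Gaussian noise for the continuous trajectory, and absorbing-state (masking) corruption for the symbolic plan, which is one common form of discrete diffusion; the source does not say which form HDP uses. The names `MASK`, `signal_level`, `noise_trajectory`, `noise_plan`, `noise_hybrid`, and the `known` argument are all hypothetical. Sharing one timestep between the two branches is also a simplifying assumption.

```python
import math
import random

MASK = -1  # hypothetical absorbing "mask" token for the discrete plan


def signal_level(t, num_steps):
    """Assumed linear schedule: 1.0 at t = 0 (clean), 0.0 at t = num_steps (pure noise)."""
    return 1.0 - t / num_steps


def noise_trajectory(traj, t, num_steps, rng):
    """Continuous branch: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps, with eps ~ N(0, 1)."""
    a = signal_level(t, num_steps)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in traj]


def noise_plan(plan, t, num_steps, rng, known=None):
    """Discrete branch: each token is independently replaced by MASK with probability 1 - a.

    Indices listed in `known` are treated as given conditions and are never masked,
    which is one simple way to express a partially specified plan.
    """
    a = signal_level(t, num_steps)
    known = known or set()
    return [
        tok if (i in known or rng.random() < a) else MASK
        for i, tok in enumerate(plan)
    ]


def noise_hybrid(traj, plan, t, num_steps, rng, known=None):
    """Corrupt trajectory and plan at the same timestep (a simplifying assumption)."""
    return (
        noise_trajectory(traj, t, num_steps, rng),
        noise_plan(plan, t, num_steps, rng, known),
    )
```

A denoising model trained on such pairs would have to recover the plan tokens and the trajectory jointly. Passing every index in `known` corresponds to conditioning on a complete plan, passing a subset to a partial one, and passing none to unconditional generation.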

[Figure panels: Trajectory-only Diffusion, Hybrid Diffusion Planning]

Simulated tasks

[Figure panels: X-Arm Sorting, Arrange Blocks, Hook Task]

Method	X-Arm Sorting	Arrange Blocks	Hook Task
Diffuser	46%	67%	38%
Joint Diffuser	41%	61%	48%
Separate Diffuser	38%	62%	43%
Hybrid (Ours)	83%	74%	60%

In addition to the benchmarks above, we measure how robust each method is to increasing task complexity by varying the number of blocks to sort. We find that Hybrid Diffusion Planning is considerably more robust than the baselines.

[Figure panels: 2 Blocks, 3 Blocks, 4 Blocks]

Real-world Evaluation

[Figure panels: Real Sorting Task, Real Hook Task]

Method	Sorting Task	Hook Task
Diffuser	20%	6.7%
Joint Diffuser	10%	10%
Separate Diffuser	0%	6.7%
Hybrid (Ours)	70%	60%
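
To put the two results tables side by side, the short script below computes, for each task, the gap in percentage points between Hybrid and the strongest baseline. The numbers are copied from the tables above. The "(sim)" and "(real)" labels and the names `METHODS`, `RESULTS`, `GAINS`, and `gain_over_best_baseline` are introduced here for illustration only.

```python
# Percentages copied from the simulated and real-world tables.
# Row order: Diffuser, Joint Diffuser, Separate Diffuser, Hybrid (Ours).
METHODS = ["Diffuser", "Joint Diffuser", "Separate Diffuser", "Hybrid (Ours)"]
RESULTS = {
    "X-Arm Sorting (sim)": [46.0, 41.0, 38.0, 83.0],
    "Arrange Blocks (sim)": [67.0, 61.0, 62.0, 74.0],
    "Hook Task (sim)": [38.0, 48.0, 43.0, 60.0],
    "Sorting Task (real)": [20.0, 10.0, 0.0, 70.0],
    "Hook Task (real)": [6.7, 10.0, 6.7, 60.0],
}


def gain_over_best_baseline(scores):
    """Return (name of strongest baseline, Hybrid's lead in percentage points)."""
    *baselines, hybrid = scores
    best = max(range(len(baselines)), key=lambda i: baselines[i])
    return METHODS[best], round(hybrid - baselines[best], 1)


GAINS = {task: gain_over_best_baseline(scores) for task, scores in RESULTS.items()}

for task, (baseline, gain) in GAINS.items():
    print(f"{task}: +{gain} points over {baseline}")
```

On these numbers the lead ranges from 7 points (Arrange Blocks) to 37 points (X-Arm Sorting) in simulation, and is 50 points on both real-world tasks.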