If you use this work or find it helpful, please consider citing:
Training robotic policies directly in the real world is expensive and unscalable. Although generative simulation enables large-scale data synthesis, current approaches often fail to generate logically coherent long-horizon tasks and struggle with dynamic physical uncertainties due to open-loop execution.
To address these challenges, we propose Affordance-Graphed Task Worlds (AGT-World), a unified framework that autonomously constructs interactive simulated environments and corresponding robot task policies from real-world observations. Unlike methods that rely on random proposals or static replication, AGT-World formalizes the task space as a structured graph, enabling the precise, hierarchical decomposition of complex goals into theoretically grounded atomic primitives. Furthermore, we introduce a Self-Evolution mechanism with hybrid feedback that autonomously refines policies by combining Vision-Language Model reasoning with geometric verification. Extensive experiments demonstrate that our method significantly outperforms existing approaches in success rate and generalization, achieving a self-improving cycle of proposal, execution, and correction for scalable robot learning.
[Demo panels: subtask sequences for two example tasks]
Example 1, Iteration 2 (Final Success):
1. Open Refrigerator
2. Pick Up Glass
3. Put Glass In
4. Close Refrigerator
Example 2, Iteration 2 (Final Success):
1. Pick Up Apple
2. Put Apple Into
1. Task Modeling & Graph Definition
First, we define the Simple Task and the Complex Long-horizon Task:
To accomplish these tasks, an Action Flow \( \pi(T) \) is executed to achieve a specific simple task, while an Action Transfer \( e(T_k,T_{k+1}) \) bridges consecutive simple tasks \( T_k \) and \( T_{k+1} \) by aligning their boundary world states.
Building upon these concepts, we formalize the task space as a structured Affordance-Graphed Task World. The world is represented as a universal directed graph \( \mathcal{G} = (V, E) \):
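As a rough illustration of how a graph of this kind could support task decomposition, the following minimal Python sketch treats each node as a simple task and each directed edge as an action transfer between consecutive tasks. This reading of \( V \) and \( E \), the `TaskGraph` class, its method names, and the breadth-first search are all assumptions made for illustration; they are not the framework's actual data structures or planner.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TaskGraph:
    """Hypothetical directed graph: nodes = simple tasks, edges = action transfers."""
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add_transfer(self, src: str, dst: str) -> None:
        """Register a directed transfer from simple task `src` to simple task `dst`."""
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())

    def decompose(self, start: str, goal: str) -> Optional[List[str]]:
        """Return a shortest chain of simple tasks from `start` to `goal`, or None."""
        parents: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                # Walk parent links back to the start, then reverse.
                path: List[str] = []
                cur: Optional[str] = node
                while cur is not None:
                    path.append(cur)
                    cur = parents[cur]
                return path[::-1]
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None


# Usage: the refrigerator example, written as a chain of simple tasks.
graph = TaskGraph()
graph.add_transfer("open_refrigerator", "pick_up_glass")
graph.add_transfer("pick_up_glass", "put_glass_in")
graph.add_transfer("put_glass_in", "close_refrigerator")
plan = graph.decompose("open_refrigerator", "close_refrigerator")
```

In this toy setting, a long-horizon task corresponds to a path through the graph, and a goal with no connecting path is reported as unreachable (`None`).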
2. Compositional Reachability Theorem
Based on our probabilistic graph model, we provide a theoretical guarantee for the solvability of generated tasks:
3. Self-Evolution Mechanism (Expression 17)
To refine execution parameters autonomously, the evolution from iteration \( \tau \) to \( \tau+1 \) is governed by the following mechanism:
\[ (\hat{\pi}_{\tau+1}, \hat{m}_{\tau+1}) \sim p_E\left((\pi, m) \mid \mathbf{X}_k^I, \mathbf{m}, \mathcal{T}, \left\{\left(\hat{\pi}_i, \hat{m}_i\right)\right\}_{i=1}^{\tau}; \epsilon_4\right). \]
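Read operationally, the expression samples a new policy and parameter pair conditioned on the history of all earlier attempts. The Python sketch below mimics that loop on a toy one-dimensional problem. Everything in it is hypothetical: `self_evolve`, `toy_propose`, `toy_evaluate`, the scalar parameter, and the target value are stand-ins, the evaluator is a simple geometric tolerance check rather than the hybrid Vision-Language Model and geometric feedback, and the random generator only loosely plays the role of the randomness term \( \epsilon_4 \) (an assumed reading).

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (policy label, execution parameter m); toy stand-in


@dataclass
class Attempt:
    candidate: Candidate
    success: bool
    feedback: float  # signed error reported by the toy geometric check


def self_evolve(
    propose: Callable[[List[Attempt], random.Random], Candidate],
    evaluate: Callable[[Candidate], Tuple[bool, float]],
    max_iters: int = 20,
    seed: int = 0,
) -> List[Attempt]:
    """Propose, execute, and correct until success or the iteration budget runs out."""
    rng = random.Random(seed)
    history: List[Attempt] = []
    for _ in range(max_iters):
        candidate = propose(history, rng)       # conditioned on all past attempts
        success, feedback = evaluate(candidate)
        history.append(Attempt(candidate, success, feedback))
        if success:
            break
    return history


TARGET, TOL = 0.30, 0.01  # hypothetical target parameter value and tolerance


def toy_evaluate(candidate: Candidate) -> Tuple[bool, float]:
    """Toy geometric verification: succeed when m lies within TOL of TARGET."""
    _, m = candidate
    error = TARGET - m
    return abs(error) <= TOL, error


def toy_propose(history: List[Attempt], rng: random.Random) -> Candidate:
    """First attempt is random; later attempts correct half of the last error."""
    if not history:
        return ("grasp", rng.uniform(0.0, 1.0))
    last = history[-1]
    policy, m = last.candidate
    return (policy, m + 0.5 * last.feedback + rng.gauss(0.0, 0.001))


history = self_evolve(toy_propose, toy_evaluate)
```

The design point the sketch is meant to convey is that each proposal sees the full record of earlier candidates and their feedback, so corrections accumulate across iterations instead of restarting from scratch.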
Symbols Explanation: