CLIMB: Language-Guided Continual Learning for Task Planning with Iterative Model Building

1Georgia Institute of Technology, 2Georgia Tech Research Institute, 3University of Toronto, 4Ohio State University, 5Nvidia

Abstract

We present CLIMB, a continual learning framework for robot task planning that leverages foundation models and execution feedback to guide domain model construction.

CLIMB incrementally builds a PDDL model of its operating environment while completing tasks, creating a set of world state predicates that function as a representation of the causal structure present in the environment. CLIMB's continual learning approach enables it to solve classes of problems it has previously encountered without needing to relearn task-specific information, endowing it with the ability to expand its environment representation to novel problem formulations.
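
To make this concrete, here is a minimal sketch (not CLIMB's actual code) of how an incrementally built domain might be stored: a growing set of predicate signatures and action definitions that persists across tasks. The DomainModel class and the Blocksworld predicates below are illustrative assumptions.

    from dataclasses import dataclass, field


    @dataclass
    class DomainModel:
        """Hypothetical container for an incrementally learned PDDL domain."""
        name: str
        predicates: set = field(default_factory=set)
        actions: dict = field(default_factory=dict)

        def add_predicate(self, signature: str) -> None:
            # Predicates proposed during earlier tasks are kept, so previously
            # solved problem classes remain solvable with the refined model.
            self.predicates.add(signature)

        def to_pddl(self) -> str:
            preds = "\n    ".join(sorted(self.predicates))
            acts = "\n  ".join(self.actions.values())
            return f"(define (domain {self.name})\n  (:predicates\n    {preds})\n  {acts})"


    domain = DomainModel("blocksworld_pp")
    domain.add_predicate("(on ?a ?b)")
    domain.add_predicate("(clear ?b)")
    print(domain.to_pddl())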

We show CLIMB to be a capable zero-shot planner for simple tasks. For complex tasks with non-obvious predicates, we demonstrate CLIMB's ability to self-improve through iterative execution and feedback, resulting in superior performance once a PDDL model has been established.
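
The overall loop can be sketched roughly as follows; the injected callables and ExecutionTrace fields are placeholders of our own, not CLIMB's API. The idea is simply: plan with the current PDDL model, execute, and on failure hand the observed discrepancy back to the LLM for refinement.

    # Simplified sketch of a plan-execute-observe-refine loop; the injected
    # callables and ExecutionTrace fields are placeholders, not CLIMB's API.
    from dataclasses import dataclass
    from typing import Callable, List, Optional


    @dataclass
    class ExecutionTrace:
        goal_reached: bool
        failed_action: Optional[str] = None
        observed_state: Optional[str] = None


    def continual_solve(domain_pddl: str,
                        problem_pddl: str,
                        plan_fn: Callable[[str, str], List[str]],
                        execute_fn: Callable[[List[str]], ExecutionTrace],
                        refine_fn: Callable[[str, ExecutionTrace], str],
                        max_iters: int = 5) -> str:
        """Return a refined domain once the goal is reached in execution."""
        for _ in range(max_iters):
            plan = plan_fn(domain_pddl, problem_pddl)   # classical planner call
            trace = execute_fn(plan)                    # run in sim or on hardware
            if trace.goal_reached:
                return domain_pddl                      # keep the model for reuse
            # Otherwise, the failed action and observed state are passed to the
            # LLM, which proposes missing predicates or preconditions.
            domain_pddl = refine_fn(domain_pddl, trace)
        raise RuntimeError("refinement budget exhausted")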

Architecture


The CLIMB planning framework consists of independent modules for problem translation, planning, predicate generation, verification, execution, and perception. Together these modules generate the PDDL, construct a plan trace for the given problem, observe the robot's performance, and refine the PDDL using observations and solutions queried from the LLM. Each LLM-backed module uses OpenAI's gpt-4o-2024-08-06 model.
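
As an illustration of how one of these LLM modules might query the model, here is a minimal sketch using the official OpenAI Python client; the prompt text and wrapper function are our assumptions, not CLIMB's actual prompts, and an OPENAI_API_KEY must be set in the environment.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def propose_domain_fix(domain_pddl: str, failure_report: str) -> str:
        """Ask gpt-4o-2024-08-06 for a revised PDDL domain given a failure report."""
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system",
                 "content": "You revise PDDL domain files. Return only valid PDDL."},
                {"role": "user",
                 "content": (f"Current domain:\n{domain_pddl}\n\n"
                             f"Observed failure:\n{failure_report}\n\n"
                             "Add any missing predicates or preconditions and "
                             "return the complete revised domain.")},
            ],
        )
        return response.choices[0].message.content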

Observing Failures

When attempting to build a pyramid, CLIMB initially lacks the precondition that the two base blocks must be adjacent. Executing the resulting plan yields the observation that the top block was placed between, but not on top of, the base blocks.
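
A hypothetical version of the offending action, before refinement, might look like the fragment below (the action and predicate names are illustrative, not taken from CLIMB's learned domain); nothing constrains where the two base blocks sit relative to each other.

    # Hypothetical pre-refinement action for spanning a block across two bases;
    # note the absence of any adjacency requirement between ?b1 and ?b2.
    initial_span_action = """
    (:action span
      :parameters (?top ?b1 ?b2 - block)
      :precondition (and (holding ?top) (clear ?b1) (clear ?b2))
      :effect (and (not (holding ?top)) (hand-empty)
                   (on ?top ?b1) (on ?top ?b2)))
    """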

Learning from Experience

Through iterative reprompting and refinement, CLIMB adds a precondition that two blocks must be adjacent before a third block can be spanned across them. Adding this base-adjacency requirement solves the pyramid stacking problem.
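
Continuing the hypothetical fragment from above, the refined action adds the learned adjacency precondition, forcing the planner to push the base blocks together before spanning them.

    # Hypothetical post-refinement action: the learned (adjacent ?b1 ?b2)
    # precondition rules out placing the top block over a gap.
    refined_span_action = """
    (:action span
      :parameters (?top ?b1 ?b2 - block)
      :precondition (and (holding ?top) (clear ?b1) (clear ?b2)
                         (adjacent ?b1 ?b2))
      :effect (and (not (holding ?top)) (hand-empty)
                   (on ?top ?b1) (on ?top ?b2)))
    """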

Real Robot Demonstrations



We demonstrate CLIMB's ability to learn from evaluations on real hardware. We created a demonstration platform that mirrors the IsaacLab Blocksworld environment to evaluate the sim-to-real transfer of CLIMB's observations and plans. To address the challenge of perception, we place AprilTag 2 markers on the cubes and perceive the scene with a wrist-mounted camera. By returning to a home position after each action, we can gather full pose information for all objects in the scene. The object and robot pose data gathered in this manner mirrors the ground-truth state information available in IsaacSim, resulting in similar performance across the two platforms.
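
The perception step can be sketched roughly as follows, here using the pupil_apriltags detector; the camera intrinsics, tag size, and tag family below are placeholder values rather than our actual hardware configuration.

    # Rough sketch of tag-based pose gathering from a single wrist-camera image;
    # intrinsics, tag size, and tag family are assumed placeholder values.
    import cv2
    from pupil_apriltags import Detector

    detector = Detector(families="tag36h11")
    CAMERA_PARAMS = (615.0, 615.0, 320.0, 240.0)  # fx, fy, cx, cy (assumed)
    TAG_SIZE_M = 0.03                             # printed tag edge length (assumed)


    def observe_scene(image_bgr):
        """Return {tag_id: (R, t)} poses in the camera frame for one home-position view."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        detections = detector.detect(
            gray,
            estimate_tag_pose=True,
            camera_params=CAMERA_PARAMS,
            tag_size=TAG_SIZE_M,
        )
        return {d.tag_id: (d.pose_R, d.pose_t) for d in detections}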


Below are demonstrations of CLIMB solving problems and learning from failures in our BlocksWorld++ environment on real hardware.

IsaacLab Task Playground

We have developed a simulation environment in IsaacLab to evaluate BlocksWorld and BlocksWorld++ problem cases. The IsaacLab environment mirrors the physical robot system but abstracts away perception by directly accessing object state information. The information provided to the LLM for error evaluation and improvement directly matches that available on the physical robot embodiment.
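
One way to see why the two platforms behave similarly is that both ultimately produce the same kind of symbolic state for the LLM. The sketch below grounds object positions (ground truth in simulation, tag estimates on hardware) into PDDL-style atoms; the cube size, tolerances, and predicate names are illustrative assumptions.

    # Illustrative grounding of object positions into symbolic atoms; cube size,
    # tolerances, and predicate names are assumptions for a 5 cm cube.
    import numpy as np

    CUBE_SIZE = 0.05     # m
    ON_XY_TOL = 0.03     # m: a half-cube offset still counts as supported (spanning)
    ADJ_GAP_TOL = 0.055  # m: max center distance for two cubes to be "adjacent"


    def ground_state(positions):
        """positions maps object name -> np.array([x, y, z]); returns ground atoms."""
        atoms = []
        names = sorted(positions)
        for a in names:
            for b in names:
                if a == b:
                    continue
                dx, dy, dz = positions[a] - positions[b]
                if abs(dz - CUBE_SIZE) < 0.01 and np.hypot(dx, dy) < ON_XY_TOL:
                    atoms.append(f"(on {a} {b})")
                if abs(dz) < 0.01 and CUBE_SIZE - 0.005 < np.hypot(dx, dy) < ADJ_GAP_TOL:
                    atoms.append(f"(adjacent {a} {b})")
        return atoms


    print(ground_state({
        "block_a": np.array([0.400, 0.0, 0.025]),
        "block_b": np.array([0.450, 0.0, 0.025]),
        "block_c": np.array([0.425, 0.0, 0.075]),
    }))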

Related Work

We would like to highlight some other work on language-guided robot planning:

  • InterPreT uses a similar iterative-improvement mechanism, learning predicates and refining domain representations from human-provided feedback.
  • The LASP and COWP planners also use a hybrid architecture to improve existing PDDL plans through error feedback from execution.
  • Tom Silver's work at MIT has recently evaluated both LLM code generation for PDDL planning tasks and predicate invention for planning abstraction.
BibTeX

    
    @misc{byrnes2024climblanguageguidedcontinuallearning,
      title={CLIMB: Language-Guided Continual Learning for Task Planning with Iterative Model Building},
      author={Walker Byrnes and Miroslav Bogdanovic and Avi Balakirsky and Stephen Balakirsky and Animesh Garg},
      year={2024},
      eprint={2410.13756},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2410.13756},
    }