Long-Horizon Planning

Long-Horizon Planning (Phase E34) lets a Grampus agent break a complex, multi-step task into a structured DAG of subgoals, execute them in parallel where possible, and recover from failures automatically without restarting from scratch. Use it when a task needs more than 4–5 tool calls, when subgoal dependencies matter for correctness, or when you need reliable progress tracking across long-running work.

When to use planning vs. ReAct

	AgentRunner (ReAct)	PlanningRunner
Best for	Conversational turns, short tasks	Multi-step research, pipelines, batch jobs
Structure	Greedy step-by-step	Structured DAG of verified subgoals
Failure handling	Hits `max_iterations`	Retry → fallback → partial replan
Token cost	Full history passed every call	Scoped context per subgoal (82% reduction)
Parallelism	Sequential tool calls	Independent subgoals run concurrently
Overhead	None	1–2 extra LLM calls for complexity check + synthesis

The runner automatically detects simple tasks and delegates directly to the underlying AgentRunner without planning overhead (see Adaptive routing).

Prerequisites

pip install "grampus-ai[anthropic]"   # or openai

No additional dependencies — planning uses only the LLM client you already configure.

Minimal example

import asyncio
from grampus.core.models.anthropic import AnthropicClient
from grampus.core.types import AgentDefinition
from grampus.orchestration import AgentRunner, PlanningRunner, PlanningConfig
from grampus.tools.executor import ToolExecutor

async def main():
    client = AnthropicClient(api_key="...")
    executor = ToolExecutor(registry=...)   # your tool registry
    agent_runner = AgentRunner(client, executor)

    planner = PlanningRunner(
        agent_runner=agent_runner,
        model_client=client,
        model_id="claude-opus-4-7",   # powerful model for planning
        config=PlanningConfig(
            complexity_threshold=4,       # skip planning for simple tasks
            max_subgoals=10,
            max_replans=3,
            enable_lookahead=True,        # FLARE-style path simulation
            enable_parallel_subgoals=True,
        ),
    )

    agent_def = AgentDefinition(
        name="research-agent",
        model="claude-sonnet-4-6",
        system_prompt="You are a research assistant.",
        max_iterations=8,
    )

    result = await planner.run(
        "Research the top 5 Python async frameworks, compare their performance benchmarks, "
        "and write a summary with a recommendation.",
        agent_def,
        tool_names=["web_search", "read_url", "write_file"],
    )

    print(result.final_output)
    print(f"Subgoals completed: {result.completed_subgoals}")
    print(f"Replans triggered:  {result.replans_triggered}")

asyncio.run(main())

Architecture

User task
    │
    ▼
┌─────────────────────────────────────────────────────┐
│  Complexity gate (1 cheap LLM call)                 │
│  estimated_steps ≤ threshold → AgentRunner directly │
│  estimated_steps > threshold → full planning        │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│  Planner (powerful model)                           │
│  • Generates SubGoal DAG from task + tool list      │
│  • Validates: unique IDs, no cycles, valid deps     │
│  • Topological sort → parallel execution waves      │
└──────────────────────────┬──────────────────────────┘
                           │
              ┌────────────┴────────────┐
              │  Wave 0 (no deps)       │  Wave 1 (deps met)  ...
              │  SubGoal A  SubGoal B   │  SubGoal C
              │  (asyncio.gather)       │
              └─────────────────────────┘
                           │
              ┌────────────▼────────────┐
              │  For each subgoal:      │
              │  1. LookaheadSimulator  │  ← FLARE path scoring
              │  2. AgentRunner (scoped │  ← TDP: only task +
              │     context only)       │    completed summaries
              │  3. PostconditionVerify │  ← pass / partial / fail
              │  4. Retry if partial    │
              │  5. Try fallback if fail│
              └────────────┬────────────┘
                           │ subgoal FAILED after all retries?
                           ▼
              ┌────────────────────────┐
              │  Replanner             │  ← partial replan only
              │  Preserves completed   │    (Google DeepMind design)
              │  Generates new subgoals│
              └────────────┬───────────┘
                           │
                           ▼
              ┌────────────────────────┐
              │  Synthesis (1 call)    │  ← combine all outputs
              └────────────┬───────────┘
                           │
                           ▼
                       PlanResult
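The validation and wave-grouping step in the diagram can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `execution_waves` and the `deps` mapping (subgoal ID to the set of IDs it depends on) are hypothetical, and the grouping uses a standard Kahn-style topological sort.

```python
def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subgoal IDs into waves; each wave depends only on earlier waves.

    `deps` is a hypothetical shape: subgoal ID -> IDs it depends on.
    Dict keys are unique by construction, which covers the unique-ID check.
    """
    known = set(deps)
    # Validate: every dependency must refer to a known subgoal.
    for node, parents in deps.items():
        unknown = parents - known
        if unknown:
            raise ValueError(f"{node} depends on unknown subgoals: {sorted(unknown)}")

    remaining = {node: set(parents) for node, parents in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        # A subgoal is ready once all of its dependencies have been scheduled.
        ready = sorted(node for node, parents in remaining.items() if not parents)
        if not ready:
            # Nothing is ready but work remains, so the graph has a cycle.
            raise ValueError(f"cycle detected among: {sorted(remaining)}")
        waves.append(ready)
        for node in ready:
            del remaining[node]
        for parents in remaining.values():
            parents.difference_update(ready)
    return waves


plan = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(execution_waves(plan))  # [['A', 'B'], ['C'], ['D']]
```

Subgoals within one wave have no dependencies on each other, which is what makes it safe to run them concurrently (for example with `asyncio.gather`, as the diagram indicates).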

Adaptive routing

The planning layer first estimates task complexity with a single cheap LLM call. If the estimated tool-call count is at or below PlanningConfig.complexity_threshold (default 4), the runner delegates directly to AgentRunner.run() without creating a plan. This eliminates planning overhead on simple conversational tasks.

config = PlanningConfig(
    complexity_threshold=4,   # tasks needing ≤4 tool calls skip planning
)

Change complexity_threshold to tune the break-even point for your workload.
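The routing decision itself reduces to one comparison. The sketch below is only an illustration of the boundary behaviour described above; `choose_route` and its return labels are hypothetical names, not part of the Grampus API.

```python
def choose_route(estimated_steps: int, complexity_threshold: int = 4) -> str:
    """Return which path handles the task (hypothetical helper).

    Estimates at or below the threshold skip planning entirely.
    """
    if estimated_steps <= complexity_threshold:
        return "agent_runner"
    return "planning"


print(choose_route(4))  # agent_runner
print(choose_route(5))  # planning
```

Note that the comparison is inclusive: with the default threshold, a task estimated at exactly 4 tool calls still bypasses planning.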

Scoped context (TDP)

Each subgoal executor receives a fresh context containing only:

Global task description — one sentence for orientation
Completed subgoal summaries — one line per finished step: - id: output_summary
Current subgoal — description + verifiable success criterion

The full conversation history is not passed. This is the core of Task-Decoupled Planning (arXiv 2601.07577) and reduces token usage by ~82% on long plans while confining error propagation to the active subgoal.
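As a rough illustration of the three ingredients listed above, a scoped context could be assembled like this. The function name, argument names, and the exact text layout are assumptions made for the sketch; only the `- id: output_summary` line format comes from the description above.

```python
def build_scoped_context(
    task: str,
    completed: dict[str, str],
    description: str,
    criterion: str,
) -> str:
    """Assemble a per-subgoal context (hypothetical helper, illustrative layout).

    `completed` maps finished subgoal IDs to their one-line output summaries.
    """
    lines = [f"Task: {task}", "Completed subgoals:"]
    # One line per finished step, in the "- id: output_summary" format.
    for subgoal_id, summary in completed.items():
        lines.append(f"- {subgoal_id}: {summary}")
    lines.append(f"Current subgoal: {description}")
    lines.append(f"Success criterion: {criterion}")
    return "\n".join(lines)


print(
    build_scoped_context(
        task="Compare Python async frameworks",
        completed={"s1": "found 5 candidate frameworks"},
        description="Collect benchmark numbers",
        criterion="each framework has at least one benchmark figure",
    )
)
```

The context grows by one summary line per completed subgoal rather than by the full transcript of each step, which is where the token savings come from.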

Lookahead path simulation

When enable_lookahead=True (the default), the runner generates lookahead_paths candidate execution approaches before committing to each subgoal. The approach with the highest estimated success score is injected as a hint into the subgoal executor's prompt.

This is a lightweight version of the FLARE trajectory simulation from "Why Reasoning Fails to Plan" (arXiv 2601.22311). Lookahead is advisory: if parsing fails for any reason, the executor proceeds without a hint — it never blocks execution.

config = PlanningConfig(
    enable_lookahead=True,
    lookahead_paths=2,   # number of candidate paths per subgoal
)

Retry and fallback logic

For each subgoal, the executor runs this control flow:

execute() → verify()
    PASS   → subgoal COMPLETED
    PARTIAL → retry (up to max_retries times)
    FAIL   → try fallback_strategy (one attempt)
           → if still fails: subgoal FAILED → trigger Replanner

The fallback_strategy field on a SubGoal is a plain-English description of an alternative approach the LLM should try if the primary strategy fails. The planner populates it automatically; you can also set it explicitly when constructing subgoals for tests or manual plans.
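The control flow above can be sketched as a small function. Everything here is illustrative: `Verdict`, `run_subgoal`, and the default `max_retries=2` are hypothetical, and the sketch assumes that a result still PARTIAL after the last retry is treated like a FAIL, which the flow above does not state explicitly.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "pass"
    PARTIAL = "partial"
    FAIL = "fail"


def run_subgoal(
    execute: Callable[[str], str],
    verify: Callable[[str], Verdict],
    primary: str,
    fallback: str,
    max_retries: int = 2,
) -> str:
    """Run one subgoal: retry on PARTIAL, try the fallback once on FAIL."""
    verdict = verify(execute(primary))
    for _ in range(max_retries):
        if verdict is not Verdict.PARTIAL:
            break
        verdict = verify(execute(primary))  # retry the primary strategy
    if verdict is Verdict.PASS:
        return "COMPLETED"
    # FAIL, or (by assumption) still PARTIAL after the last retry:
    # a single attempt with the fallback strategy.
    if verify(execute(fallback)) is Verdict.PASS:
        return "COMPLETED"
    return "FAILED"  # the caller would hand this to the replanner


# Scripted run: partial result, then a failure, then the fallback succeeds.
script = iter([Verdict.PARTIAL, Verdict.FAIL, Verdict.PASS])
attempts: list[str] = []


def fake_execute(strategy: str) -> str:
    attempts.append(strategy)
    return f"output of {strategy}"


status = run_subgoal(fake_execute, lambda _output: next(script), "search the web", "read cached docs")
print(status, attempts)
```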

Partial replanning

When a subgoal fails after all retries and its fallback, the Replanner is called. It receives:

The original task
All completed subgoals and their outputs (unchanged)
The failed subgoal and its failure reason
The remaining planned subgoals (now invalidated)

The replanner generates only the new downstream subgoals; completed work is preserved. This follows the Google DeepMind Subgoal Framework (arXiv 2603.19685), which shows that partial replanning reduces cost compared with a full replan.

config = PlanningConfig(
    max_replans=3,   # hard cap on replan cycles before raising PlanningError
)

If max_replans is reached, PlanningError(code="MAX_REPLANS_EXCEEDED") is raised.
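The bookkeeping behind a partial replan can be illustrated without any LLM calls. In this sketch the names `apply_replan` and `ReplanLimitError` are hypothetical stand-ins (the latter for `PlanningError`), and the point at which the cap is checked, before starting another replan cycle, is an assumption.

```python
class ReplanLimitError(RuntimeError):
    """Hypothetical stand-in for PlanningError(code="MAX_REPLANS_EXCEEDED")."""


def apply_replan(
    plan: list[str],
    completed: set[str],
    replacement: list[str],
    replans_so_far: int,
    max_replans: int = 3,
) -> list[str]:
    """Keep completed subgoals, drop the failed and invalidated ones, append new ones."""
    if replans_so_far >= max_replans:
        raise ReplanLimitError("MAX_REPLANS_EXCEEDED")
    kept = [subgoal_id for subgoal_id in plan if subgoal_id in completed]
    return kept + replacement


# s3 failed, so s3 and the downstream s4 are replaced; s1 and s2 are untouched.
new_plan = apply_replan(
    plan=["s1", "s2", "s3", "s4"],
    completed={"s1", "s2"},
    replacement=["s3b", "s4b"],
    replans_so_far=0,
)
print(new_plan)  # ['s1', 's2', 's3b', 's4b']
```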

Using `planning_node` in a Graph

Wrap PlanningRunner as a graph node for composable multi-step pipelines:

from grampus.orchestration import Graph, planning_node, human_node

handler = planning_node(
    planning_runner=planner,
    agent_def=agent_def,
    tool_names=["web_search", "write_file"],
    memory_context_key="memory_context",   # reads from state.metadata
)

async def route(state):
    plan = state.metadata.get("plan_result", {})
    return "review" if not plan.get("success") else "end"

graph = (
    Graph(graph_id="research-pipeline")
    .add_node("plan", handler, entry=True)
    .add_conditional_edge("plan", route, {"review": "review", "end": None})
    .add_node("review", human_node("Planning failed — please review."))
)

The node appends an ASSISTANT message with final_output and stores the full PlanResult dict in state.metadata["plan_result"]. The state status is set to COMPLETED on success and to FAILED when PlanResult.success is False.

PlanResult fields

Field	Type	Description
`task`	`str`	Original user task
`plan`	`Plan`	Final plan version executed (may be a replan)
`final_output`	`str`	Synthesized answer from all completed subgoals
`completed_subgoals`	`list[str]`	IDs of successfully completed subgoals
`failed_subgoals`	`list[str]`	IDs of subgoals that could not be completed
`replans_triggered`	`int`	Number of replan cycles that occurred
`total_token_usage`	`TokenUsage | None`	Accumulated token usage
`duration_seconds`	`float`	Wall-clock duration
`success`	`bool`	`True` when all subgoals completed without failures
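A short helper shows how these fields might be read after a run. The `summarize` function is hypothetical, and the `SimpleNamespace` object only stands in for a real PlanResult so the sketch runs on its own; the attribute names are the ones from the table above.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Build a one-line status report from PlanResult-style attributes."""
    status = "ok" if result.success else "incomplete"
    return (
        f"{status}: {len(result.completed_subgoals)} completed, "
        f"{len(result.failed_subgoals)} failed, "
        f"{result.replans_triggered} replans, "
        f"{result.duration_seconds:.1f}s"
    )


# Stand-in object with the same attribute names as PlanResult (illustrative values).
fake_result = SimpleNamespace(
    success=False,
    completed_subgoals=["s1", "s2"],
    failed_subgoals=["s3"],
    replans_triggered=1,
    duration_seconds=12.34,
)
print(summarize(fake_result))  # incomplete: 2 completed, 1 failed, 1 replans, 12.3s
```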

Cost model

Call type	When	Model tier
Complexity estimate	Once per run	`fast`
Plan creation	Once per run (+ once per replan)	`powerful`
Lookahead	Once per subgoal (if enabled)	`fast`
Subgoal execution	Once+ per subgoal (via AgentRunner)	`balanced`
Verification	Once+ per subgoal	`fast`
Synthesis	Once per run	`balanced`

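For a 6-subgoal plan with no replanning and lookahead enabled, the planning layer makes roughly 15 LLM calls (1 complexity + 1 plan + 6 lookahead + 6 verify + 1 synthesis = 15). Subgoal execution adds at least one AgentRunner call per subgoal on top of that, and any retries or fallbacks add further execution and verification calls.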
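The planning-layer tally from the table can be reproduced with a small function. This is a back-of-the-envelope sketch: `planning_layer_calls` is a hypothetical helper, it counts exactly one lookahead and one verification call per subgoal, and it deliberately excludes the calls AgentRunner makes inside each subgoal as well as any retries.

```python
def planning_layer_calls(subgoals: int, replans: int = 0, lookahead: bool = True) -> int:
    """Count planning-layer LLM calls, excluding subgoal execution and retries.

    Simplification: replans add plan-creation calls only; any change in the
    number of subgoals after a replan is not modelled.
    """
    complexity = 1                          # once per run
    plan_creation = 1 + replans             # once per run, plus once per replan
    lookahead_calls = subgoals if lookahead else 0
    verification = subgoals                 # at least once per subgoal
    synthesis = 1                           # once per run
    return complexity + plan_creation + lookahead_calls + verification + synthesis


print(planning_layer_calls(6))                   # 15
print(planning_layer_calls(6, lookahead=False))  # 9
```

Disabling lookahead removes one call per subgoal, which is the largest single lever on planning-layer cost for long plans.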

Wire in a CostTracker to get a full accounting:

from grampus.orchestration import CostTracker

tracker = CostTracker(agent_id="research-agent", session_id="s1", budget_usd=0.50)
planner = PlanningRunner(
    agent_runner=agent_runner,
    model_client=client,
    model_id="claude-opus-4-7",
    cost_tracker=tracker,   # accumulates spend against budget_usd
)

Research basis

Design decision	Source
Greedy step selection fails on long horizons	"Why Reasoning Fails to Plan", arXiv 2601.22311 (Jan 2026)
Scoped context reduces tokens 82%	Task-Decoupled Planning (TDP), arXiv 2601.07577 (Jan 2026)
Fallback before replanning doubles success rate	ReAcTree, arXiv 2511.02424 (AAMAS 2026)
Partial replan, preserve completed subgoals	Google DeepMind Subgoal Framework, arXiv 2603.19685 (Mar 2026)
Adaptive engagement avoids overhead on simple tasks	"Learning When to Plan", arXiv 2509.03581
DAG structure enables parallel subgoal execution	TDP + ReAcTree

See also the Orchestration API reference for full type definitions.