The agent loop
GantryEngine runs one loop per task. Each iteration has four steps.
The loop continues until the LLM produces an answer with no tool calls.
START
│
▼
memory_recall ──── injects top-3 relevant past results (first step only)
│
▼
observe ─────────── takes a screenshot / accessibility snapshot (if perception= is set)
│
▼
think ───────────── LLM receives full message history → decides: call tools or finish
│
▼
act ─────────────── runs each tool call; approval gate fires here if configured
│
▼
review ──────────── did the LLM return tool calls? yes → back to observe. no → END
What each step does
memory_recall fires only on the first iteration. It searches long-term memory
(if configured) with the task as the query and prepends the top results to the
message history. Skipped if no memory= is attached.
observe calls perception.observe() and appends the result (screenshot, URL,
accessibility tree) as a HumanMessage. Skipped if no perception= is attached.
Screenshot diffing is automatic — if the screen hasn't changed since the last step,
the image is omitted to save vision tokens.
think sends the full message history to the LLM. The response is either:
- A list of tool calls → continue to
act - A plain text answer with no tool calls →
reviewwill terminate the loop
act runs each tool call from the think response:
- Checks the guardrail approval list — if the tool requires approval, calls
approval_callback(or interrupts ifuse_interrupt=True) - Resolves secret aliases in arguments (if
secrets=is configured) - Executes the tool and records the result as a
ToolMessage - Catches all exceptions — errors become
ToolMessage(status="error")so the LLM can self-correct
review checks whether the last LLM message contained tool calls. If not, the task is
considered complete and the loop exits.
Creating an engine
from gantrygraph import GantryEngine
from langchain_anthropic import ChatAnthropic
agent = GantryEngine(
llm=ChatAnthropic(model="claude-sonnet-4-6"),
# What the agent can see
perception=..., # DesktopScreen, WebPage, or custom BasePerception
# What the agent can do
tools=[...], # BaseAction, MCPClient, or @gantry_tool functions
# How many steps before stopping
max_steps=50,
max_consecutive_errors=5, # stop after N back-to-back tool errors
# Context and instructions
system_prompt="...",
# Observability
on_event=lambda e: print(e),
telemetry="stdout", # "stdout" | "langsmith" | "silent"
# Security
approval_callback=...,
guardrail=..., # GuardrailPolicy
budget=..., # BudgetPolicy
secrets=..., # GantrySecrets
# Memory
memory=..., # InMemoryStore or ChromaMemory
# State persistence (suspend/resume)
checkpointer=...,
enable_suspension=False,
# Token cost controls
perception_mode="auto", # "auto" | "axtree" | "vision"
message_window=None, # keep only the last N messages in context
enable_caching=False, # Anthropic prompt cache on system messages
)
All parameters except llm are optional.
Running a task
=== "Sync"
result = agent.run("List the 5 largest files in /tmp")
# blocks until the task finishes
=== "Async"
result = await agent.arun("List the 5 largest files in /tmp")
# use this in async code — avoids nested event loop issues
=== "Stream events"
async for event in agent.astream_events("List the 5 largest files"):
print(event.event_type, event.step, event.data)
# yields GantryEvent after each node transition
Observability shortcut
telemetry= is the fastest way to see what the agent is doing:
GantryEngine(llm=..., telemetry="stdout") # colored output to console — zero setup
GantryEngine(llm=..., telemetry="langsmith") # sets LANGCHAIN_TRACING_V2=true automatically
GantryEngine(llm=..., telemetry="silent") # suppresses all gantrygraph logging
For structured logging or custom sinks, pass on_event= instead:
def log(event):
print(f"[{event.event_type}] step={event.step} {event.data}")
agent = GantryEngine(llm=..., on_event=log)
Token cost controls
Four orthogonal knobs that together reduce token spend by 5–10× on long tasks:
| Parameter | Default | Savings |
|---|---|---|
perception_mode="axtree" |
"auto" |
~80% per observe step (text tree vs screenshot) |
message_window=20 |
None |
Caps O(N²) history growth to O(N) |
enable_caching=True |
False |
Up to 90% on Anthropic system-message tokens |
ShellTools(max_output_chars=2000) |
2000 |
Prevents megabyte log dumps |
from gantrygraph import GantryEngine
from gantrygraph.actions import ShellTools
from langchain_anthropic import ChatAnthropic
agent = GantryEngine(
llm=ChatAnthropic(model="claude-sonnet-4-6"),
tools=[ShellTools(max_output_chars=2000)],
perception_mode="axtree", # accessibility tree only — no screenshots
message_window=20, # messages[0] + last 20 messages sent to LLM
enable_caching=True, # cache_control on system messages
max_steps=50,
)
perception_mode values:
"auto"(default) — sends the accessibility tree when available, screenshot as fallback"axtree"— always text-only; fails fast on vision-only setups (canvas apps, PDFs)"vision"— always screenshot; use for pixel-level tasks (CAPTCHA, image editing)
See the Cost optimization guide for a full breakdown.
Limits and budget
| Parameter | Default | What it does |
|---|---|---|
max_steps |
50 |
Hard cap on act-node executions |
max_consecutive_errors |
5 |
Stop after N back-to-back tool failures — catches infinite loops |
BudgetPolicy.max_wall_seconds |
None |
Wall-clock timeout per run — raises TimeoutError |
BudgetPolicy.max_tokens |
None |
Cumulative token cap — raises BudgetExceededError |
BudgetPolicy.on_limit |
"stop" |
"stop" raises on breach; "warn" logs and continues |
from gantrygraph.security import BudgetPolicy
agent = GantryEngine(
llm=...,
max_steps=20,
budget=BudgetPolicy(
max_wall_seconds=60.0,
max_tokens=10_000,
on_limit="stop",
),
)
Loading config from environment
from gantrygraph import GantryConfig
from langchain_anthropic import ChatAnthropic
cfg = GantryConfig.from_env()
agent = cfg.build(llm=ChatAnthropic(model="claude-sonnet-4-6"))
from gantrygraph import GantryEngine
from langchain_anthropic import ChatAnthropic
agent = GantryEngine(
llm=ChatAnthropic(model="claude-sonnet-4-6"),
# What the agent can see
perception=..., # DesktopScreen, WebPage, or custom BasePerception
# What the agent can do
tools=[...], # BaseAction, MCPClient, or @gantry_tool functions
# How many steps before stopping
max_steps=50,
max_consecutive_errors=5, # stop after N back-to-back tool errors
# Context and instructions
system_prompt="...",
# Observability
on_event=lambda e: print(e),
telemetry="stdout", # "stdout" | "langsmith" | "silent"
# Security
approval_callback=...,
guardrail=..., # GuardrailPolicy
budget=..., # BudgetPolicy
secrets=..., # GantrySecrets
# Memory
memory=..., # InMemoryStore or ChromaMemory
# State persistence (suspend/resume)
checkpointer=...,
enable_suspension=False,
# Token cost controls
perception_mode="auto", # "auto" | "axtree" | "vision"
message_window=None, # keep only the last N messages in context
enable_caching=False, # Anthropic prompt cache on system messages
)
0
Advanced — raw graph access
get_graph() returns the compiled LangGraph StateGraph. Use it to inject custom nodes,
checkpointers, or streaming:
graph = agent.get_graph()
# CompiledStateGraph — call ainvoke, astream, get_state, etc.
See the API reference for a full custom-loop example.
See also: Guardrails · Quickstart · API reference