Long-Horizon Agent Capabilities · Final plan · planning primitive · builds on #1476
Implementation Plan · Long-Horizon Agents

Long-horizon agents — plan, approve, execute

Some jobs are bigger than one reply: follow up with 142 quiet leads, clean a segment, run a multi-step play. Long-horizon capability lets an agent research read-only, present a structured plan, get policy-driven approval (human review, or auto for low-risk), then hand execution to a durable engine. Modeled on Claude Code plan mode; opt-in per agent. Bulk SMS is the first long-horizon task.

Shift · planning is a per-agent ai_agent capability you toggle on
Rides on · #1476's tiered system-prompt architecture (Tier 0 / 0.5 / 1)
Diverges from CC · our plan is a structured executable artifact, not advisory text
↳ The pivot from v1: there is no bulk-specific "planner stage." Planning is a capability any agent can be given — the owner ticks "let this agent plan" in its config. A planning-enabled agent gets a planning instruction tier (sibling to #1476's PLATFORM_CORE) and the present_plan tool; it researches read-only, presents a plan, gets policy-driven approval (human or auto), and an executor runs it. Bulk SMS is one PlanAction; the bulk operation is created only after approval and runs exactly as today.

Decision · TL;DR

PRIMITIVE

A first-class AgentPlan any agent can produce: research → present_plan → approve → execute. Lives in mods/ai_agent.

PER-AGENT TOGGLE

"Let this agent plan" is a config flag (any agent, default off). On → a planning prompt tier + the present_plan tool. The agent does the tool-use itself; no bespoke planner extractor.

APPROVAL = POLICY

HumanReview · Auto · RiskGated. Bulk SMS → HumanReview (sends real messages). Low-risk plans can auto-approve.

BULK = CONSUMER

A BulkSmsCampaign plan action. On approval it creates the bulk operation + spawns today's drafting engine. Engine barely changes.

What this buys over v1

Reusable across every agent (not bulk-only) · the agent's real reasoning + tools build the audience (not a one-shot extractor) · auto-approval for low-risk tasks falls out of the same model · and the bulk engine shrinks back to "create + draft + send," with the pre-drafting work owned by the plan.

01 Why a planning primitive

Planning a complex task is a capability we want for any agent, not a feature of one workflow. Claude Code already models this well — we adapt it.

What we borrow from Claude Code plan mode

  • Read-only plan phase. The agent explores with safe/read tools; mutations are withheld until a plan is approved. (In CC, writes never auto-approve in plan mode.)
  • Present → approve → execute. A discrete proposal, a decision, then execution under a chosen permission posture.
  • Approval is a policy, not always a human. CC's permission modes (default · acceptEdits · plan · auto · dontAsk · bypassPermissions) span interactive → autonomous, and canUseTool / hooks decide approval programmatically per action.

Where we deliberately diverge

In Claude Code the plan is free-form markdown, consumed once by a human while the same agent keeps going and executes its own tool calls. We can't do that here — a separate durable engine executes (we rejected "agent sends 200 messages in a thread," option ⑤). So our plan is a structured, executable artifact a downstream worker consumes. We take CC's lifecycle and approval model; the plan itself is typed. This is the "plan-and-execute" agent pattern rather than "plan-then-continue."

Approval policy ← permission modes

| Claude Code mode | Posture | Our ApprovalPolicy |
| --- | --- | --- |
| plan → review & approve | human gates the plan | HumanReview: persist plan, surface card, pause for approve/edit/reject |
| auto / acceptEdits | proceeds with soft/no gate | Auto: approved on submission, execute immediately |
| canUseTool / permission_policy | programmatic per-action | RiskGated: a hook inspects the plan (gate high-impact, auto low-impact) |

"Some plans auto-approve so the human isn't in the loop" = Auto / RiskGated. Sending real SMS is high-impact → HumanReview in v1, but the enum + hook exist so other agents/tasks can auto-approve without a rebuild.

Lands directly on #1476's tiered system prompt

#1476 is rebuilding agent instructions as layers in build_system_prompt: Tier 0 PLATFORM_CORE (identity · "you act only by calling tools" · operator-vs-contact trust · grounding), Tier 0.5 org_rules + agent_rules (from config_payload), Tier 1 delivery_directive. Planning is just another tier — a PLANNING_BLOCK the executor injects when the agent is planning-enabled, exactly like it injects PLATFORM_CORE. No new prompt machinery; we add one gated block + one tool.

02 Anatomy of an AgentPlan

A general envelope (human-readable + auditable) wrapping a typed, extensible action. The action is what an executor knows how to run.

struct AgentPlan {
    id: Uuid,
    organization_id: Uuid,
    agent_id: Uuid,              // which agent authored it (e.g. the assistant clone)
    thread_id: Option<Uuid>,    // the conversation it came from
    title: String,             // "Follow up with 142 quiet quote leads"
    summary: String,           // human-readable rationale, rendered at review
    action: PlanAction,        // the typed, executable payload (below)
    approval_policy: ApprovalPolicy,
    status: PlanStatus,
    created_by: Option<Uuid>,
    approved_by: Option<Uuid>,
    approved_at: Option<DateTime>,
    executed_ref: Option<Uuid>,    // e.g. the assistant_operation it spawned
}

enum PlanAction {            // extensible — one variant implemented in v1
    BulkSmsCampaign {
        audience: Vec<ContactDecision>,  // { contact_id, include, reason, kind }
        message_concept: String,        // the idea, not the final text
    },
    // future: ContactCleanup { … }, ScheduleFollowUps { … }, EnrichContacts { … }
}

enum ApprovalPolicy { HumanReview, Auto, RiskGated }
enum PlanStatus { Drafting, AwaitingApproval, Approved, Rejected, Executing, Done, Failed }

Structure comes free from the tool schema

The agent fills this by calling present_plan(plan) — a built-in ai_agent tool whose input schema is the plan. So we get the agent's tool-use reasoning and a typed artifact, with none of the "force a submit tool to get JSON" fragility — the plan submission is that tool. (CC's ExitPlanMode is the same idea; ours just carries structure because an engine executes it.)

03 The plan lifecycle

user: "follow up with everyone who got a quote last month and went quiet"
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────────┐
│ AGENT TURN · read-only planning (assistant clone, its own tools)     │
│  • list/segment contacts · check opt-out · read tags/deals/notes     │
│  • decide include/exclude (+reason) · shape the message concept      │
│  • call present_plan( BulkSmsCampaign { audience, concept } )        │
└───────────────────────────────┬──────────────────────────────────────┘
                                ▼
       AgentPlan persisted (status: drafting→…)   resolve ApprovalPolicy
                 ┌──────────────┴───────────────┐
     HumanReview │                              │ Auto / RiskGated(low)
                 ▼                              ▼
 ╔════ PLAN REVIEW (thread card) ════╗    status: Approved
 ║ audience: in / out + reasons      ║          │
 ║ message concept (editable)        ║          │
 ║ approve · edit · reject           ║          │
 ╚═══════════════╤═══════════════════╝          │
                 ▼ approve                      │
        status: Approved ◀──────────────────────┘
                 ▼
        PlanExecutor.dispatch(action)
                 │  BulkSmsCampaign →
                 ▼
        create_bulk_operation(included, concept) + spawn_draft_engine
                 │  (opt-out hard-filter safety net · existing durable engine)
                 ▼
        status: drafting → awaiting_approval ◀── GATE 2 (existing draft sheet, per message)
                 ▼ approve_operation
        sending → completed

Two gates remain, but they're now different kinds: plan approval (general, in the thread — the audience + the idea) and draft review (bulk-specific, the existing sheet — the actual messages). Auto-approval skips the first; the second stays a deliberate safety stop for sending real SMS.
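The lifecycle in the diagram can be read as a small state machine over PlanStatus. The sketch below is one possible encoding, not the planned implementation: the enum variants come from the plan, while the can_transition helper and the exact set of allowed edges are assumptions inferred from the diagram.

```rust
/// Lifecycle states, copied from the plan's PlanStatus enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PlanStatus {
    Drafting,
    AwaitingApproval,
    Approved,
    Rejected,
    Executing,
    Done,
    Failed,
}

/// Returns true when `from -> to` is a legal lifecycle step.
/// Hypothetical helper: the edge list is inferred from the diagram.
fn can_transition(from: PlanStatus, to: PlanStatus) -> bool {
    matches!(
        (from, to),
        // HumanReview (or RiskGated that decides to gate): wait for a person.
        (PlanStatus::Drafting, PlanStatus::AwaitingApproval)
            // Auto (or RiskGated that decides to proceed): approved on submission.
            | (PlanStatus::Drafting, PlanStatus::Approved)
            // The plan card's approve / reject buttons.
            | (PlanStatus::AwaitingApproval, PlanStatus::Approved)
            | (PlanStatus::AwaitingApproval, PlanStatus::Rejected)
            // PlanExecutor picks up approved plans only.
            | (PlanStatus::Approved, PlanStatus::Executing)
            | (PlanStatus::Executing, PlanStatus::Done)
            | (PlanStatus::Executing, PlanStatus::Failed)
    )
}
```

Under this encoding a plan can never reach Executing without passing through Approved, which is the property the read-only invariant depends on.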

04 Approval & auto-approve

One resolver decides whether a plan pauses for a human. It's the seam where "agent autonomy" is tuned.

fn resolve_approval(agent: &Agent, plan: &AgentPlan) -> Decision {
    match plan.approval_policy {
        ApprovalPolicy::HumanReview => Decision::Gate,
        ApprovalPolicy::Auto        => Decision::Proceed,
        ApprovalPolicy::RiskGated   => assess_risk(plan),   // the canUseTool / permission_policy analog
    }
}
// v1 assess_risk: sends messages OR spends money OR mutates > N rows ⇒ Gate; else Proceed.
  • v1 stance: BulkSmsCampaign is hardwired to HumanReview — real messages, real cost, irreversible. No auto-send in v1.
  • The hook is the autonomy dial. Per-agent default policy + per-action override + assess_risk. Today it always gates bulk; tomorrow a trusted agent can auto-approve a read-only enrichment plan.
  • Audit: every plan records policy + decision + approver + timestamp, so an auto-approved plan is as inspectable as a gated one.
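The v1 assess_risk rule stated in the comment above (sends messages OR spends money OR mutates more than N rows) could look roughly like this. It is a sketch under assumptions: RiskProfile and its fields are hypothetical, the function takes that profile rather than the full AgentPlan, and the value chosen for N is a placeholder because the plan does not fix it.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Gate,
    Proceed,
}

/// Hypothetical risk facts that a PlanAction would report about itself.
struct RiskProfile {
    sends_messages: bool,
    spends_money: bool,
    rows_mutated: u32,
}

/// Placeholder for "N" in "mutates > N rows"; the plan leaves N unspecified.
const MAX_AUTO_MUTATED_ROWS: u32 = 25;

/// Gate if any high-impact condition holds, otherwise let the plan proceed.
fn assess_risk(profile: &RiskProfile) -> Decision {
    if profile.sends_messages
        || profile.spends_money
        || profile.rows_mutated > MAX_AUTO_MUTATED_ROWS
    {
        Decision::Gate
    } else {
        Decision::Proceed
    }
}
```

With these inputs a BulkSmsCampaign would always gate (it sends messages), while a read-only enrichment plan with no mutations would proceed.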

Read-only planning is the safety invariant

The planning turn must not mutate — it only proposes. The single mutation (creating + sending the operation) happens in PlanExecutor after approval. That mirrors CC's "no writes in plan mode" and makes auto-approval safe to reason about: an un-approved plan has changed nothing.

05 Bulk SMS as the first consumer

The feature you started from is now one PlanAction + one executor handler. Everything campaign-specific lives here; the primitive stays generic.

v1 design (retired)

  • Bespoke bulk_plan_engine_service running a one-shot CampaignPlan extractor.
  • New planning / awaiting_plan_approval statuses on the bulk operation.
  • Audience + concept baked into the bulk engine.

v2 design (this)

  • The assistant agent plans via its own tool-use; emits present_plan(BulkSmsCampaign).
  • Plan lifecycle lives in ai_agent; the operation is created only after approval and starts at drafting as today.
  • delegate_bulk_operation is no longer agent-facing — it becomes the executor handler.

What happens to delegate_bulk_operation

Today the agent calls it to create-and-send directly. After: the agent calls present_plan instead; the PlanExecutor calls the existing create_bulk_operation + spawn_draft_engine path (the body of today's delegate) on approval. The create/spawn code is reused verbatim — only its trigger moves from "agent tool" to "plan execution."

06 The executor we reuse

The bulk engine is already durable and well-built. It executes the approved plan unchanged, save for consuming the concept.

| Piece | Where | Role under the plan | Change |
| --- | --- | --- | --- |
| create_bulk_operation + spawn_draft_engine | services/bulk_operation_engine_service.rs | Executor handler for BulkSmsCampaign | reuse |
| Drafter extractor | services/follow_up_drafter_extractor_service.rs | Per-contact draft; now also takes the concept | small change |
| Gate 2 review sheet | components/operation_review_sheet_component.rs | Per-message review before send | reuse |
| Cards · websockets · reaper | operation_card · bulk_operation_event_type · reap_stale…job | Live progress + durability | reuse |
| Statuses | bulk_operation_status_type.rs | Unchanged: drafting → awaiting_approval → sending | reuse |

Compliance gap to close regardless (Phase 1)

create_bulk_operation pre-skips only no_phone — it never consults contact_channel_opt_out. Keep a deterministic opt-out hard-filter in the executor as a safety net even though the planner already avoids opted-out contacts. Defense in depth; ships on its own.
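A deterministic hard-filter of the kind described here might be sketched as follows. Everything in it is illustrative: Contact, SkipReason and hard_filter are hypothetical names, the opt-out set stands in for a contact_channel_opt_out lookup, and "dupe" is assumed to mean a repeated contact id.

```rust
use std::collections::HashSet;

/// Minimal stand-in for a contact row; field names are illustrative.
struct Contact {
    id: u64,
    phone: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum SkipReason {
    NoPhone,
    OptedOut,
    Duplicate,
}

/// Splits the approved audience into sendable contact ids and deterministic skips.
/// `opted_out` stands in for the ids found in contact_channel_opt_out.
fn hard_filter(
    audience: &[Contact],
    opted_out: &HashSet<u64>,
) -> (Vec<u64>, Vec<(u64, SkipReason)>) {
    let mut seen = HashSet::new();
    let mut keep = Vec::new();
    let mut skipped = Vec::new();
    for c in audience {
        if !seen.insert(c.id) {
            // Same contact listed twice: only the first occurrence is considered.
            skipped.push((c.id, SkipReason::Duplicate));
        } else if c.phone.as_deref().map_or(true, |p| p.trim().is_empty()) {
            skipped.push((c.id, SkipReason::NoPhone));
        } else if opted_out.contains(&c.id) {
            skipped.push((c.id, SkipReason::OptedOut));
        } else {
            keep.push(c.id);
        }
    }
    (keep, skipped)
}
```

Because the filter is a pure function of the audience and the opt-out set, it gives the same answer no matter what the planner decided, which is what makes it usable as a safety net.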

07 Data model

One new general table; a two-column touch on the bulk operation. The audience/exclusions live in the plan, so the item schema is untouched.

New · ai_agent_plan (general)

CREATE TABLE ai_agent_plan (
  id               UUID PRIMARY KEY,
  organization_id  UUID NOT NULL REFERENCES organization(id),
  agent_id         UUID NOT NULL REFERENCES ai_agent(id),
  thread_id        UUID REFERENCES ai_thread(id),
  title            VARCHAR(255) NOT NULL,
  summary          TEXT NOT NULL,
  action_type      VARCHAR(48) NOT NULL,      -- "bulk_sms_campaign"
  action_payload   JSONB NOT NULL,            -- the typed PlanAction (audience + concept)
  approval_policy  VARCHAR(24) NOT NULL,      -- human_review | auto | risk_gated
  status           VARCHAR(24) NOT NULL,      -- drafting | awaiting_approval | …
  created_by       UUID REFERENCES "user"(id),
  approved_by      UUID REFERENCES "user"(id),
  approved_at      timestamptz,
  executed_ref     UUID,                      -- the assistant_operation it spawned
  created_at       timestamptz NOT NULL,
  updated_at       timestamptz NOT NULL
);
CREATE INDEX idx_ai_agent_plan_org_status ON ai_agent_plan (organization_id, status);

Touch · assistant_operation (change)

ALTER TABLE assistant_operation
  ADD COLUMN message_concept TEXT,                -- the approved idea, used by the drafter
  ADD COLUMN plan_id         UUID REFERENCES ai_agent_plan(id);  -- provenance

No new column: the toggle lives in config_payload (reuse)

Planning is opt-in per agent via the existing ai_agent.config_payload JSONB (same place #1476 reads agent_rules): planning_enabled: bool (default false) + plan_approval: "human_review"|"auto"|"risk_gated" (default human_review). No schema change on ai_agent — mirrors how agent_rules is stored, and keeps the cached static prompt prefix byte-stable when off.

What we no longer need (vs v1)

No new bulk-operation statuses; no item excluded status; no item exclusion columns; no AiArea::AssistantBulkPlan. The pre-drafting phase is the plan's, not the operation's — items exist only for included contacts and draft as today.


Schemas are gitignored & auto-generated

After the migration, run just generate (the dev DB must already be migrated). Never hand-edit src/bases/db/schemas/.

08 Planning as an agent capability

Not assistant-only. Any agent can be given planning — the owner toggles it on, and the runtime layers the capability onto #1476's tiered prompt.

The toggle (per-agent config)

Planning rides on the same config_payload that #1476/#1478 use for agent_rules. Two keys, default off:

// ai_agent.config_payload
{
  "agent_rules": "…",                 // (#1476)
  "planning_enabled": true,           // "Let this agent plan complex tasks" — default false
  "plan_approval": "human_review"     // human_review | auto | risk_gated — default human_review
}
  • Surfaced in the agent config UI (the #1478 instructions/rules editor): a "Planning" toggle + an approval-mode selector ("Always review my agent's plans" / "Let it proceed on low-risk plans"). This is the "user can check if they want the agents to plan" control.
  • Generic across agent types — Text Reply, voice, the assistant, custom agents. Default off means today's behaviour is unchanged until an owner opts in.
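The two keys and their defaults can be resolved in one small function. The following is a sketch only: PlanningConfig and resolve_planning_config are hypothetical names, the raw values are passed in as already-extracted options rather than parsed from config_payload, and falling back to human_review for an unrecognized string is an assumption (the plan specifies only the default for a missing key).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalPolicy {
    HumanReview,
    Auto,
    RiskGated,
}

/// Resolved planning settings for one agent (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
struct PlanningConfig {
    planning_enabled: bool,
    plan_approval: ApprovalPolicy,
}

/// `None` means the key is absent from config_payload.
fn resolve_planning_config(enabled: Option<bool>, approval: Option<&str>) -> PlanningConfig {
    let plan_approval = match approval {
        Some("auto") => ApprovalPolicy::Auto,
        Some("risk_gated") => ApprovalPolicy::RiskGated,
        // Missing key or "human_review" use the documented default.
        // Unrecognized values also fall back to it (assumption: fail safe).
        _ => ApprovalPolicy::HumanReview,
    };
    PlanningConfig {
        planning_enabled: enabled.unwrap_or(false), // default off
        plan_approval,
    }
}
```

An agent whose config_payload has neither key resolves to planning off with human review, so existing agents behave as they do today.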

What turning it on does — two gated additions

1 · A planning prompt tier

  • The executor injects a PLANNING_BLOCK into build_system_prompt — a sibling tier to #1476's PLATFORM_CORE, gated by planning_enabled exactly like include_platform_core.
  • It tells the agent: for a complex, multi-step, or high-impact task, don't act immediately — research with read tools, then call present_plan and wait for approval.
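A gated tier can be pictured as one more conditional push while the prompt is assembled. This sketch is not #1476's build_system_prompt: the PromptInputs struct, the placeholder block text, the blank-line separator, and the position of PLANNING_BLOCK directly after PLATFORM_CORE are all assumptions made for illustration.

```rust
// Placeholder tier text; the real blocks belong to #1476 and to this plan.
const PLATFORM_CORE: &str = "[PLATFORM_CORE]";
const PLANNING_BLOCK: &str = "[PLANNING_BLOCK]";

/// Hypothetical inputs to prompt assembly.
struct PromptInputs<'a> {
    include_platform_core: bool,
    planning_enabled: bool,
    org_rules: &'a str,
    agent_rules: &'a str,
    delivery_directive: &'a str,
}

/// Joins the enabled tiers in order, skipping empty ones.
fn build_system_prompt(p: &PromptInputs) -> String {
    let mut tiers: Vec<&str> = Vec::new();
    if p.include_platform_core {
        tiers.push(PLATFORM_CORE); // Tier 0
    }
    if p.planning_enabled {
        tiers.push(PLANNING_BLOCK); // gated sibling tier, absent when off
    }
    tiers.push(p.org_rules); // Tier 0.5
    tiers.push(p.agent_rules); // Tier 0.5
    tiers.push(p.delivery_directive); // Tier 1
    tiers
        .into_iter()
        .filter(|t| !t.is_empty())
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

When planning_enabled is false the output is identical to a prompt built without any planning code path, which is the byte-stability property the data-model section relies on.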

2 · The present_plan tool

  • Added to the agent's toolset when planning_enabled (like attach_domain_tools gates the domain bundle).
  • Input schema = the AgentPlan envelope. On call the runtime persists the plan, resolves plan_approval, then gates (card) or auto-dispatches.

The planning turn itself

  • List-aware via existing domain tools. A domain agent already attaches the Session-bound toolset (attach_domain_tools). Planning reuses those reads — contacts, tags, custom fields, recent messages, opt-out — to build and justify the audience. No new planner model area.
  • Read-only invariant. During a planning turn the runtime withholds mutating tools (send SMS, write contact) and offers present_plan as the terminal action. The only mutation is post-approval execution. (This is the one place #1476's "you act only by calling tools" contract needs a planning-mode complement — withhold the writing tools.)
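One way to picture the read-only posture is as a filter over the toolset offered for a turn. The sketch below is hypothetical throughout: Tool, ToolKind, tools_for_turn and the example tool names are illustrative, and it assumes each tool is already tagged as read or mutating, which the plan does not specify.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolKind {
    Read,
    Mutating,
    /// The plan submission itself: ends the planning turn.
    Terminal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Tool {
    name: &'static str,
    kind: ToolKind,
}

/// Returns the tools offered for this turn.
/// In a planning turn, mutating tools are withheld and present_plan is added.
fn tools_for_turn(domain_tools: &[Tool], planning_turn: bool) -> Vec<Tool> {
    if !planning_turn {
        return domain_tools.to_vec();
    }
    let mut tools: Vec<Tool> = domain_tools
        .iter()
        .filter(|t| t.kind == ToolKind::Read)
        .cloned()
        .collect();
    tools.push(Tool { name: "present_plan", kind: ToolKind::Terminal });
    tools
}
```

Withholding the tools, rather than asking the model not to call them, means the invariant holds even if the prompt instruction is ignored.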

This is where "the agent does the tool use" lives

Unlike v1's one-shot extractor, the audience is the product of the agent actually querying the CRM with its tools and reasoning over the results — then committing the result as the present_plan argument. Structured output, real reasoning, no extra model area — and it's the same mechanism whether the agent is the assistant or a Text Reply agent.

09 Execution & the worker

On approval, a thin dispatcher maps the action to a handler. The per-contact drafter barely changes.

async fn dispatch(plan: AgentPlan) -> Result<(), AppError> {
    match plan.action {
        PlanAction::BulkSmsCampaign { audience, message_concept } => {
            let included = audience.iter().filter(|d| d.include).map(|d| d.contact_id);
            let op = create_bulk_operation(included, message_concept, …).await?;  // opt-out safety net inside
            spawn_draft_engine(op.operation_id);
            set_executed_ref(plan.id, op.operation_id).await?;
        }
    }
    Ok(())
}

Drafter today

  • draft_follow_up_message(org, persona, instruction, context)
  • system = persona + HARD_RULES

Drafter after

  • draft_follow_up_message(org, persona, concept, instruction, context)
  • system = persona + concept + HARD_RULES (concept from the approved plan)

Fan-out (20 concurrent), claim batch (50), prefetch, counters, websockets, send path — all untouched.

10 UI surfaces

Plan card (new · in the agent thread)

  • Title + summary; audience counts ("142 in · 31 out").
  • Grouped exclusions w/ reasons (opted out · already signed · off-campaign).
  • Editable message concept; Approve · Edit · Reject.
  • Generic to AgentPlan — renders any action's summary; bulk adds an audience detail view.
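The counts and grouped exclusions on the card can be derived directly from the plan's audience. A minimal sketch, assuming a trimmed ContactDecision (the plan's version also carries a kind field, omitted here because its type is not given) and hypothetical AudienceSummary, summarize and counts_label names:

```rust
use std::collections::BTreeMap;

/// Trimmed stand-in for the plan's ContactDecision.
struct ContactDecision {
    contact_id: u64,
    include: bool,
    reason: String,
}

/// What the plan card needs for its default (collapsed) view.
struct AudienceSummary {
    included: usize,
    excluded: usize,
    /// Exclusion reason -> number of contacts, sorted by reason for stable rendering.
    exclusions_by_reason: BTreeMap<String, usize>,
}

fn summarize(audience: &[ContactDecision]) -> AudienceSummary {
    let mut summary = AudienceSummary {
        included: 0,
        excluded: 0,
        exclusions_by_reason: BTreeMap::new(),
    };
    for decision in audience {
        if decision.include {
            summary.included += 1;
        } else {
            summary.excluded += 1;
            *summary
                .exclusions_by_reason
                .entry(decision.reason.clone())
                .or_insert(0) += 1;
        }
    }
    summary
}

/// Renders the counts in the card's "N in · M out" form.
fn counts_label(summary: &AudienceSummary) -> String {
    format!("{} in · {} out", summary.included, summary.excluded)
}
```

Grouping by the free-text reason only works if the agent reuses the same wording for the same cause; a typed reason would be more robust, but the plan does not define one.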

Draft sheet (existing · Gate 2)

  • Reused unchanged — per-message review before send.
  • OperationCard + use_realtime_operation keep live progress.
  • Surfaces once the executor has created the operation.

Progressive disclosure: the plan card defaults to summary + concept + counts; the full included/excluded lists expand on demand (mirrors the engine review sheet and the UX-principles disclosure rules). Auto-approved plans skip the card and post a "plan auto-approved → executing" note instead.

11 Learning loop (later)

Both decisions are signal, and now they attach to a clean entity — the plan.

Plan-level signal

  • Audience edits (operator drops / re-includes) → audience-judgment lessons.
  • Concept rewrites + approve/reject → strategy lessons.

Draft-level signal

  • Per-message edits in the existing sheet → voice lessons.

Both feed the authoring agent's ai_agent_learning / ai_agent_lesson (evergreen, versioned; weekly agent cadence). Because the plan records its agent_id, this works for any planning agent, not just the campaign case.

12 Build phases

Build the primitive thin, shaped by the one real consumer; prove it with bulk SMS. Compliance fix lands first.

  • P1 · Opt-out pre-filter (compliance) · S

    Add opt-out + dupe to create_bulk_operation's deterministic skip. Independent of everything else — closes the live gap now.

    bulk_operation_engine_service.rs · opt-out read helper

  • P2 · ai_agent_plan entity + lifecycle · M

    Migration + types (PlanStatus, ApprovalPolicy, PlanAction envelope); persistence; resolve_approval + assess_risk stub. Run just generate afterwards.

    migration/… · mods/ai_agent/types/… · services/ai_agent_plan_service.rs (new)

  • P3 · present_plan tool + PLANNING_BLOCK tier + read-only posture · L

    New ai_agent tool (schema = AgentPlan); runtime persists, resolves policy, gates or auto-dispatches. Add the PLANNING_BLOCK prompt tier to build_system_prompt + offer present_plan, both gated by config_payload.planning_enabled; withhold mutating tools during a planning turn. Builds on #1476's tiered build_system_prompt — rebase on it.

    mods/ai_agent/tools/present_plan_tool.rs (new) · build_system_prompt_service.rs (#1476) · run_ai_agent_thread_service.rs · tool_registry

  • P4 · PlanExecutor + BulkSmsCampaign handler · M

    Dispatcher; BulkSmsCampaign → create_bulk_operation + spawn_draft_engine; thread message_concept into the drafter; set executed_ref. Retarget delegate_bulk_operation as the handler.

    services/plan_executor_service.rs (new) · bulk_operation_engine_service.rs · follow_up_drafter_extractor_service.rs · delegate_bulk_operation.rs

  • P5 · Plan APIs + review card · M

    GET plan · PUT plan (edit concept / include-exclude, opt-out re-include blocked) · POST approve / reject. Plan card in the thread; auto-approve note path.

    mods/ai_agent/api/ai_agent_plan_api.rs (new) · components/agent_plan_card_component.rs (new)

  • P6 · Per-agent toggle + config UI · M

    Read/write config_payload.planning_enabled + plan_approval; add a "Planning" toggle + approval-mode selector to the agent instructions/rules editor. Lands in #1478's config UI (or extends it). Default off → no behaviour change until an owner opts in.

    #1478 agent config UI · ai_agent config DTOs / api

  • P7 · Learning signals (later) · M

    Plan-gate + draft-gate edits → ai_agent_lesson on the authoring agent. Gated behind the flow shipping.

    ai_agent learning services · plan card + draft sheet

  • 1 new general table (ai_agent_plan)
  • 2 columns added to assistant_operation
  • 0 new bulk-operation statuses
  • 1 PlanAction variant in v1

13 Open decisions

D-1

Plan = structured artifact, or free-form text like Claude Code?

  • Free-form markdown — only works if the same agent re-reads & executes (the ⑤ path we rejected).
Default → structured. The human-readable summary field covers the "readable plan" need; the typed action covers execution.
D-2

How general in v1?

  • Build the full multi-action planner framework now.
Default → thin-but-general. The envelope is reusable; we don't speculate on actions we don't have yet.
D-3

Does the planning turn pause-and-resume, or hand off?

  • Pause-and-resume the same agent turn after approval (closer to CC, but holds a turn open across human latency).
Default → hand off. The thread shows the plan card; approval triggers the executor; progress streams back over the existing websocket.
D-4

Keep Gate 2 (per-message draft review) under the plan model?

  • Let plan approval subsume drafts (auto-approve sends) — faster, riskier; make it a policy later.
Default → keep both for v1; collapsing to one gate becomes an ApprovalPolicy choice later.
D-5

"Already signed / off-campaign" signal source

  • Block semantic exclusions until a first-class deal/contract-stage signal exists.
Default → best-effort from existing signals (plans/deals aren't a prod signal — see §15).
D-6

Who can plan, and what's the default?

  • Hardwire planning to the assistant only.
Resolved (you asked for this) → per-agent toggle, default off, all agent types. Bulk SMS proves it on whichever agent an owner enables; approval mode (plan_approval) is a sibling setting on the same toggle.

14 Files to touch

New

  • migration/src/m{ts}_create_ai_agent_plan.rs (+ assistant_operation columns)
  • mods/ai_agent/types/agent_plan_*.rs (plan, status, policy, action)
  • mods/ai_agent/services/ai_agent_plan_service.rs
  • mods/ai_agent/services/plan_executor_service.rs
  • mods/ai_agent/tools/present_plan_tool.rs
  • mods/ai_agent/api/ai_agent_plan_api.rs
  • components/agent_plan_card_component.rs

Changed

  • build_system_prompt_service.rs (#1476 — add the PLANNING_BLOCK tier, gated by planning_enabled)
  • run_ai_agent_thread_service.rs (read-only posture · gate present_plan + PLANNING_BLOCK on planning_enabled)
  • tool_registry_service.rs (offer present_plan when enabled)
  • #1478 agent config UI + config DTOs/api (the planning toggle + approval selector)
  • services/bulk_operation_engine_service.rs (opt-out filter · concept)
  • services/follow_up_drafter_extractor_service.rs (concept arg)
  • ai/rig/tools/assistant/delegate_bulk_operation.rs (becomes executor handler)

15 Risks & non-goals


Scope creep — keep the primitive thin.

A general planning subsystem invites over-design. v1 ships one action and a hand-off lifecycle. Resist building a multi-step plan DAG, plan templates, or a risk-scoring engine before a second consumer exists.


Read-only planning isn't enforced for free.

The runtime must actually withhold mutating tools during a planning turn (or an agent could send before approval). This is the one place the CC "no writes in plan mode" guarantee has to be re-implemented; get it right in P3.


Semantic exclusion is bounded by signals.

"Contract already signed" needs that state where the agent's tools can read it (tag/custom field/note/message). Plans/deals aren't a prod signal. Promise a reviewable first pass, not perfect curation — Gate 1 is the catch.

  • Non-goal: agent-per-contact drafting (⑤) or a live operation thread (④). Drafting stays a cheap extractor the executor fans out.
  • Non-goal: auto-sending bulk SMS in v1. BulkSmsCampaign is always HumanReview; auto-approve is for future low-risk actions.
  • Non-goal: per-segment concepts, multi-action plans, plan templates — the envelope leaves room; none built now.