
Platform Architecture

A layered, platform-portable architecture. Any MCP-capable orchestrator connects to the same tool server — same tools, same workflows, different model or UI. Bold Penguin's reference orchestrator is the Deep Agent, a production Python service built on deepagents and LangChain.

Design Principles

1. Platform Portable

Claude, ChatGPT, Gemini, or custom agentic app — all connect to the same MCP tool server. Change the orchestrator; tool flows stay the same. The Deep Agent itself is portable across providers — primary and fallback models are configured per-turn via LLM_PROVIDER_ORDER (default anthropic,openai,google_genai).

2. Per-Task Model Selection

Each pipeline stage uses the best model for the job. Structured subagents (MQS, CQS) run on a fast model (CLAUDE_FAST_MODEL, default claude-haiku-4-5). Reasoning-heavy subagents (planner, critique, mkt_intel, quote) run on the primary model (CLAUDE_MODEL, default claude-sonnet-4-6). Model choices are tool-internal and swappable, invisible to the orchestrator.
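As a sketch, this per-task routing reduces to a small lookup keyed off the env vars named above. The subagent set and the helper are illustrative, not the production code:

```python
import os

# Structured subagents run on the fast model; everything else runs on the
# primary model. Env var names and defaults follow the text above; the
# helper itself is a hypothetical illustration.
FAST_SUBAGENTS = {"mqs_agent", "cqs_agent"}

def model_for_subagent(name: str) -> str:
    if name in FAST_SUBAGENTS:
        return os.environ.get("CLAUDE_FAST_MODEL", "claude-haiku-4-5")
    return os.environ.get("CLAUDE_MODEL", "claude-sonnet-4-6")
```

Because the choice lives behind the tool/agent boundary, swapping a model is a config change, not an orchestrator change.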

3. Deterministic Harness

LLMs extract and analyze. Deterministic code validates, scores, routes, and decides. The LLM never self-scores its own output. HITL gates are raised by explicit tool calls (request_* tools), not by prompt heuristics — the graph pauses when the tool fires, regardless of what the model wants to do next.

4. Human-in-the-Loop Gates

The Deep Agent exposes explicit pipeline stages where execution can pause and wait for broker input. The external SSE contract documents eight gate stages today; an additional two (awaiting_generate_quotes, awaiting_post_quote_actions) are defined in the PipelineStage enum and wired up in the HITL tool layer but are surfaced through the internal /chat/stream endpoint rather than the external SSE contract. The system never auto-proceeds past a gate; every resume carries an action_id or resume_options envelope back through the same streaming endpoint.

Layered Architecture

| Layer | Role | Components |
| --- | --- | --- |
| Orchestrator | Platform portable | Any MCP-capable orchestrator — Claude, ChatGPT, Gemini, or custom agentic app. Bold Penguin's reference implementation is the Deep Agent (create_deep_agent) — a Python FastAPI service that coordinates seven specialized subagents. |
| Protocol | Standard interface | MCP (Model Context Protocol) over streamable HTTP, stdio, or AWS AgentCore IAM/SigV4. Tool definitions, argument schemas, and async task patterns are protocol-level. Any MCP client discovers and calls tools without custom integration. The Deep Agent's default client is MCP_CLIENT_TYPE=iam (AWS AgentCore SigV4); alternatives are a static MCP_BEARER_TOKEN or OAuth via BP_AUTH_URL + BP_API_KEY + CLIENT_ID + CLIENT_SECRET. A separate X-API-Key (API_KEY env var) secures the FastAPI endpoints. |
| Tool Layer | One or more MCP servers | Insurance Intelligence MCP Server (universal_mcp_server via AWS AgentCore) — primary server, multiple domains: enrichment, applications, market intelligence, document ingestion. Salesforce MCP (stdio via @tsmztech/mcp-server-salesforce) — optional CRM server configured in app/config/mcp_servers.json. A separate Policy Document Analyzer MCP exists in the broader platform (handles policy extraction / comparison) but is not currently bound to the Deep Agent. |
| Data & AI | Per-task models | Primary LLM (CLAUDE_MODEL), fast LLM (CLAUDE_FAST_MODEL), fallback providers (LLM_PROVIDER_ORDER), NLP / Critic / Analysis (cross-model adversarial review), Embeddings (local semantic search), Persistence (submissions, audit trail — Partner Engine is source of truth). |

Key Insight

Change the orchestrator — tool flows stay the same. Change the model per task — protocol stays the same. Change the UI — audit trail stays the same.

End-to-End Pipeline

The full pipeline runs through these stages, with red gates representing human-in-the-loop stop points where the system never auto-proceeds:

User Intent → 🔴 Plan Confirmation → Intake / Enrichment → 🔴 Critique Review → 🔴 Discrepancy Corrections → Application Create (MQS) → 🔴 Application Consent → Market Intelligence → 🔴 Carrier Selection → 🔴 Carrier Confirmation → Carrier Questions (CQS) → 🔴 Application Answers → 🔴 Quote Selection → 🔴 Generate Quotes → Quoting → 🔴 Post-Quote Actions → Quote Summary

Not every turn visits every gate — gates fire only when the agent needs a human decision. For example, awaiting_user_input fires only when the critique agent surfaces discrepancies; awaiting_application_answers fires only when MQS/CQS exhaust automated answer strategies; and awaiting_quote_selection only fires when CQS identifies carriers with open questions (if all carriers are auto-ready the pipeline goes straight to awaiting_generate_quotes). awaiting_post_quote_actions pauses after quotes come back so the broker can compare, request more carriers, or edit enriched data.
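The CQS branch described above amounts to one deterministic check. A minimal sketch, assuming cqs_summary is the dict shape produced by cqs_agent (carriers_with_questions / carriers_auto):

```python
# If any confirmed carrier still has open questions, the broker picks
# which carriers to quote; otherwise the pipeline skips straight to
# quote generation. Stage names come from the Pipeline Stages table.
def next_stage_after_cqs(cqs_summary: dict) -> str:
    if cqs_summary.get("carriers_with_questions"):
        return "awaiting_quote_selection"
    return "awaiting_generate_quotes"
```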

Agent Details

Full pipeline reference — agents, descriptions, MCP tools, products, and services. The Deep Agent itself is the coordinator; the table below lists the seven production subagents it delegates to.

| Step | Agent | Description | MCP Tools | Products | Services / APIs |
| --- | --- | --- | --- | --- | --- |
| 1 — Planning | planner_agent | Runs first on every new request. Analyzes the user's intent and produces a structured step-by-step execution plan. Not re-run on HITL resume — the plan persists in state.plan. | write_plan_to_state_tool | — | — |
| 2 — Intake | intake_agent | Business data enrichment and document extraction. Given company name + address, fetches NAICS codes, financials, and contacts. Supports direct-upload and S3 ingestion. Generates enriched_data and an enriched_data_summary (~500-token compact summary used by downstream agents). | enrich_company_data, do_document_ingestion_from_s3, initiate_document_upload, do_document_submission_by_tx_id, do_data_inquiry | SubmissionLink | Universal Submit API, Submission Status Inquiry API, Data Inquiry API, Company Submit API, Location Submit API |
| 3 — Critique | critique_agent | Two modes. Mode A: reviews enriched and extracted data and surfaces issues for user review before the pipeline proceeds. Mode B: processes user corrections or approval and writes an outcome back to state. Always writes to artifacts.data_quality_checks. | read_intake_enriched_data_tool, read_intake_extracted_data_tool, write_critique_to_state_tool | SubmissionLink, JackIQ | JackIQ API |
| 4 — Master Questions | mqs_agent | Creates the application and fills master (common) questions. Uses a 3-tier answer strategy: (1) find_answers from submission data, (2) intelligent guesses, (3) request_user_answers as last resort. Runs on the fast model (Haiku) when the primary provider is Anthropic; falls back to CLAUDE_MODEL / OPENAI_MODEL / GOOGLE_MODEL when not. | create_application, update_application, find_incomplete_master_question, find_answers, find_business_type, find_carrier_class, get_application_summary | PartnerEngine, Terminal | Partner Engine API |
| 5 — Market Intel | mkt_intel_agent | Predicts which carriers are likely to bind using the market intelligence API. Writes carriers_likely_to_bind and mkt_intel_data to state. | get_market_intelligence (plus get_market_recommendation, currently disabled) | Market Intelligence | Market Intelligence API, Market Recommendation API |
| 6 — Carrier Questions | cqs_agent | Fills carrier-specific questions for carriers confirmed by the user. Only processes questions scoped to confirmed carriers. Uses the same 3-tier answer strategy as mqs_agent. Produces cqs_summary (carriers_with_questions / carriers_auto) so the main agent knows whether to raise awaiting_quote_selection. Also runs on the fast model (same non-Anthropic fallback as MQS). | find_incomplete_questions_by_carrier, find_answers, update_application, find_business_type, find_carrier_class | CarrierEngine, PartnerEngine, Terminal | Partner Engine API |
| 7 — Quote Summary | quote_agent | Reads quote_status and quote_responses from state, fetches a live application summary if needed, and produces a broker-friendly markdown quote table (Carrier / Status / Premium / Limits / Deductible). | get_application_summary | ClauseLink, PartnerEngine | Clause Link API |

Subagents are defined two ways: code factories in app/agents/ and JSON definitions in app/config/subagents.json. JSON definitions, when present, override the code factory for the same subagent key — this lets prompts and tool lists be tuned without redeploying.
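The override rule can be pictured as a plain dict merge. A sketch only — resolve_subagents is a hypothetical name, not the actual loader:

```python
# JSON definitions win on key collision, so prompts and tool lists can be
# tuned in app/config/subagents.json without touching the code factories.
def resolve_subagents(code_factories: dict, json_defs: dict) -> dict:
    resolved = dict(code_factories)
    resolved.update(json_defs)  # JSON overrides the code factory for the same key
    return resolved
```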


Orchestrator: the Deep Agent

The Deep Agent is the reference orchestrator that drives the pipeline above.

Middleware Stack

create_deep_agent is invoked once for the main agent and supplies a default middleware stack (filesystem, summarization, prompt caching, patched tool calls, todo tracking, subagent dispatch). The Deep Agent extends that default with one additional middleware:

  • ModelFallbackMiddleware — on primary-model error, re-runs the call against the next provider in LLM_PROVIDER_ORDER. Only added when at least one fallback model is configured.

Each subagent built by _build_subagent gets its own explicit stack, applied in this order:

  1. TodoListMiddleware — tracks plan progress as structured todos.
  2. FilesystemMiddleware — exposes a scratchpad filesystem tool.
  3. SummarizationMiddleware — compresses older messages when the window exceeds 50K tokens, keeping the most recent 6 messages verbatim. Defaults are patched at module load so the trigger fires at this budget rather than the library default.
  4. AnthropicPromptCachingMiddleware — attaches Anthropic prompt-caching headers so the static system prompt + tool definitions are cached across turns. Configured with unsupported_model_behavior="ignore" so non-Anthropic models pass through unchanged.
  5. PatchToolCallsMiddleware — normalizes tool-call IDs across providers so LangGraph can restart after a HITL pause without loss of trace context.

Subagents are compiled with recursion_limit=100.
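Stripped of LangChain specifics, ModelFallbackMiddleware's behavior is a try-next-provider loop over LLM_PROVIDER_ORDER. A hedged sketch, with plain callables standing in for model clients:

```python
# Illustrative fallback loop: try each provider in the configured order
# and return the first success. The real middleware wraps LangChain model
# calls, not bare functions.
def call_with_fallback(providers: dict, order: str, prompt: str):
    errors = {}
    for name in order.split(","):
        try:
            return providers[name](prompt)
        except Exception as exc:  # broad catch is deliberate in a fallback path
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {list(errors)}")
```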

Session Storage

Run state is persisted between turns so a HITL-paused run can resume on the next request. The backend is pluggable via PERSISTENCE_LAYER:

  • filesystem — JSON files under AGENT_STATE_DIR (.agent_state/{run_id}.json).
  • mongodb — MongoDB collections via SMARTDATA_MONGO_CLUSTER_URL (production default).
  • mongodb_and_filesystem — both, with Mongo as primary and filesystem as a local cache.

Side-file data (HITL requests, tool outputs staged for the next turn) is written via get_persistence_provider().write_side_data(...) into a tool_data collection keyed by run_id + side_key.
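A filesystem-flavored sketch of that side-data contract, keyed by run_id + side_key. Paths and the class name are illustrative; the real provider also backs onto MongoDB:

```python
import json
import pathlib

# Minimal sketch of the side-data read/write contract described above.
class FilesystemPersistence:
    def __init__(self, root: str = ".agent_state"):
        self.root = pathlib.Path(root)

    def _path(self, run_id: str, side_key: str) -> pathlib.Path:
        return self.root / "tool_data" / f"{run_id}.{side_key}.json"

    def write_side_data(self, run_id: str, side_key: str, payload: dict) -> None:
        path = self._path(run_id, side_key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payload))

    def read_side_data(self, run_id: str, side_key: str) -> dict:
        return json.loads(self._path(run_id, side_key).read_text())
```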

Run State Schema

Each run is a Pydantic AgentRunState model serialized to a nested dict via to_dict():

```
run_id, timestamp, thread_id, user_request
chat_history, enriched_data, enriched_data_summary, extracted_data
artifacts:
  risk_profile, firmographics, data_quality_report, data_quality_checks
  carriers_likely_to_bind, mkt_intel
  application: { state, data, mqs_data, cqs_data, eligible_carriers }
  quotes: { status, responses }
  plan: { steps, summary, created_at, todos, current_step, status }
  execution_log
pipeline:
  submission_state, carrier_selection, carriers_for_quoting, user_provided_answers
field_audit, message_history, event_log, user_info, errors, cost_tracking
```

application_state is one of none | mqs_complete | ready_for_quoting. submission_state.pipeline_stage is the current HITL stage (see Pipeline Stages).
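A trimmed approximation of that shape, using dataclasses where the real model is Pydantic, with fields reduced to the ones this section discusses:

```python
from dataclasses import asdict, dataclass, field

# Illustrative stand-in for the Pydantic AgentRunState; to_dict() mirrors
# the nested-dict serialization described above.
@dataclass
class SubmissionState:
    pipeline_stage: str = "none"

@dataclass
class AgentRunState:
    run_id: str
    user_request: str = ""
    application_state: str = "none"  # none | mqs_complete | ready_for_quoting
    submission_state: SubmissionState = field(default_factory=SubmissionState)
    artifacts: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)
```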


Streaming Contract

All production traffic flows through the Deep Agent's external streaming SSE endpoint. External integrators reach this via the A2A protocol rather than calling it directly; the request envelope is:

```json
{
  "message": "string",
  "run_id": "string | null",
  "resume_options": "ResumeOptions | null",
  "action_id": "string | null",
  "payload": {}
}
```

action_id is the preferred resume mechanism — the server maps confirm_plan, approve_critique, reply_critique, provide_corrections, confirm_application, select_all, select_specific, submit_answers, get_quotes, and bind / download / done to the equivalent resume_options shape.
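That mapping is effectively a lookup table. A partial, illustrative sketch (the server-side table covers more actions than shown here; resume shapes follow the Pipeline Stages table):

```python
# Partial action_id → resume_options mapping, for illustration only.
ACTION_TO_RESUME = {
    "confirm_plan": {"confirm_plan": True},
    "approve_critique": {"critique_reply": ""},
    "confirm_application": {"confirm_application": True},
    "select_all": {"carrier_choice": "all"},
}

def resolve_resume(action_id, resume_options):
    """action_id is the preferred mechanism; explicit resume_options is the fallback."""
    if action_id is not None:
        return ACTION_TO_RESUME[action_id]
    return resume_options
```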

SSE Events

| Event | Purpose |
| --- | --- |
| run_id | Emitted immediately — clients persist this to resume on the next turn |
| content | Per-token LLM output from the main agent |
| subagent_content | Per-token LLM output from a subagent (tagged with agent) |
| mkt_intel | Carriers-likely-to-bind list written to state |
| application | Application / MQS checkpoint |
| eligible_carriers | Eligible-carrier list written to state |
| quotes | Quote status + responses |
| usage | Per-call token counts |
| state | Incremental message snapshot |
| hitl_pause | Pipeline paused at a HITL gate — carries pipeline_stage, resume_hint, and ui_component |
| cost | Turn cost summary (by_model + totals) — emitted before done |
| done | Final payload with full run_state |
| error | Terminal failure |
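Since the contract is plain SSE, a client can consume it with a few lines of frame parsing. A minimal sketch — not the A2A client integrators actually use:

```python
# Parse a raw SSE byte stream (already decoded to str) into
# (event, data) pairs. Frames are separated by blank lines; each frame
# carries "event:" and one or more "data:" lines.
def parse_sse(raw: str):
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data = "message", []
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        events.append((event, "\n".join(data)))
    return events
```

A real client would dispatch on the event name: persist run_id, render content tokens, and pause the UI on hitl_pause.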

Pipeline Stages

The full PipelineStage enum in app/models/submission.py defines ten awaiting_* pause states. The first eight are emitted on hitl_pause and done events by the external SSE contract; the last two currently surface only via the internal /chat/stream endpoint (and are visible on AgentRunState.submission_state.pipeline_stage either way):

| pipeline_stage | When | HITL tool | Required resume |
| --- | --- | --- | --- |
| awaiting_plan_confirmation | Planner proposes an execution plan | request_plan_confirmation | { confirm_plan: true } |
| awaiting_critique_review | Critique agent presents findings for review | request_critique_review | { critique_reply: "<feedback>" } or "" to approve |
| awaiting_user_input | Critique found discrepancies and requests corrections | request_discrepancy_review | { corrections: { <field>: <value> } } or { proceed_without_corrections: true } |
| awaiting_application_consent | Broker consent required before submitting application | request_application_consent | { confirm_application: true } |
| awaiting_carrier_selection | User selects carriers from the MI list | request_carrier_selection | { carrier_choice: "all" } or { carrier_choice: "specific", carriers: [...] } |
| awaiting_carrier_confirmation | User confirms the eligible-carrier list | request_carrier_confirmation | same as carrier_selection |
| awaiting_application_answers | MQS/CQS need answers to specific questions | request_user_answers | { application_answers: { <question_code>: <answer> } } |
| awaiting_quote_selection | CQS identified carriers with open questions; user selects which to quote | request_quote_selection | { carrier_choice: "specific", carriers: [...] } |
| awaiting_generate_quotes (internal) | CQS complete; broker confirms "generate quotes" | request_quote_generation | proceed message / --get-quotes action |
| awaiting_post_quote_actions (internal) | Quotes are back; broker chooses compare / more carriers / edit enriched data | request_post_quote_actions | POST_QUOTE_ACTION: directive in follow-up message |

Each hitl_pause event carries a self-describing ui_component spec (component name, gate_type, title, description, props, actions[]). Frontends render the gate from that spec instead of hardcoding component logic.


Use Case Flows

UC1 — Quote a New Risk

```mermaid
flowchart TD
  IN(["Company Name + Address, plus or minus Submission Docs"]):::input
  PLAN["planner_agent (write_plan_to_state)"]:::plan
  GP{{"PLAN CONFIRMATION"}}:::gate
  INTAKE["intake_agent (enrich_company_data, S3/upload ingestion)"]:::enrich
  CRIT["critique_agent (review_status=pending)"]:::critique
  GC{{"CRITIQUE REVIEW / DISCREPANCIES"}}:::gate
  APP["mqs_agent (create_application, find_answers)"]:::app
  GAC{{"APPLICATION CONSENT"}}:::gate
  MI["mkt_intel_agent (get_market_intelligence)"]:::mi
  G1{{"CARRIER SELECTION / CONFIRMATION"}}:::gate
  CQS["cqs_agent (per-carrier questions, find_answers)"]:::app
  GMQ{{"APPLICATION ANSWERS"}}:::gate
  GAS["quote_agent (get_application_summary)"]:::app
  G2{{"QUOTE SELECTION"}}:::gate
  OUT(["Digital Quote Objects"]):::output
  IN --> PLAN --> GP --> INTAKE --> CRIT --> GC --> APP --> GAC --> MI --> G1
  G1 --> CQS --> GMQ --> GAS --> G2 --> OUT
  CQS -. "loop until complete" .-> CQS
  GMQ -. "unanswerable? surface to broker" .-> GMQ
  classDef input fill:#92400e,color:#fef3c7,stroke:#b45309,stroke-width:2px
  classDef plan fill:#6d28d9,color:#fff,stroke:#5b21b6,stroke-width:2px
  classDef enrich fill:#0284c7,color:#fff,stroke:#0369a1,stroke-width:1px
  classDef app fill:#1d4ed8,color:#fff,stroke:#1e40af,stroke-width:1px
  classDef mi fill:#0891b2,color:#fff,stroke:#0e7490,stroke-width:1px
  classDef gate fill:#dc2626,color:#fff,stroke:#b91c1c,stroke-width:2px
  classDef critique fill:#d97706,color:#fff,stroke:#b45309,stroke-width:1px
  classDef output fill:#065f46,color:#d1fae5,stroke:#047857,stroke-width:2px
```

Legend: 🟤 Input · 🟣 Planning · 🔵 Enrichment · 🟠 Critique · 🔷 Application · 🟦 Market Intelligence · 🔴 HITL Gates

UC2 — Quote Comparison (Platform-adjacent, not Deep Agent)

The UC2 flow is implemented by the broader Bold Penguin platform via the Policy Document Analyzer MCP server and the submission inventory — it is not currently bound to the Deep Agent. An orchestrator wishing to run this flow today connects directly to the Policy Document Analyzer MCP.

```mermaid
flowchart TD
  DOCS(["Policy Documents\n(expiring policies, offline quotes,\nbinders, dec pages)"]):::input
  UC1(["UC1 Digital\nQuote Objects"]):::uc1
  EXT["extract_policy_document\n(async, per doc)"]:::extract
  POLL["check_extraction_status\n(poll)"]:::extract
  PE["PolicyExtraction\nJSON(s)"]:::data
  CMP["compare_extractions"]:::compare
  RND["Quote Recommendation\nReport"]:::compare
  OUT(["Recommendation\nReport"]):::output
  DOCS --> EXT --> POLL --> PE
  DOCS -. "insured info seeds\napplication for digital quotes" .-> UC1
  UC1 -. "digital quotes flow\nfrom UC1 quoting" .-> PE
  PE --> CMP --> RND --> OUT
  classDef input fill:#1e3a5f,color:#93c5fd,stroke:#1d4ed8,stroke-width:2px
  classDef extract fill:#d97706,color:#fff,stroke:#b45309,stroke-width:1px
  classDef data fill:#f3f4f6,color:#374151,stroke:#d1d5db,stroke-width:1px
  classDef compare fill:#059669,color:#fff,stroke:#047857,stroke-width:1px
  classDef output fill:#065f46,color:#d1fae5,stroke:#047857,stroke-width:2px
  classDef uc1 fill:#92400e,color:#fef3c7,stroke:#b45309,stroke-width:2px
```

Legend: 🔷 Policy Documents · 🟠 Policy Extraction (async) · 🟢 Comparison & Report · 🟤 UC1 Digital Quotes


Pipeline Deep Dives

The two sections below describe capabilities in the broader Bold Penguin platform. They run on separate MCP servers and pipelines that the Deep Agent does not currently bind to. They are retained here because orchestrators other than the Deep Agent (and, potentially, future Deep Agent releases) compose them with the Deep Agent pipeline.

Policy Document Extraction (Policy Document Analyzer MCP)

The extraction pipeline runs through these stages in sequence:

  1. PDF Reader — pdfplumber + pypdfium2 — raw text + table structures
  2. Document Splitter — statistical page scoring + pinning (declarations, endorsements, SOF) — 100K char budget
  3. LLM Extraction — 30+ rule system prompt — model-per-task (Sonnet, Gemini Pro)
  4. Deterministic Validation — Pydantic schema + arithmetic + cross-reference scrubbing
  5. Critic Agent — different foundation model reviews adversarially for missed fields and hallucinations
  6. Deterministic Scoring — auto-fix (high confidence) / flag for review / escalate
  7. Output — PolicyExtraction JSON + validation + audit trail

Smart Document Splitting

  • 100K character budget with statistical page scoring — 200-page policies trimmed before LLM sees them
  • Declarations pinning — 3-tier header detection + continuation by density score
  • Endorsement pinning — header + preamble detection, body continuation (5pg/endorsement cap, 30 total)
  • Schedule of Forms always included as the authoritative manifest

Critic Agent (Cross-Model Adversarial)

  • Never self-scores — a separate foundation model critiques the extraction
  • Deterministic checks first — SOF completeness (form count vs extracted forms), cross-reference integrity
  • Per-field confidence drives auto-fix vs. flag vs. escalate
  • Source quote verification — substring match against original text chunks confirms evidence

Audit Trail

  • Source page citations per extracted field
  • Validation results — errors, warnings, hints by severity
  • Critic feedback — what was found, auto-fixed, or flagged

Submission Enrichment Pipeline (Submission Link)

The pipeline below feeds data into the Deep Agent's intake_agent via enrich_company_data and do_data_inquiry; the trust/consensus/ontology work itself happens upstream in the Submission Link platform, not inside the Deep Agent.

  1. Document Extraction — structured data from ACORDs, carrier apps, loss runs, SOVs
  2. Company Enrichment — 50+ OOTB sources — NAICS, revenue, employees, legal entity
  3. Trust Scoring — proprietary per-source reliability scoring
  4. Consensus Detection — cross-source agreement algorithm — 3+ sources align = high confidence
  5. Critique Gating (HITL) — findings surfaced to broker as an awaiting_critique_review or awaiting_user_input pause — never auto-submitted
  6. Ontology Mapping (Deterministic) — 153+ PE answer codes to canonical fields
  7. Question Set Matching — canonical fields to ApplicationForm question codes
  9. Answer Provenance Tagging — answered_by_type + answered_by_source on every answer
  9. Completeness Loop — find gaps, fill, surface unanswerable to broker (HITL via awaiting_application_answers)
  10. Output — complete application — quotes auto-fire to selected carriers

Trust and Consensus Algorithms

  • Trust scoring — proprietary per-source reliability model assigns weights based on historical accuracy per field type
  • Consensus detection — cross-source agreement: when extracted, enriched, and third-party values converge, confidence increases
  • Triangulation — submission doc extraction + 50+ enrichment sources + third-party data compared and reconciled
  • Per-field confidence output — every data point tagged with trust score + consensus level

Deterministic Ontology

  • 153+ PE answer codes mapped to canonical SD Dictionary V0.86 field names
  • Priority hierarchy — ia_*_v3 > mqs_* > bold_penguin_* when multiple sources match
  • Entity groups — locations, vehicles, drivers, owners, WC classes with composite key logic
  • Mapping is a lookup table, not LLM inference
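That priority hierarchy can be expressed as a deterministic ranking function. The prefixes come from the text above; the candidate codes, values, and function names are illustrative:

```python
# Lower rank wins: ia_*_v3 beats mqs_*, which beats bold_penguin_*.
def source_rank(code: str) -> int:
    if code.startswith("ia_") and code.endswith("_v3"):
        return 0
    if code.startswith("mqs_"):
        return 1
    if code.startswith("bold_penguin_"):
        return 2
    return 3  # unknown sources lose to all recognized prefixes

def pick_answer(candidates: dict):
    """Return the value from the highest-priority source code."""
    best = min(candidates, key=source_rank)
    return candidates[best]
```

Because this is a pure lookup-and-rank, the same inputs always produce the same canonical answer — no LLM inference in the loop.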

Answer Provenance

Every answer is traceable to its source:

| Provenance Type | Source |
| --- | --- |
| submission_link_enriched | Third-party data — from enrichment APIs |
| submission_link_extracted | Document upload — from uploaded documents |
| submission_link_defaulted | Defaulted answers — configured defaults |

MCP Tool Registry

The Deep Agent connects to one or more MCP servers at startup via get_mcp_tools() (configured in app/config/mcp_servers.json). Each subagent declares a filtered toolset; not every subagent sees every tool.

Insurance Intelligence MCP (bound to Deep Agent)

This is the primary tool server — the universal_mcp_server reached over AWS AgentCore IAM/SigV4 by default. Every tool below is used by at least one Deep Agent subagent.

| Tool | Domain | Used by | Description |
| --- | --- | --- | --- |
| enrich_company_data | Enrichment | intake | Smart company profile — NAICS, financials, descriptions |
| do_document_ingestion_from_s3 | Intake | intake | Async ingestion of a submission doc from S3 |
| initiate_document_upload | Intake | intake | Obtain pre-signed upload URL for a submission doc |
| do_document_submission_by_tx_id | Intake | intake | Finalize a document submission by transaction id |
| do_data_inquiry | Intake | intake | Query insurance intelligence data by insured |
| find_business_type | Enrichment | mqs, cqs | Search for business type information |
| find_carrier_class | Enrichment | mqs, cqs | Search for carrier class codes |
| create_application | Application | mqs | Create application form + trigger quote submission |
| update_application | Application | mqs, cqs | Update answers on application form |
| find_incomplete_master_question | Application | mqs | Find missing required master questions |
| find_incomplete_questions_by_carrier | Application | cqs | Find missing carrier-specific questions |
| find_answers | Application | mqs, cqs | Find answers for specific question codes |
| get_application_summary | Application | mqs, quote | Application summary + quote request status |
| get_market_intelligence | Market Intel | mkt_intel | Carrier-specific MI predictions (product/NAICS/state) |
| get_market_recommendation | Market Intel | (disabled) | Carrier eligibility check — currently disabled in the pipeline |
| validate_token | Auth | (auth layer) | Validate authentication token |

Salesforce MCP (optional, bound via stdio)

Configured in mcp_servers.json, pulled in when SALESFORCE_INSTANCE_URL + SALESFORCE_CLIENT_ID + SALESFORCE_CLIENT_SECRET are set. Exposes the standard Salesforce tool set (record CRUD, SOQL queries) provided by @tsmztech/mcp-server-salesforce. No Deep Agent subagent uses these tools in the default pipeline — they're available to any subagent that opts in.

Adjacent platform capabilities (separate MCP servers, not bound to the Deep Agent)

The following tools exist on the broader platform but are exposed by separate MCP servers. Orchestrators other than the Deep Agent (or a future Deep Agent release) can bind to them using the same MCP pattern; today they drive the UC2 policy-comparison flow and legacy submission processing, not the Deep Agent pipeline.

| Tool | Server | Purpose |
| --- | --- | --- |
| extract_policy_document, extract_quote_object, check_extraction_status, get_extraction_schema | Policy Document Analyzer | Async extraction of policy docs, carrier API quote objects, and schema lookup |
| compare_extractions, generate_comparison_pdf, generate_comparison_html, save_comparison_pdf | Policy Document Analyzer | Normalize + compare extractions; render side-by-side HTML/PDF |
| search_similar_submission | Submission Link | Search inventory by insured name (text or Mongo query) |
| crm_upsert_opportunity | Custom CRM MCP (optional) | Upsert opportunity with recommendation metadata, report URL, carrier selection |

HITL Request Tools (orchestration-only, local to the Deep Agent)

LangChain-native tools that never call external APIs. Each stages a gate payload to a side file and causes the stream to emit hitl_pause (or, for the last two, returns a state transition visible on the internal /chat/stream endpoint):

request_plan_confirmation, request_critique_review, request_discrepancy_review, request_application_consent, request_carrier_selection, request_carrier_confirmation, request_user_answers, request_quote_generation, request_post_quote_actions.

request_user_answers is also bound to the mqs_agent and cqs_agent subagents so they can surface unanswerable questions without round-tripping through the main agent. request_quote_selection is defined in app/tools/hitl_state_tools.py and referenced in the v4/v5 system prompts but is not currently bound to the main agent or any subagent in code; the awaiting_quote_selection stage is set programmatically via build_submission_state_awaiting_quote_selection in app/sessions/hitl.py.
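Mechanically, each request_* tool stages a payload and flips the pipeline stage. In plain Python, ignoring the LangChain tool wrapper and the persistence provider, the gate amounts to:

```python
# Pure-Python sketch of a gate tool: set the pipeline stage, stage the
# payload the broker will see, and return the event the stream layer
# would emit. Dict shapes here are illustrative, not the real schema.
def request_plan_confirmation(state: dict, plan: dict) -> dict:
    state["submission_state"] = {"pipeline_stage": "awaiting_plan_confirmation"}
    state["side_data"] = {"gate_payload": plan}
    return {"event": "hitl_pause", "pipeline_stage": "awaiting_plan_confirmation"}
```

The key property: the pause is a consequence of the tool firing, not of anything the model says next.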


Context and State Architecture

The platform uses a three-tier state architecture to manage context efficiently.

Tier 1 — Conversation Context

  • Tool chain = transaction log — each response carries IDs forward. application_id, task_id, and order_id are all durable and linked to submission_reference_id for full traceability.
  • Summaries only — conversation sees carrier + premium + error count, not 10K-token JSONs. Intake writes a compact enriched_data_summary (~500 tokens) that downstream agents read instead of the full payload.
  • Cache-safe prefix — static system prompt + tool defs (~15K tokens) cached via AnthropicPromptCachingMiddleware at up to 90% savings.
  • Per-UC system prompts — git-tracked .md files, never mutated mid-session. Prompt version pinned by PROMPT_VERSION (default v4).

Tier 2 — Server-Side State

  • Full artifacts in AgentRunState — extraction JSONs, critic results, MQS/CQS answers, quote responses keyed by run_id behind the MCP tools and the Deep Agent persistence layer.
  • Heavy data never transits conversation — get_application_summary and similar tools pull from server state.
  • Ephemeral working memory plus durable run state — AgentRunState.message_history replays a prior session on resume; submission_state.pipeline_stage tells the pipeline where to pick up.
  • Per-task model calls — separate cache contexts behind MCP boundary.
  • Summarization middleware — messages older than the keep-window are compressed when the conversation exceeds 50K tokens, preserving the last 6 turns verbatim.
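The summarization trigger reduces to a budget check. A toy sketch with a stand-in token counter — the real middleware uses model tokenizers and an LLM-generated summary, not character counts and a placeholder string:

```python
# Compress older messages once the running window exceeds the budget,
# keeping the most recent N messages verbatim. tokens=len is a stand-in
# token counter for illustration.
def compress_history(messages, budget=50_000, keep_last=6, tokens=len):
    total = sum(tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_last:
        return messages
    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = f"[summary of {len(head)} earlier messages]"
    return [summary] + tail
```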

Tier 3 — Durable Audit Trail

  • Pluggable persistence — PERSISTENCE_LAYER=mongodb | filesystem | mongodb_and_filesystem. Mongo is the production default; filesystem is the local-dev default.
  • MongoDB collections — runs, tool_data side files, extraction audits, critic decisions, CoT logs, token usage per step.
  • Cross-transaction memory — previous extractions and comparisons queryable by insured name across sessions.
  • Async writes — fire-and-forget, never block the workflow.
  • submission_reference_id — cross-transaction durable key linking all activity for same insured.
  • Token cost reporting — cost_tracking on AgentRunState accumulates token counts and USD cost per model across all turns of a run. The cost SSE event emits the turn total just before done.
  • LangSmith tracing — optional, enabled via LANGSMITH_API_KEY; defaults to project arctic-agents.
Note

Conversation carries references. Tools hold artifacts. The persistence layer holds the audit trail.

System Prompt Architecture

Orchestrator-Level Prompts

  • Per-UC workflow prompts defining tool sequences and stop gates
  • Static + versioned — git-tracked .md files, never mutated mid-session; version pinned by PROMPT_VERSION
  • Model-agnostic — same prompt drives Claude, ChatGPT, Gemini, or BP Agent
  • JSON override — app/config/subagents.json entries replace the code-factory prompt and tool list for a subagent without code changes

Tool-Internal Prompts

  • Per-task prompts — structured extraction rules, critic verification, NLP parsing, triangulation logic
  • Hidden behind MCP — orchestrator never sees them, separate cache context
  • Dynamic injection — text chunks + deterministic hints into user prompts
  • Model-specific — each prompt tuned for its assigned model — swap without changing workflow