
Platform Architecture

A layered, platform-portable architecture. Any MCP-capable orchestrator connects to the same tool server — same tools, same workflows, different model or UI. Bold Penguin's reference orchestrator is the Deep Agent, a production Python service built on deepagents and LangChain.

Design Principles

1. Platform Portable

Claude, ChatGPT, Gemini, or custom agentic app — all connect to the same MCP tool server. Change the orchestrator; tool flows stay the same. The Deep Agent itself is portable across providers — primary and fallback models are configured per-turn via LLM_PROVIDER_ORDER (default anthropic,openai,google_genai).

2. Per-Task Model Selection

Each pipeline stage uses the best model for the job. Structured subagents (MQS, CQS) run on a fast model (CLAUDE_FAST_MODEL, default claude-haiku-4-5). Reasoning-heavy subagents (planner, critique, mkt_intel, quote) run on the primary model (CLAUDE_MODEL, default claude-sonnet-4-6). Model choices are tool-internal and swappable, invisible to the orchestrator.
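As a sketch, this per-task routing reduces to a small lookup keyed off the env vars named above. The subagent set and the helper are illustrative, not the production code:

```python
import os

# Structured subagents run on the fast model; everything else runs on the
# primary model. Env var names and defaults follow the text above; the
# helper itself is a hypothetical illustration.
FAST_SUBAGENTS = {"mqs_agent", "cqs_agent"}

def model_for_subagent(name: str) -> str:
    if name in FAST_SUBAGENTS:
        return os.environ.get("CLAUDE_FAST_MODEL", "claude-haiku-4-5")
    return os.environ.get("CLAUDE_MODEL", "claude-sonnet-4-6")
```

Because the choice lives behind the tool/agent boundary, swapping a model is a config change, not an orchestrator change.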

3. Deterministic Harness

LLMs extract and analyze. Deterministic code validates, scores, routes, and decides. The LLM never self-scores its own output. HITL gates are raised by explicit tool calls (request_* tools), not by prompt heuristics — the graph pauses when the tool fires, regardless of what the model wants to do next.

4. Human-in-the-Loop Gates

The Deep Agent exposes explicit pipeline stages where execution can pause and wait for broker input. The external SSE contract documents eight gate stages today; an additional two (awaiting_generate_quotes, awaiting_post_quote_actions) are defined in the PipelineStage enum and wired up in the HITL tool layer but are surfaced through the internal /chat/stream endpoint rather than the external SSE contract. The system never auto-proceeds past a gate; every resume carries an action_id or resume_options envelope back through the same streaming endpoint.

Layered Architecture

| Layer | Role | Components |
| --- | --- | --- |
| Orchestrator | Platform portable | Any MCP-capable orchestrator — Claude, ChatGPT, Gemini, or custom agentic app. Bold Penguin's reference implementation is the Deep Agent (create_deep_agent) — a Python FastAPI service that coordinates seven specialized subagents. |
| Protocol | Standard interface | MCP (Model Context Protocol) over streamable HTTP, stdio, or AWS AgentCore IAM/SigV4. Tool definitions, argument schemas, and async task patterns are protocol-level. Any MCP client discovers and calls tools without custom integration. The Deep Agent's default client is MCP_CLIENT_TYPE=iam (AWS AgentCore SigV4); alternatives are a static MCP_BEARER_TOKEN or OAuth via BP_AUTH_URL + BP_API_KEY + CLIENT_ID + CLIENT_SECRET. A separate X-API-Key (API_KEY env var) secures the FastAPI endpoints. |
| Tool Layer | One or more MCP servers | Insurance Intelligence MCP Server (universal_mcp_server via AWS AgentCore) — primary server, multiple domains: enrichment, applications, market intelligence, document ingestion. Salesforce MCP (stdio via @tsmztech/mcp-server-salesforce) — optional CRM server configured in app/config/mcp_servers.json. A separate Policy Document Analyzer MCP exists in the broader platform (handles policy extraction / comparison) but is not currently bound to the Deep Agent. |
| Data & AI | Per-task models | Primary LLM (CLAUDE_MODEL), fast LLM (CLAUDE_FAST_MODEL), fallback providers (LLM_PROVIDER_ORDER), NLP / Critic / Analysis (cross-model adversarial review), Embeddings (local semantic search), Persistence (submissions, audit trail — Partner Engine is source of truth). |

Key Insight

Change the orchestrator — tool flows stay the same. Change the model per task — protocol stays the same. Change the UI — audit trail stays the same.

End-to-End Pipeline

The full pipeline runs through these stages, with red gates representing human-in-the-loop stop points where the system never auto-proceeds:

User Intent → 🔴 Plan Confirmation → Intake / Enrichment → 🔴 Critique Review → 🔴 Discrepancy Corrections → Application Create (MQS) → 🔴 Application Consent → Market Intelligence → 🔴 Carrier Selection → 🔴 Carrier Confirmation → Carrier Questions (CQS) → 🔴 Application Answers → 🔴 Quote Selection → 🔴 Generate Quotes → Quoting → 🔴 Post-Quote Actions → Quote Summary

Not every turn visits every gate — gates fire only when the agent needs a human decision. For example, awaiting_user_input fires only when the critique agent surfaces discrepancies; awaiting_application_answers fires only when MQS/CQS exhaust automated answer strategies; and awaiting_quote_selection only fires when CQS identifies carriers with open questions (if all carriers are auto-ready the pipeline goes straight to awaiting_generate_quotes). awaiting_post_quote_actions pauses after quotes come back so the broker can compare, request more carriers, or edit enriched data.
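The CQS branch described above amounts to one deterministic check. A minimal sketch, assuming cqs_summary is the dict shape produced by cqs_agent (carriers_with_questions / carriers_auto):

```python
# If any confirmed carrier still has open questions, the broker picks
# which carriers to quote; otherwise the pipeline skips straight to
# quote generation. Stage names come from the Pipeline Stages table.
def next_stage_after_cqs(cqs_summary: dict) -> str:
    if cqs_summary.get("carriers_with_questions"):
        return "awaiting_quote_selection"
    return "awaiting_generate_quotes"
```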

Agent Details

Full pipeline reference — agents, descriptions, MCP tools, products, and services. The Deep Agent itself is the coordinator; the table below lists the seven production subagents it delegates to.

| Step | Agent | Description | MCP Tools | Products | Services / APIs |
| --- | --- | --- | --- | --- | --- |
| 1 — Planning | planner_agent | Runs first on every new request. Analyzes the user's intent and produces a structured step-by-step execution plan. Not re-run on HITL resume — the plan persists in state.plan. | write_plan_to_state_tool | — | — |
| 2 — Intake | intake_agent | Business data enrichment and document extraction. Given company name + address, fetches NAICS codes, financials, and contacts. Supports direct-upload and S3 ingestion. Generates enriched_data and an enriched_data_summary (~500-token compact summary used by downstream agents). | enrich_company_data, do_document_ingestion_from_s3, initiate_document_upload, do_document_submission_by_tx_id, do_data_inquiry | SubmissionLink | Universal Submit API, Submission Status Inquiry API, Data Inquiry API, Company Submit API, Location Submit API |
| 3 — Critique | critique_agent | Two modes. Mode A: reviews enriched and extracted data and surfaces issues for user review before the pipeline proceeds. Mode B: processes user corrections or approval and writes an outcome back to state. Always writes to artifacts.data_quality_checks. | read_intake_enriched_data_tool, read_intake_extracted_data_tool, write_critique_to_state_tool | SubmissionLink, JackIQ | JackIQ API |
| 4 — Master Questions | mqs_agent | Creates the application and fills master (common) questions. Uses a 3-tier answer strategy: (1) find_answers from submission data, (2) intelligent guesses, (3) request_user_answers as last resort. Runs on the fast model (Haiku) when the primary provider is Anthropic; falls back to CLAUDE_MODEL / OPENAI_MODEL / GOOGLE_MODEL when not. | create_application, update_application, find_incomplete_master_question, find_answers, find_business_type, find_carrier_class, get_application_summary | PartnerEngine, Terminal | Partner Engine API |
| 5 — Market Intel | mkt_intel_agent | Predicts which carriers are likely to bind using the market intelligence API. Writes carriers_likely_to_bind and mkt_intel_data to state. | get_market_intelligence (plus get_market_recommendation, currently disabled) | Market Intelligence | Market Intelligence API, Market Recommendation API |
| 6 — Carrier Questions | cqs_agent | Fills carrier-specific questions for carriers confirmed by the user. Only processes questions scoped to confirmed carriers. Uses the same 3-tier answer strategy as mqs_agent. Produces cqs_summary (carriers_with_questions / carriers_auto) so the main agent knows whether to raise awaiting_quote_selection. Also runs on the fast model (same non-Anthropic fallback as MQS). | find_incomplete_questions_by_carrier, find_answers, update_application, find_business_type, find_carrier_class | CarrierEngine, PartnerEngine, Terminal | Partner Engine API |
| 7 — Quote Summary | quote_agent | Reads quote_status and quote_responses from state, fetches a live application summary if needed, and produces a broker-friendly markdown quote table (Carrier / Status / Premium / Limits / Deductible). | get_application_summary | ClauseLink, PartnerEngine | Clause Link API |

Subagents are defined two ways: code factories in app/agents/ and JSON definitions in app/config/subagents.json. JSON definitions, when present, override the code factory for the same subagent key — this lets prompts and tool lists be tuned without redeploying.
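The override rule can be pictured as a plain dict merge. A sketch only — resolve_subagents is a hypothetical name, not the actual loader:

```python
# JSON definitions win on key collision, so prompts and tool lists can be
# tuned in app/config/subagents.json without touching the code factories.
def resolve_subagents(code_factories: dict, json_defs: dict) -> dict:
    resolved = dict(code_factories)
    resolved.update(json_defs)  # JSON overrides the code factory for the same key
    return resolved
```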


Orchestrator: the Deep Agent

The Deep Agent is the reference orchestrator that drives the pipeline above.

Middleware Stack

create_deep_agent is invoked once for the main agent and supplies a default middleware stack (filesystem, summarization, prompt caching, patched tool calls, todo tracking, subagent dispatch). The Deep Agent extends that default with one additional middleware:

  • ModelFallbackMiddleware — on primary-model error, re-runs the call against the next provider in LLM_PROVIDER_ORDER. Only added when at least one fallback model is configured.

Each subagent built by _build_subagent gets its own explicit stack, applied in this order:

  1. TodoListMiddleware — tracks plan progress as structured todos.
  2. FilesystemMiddleware — exposes a scratchpad filesystem tool.
  3. SummarizationMiddleware — compresses older messages when the window exceeds 50K tokens, keeping the most recent 6 messages verbatim. Defaults are patched at module load so the trigger fires at this budget rather than the library default.
  4. AnthropicPromptCachingMiddleware — attaches Anthropic prompt-caching headers so the static system prompt + tool definitions are cached across turns. Configured with unsupported_model_behavior="ignore" so non-Anthropic models pass through unchanged.
  5. PatchToolCallsMiddleware — normalizes tool-call IDs across providers so LangGraph can restart after a HITL pause without loss of trace context.

Subagents are compiled with recursion_limit=100.
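Stripped of LangChain specifics, ModelFallbackMiddleware's behavior is a try-next-provider loop over LLM_PROVIDER_ORDER. A hedged sketch, with plain callables standing in for model clients:

```python
# Illustrative fallback loop: try each provider in the configured order
# and return the first success. The real middleware wraps LangChain model
# calls, not bare functions.
def call_with_fallback(providers: dict, order: str, prompt: str):
    errors = {}
    for name in order.split(","):
        try:
            return providers[name](prompt)
        except Exception as exc:  # broad catch is deliberate in a fallback path
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {list(errors)}")
```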

Session Storage

Run state is persisted between turns so a HITL-paused run can resume on the next request. The backend is pluggable via PERSISTENCE_LAYER:

  • filesystem — JSON files under AGENT_STATE_DIR (.agent_state/{run_id}.json).
  • mongodb — MongoDB collections via SMARTDATA_MONGO_CLUSTER_URL (production default).
  • mongodb_and_filesystem — both, with Mongo as primary and filesystem as a local cache.

Side-file data (HITL requests, tool outputs staged for the next turn) is written via get_persistence_provider().write_side_data(...) into a tool_data collection keyed by run_id + side_key.
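A filesystem-flavored sketch of that side-data contract, keyed by run_id + side_key. Paths and the class name are illustrative; the real provider also backs onto MongoDB:

```python
import json
import pathlib

# Minimal sketch of the side-data read/write contract described above.
class FilesystemPersistence:
    def __init__(self, root: str = ".agent_state"):
        self.root = pathlib.Path(root)

    def _path(self, run_id: str, side_key: str) -> pathlib.Path:
        return self.root / "tool_data" / f"{run_id}.{side_key}.json"

    def write_side_data(self, run_id: str, side_key: str, payload: dict) -> None:
        path = self._path(run_id, side_key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(payload))

    def read_side_data(self, run_id: str, side_key: str) -> dict:
        return json.loads(self._path(run_id, side_key).read_text())
```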

Run State Schema

Each run is a Pydantic AgentRunState model serialized to a nested dict via to_dict():

```
run_id, timestamp, thread_id, user_request
chat_history, enriched_data, enriched_data_summary, extracted_data
artifacts:
  risk_profile, firmographics, data_quality_report, data_quality_checks
  carriers_likely_to_bind, mkt_intel
  application: { state, data, mqs_data, cqs_data, eligible_carriers }
  quotes: { status, responses }
  plan: { steps, summary, created_at, todos, current_step, status }
  execution_log
pipeline:
  submission_state, carrier_selection, carriers_for_quoting, user_provided_answers
field_audit, message_history, event_log, user_info, errors, cost_tracking
```

application_state is one of none | mqs_complete | ready_for_quoting. submission_state.pipeline_stage is the current HITL stage (see Pipeline Stages).
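A trimmed approximation of that shape, using dataclasses where the real model is Pydantic, with fields reduced to the ones this section discusses:

```python
from dataclasses import asdict, dataclass, field

# Illustrative stand-in for the Pydantic AgentRunState; to_dict() mirrors
# the nested-dict serialization described above.
@dataclass
class SubmissionState:
    pipeline_stage: str = "none"

@dataclass
class AgentRunState:
    run_id: str
    user_request: str = ""
    application_state: str = "none"  # none | mqs_complete | ready_for_quoting
    submission_state: SubmissionState = field(default_factory=SubmissionState)
    artifacts: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        return asdict(self)
```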


Streaming Contract

All production traffic flows through the Deep Agent's external streaming SSE endpoint. External integrators reach this via the A2A protocol rather than calling it directly; the request envelope is:

```json
{
  "message": "string",
  "run_id": "string | null",
  "resume_options": "ResumeOptions | null",
  "action_id": "string | null",
  "payload": {}
}
```

action_id is the preferred resume mechanism — the server maps confirm_plan, approve_critique, reply_critique, provide_corrections, confirm_application, select_all, select_specific, submit_answers, get_quotes, and bind / download / done to the equivalent resume_options shape.
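That mapping is effectively a lookup table. A partial, illustrative sketch (the server-side table covers more actions than shown here; resume shapes follow the Pipeline Stages table):

```python
# Partial action_id → resume_options mapping, for illustration only.
ACTION_TO_RESUME = {
    "confirm_plan": {"confirm_plan": True},
    "approve_critique": {"critique_reply": ""},
    "confirm_application": {"confirm_application": True},
    "select_all": {"carrier_choice": "all"},
}

def resolve_resume(action_id, resume_options):
    """action_id is the preferred mechanism; explicit resume_options is the fallback."""
    if action_id is not None:
        return ACTION_TO_RESUME[action_id]
    return resume_options
```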

SSE Events

| Event | Purpose |
| --- | --- |
| run_id | Emitted immediately — clients persist this to resume on the next turn |
| content | Per-token LLM output from the main agent |
| subagent_content | Per-token LLM output from a subagent (tagged with agent) |
| mkt_intel | Carriers-likely-to-bind list written to state |
| application | Application / MQS checkpoint |
| eligible_carriers | Eligible-carrier list written to state |
| quotes | Quote status + responses |
| usage | Per-call token counts |
| state | Incremental message snapshot |
| hitl_pause | Pipeline paused at a HITL gate — carries pipeline_stage, resume_hint, and ui_component |
| cost | Turn cost summary (by_model + totals) — emitted before done |
| done | Final payload with full run_state |
| error | Terminal failure |
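Since the contract is plain SSE, a client can consume it with a few lines of frame parsing. A minimal sketch — not the A2A client integrators actually use:

```python
# Parse a raw SSE byte stream (already decoded to str) into
# (event, data) pairs. Frames are separated by blank lines; each frame
# carries "event:" and one or more "data:" lines.
def parse_sse(raw: str):
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data = "message", []
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        events.append((event, "\n".join(data)))
    return events
```

A real client would dispatch on the event name: persist run_id, render content tokens, and pause the UI on hitl_pause.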

Pipeline Stages

The full PipelineStage enum in app/models/submission.py defines ten awaiting_* pause states. The first eight are emitted on hitl_pause and done events by the external SSE contract; the last two currently surface only via the internal /chat/stream endpoint (and are visible on AgentRunState.submission_state.pipeline_stage either way):

| pipeline_stage | When | HITL tool | Required resume |
| --- | --- | --- | --- |
| awaiting_plan_confirmation | Planner proposes an execution plan | request_plan_confirmation | { confirm_plan: true } |
| awaiting_critique_review | Critique agent presents findings for review | request_critique_review | { critique_reply: "<feedback>" } or "" to approve |
| awaiting_user_input | Critique found discrepancies and requests corrections | request_discrepancy_review | { corrections: { <field>: <value> } } or { proceed_without_corrections: true } |
| awaiting_application_consent | Broker consent required before submitting application | request_application_consent | { confirm_application: true } |
| awaiting_carrier_selection | User selects carriers from the MI list | request_carrier_selection | { carrier_choice: "all" } or { carrier_choice: "specific", carriers: [...] } |
| awaiting_carrier_confirmation | User confirms the eligible-carrier list | request_carrier_confirmation | same as carrier_selection |
| awaiting_application_answers | MQS/CQS need answers to specific questions | request_user_answers | { application_answers: { <question_code>: <answer> } } |
| awaiting_quote_selection | CQS identified carriers with open questions; user selects which to quote | request_quote_selection | { carrier_choice: "specific", carriers: [...] } |
| awaiting_generate_quotes (internal) | CQS complete; broker confirms "generate quotes" | request_quote_generation | proceed message / --get-quotes action |
| awaiting_post_quote_actions (internal) | Quotes are back; broker chooses compare / more carriers / edit enriched data | request_post_quote_actions | POST_QUOTE_ACTION: directive in follow-up message |

Each hitl_pause event carries a self-describing ui_component spec (component name, gate_type, title, description, props, actions[]). Frontends render the gate from that spec instead of hardcoding component logic.


Use Case Flows

UC1 — Quote a New Risk

```mermaid
flowchart TD
  IN(["Company Name + Address, plus or minus Submission Docs"]):::input
  PLAN["planner_agent (write_plan_to_state)"]:::plan
  GP{{"PLAN CONFIRMATION"}}:::gate
  INTAKE["intake_agent (enrich_company_data, S3/upload ingestion)"]:::enrich
  CRIT["critique_agent (review_status=pending)"]:::critique
  GC{{"CRITIQUE REVIEW / DISCREPANCIES"}}:::gate
  APP["mqs_agent (create_application, find_answers)"]:::app
  GAC{{"APPLICATION CONSENT"}}:::gate
  MI["mkt_intel_agent (get_market_intelligence)"]:::mi
  G1{{"CARRIER SELECTION / CONFIRMATION"}}:::gate
  CQS["cqs_agent (per-carrier questions, find_answers)"]:::app
  GMQ{{"APPLICATION ANSWERS"}}:::gate
  GAS["quote_agent (get_application_summary)"]:::app
  G2{{"QUOTE SELECTION"}}:::gate
  OUT(["Digital Quote Objects"]):::output
  IN --> PLAN --> GP --> INTAKE --> CRIT --> GC --> APP --> GAC --> MI --> G1
  G1 --> CQS --> GMQ --> GAS --> G2 --> OUT
  CQS -. "loop until complete" .-> CQS
  GMQ -. "unanswerable? surface to broker" .-> GMQ
  classDef input fill:#92400e,color:#fef3c7,stroke:#b45309,stroke-width:2px
  classDef plan fill:#6d28d9,color:#fff,stroke:#5b21b6,stroke-width:2px
  classDef enrich fill:#0284c7,color:#fff,stroke:#0369a1,stroke-width:1px
  classDef app fill:#1d4ed8,color:#fff,stroke:#1e40af,stroke-width:1px
  classDef mi fill:#0891b2,color:#fff,stroke:#0e7490,stroke-width:1px
  classDef gate fill:#dc2626,color:#fff,stroke:#b91c1c,stroke-width:2px
  classDef critique fill:#d97706,color:#fff,stroke:#b45309,stroke-width:1px
  classDef output fill:#065f46,color:#d1fae5,stroke:#047857,stroke-width:2px
```

Legend: 🟤 Input · 🟣 Planning · 🔵 Enrichment · 🟠 Critique · 🔷 Application · 🟦 Market Intelligence · 🔴 HITL Gates

UC2 — Quote Comparison (Platform-adjacent, not Deep Agent)

The UC2 flow is implemented by the broader Bold Penguin platform via the Policy Document Analyzer MCP server and the submission inventory — it is not currently bound to the Deep Agent. An orchestrator wishing to run this flow today connects directly to the Policy Document Analyzer MCP.

```mermaid
flowchart TD
  DOCS(["Policy Documents\n(expiring policies, offline quotes,\nbinders, dec pages)"]):::input
  UC1(["UC1 Digital\nQuote Objects"]):::uc1
  EXT["extract_policy_document\n(async, per doc)"]:::extract
  POLL["check_extraction_status\n(poll)"]:::extract
  PE["PolicyExtraction\nJSON(s)"]:::data
  CMP["compare_extractions"]:::compare
  RND["Quote Recommendation\nReport"]:::compare
  OUT(["Recommendation\nReport"]):::output
  DOCS --> EXT --> POLL --> PE
  DOCS -. "insured info seeds\napplication for digital quotes" .-> UC1
  UC1 -. "digital quotes flow\nfrom UC1 quoting" .-> PE
  PE --> CMP --> RND --> OUT
  classDef input fill:#1e3a5f,color:#93c5fd,stroke:#1d4ed8,stroke-width:2px
  classDef extract fill:#d97706,color:#fff,stroke:#b45309,stroke-width:1px
  classDef data fill:#f3f4f6,color:#374151,stroke:#d1d5db,stroke-width:1px
  classDef compare fill:#059669,color:#fff,stroke:#047857,stroke-width:1px
  classDef output fill:#065f46,color:#d1fae5,stroke:#047857,stroke-width:2px
  classDef uc1 fill:#92400e,color:#fef3c7,stroke:#b45309,stroke-width:2px
```

Legend: 🔷 Policy Documents · 🟠 Policy Extraction (async) · 🟢 Comparison & Report · 🟤 UC1 Digital Quotes


Pipeline Deep Dives

The two sections below describe capabilities in the broader Bold Penguin platform. They run on separate MCP servers and pipelines that the Deep Agent does not currently bind to. They are retained here because orchestrators other than the Deep Agent (and, potentially, future Deep Agent releases) compose them with the Deep Agent pipeline.

Policy Document Extraction (Policy Document Analyzer MCP)

The extraction pipeline runs through these stages in sequence:

  1. PDF Reader — pdfplumber + pypdfium2 — raw text + table structures
  2. Document Splitter — statistical page scoring + pinning (declarations, endorsements, SOF) — 100K char budget
  3. LLM Extraction — 30+ rule system prompt — model-per-task (Sonnet, Gemini Pro)
  4. Deterministic Validation — Pydantic schema + arithmetic + cross-reference scrubbing
  5. Critic Agent — different foundation model reviews adversarially for missed fields and hallucinations
  6. Deterministic Scoring — auto-fix (high confidence) / flag for review / escalate
  7. Output — PolicyExtraction JSON + validation + audit trail

Smart Document Splitting

  • 100K character budget with statistical page scoring — 200-page policies trimmed before LLM sees them
  • Declarations pinning — 3-tier header detection + continuation by density score
  • Endorsement pinning — header + preamble detection, body continuation (5pg/endorsement cap, 30 total)
  • Schedule of Forms always included as the authoritative manifest

Critic Agent (Cross-Model Adversarial)

  • Never self-scores — a separate foundation model critiques the extraction
  • Deterministic checks first — SOF completeness (form count vs extracted forms), cross-reference integrity
  • Per-field confidence drives auto-fix vs. flag vs. escalate
  • Source quote verification — substring match against original text chunks confirms evidence

Audit Trail

  • Source page citations per extracted field
  • Validation results — errors, warnings, hints by severity
  • Critic feedback — what was found, auto-fixed, or flagged

Submission Enrichment Pipeline (Submission Link)

The pipeline below feeds data into the Deep Agent's intake_agent via enrich_company_data and do_data_inquiry; the trust/consensus/ontology work itself happens upstream in the Submission Link platform, not inside the Deep Agent.

  1. Document Extraction — structured data from ACORDs, carrier apps, loss runs, SOVs
  2. Company Enrichment — 50+ OOTB sources — NAICS, revenue, employees, legal entity
  3. Trust Scoring — proprietary per-source reliability scoring
  4. Consensus Detection — cross-source agreement algorithm — 3+ sources align = high confidence
  5. Critique Gating (HITL) — findings surfaced to broker as an awaiting_critique_review or awaiting_user_input pause — never auto-submitted
  6. Ontology Mapping (Deterministic) — 153+ PE answer codes to canonical fields
  7. Question Set Matching — canonical fields to ApplicationForm question codes
  9. Answer Provenance Tagging — answered_by_type + answered_by_source on every answer
  9. Completeness Loop — find gaps, fill, surface unanswerable to broker (HITL via awaiting_application_answers)
  10. Output — complete application — quotes auto-fire to selected carriers

Trust and Consensus Algorithms

  • Trust scoring — proprietary per-source reliability model assigns weights based on historical accuracy per field type
  • Consensus detection — cross-source agreement: when extracted, enriched, and third-party values converge, confidence increases
  • Triangulation — submission doc extraction + 50+ enrichment sources + third-party data compared and reconciled
  • Per-field confidence output — every data point tagged with trust score + consensus level

Deterministic Ontology

  • 153+ PE answer codes mapped to canonical SD Dictionary V0.86 field names
  • Priority hierarchy — ia_*_v3 > mqs_* > bold_penguin_* when multiple sources match
  • Entity groups — locations, vehicles, drivers, owners, WC classes with composite key logic
  • Mapping is a lookup table, not LLM inference
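That priority hierarchy can be expressed as a deterministic ranking function. The prefixes come from the text above; the candidate codes, values, and function names are illustrative:

```python
# Lower rank wins: ia_*_v3 beats mqs_*, which beats bold_penguin_*.
def source_rank(code: str) -> int:
    if code.startswith("ia_") and code.endswith("_v3"):
        return 0
    if code.startswith("mqs_"):
        return 1
    if code.startswith("bold_penguin_"):
        return 2
    return 3  # unknown sources lose to all recognized prefixes

def pick_answer(candidates: dict):
    """Return the value from the highest-priority source code."""
    best = min(candidates, key=source_rank)
    return candidates[best]
```

Because this is a pure lookup-and-rank, the same inputs always produce the same canonical answer — no LLM inference in the loop.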

Answer Provenance

Every answer is traceable to its source:

| Provenance Type | Source |
| --- | --- |
| submission_link_enriched | Third-party data — from enrichment APIs |
| submission_link_extracted | Document upload — from uploaded documents |
| submission_link_defaulted | Defaulted answers — configured defaults |

MCP Tool Registry

The Deep Agent connects to one or more MCP servers at startup via get_mcp_tools() (configured in app/config/mcp_servers.json). Each subagent declares a filtered toolset; not every subagent sees every tool.

Insurance Intelligence MCP (bound to Deep Agent)

This is the primary tool server — the universal_mcp_server reached over AWS AgentCore IAM/SigV4 by default. Every tool below is used by at least one Deep Agent subagent.

| Tool | Domain | Used by | Description |
| --- | --- | --- | --- |
| enrich_company_data | Enrichment | intake | Smart company profile — NAICS, financials, descriptions |
| do_document_ingestion_from_s3 | Intake | intake | Async ingestion of a submission doc from S3 |
| initiate_document_upload | Intake | intake | Obtain pre-signed upload URL for a submission doc |
| do_document_submission_by_tx_id | Intake | intake | Finalize a document submission by transaction id |
| do_data_inquiry | Intake | intake | Query insurance intelligence data by insured |
| find_business_type | Enrichment | mqs, cqs | Search for business type information |
| find_carrier_class | Enrichment | mqs, cqs | Search for carrier class codes |
| create_application | Application | mqs | Create application form + trigger quote submission |
| update_application | Application | mqs, cqs | Update answers on application form |
| find_incomplete_master_question | Application | mqs | Find missing required master questions |
| find_incomplete_questions_by_carrier | Application | cqs | Find missing carrier-specific questions |
| find_answers | Application | mqs, cqs | Find answers for specific question codes |
| get_application_summary | Application | mqs, quote | Application summary + quote request status |
| get_market_intelligence | Market Intel | mkt_intel | Carrier-specific MI predictions (product/NAICS/state) |
| get_market_recommendation | Market Intel | (disabled) | Carrier eligibility check — currently disabled in the pipeline |
| validate_token | Auth | (auth layer) | Validate authentication token |

Salesforce MCP (optional, bound via stdio)

Configured in mcp_servers.json, pulled in when SALESFORCE_INSTANCE_URL + SALESFORCE_CLIENT_ID + SALESFORCE_CLIENT_SECRET are set. Exposes the standard Salesforce tool set (record CRUD, SOQL queries) provided by @tsmztech/mcp-server-salesforce. No Deep Agent subagent uses these tools in the default pipeline — they're available to any subagent that opts in.

Adjacent platform capabilities (separate MCP servers, not bound to the Deep Agent)

The following tools exist on the broader platform but are exposed by separate MCP servers. Orchestrators other than the Deep Agent (or a future Deep Agent release) can bind to them using the same MCP pattern; today they drive the UC2 policy-comparison flow and legacy submission processing, not the Deep Agent pipeline.

| Tool | Server | Purpose |
| --- | --- | --- |
| extract_policy_document, extract_quote_object, check_extraction_status, get_extraction_schema | Policy Document Analyzer | Async extraction of policy docs, carrier API quote objects, and schema lookup |
| compare_extractions, generate_comparison_pdf, generate_comparison_html, save_comparison_pdf | Policy Document Analyzer | Normalize + compare extractions; render side-by-side HTML/PDF |
| search_similar_submission | Submission Link | Search inventory by insured name (text or Mongo query) |
| crm_upsert_opportunity | Custom CRM MCP (optional) | Upsert opportunity with recommendation metadata, report URL, carrier selection |

HITL Request Tools (orchestration-only, local to the Deep Agent)

LangChain-native tools that never call external APIs. Each stages a gate payload to a side file and causes the stream to emit hitl_pause (or, for the last two, returns a state transition visible on the internal /chat/stream endpoint):

request_plan_confirmation, request_critique_review, request_discrepancy_review, request_application_consent, request_carrier_selection, request_carrier_confirmation, request_user_answers, request_quote_generation, request_post_quote_actions.

request_user_answers is also bound to the mqs_agent and cqs_agent subagents so they can surface unanswerable questions without round-tripping through the main agent. request_quote_selection is defined in app/tools/hitl_state_tools.py and referenced in the v4/v5 system prompts but is not currently bound to the main agent or any subagent in code; the awaiting_quote_selection stage is set programmatically via build_submission_state_awaiting_quote_selection in app/sessions/hitl.py.
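Mechanically, each request_* tool stages a payload and flips the pipeline stage. In plain Python, ignoring the LangChain tool wrapper and the persistence provider, the gate amounts to:

```python
# Pure-Python sketch of a gate tool: set the pipeline stage, stage the
# payload the broker will see, and return the event the stream layer
# would emit. Dict shapes here are illustrative, not the real schema.
def request_plan_confirmation(state: dict, plan: dict) -> dict:
    state["submission_state"] = {"pipeline_stage": "awaiting_plan_confirmation"}
    state["side_data"] = {"gate_payload": plan}
    return {"event": "hitl_pause", "pipeline_stage": "awaiting_plan_confirmation"}
```

The key property: the pause is a consequence of the tool firing, not of anything the model says next.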


Context and State Architecture

The platform uses a three-tier state architecture to manage context efficiently.

Tier 1 — Conversation Context

  • Tool chain = transaction log — each response carries IDs forward. application_id, task_id, and order_id are all durable and linked to submission_reference_id for full traceability.
  • Summaries only — conversation sees carrier + premium + error count, not 10K-token JSONs. Intake writes a compact enriched_data_summary (~500 tokens) that downstream agents read instead of the full payload.
  • Cache-safe prefix — static system prompt + tool defs (~15K tokens) cached via AnthropicPromptCachingMiddleware at up to 90% savings.
  • Per-UC system prompts — git-tracked .md files, never mutated mid-session. Prompt version pinned by PROMPT_VERSION (default v4).

Tier 2 — Server-Side State

  • Full artifacts in AgentRunState — extraction JSONs, critic results, MQS/CQS answers, quote responses keyed by run_id behind the MCP tools and the Deep Agent persistence layer.
  • Heavy data never transits conversation — get_application_summary and similar tools pull from server state.
  • Ephemeral working memory plus durable run state — AgentRunState.message_history replays a prior session on resume; submission_state.pipeline_stage tells the pipeline where to pick up.
  • Per-task model calls — separate cache contexts behind MCP boundary.
  • Summarization middleware — messages older than the keep-window are compressed when the conversation exceeds 50K tokens, preserving the last 6 turns verbatim.
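The summarization trigger reduces to a budget check. A toy sketch with a stand-in token counter — the real middleware uses model tokenizers and an LLM-generated summary, not character counts and a placeholder string:

```python
# Compress older messages once the running window exceeds the budget,
# keeping the most recent N messages verbatim. tokens=len is a stand-in
# token counter for illustration.
def compress_history(messages, budget=50_000, keep_last=6, tokens=len):
    total = sum(tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_last:
        return messages
    head, tail = messages[:-keep_last], messages[-keep_last:]
    summary = f"[summary of {len(head)} earlier messages]"
    return [summary] + tail
```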

Tier 3 — Durable Audit Trail

  • Pluggable persistence — PERSISTENCE_LAYER=mongodb | filesystem | mongodb_and_filesystem. Mongo is the production default; filesystem is the local-dev default.
  • MongoDB collections — runs, tool_data side files, extraction audits, critic decisions, CoT logs, token usage per step.
  • Cross-transaction memory — previous extractions and comparisons queryable by insured name across sessions.
  • Async writes — fire-and-forget, never block the workflow.
  • submission_reference_id — cross-transaction durable key linking all activity for same insured.
  • Token cost reporting — cost_tracking on AgentRunState accumulates token counts and USD cost per model across all turns of a run. The cost SSE event emits the turn total just before done.
  • LangSmith tracing — optional, enabled via LANGSMITH_API_KEY; defaults to project arctic-agents.
Note

Conversation carries references. Tools hold artifacts. The persistence layer holds the audit trail.

System Prompt Architecture

Orchestrator-Level Prompts

  • Per-UC workflow prompts defining tool sequences and stop gates
  • Static + versioned — git-tracked .md files, never mutated mid-session; version pinned by PROMPT_VERSION
  • Model-agnostic — same prompt drives Claude, ChatGPT, Gemini, or BP Agent
  • JSON override — app/config/subagents.json entries replace the code-factory prompt and tool list for a subagent without code changes

Tool-Internal Prompts

  • Per-task prompts — structured extraction rules, critic verification, NLP parsing, triangulation logic
  • Hidden behind MCP — orchestrator never sees them, separate cache context
  • Dynamic injection — text chunks + deterministic hints into user prompts
  • Model-specific — each prompt tuned for its assigned model — swap without changing workflow