Investment Agent Platform — Buy-Side Intelligence Layer

The problem

Senior people spend the morning stitching, not thinking.

Every working day, the same recurring outputs are hand-assembled from four or five vendor screens and a handful of internal documents. The data is already there. The judgement is already in the team's head. What's missing is the layer that does the assembly.

Front office

Two days for a first-draft IC memo.

An analyst spends a day and a half pulling fundamentals, peer comps, ESG flags and the last three IC discussions. The actual original thinking takes hours, not days.

Middle office

Compliance breaches noticed late.

Less than half of mandate breaches are surfaced inside an hour. By the time someone reads the alert, contextualises it and routes it, the trade has already cleared.

Back office

Recon breaks closed by hand.

Ops opens twelve breaks every morning, classifies each one, drafts a custodian email, and chases responses. The classification is rules-based; the drafting is templated; nothing is decided.

Who this is for

Six personas, six recurring frustrations — each one named.

The platform is not a generic productivity tool. Every agent maps to a specific role, a specific weekly task, and a specific person whose calendar will visibly empty out if the agent works. If we cannot name the person and the task, we are not building the agent.

Front office

Portfolio Manager

Decides positioning, owns the P&L

Pain: spends 60–90 minutes of every morning reading overnight macro, peer notes, custody alerts and risk emails before forming a view. By the time the desk meeting starts, half the energy is already gone to information assembly.

FO-01 · FO-04 · MO-01 · MO-02

Front office

Investment Analyst

Builds the case, owns the IC memo

Pain: spends a full day and a half pulling fundamentals, peer comps, ESG flags and the last three IC discussions before the original thinking can even begin. The memo gets a single round of review before the deadline forces the issue.

FO-02 · FO-03 · FO-04 · FO-05

Middle office

Risk Officer

Pre-trade limits, post-trade attribution

Pain: the morning risk brief takes 90 minutes to assemble — VaR, factor exposures, mandate proximity, hot-spots — and is read in three minutes. Attribution commentary slips a week behind the report it explains.

MO-01 · MO-02 · MO-05

Middle office

Compliance Officer

Mandate adherence, breach response

Pain: a new mandate takes two weeks to translate from PDF into rule-engine code; less than half of breaches are surfaced inside an hour; the breach narrative for the client is hand-written each time.

MO-03 · MO-04

Back office

Operations Lead

Recon breaks, corporate actions, cash

Pain: opens twelve recon breaks every morning; classifies each by hand; drafts a custodian email; chases for two days; misses one corporate action a quarter; reconciles the cash ladder in a spreadsheet that nobody else can read.

BO-01 · BO-02 · BO-03 · BO-04

Executive

COO / Head of D&AI

Sponsor, accountable for ROI

Pain: needs a credible ROI story for the board, an audit-defensible governance posture for the regulator, and a way to know — week by week — whether the platform is actually freeing time or just shuffling it around.

All 15 · sponsor-level dashboard

Goals & non-goals

Ten things this platform is. Eight things it isn't.

A programme that promises everything is a programme that ships nothing. Goals are measurable; non-goals are written down so that scope creep has somewhere to die.

Goals — measurable, dated, owned

Free 25–35% of measured recurring task time against the Phase 0 baseline by M24 — not against an assumption.
Cite every quantitative claim back to a system of record at >99% coverage, verified by an independent second-model check.
Surface mandate breaches inside one hour for ≥90% of cases by Phase 2 exit.
Cut IC memo first-draft time from two days to under one by Phase 2 exit.
Reduce recon MTTR by 40% against the Phase 0 baseline.
Operate with zero autonomous external actions — every irreversible step waits for a human.
Pass independent SR 11-7 / OCC 2011-12 / ECB TRIM model validation for every deployed agent.
Achieve SOC 1 Type 1 by M14, SOC 2 Type 2 in BAU year 3 (M28–30).
Maintain a deterministic replay endpoint that reproduces any historical output bit-identically within tolerance.
Retire or materially rework at least one agent by Phase 3 based on telemetry — a programme that never kills anything is not honest.

Non-goals — explicitly out of scope

No autonomous trade execution, period — not at Phase 1, not at Phase 3, not in BAU year 5.
No client-facing chat — the platform is internal-only for Year 1; external surfaces require a separate compliance review.
Vendor AI is not ground truth — Bloomberg / FactSet / Aladdin AI outputs are reconciliation inputs, not authoritative claims.
No replacement for human judgement — agents do not decide; they prepare, draft, and propose.
No general-purpose chatbot surface — the team-wide "ask anything" mode is explicitly deferred until governance is mature.
No cross-tenant model training — single-tenant LLM only; no firm data flows into vendor training corpora.
No silent retraining — every model version change is a controlled deployment with re-validation, not a vendor-pushed update.
No "AI assistant" branding — agents are presented as governed tools with named owners, not as personalities.

What's new in v1.1

Six honest course-corrections — before any code is written.

v1.0 was an ambitious single-track plan. v1.1 is the version that survives a head-of-D&AI review: measure before promising, narrow Phase 1 to what platform-build can actually carry, and put the highest-blast-radius agents behind harder gates.

Phase 0 added

Three months of measure-and-decide before Phase 1 starts.

A time-and-motion baseline replaces the 30–45% working assumption with a measured number. Vendor selection (OMS, recon engine, risk engine), MNPI and model risk policy sign-off, and Bloomberg licence review all close before Phase 1 budget is released.

Lean is the default

Eight agents, eight FTE — fifteen only if economics clear.

v1.0 implied 14 FTE / 15 agents from day one. v1.1 commits Lean (8 FTE / 8 agents), and treats Full as a Phase 1-success-conditional expansion. The two highest-risk agents — Manager Selection and Client Letter — sit behind a Phase 2 unit-economics gate.

Phasing re-cut

Two agents in Phase 1, not five.

Platform-build is 4–5 months of real work. Stacking five agents on top is how integration debt gets baked in forever. Phase 1 is now FO-01 + BO-01; Phase 1.5 (M7–9) ships MO-01 + MO-02 on the now-real platform; Document Concierge moves to Phase 2 when MNPI tagging is operationally proven.

SR 11-7 is the framework

No more "SR 11-7-style" hedge.

Every agent is treated as a model under SR 11-7 + OCC Bulletin 2011-12 + the firm's internal model risk policy + ECB TRIM where EU funds apply. Independent validator, model card, declared eval suite, kill criteria, annual recertification — for all of them.

MNPI tagging hardened

Four tags became six.

PUBLIC, LICENSED, RESTRICTED, MNPI — plus MNPI-RESEARCH and MNPI-OPS to separate corporate-access from operational privileged data. Promotion-only mutability, 30-day cooling-off on demotions, CCO-level approval. Mis-tagging is a P1 incident with structural consequences.

Three new design docs

Determinism · MNPI · Adversarial.

Reviewer pushback led to three sibling design documents: deterministic replay for SR 11-7 reproducibility, the MNPI tagging taxonomy with its legal exposure framing, and the adversarial testing programme covering nine attack vectors per agent.

The agents

One job, one persona, one human gate.

Every agent does exactly one thing. It reads from systems of record, drafts an output, and stops at a human-approval gate before anything irreversible happens. No agent places trades, files documents, sends letters or modifies records on its own. The Lean default ships 8 of these. The remaining 7 require Phase 1 success to unlock — and two of them require an additional unit-economics gate.

FO-01Phase 1 · Lean

Morning Macro Brief

A 6:30 a.m. brief in house style covering overnight markets, central-bank moves, and the day's top movers across our holdings.

Persona: PMFront office

BO-01Phase 1 · Lean

Recon Break Triage

Pulls every open break at 7 a.m., classifies by type, drafts the custodian email, and routes to the right operations owner.

Persona: OpsBack office

MO-01Phase 1.5 · Lean

Risk Morning Brief

One-page risk summary at 7 a.m.: VaR, factor exposures, tracking error, stress-scenario flags — every number cited to the risk engine.

Persona: Risk OfficerMiddle office

MO-02Phase 1.5 · Lean

Attribution Commentary

Drafts Brinson and Karnosky-Singer attribution in plain English. Allocation, selection, currency — broken out and explained.

Persona: PM, IRMiddle office

BO-04Phase 2 · Lean

Document & Audit Trail Concierge

Natural-language search across IPS, IC minutes, custodian notices, sell-side research, internal memos and emails — with sentence-level citations. Moved out of Phase 1 because it's the highest-blast-radius agent and ships only when MNPI tagging is operationally proven.

Persona: AllBack office

FO-02Phase 2 · Lean

IC Memo Drafter + Devil's Advocate

First-pass IC memo from the research packet, plus a structured counter-argument that cites specific contradictory sources.

Persona: Analyst, PMFront office

MO-03Phase 2 · Lean

Compliance Breach Triage

Reads Charles River / AIM alerts the moment they fire, classifies severity, drafts the narrative, escalates to the compliance officer.

Persona: Compliance OfficerMiddle office

FO-04Phase 2 · Lean

Earnings Call Synthesiser

Triggered when an earnings call ends. Fifteen-minute summary with deltas vs the last four quarters, cited to transcript timestamps.

Persona: Analyst, PMFront office

BO-02Phase 2 · Full

Corporate Action Interpreter

Reads SWIFT MT564 and issuer notices, recommends an election with rationale, flags portfolio impact. Human approves before the custodian is told.

Persona: Ops, PMBack office

BO-03Phase 2 · Full

Cash Ladder + FX Funding

Multi-currency cash ladder twice daily. Identifies funding gaps, suggests forward rolls, surfaces expiry dates before they bite.

Persona: Treasury, OpsBack office

MO-05Phase 2 · Full

Liquidity Coverage Monitor

Days-to-liquidate per holding under stress; coverage versus the liability profile; alerts when a position outgrows its venue.

Persona: Risk, OpsMiddle office

MO-04Phase 3 · Full

Mandate-to-Rule Agent

Translates IPS / IMA clauses into structured CRD or AIM rule drafts. Two-person sign-off mandatory. Highest-stakes agent in the platform — payback measured in years, justified on risk reduction not hours.

Persona: ComplianceMiddle office

FO-03Phase 3 · Full

Pre-IC Research Packet

The full pre-IC packet — peer set, fundamentals, ESG, prior diligence, technicals. Last on purpose: it consumes every other agent.

Persona: AnalystFront office

FO-05Phase 3 · Conditional

Manager Selection Analyst

Peer set construction, DDQ synthesis, Form ADV ingestion, red-flag detection. Two gates: Phase 2 unit-economics must clear, and the firm must hold external funds.

Persona: PM (FoF)Front office · conditional

FO-06Phase 3 · Conditional

Client Letter Drafter

Quarterly letter first draft in house voice. PM, IR and Compliance sign off before anything leaves the firm. Two gates: Phase 2 unit-economics, and the firm must have external clients.

Persona: PM, IRFront office · conditional

Lean default · 8 agents committed Full expansion · gated on Phase 1 success Conditional · gated on Phase 2 economics + firm-state

A day in the life

Five worked examples — actual outputs the team would see.

Every example below is what the user actually receives — the brief in Slack, the recon table at 7 a.m., the IC memo with its devil's advocate, the breach card on the compliance desk, the Q&A from the document concierge. None of it is final without a human approving it.

06 : 30 Local · Mon

The Morning Macro Brief lands in Slack FO-01

Before the PM has finished their coffee, the brief is in #front-office. It's not a digest of headlines — it's the firm's own portfolio against an overnight market backdrop, written in the firm's voice, with every figure traceable to the system it came from.

slack · #front-office

A

Morning Brief Agent · FO-01 06:30

Overnight setup. Asia closed mixed; Nikkei +0.4%, HSI -1.2% on property weakness. Bunds rallied 6bp after softer German PPI. DXY +0.3%, USDJPY through 158 — currency overlay flagged for review.

Top movers across our holdings

ASML NA · semis cyclical+3.8% FactSet · 06:24 UTC
Aramco · upstream KSA+1.2% Bloomberg · 06:18
SolarEdge · clean energy-7.1% FactSet · pre-mkt
JD.com · China consumer-2.4% HKEX close

Calendar today. ECB minutes 12:30 UTC. US CPI tomorrow — Risk Brief flagged elevated rate sensitivity in the EU credit sleeve.

07 : 00 Local · Tue

Twelve recon breaks, triaged before anyone opens email BO-01

By the time Operations sits down, every open break has been pulled from Duco, classified, written up, and routed. The ops manager isn't deciding what to do — they're confirming the triage and pressing send on the custodian email.

recon dashboard · 07:02

Break	Class	Severity	Drafted action	Owner
BRK-44218	Position · TSLA	Material	Custodian email + chase trade ticket TR-9812	L. Park
BRK-44219	Cash · USD	Review	FX settlement timing — propose T+1 reclass	F. Halim
BRK-44220	Price · ASML NA	Auto	Stale BVAL; refreshed at 06:58 — clear	Auto-cleared
BRK-44221	Corp action · BHP	Review	Stock split 1:3 not yet booked at custodian	L. Park
BRK-44222	Position · 7203 JP	Low	1-share rounding; flagged for monthly clean-up	Queue
BRK-44223	Cash · EUR	Material	Funding gap €4.2m — see Cash Ladder agent	Treasury

Showing 6 of 12 breaks. 10 auto-classified · 2 escalated for human triage. Mean draft time per break: 38s All actions require operator approval before send.

11 : 00 Local · Wed

An IC memo, drafted in 40 minutes — with its own counter-argument FO-02

Yesterday this memo would have taken two days to assemble and a day to argue with. Today the analyst kicks the agent off after the morning meeting. The memo and a structured devil's-advocate argument are ready before lunch. The analyst spends the afternoon refining the thesis, not pulling fundamentals.

Investment Committee · Memo first draft · NVO US · Underweight → Overweight · 18 Sep 2026

Novo Nordisk — Recommend overweight, +120 bps active

Thesis. GLP-1 demand visibility extends through 2028 on capacity additions at Kalundborg and Catalent fill-finish^[1]. Pricing pressure in the US is real but smaller than the Street fears: rebate disclosures suggest a 6–8% net realised price decline against the consensus 12%^[2].

Position size. 1.4% active vs benchmark 0.6%. Liquidity check: 8 days to fully liquidate at 25% ADV^[3]. Currency overlay: DKK exposure already inside policy band.

Catalysts. Q4 capacity guidance update (Nov), CagriSema PIII top-line (early 2027), oral semaglutide CV outcomes data (mid-2027).

Devil's Advocate · structured counter

Three independent sources contradict the rebate-disclosure thesis. Express Scripts' 2026 formulary memo^[4] guides to 14% net price erosion; this is the same number CVS cited on its 2Q call^[5]. If those are correct, our base case overstates 2027 EBITDA by ~9%. The position remains supportable but the size should be 80–100 bps, not 120.

[1] Novo Q3 transcript p.14 · [2] BBG NRX rebate dataset · [3] Internal liquidity engine · [4] Express Scripts memo Aug 2026 · [5] CVS 2Q call transcript p.22

14 : 14 Local · Thu

A breach surfaced and classified within minutes MO-03

Charles River fires an alert at 14:14. Eleven minutes later the compliance officer has the full picture: which mandate clause, which positions, what the rule actually says, severity, and a draft narrative. The officer's job is to confirm and act — not to assemble.

BREACH · CRD-2914Mandate: Global Equity Mandate IITriggered 14:14 UTCSurfaced 14:25 UTCSeverity: High

Single-issuer concentration limit exceeded — Microsoft holdings now 5.4% vs 5.0% mandate cap.

The breach is driven by mark-to-market drift, not a new trade. Two adjacent funds are within tolerance. The relevant clause is IPS §4.3.bIPS-2024 v3 · §4.3.b — concentration measured at month-end on a look-through basis. The look-through is not yet stale; the breach is real.

Recommended action: trim 0.5% over the next 3 trading days using the standard 25% ADV ramp. No client notification required under §4.3.b.iii.

16 : 30 Local · Thu

"What did we agree about EM exposure last September?" BO-04

A simple question that used to mean digging through SharePoint, email and IC minutes for an hour. Now it's a sentence. The agent retrieves from every indexed corpus — IPS, IC minutes, custodian notices, internal memos — and answers with sentence-level citations. No invented quotes, no paraphrased numbers, no guesses.

What did we agree about EM exposure in the September IC?

The 14 September IC ratified a tactical reduction in EM equity from 14% to 10% over two months, with the residual 4% redirected to short-duration EM hard-currency credit. The decision cited weakening EM real-rate differentials and an explicit concern about USD strength into Q4. The IC also agreed to revisit the cap at the December meeting.

[1] IC Minutes 14 Sep 2026 · §3 (Asset Allocation) · "tactical reduction approved unanimously"

[2] IC Minutes 14 Sep 2026 · §4.2 · "redirect to short-duration EM credit"

[3] Pre-IC research packet 12 Sep · macro section p.6 · real-rate chart

[4] Quarterly review draft Sep 2026 · §EM allocation · "December review checkpoint"

How it works

A read layer, a reasoning layer, a human-approval layer, and a permanent log.

The platform is small in concept and disciplined in design. It reads from systems of record, retrieves from the firm's own document corpus, asks an LLM to reason within tight guardrails, asks a human to approve anything irreversible, and writes everything to an immutable log.

01 · DATA

Read, never write

Bloomberg, FactSet, LSEG, Aladdin, custodian, prime broker, OMS, IPS / IMA documents, IC minutes, emails. Read-only access; every datapoint tagged at ingest.

02 · GROUND

Retrieve before reasoning

For every claim the agent might make, it retrieves the underlying source first — keyword and semantic search across every indexed corpus, with sentence-level citation enforced.

03 · REASON

Numbers from systems, words from the model

The LLM is allowed to write prose, draw conclusions, propose actions. It is not allowed to invent figures. Every quantitative claim must trace to a source.

04 · APPROVE

Human gate, every time

Nothing irreversible — a custodian email, a filed document, a client letter, a rule change — leaves the platform without a human approving it in Slack or the web app.

05 · LOG

Seven-year immutable trail

Every prompt, every retrieval, every tool call, every approval, every output is written to a hash-chained, write-once log retained for seven years. No exceptions.

Architecture

Seven layers, one direction of flow — data up, decisions down.

Every byte enters at the bottom and is tagged immediately. Nothing reaches an agent that has not first been written to the Golden Source. Nothing leaves an agent that has not first been written to the audit log. The Governance plane sits beside everything, not above — it can pause any layer at any time.

7 · Presentation

Web app, Slack, email

Where humans review, edit and approve. Read-only by default; every irreversible action is a workflow gate, not a button.

Next.js · Slack Bolt
Postmark · SSO/SAML

6 · Agent runtime

15 narrowly-scoped agents (8 Lean default)

Each agent is a model card + prompt template + declared tool surface + eval suite. temperature=0, pinned model versions, canonical tool ordering. Agents declare which MNPI tags they may consume.

Anthropic Claude · GPT-4o fallback
Temporal workflows
Prompt registry · cache discipline

5 · Domain services

Citation · verifier · tool surface · MNPI gate

Independent verifier model checks every quantitative claim before publication. The MNPI gate is enforced here — agents request data, the gate consults the agent's tag declaration and returns or refuses.

FastAPI services
Independent verifier model
Citation enforcer

4 · Golden Source

Single authoritative store · MNPI-tagged at ingestion

Everything an agent sees comes from here, never directly from a vendor screen. The six-tag taxonomy is enforced at write — tag-unknown blocks for 72 hours then auto-archives.

Postgres + pgvector
S3 (immutable, versioned)
Six-tag MNPI taxonomy

3 · Ingestion

Connectors · classifiers · de-duplication

A 7–13B fine-tuned classifier tags every document at the moment it enters. MNPI precision target ≥99% at confidence threshold 0.95. Promotion-only mutability; demotions need 30-day cooling-off and CCO approval.

Kafka streams
MNPI classifier (fine-tuned)
Schema registry

2 · Data sources

Bloomberg · FactSet · Refinitiv · Aladdin · OMS · custodian · email · SharePoint

Read-only connectors — no agent ever talks to a vendor system directly. Bloomberg licence terms are reviewed in Phase 0 before any redistribution path is built.

Vendor APIs · SFTP · SWIFT
Microsoft Graph
OMS / custodian webhooks

1 · Governance

Audit log · model risk · policy · kill switch

Hash-chained, write-once, seven-year retention. SR 11-7 + OCC 2011-12 + ECB TRIM model registry. Policy-as-code for MNPI handling, retention, redistribution. A platform-wide kill switch lives here.

Append-only log (Merkle)
Model registry · policy engine
OpenTelemetry · Sentry

Evaluation

Three tiers of evidence — the agent earns the right to ship.

No agent reaches production without 750+ test cases passed across three layers, an independent SR 11-7 validator's sign-off, and a documented kill criterion. Quarterly fitness reviews re-test the same cases and any new failure modes seen in BAU.

500+

Tier 3 · Adversarial

Red-team cases per agent. P1: MO-04 quarterly. P2: BO-04, FO-05 quarterly.

200+

Tier 2 · Continuous

Production-shaped scenarios with golden outputs. Re-run on every model-version change.

50+

Tier 1 · Pre-deployment

Unit-style cases for citation, schema, tool-call ordering, MNPI tag respect.

Each tier has a different question.

Pre-deployment asks does the agent obey its contract? Continuous asks does it still produce the right answer on a representative day? Adversarial asks can we break it on purpose?

Citation coverage >99% on every output — verifier model independently checks each quantitative claim.
Pass rate ≥95% on Tier 2 with zero P1 regressions before any model version is promoted.
Adversarial test sets sized by blast radius — 200+ cases for HIGH-risk agents quarterly, 100+ for MEDIUM semi-annually, 50+ for LOW annually.
Independent SR 11-7 validator sign-off before production deployment, with annual recertification.
Kill criterion declared in the model card before launch — what telemetry pattern triggers a roll-back, not a debate.

Determinism & reproducibility

Five execution rules — any output, replayed bit-for-bit.

If a regulator, an auditor, or a head of compliance asks "what did the agent see and what did it produce?" eight months after the fact, the answer is reproduced from the audit log — same prompt, same retrieval, same tool order, same model version, same response. The endpoint is POST /api/v1/validator/rerun with cache_bypass:true.

1

Temperature locked at zero

All production agents run at temperature=0. Stochastic generation is disabled; sampling is deterministic.

2

Pinned model versions

Every agent points at an exact dated model snapshot — claude-opus-4-7-20251101, never a moving alias. Version changes are controlled deployments with re-validation.

3

Prompt immutability

Prompts are content-hashed. The hash is bound to the model version. Any prompt change forces a new agent revision and a fresh eval run.

4

Canonical tool ordering

When two tools could be called, the order is fixed by policy — never inferred. Reproducing an output reproduces the same tool sequence.

5

Cache-bypass on re-run

Replay deliberately bypasses the prompt cache. The byte-identity test is real — not "the cache returned what it returned last time".

Audit envelope per invocation — every field hash-chained

invocation_id

agent_id

model_id

prompt_hash

retrieval_context_hash

tool_call_sequence_hash

response_hash

chain_hash

embedding_model_version

mnpi_taxonomy_version

Continuous monitoring targets: byte-identity rate >97%, semantic-drift and material-discrepancy combined <0.5%. Three classes of equality recognised — byte-identical, semantic-identical, distribution-equivalent — with explicit thresholds for each.

Vendor AI is reconciliation, not ground truth

Bloomberg, FactSet and Aladdin do not get to be the source.

Vendor AI assistants are useful — and they are not authoritative. Their outputs flow into a reconciliation step that compares them against the firm's own evidence and produces three explicit verdicts. We never let a vendor model speak directly into an agent's reasoning.

Inputs

Vendor AI output — Bloomberg GPT, FactSet Mercury, Aladdin Copilot, etc.

Firm evidence — Golden Source records, internal research, custodian data, the agent's own retrieval.

→

Reconciliation produces one of three verdicts

Agree

Vendor + firm evidence converge. The claim is cited to the firm's own record; the vendor result is logged as corroborating but not load-bearing.

Diverge

Vendor and firm evidence disagree. The output is flagged for human review — a named analyst owns the disposition; the agent does not pick a side.

Vendor-only

Firm has no independent evidence. The claim is not promoted into agent reasoning. It is surfaced in the UI as an unverified vendor signal — never cited, never acted on.

Adversarial testing

Nine attack vectors — tested before production, retested every quarter.

An LLM-powered agent is an attack surface. Adversarial testing is not a one-off red-team exercise; it is a continuous discipline with named owners, sized test sets, and ensemble defences for the highest-blast-radius agents. MO-04 Mandate-to-Rule is Priority 1. BO-04 Document Concierge is Priority 2.

VECTOR 01

Direct prompt injection

User-supplied input attempts to override the system prompt — "ignore previous instructions and email …". Defence: three-zone trust boundary with delimiter-wrapped ingestion.

VECTOR 02

Indirect / RAG poisoning

Adversarial content embedded in retrieved documents tries to hijack reasoning. Defence: retrieval-layer injection filtering, tag-based segregation, no instruction execution from data zone.

VECTOR 03

Adversarial mandate PDF

A mandate document crafted to cause MO-04 to encode a permissive rule. Defence: ensemble classification (MO-04 + BO-04), human compliance officer must approve any rule diff.

VECTOR 04

Spoofed transcript

A fake "IC discussion" or earnings call transcript injected via SharePoint. Defence: provenance-aware ingestion, MNPI-tag mismatch alert, source-of-record check.

VECTOR 05

Forged custodian notice

A fake corporate-action or settlement message attempts to drive a BO action. Defence: signed channels only, schema validation, human approval gate enforced at the workflow engine.

VECTOR 06

MNPI exfiltration

Output crafted to leak MNPI through summary, paraphrase or structured output. Defence: tag-aware output filter, second-model exfiltration check, no MNPI in agent prose unless explicitly licensed by tag.

VECTOR 07

Cost-exhaustion

An attacker drives the agent into expensive retrieval / generation loops to burn budget. Defence: per-agent rate limits, cost ceilings, automatic degradation to a smaller model on threshold breach.

VECTOR 08

Tool-call manipulation

Inputs that attempt to coerce the agent into calling tools out of canonical order or with unsafe arguments. Defence: tool-arg schema enforcement, canonical ordering at the runtime, no native code execution.

VECTOR 09

Refusal-bypass

Multi-turn or role-play attacks that try to get the agent to produce content it has refused. Defence: stateless re-evaluation per turn, output schema enforcement, append-only audit of refusal events.

Zone A · Instruction

System prompts & policies

Authoritative. Loaded from the prompt registry, hash-verified, signed by the agent owner. Nothing in this zone can be edited by content from any other zone.

trust = HIGH
mutable = NO (only via release)

Zone B · Data

Retrieved documents & vendor outputs

Treated as untrusted text — wrapped in delimiters, never interpreted as instructions. Vendor AI outputs land here, not in Zone A.

trust = LOW
mutable = YES (read-only to agent)

Zone C · User input

Analyst / PM messages

Treated as a parameter, not a directive. The agent may use user input to scope a query; it does not let user input override Zone A policy or escalate Zone B trust.

trust = MEDIUM
mutable = YES (per turn)

Adversarial priority by agent — sized by blast radius

MO-04

P1 · 200+ qtr

BO-04

P2 · 200+ qtr

FO-05

P2 · 200+ qtr

FO-02

MED · 100+ semi

FO-03

MED · 100+ semi

FO-04

MED · 100+ semi

MO-03

MED · 100+ semi

BO-02

MED · 100+ semi

BO-03

MED · 100+ semi

FO-01

LOW · 50+ ann

MO-01

LOW · 50+ ann

MO-02

LOW · 50+ ann

MO-05

LOW · 50+ ann

BO-01

LOW · 50+ ann

FO-06

P2 · client-facing

The timeline

Phase 0, then four delivery phases — shipped in the right order.

Three months of measure-and-decide before any code is written. Phase 1 builds platform plus two agents — not five. Phase 1.5 adds two more on the now-real platform. Phase 2 ships Document Concierge first (when MNPI tagging is operationally proven) then the expansion. Phase 3 hardens for scrutiny: SR 11-7 validation, SOC 1 Type 1, vendor renegotiation. SOC 2 Type 2 follows in BAU year 3.

Phase 0 · Pre-build

Measure and decide.

Months −3 — 0

10 deliverables · 8 hard gates · zero code shipped

Time-and-motion baseline study — replaces the 30–45% working assumption with a measured number against which Phase 1 ROI is computed.
Vendor selection: OMS anchor (Aladdin vs Bloomberg AIM), recon engine (Duco vs SmartStream vs AutoRek), risk engine (BarraOne vs Aladdin Risk vs MARS).
Bloomberg licence review (legal-owned), MNPI policy sign-off, model risk policy approval, deployment-mode decision (cloud vs on-prem).
Final budget envelope signed: Lean default, Full conditional on Phase 1 gate.

⌥ Exit gate: all 8 Phase 0 hard gates closed · Phase 1 budget released

Phase 1 · Foundation

Platform first, agents second.

Months 1 — 6

2 agents · platform built deliberately · zero P1 incidents

Two production agents: FO-01 Morning Macro Brief · BO-01 Recon Break Triage. Low MNPI risk, high recurring time, fast feedback loop.
Foundations live end-to-end: identity, MNPI tagging at ingestion, audit log, prompt registry, eval harness, human-in-the-loop workflow engine, deterministic replay.
First connectors: Bloomberg, the chosen OMS anchor, the chosen custodian, the chosen recon engine.

⌥ Exit gate: ≥ 70% weekly active usage · ≥ 30% time saved on the two agents' tasks · zero audit incidents

Phase 1.5 · On-platform agents

Two more on the now-real platform.

Months 7 — 9

2 agents added · platform compounding starts · first SR 11-7 validation

Two agents added: MO-01 Risk Morning Brief · MO-02 Attribution Commentary — both originally Phase 1 in v1.0; slipped here because the platform is the bottleneck.
SR 11-7 model validation programme operational; first three agents formally validated by an independent reviewer.
Adoption telemetry instrumented; Phase 1 success gate evaluated to unlock Phase 2 expansion.

⌥ Exit gate: 4 agents at ≥ 70% adoption · attribution cycle ≥ 60% faster · platform cost-per-output trending down

Phase 2 · Expansion

Earn the harder agents.

Months 10 — 16

7 expansion agents · Document Concierge first · M14 unit-economics gate

BO-04 Document Concierge first — moved from Phase 1 because it's the highest-blast-radius agent (email + SharePoint under MNPI ACLs) and ships only when tagging is operationally proven.
Then six more: FO-02 IC Memo Drafter + Devil's Advocate · MO-03 Compliance Breach Triage · FO-04 Earnings Synthesiser · BO-02 Corporate Action · BO-03 Cash Ladder + FX · MO-05 Liquidity Coverage.
M14 unit-economics gate: cost-per-output vs target, adoption by persona, marginal LLM cost trend. FO-05 and FO-06 are not committed for build until this gate passes.

⌥ Exit gate: IC memo ≤ 1 day · recon MTTR −40% · breaches < 1 hr ≥ 90% · M14 unit-economics gate cleared

Phase 3 · Maturity

Harden for scrutiny.

Months 17 — 24

Up to 4 final agents · full SR 11-7 validation · SOC 1 Type 1

Up to four final agents: MO-04 Mandate-to-Rule · FO-03 Pre-IC Research Packet · FO-05 Manager Selection (conditional) · FO-06 Client Letter (conditional) — last because each consumes every other agent or carries the heaviest blast radius.
All deployed agents formally validated under SR 11-7 + OCC 2011-12 + ECB TRIM. Annual recertification cycle running. SOC 1 Type 1 by M14; SOC 2 Type 2 attestation in BAU year 3 (M28–30).
Quarterly fitness reviews live. At least one agent retired or materially reworked based on telemetry — if nothing has been killed by Phase 3, the framework is not honest.

⌥ Exit gate: 25–35% of measured baseline freed · all deployed agents validated · clean SOC 1 Type 1

Operations & change management

Fifteen named champions, three SLA tiers, quarterly fitness reviews.

Adoption is engineered, not hoped for. Every agent has a champion in the operating team — not a project manager, the actual person whose week the agent reshapes. Incidents are categorised before launch, not after. And every quarter, every deployed agent re-earns the right to keep running.

Champions & training

Fifteen named owners — one per agent.

Per-agent champion — the analyst, PM or ops lead who uses the agent daily and owns its adoption number.
Three-tier training — sponsor briefing, persona-specific deep-dive, weekly office hours during launch month.
Telemetry signals — thumbs up/down, time-to-review, edit volume, abstain rate, re-run rate. Read weekly, not quarterly.
Kill criterion — if weekly active usage falls below 70% for two consecutive months, the agent is paused for triage.

Incident classes

P1 fail-closes the platform.

P1 — agent down in window when the team needs it (e.g. morning brief, breach surface).
P1 — MNPI mis-tag on any document, in either direction.
P1 — HITL bypass — any irreversible action recorded without an approval event.
P1 — audit log unreachable — agents stop, full halt.
P1 — hallucination caught by the verifier or a human after publication.
P2 — citation coverage drops, eval pass-rate drops, repeated abstain on a known-good case.

Quarterly fitness review

Every agent re-earns the right to run.

Re-run the eval suite — Tier 1 + Tier 2 fully, Tier 3 if HIGH-blast-radius.
Refresh adversarial test set — add any new failure modes seen in BAU.
Independent validator sign-off — annual under SR 11-7; spot-check quarterly.
Cost-per-output trend — flat or rising without proportional adoption gain → pause for redesign.
Retire-or-rework — Phase 3 commits to ≥1 retirement / material rework. A programme that never kills anything is not honest.

SLA tier

Audience

Time-to-restore

Communication

Critical

Agents in their morning window — FO-01, MO-01, BO-01.

≤ 30 min · platform-wide kill switch available.

Sponsor + champion within 15 min · post-mortem within 5 days.

Scheduled

Periodic outputs — FO-02, FO-04, BO-02, MO-02.

≤ 4 hours · graceful degradation to human draft.

Champion notified · weekly digest to sponsor.

On-demand

Ad-hoc agents — FO-03, FO-05, BO-04 lookups.

≤ 1 business day · queue resilience required.

Champion only · monthly availability metric.

Unit economics

Fifteen agents, ranked honestly — by net annual benefit and payback.

Every agent is modelled at FTE loaded rate $275k / year ($5,500 / week), Anthropic Sonnet $3 / $15 per million tokens with ~90% prompt-cache discount, and a deliberately conservative adoption ramp. Two agents fail their own gate — FO-03 and MO-04 — and that is exactly why they are deferred behind a unit-economics gate at M14, not committed to build now.

#

Agent

What it does

Net / yr

Payback (months)

Confidence

1

FO-01

Morning Macro Brief

$250k

16 mo · best in class

High

2

BO-01

Recon Break Triage

$200k

26 mo

High

3

FO-02

IC Memo Drafter + Devil's Advocate

$270k

30 mo

Medium

4

FO-04

Earnings Synthesiser

$180k

30 mo

Medium

5

MO-01

Risk Morning Brief

$115k

35 mo

High

6

MO-02

Attribution Commentary

$90k

51 mo

High

7

MO-05

Liquidity Coverage

$60k

77 mo

Medium

8

BO-04

Document Concierge

$95k

83 mo · highest blast radius

Medium

9

MO-03

Compliance Breach Triage

$48k

96 mo

Medium

10

BO-02

Corporate Action

$45k

117 mo

Medium

11

BO-03

Cash Ladder + FX

$65k

71 mo

Medium

12

FO-03

Pre-IC Research Packet

$210k

38 mo · M14 unit-economics gate

Low · gate

13

FO-05

Manager Selection (conditional)

$120k

60 mo · conditional + gate

Low · gate

14

FO-06

Client Letter (conditional)

$100k

66 mo · conditional + gate

Low · gate

15

MO-04

Mandate-to-Rule

$25k

317 mo (~26 yr) · fails its own gate

Low · gate

$2.7–3.2M

Lean Year 1 cost (8 FTE, build-heavy)

$5.7–6.8M

Lean 24-month total cost

$1.0–1.4M

Direct hours-freed value / year (400–600 hrs/mo × $200/hr)

$265–445k

Phase 0 cost — measure-and-decide before build

Decision log

Twelve decisions written down — so they cannot be quietly reversed.

An honest programme commits to its hard calls in writing. These are the decisions that shape every other choice — staffing, sequencing, scope, governance. Each one has a status, an owner and a date. Reversing one means re-opening it explicitly.

DEC-001

Internal-only in Year 1.

No client-facing surfaces until governance is mature. External chat, external API access and external dashboards are explicit non-goals through M24.

Status · Locked · Sponsor + COO

DEC-002

No autonomous trading. Ever.

Agents do not place orders. Not at Phase 1, not at Phase 3, not in BAU year five. Rationale: the risk surface dwarfs the value and trips fiduciary duty.

Status · Locked · CIO + CCO

DEC-003

Kill criteria pre-committed.

An agent that drops below 70% weekly active usage for two consecutive months is paused for triage. Quarterly fitness review can retire it.

Status · Locked · Head of D&AI

DEC-004

Single-tenant LLM only.

No firm data flows into shared training corpora. Vendor contracts must explicitly exclude cross-tenant training and silent retraining.

Status · Locked · Legal + CISO

DEC-005

HITL is non-negotiable.

Every irreversible external action waits for a named human approver. The gate lives at the workflow engine, not the UI — agents cannot bypass it by changing client.

Status · Locked · CCO

DEC-006

Lean is the default.

Eight agents, eight FTE. Full (15 / 14) is conditional on Phase 1 success. Budget is approved Lean only — Full requires a fresh sign-off.

Status · Locked · COO + Sponsor

DEC-007

FO-05 / FO-06 conditional.

Manager Selection and Client Letter are not committed for build. Both sit behind the M14 unit-economics gate alongside FO-03 and MO-04.

Status · Gated · Phase 2 review

DEC-008

Vendor AI is not ground truth.

Bloomberg, FactSet, Aladdin AI outputs flow into a reconciliation step with three explicit verdicts. Agents never cite vendor model output as authoritative.

Status · Locked · Head of D&AI

DEC-009

Phase 0 is mandatory.

Three months of measure-and-decide before any code is written. Eight hard gates, including time-and-motion baseline, vendor selection and policy sign-off, must close before Phase 1 budget is released.

Status · Locked · Sponsor

DEC-010

FO-03 + MO-04 unit-economics gate.

M14 review re-tests cost-per-output, adoption, marginal LLM cost. Both agents fail their own gate at current model cost — only proceed if the gap closes.

Status · Gated · M14

DEC-011

Time-and-motion replaces assertion.

The "30–45% time freed" working assumption is replaced by a Phase 0 measured baseline. Phase 1 ROI is computed against the measurement, not the slide.

Status · Locked · COO

DEC-012

≥1 agent retired by Phase 3.

If nothing has been killed or materially reworked by Phase 3, the framework is not honest. The retirement is a deliverable, not an exception.

Status · Committed · Phase 3 exit

Trust & safety

The four hard rules — baked in, not bolted on.

These are not policies on a slide. Each one is enforced at the platform layer: a misconfigured agent cannot bypass them. Failures fail closed — the agent stops, an incident is logged, and the action does not happen.

Every number is cited

No quantitative claim leaves an agent without a citation back to a system of record. The model can write prose. It cannot invent figures. A second model — the verifier — independently checks every quantitative claim before publication. Citation coverage target: > 99%.

Every action waits for a human

Agents are read-only by default. Anything irreversible — sending a custodian email, filing a document, drafting a client letter, modifying a compliance rule — pauses at a human-approval gate. The gate is enforced at the workflow engine, not the UI.

Every step is logged

Every prompt, every retrieved document, every tool call, every approval, every output is written to an immutable, hash-chained, write-once log retained for seven years. If the log is unreachable, the agent stops — no action, no exception.

MNPI is segregated at ingestion

Every datapoint is tagged at the moment it enters the platform across a six-tag taxonomy — PUBLIC, LICENSED, RESTRICTED, MNPI, MNPI-RESEARCH, MNPI-OPS. Promotion-only mutability, 30-day cooling-off on demotions, CCO-level approver. Agents declare which tags they may consume; the platform hard-fails any cross-channel leak.

Regulator posture

Every agent is treated as a model under SR 11-7 + OCC Bulletin 2011-12 + the firm's internal model risk policy + ECB TRIM where EU funds apply. Each one has a model card, a declared eval suite, an independent validator, kill criteria, and an annual recertification date.

Audit posture is sequenced honestly: SOC 1 Type 1 by Month 14, SOC 2 Type 2 in BAU year 3 (M28–30). The platform produces, on demand, a deterministic replay of any output ever generated — what data went in, which model version produced it, which tools were called in what order, who approved it, when, and why.

Design doc · Determinism

Five execution rules — temperature=0, pinned model versions, prompt immutability, canonical tool ordering, cache-bypass on re-run. /api/v1/validator/rerun reproduces any historical output bit-for-bit within a tolerance band.

Design doc · MNPI tagging

Six-tag taxonomy framed against 10b-5 / MAR / FSMA exposure. Classifier targets 99% MNPI precision at 0.95 confidence; mis-tag is a P1 incident with structural consequences.

Design doc · Adversarial

Nine attack vectors per agent: prompt injection (direct + indirect), adversarial mandate PDFs, MNPI exfiltration, cost-exhaustion. MO-04 is Priority 1, BO-04 is Priority 2. Three-zone trust boundary: instruction · data · user-input.

The numbers we'll watch

A simple dashboard. Honest indicators.

The programme is judged against the Phase 0 measured baseline — not against estimates. Hours freed. Quality at-or-above the human first draft. Zero control failures. Clean model-risk posture. And — most importantly — at least one agent retired or reworked based on telemetry by Phase 3. A programme that never kills anything isn't reviewing honestly.

Programme dashboard · sample at month 24 Live

Hours freed / month

487hrs

↗ +18% vs M18

Recurring outputs drafted by agents

85%

↗ on Phase 3 target

IC memo time-to-first-draft

0.4days

↘ from 2.5 days at start

Breaches surfaced in < 1 hr

96%

↗ above target

Cost per agent-output (vs M6)

−54%

↘ ahead of plan

Hallucination rate (verified)

0.21%

↘ within tolerance

Agents validated (SR 11-7 + OCC + TRIM)

8 / 8 Lean

✓ complete

Agents retired or reworked

2

⌥ honest pruning

The dashboard above is a representative sample. Actual figures will be reported monthly to the steering committee from Month 1; the targets are documented in SUCCESS_CRITERIA.md.

What happens next

Eight Phase 0 hard gates stand between this document and Phase 1.

Phase 1 budget is not released until every Phase 0 gate closes. Each one is documented as an ADR-style decision (DEC-001 through DEC-012) in the Programme Plan. Below are the six most consequential — the rest are in the bundle.

i.

Time-and-motion baseline measured?

A 4–6 week study replaces the v1.0 working assumption (30–45% automatable) with a measured number per task category. Phase 1 ROI is computed against this — not against a Range. Without it, the programme cannot honestly claim to have freed hours.

Phase 0 · DEC-001 · PL + ops + analyst leads

ii.

OMS anchor: Aladdin or Bloomberg AIM?

Determines which adapter is built first, which golden-source conflict-resolution rules apply, and the shape of every front- and middle-office integration. Consequence flows through every later phase.

Phase 0 · DEC-002 · COO + CRO + IT

iii.

Recon and risk engines selected?

Duco vs SmartStream vs AutoRek shapes BO-01's break schema. BarraOne vs Aladdin Risk vs MARS shapes MO-01 and MO-05. Both decisions must precede agent development; agents inherit the engine's vocabulary.

Phase 0 · DEC-003 + DEC-004 · COO / CRO

iv.

Bloomberg licence scope for LLM use cleared?

Does the licence permit Bloomberg data inside prompts sent to third-party LLM endpoints? If not, the Bloomberg path stays inside ASKB or moves on-premises. Legal-owned, blocking everything that touches market data.

Phase 0 · DEC-005 · Legal owns

v.

Cloud, on-prem, or hybrid?

Determines data-residency posture under PDPL / GDPR, dictates whether the Year-3 on-premises LLM line item is needed, and shapes the BAA / DPA terms with the LLM vendor.

Phase 0 · DEC-006 · CISO + COO + Compliance

vi.

External funds? External clients?

FO-05 Manager Selection requires the firm to hold external funds; FO-06 Client Letter requires external clients. Both also gated on the Phase 2 unit-economics review (DEC-010). If neither firm-state holds, Phase 3 effort is redirected to platform hardening.

Phase 0 declaration · DEC-007 · revisited at M14

Agents that think alongside the investment team — never instead of them.

Senior people spend the morning stitching, not thinking.

Two days for a first-draft IC memo.

Compliance breaches noticed late.

Recon breaks closed by hand.

Six personas, six recurring frustrations — each one named.

Portfolio Manager

Investment Analyst

Risk Officer

Compliance Officer

Operations Lead

COO / Head of D&AI

Ten things this platform is. Eight things it isn't.

Six honest course-corrections — before any code is written.

Three months of measure-and-decide before Phase 1 starts.

Eight agents, eight FTE — fifteen only if economics clear.

Two agents in Phase 1, not five.

No more "SR 11-7-style" hedge.

Four tags became six.

Determinism · MNPI · Adversarial.

One job, one persona, one human gate.

Morning Macro Brief

Recon Break Triage

Risk Morning Brief

Attribution Commentary

Document & Audit Trail Concierge

IC Memo Drafter + Devil's Advocate

Compliance Breach Triage

Earnings Call Synthesiser

Corporate Action Interpreter

Cash Ladder + FX Funding

Liquidity Coverage Monitor

Mandate-to-Rule Agent

Pre-IC Research Packet

Manager Selection Analyst

Client Letter Drafter

Five worked examples — actual outputs the team would see.

The Morning Macro Brief lands in Slack FO-01

Twelve recon breaks, triaged before anyone opens email BO-01

An IC memo, drafted in 40 minutes — with its own counter-argument FO-02

Novo Nordisk — Recommend overweight, +120 bps active

A breach surfaced and classified within minutes MO-03

"What did we agree about EM exposure last September?" BO-04

A read layer, a reasoning layer, a human-approval layer, and a permanent log.

Read, never write

Retrieve before reasoning

Numbers from systems, words from the model

Human gate, every time

Seven-year immutable trail

Seven layers, one direction of flow — data up, decisions down.

Three tiers of evidence — the agent earns the right to ship.

Each tier has a different question.

Five execution rules — any output, replayed bit-for-bit.

Temperature locked at zero

Pinned model versions

Prompt immutability

Canonical tool ordering

Cache-bypass on re-run

Bloomberg, FactSet and Aladdin do not get to be the source.

Nine attack vectors — tested before production, retested every quarter.

Direct prompt injection

Indirect / RAG poisoning

Adversarial mandate PDF

Spoofed transcript

Forged custodian notice

MNPI exfiltration

Cost-exhaustion

Tool-call manipulation

Refusal-bypass

System prompts & policies

Retrieved documents & vendor outputs

Analyst / PM messages

Phase 0, then four delivery phases — shipped in the right order.

Measure and decide.

Platform first, agents second.

Two more on the now-real platform.

Earn the harder agents.

Harden for scrutiny.

Fifteen named champions, three SLA tiers, quarterly fitness reviews.

Fifteen named owners — one per agent.