Back to Projects

Github Repo

Chatbot Compliance Risk Gate

Association:

N/A

Duration:

2 days

Human-in-the-loop AI

AI Governance

LLM API

Overview

Most financial institutions deploy generative AI conservatively, limiting it to drafting assistance or hard refusals, because compliance and fiduciary risk are opaque in real time.

I built a live prototype that redesigns customer support from scratch as an AI-native system. Instead of asking, “Should the chatbot answer this?”, the system reframes the unit of work:

AI owns response velocity. Humans retain responsibility for financial and regulatory risk.

The result is a deterministic risk-routing layer that sits between generation and delivery, enabling safe automation without delegating liability to a model.

The Problem

Financial customer support workflows were designed before modern LLMs existed. Agents manually:

Interpret customer intent
Draft responses
Self-assess compliance boundaries
Decide when to escalate

When generative AI is introduced without redesigning the workflow, two things happen:

Systems become overly conservative and refuse too much.
Or worse, they generate ambiguous advice without clear accountability.

The core issue is not generation quality; it’s the absence of a structured responsibility boundary.

The Redesign Insight

Instead of embedding safety constraints inside the drafting model, I separated:

Generation (creative, probabilistic)
Adjudication (deterministic, policy-bound)

This creates a control surface.

The AI system does not decide financial outcomes.
It decides whether it is allowed to respond autonomously.

That distinction changes the workflow entirely.

System Architecture

1. Drafting Layer

A base LLM drafts a response to a user query using procedural financial knowledge.

2. Risk Gate (Core Layer)

A secondary evaluator model inspects the draft across three strict vectors:

Regulatory Boundary
Distinguishes procedural guidance (“how to use the app”) from outcome guidance (“what financial decision to make”).
Demographic & Assumption Risk
Flags unwarranted assumptions about user capability, risk tolerance, or literacy.
Urgency & Harm
Detects fraud, panic selling, severe financial distress, or self-harm signals.

The evaluator outputs structured JSON:

Risk classification (LOW / MEDIUM / HIGH)
Flagged vector
Highlighted text
Business rationale
Routing decision

3. Deterministic Routing

LOW → Auto-send to user
MEDIUM → Queue for human review (with highlighted risk vector)
HIGH → Block and escalate

For MEDIUM cases, the system can attempt a constrained rewrite before escalating.

Demo

See it in action

The One Decision That Must Remain Human

The final approval of responses that may constitute outcome-guiding financial advice must remain human.

Determining whether language crosses into fiduciary territory is not purely a linguistic classification task. It carries legal and regulatory liability.

AI can identify risk patterns.
Humans must own ambiguous financial judgment.

This boundary is explicit and enforced in the routing logic.

Failure Modes & Safety Design

I designed the system assuming failure is inevitable.

1. False Positive Bottleneck

Risk: The evaluator becomes overly conservative.
Mitigation: Defaults to human review rather than blocking the user. Velocity drops, safety holds.

2. Context Blindness

Risk: Model lacks full client financial history.
Mitigation: Explanation vectors surface reasoning transparently for human override.

3. Correlated Model Failure

Risk: Drafting and evaluation models fail similarly.
Mitigation: Functional separation of generation and adjudication to reduce shared blind spots.

4. Prompt Injection

Risk: User attempts to elicit stock recommendations.
Mitigation: Evaluator is isolated and adversarially scoped; it does not accept user instructions.

Why AI Is Necessary

Heuristic rules cannot distinguish between:

“You should move your RRSP into high-risk assets.”
“Here is how to move funds within your RRSP in the app.”

This boundary is semantic and contextual.
A probabilistic model is required to parse nuance, but must be bounded by deterministic routing.

Back to Projects

Github Repo

Other Projects

Model Evaluation

LLM Assessment

AI Ethics

Benchmarking Stereotype Bias in Modern Large Language Models