🛡️ AppSec MCP Server

Reference implementation — a guardrail-wrapped Model Context Protocol server for regulated-bank application security

Let AI assistants drive the bank’s AppSec scanning ecosystem — Checkmarx, Black Duck, Invicti, Corgea — through one server, with security controls wired into every call: audit, per-app authorization, credential isolation, PII redaction, and a human checkpoint before any action.

8 MCP Tools
5 Ordered Guardrails
3 Pluggable LLM Providers
4 Scanner Adapters
0 Secrets in LLM Context
100% Calls Audited

📝 The Problem

Application-security backlog scales with the number of applications, not the size of the AppSec team. The traditional answer — wire each scanner into each team’s pipeline by hand — is an N teams × M scanners × P applications integration cost that has to be maintained forever.

At the same time, AI assistants are arriving in developer workflows fast, and banks are — rightly — nervous about handing an autonomous agent the keys to security tooling, source code, and scanner credentials. The hard part isn’t calling a scanner from an LLM. The hard part is doing it in a way that survives an audit.

✅ The Solution

A single Model Context Protocol (MCP) server that exposes the AppSec scanning ecosystem as a unified, LLM-callable tool surface. Any client — a developer’s IDE, an internal triage agent, Claude Desktop or Claude Code — calls one server. The server orchestrates the right scanner, normalizes the findings, retrieves the bank’s own policy as grounding, and surfaces patches.

Every tool call is wrapped by an ordered stack of security guardrails before anything executes. The thesis: you can give engineers AI-assisted security tooling without loosening a single control. Build the server once; onboard hundreds of apps through a thin manifest.

🏗️ Architecture at a Glance

One server, three transports, eight tools, five guardrails, three pluggable LLM providers, RAG over the bank’s policy catalog, and an end-to-end OpenTelemetry observability plane.

System Architecture

LLM clients → guardrail-wrapped tool runtime → scanners, RAG & LLM providers, with every call observable end to end.

[Figure: AppSec MCP Server architecture. LLM clients connect over stdio/SSE to a guardrail-wrapped MCP server exposing 8 tools, with RAG-backed resources, a pluggable LLM provider abstraction, mock scanner adapters, a credential store, and an OpenTelemetry observability plane feeding Tempo, Loki, and Grafana. Legend: LLM clients · MCP core / tools / RAG · guardrail middleware · external processes · observability · persistent store.]

🔐 The Five Guardrails

Middleware wraps every tool call. The guardrails run in the order listed below (the G labels are identifiers, not sequence positions), and an action never reaches a scanner or an LLM until each one has passed.
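The ordering idea can be sketched as plain function composition. This is a minimal illustration, not the server's actual middleware API: `chain`, `stage`, and the stage names are hypothetical, and the sketch assumes the execution order matches the order the guardrails are listed in this section. Each stage either delegates to the next one or raises, so the tool body is reached only after every stage has let the call through.

```python
from typing import Any, Callable, Dict, List

Ctx = Dict[str, Any]
Handler = Callable[[Ctx], Ctx]
Middleware = Callable[[Ctx, Handler], Ctx]


def _bind(mw: Middleware, nxt: Handler) -> Handler:
    # Close over one middleware and the handler it delegates to.
    def handler(ctx: Ctx) -> Ctx:
        return mw(ctx, nxt)
    return handler


def chain(middlewares: List[Middleware], tool: Handler) -> Handler:
    """Wrap `tool` so middlewares[0] runs first; any stage may raise to stop the call."""
    handler = tool
    for mw in reversed(middlewares):
        handler = _bind(mw, handler)
    return handler


# Demo: placeholder stages that only record the order in which they run.
calls: List[str] = []


def stage(name: str) -> Middleware:
    def mw(ctx: Ctx, nxt: Handler) -> Ctx:
        calls.append(name)
        return nxt(ctx)
    return mw


ORDER = ["audit", "scoping", "credential_isolation", "redaction", "human_in_the_loop"]
run = chain([stage(n) for n in ORDER], lambda ctx: {"tool": ctx["tool"], "ok": True})
result = run({"tool": "app_status"})
```

Under this ordering, audit is the outermost wrapper, so a call refused by a later stage still passes through the audit stage first.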

G3 · Audit

One NDJSON record per call — caller, tool, app, latency, outcome. The trace ID is the audit-log correlation key.
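A minimal standard-library sketch of the one-record-per-call rule. The fields mirror the ones listed above; `audited_call`, the `outcome` values (`ok`, `denied`, `error`), and the trace IDs in the demo are illustrative assumptions. Writing the record from a `finally` block is what guarantees a line even when the call is refused or fails.

```python
import io
import json
import time
from contextlib import contextmanager
from typing import Iterator, TextIO


@contextmanager
def audited_call(sink: TextIO, caller: str, tool: str, app: str,
                 trace_id: str) -> Iterator[None]:
    """Emit exactly one NDJSON audit record per tool call, including failures."""
    start = time.monotonic()
    outcome = "ok"
    try:
        yield
    except PermissionError:
        outcome = "denied"  # e.g. a scoping refusal
        raise
    except Exception:
        outcome = "error"
        raise
    finally:
        record = {
            "trace_id": trace_id,  # same ID as the OTel span: the correlation key
            "caller": caller,
            "tool": tool,
            "app": app,
            "latency_ms": round((time.monotonic() - start) * 1000, 3),
            "outcome": outcome,
        }
        # NDJSON: one compact JSON object per line.
        sink.write(json.dumps(record, sort_keys=True) + "\n")


# Demo: one allowed call and one denied call against an in-memory sink.
log = io.StringIO()
with audited_call(log, "alice", "app_status", "payments-api",
                  "4bf92f3577b34da6a3ce929d0e0e4736"):
    pass
try:
    with audited_call(log, "bob", "checkmarx_scan", "payments-api",
                      "00f067aa0ba902b700f067aa0ba902b7"):
        raise PermissionError("bob is not authorized on payments-api")
except PermissionError:
    pass
records = [json.loads(line) for line in log.getvalue().splitlines()]
```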

G4 · Scoping

Per-app authorization re-checked at every tool entry, not just at the gateway. Calling a tool for an app you’re not authorized on returns 403, not a filtered result.
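One way to enforce the check at tool entry is a decorator that every tool passes through. `Caller`, `ScopeError`, and `scoped` are hypothetical names, and the `status = 403` attribute only mirrors the refusal semantics described above; how an MCP server maps it onto a JSON-RPC error response is left open here.

```python
import functools
from dataclasses import dataclass
from typing import Callable, FrozenSet


class ScopeError(PermissionError):
    """Refusal for a tool call on an app outside the caller's grant."""
    status = 403


@dataclass(frozen=True)
class Caller:
    principal: str
    apps: FrozenSet[str]  # apps this principal is authorized on


def scoped(tool: Callable[..., dict]) -> Callable[..., dict]:
    """Re-check per-app authorization at tool entry, whatever the gateway decided."""
    @functools.wraps(tool)
    def wrapper(caller: Caller, app_id: str, **kwargs) -> dict:
        if app_id not in caller.apps:
            # Fail closed with an explicit error, never a silently filtered result.
            raise ScopeError(f"{caller.principal} is not authorized on {app_id}")
        return tool(caller, app_id, **kwargs)
    return wrapper


@scoped
def app_status(caller: Caller, app_id: str) -> dict:
    return {"app": app_id, "last_scan": None}  # placeholder body


alice = Caller("alice", frozenset({"payments-api"}))
allowed = app_status(alice, "payments-api")
try:
    app_status(alice, "hr-portal")
    denied_status = None
except ScopeError as exc:
    denied_status = exc.status
```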

G1 · Credential Isolation

Scanner secrets are pulled from the secret store at call time and never enter LLM context, logs, or trace attributes.
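A sketch of call-time credential retrieval, assuming a wrapper type that masks itself whenever it is formatted. `Secret`, `fetch_secret`, `run_scan`, and the `CHECKMARX_TOKEN` variable are hypothetical; the environment lookup stands in for a real secret-store client only so the example runs locally. The raw value exists only inside the scanner call, so an accidental log line or span attribute renders `Secret('***')`.

```python
import os
from typing import Callable, Dict


class Secret:
    """Wraps a credential so str(), repr(), and f-strings show a mask, not the value."""
    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # Call only at the point of use, inside the scanner client.
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__


def fetch_secret(name: str) -> Secret:
    # Hypothetical stand-in for the secret-store client; reads an env var
    # only so the sketch runs locally.
    return Secret(os.environ[name])


def run_scan(app_id: str, send: Callable[[str, str], Dict]) -> Dict:
    token = fetch_secret("CHECKMARX_TOKEN")  # fetched per call, not held at startup
    result = send(app_id, token.reveal())    # raw value goes only to the scanner client
    return result                            # only the scan result flows back toward the LLM


# Demo with a fake scanner client that records what it received.
os.environ.setdefault("CHECKMARX_TOKEN", "demo-token-123")
received: Dict[str, str] = {}


def fake_send(app_id: str, token: str) -> Dict:
    received["token"] = token
    return {"app": app_id, "findings": []}


out = run_scan("payments-api", fake_send)
masked = f"{fetch_secret('CHECKMARX_TOKEN')}"
```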

G5 · Redaction

SSNs, payment-card data (PCI), and email addresses are scrubbed before anything reaches an LLM. Every hit emits a metric; a spike is itself a finding.

G2 · Human-in-the-Loop

Any actionable output is flagged requires_approval=true. The server never auto-suppresses, auto-closes, or auto-merges. Agentic ≠ autonomous.
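A sketch of how the approval flag might travel with an actionable output. `Proposal`, `propose`, and `apply_change` are hypothetical. The gate in `apply_change` stands for a downstream system outside the MCP server, since the server itself has no path that applies a change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    """Actionable output (patch, suppression, closure) returned to the client."""
    kind: str
    finding_id: str
    body: str
    requires_approval: bool = True     # set on every actionable output
    approved_by: Optional[str] = None  # filled in only by a human reviewer


def propose(kind: str, finding_id: str, body: str) -> Proposal:
    # The server's only move: return a flagged proposal. There is no apply path here.
    return Proposal(kind, finding_id, body)


def apply_change(p: Proposal) -> str:
    """Hypothetical downstream gate (outside the MCP server) that enforces the flag."""
    if p.requires_approval and not p.approved_by:
        raise PermissionError(f"{p.kind} for {p.finding_id} has no human approval")
    return f"applied {p.kind} to {p.finding_id}, approved by {p.approved_by}"


p = propose("patch", "CX-1042", "use a parameterized query in get_account()")
try:
    apply_change(p)
    blocked = False
except PermissionError:
    blocked = True
p.approved_by = "reviewer@example.com"
applied = apply_change(p)
```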

🛠️ Tool Surface — 8 Tools

A deliberately bounded surface. The server exposes only the AppSec tools it explicitly catalogs — no shell, no arbitrary code execution.

  • checkmarx_scan: SAST scan orchestration
  • blackduck_scan: SCA / dependency scan
  • invicti_scan: DAST scan orchestration
  • corgea_triage: LLM-assisted false-positive triage
  • finding_list: Normalized findings query
  • finding_patch_suggest: Smallest-patch suggestion
  • app_onboard: Manifest-driven app onboarding
  • app_status: Per-app posture & scan state
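A bounded surface can be enforced at registration time as well as at dispatch. This sketch uses a plain dictionary registry. With FastMCP, tools would instead be registered through its `@mcp.tool()` decorator. The catalog check here is an extra, hypothetical guard rather than something FastMCP does for you, and `tool`, `dispatch`, and `CATALOG` are illustrative names.

```python
from typing import Any, Callable, Dict

# The only tool names the server will ever register (the eight listed above).
CATALOG = frozenset({
    "checkmarx_scan", "blackduck_scan", "invicti_scan", "corgea_triage",
    "finding_list", "finding_patch_suggest", "app_onboard", "app_status",
})
_registry: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a tool, refusing any name that is not in the fixed catalog."""
    if fn.__name__ not in CATALOG:
        raise ValueError(f"{fn.__name__!r} is not a cataloged AppSec tool")
    _registry[fn.__name__] = fn
    return fn


def dispatch(name: str, **kwargs: Any) -> Any:
    # Unknown names (e.g. "shell", "exec") fail closed instead of falling through.
    if name not in _registry:
        raise LookupError(f"unknown tool: {name}")
    return _registry[name](**kwargs)


@tool
def app_status(app_id: str) -> dict:
    return {"app": app_id, "state": "idle"}  # placeholder body


status = dispatch("app_status", app_id="payments-api")
```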

In this reference implementation the scanners are FastAPI mock stubs that honor the same adapter contract a production Checkmarx / Black Duck / Invicti / Corgea integration would — so the orchestration, normalization, and guardrail logic is exercised end to end without live vendor credentials.
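A sketch of what a shared adapter contract could look like, assuming a single normalized `Finding` record. The field names, the severity vocabulary, `ScannerAdapter`, and `MockSastAdapter` are all hypothetical. A real adapter would call its FastAPI stub (or the vendor API) over HTTP instead of returning canned rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Finding:
    """Normalized finding every adapter must emit (field names are illustrative)."""
    scanner: str
    app_id: str
    rule: str
    severity: str  # one of: critical, high, medium, low, info
    location: str


class ScannerAdapter(Protocol):
    name: str

    def scan(self, app_id: str) -> List[Finding]: ...


_SEVERITY = {"critical": "critical", "high": "high", "medium": "medium",
             "med": "medium", "low": "low", "info": "info"}


class MockSastAdapter:
    """In-process mock returning canned, vendor-style rows."""
    name = "mock-sast"

    def scan(self, app_id: str) -> List[Finding]:
        raw: List[Dict[str, str]] = [
            {"query": "SQL_Injection", "sev": "High", "file": "api/accounts.py:42"},
            {"query": "Weak_Hash", "sev": "Med", "file": "legacy/report.py:7"},
        ]
        # Normalization happens inside the adapter, so orchestration never sees vendor shapes.
        return [Finding(self.name, app_id, r["query"], _SEVERITY[r["sev"].lower()], r["file"])
                for r in raw]


def orchestrate(app_id: str, adapters: List[ScannerAdapter]) -> List[Finding]:
    return [f for a in adapters for f in a.scan(app_id)]


findings = orchestrate("payments-api", [MockSastAdapter()])
```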

🔌 Transports — stdio, SSE & Streamable HTTP

MCP transports define how JSON-RPC messages travel between client and server. The server speaks more than one so it can serve both local and networked clients.

stdio

The client launches the server as a local child process and talks over stdin/stdout. No network, no ports — the lowest attack surface. Right for desktop and IDE clients on the same machine (Claude Desktop, Claude Code).

SSE — legacy remote

Server-Sent Events was MCP’s original remote transport: a network HTTP service with a persistent stream plus a separate POST endpoint. Still supported for backward compatibility, but deprecated as of the 2025-03-26 spec.

Streamable HTTP — current remote

The current recommended remote transport: a single endpoint, bidirectional, and friendly to load balancers and firewalls. New networked deployments target this; one server can bind both stdio and Streamable HTTP, gated by a flag.
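A sketch of flag-gated transport selection with `argparse`. The `appsec-mcp` program name and the `--transport` and `--allow-legacy-sse` flags are hypothetical. The transport strings are assumed to match those accepted by `FastMCP.run()` in the official MCP Python SDK, which is worth confirming against whichever SDK version is pinned.

```python
import argparse
from typing import List, Optional

# Assumed to match the transport names FastMCP.run() accepts in the MCP Python SDK.
TRANSPORTS = ("stdio", "streamable-http", "sse")


def choose_transport(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(prog="appsec-mcp")  # hypothetical CLI name
    parser.add_argument(
        "--transport",        # hypothetical flag
        choices=TRANSPORTS,
        default="stdio",      # no ports opened unless explicitly requested
    )
    parser.add_argument("--allow-legacy-sse", action="store_true")  # hypothetical flag
    args = parser.parse_args(argv)
    if args.transport == "sse" and not args.allow_legacy_sse:
        # parser.error() prints usage and exits with status 2.
        parser.error("sse is deprecated; pass --allow-legacy-sse for older clients")
    return args.transport


local = choose_transport([])
remote = choose_transport(["--transport", "streamable-http"])
# In a real server, something like: mcp.run(transport=choose_transport())
```

Defaulting to stdio keeps the zero-port configuration as the fallback, and SSE must be requested explicitly, in line with its deprecated status.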

🧳 Redaction — “Privacy Orchestration”

Redaction is a middleware step that runs before data crosses any boundary: LLM context, logs, or span attributes. Scan findings carry source code, so this is not optional. It works in three layers:

1 · Structured detection

Deterministic patterns for well-formed sensitive data: SSNs, payment-card PANs (Luhn-validated to cut false hits), emails, API keys and high-entropy strings. Cheap, and catches the bulk.

2 · Unstructured detection

An NER / classifier pass for what patterns miss — names, addresses, and data classes embedded in code comments or test fixtures.

3 · Action & signal

Matches are masked or tokenized (not deleted, so the finding still reads coherently for triage), and every hit emits a metric. That count is its own security signal.
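Layer 1 can be sketched with standard-library regular expressions plus a Luhn check. This is an illustrative sketch, not the server's detector. The patterns are deliberately simple (US-format SSNs, 13 to 19 digit PANs with optional spaces or dashes), and the mask format is an assumption. A `Counter` stands in for the redaction metric, and layer 2 (NER) is not shown.

```python
import re
from collections import Counter

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PAN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str, hits: Counter) -> str:
    """Layer 1 only (structured patterns). Masks in place and counts every hit."""
    def masker(kind: str):
        def sub(m: re.Match) -> str:
            hits[kind] += 1  # stand-in for emitting a redaction metric
            return f"[{kind}]"
        return sub

    def pan_sub(m: re.Match) -> str:
        digits = re.sub(r"\D", "", m.group())
        if not luhn_ok(digits):
            return m.group()  # fails Luhn: likely an ID, not a card number
        hits["PAN"] += 1
        return f"[PAN:****{digits[-4:]}]"  # keep the last four so triage still reads

    text = SSN.sub(masker("SSN"), text)
    text = EMAIL.sub(masker("EMAIL"), text)
    return PAN.sub(pan_sub, text)


hits: Counter = Counter()
clean = redact("Contact jane.doe@bank.example, SSN 123-45-6789, "
               "card 4111 1111 1111 1111, order 1234567890123", hits)
# clean == "Contact [EMAIL], SSN [SSN], card [PAN:****1111], order 1234567890123"
```

The 13-digit order number fails Luhn and survives unmasked, which is the kind of false hit the check exists to avoid.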

🧰 Pluggable LLM Providers

The server is not locked to one AI vendor. A single LLMProvider interface (complete() / embed()) sits in front of three interchangeable implementations, so the model is a configuration decision — pinned and model-risk-tracked — not a hard dependency.

Ollama — local

Fully on-premise / open-weight serving. The default for a regulated bank where source-code findings must never leave the boundary.

OpenAI — API

Hosted option behind the same interface, available where an approved enterprise tenant and data-handling terms permit.

Anthropic — API

Hosted Claude models behind the same interface, swappable without touching tool logic.
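A sketch of the provider seam using `typing.Protocol`. `OfflineStub`, `REGISTRY`, `provider_from_config`, and the model string are hypothetical, and the stub returns canned text so the example runs offline. Real Ollama, OpenAI, and Anthropic adapters would each implement `complete()` and `embed()` against their own APIs. Rejecting `:latest`-style tags is one illustrative way to keep the model pinned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


class LLMProvider(Protocol):
    """The single interface tool logic depends on."""
    model: str  # pinned model identifier, recorded for model-risk tracking

    def complete(self, prompt: str) -> str: ...
    def embed(self, text: str) -> List[float]: ...


@dataclass
class OfflineStub:
    """Offline placeholder so this sketch runs without a network. Real adapters
    would call their vendor API behind these same two methods."""
    vendor: str
    model: str

    def complete(self, prompt: str) -> str:
        return f"[{self.vendor}/{self.model}] {prompt}"

    def embed(self, text: str) -> List[float]:
        return [float(len(text))]


# One factory per vendor; the default argument binds each vendor name at creation time.
REGISTRY: Dict[str, Callable[[str], LLMProvider]] = {
    vendor: (lambda model, v=vendor: OfflineStub(v, model))
    for vendor in ("ollama", "openai", "anthropic")
}


def provider_from_config(cfg: Dict[str, str]) -> LLMProvider:
    vendor = cfg.get("provider", "ollama")  # on-premise by default
    model = cfg.get("model", "")
    # Refuse floating tags so the model in the loop is always a pinned version.
    if not model or model.endswith(":latest"):
        raise ValueError("config must pin an explicit model version")
    return REGISTRY[vendor](model)


llm = provider_from_config({"model": "example-model:1.0"})
reply = llm.complete("summarize finding CX-1042")
```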

🔍 RAG-Grounded Triage

At a bank, a triage decision cannot rest on a hallucination. Rather than trusting a general model’s judgment on a domain-specific call, the server retrieves the bank’s actual policies, suppression history, and per-app findings, feeds the most relevant policy docs into the prompt, and returns outputs that cite the retrieved evidence for a human to verify.

Resource                   Role in the RAG knowledge base
appsec://apps/{id}         Per-application inventory & metadata
appsec://policies          Policy catalog: the source of truth for triage
appsec://suppressions      Suppression registry with owner & expiry
appsec://findings/{app}    Normalized prior findings per app

Retrieval is backed by a lightweight local SQLite + numpy store — the resource surface is the knowledge base.
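A minimal sketch of cosine-similarity retrieval over a SQLite + numpy store. The table layout, the `appsec://policies/...` and `appsec://suppressions/...` document URIs, and the hashed bag-of-words `embed()` are assumptions. They stand in for real policy chunks and a real embedding call through the provider interface.

```python
import sqlite3
import zlib
from typing import List, Tuple

import numpy as np

DIM = 256


def embed(text: str) -> np.ndarray:
    """Toy hashed bag-of-words embedding, a stand-in for LLMProvider.embed()."""
    v = np.zeros(DIM, dtype=np.float32)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v


def build_store(docs: List[Tuple[str, str]]) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunks (uri TEXT PRIMARY KEY, body TEXT, vec BLOB)")
    db.executemany("INSERT INTO chunks VALUES (?, ?, ?)",
                   [(uri, body, embed(body).tobytes()) for uri, body in docs])
    return db


def retrieve(db: sqlite3.Connection, query: str, k: int = 2) -> List[Tuple[str, float]]:
    rows = db.execute("SELECT uri, vec FROM chunks").fetchall()
    mat = np.stack([np.frombuffer(vec, dtype=np.float32) for _, vec in rows])
    scores = mat @ embed(query)    # cosine similarity, since every row is unit-length
    top = np.argsort(-scores)[:k]  # highest similarity first
    return [(rows[i][0], float(scores[i])) for i in top]


db = build_store([
    ("appsec://policies/sql-injection",
     "sql injection findings in payment services must be fixed before release"),
    ("appsec://policies/test-fixtures",
     "hardcoded credentials in test fixtures may be suppressed with an owner and expiry"),
    ("appsec://suppressions/legacy-hash",
     "suppression for weak hash in legacy report module expires next quarter"),
])
ranked = retrieve(db, "is this sql injection in the payment service a release blocker")
```

Normalizing vectors at write time reduces each query to a single matrix-vector product, which suits a small, local store.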

📊 Observability Plane

Every tool call emits telemetry over OpenTelemetry. Steady-state runs at a 5% sample by default; an X-Debug-Request header flips a single request to 100% capture with DEBUG logging — cheap monitoring plus full forensic detail on demand.
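The sampling rule can be sketched as a pure decision function. It follows the approach of OpenTelemetry's `TraceIdRatioBased` sampler, which compares the low 64 bits of the trace ID against `ratio × 2^64`, and adds the debug override on top. `sampling_decision` and the accepted header values (`1`, `true`) are assumptions. In the real SDK this logic would live in a custom `Sampler`, with the log level applied per request.

```python
import random
from typing import Mapping, Tuple

TRACE_ID_LIMIT = (1 << 64) - 1


def sampling_decision(trace_id: int, headers: Mapping[str, str],
                      ratio: float = 0.05) -> Tuple[bool, str]:
    """Return (record_trace, log_level) for one request."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if lowered.get("x-debug-request", "").lower() in ("1", "true"):
        return True, "DEBUG"  # forensic mode: always sample, verbose logs
    # Deterministic ratio sampling on the low 64 bits of the trace ID, so every
    # service that sees the same trace ID reaches the same decision.
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound, "INFO"


rng = random.Random(42)
ids = [rng.getrandbits(128) for _ in range(20_000)]
steady_rate = sum(sampling_decision(t, {})[0] for t in ids) / len(ids)
debug = sampling_decision(ids[0], {"X-Debug-Request": "true"})
```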

OTel SDK → Collector

Spans, metrics, and logs are emitted through the OTel SDK, with the on-demand sampling above applied to traces, and shipped to an OTLP collector on :4317 (the default OTLP/gRPC port).

Tempo & Loki

Traces land in Tempo, logs in Loki. The span’s trace ID is the key that ties a request to its audit record.

Grafana

Four dashboards on :3000 — latency, throughput, redaction hits, and per-tool error rates.

🏛️ Built for a Regulated Bank

The controls are the point, not an afterthought. Design choices map directly to a financial-services posture:

  • Continuous security validation: per-tool-call audit emission and baseline-parity checks, applied to AppSec.
  • Defense in depth: per-app scoping enforced at the gateway and again at every tool call, with each decision recorded in the audit log.
  • Privacy orchestration: PII / PCI data redacted before it reaches an LLM, with every hit counted.
  • Model-risk aware: every LLM in the loop is a pinned, tracked model, not an untracked, floating dependency.
  • Examiner-ready: the trace ID is the audit key; every action is logged, scoped, and revocable.

🔧 Technology Stack

FastMCP · Python · stdio / Streamable HTTP · SQLite + numpy (RAG) · Ollama · OpenAI · Anthropic · FastAPI (scanner stubs) · Secrets Manager · OpenTelemetry / OTLP · Tempo · Loki · Grafana

🤝 Let’s Connect

Interested in MCP-driven AI orchestration for security tooling, or applying these guardrail patterns in a regulated environment?

Brian Uckert

Platform & AI-Augmented Application Security

Be-Digital.biz

brian.uckert@be-digital.biz