Be Digital · Case study

A day-1 starter for a lean ML/Ops team

A team splitting Ops off from Enterprise to sit alongside a growing ML org needed three things at once: a path off a GitLab instance that had fallen out of security compliance, guardrails for AI tooling that wasn't standardized on one vendor yet, and a way to keep a growing AWS surface patched without dedicated infra headcount. This is the starter I built, including the places where it deliberately leans on an already-proven guardrail pattern rather than reinventing it.

  • Goal: Stand up new GitLab + AI guardrails + AWS ownership without new headcount
  • Hard constraint: Guardrails must hold regardless of which AI tool an engineer picks
  • Tooling: GitLab CI/CD · Terraform · Renovate
  • AI tools in scope: Claude Code, GitHub Copilot, and whatever ships next
  • Profile: Tool-agnostic by construction, not by convention
  • Enforcement: CI diff gate, the same pattern proven on a production coverage-gate constraint

The problem

Three constraints were arriving at once, and none of them had a dedicated owner yet. The GitLab instance running years of proven CI/CD had fallen out of compliance because IT stopped patching it — migration had to preserve that work, not just move source code. New AI tooling was arriving fast, with no standardization on one vendor, which meant tool-specific guardrail config (deny-rules, hooks) would only ever cover part of the team. And everything in AWS outside the already-owned Databricks and Kubernetes platforms had no owner at all, on a team too lean to watch it by hand.

The common thread: none of these could be solved with a bigger team. They needed to be solved with better defaults.

The approach: three starters, one enforcement pattern

Rather than shipping three unrelated deliverables, I made the guardrail piece the connective layer. The same lesson ("enforce it where every tool passes through, not where one tool happens to read config"), documented in depth in a separate write-up for a single-repo, Copilot-only case, generalizes directly to a multi-tool team standing up from scratch.

In this starter

  • gitlab-migration/ — export + validate scripts, catching the two things that silently break pipelines post-migration
  • ai-guardrails/ — the CI-enforced gate, tool-agnostic by design, with first-layer config for whichever tools the team actually adopts
  • aws-baseline/ — minimal Terraform + a grouped Renovate config scoped to what a 1–2 person team can realistically own

Deliberately left out

  • A full platform bootstrap — this is a starting point to extend, not a finished platform
  • Any assumption of dedicated infra headcount — every piece is scoped to what a lean team can run unattended
  • A specific AI vendor pick — the guardrail layer works whether the team standardizes on one tool or several

Enforcement that survives a tool swap

Tool-level guardrails — .claude/settings.json deny-rules, PreToolUse hooks, copilot-instructions.md — only cover the tool they're configured for. A team that hasn't standardized yet, or that adds a second tool six months in, needs the real boundary somewhere every tool passes through regardless. That's CI, not tool config.

[Diagram: enforcement flow]

  • Any source: an AI tool or a human proposes a change
  • Tool-level config (advisory): first layer, catches mistakes early, tool-dependent
  • CI guard-diff gate (the hard backstop): reads protected-paths.yml directly, blind to which tool produced the diff
  • Outcomes: ✓ Merge, or ✗ Blocked + logged override
Every override is logged with a timestamp, commit, and human-authored reason — an auditable exception, not a silent bypass.
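
To make the gate concrete, here is a minimal sketch of the decision it has to make. This is not the code from the starter. The function name guard_diff, the glob-style pattern format, and the shape of the override record are all assumptions for illustration, and loading the patterns out of protected-paths.yml is left out because that file's schema isn't shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class GateResult:
    allowed: bool
    violations: list
    override_record: Optional[dict] = None


def guard_diff(changed_paths, protected_patterns, commit, override_reason=None):
    """Decide whether a diff may merge.

    changed_paths: file paths touched by the diff (from any tool or human).
    protected_patterns: glob patterns, assumed to come from protected-paths.yml.
    override_reason: human-authored text; blank or missing means no override.
    """
    # The gate only looks at paths, so it cannot tell which tool wrote the diff.
    violations = sorted(
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in protected_patterns)
    )
    if not violations:
        return GateResult(allowed=True, violations=[])

    reason = (override_reason or "").strip()
    if not reason:
        return GateResult(allowed=False, violations=violations)

    # An override is an auditable exception: record when, which commit, and why.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "commit": commit,
        "reason": reason,
        "paths": violations,
    }
    return GateResult(allowed=True, violations=violations, override_record=record)
```

In a CI job, changed_paths would typically come from something like `git diff --name-only` against the target branch, and a non-allowed result would fail the job. Where the override reason comes from (a commit trailer, a merge request label, a CI variable) is a design choice this sketch leaves open.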

The AWS piece: patching without headcount

Renovate operates on declared dependencies in Git — base images, Terraform provider pins, Helm charts, manifest tags — and opens PRs against them. It does not patch a running host directly. For a lean team, that's still the highest-leverage first move: it requires no new infrastructure, targets the class of vulnerability most likely to go unwatched, and turns "someone should check for CVEs" into a PR that shows up on its own schedule.
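
As a rough illustration of what "grouped" means in practice, the sketch below builds a Renovate config with one packageRules entry per dependency class, so each class arrives as a single PR instead of one PR per dependency. The group names, the choice of managers, and the schedule string are illustrative assumptions, not the starter's actual config. The helper build_renovate_config is hypothetical and exists only to keep the example in one language.

```python
import json


def build_renovate_config(groups, schedule):
    """Return a Renovate config dict with one packageRules entry per group.

    groups: mapping of group name -> list of Renovate manager names.
    schedule: a Renovate schedule string applied to the whole repo.
    """
    return {
        "$schema": "https://docs.renovatebot.com/renovate-schema.json",
        "extends": ["config:recommended"],
        "schedule": [schedule],
        "packageRules": [
            {"matchManagers": managers, "groupName": name}
            for name, managers in groups.items()
        ],
    }


# Assumed grouping: one PR per class of dependency a lean team can review together.
GROUPS = {
    "base images": ["dockerfile"],
    "terraform": ["terraform"],
    "helm charts": ["helmv3", "helm-values"],
}

if __name__ == "__main__":
    # Print JSON suitable for saving as renovate.json at the repo root.
    print(json.dumps(build_renovate_config(GROUPS, "before 6am on monday"), indent=2))
```

Grouping trades granularity for attention: fewer, larger PRs are easier for one or two people to keep up with, at the cost of a failing update blocking the rest of its group.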

Live-host patching (kernel, OS packages on anything not immutable) still needs a second layer — either unattended-upgrades/dnf-automatic via config management, or an immutable-image model where Renovate's PR triggers a Packer rebuild and the fleet rotates onto the new image. The baseline here sets up the first move; the README documents the second.

What I'd carry to the next team

  1. Enforcement has to survive a tool swap. If the guardrail only works for the AI tool you're using today, it's not a guardrail — it's a default that happens to hold until the next tool arrives.
  2. Migration exports don't include secrets, by design. CI/CD variables have to be re-entered or synced separately — worth catching before the first pipeline run, not after.
  3. "Patch it without headcount" means automating the boring 80%, not hiring for the 20% that's actually hard. Renovate plus a scan-and-rebuild loop covers most of the surface a lean team would otherwise have to watch by hand.
  4. Start with the two pieces that map to the next 90 days. The migration and the guardrail gate are usually the ones with a deadline attached; the AWS baseline is the one to grow into as the team's scope grows.
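
Point 2 above lends itself to a mechanical check before the first pipeline run. The sketch below compares the CI/CD variable definitions of the old and new projects and reports what is missing on the new side. It assumes both inputs are lists of objects shaped like the GitLab project variables API list response (each with a "key" and an optional "environment_scope"); fetching them is left out, and the function name missing_variables is hypothetical.

```python
def missing_variables(source_vars, target_vars):
    """Return sorted (key, environment_scope) pairs present on the source
    project but absent on the target project.

    Only names and scopes are compared; values are never read or printed,
    so the check itself does not leak secrets into job logs.
    """
    def identities(variables):
        # A variable is identified by its key plus its environment scope;
        # "*" is assumed as the scope when none is given.
        return {(v["key"], v.get("environment_scope", "*")) for v in variables}

    return sorted(identities(source_vars) - identities(target_vars))


if __name__ == "__main__":
    old = [
        {"key": "AWS_ROLE_ARN", "environment_scope": "*"},
        {"key": "DEPLOY_TOKEN", "environment_scope": "production"},
    ]
    new = [{"key": "AWS_ROLE_ARN", "environment_scope": "*"}]
    for key, scope in missing_variables(old, new):
        print(f"missing on target: {key} (scope {scope})")
```

A non-empty result is a reason to stop before running pipelines on the new instance, since a missing variable otherwise surfaces as an unrelated-looking job failure.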

Brian Uckert · Be Digital Biz Inc.
