Be Digital · Field notes · GenAI Advisory

The Credit Fire: A Phoenix Project Parable About Copilot Spend

For the team. Same facts as the one-pager — told the way Erik would tell it.

Day 2. 9:14 a.m. The Slack that lit the fire.

Marcus posts it to the whole channel, and you can feel the room tense through the screen:

"Hey team — it's day 2 and I've already burned 55% of my Copilot credits. I'm on GPT-5.4 Medium. I tried planning with it and executing on the mini model, but that went badly. My conclusion: just keep using 5.4 Medium."

— Marcus, via Slack

It's the kind of message that spreads panic quietly. If Marcus is at 55% on day 2, the whole pooled budget is a grease fire, and everyone's doing the math on how many days until the org runs dry. The instinct in the channel is the same instinct on every plant floor in trouble: find the one broken machine and never touch it again. Lock the model. Stop experimenting. Hunker down.

That's when Erik (the crusty platform lead who's seen three billing models come and go) wanders over, coffee in hand, and asks the question that changes the shape of the problem.

"You think this is a model problem. It's a work-in-progress problem."

"Marcus," he says, "what were you charged for?" "Requests?" "Wrong. That changed June 1. Copilot is usage-based now. You're not billed per prompt — you're billed on tokens: input, output, and cached, times the model's rate. One credit is a penny. Your seat's allowance is pooled with the whole team's." He taps the desk. "So every request pays for everything you drag into it. Your instructions file. The whole-repo context. The chat history you never cleared. Every tool call. The thinking effort."

He lets that land.

"You didn't burn 55% because Medium is expensive. You burned it because you were hauling a freight train of context through every single prompt. Downgrading the model shrinks one car on that train. It's usually not the biggest one."

Erik calls it what it is: excess work-in-progress. Bloated context is inventory piling up on the floor — invisible, expensive, and slowing everything down. And like all WIP, the fix isn't working harder on the one machine. It's making the work flow smaller and cleaner.
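Erik's billing rule can be written down as rough arithmetic. The sketch below is a minimal illustration, not Copilot's actual billing code. The `Rates` values are hypothetical placeholders (real per-model rates come from your Copilot admin), and only the structure comes from the conversation above: input, output, and cached tokens, each times a rate, with one credit equal to one cent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices in US dollars per million tokens (NOT real Copilot rates)."""
    input_per_m: float
    output_per_m: float
    cached_per_m: float


def request_credits(input_tokens: int, output_tokens: int,
                    cached_tokens: int, rates: Rates) -> float:
    """Credits billed for one request, where one credit is one cent."""
    dollars_times_million = (
        input_tokens * rates.input_per_m
        + output_tokens * rates.output_per_m
        + cached_tokens * rates.cached_per_m
    )
    # dollars = x / 1_000_000 and credits = dollars * 100, so divide by 10_000.
    return dollars_times_million / 10_000


# Illustrative numbers only.
medium = Rates(input_per_m=2.0, output_per_m=8.0, cached_per_m=0.5)
half_price = Rates(input_per_m=1.0, output_per_m=4.0, cached_per_m=0.25)

bloated = request_credits(100_000, 2_000, 0, medium)            # big context, original model
cheaper_model = request_credits(100_000, 2_000, 0, half_price)  # same context, half the rate
trimmed = request_credits(10_000, 2_000, 0, medium)             # a tenth of the context, original model

print(f"bloated={bloated:.1f} cheaper_model={cheaper_model:.1f} trimmed={trimmed:.1f}")
```

With these made-up numbers, halving the model rate halves the bill, while cutting the context to a tenth cuts it by about a factor of six. The exact ratio depends entirely on the assumed rates and token counts.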

[Figure: the context train, which gets longer every turn. Cars: Instructions, History, Tools, Files, Thinking, Model Rate. Each turn re-bills the entire context. The model is one car, and often not the largest.]

The First Way: See the flow. Six levers, not one.

Erik grabs a marker. "Model choice is lever number six. Here are the other five, and they cost you more."

The always-loaded tax

Your copilot-instructions file is injected into every request and billed every turn. Three hundred lines of house rules? You're paying for all of it on every prompt, forever. Keep it lean. Push heavy, situational guidance into skills that load only when invoked — not into the file that rides along on everything.
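To see why the always-loaded file matters, here is a minimal sketch of the arithmetic. All token counts are hypothetical, and the model is deliberately simplified: it assumes a skill's text is billed only on the turns where the skill is invoked, which may not match how a given client keeps loaded content in context.

```python
def always_loaded_tokens(instruction_tokens: int, turns: int) -> int:
    """Input tokens spent on an instructions file that rides along on every turn."""
    return instruction_tokens * turns


def lean_file_plus_skill_tokens(core_tokens: int, skill_tokens: int,
                                turns: int, turns_using_skill: int) -> int:
    """Input tokens when only a lean core is always loaded and the rest is a skill.

    Simplifying assumption: the skill is billed only on turns that invoke it.
    """
    if not 0 <= turns_using_skill <= turns:
        raise ValueError("turns_using_skill must be between 0 and turns")
    return core_tokens * turns + skill_tokens * turns_using_skill


# Hypothetical day: 40 turns, a 3,000-token instructions file.
fat = always_loaded_tokens(instruction_tokens=3_000, turns=40)

# Same guidance split into a 600-token core plus a 2,400-token skill used on 5 turns.
lean = lean_file_plus_skill_tokens(core_tokens=600, skill_tokens=2_400,
                                   turns=40, turns_using_skill=5)

print(fat, lean)
```

Under these assumptions the split version spends less than a third of the tokens on guidance, and the gap widens with every extra turn.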

Chat hygiene

A long session reprocesses its entire history on every message. New task? New chat (Ctrl/Cmd+N). Long thread you still need? /compact to summarize and reclaim it. Side question? /fork so you don't rebuild context from scratch.
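The reason a long session gets expensive is that the history is resent on every message, so the total grows roughly with the square of the number of turns. A minimal sketch, assuming each turn adds a fixed, hypothetical number of tokens to the history and ignoring caching and /compact:

```python
from typing import Iterable


def session_input_tokens(tokens_added_per_turn: Iterable[int]) -> int:
    """Total input tokens billed across a session that resends its full history each turn."""
    history = 0
    total_billed = 0
    for added in tokens_added_per_turn:
        history += added         # the conversation grows by this turn's content
        total_billed += history  # and the whole history is processed again
    return total_billed


# Hypothetical: 20 turns that each add 1,000 tokens.
one_long_chat = session_input_tokens([1_000] * 20)

# The same 20 turns split across four fresh chats of 5 turns each.
four_fresh_chats = 4 * session_input_tokens([1_000] * 5)

print(one_long_chat, four_fresh_chats)
```

In this toy model the single long chat processes 210,000 input tokens and the four fresh chats process 60,000, for the same twenty turns of work.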

Tools and MCP servers

Every enabled tool's output eats context and credits. Turn off what this task doesn't need.

The files you swore were excluded

The index respects .gitignore and files.exclude, so build junk and data dumps cost nothing — until you open one. Open an ignored file and it walks straight into context anyway. Don't open what you don't need.

Thinking effort

More reasoning means more tokens, and more tokens means more credits. The default effort level is already tuned for everyday work; raise it only for real architecture decisions or gnarly multi-step debugging.

The model itself

Then, and only then, match model to task — light models for edits and boilerplate, reasoning models for the hard stuff, "Auto" to let it route.

"Five of those," Erik says, "are about the size of the work. Not the machine."

[Figure: the six levers along an impact axis that rises from LOW. 1. Instructions tax: trim always-loaded files. 2. Chat hygiene: new chat, /compact, /fork. 3. Tools & MCP: disable unused servers. 4. Excluded files: don't open what you don't need. 5. Thinking effort: default unless architecture. 6. Model choice: last, not first. Model choice is lever #6; the five above it are about the size of the work, not the machine.]

The Second Way: The feedback loop Marcus was missing.

"Now — your plan-then-execute experiment." Erik almost smiles. "That wasn't a bad idea. That's literally the recommended pattern. Reason big to make the plan, execute cheap to run it. You didn't fail the strategy. You failed the handoff."

He counts it off:

The plan you gave the mini model was vague — not a reviewed, step-by-step spec. So it guessed. And guessing means rework, and rework re-bills every token. You paid twice to save once.
You executed in the same bloated chat, so the cheap model reprocessed the whole freight train. Cheap model, expensive context.
And maybe the mini model was just too weak for those edits — so every retry lit another match.
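The three counts above reduce to rough arithmetic: retries multiply whatever context each attempt carries. A minimal sketch, assuming hypothetical rates and token counts (none of these figures come from Copilot's pricing) and ignoring output tokens and caching for simplicity:

```python
def execution_credits(context_tokens: int, attempts: int,
                      dollars_per_million: float) -> float:
    """Input-side credits for executing a plan, where one credit is one cent.

    Each attempt is assumed to reprocess the same context once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # dollars = tokens * rate / 1_000_000 and credits = dollars * 100.
    return context_tokens * attempts * dollars_per_million / 10_000


# Hypothetical rates: $2.00 per million tokens for the larger model, $0.50 for the mini.
stay_on_big_model = execution_credits(80_000, attempts=1, dollars_per_million=2.0)
failed_handoff = execution_credits(80_000, attempts=5, dollars_per_million=0.5)
clean_handoff = execution_credits(8_000, attempts=1, dollars_per_million=0.5)

print(stay_on_big_model, failed_handoff, clean_handoff)
```

With these invented figures, five retries in the bloated chat cost more than never leaving the larger model, which matches what Marcus saw. The same cheap model with one attempt on a small context costs a small fraction of either.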

"Run it right: Plan agent builds the plan with a reasoning model. You review and tighten it — that's the feedback loop, that's where defects die cheap. Then open a fresh session, drop to a faster model, and hand it only the approved plan. Trim the tools. If you still see rework, step the model up a tier — but fix the plan and the context first. Rework is the fire. The model is just the spark."

The Third Way: Make the invisible visible, and make it a habit.

"You can't fight a fire you can't see," Erik says, pulling up three instruments:

Copilot status dashboard (Status Bar)

Your % of allowance, live.

/chronicle:cost-tips

Personalized advice from your actual recent sessions.

Agent Debug Logs

The Summary view shows token totals and tool calls; the Cache Explorer shows your prompt-cache hit rate. (A stable instructions prefix gets cached — and cached is cheaper. One more reason to stop thrashing your context.)

"Watch these the way you'd watch a dashboard in prod. Small habits, every day — not one heroic lockdown."

The morning after.

Marcus doesn't lock the model. He trims the instructions file, splits the heavy stuff into skills, starts a clean chat per task, kills the MCP servers he wasn't using, and runs plan-then-execute the way Erik drew it — reviewed plan, fresh session, cheap execution.

By Friday his burn rate is a third of what it was. The channel exhales. And the lesson makes it onto the team wiki, in Erik's words:

"The bottleneck was never the model. It was the work-in-progress. Shrink the context, clean the flow, make the spend visible — and the cheap-execution pattern finally pays off."

— Erik

[Figure: before and after. Day 2, before: 55%, a grease fire of bloated context and rework. Friday, after: 18%, phoenix mode with a reviewed plan, fresh session, and trimmed tools. Same work, same models available, just smaller context and cleaner flow.]

Sources

Confirm your org's exact credit allowance and pooling with your Copilot admin. Characters are illustrative; the facts and commands are real.
