Claude Code · Azure · FinOps
I built a Claude Code skill that does a 20-minute Azure cost review
A read-only Azure cost auditor that produces a written report a human can read and act on. Calibrated against a real tenant. Finds roughly 3× what Azure Advisor surfaces.
The pitch
Most Azure cost reviews are some combination of: someone clicking through the portal for a day, a £15k consultant deck three weeks later, or Azure Advisor — which is honest about what it sees but conservative by design.
I wanted a fourth option: a Claude Code skill that runs in twenty minutes, against a snapshot of the tenant, and produces a written report a human can read and act on. Read-only. Citation-grounded. Reproducible.
Against a tenant doing roughly £30k/month in Azure spend, the latest run produces:
Total estimated monthly savings: £5,436 – £7,369 / month
Findings by severity: 0 Critical · 18 High · 38 Medium · 174 Low · 8 Info
That's roughly 3× Azure Advisor's published recommendation ceiling for the same tenant (~£20k/year). Not because Advisor is wrong — because the skill carries opinions Advisor cannot.
Below: what it catches, and the architecture that lets it grow without becoming flaky.
What it found
The single biggest finding wasn't a misconfigured VM. It was a 5× App Service P2v3 reservation running at low utilisation with 9 months remaining before expiry. The skill reports both the monthly band and the term-to-expiry forecast:
Underused reservation: P2v3 × 5 — 40% utilisation
Forecast: £4,558 – £5,535 between now and expiry
That single line is worth more attention than any number of orphaned-disk findings. It's also the kind of thing Advisor never says, because Advisor doesn't combine spend + utilisation + term remaining into a single forward-looking £ figure.
Other examples from the same run:
- GeoRedundant backup vault flagged on a non-production subscription — switching to LocallyRedundant is a 50% storage cut on Microsoft published rates, but only if there are no protected items at switch time.
- Backup retention bloat on a workload-policy vault: weekly=104, monthly=60, yearly=10 — well above what the source workload actually needs.
- A subscription on full PAYG/EA/CSP rates whose name says non-prod: a hint that whoever provisioned it skipped the Dev/Test offer. The skill is offer-aware (CSP vs EA vs PAYG) and writes the recommendation accordingly.
- Premium SSD P30 reservations at 25% utilisation, forecast £1,447–£1,953 over the remaining 8 months.
The point isn't any single number — it's the texture. The skill surfaces the kinds of findings that show up in a senior FinOps consultant's deck, not the kinds an Advisor cron job emits.
The architecture
The skill is a monorepo of two installable Python packages:
- `azure-investigator-core` — the read-only collectors. Pulls from the `az` CLI, normalises into a snapshot directory.
- `azure-cost-investigator` — the rule engine. Reads a snapshot, evaluates rules, writes a report.
Numbers as of the latest commit:
- 15 cost rules
- 14 knowledge corpus docs
- 18 collectors
- 227 passing tests (1 skipped — a smoke test gated on a real-tenant snapshot env var)
- Output: `report.md` + `report.html` + `findings.yaml`
Reservation forecast: a small change with a big narrative effect
A naïve "underused reservation" rule reports a monthly band: "this reservation is wasting £X/month at current utilisation". True, but small.
The skill reads `expiryDateTime` from the reservation snapshot and adds three evidence fields:

```yaml
term_remaining_months: 9
forecast_low_gbp_to_expiry: 4558
forecast_high_gbp_to_expiry: 5535
```

If `expiryDateTime` is missing, it falls back to a 12-month default (overridable via config). The recommendation copy reads "£4,558–£5,535 between now and expiry" instead of "£506–£615/month" — same underlying data, dramatically clearer impact.
The same rule dispatches across VM, App Service, and Premium SSD SKU families, not just VMs.
Six architectural bets
1. Read-only is absolute. A single function (azcli.run_json) gates every Azure CLI call. Twenty-three unit tests verify it rejects every one of 33 forbidden write verbs before subprocess is reached. Every collector — vms, disks, reservations, recovery_services, log_analytics — goes through the same gate. There is no escape hatch.
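A single choke-point like that might look as follows. This is a hedged sketch, not the repo's `azcli.run_json`: the verb set shown is an illustrative subset of the 33 forbidden verbs, and the exception name is mine.

```python
import json
import subprocess

# Illustrative subset; the real gate lists 33 forbidden write verbs.
FORBIDDEN_VERBS = {"create", "delete", "update", "set", "start", "stop",
                   "restart", "deallocate", "purge", "restore", "assign"}

class WriteAttemptError(RuntimeError):
    """Raised before any subprocess is spawned for a write-like command."""

def run_json(args: list[str]) -> object:
    """Run `az <args> -o json`, refusing any command containing a write verb."""
    blocked = {token.lower() for token in args} & FORBIDDEN_VERBS
    if blocked:
        raise WriteAttemptError(f"read-only gate: refusing verbs {sorted(blocked)}")
    out = subprocess.run(["az", *args, "-o", "json"],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)
```

The key property is that the check happens before `subprocess.run` is ever reached, so a collector cannot accidentally mutate a tenant even if handed a bad argument list.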
2. Citation-grounded knowledge corpus. Every rule declares KNOWLEDGE_REFS pointing at committed .md files in the corpus. Each knowledge doc has frontmatter with source_url, source_retrieved, and source_sha256. A rule that cites a missing document fails to load. This is how I keep the model from inventing numbers — the rule is allowed to compute, but it must justify its limits from a doc that has a Microsoft URL on it.
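The load-time check can be sketched like this. The frontmatter keys come from the article; the function name and the naive `---` delimiter parsing are assumptions, not the repo's implementation.

```python
from pathlib import Path

REQUIRED_FRONTMATTER = ("source_url", "source_retrieved", "source_sha256")

def validate_knowledge_refs(corpus_dir: Path, refs: list[str]) -> None:
    """Refuse to load a rule whose cited docs are missing or lack sourcing."""
    for ref in refs:
        doc = corpus_dir / ref
        if not doc.is_file():
            raise FileNotFoundError(f"rule cites missing knowledge doc: {ref}")
        # Frontmatter is the block between the first pair of --- delimiters.
        parts = doc.read_text(encoding="utf-8").split("---")
        frontmatter = parts[1] if len(parts) >= 3 else ""
        missing = [key for key in REQUIRED_FRONTMATTER
                   if f"{key}:" not in frontmatter]
        if missing:
            raise ValueError(f"{ref} is missing frontmatter keys: {missing}")
```

Failing at load time rather than report time means a rule with a dangling citation never contributes a number to the output.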
3. Severity × Confidence as independent axes. Findings carry both. A high-severity / low-confidence finding ("we think this £4k reservation is wasted, but utilisation data is sparse") gets handled differently from a low-severity / high-confidence finding ("this £4 IP is definitely orphaned"). Single-axis lists hide that distinction.
4. Savings as ranges, not point estimates. Every monetary recommendation comes with a non-empty assumption string explaining what's not netted out: negotiated discounts, reservation coverage, regional price drift. Treating the range itself as the answer is a feature.
5. Persona separation in the core package. The core package is shared with a sibling azure-security-investigator skill (a v2 stub in the same repo). When I added the recovery-services collector for cost, the security skill inherited it for free.
6. Rendering is a first-class output. The skill produces report.html automatically alongside report.md, with print-friendly CSS. Browser-print-to-PDF works without extra tooling. This matters because cost reports get sent to humans who don't read terminal Markdown.
The rule corpus
The fifteen rules currently in the skill:
- `orphaned_disks` — unattached managed disks billed by capacity.
- `unattached_public_ips` — public IPs not bound to a NIC or load balancer.
- `vm_rightsizing` — P95 CPU under 3% and outbound under 2 Mb/s on a 7-day window.
- `underused_reservations` — VM, App Service, and Premium SSD reservations under the utilisation threshold, with term-to-expiry forecast.
- `appservice_idle_plans` — App Service Plans with zero traffic.
- `storage_tier_mismatch` — Hot blobs older than the Cool break-even.
- `basic_sku_retirement` — Public IP Basic SKU and other deprecated SKUs.
- `tagging_governance` — missing CAF-recommended tags.
- `env_mismatch_in_prod_sub` — non-prod-named resources living in production subscriptions.
- `unused_snapshots` — disk snapshots older than the retention threshold.
- `nic_orphans` — NICs detached from any VM.
- `rsv_backup_retention` — Recovery Services Vault retention bloat plus GRS-redundancy scrutiny.
- `dev_test_offer_eligibility` — PAYG/EA/CSP subscriptions that look non-prod by name.
- `log_analytics_retention` — Log Analytics workspace retention exceeding workload need.
- `advisor_cost_recommendations` — surfaces Azure Advisor's own £-quantified findings.
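To make the shape of a rule concrete, here is a minimal sketch of what an `orphaned_disks`-style module might look like. The module layout and the snapshot keys are assumptions for illustration; only the rule name and the `KNOWLEDGE_REFS` convention come from the article.

```python
# Hypothetical rule module; names are illustrative, not the repo's API.
KNOWLEDGE_REFS = ["managed_disk_billing.md"]

def evaluate(snapshot: dict) -> list[dict]:
    """Flag managed disks billed by capacity but attached to nothing."""
    findings = []
    for disk in snapshot.get("disks", []):
        if disk.get("managedBy") is None and disk.get("diskState") == "Unattached":
            findings.append({
                "rule": "orphaned_disks",
                "resource": disk["name"],
                "severity": "low",
                "confidence": "high",
                "assumptions": "list price; no reserved-capacity discount netted out",
            })
    return findings
```

A rule that is a pure function of the snapshot is trivially testable: feed it a dict, assert on the findings, no Azure required.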
Adding a sixteenth rule means: writing a new file in rules/, declaring its KNOWLEDGE_REFS, writing the knowledge doc, writing tests. No registration step.
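"No registration step" suggests the engine discovers rule modules by scanning the package. A plausible sketch, assuming the package path and the `evaluate`/`KNOWLEDGE_REFS` contract (the function name here is mine):

```python
import importlib
import pkgutil

def discover_rules(package_name: str = "azure_cost_investigator.rules") -> list:
    """Import every module under rules/ and keep those exposing the rule contract."""
    package = importlib.import_module(package_name)
    rules = []
    for mod_info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{mod_info.name}")
        if hasattr(module, "evaluate") and hasattr(module, "KNOWLEDGE_REFS"):
            rules.append(module)
    return rules
```

Discovery by convention means dropping a file into `rules/` is the whole integration; there is no central list to forget to update.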
What it deliberately doesn't do
A short list:
- No remediation verbs. Read-only is a guarantee, not a default.
- No live web fetches at runtime. Knowledge is committed and SHA256-tracked.
- No reservation purchase recommendations. Sizing reservations is a finance decision, not a script's call.
- No multi-cloud comparisons. AWS/GCP have different mental models.
- No ticket-system integration. The output is a Markdown file. What you do with it is yours.
- No web UI. A static `report.html` is the closest concession.
Security findings live in azure-security-investigator, a separate v2 stub in the same repo, sharing the core package. Splitting cost from security at the skill boundary keeps each persona's report focused.
Running it
```shell
az login                                     # interactive
uv sync --all-packages                       # one-time
azure-investigator pull -s <SUB_NAME>        # one or more --subscription flags
azure-cost-investigator analyse latest       # writes report.md + report.html + findings.yaml
```
Twenty minutes is roughly the wall-clock for a tenant with a few dozen subscriptions worth of resources. Most of it is az calls; the rule evaluation is sub-second.
For a printable PDF: open report.html in a browser, ⌘P → Save as PDF. The CSS is print-tuned.
Repo
github.com/mindaugasnakrosis/azure-costs-analyzer
MIT-licensed. Python 3.11+. CI green on 3.11 and 3.12.
If you run it against your own tenant and it surfaces something interesting (or — more usefully — something wrong), open an issue. Calibration against more tenants is the next thing keeping this honest.
Written with Claude Code. The code is the artefact; the article is the receipt.
Try it
Clone the repo and follow the README — the install path is documented end-to-end, including a smoke test that runs without network or LLM. mindaugasnakrosis/azure-costs-analyzer →