Claude Code · Jira · Agile
From markdown notes to a sized Jira backlog in five minutes
A working agent that does the unglamorous part of agile delivery — and refuses to ship vague work.
Last week I tested an idea on a deliberately rough product brief — an Inventory Management System change list, six themes, eighteen features, three bugs, written in the kind of plain language people actually use when they're thinking. No Jira terminology, no story-point talk, no acceptance criteria. Just sections like "Real-Time Stock Visibility" and bullets like "extend the stock read model to track bonded and duty-paid quantities independently."
The traditional next step is for a business analyst or a senior engineer to spend half a day grooming that into Jira: deciding the Epic / Story / Subtask split, writing INVEST-compliant summaries, sizing in modified Fibonacci, picking priorities, applying a label taxonomy, and writing acceptance criteria a tester can actually verify.
I gave the markdown to Claude Code with a Jira-aware skill loaded. Five minutes later there were 48 tickets in a Jira project, sized, prioritised, and labelled, with descriptions written to a senior-coach standard.
Here's what came out, what didn't, and why the interesting thing isn't the tool.
The run
The skill is mdjira — a tiny Python CLI plus a Claude skill file that encodes the working canon of agile and Jira practice: Bill Wake on INVEST, Mike Cohn on the user-story template and Fibonacci sizing, Dan North on Given/When/Then, Atlassian on the epic/story/subtask hierarchy and the "Miscellaneous Bug Fixes" anti-pattern, Roman Pichler on slicing the cake. Open the skill file and every rule has a footnote.
The flow:
- I typed `/mdjira IMS-project-example.md` in Claude Code.
- The skill read the markdown, made structural decisions, and drafted an `intake.yaml` next to the source file.
- It ran a lint check (17 mechanical anti-pattern rules — empty descriptions, single-child epics, unsized stories, uniform priorities, padding subtasks, label spam) and reported a human-readable preview you can scan in 30 seconds.
- I approved.
- The CLI made three bulk API calls to Jira (`POST /rest/api/3/issue/bulk`, one per hierarchy level — Epics, Stories, Subtasks) and wrote a `results.json` audit file. 48 tickets, SCRUM-5 through SCRUM-52.
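The bulk step can be sketched roughly like this. The endpoint is the one named above; the payload shape follows the Jira Cloud v3 bulk-create API, but the helper names and issue fields are my own illustration, not mdjira's actual code:

```python
JIRA_BASE = "https://your-site.atlassian.net"   # placeholder site URL

def build_bulk_payload(project_key, issues):
    """Build one bulk-create body: a list of {"fields": ...} entries."""
    updates = []
    for issue in issues:
        fields = {
            "project": {"key": project_key},
            "issuetype": {"name": issue["type"]},   # Epic / Story / Subtask
            "summary": issue["summary"],
        }
        if issue.get("parent"):                     # stories and subtasks point at a parent
            fields["parent"] = {"key": issue["parent"]}
        updates.append({"fields": fields})
    return {"issueUpdates": updates}

def bulk_create(auth, project_key, issues):
    """One POST per hierarchy level, so parent keys exist before children."""
    import requests                                 # third-party; pip install requests
    resp = requests.post(
        f"{JIRA_BASE}/rest/api/3/issue/bulk",
        json=build_bulk_payload(project_key, issues),
        auth=auth,                                  # (email, api_token) for Jira Cloud
    )
    resp.raise_for_status()
    return resp.json()["issues"]                    # created issue keys come back here
```

Three calls in dependency order (Epics, then Stories, then Subtasks) means every child can name a parent key that already exists.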
That's the surface story. The interesting part is what the skill decided in step 2.
The judgment shown
Three calls stood out, the kind a junior would miss and a senior would defend.
It killed a "Defects to Fix" epic. The source markdown had a section titled "Defects to Fix" with three bugs underneath: a concurrency race, a timezone export bug, and a search bug ignoring accented characters. The naive move is to honour the source structure: Epic "Defects", three child Stories. The skill refused. Atlassian's own guidance calls a "Miscellaneous Bug Fixes" epic an anti-pattern by name — only create an Epic if there is a clear beginning, end, and specific value being delivered. The three bugs share nothing except being defects. So the skill distributed them: the concurrency race attached to the real-time stock epic (it directly threatens the data-integrity outcome that epic depends on), the timezone bug to the audit-and-compliance epic (where exports live), and the accented-character search bug to the UX epic (catalogue-search friction). Each kept a defect label so they remain filterable as a set.
It split a too-big story. The source had a single recommendation called "Two-way sync with the trading platform" — publish stock-reservation events from IMS, consume shipment events back, with idempotency on redelivery. As a single story that's a sub-epic in disguise: two surfaces, two teams' concerns, sequenced consumer-depends-on-publisher. The skill split it into INTEG-PUB and INTEG-CONS, sized each at 5 points, and noted in the description that the consumer is dependent on the publisher's contract. That's textbook Pichler — slicing the cake on goal lines, not technical lines.
It refused a uniform Highest. A common mistake under time pressure is to mark everything as High because everything feels important. The skill flagged exactly one ticket as Highest: the negative-stock concurrency race. Its rationale, in the preview message: silent data corruption from a known race, undermines every other RTSV story, active fire. Everything else landed at High / Medium / Low in proportion. The final spread across the 17 stories was 1 Highest, 5 High, 9 Medium, 2 Low — which is what a healthy backlog looks like. If everything is Highest, nothing is.
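That refusal is easy to picture as a lint rule. A minimal sketch, assuming a simple list-of-dicts ticket model; the rule wording and the 1-in-10 threshold are mine, not mdjira's:

```python
from collections import Counter

def lint_priorities(tickets):
    """Flag backlogs where the priority field carries no information."""
    counts = Counter(t["priority"] for t in tickets)
    problems = []
    if len(counts) == 1 and len(tickets) > 3:
        only = next(iter(counts))
        problems.append(f"uniform priority: every ticket is {only!r}; "
                        "if everything is Highest, nothing is")
    # Reserve Highest for genuine fires: at most roughly 1 in 10 tickets.
    if counts.get("Highest", 0) > max(1, len(tickets) // 10):
        problems.append("too many Highest tickets; reserve it for active fires")
    return problems
```

Run against the spread above (1/5/9/2), it passes; run against a wall of Highest, it complains twice.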
What got written, not just decided
Each story description follows a fixed template — Why this matters, Scope, Acceptance criteria, Source — so a stakeholder can read any single ticket cold and know what's being done and why. The acceptance criteria are written as observable outcomes, not implementation steps. From the live-stock story:
- A stock movement recorded by the warehouse adjustment service is reflected on the dashboard tile within 60s
- When the client loses connectivity for up to five minutes, it reconnects and resyncs to current stock without user action
- No regression in initial dashboard load time (within ±10% of baseline)
Compare to what a rushed draft would have written: "Real-time updates work. Reconnection handled." Both describe the same feature; only one is testable.
For backend / infra / bug stories, AC are bullet checklists. For user-facing flows, AC are Given/When/Then. The skill picks per story and doesn't mix formats — force-fitting Given/When/Then onto a database resize is something amateurs do.
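That per-story choice could be sketched as a small dispatch function. Everything here, the label taxonomy and the `ac_format` helper, is hypothetical; mdjira's real heuristic may differ:

```python
USER_FACING = {"ui", "ux", "workflow", "onboarding"}   # assumed label taxonomy

def ac_format(story):
    """Pick exactly one acceptance-criteria format per story; never mix."""
    labels = {label.lower() for label in story.get("labels", [])}
    if story.get("type") == "Bug":
        return "checklist"      # reproduce / fix / verify reads best as bullets
    if labels & USER_FACING:
        return "gherkin"        # Given/When/Then suits observable user flows
    return "checklist"          # backend, infra, data work: bullet outcomes
```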
Story points are on the modified Fibonacci scale (1, 2, 3, 5, 8, 13). Anything that would size to 20+ has to be split before it can leave the skill — Cohn's point about the gaps growing with size is the explicit policy. The 17 stories from this run totalled 91 points: 1×2pt, 3×3pt, 8×5pt, 5×8pt, no 13s. Every 5-point or larger story has a sizing rationale in the preview.
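The sizing policy reduces to two tiny checks. A sketch, with the scale constant and function names assumed rather than taken from the repo:

```python
FIB = (1, 2, 3, 5, 8, 13)      # modified Fibonacci, as described above

def validate_points(points):
    """Return None if the size is on the scale, else a reason to reject it."""
    if points not in FIB:
        return f"{points} is not on the modified Fibonacci scale {FIB}"
    return None

def must_split(points):
    """Anything sized past the top of the scale is a sub-epic in disguise."""
    return points > max(FIB)
```

A story proposed at 20 fails both checks, which is exactly the point: it cannot leave the skill until it is split.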
What this is, and what it isn't
mdjira is not a "wrap GPT around Jira" wrapper. The LLM is a substitutable component; Claude today, something else next year. The product is the encoded senior-coach prompt — about 400 lines of agile and Jira practice with citations, anti-pattern checks, title patterns by ticket type, and a description template that's mandatory for every ticket. It's the kind of thing a £1000/day Jira contractor would spend a quarter teaching a team, then leave behind half-applied.
What surprised me, building it: the prompt's value is mostly in the refusals. The skill refuses to write a ticket without a "Why this matters" line. It refuses to size a story above 13 without splitting. It refuses to ship an epic with one child. It refuses to mix Given/When/Then with bullet AC in the same ticket. It refuses to use labels as priority synonyms. The cumulative effect is that the output is costlier to argue with than to accept, because the alternative is producing the same level of rigour by hand.
That last sentence is the whole bet. Most agile tooling assumes you'll fight the rigour and lose. This one assumes you've already lost the fight against the rigour and want help applying it consistently.
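To make one refusal concrete, the single-child epic check could look something like this; structure and wording are illustrative, not mdjira's code:

```python
def lint_single_child_epics(epics):
    """Yield a warning for every epic with exactly one child story."""
    for epic in epics:
        if len(epic.get("stories", [])) == 1:
            yield (f"epic {epic['summary']!r} has a single child; "
                   "fold it into a story or give the epic real scope")
```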
Try it
Repo: https://github.com/mindaugasnakrosis/mdjira
```
pip install mdjira
mdjira init
mdjira install-skill
```
Then in Claude Code, on any markdown file:
> /mdjira specs/your-doc.md
It will preview before writing. It will refuse to ship vague work. The first run on a real document is the only review you need to decide whether the bet — that encoded judgment is more valuable than encoded automation — holds for your team.
MIT licensed. Issues, forks, and bug reports welcome. The skill file is the spec — disagree with a rule? Fork it. That's the point.
Prefer to build from source? Clone the repo and follow the README — the install path is documented end-to-end, including a smoke test that runs without network or LLM.