Six local-first AI quality checks that ride inside Claude Code, Cursor, Antigravity, Codex CLI, Copilot, Aider, Continue.dev, and any AGENTS.md-aware tool — plus two portable JSON wire formats for the artifacts every testing tool produces (bugs found, test cases written).
Drop-in playbooks for Claude Code, Cursor, Antigravity, Codex CLI, GitHub Copilot Chat, Aider, Continue.dev, and the universal AGENTS.md. Pure-local — no API key, no cloud call, no account.
Created & open-sourced by jank.ai — MIT, free forever.
Find bugs in the latest changes, auto-fix the obvious/dangerous ones, then offer to fix the rest. Works on code, docs, images, and Office files.
Full quality scan of ALL recent changes with a lime-green ASCII bar graph per attribute — correctness, security, reliability, performance, maintainability, testing, and more.
Trace recent changes across four risk dimensions — code blast-radius · GenAI smells (leaked secrets, dupes) · user-frustration · business. Includes "is reverting safer?".
Will-it-survive-production score. Statistical test-coverage analysis (t-tests, p-values, sampling). Flags where you need more runs and estimates the cost & time to close gaps.
Intelligent AI diff of an API, document, or local file. Ignores noise (timestamps, dynamic IDs, generated metadata) and surfaces only the changes that matter.
Cumulative metrics — tokens, cost, fixes by category, plus a high-confidence estimate of developer time + money saved. Per-session and lifetime.
Get the bundle once, then drop it into whichever tool you use.
Pick whichever you prefer. Every per-client setup below assumes the bundle is at /tmp/opentestai.
Claude Code: first-class slash commands that run as part of Claude's normal conversation loop.
cp -R /tmp/opentestai/jank-oss ~/.claude/plugins/jank-oss
/jank
Cursor: rules inject the playbook into your Composer prompt, so it runs inside Cursor's agent loop.
cp -R /tmp/opentestai/jank-oss/dist/cursor/.cursor ./your-repo/
@jank find bugs in my recent changes
Antigravity: SKILL.md packages are first-class; they show up in the command palette and chain with other agents.
cp -R /tmp/opentestai/jank-oss/dist/antigravity/skills/* ~/.antigravity/skills/
/jank
Codex CLI: terminal-native. Pipe output into shell tools or drop it into a CI step; exit codes work like any Unix command.
cp -R /tmp/opentestai/jank-oss/dist/codex/skills/* ~/.codex/skills/
codex /jank
GitHub Copilot Chat: one file in .github/ and your whole team gets the playbooks via natural-language requests.
cp /tmp/opentestai/jank-oss/dist/copilot/.github/copilot-instructions.md ./your-repo/.github/
Aider: scope a session to one quality dimension via --message-file, which suits batch fixes.
cp -R /tmp/opentestai/jank-oss/dist/aider/.aider ./your-repo/
aider --message-file .aider/prompts/jank.md
Continue.dev: open-source extension for VS Code and JetBrains. Your whole team gets /jank without per-IDE setup.
cp /tmp/opentestai/jank-oss/dist/continue/.continue/config.json ./your-repo/.continue/
/jank
AGENTS.md: covers Cline, Cody, Roo, Sweep, and any agentic tool that reads AGENTS.md at the repo root. One file, infinite tools.
cp /tmp/opentestai/jank-oss/dist/agents-md/AGENTS.md ./your-repo/
Playbooks live in commands/*.md: edit one, run node scripts/build-all.mjs, and every client's package is re-emitted.
The six skills above were built and open-sourced by jank.ai — MIT, no strings.
Beyond the OSS commands, jank.ai also spins up cloud testing:
/jank_cloud runs your tests in a real cloud browser.
/jank_watch sets up smart-diff URL monitoring with persona-judged better/worse verdicts and dashboards.
/jank_loop autonomously hunts and fixes in a loop.
/jank_eyes sends your change to real human eyes for review.
Every field is self-explaining. Every payload is paste-ready. v1 is shipping — comments, critiques, and PRs are open.
Testing's wire formats predate large language models. Gherkin and JUnit XML were written for parsers, not readers. TestRail and Xray store steps as styled HTML inside relational rows. None of these survive a copy-paste into a chat window. Bugs live in Jira tickets, free-text comments, and screenshots scattered across Slack. There is no portable JSON for "this is a bug" or "this is a test case" — so every agent reinvents one, and none of them talk to each other.
The producer is increasingly an LLM. The consumer is increasingly an LLM. The handoffs — "fix this bug", "run this test", "explain why it failed", "route to the right engineer" — happen between agents, not just humans. The format needs to be self-describing (no schema lookup), reasoning-aware (the model's thinking is part of the payload), and action-ready (the next step is embedded, not external).
Once a bug or test case is a portable JSON object, every link in the chain — finder, runner, triage, fix — can be swapped. A bug found by one agent can be fixed by another. A test case generated by an LLM can be replayed by Playwright, Selenium, Cypress, or a human. Without a standard, every vendor is an island. With one, the testing layer becomes a composable substrate for AI.
A bug is a finding with a confidence, a reason, evidence, and a paste-ready fix prompt. One JSON object, ready to ship to Jira, GitHub, an AI coding assistant, or another agent.
{
  "summary": "",
  "score": 0-100,                              // 100 = pristine
  "issues": [
    {
      "bug_title": "...",                      // short title
      "bug_type": ["primary", "sub"],          // 1 of 8 categories
      "bug_confidence": 1-10,                  // is it real?
      "bug_priority": 1-10,                    // how urgent?
      "interestingness": 1-10,                 // notable to readers?
      "bug_reasoning_why_a_bug": "...",        // the case for it
      "bug_reasoning_why_not_a_bug": "...",    // counter-argument
      "suggested_fix": "...",
      "bug_why_fix": "...",                    // user/business impact
      "what_type_of_engineer_to_route_issue_to": "...",
      "prompt_to_fix_this_issue": "...",       // paste-ready AI fix prompt
      "possibly_relevant_page_console_text": "...",
      "possibly_relevant_network_call": "...",
      "possibly_relevant_page_text": "...",
      "possibly_relevant_page_elements": "...",
      "tester": "...", "byline": "..."
    }
  ]
}
bug_title, suggested_fix, bug_why_fix: no codes, no abbreviations. A model reading any payload knows what each field means without a schema doc in its context window. Tokens save themselves.
bug_reasoning_why_a_bug + bug_reasoning_why_not_a_bug capture the model's thinking on both sides. Pre-AI bug trackers only stored the verdict; AI-first trackers store the case, because the verdict is sometimes wrong and a human (or another AI) needs to audit it.
prompt_to_fix_this_issue is a paste-ready instruction for an AI coding assistant. The format admits that the next step after "bug found" is "AI fixes it", and ships the prompt that closes the loop instead of leaving the user to compose one.
what_type_of_engineer_to_route_issue_to lets agents triage and dispatch without external rules. A swarm of fix-bots can self-organize on this field alone.
A test case is an intent ("test that signup works"), a sequence of steps, and an assertion. Steps describe what to do, why a tester would do it, and what should be true after, in language a human or an AI runner can execute.
{
  "title": "<short imperative title>",
  "goal": "<what success looks like>",
  "steps": [
    {
      "description": "what the user does",
      "thinking": "what a tester reasons first",
      "action": "navigate | click | fill | select | press | wait | assert",
      "target": "<visible text · aria · css>",   // prefer visible text
      "value": "<for fill/select/press/navigate>",
      "expected": "<what is true after this step>"
    }
  ]
}
// or a declarative check (no steps, no model needed at run time):
{ "type": "console-clean" }                       // no console errors
{ "type": "no-broken-link" }                      // no 4xx/5xx links
{ "type": "network-clean" }                       // no failed XHR
{ "type": "selector-exists", "selector": "..." }
{ "type": "selector-missing", "selector": "..." }
{ "type": "text-contains", "text": "..." }
{ "type": "text-missing", "text": "..." }
{ "type": "alt-text" }                            // every img has alt
{ "type": "contrast-min", "min": 4.5 }            // WCAG AA
thinking is a first-class field. Gherkin lost this: "Given/When/Then" tells you what happens but not why a tester would do that. When an LLM regenerates a step (because the UI changed), it needs the original intent, not the original click target.
target: "Sign up" beats target: "#btn-3 > span.cta" in every way that matters. LLMs see screenshots; humans read screens; CSS selectors break on every redesign. Visible text is the most stable, most portable, most legible address for an interactive element, and it's what the model already has.
navigate · click · fill · select · press · wait · assert. Runners (Playwright, Selenium, a vision-LLM, a human) can each implement the same seven primitives in their own way, and a test case travels between them without translation.
expected: steps without assertions are debt. The schema requires you to say what should be true after each action, so an LLM regenerating a test from intent can verify it actually still works, not just compile.
goal ("user successfully completes checkout") survives UI changes. The steps don't. When the UI shifts, an AI agent can regenerate the steps from the goal. Pre-AI test suites couldn't do this; AI-first suites have to.
Standards are how an industry stops re-inventing the same thing in private. Email had RFC 822. Browsers had HTML 1.0. Continuous integration had JUnit XML. The AI-testing era needs its own: a bug format and a test case format that LLMs natively read, write, and reason about, portable across runners, finders, fixers, and humans. We're shipping v1 of both. Comments, critiques, and PRs are open.
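To make the two formats a little more concrete, here is a minimal Python sketch of how a consumer might handle them: one function groups bug issues into per-engineer queues using bug_confidence, bug_priority, and what_type_of_engineer_to_route_issue_to, and another checks that a test case uses only the seven action names and gives every step an expected. Only the field names and action names come from the formats above. The function names, the confidence threshold of 7, and the sample payloads are assumptions invented for this illustration; they are not part of the spec or of any jank.ai tool.

```python
# Hypothetical consumer of the bug and test case formats (illustration only).
STEP_ACTIONS = {"navigate", "click", "fill", "select", "press", "wait", "assert"}
ACTIONS_NEEDING_VALUE = {"fill", "select", "press", "navigate"}


def triage(report, min_confidence=7):
    """Group likely-real issues by the engineer type they should go to."""
    queues = {}
    for issue in report.get("issues", []):
        if issue["bug_confidence"] < min_confidence:
            continue  # assumed threshold: low-confidence findings are skipped
        owner = issue["what_type_of_engineer_to_route_issue_to"]
        queues.setdefault(owner, []).append(issue)
    for issues in queues.values():
        # Most urgent first within each queue.
        issues.sort(key=lambda i: i["bug_priority"], reverse=True)
    return queues


def validate_test_case(case):
    """Return a list of problems; an empty list means the case looks well-formed."""
    problems = []
    for key in ("title", "goal", "steps"):
        if not case.get(key):
            problems.append(f"missing {key}")
    for n, step in enumerate(case.get("steps", []), start=1):
        action = step.get("action")
        if action not in STEP_ACTIONS:
            problems.append(f"step {n}: unknown action {action!r}")
        if action in ACTIONS_NEEDING_VALUE and not step.get("value"):
            problems.append(f"step {n}: {action} needs a value")
        if not step.get("expected"):
            problems.append(f"step {n}: missing expected")
    return problems


# Made-up sample payloads, trimmed to the fields the functions read.
sample_report = {
    "summary": "Signup form issues",
    "score": 62,
    "issues": [
        {"bug_title": "Submit button stays disabled", "bug_confidence": 9,
         "bug_priority": 8, "what_type_of_engineer_to_route_issue_to": "frontend"},
        {"bug_title": "Footer text slightly misaligned", "bug_confidence": 3,
         "bug_priority": 2, "what_type_of_engineer_to_route_issue_to": "frontend"},
        {"bug_title": "Signup returns 500 on duplicate email", "bug_confidence": 8,
         "bug_priority": 10, "what_type_of_engineer_to_route_issue_to": "backend"},
    ],
}
sample_case = {
    "title": "Sign up with a new email",
    "goal": "user lands on the welcome page",
    "steps": [
        {"description": "open the signup page", "thinking": "start from a clean entry point",
         "action": "navigate", "value": "/signup", "expected": "signup form is visible"},
        {"description": "submit the form", "thinking": "the primary call to action",
         "action": "click", "target": "Sign up", "expected": ""},
    ],
}

if __name__ == "__main__":
    for owner, issues in triage(sample_report).items():
        print(owner, [i["bug_title"] for i in issues])
    print(validate_test_case(sample_case))
```

With the sample data, the low-confidence footer issue is left out of the queues, and the second step is flagged because its expected is empty. A real consumer would likely choose its own threshold and might treat the declarative checks as a separate shape.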
Perfect software isn't achievable — but confidence is. The manifesto calls out what most engineering teams quietly know: that code coverage percentages, unit test counts, and green CI pipelines are not quality. They're activity. The industry has been measuring the wrong things, and we wrote down why.
Open-source automated bug detection, persona feedback, and test case generation using 33+ specialized AI testing agent profiles. OpenTestAI analyzes application screenshots, network logs, console logs, and DOM/accessibility trees — selects relevant virtual tester profiles based on artifact content, then runs each tester's specialized prompt to surface high-confidence bugs.
A growing community of engineers, testers, and builders rethinking how software quality works. Tell us what matters to you.