Open Source · MIT · NEW · v1.5

Open skills + formats
for AI-era testing.

Six local-first AI quality checks that ride inside Claude Code, Cursor, Antigravity, Codex CLI, Copilot, Aider, Continue.dev, and any AGENTS.md-aware tool — plus two portable JSON wire formats for the artifacts every testing tool produces (bugs found, test cases written).

Six local-first quality checks.
Works in 8 AI coding tools.

Drop-in playbooks for Claude Code, Cursor, Antigravity, Codex CLI, GitHub Copilot Chat, Aider, Continue.dev, and the universal AGENTS.md. Pure-local — no API key, no cloud call, no account.

Created & open-sourced by jank.ai — MIT, free forever.

jank auto-fix

Find bugs in the latest changes, auto-fix the obvious/dangerous ones, then offer to fix the rest. Works on code, docs, images, and Office files.

jank_deep 12 dimensions

Full quality scan of ALL recent changes with a lime-green ASCII bar graph per attribute — correctness, security, reliability, performance, maintainability, testing, and more.

jank_risk 4-dim risk

Trace recent changes across four risk dimensions — code blast-radius · GenAI smells (leaked secrets, dupes) · user-frustration · business. Includes "is reverting safer?".

jank_confidence statistical

Will-it-survive-production score. Statistical test-coverage analysis (t-tests, p-values, sampling). Flags where you need more runs and estimates the cost & time to close gaps.

jank_diff smart diff

Intelligent AI diff of an API, document, or local file. Ignores noise (timestamps, dynamic IDs, generated metadata) and surfaces only the changes that matter.

jank_stats cumulative

Cumulative metrics — tokens, cost, fixes by category, plus a high-confidence estimate of developer time + money saved. Per-session and lifetime.

Eight clients. One source of truth.

Get the bundle once, then drop it into whichever tool you use.

0
Get the bundle — once

Pick whichever you prefer. Every per-client setup below assumes the bundle is at /tmp/opentestai.

Option A · Clone
git clone https://github.com/jarbon/opentestai /tmp/opentestai
Claude Code /jank

First-class slash commands — runs as part of Claude's normal conversation loop.

  1. Move the bundle into Claude's plugin dir
    cp -R /tmp/opentestai/jank-oss ~/.claude/plugins/jank-oss
  2. Restart Claude Code, then invoke
    /jank
Cursor @jank

Rules inject the playbook into your Composer prompt — runs inside Cursor's agent loop.

  1. Copy the Cursor rules into your repo
    cp -R /tmp/opentestai/jank-oss/dist/cursor/.cursor ./your-repo/
  2. Mention in Cursor chat
    @jank find bugs in my recent changes
Antigravity /jank

SKILL.md packages are first-class — they show up in the command palette and chain with other agents.

  1. Drop skill folders into Antigravity
    cp -R /tmp/opentestai/jank-oss/dist/antigravity/skills/* ~/.antigravity/skills/
  2. Open the command palette & invoke
    /jank
Codex CLI codex /jank

Terminal-native — pipe output into shell tools, drop it into a CI step, exit codes work like any Unix command.

  1. Copy skill packages into Codex
    cp -R /tmp/opentestai/jank-oss/dist/codex/skills/* ~/.codex/skills/
  2. Run from any project
    codex /jank
GitHub Copilot Chat natural lang.

One file in .github/ and your whole team gets the playbooks via natural-language requests.

  1. Drop instructions in your repo's .github/
    cp /tmp/opentestai/jank-oss/dist/copilot/.github/copilot-instructions.md ./your-repo/.github/
  2. Ask Copilot in natural language
    "do a jank scan on the recent changes"
Aider --prompt-file

Scope a session to one quality dimension via --prompt-file — perfect for batch fixes.

  1. Copy the prompts into your repo
    cp -R /tmp/opentestai/jank-oss/dist/aider/.aider ./your-repo/
  2. Run a scoped session
    aider --prompt-file .aider/prompts/jank.md
Continue.dev /jank

Open-source extension for VS Code + JetBrains — your whole team gets /jank without per-IDE setup.

  1. Copy the Continue config into your repo (cp overwrites an existing config.json, so merge by hand if you already have one)
    cp /tmp/opentestai/jank-oss/dist/continue/.continue/config.json ./your-repo/.continue/
  2. Invoke in the Continue chat
    /jank
Universal · AGENTS.md natural lang.

Cline, Cody, Roo, Sweep, and any agentic tool that reads AGENTS.md at the repo root — one file, infinite tools.

  1. Drop AGENTS.md into your repo root
    cp /tmp/opentestai/jank-oss/dist/agents-md/AGENTS.md ./your-repo/
  2. Ask your tool in natural language
    "do a jank scan" — the AGENTS.md routing teaches the model which playbook to follow.
One source of truth in commands/*.md: edit a playbook, run node scripts/build-all.mjs, and every client's package re-emits.
Created & open-sourced by jank.ai

The six skills above were built and open-sourced by jank.ai — MIT, no strings.

Beyond the OSS commands, jank.ai also spins up cloud testing: /jank_cloud runs your tests in a real cloud browser · /jank_watch sets up smart-diff URL monitoring with persona-judged better/worse verdicts + dashboards · /jank_loop autonomously hunts & fixes in a loop · /jank_eyes sends your change to real human eyes for review.


The two formats, in full.

Every field is self-explaining. Every payload is paste-ready. v1 is shipping — comments, critiques, and PRs are open.

01
The artifacts we test on are pre-AI

Testing's wire formats predate large language models. Gherkin and JUnit XML were written for parsers, not readers. TestRail and Xray store steps as styled HTML inside relational rows. None of these survive a copy-paste into a chat window. Bugs live in Jira tickets, free-text comments, and screenshots scattered across Slack. There is no portable JSON for "this is a bug" or "this is a test case" — so every agent reinvents one, and none of them talk to each other.

02
AI changes who reads and writes them

The producer is increasingly an LLM. The consumer is increasingly an LLM. The handoffs — "fix this bug", "run this test", "explain why it failed", "route to the right engineer" — happen between agents, not just humans. The format needs to be self-describing (no schema lookup), reasoning-aware (the model's thinking is part of the payload), and action-ready (the next step is embedded, not external).

03
A standard means tools interoperate

Once a bug or test case is a portable JSON object, every link in the chain — finder, runner, triage, fix — can be swapped. A bug found by one agent can be fixed by another. A test case generated by an LLM can be replayed by Playwright, Selenium, Cypress, or a human. Without a standard, every vendor is an island. With one, the testing layer becomes a composable substrate for AI.

otai/bug · v1

Bug format


A bug is a finding with a confidence, a reason, evidence, and a paste-ready fix prompt. One JSON object, ready to ship to Jira, GitHub, an AI coding assistant, or another agent.

{
  "summary": "",
  "score": 0-100,                          // 100 = pristine
  "issues": [
    {
      "bug_title": "...",                  // short title
      "bug_type": ["primary", "sub"],      // 1 of 8 categories
      "bug_confidence": 1-10,              // is it real?
      "bug_priority":   1-10,              // how urgent?
      "interestingness": 1-10,             // notable to readers?
      "bug_reasoning_why_a_bug": "...",    // the case for it
      "bug_reasoning_why_not_a_bug": "...",// counter-argument
      "suggested_fix": "...",
      "bug_why_fix": "...",                // user/business impact
      "what_type_of_engineer_to_route_issue_to": "...",
      "prompt_to_fix_this_issue": "...",   // paste-ready AI fix prompt
      "possibly_relevant_page_console_text": "...",
      "possibly_relevant_network_call": "...",
      "possibly_relevant_page_text": "...",
      "possibly_relevant_page_elements": "...",
      "tester": "...", "byline": "..."
    }
  ]
}
security · accessibility · performance · usability · visual · content · reliability · privacy
See real bug examples on GitHub
5 hand-crafted payloads · security · a11y · perf · content · full report
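
As a rough illustration of how a consumer might sanity-check a payload against the ranges in the sketch above, the snippet below validates the numeric fields and the primary category. The names validate_bug_report and _in_range and the sample payload are hypothetical. It also assumes that score is an integer and that the first entry of bug_type is the primary category; neither is confirmed by the spec.

```python
# Hypothetical validator: names and sample data are illustrative, not part of the spec.
CATEGORIES = {
    "security", "accessibility", "performance", "usability",
    "visual", "content", "reliability", "privacy",
}
SCALE_FIELDS = ("bug_confidence", "bug_priority", "interestingness")


def _in_range(value, low, high):
    """True for a real integer (not a bool) between low and high inclusive."""
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_bug_report(report: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checked fields look fine."""
    problems = []
    if not _in_range(report.get("score"), 0, 100):
        problems.append("score must be an integer from 0 to 100")
    for index, issue in enumerate(report.get("issues", [])):
        for field in SCALE_FIELDS:
            if not _in_range(issue.get(field), 1, 10):
                problems.append(f"issues[{index}].{field} must be an integer from 1 to 10")
        bug_type = issue.get("bug_type") or []
        # Assumption: the first entry is the primary category.
        if not bug_type or bug_type[0] not in CATEGORIES:
            problems.append(f"issues[{index}].bug_type[0] must be one of the 8 categories")
    return problems


sample = {
    "summary": "Login form leaks a token in the URL",
    "score": 62,
    "issues": [{
        "bug_title": "Session token in query string",
        "bug_type": ["security", "token-leak"],
        "bug_confidence": 8,
        "bug_priority": 9,
        "interestingness": 6,
    }],
}
print(validate_bug_report(sample))  # []
```

Returning a list of problems rather than raising on the first one lets a producing agent repair every field in a single pass.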

Why these fields — and why AI-first

  • Self-explaining field names. bug_title, suggested_fix, bug_why_fix: no codes, no abbreviations. A model reading any payload knows what each field means without a schema doc in its context window, which saves the tokens that doc would cost.
  • Reasoning is part of the payload. bug_reasoning_why_a_bug + bug_reasoning_why_not_a_bug capture the model's thinking on both sides. Pre-AI bug trackers only stored the verdict; AI-first trackers store the case, because the verdict is sometimes wrong and a human (or another AI) needs to audit it.
  • Confidence and priority are 1–10 integers, not floats or labels. LLMs calibrate well on small integer scales. They struggle with probability floats (overconfident) and with arbitrary enums ("Sev-2"). 1–10 means downstream filters like "confidence ≥ 7" actually work across models.
  • A fix prompt ships with the bug. prompt_to_fix_this_issue is a paste-ready instruction for an AI coding assistant. The format admits that the next step after "bug found" is "AI fixes it" — and ships the prompt that closes the loop, instead of leaving the user to compose one.
  • Routing is a field, not a workflow. what_type_of_engineer_to_route_issue_to lets agents triage and dispatch without external rules. A swarm of fix-bots can self-organize on this field alone.
  • Evidence is loose strings, not structured logs. Console text, a network call URL, the relevant HTML — pasted in raw. Easier to produce, easier to read, no schema for the schema. The model that wrote the bug saw these as text; the model that consumes it sees them the same way.
  • Surface, don't silently skip. Any finding with confidence ≥ 4 is included. The schema's job is to make borderline cases legible (with reasoning + counter-argument), not to hide them. Suppression is downstream work.
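
The filtering and routing described in the list above can be sketched in a few lines. This is an illustration under stated assumptions: the function name triage, its default threshold of 7, and the sample issues are hypothetical, and grouping by the routing field is one possible downstream use, not a prescribed workflow.

```python
from collections import defaultdict


def triage(issues: list[dict], min_confidence: int = 7) -> dict[str, list[dict]]:
    """Keep issues at or above min_confidence, grouped by the engineer type they
    route to, with the most urgent issue first in each group."""
    queues = defaultdict(list)
    for issue in issues:
        if issue["bug_confidence"] >= min_confidence:
            queues[issue["what_type_of_engineer_to_route_issue_to"]].append(issue)
    for queue in queues.values():
        queue.sort(key=lambda item: item["bug_priority"], reverse=True)
    return dict(queues)


# Hypothetical findings, trimmed to the fields this sketch reads.
issues = [
    {"bug_title": "Missing alt text on hero image", "bug_confidence": 9, "bug_priority": 4,
     "what_type_of_engineer_to_route_issue_to": "frontend",
     "prompt_to_fix_this_issue": "Add descriptive alt text to the hero image."},
    {"bug_title": "Session token in query string", "bug_confidence": 8, "bug_priority": 9,
     "what_type_of_engineer_to_route_issue_to": "security",
     "prompt_to_fix_this_issue": "Move the session token out of the URL."},
    {"bug_title": "Button color looks off", "bug_confidence": 4, "bug_priority": 2,
     "what_type_of_engineer_to_route_issue_to": "frontend",
     "prompt_to_fix_this_issue": "Check the button color against the design tokens."},
    {"bug_title": "Form submits twice on double click", "bug_confidence": 7, "bug_priority": 8,
     "what_type_of_engineer_to_route_issue_to": "frontend",
     "prompt_to_fix_this_issue": "Disable the submit button after the first click."},
]

for engineer, queue in triage(issues).items():
    for issue in queue:
        print(engineer, "->", issue["prompt_to_fix_this_issue"])
```

The confidence-4 finding stays in the payload but is filtered out here, which matches the idea that suppression belongs downstream of the format.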
otai/testcase · v1

Test case format


A test case is an intent ("test that signup works"), a sequence of steps, and an assertion. Steps describe what to do, why a tester would do it, and what should be true after — in language a human or an AI runner can execute.

{
  "title": "<short imperative title>",
  "goal":  "<what success looks like>",
  "steps": [
    {
      "description": "what the user does",
      "thinking":    "what a tester reasons first",
      "action":      "navigate | click | fill | select | press | wait | assert",
      "target":      "<visible text · aria · css>",   // prefer visible text
      "value":       "<for fill/select/press/navigate>",
      "expected":    "<what is true after this step>"
    }
  ]
}

// or a declarative check (no steps, no model needed at run time):
{ "type": "console-clean" }                       // no console errors
{ "type": "no-broken-link" }                      // no 4xx/5xx links
{ "type": "network-clean" }                       // no failed XHR
{ "type": "selector-exists", "selector": "..." }
{ "type": "selector-missing", "selector": "..." }
{ "type": "text-contains", "text": "..." }
{ "type": "text-missing",  "text": "..." }
{ "type": "alt-text" }                            // every img has alt
{ "type": "contrast-min", "min": 4.5 }            // WCAG AA
navigate · click · fill · select · press · wait · assert
See real test case examples on GitHub
6 hand-crafted files · 3 multi-step flows · 3 declarative one-liners
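
To show how a runner could interpret the declarative one-liners above without a model, here is a minimal sketch that evaluates three of the check types against a static HTML string and computes the contrast ratio that a contrast-min check would compare against its threshold. The names run_check, contrast_ratio, and _PageScan are hypothetical, the sample page is made up, and a real runner would inspect a live page rather than a string. The contrast math follows the WCAG 2.x relative-luminance definition.

```python
from html.parser import HTMLParser


class _PageScan(HTMLParser):
    """Collects text nodes and counts <img> tags that have no alt attribute."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "alt" not in dict(attrs):
            self.images_missing_alt += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def run_check(check: dict, html: str) -> bool:
    """Evaluate one declarative check; only three types are covered in this sketch."""
    scan = _PageScan()
    scan.feed(html)
    text = " ".join(scan.text_parts)
    kind = check["type"]
    if kind == "text-contains":
        return check["text"] in text
    if kind == "text-missing":
        return check["text"] not in text
    if kind == "alt-text":
        return scan.images_missing_alt == 0
    raise ValueError(f"check type not covered by this sketch: {kind}")


def contrast_ratio(rgb_a, rgb_b) -> float:
    """WCAG 2.x contrast ratio between two sRGB colors given as 0-255 tuples."""
    def luminance(rgb):
        linear = []
        for value in rgb:
            c = value / 255
            linear.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
        r, g, b = linear
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    lighter, darker = sorted((luminance(rgb_a), luminance(rgb_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


page = '<h1>Sign up</h1><img src="logo.png" alt="Acme logo"><img src="hero.png">'
checks = [
    {"type": "text-contains", "text": "Sign up"},
    {"type": "text-missing", "text": "Error"},
    {"type": "alt-text"},
]
for check in checks:
    print(check["type"], run_check(check, page))  # True, True, False

# A contrast-min check with "min": 4.5 would compare a ratio like this one.
print(contrast_ratio((118, 118, 118), (255, 255, 255)) >= 4.5)  # True
```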

Why these fields — and why AI-first

  • Intent lives next to action. thinking is a first-class field. Gherkin lost this — "Given/When/Then" tells you what happens but not why a tester would do that. When an LLM regenerates a step (because the UI changed), it needs the original intent, not the original click target.
  • Targets are visible text, not selectors, by default. target: "Sign up" beats target: "#btn-3 > span.cta" in every way that matters. LLMs see screenshots; humans read screens; CSS selectors break on every redesign. Visible text is the most stable, most portable, most legible address for an interactive element — and it's what the model already has.
  • One small action vocabulary. Seven verbs cover everything a browser test does: navigate · click · fill · select · press · wait · assert. Runners (Playwright, Selenium, a vision-LLM, a human) can each implement the same seven primitives in their own way, and a test case travels between them without translation.
  • Every step has an expected. Steps without assertions are debt. The schema requires you to say what should be true after each action — so an LLM regenerating a test from intent can verify it actually still works, not just compile.
  • Declarative checks are first-class. Some "tests" aren't flows — "no console errors", "every image has alt text", "contrast ≥ 4.5". Pre-AI tools forced you to write a script for these. Here they're a one-line JSON object that any runner can interpret. AI doesn't need to generate code to express a check.
  • Goal-first, not script-first. goal — "user successfully completes checkout" — survives UI changes. The steps don't. When the UI shifts, an AI agent can regenerate the steps from the goal. Pre-AI test suites couldn't do this; AI-first suites have to.
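
To make the seven-verb vocabulary and the per-step expected rule concrete, here is a sketch of a dry-run runner. DryRunRunner, run_test_case, and the signup test case are hypothetical; a real runner would map perform onto Playwright, Selenium, a vision model, or a human, and would verify each expectation instead of only collecting it.

```python
ACTIONS = ("navigate", "click", "fill", "select", "press", "wait", "assert")


class DryRunRunner:
    """Stand-in runner that records what it would do instead of driving a browser."""

    def __init__(self):
        self.log = []

    def perform(self, action, target=None, value=None):
        self.log.append((action, target, value))


def run_test_case(test_case: dict, runner) -> list[str]:
    """Validate each step, hand it to the runner, and return the expectations to verify."""
    expectations = []
    for number, step in enumerate(test_case["steps"], start=1):
        if step["action"] not in ACTIONS:
            raise ValueError(f"step {number}: unknown action {step['action']!r}")
        if not step.get("expected"):
            raise ValueError(f"step {number}: every step needs an 'expected'")
        runner.perform(step["action"], step.get("target"), step.get("value"))
        expectations.append(step["expected"])
    return expectations


# Hypothetical test case in the shape shown above.
signup = {
    "title": "Sign up with a new email",
    "goal": "A new user ends up on the welcome page",
    "steps": [
        {"description": "Open the signup page",
         "thinking": "Start from a clean entry point",
         "action": "navigate", "value": "https://example.com/signup",
         "expected": "The signup form is visible"},
        {"description": "Enter an email address",
         "thinking": "Use an address that has never registered",
         "action": "fill", "target": "Email", "value": "new.user@example.com",
         "expected": "The email field shows the address"},
        {"description": "Submit the form",
         "thinking": "The visible button text is the most stable target",
         "action": "click", "target": "Sign up",
         "expected": "The welcome page is shown"},
    ],
}

runner = DryRunRunner()
for expectation in run_test_case(signup, runner):
    print("verify:", expectation)
print(runner.log)
```

Because the runner only needs a perform method, swapping the dry run for a browser-backed implementation leaves the test case JSON untouched.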
The point

Standards are how an industry stops re-inventing the same thing in private. Email had RFC 822. Browsers had HTML 1.0. Continuous integration had JUnit XML. The AI-testing era needs its own: a bug format and a test case format that LLMs natively read, write, and reason about — portable across runners, finders, fixers, and humans. We're shipping v1 of both. Comments, critiques, and PRs are open.

New · Open for comments

We published the Manifesto for
Confidence Engineering

Perfect software isn't achievable — but confidence is. The manifesto calls out what most engineering teams quietly know: that code coverage percentages, unit test counts, and green CI pipelines are not quality. They're activity. The industry has been measuring the wrong things, and we wrote down why.

Every team that has ever shipped a critical bug had passing tests. The goal isn't more tests — it's knowing enough to ship with confidence.
Open source · MIT licensed

Open-source automated bug detection, persona feedback, and test case generation using 33+ specialized AI testing agent profiles. OpenTestAI analyzes application screenshots, network logs, console logs, and DOM/accessibility trees — selects relevant virtual tester profiles based on artifact content, then runs each tester's specialized prompt to surface high-confidence bugs.

Mode 1
Bug Detection
Find high-confidence bugs using 33+ specialized AI skill profiles, each with a focused domain.
Mode 2
Persona Feedback
Generate diverse user persona feedback panels for UX and product insight.
Mode 3
Test Case Generation
Create comprehensive, prioritized test case suites for any page or flow.