Perfect software isn't achievable, but confidence is. Ship with real evidence, not optimism and vanity metrics.
Through work on real-world, AI-accelerated projects, we have come to value:
AI systems often don't produce the same output twice. Pass/fail test assertions assume deterministic behavior, an assumption that no longer holds. Confidence in AI-generated and AI-operated systems requires statistical evaluation, behavioral distributions, and confidence intervals, not green checkboxes.
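As a rough illustration of what statistical evaluation can look like in place of a single assertion, the sketch below runs a non-deterministic system many times and reports a pass rate with a Wilson score interval. The names `evaluate`, `flaky_system`, and `min_pass_rate` are hypothetical, and the 97% success rate is an invented stand-in for a real model call; treat it as one possible shape under those assumptions, not a prescribed method.

```python
import math
import random


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def evaluate(system, check, trials: int, min_pass_rate: float) -> dict:
    """Run the system repeatedly and report evidence rather than a single verdict."""
    successes = sum(1 for _ in range(trials) if check(system()))
    low, high = wilson_interval(successes, trials)
    return {
        "pass_rate": successes / trials,
        "interval": (low, high),
        # Require that even the pessimistic end of the interval clears the bar.
        "meets_bar": low >= min_pass_rate,
    }


# Hypothetical stand-in for a model call that is right about 97% of the time.
rng = random.Random(0)


def flaky_system() -> str:
    return "ok" if rng.random() < 0.97 else "wrong"


report = evaluate(flaky_system, lambda out: out == "ok", trials=500, min_pass_rate=0.90)
print(report)
```

Gating on the lower bound rather than the observed pass rate means a handful of green runs is not enough: ten passes out of ten still leave a wide interval, while hundreds of runs narrow it.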
Human creativity, judgment, and intuition are most valuable where machines fall short — at the edges of ambiguity, novel failure modes, and decisions that require context no system can fully encode. Leave the rest to the machines.
AI is not the enemy of Confidence Engineering. AI dramatically expands the ability to explore, simulate, analyze, and evaluate systems at scales impossible for humans alone.
94% code coverage and 10,000 passing unit tests are not confidence — they are the illusion of diligence. Running more tests is not the same as understanding more risk. Real confidence is grounded in evidence of actual behavior under real conditions. The question is never "how many?" — it's "do we know enough to ship?"
Modern systems fail less from single defects and more from unintended interactions between independently functioning components, agents, models, and services.
A component that passes all its tests can still break the system. Confidence requires understanding how parts compose, interact, and fail together — not just whether each one works in isolation.
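A small, contrived sketch of that failure mode: two components that each pass their own tests but disagree about units when composed. The functions `get_timeout`, `should_abort`, and `request_is_aborted` are hypothetical names invented for this illustration.

```python
def get_timeout() -> float:
    """Component A: returns the configured timeout, in seconds."""
    return 30.0


def should_abort(elapsed_ms: float, timeout_ms: float) -> bool:
    """Component B: aborts once elapsed time exceeds the timeout, both in milliseconds."""
    return elapsed_ms > timeout_ms


# Each component passes its own isolated test.
assert get_timeout() == 30.0
assert should_abort(elapsed_ms=31_000, timeout_ms=30_000)


def request_is_aborted(elapsed_ms: float) -> bool:
    """Naive composition: seconds are passed where milliseconds are expected."""
    return should_abort(elapsed_ms, get_timeout())


# A request that has run for only one second is aborted.
print(request_is_aborted(1_000))
```

Neither unit test can catch this, because the defect lives in the interaction between the two components. Only a check that exercises them together exposes it.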
Add your name alongside engineers, testers, and builders who share this vision.
Spread the word — the more engineers who stand behind this, the louder the signal.
Confidence was chosen deliberately. It carries no legacy baggage — it isn't tied to "QA," "testing," or any particular methodology or toolchain. It isn't an AI-specific term, but it sits naturally alongside AI: confidence scores, model confidence, probabilistic outputs. That adjacency matters as AI becomes central to how software is built and verified.
More importantly, confidence is what engineering and business actually want. Not a coverage number. Not a pass rate. Not a green pipeline. Those are proxies, means to an end. The end is being able to say: we are confident this works, in production, for real users, under real conditions. Every metric in software quality is ultimately an attempt to approximate that statement. Confidence Engineering names the goal directly.
This also applies recursively: every metric should itself be held to a confidence standard. How confident are we that this coverage number means what we think it means? How confident are we that these passing tests reflect real behavior? The discipline isn't just about measuring software — it's about knowing how much to trust the measurements.
A note on confidence intervals: in statistics, a confidence interval is a range of values, computed from observed data, that is designed to contain the true value of what you're measuring at a stated rate. For example, "we are 95% confident the true defect rate is between 0.2% and 0.8%" means the interval came from a procedure that captures the true rate in about 95% of repeated samples. It quantifies uncertainty honestly rather than collapsing it into a single point estimate. That's the spirit of Confidence Engineering: not a binary pass/fail, but a calibrated, evidence-based statement about how much you can rely on what you've built.
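To make the arithmetic concrete, here is one way an interval like that could arise. The figures are hypothetical (10 defects observed in 2,000 requests), and the formula is the simple normal-approximation (Wald) interval, chosen only because it is easy to follow by hand:

```latex
\hat{p} = \frac{10}{2000} = 0.005, \qquad
\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
= 0.005 \pm 1.96\sqrt{\frac{0.005 \times 0.995}{2000}}
\approx 0.005 \pm 0.0031
```

That gives roughly 0.19% to 0.81%, or about 0.2% to 0.8%. The normal approximation is crude when defect counts are this small, so a Wilson or exact interval is usually a better choice in practice. With the same defect rate observed over only 200 requests, the interval would be about three times wider.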