Day 27 / 100 Skills

The 4-Question
Pressure Test

Claude is wrong a lot — and confident about it, which is the annoying part. If you’re using it for something that actually matters (your job, your business, a decision you can’t take back), you need a way to catch its mistakes before you act on them. These are the 4 prompts I paste in after every important Claude answer. Each one catches a different type of error. Stack all four, and almost nothing slips through.

When To Use It (And When Not To)

Don’t run the pressure test on every Claude answer — you’d never get anything done. Run it when: the answer informs a decision you can’t reverse cheaply, you’re going to forward it to your boss / a client / your team, the topic is one where being wrong has real consequences (legal, medical, financial, hiring, technical architecture), or you’re acting on it without independent verification. Skip it when: you’re brainstorming, exploring, or the cost of being wrong is just a redo.

How To Run The Stack

After Claude’s first answer, paste the prompts one at a time, in order, in the same conversation. Don’t skip ahead. The order matters — each one builds on what the previous one surfaced. By the end you’ll either trust the answer with high confidence, or have a clear list of what to verify before you act.

Prompt 1 The Confidence Check
Catches: overconfidence

Claude defaults to confident even when it shouldn’t be. This forces it to actually stop and rate how solid its own answer is — and tell you what would push that score up. The drop in confidence is usually the most useful signal.

Prompt 1 — Confidence Check
On a scale of 1-10, how confident are you in the answer you just gave me? Be honest — calibration matters more than sounding sure. Specifically tell me: 1. THE SCORE A single number, 1-10. 2. WHY THAT SCORE AND NOT HIGHER What’s making you less than fully sure? (Thin training data, ambiguous question, fast-moving field, conflicting sources, my missing context, etc.) 3. WHY THAT SCORE AND NOT LOWER What’s genuinely solid in your answer? Where are you on firm ground? 4. WHICH PARTS ARE WHICH Walk through your answer and tag each major claim: - HIGH CONFIDENCE: I’d defend this - MEDIUM CONFIDENCE: directionally right, details may be off - LOW CONFIDENCE: educated guess, verify before acting 5. WHAT WOULD PUSH THE SCORE UP The 2-3 specific things you’d need to know or check to move from a 6 to a 9. (More context from me? A specific source? A different framing of the question?)
Prompt 2 The Failure Mode Hunt
Catches: edge cases & assumptions

Claude’s first answer assumes a normal-case version of your situation. This prompt forces it to imagine where it falls apart — the contexts it skipped, the assumptions it made, the version of you for whom this answer is wrong.

Prompt 2 — Failure Mode Hunt
Where could the answer you just gave me fall apart? What’s the version of this answer that would be wrong — and for whom? Specifically: 1. THE TOP 3 WAYS THIS COULD BE WRONG The most plausible scenarios where your answer doesn’t hold. Rank them by likelihood. For each, tell me what would have to be true for the failure to happen. 2. THE ASSUMPTIONS YOU MADE List every assumption baked into your first answer. Things you assumed about my situation, my context, my industry, my level of expertise, my goals. Flag the ones I haven’t confirmed. 3. THE EDGE CASES YOU SKIPPED The unusual-but-real situations where your answer breaks. Be specific — not “in some cases this might not apply,” but “if you’re in [X situation], the answer flips because [Y].” 4. THE STEELMAN OPPOSITE Argue the opposite of what you said. Make the strongest possible case for the OTHER answer. What would the smart person who disagrees with you say? 5. THE TELL What signal in MY situation would tell me your answer doesn’t apply? What should I look for in my own context to know I’m the exception?
Prompt 3 The Expert Review
Catches: amateur-hour mistakes

The first answer is usually the “smart generalist” version. This one forces Claude to step into the shoes of someone who’s done this for 20 years — and tell you what an amateur (which the first answer often is) would miss.

Prompt 3 — Expert Review
Now take the perspective of a top expert in this field — someone who’s been doing this for 20+ years and has the scars to prove it. Critique your previous answer like they would. Specifically: 1. NAME THE EXPERT First, define who this expert IS. (Not a real person necessarily — the role: “a senior IP attorney with 20 years in tech transactions,” “a CFO who’s taken 3 companies through Series B,” “a family medicine doctor with a focus on autoimmune.”) Be specific about their lens. 2. WHAT YOU GOT RIGHT What in your first answer would the expert nod at? Don’t pad — only the genuinely solid parts. 3. WHAT YOU OVERSIMPLIFIED What did you flatten that this expert would say has more nuance? Where would they push back on a generalization? 4. WHAT AN AMATEUR GETS WRONG HERE THAT THE EXPERT NEVER WOULD The 1-3 mistakes a smart-but-non-expert (which your first answer often is) typically makes in this domain — and which of them showed up in your answer. 5. THE EXPERT’S REWRITE Rewrite your original answer the way the expert would deliver it. Same goal, different depth. Use the actual frameworks, vocabulary, and judgment calls a real expert would. 6. THE EXPERT’S FOLLOW-UP QUESTION The single question this expert would ask me before fully committing to an answer. (This is usually the most useful sentence of the whole pressure test.)
Prompt 4 The Verification Path
Catches: blind trust

The whole point of the pressure test is that you stop just trusting Claude. This last prompt makes it tell you exactly what to check, where to check it, and how long it’ll take — so the burden of proof shifts back to evidence.

Prompt 4 — Verification Path
How would I verify your answer on my own? Tell me exactly what to check, where to check it, and how long it’ll take. Specifically: 1. THE 3-5 SPECIFIC THINGS TO CHECK Not generic “do research” — the actual sources, calculators, documents, or people. For each item: - WHAT to verify (the specific claim from your answer) - WHERE to verify it (the actual source: government site, specific tool, named expert, calculator URL, person to call) - WHY this is the highest-signal verification (vs. easier-but-weaker checks) 2. THE 30-MINUTE VERIFICATION PLAN If I have 30 minutes, the exact sequence: do A first, then B, then C. With links where you can give them. 3. THE 5-MINUTE GUT-CHECK VERSION If I’m time-strapped: the single fastest check that’d catch the most likely error. The 80/20 of verification. 4. THE ONE MOST IMPORTANT THING If I only verify ONE thing from your answer before acting, what is it? The single highest-stakes claim — the one where being wrong matters most. 5. WHEN TO BRING IN A HUMAN The threshold at which I should stop trusting AI verification and bring in a real expert (lawyer, doctor, CPA, engineer, etc.). What signal would tell me “this is past the AI’s pay grade”? 6. WHAT WOULD CHANGE YOUR ANSWER If I come back and tell you what I found in verification — the specific finding that would make you revise your original answer materially. So I know what to bring back to you.
Stack Why The Order Matters

The four prompts catch different mistakes — but they also build on each other. Run them in this order:

1 calibrates the room. You learn which parts of the original answer are actually shaky vs. firm. Now you know where to push.

2 finds the cracks. Knowing where Claude is least confident, you can specifically hunt the failure modes in those areas. The assumption list usually surfaces the real problem.

3 brings the expert. Now Claude has both the calibration AND the failure modes in its working context. The expert review goes way deeper than if you’d asked it cold.

4 closes the loop. You have a confident-vs-shaky breakdown, edge cases, and an expert-level rewrite. Now you ask: what specifically should I check? — and the answer is sharper because Claude is already aware of where it’s uncertain.

The Bonus Move

If you only have time for ONE pressure-test prompt — run Prompt 3 (The Expert Review). It catches the most. But the full stack is the difference between “Claude told me X” and “I know X is right because Y, Z, and the version where I’m wrong is rare and looks like this.”

For Your Job

Set Up Claude for Your Specific Job

If you’re ready to set up Claude for your specific job — with custom skills, connectors, and automations built around the work you do every day — I built a bootcamp just for you.

Start the Weekend Bootcamp →

Go Even Further

Join the AI Income Lab

If you’re looking to go even further, join mine and my husband’s community group where we give you all the AI agents and systems running our businesses.

Join the Community →

© 2026 Mariah Brunner. All rights reserved.