What a real UX audit looks like

By TYPENORMLabs • 7 min • May 26, 2026

Type "UX audit checklist" into Google and the top ten results read like one article reprinted with different logos — Nielsen's ten heuristics, severity 1 to 3, exported as a PDF. The thing TYPENORM charges for is not that. A real audit produces a scored rubric, three to five prioritized findings, and a one-page exec summary. Not a forty-seven-line document nobody opens twice. This piece names what we score, shows one anonymized run, and prints the output spec.

Why most "UX audits" are Nielsen 10 with a logo

I scanned the first page of results for that query in May 2026. Eleken, UXTeam, HubSpot, UXCam, AIENAI — five different domains, one article. Average length sits around 3,200 words. The depth-pad is the giveaway: they expand on what each heuristic means, not on what to do when two of them disagree on the same screen.

The format pattern repeats. Forty-seven line items. Severity rated 1 to 3. Output as a PDF. No prioritization beyond "high / medium / low," which collapses on contact with a roadmap because every finding ends up high.

What's missing is the part that costs real money. No vertical-specific clarity — HIPAA disclosure, MiFID II suitability, GDPR consent, all rendered as generic "compliance feedback." No three-axis rubric, so a confirm modal and a 404 page get the same scoring slot. No severity tied to user harm. A misaligned button and a regulator-flagged disclosure gap come back at the same priority because both deviate from "best practice."

The client gets the PDF, picks two cosmetic fixes off the top, ships them, and the shared frame for the next quarter is never built. The team ran the audit and learned nothing it did not already suspect.

An audit that doesn't change what ships next month is decoration.

What we actually score

The rubric is the same one we used to compare MedTech and FinTech clarity in an earlier piece. That piece applied it at design time. This one applies it at audit time. Same axes, different verb — score, not engineer.

Stakes — worst-case error cost

We ask one question on this axis: what does the worst-case mistake cost? Not the typical mistake. The worst.

Evidence comes from three places: the incident log, support tickets tagged "wrong" or "didn't mean to," and legal review notes from the last compliance pass. If the team can't produce all three within twenty minutes, that itself is a finding.

A bad Stakes score is rarely "we don't know." It is "we modeled the typical worst-case and called it the worst-case." A team has run the math on a failed $2,400 transfer and rounded "lawsuit, harm, regulator-flagged disclosure" down to zero because none of those have happened yet. Alex Kearns and Robinhood is the canonical case where that axis failed silently. The UI showed a -$730,000 figure that was a margin artifact, not actual debt. The team had calibrated to FinTech's typical worst-case (money). The actual worst-case was human.

Reversibility — time to irrecoverable

We ask: how long until the user can no longer take this action back?

The answer lives in undo paths, dispute windows, cancel-before-pickup flows, and soft-delete versus hard-delete settings. We walk every destructive action in the surface and time-stamp it. Then we compare that number to the copy the user sees on the same screen.

A bad Reversibility score is a UI claim that doesn't match the engine. Wise documents a cancel-before-pickup window in minutes to hours, well-signalled in the receipt screen. Zelle does not — and the UI doesn't say so loudly. The inverse failure is equally common: a product advertises "24-hour cancellation" while the dispute desk lives behind a phone tree and a 72-hour callback. LogRocket's reversible-actions framework is the cleanest external reference we point teams to when this gap shows up.
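
The walk described above reduces to a comparison between two numbers per action: the undo window the copy implies and the window the engine honors. A minimal sketch, assuming both can be expressed in hours; the class name, field names, and the sample walk are hypothetical and not taken from any audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionWindow:
    action: str            # destructive action being audited
    claimed_hours: float   # undo window the on-screen copy implies
    engine_hours: float    # undo window the backend actually honors


def reversibility_gaps(actions, tolerance_hours=0.0):
    """Return (action, gap_hours) for every UI claim that differs from the engine.

    A positive gap means the copy promises more time than the engine gives.
    A negative gap means the copy hides a window the user actually has.
    """
    gaps = []
    for a in actions:
        gap = a.claimed_hours - a.engine_hours
        if abs(gap) > tolerance_hours:
            gaps.append((a.action, gap))
    return gaps


# Hypothetical walk of one surface; the numbers are made up for illustration.
walk = [
    ActionWindow("cancel transfer", claimed_hours=24, engine_hours=0),
    ActionWindow("delete payee", claimed_hours=0, engine_hours=0),
    ActionWindow("void invoice", claimed_hours=0, engine_hours=24),
]
gaps = reversibility_gaps(walk)
```

Both signs count as findings. An over-promise is the more dangerous one, because the user acts on time they do not have.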

Cadence — actions per user

How often does this user touch this action? A power user who taps the same confirm modal four times a day will tap through it without reading by week two. High cadence plus low reversibility plus low stakes-perception is the automated tap-through trap — the same shape that lets a daily transfer become a four-figure mistake before anyone sees the screen.

Severity — how a finding becomes a P0

Severity is the audit-specific add. The composite is simple on purpose: (Stakes × Cadence) / Reversibility-days, where Reversibility-days is the time to irrecoverable in days, bucketed into P0 / P1 / P2. P0 ships this sprint. P1 ships next quarter, monitored. P2 stays in the backlog with a written reason.

It's a heuristic. The point is forcing a number on a finding. A team that argues for an hour over whether something is P0 or P1 has already done the work; they agree it matters, they disagree about the date. That conversation alone is worth more than a 47-line PDF.
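
The composite can be sketched in a few lines. The formula is the one stated above; the cutoffs (p0_at, p1_at) and the floor on the reversibility window (min_days) are illustrative assumptions, since the text does not publish thresholds. The sketch also assumes Stakes and Cadence are 1-5 scores and Reversibility is expressed in days.

```python
def composite_severity(stakes, cadence, reversibility_days,
                       p0_at=10.0, p1_at=2.0, min_days=0.01):
    """Score = (stakes * cadence) / reversibility_days, then bucket.

    stakes, cadence: 1-5 rubric scores.
    reversibility_days: days until the action can no longer be undone.
    p0_at, p1_at, min_days: illustrative assumptions, not published cutoffs.
    """
    # Floor the window so an irrevocable action (0 days) does not divide by zero.
    days = max(reversibility_days, min_days)
    score = (stakes * cadence) / days
    if score >= p0_at:
        return "P0", score
    if score >= p1_at:
        return "P1", score
    return "P2", score


# High stakes, daily use, one day to undo.
label, score = composite_severity(stakes=4, cadence=5, reversibility_days=1)
```

A shorter window pushes the score up, which is the intended direction: the less time a user has to take an action back, the higher the finding ranks.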

One audit, anonymized

I have run this rubric on a handful of recent audits. The one I keep coming back to: Series-B SaaS, FinTech-adjacent — treasury and B2B payments. Forty-eight-hour engagement, one product surface: the in-app transfer flow.

Why this surface and not the dashboard. The team had already paid $99 for a Nielsen-10 audit six months earlier. The PDF lived in a Notion page. Nothing had shipped from it. I asked what they remembered from it and got a shrug. That is the audit-shaped hole the three-axis rubric fills.

Finding 1 — Stakes mis-calibrated. The team had modeled worst-case as a $4,200 failed transfer per incident, based on the 95th-percentile transfer size. I asked the legal lead for the last MiFID-equivalent review. The actual worst-case was a regulator-flagged disclosure gap on the source-of-funds field, which would end one of their three enterprise contracts. Roughly 100× the modeled cost. The disclosure copy lived three taps deep, in a help drawer. Fix: surface the disclosure on the confirm screen, scoped to enterprise tenants. P0.

Finding 2 — Reversibility-window UI claim. The confirm screen said "instant transfer" for every path. The engine ran two: ACH (24-hour hold, batched) and wire (true instant, irrevocable). Same screen, two different reversibility windows, one copy string. Users on ACH thought the money had moved and called support when it had not. Users on wire thought they had 24 hours to cancel and did not. Fix: split the surface so the payment rail (ACH or wire) drives the copy and the dispute-window callout. P0.

Finding 3 — Cadence trap. The flow had a three-screen confirm. I watched session recordings for daily power users. They were tapping through all three screens within four sessions, total dwell under 600ms. The confirm was wallpaper. No anti-habituation mechanic, no value re-display, no random-position confirm. Fix: defer to P2. Too expensive for the sprint. De-risked with a monitor: flag any session where confirm dwell-time drops below 600ms on a transfer above the user's 90th-percentile size.

Two of three shipped in the next sprint. One deferred with monitoring. Total scope of fixes was smaller than the original PDF — ranked by user harm, not heuristic count.
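
The Finding 3 monitor is small enough to sketch. The 600ms floor and the 90th-percentile rule come from the finding; the function names, the nearest-rank percentile method, and the choice not to flag users with no transfer history are assumptions made for illustration.

```python
DWELL_FLOOR_MS = 600  # threshold from the finding


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct; values must be non-empty."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max((pct * len(ordered) + 99) // 100, 1)
    return ordered[rank - 1]


def should_flag(confirm_dwell_ms, transfer_amount, past_amounts):
    """True when a large-for-this-user transfer was confirmed too fast to read."""
    if not past_amounts:
        return False  # assumption: no history means no baseline, so no flag
    p90 = percentile_nearest_rank(past_amounts, 90)
    return confirm_dwell_ms < DWELL_FLOOR_MS and transfer_amount > p90


# Hypothetical history: ten past transfers from 100 to 1000.
history = [100 * i for i in range(1, 11)]
flagged = should_flag(confirm_dwell_ms=450, transfer_amount=950, past_amounts=history)
```

The per-user baseline is what keeps the flag rate low: a fast tap on a routine amount passes, and only a fast tap on an unusually large one raises a flag.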

What you get — the output

Three artifacts, each bounded:

Scored rubric. Three axes by surface, one number per cell. Fits on a screen.
Three to five prioritized findings. One paragraph each — evidence, proposed fix, P-level.
One-page exec summary. The read for whoever signs off on the sprint.

A single rubric cell looks like this:

Surface: in-app transfer flow
Stakes: 4/5     Reversibility: 2/5     Cadence: 5/5
Composite severity: P0
Finding: Worst-case mis-modeled. Disclosure gap → contract loss.
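
For teams that want the cell in a tracker rather than a document, one way to hold it is a small validated record. This is a sketch, not the delivered format: the class name, the assumed 1-5 range for each axis, and the render layout are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricCell:
    surface: str
    stakes: int
    reversibility: int
    cadence: int
    severity: str   # "P0", "P1" or "P2"
    finding: str

    def __post_init__(self):
        # Assumed scale: each axis is an integer from 1 to 5.
        for axis in ("stakes", "reversibility", "cadence"):
            score = getattr(self, axis)
            if not 1 <= score <= 5:
                raise ValueError(f"{axis} must be 1-5, got {score}")
        if self.severity not in {"P0", "P1", "P2"}:
            raise ValueError(f"unknown severity {self.severity!r}")

    def render(self):
        """Return the cell as four plain-text lines."""
        return "\n".join([
            f"Surface: {self.surface}",
            f"Stakes: {self.stakes}/5     Reversibility: {self.reversibility}/5"
            f"     Cadence: {self.cadence}/5",
            f"Composite severity: {self.severity}",
            f"Finding: {self.finding}",
        ])


cell = RubricCell(
    surface="in-app transfer flow",
    stakes=4, reversibility=2, cadence=5,
    severity="P0",
    finding="Worst-case mis-modeled. Disclosure gap → contract loss.",
)
```

Rejecting out-of-range scores at construction keeps a half-filled cell from reaching the exec summary.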

The exec summary is the read for someone with eight minutes between meetings. Each finding gets a one-sentence harm statement, a one-sentence proposed fix, and a P-level. No appendix, no methodology refresher; the full report carries that. If the exec summary does not change what ships next month, we missed.

When not to apply. Pre-PMF products. The rubric assumes the product has users with a worst-case to model. If your worst-case is "user closes the app and doesn't come back," run a usability test, not this audit. The honest limit: this is opinionated. If you want a Nielsen-10 walk-through, the $99 PDFs are right there and we will not bid against them.

Take it further

For one anonymized teardown of how this rubric applies to a real product surface, see our MedTech vs FinTech clarity piece.

If you want a scored audit on your own surface (three to five prioritized findings, a one-page exec summary, and the rubric run honestly), apply for a Full UX Audit. We run 48-hour and full engagements; the format is the same.

Sources: Eleken — UX audit checklist · LogRocket — UX reversible actions framework · CBS News — Alex Kearns / Robinhood wrongful-death suit · Wise — How do I cancel my transfer.