Case Study · 2025

Fifty bugs, seventy-two hours, one launch.

A GenAI-powered enterprise product was due to launch at Adobe Summit 2025. I led the QA sprint that took it from "maybe" to a flawless reveal — zero post-launch defects, on stage, in front of the room.

Role: GenAI QA Lead
Org: Accenture
Stage: Adobe Summit 2025
Window: 72 hours

// 01 · Challenge

The 72-hour window before Adobe Summit.

We were T-minus 72 hours from a public reveal at Adobe Summit 2025, and the GenAI product had a stack of issues nobody had triaged in priority order — just an unsorted queue and a deadline. Some were cosmetic. Some would have made the product unusable on stage. Telling them apart was most of the job.

GenAI launches are different from classical software launches. The failure modes aren't just "broken" — they're "subtly wrong in a way that makes the audience lose faith." A factual hallucination in a demo prompt is a worse outcome than a crash, because a crash you can recover from. A confidently-wrong answer is the headline.

// 02 · Process

Triage, kill, ship.

I worked the queue in three passes, each with a different lens.

Pass one: categorize. Every issue tagged: blocker, embarrassment, polish, or won't-fix. Forty percent dropped to won't-fix on the first pass.
Pass two: reproduce. Every blocker had to repro in a clean environment. If it didn't, it wasn't a blocker; it was a story we were telling ourselves.
Pass three: fix or fence. Each surviving issue got either a fix in the build or a fence around the demo script so it couldn't be triggered.

That last move is underrated. In a 72-hour window, you don't fix everything. You fix what matters and you build a demo path that doesn't go near the rest. The script is part of the product.
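The three passes can be sketched in a few lines of Python. This is an illustrative sketch, not the tooling used in the sprint: the `Issue` fields, the `fixable_in_window` flag, and the choice to drop blockers that fail to reproduce are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    BLOCKER = "blocker"
    EMBARRASSMENT = "embarrassment"
    POLISH = "polish"
    WONT_FIX = "won't-fix"


@dataclass
class Issue:
    title: str
    tag: Tag                         # assigned in pass one
    reproduces_clean: bool = False   # pass two: did it repro in a clean environment?
    fixable_in_window: bool = False  # hypothetical flag: can a fix land before the deadline?


def triage(issues):
    """Return (to_fix, to_fence, dropped) after the three passes."""
    to_fix, to_fence, dropped = [], [], []
    for issue in issues:
        # Pass one: won't-fix items leave the queue immediately.
        if issue.tag is Tag.WONT_FIX:
            dropped.append(issue)
            continue
        # Pass two: a blocker that does not reproduce is not treated as a blocker.
        # (Assumption: it is dropped here rather than re-tagged.)
        if issue.tag is Tag.BLOCKER and not issue.reproduces_clean:
            dropped.append(issue)
            continue
        # Pass three: fix it in the build, or fence it off in the demo script.
        if issue.fixable_in_window:
            to_fix.append(issue)
        else:
            to_fence.append(issue)
    return to_fix, to_fence, dropped
```

The point of the third return value is that the fence list is a deliverable in its own right: it is the input to the demo script.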

// 03 · Outcome

Zero post-launch defects.

50+

Critical issues resolved

72h

Pre-launch window

0

Post-launch defects reported

The product launched at Adobe Summit 2025 cleanly. No post-launch defects came in from the field, and the demo path held under live audience pressure. The win wasn't the heroics — it was the discipline. Triage was the product.

// 04 · Writer AI

Evaluating Writer AI for a Fortune 500 client.

In parallel, I led a structured evaluation of Writer AI's enterprise capabilities for a global communications client. The deliverable was an adoption strategy that informed a real go/no-go decision, so the eval had to surface where the tool's limits were, not just where its demos shone.

I built the eval around the client's three highest-volume use cases, ran them at production-like prompt volumes, and benchmarked outputs against in-house communications standards. The output was a one-page recommendation, a three-page rationale, and a list of guardrails the client would need before adoption.

// 05 · Learnings

QA for GenAI is its own discipline.

Define "wrong" before testing. For a GenAI product, "wrong" isn't binary. Build the rubric first; testing without one is theater.
The demo path is a feature. Scripted paths through the product are not a workaround; they're a deliberate design choice for high-stakes reveals.
Confidence is calibrated. The thing you're shipping isn't accuracy. It's the right level of accuracy paired with the right level of hedging.
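As a minimal sketch of "build the rubric first", a rubric can be written as named checks that exist before any output is graded. The check names and rules below are hypothetical placeholders, not the rubric used on either engagement; a real rubric would come from the product's own standards.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]

# Hypothetical rubric: each entry names one way an output can be "wrong".
RUBRIC: Dict[str, Check] = {
    "non_empty": lambda text: bool(text.strip()),
    "no_absolute_claims": lambda text: "guaranteed" not in text.lower(),
    "within_length": lambda text: len(text.split()) <= 50,
}


def grade(output: str, rubric: Dict[str, Check] = RUBRIC) -> List[str]:
    """Return the names of the rubric checks that the output fails."""
    return [name for name, check in rubric.items() if not check(output)]
```

Returning the list of failed checks, rather than a single pass/fail, keeps "wrong" non-binary: an output can fail on hedging while passing everything else.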

Next case →

GCV — Global Collaboration Village (Davos)