Micah Berkley
Miami, FL
Research · May 2026 · 12 min read

Voice AI that actually converts: what 50,000 calls taught me.

Every AI phone call must clear three gates: get answered, get trusted, get filed. Field data from a production legal-intake voice system shows what moves each one.

Figure: Vintage telephone on a yellow background. Three gates per call: answered, trusted, filed.

An AI phone call earns a signed case or a hang-up, and it decides which in seconds. I have data from 50,000 production calls on what separates the two.

Here is the machine behind that number. At iExcel I run a mass-tort reply engine, an AI voice system that calls back people who just asked a law firm for help online. Mass tort means many people harmed by the same product, drug, or disaster, each with a similar claim.

The system calls each lead, screens the claim, and moves qualified callers toward a signed retainer, the agreement that makes someone a client. Fifty thousand calls taught me one model with three gates. The call has to get answered, then trusted, then filed.

Miss any gate and the other two are worth nothing. What follows is the field manual I wish someone had handed me at call zero: what moves each gate, where I broke things, and the checklist I now run on every demo.

The phone is the most brutal surface in AI

A chatbot gets patience. People squint at a clumsy answer, rephrase, and try again. A phone call gets seconds.

People hang up mid-sentence and never come back. There is no screen to lean on, no buttons, no chance to edit a bad reply before it ships. Every word lands live, in a stranger’s ear, while their thumb rests on the red button.

Demos lie on this surface more than on any other. A demo call happens in a quiet room with a willing listener. Production calls happen in traffic, on lunch breaks, with kids in the background and a spam reflex already trained.

Scale forces humility too. At 50,000 calls, a failure that shows up once in a thousand calls has already happened to fifty real people. Rare stops meaning safe.

I spent years handling escalations as a Senior Site Reliability Engineer at Google, the calls that start when money is already burning. Voice AI runs at that pressure on every call. Nothing is a draft, and every mistake happens in public, one listener at a time.

That background shapes how I build. Reliability is a feature of the conversation itself, the same way staying online is a feature of a website. A voice agent that is brilliant nine times and broken the tenth is a broken product.

Two decades in enterprise tech taught me to respect surfaces where failure is instant. That is exactly why I work this channel. Email gets ignored and texts get skimmed, but a phone call done right is still the fastest path from “I need help” to “I signed.”

Gate one: get answered

Nobody answers a call they expect to waste their time. So the first fight has nothing to do with AI. It is about being a call worth picking up.

Three levers move pickup and callback rates in our field data. None of them are exotic.

First, local presence. A Miami lead sees a Miami number on the screen. An unknown out-of-state number reads as spam before the first ring ends.

Caller ID is your first impression, and it gets judged before your system says a word. Buy the numbers that match your leads.
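
A minimal sketch of local-presence matching, assuming US numbers in +1 format. The number pool, the overlay map, and the function names are hypothetical placeholders for illustration, not the production system's configuration.

```python
from typing import Optional

# Hypothetical pool of outbound numbers the firm owns, keyed by area code.
OUTBOUND_POOL = {
    "305": "+13055550100",
    "212": "+12125550102",
}

# Hypothetical overlay map: area codes that serve the same region.
SAME_REGION = {"786": ["305"]}

DEFAULT_CALLER_ID = "+13055550100"


def area_code(phone: str) -> Optional[str]:
    """Return the area code of a +1 number, or None for anything else."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        return digits[1:4]
    return None


def pick_caller_id(lead_phone: str) -> str:
    """Prefer a number in the lead's own area code, then a same-region one."""
    code = area_code(lead_phone)
    if code is None:
        return DEFAULT_CALLER_ID
    if code in OUTBOUND_POOL:
        return OUTBOUND_POOL[code]
    for neighbor in SAME_REGION.get(code, []):
        if neighbor in OUTBOUND_POOL:
            return OUTBOUND_POOL[neighbor]
    return DEFAULT_CALLER_ID
```

With this pool, a lead on a 786 number would be dialed from the 305 number, and a lead from an unmatched area code falls back to the default.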

Second, timing. Speed-to-lead is the gap between the form-fill and your call, and that gap decides more than any script. The person who just submitted a form about a harmful drug is sitting with the problem right now, phone in hand.

Call within minutes and you are the help they asked for. Call tomorrow and you are an interruption they have to place.

So the reply engine dials while the form is still warm, including hours when a human team is asleep. When nobody picks up, it retries at different times of day instead of hammering the same dead window. The lead who cannot answer at 2 p.m. often answers at 7.

One warning on volume. Ten attempts in one afternoon do not raise pickup; they teach the lead to block the number. Persistence wins across days and different hours, not minutes.
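
The retry rule can be sketched as a small scheduler. This is an illustration under stated assumptions: the three calling windows, the three-hour minimum gap, and the retry cap are made-up values, and a real schedule would also have to respect the lead's time zone and whatever calling-hour rules apply.

```python
from datetime import datetime, timedelta

# Assumed calling windows (local hour of day): morning, afternoon, evening.
WINDOWS = [10, 14, 19]
# Assumed minimum wait before the first retry.
MIN_GAP = timedelta(hours=3)


def retry_schedule(first_attempt: datetime, max_retries: int = 5) -> list[datetime]:
    """Plan retries after an unanswered first call.

    Every upcoming window is listed in time order, then every other one is
    taken. With three windows per day, that means consecutive attempts never
    land in the same window and no day gets more than two calls.
    """
    midnight = first_attempt.replace(hour=0, minute=0, second=0, microsecond=0)
    days = 2 * max_retries + 1  # more than enough days to fill the plan
    slots = [
        midnight + timedelta(days=d, hours=h)
        for d in range(days)
        for h in WINDOWS
    ]
    future = [s for s in slots if s >= first_attempt + MIN_GAP]
    return future[::2][:max_retries]
```

For a first attempt at 2:05 p.m., the first retry lands in the 7 p.m. window the same day, and the remaining attempts spread across the following days.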

Third, first-sentence honesty. The opening line says who is calling and why, tied to the form they just filled out. People stay on the line for relevance, and they hang up on mystery.

No fake familiarity either. The system never pretends to be an old friend making a casual call, because that trick collapses in one exchange and takes the firm’s name down with it.

Voicemail follows the same rule. Short, named, specific, with a clear promise of when the next call comes. Get these three levers right and the gate opens, because nothing downstream matters until it does.

Gate two: get trusted

Answered buys you seconds. Trust buys you the conversation. This is where most voice AI dies, and the autopsy is almost always the same.

Start with disclosure. Our system says it is an AI assistant up front, in plain words, before the caller has to wonder. That choice came from listening to calls, not from a legal checklist.

People forgive a machine for being a machine. They never forgive it for pretending to be a person.

The worst thing a legal-intake agent can do is dodge the robot question. When a caller asks “is this a real person,” our agent answers straight, and those calls end better.

Trust also rides on what the agent refuses to do. Ours gives no legal advice and promises no outcomes, and it says so when pushed. In legal intake, an honest “the attorney will answer that” beats a fluent guess every time.

Next, latency. Latency is the pause between the caller finishing a thought and the system starting its reply. Under a second feels like conversation, while anything longer feels broken and sends the caller talking into the silence.

We engineered the full pipeline for sub-second turns, from speech recognition to the voice itself. Dead air is where trust drains out.
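
One way to keep a sub-second target honest is to time each stage of a turn and watch the slow tail, not the average. The sketch below assumes three stages named asr, llm, and tts; those names, the class, and the nearest-rank percentile helper are illustrative choices, not a description of the production pipeline.

```python
from dataclasses import dataclass, field

BUDGET_MS = 1000.0  # "under a second" per turn


@dataclass
class TurnTiming:
    """Accumulates per-stage latency for one conversational turn."""
    stages_ms: dict[str, float] = field(default_factory=dict)

    def record(self, stage: str, ms: float) -> None:
        self.stages_ms[stage] = self.stages_ms.get(stage, 0.0) + ms

    @property
    def total_ms(self) -> float:
        return sum(self.stages_ms.values())

    def over_budget(self) -> bool:
        return self.total_ms > BUDGET_MS

    def slowest_stage(self) -> str:
        return max(self.stages_ms, key=self.stages_ms.get)


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]
```

Tracking the 95th percentile of `total_ms` across turns surfaces the pauses callers actually feel, and `slowest_stage` points at where to spend the next optimization.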

Tone beats vocabulary. Mass-tort callers are often sick, grieving, or scared, and a system that rushes them sounds like a debt collector. Warm, unhurried, plain words win.

And let people interrupt. Real conversations overlap, so when a caller cuts in, the agent stops talking and listens.

Engineers call that barge-in. Callers call it being heard. A script that rolls straight over an interruption teaches the caller to stop talking.
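
A minimal sketch of barge-in as a two-state controller. It assumes some upstream voice-activity detector reports, frame by frame, whether the caller is speaking; the class name, the 20 ms frame size, and the 200 ms threshold are assumptions for illustration.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    SPEAKING = auto()


class BargeInController:
    """Stops agent playback once the caller has spoken long enough."""

    def __init__(self, min_speech_ms: int = 200) -> None:
        self.state = State.LISTENING
        self.min_speech_ms = min_speech_ms
        self._caller_speech_ms = 0

    def start_speaking(self) -> None:
        self.state = State.SPEAKING
        self._caller_speech_ms = 0

    def finish_speaking(self) -> None:
        self.state = State.LISTENING

    def on_audio_frame(self, caller_is_speaking: bool, frame_ms: int = 20) -> bool:
        """Feed one detector frame; return True when playback should stop."""
        if self.state is not State.SPEAKING:
            return False
        if not caller_is_speaking:
            # A cough or click resets the counter instead of cutting the agent off.
            self._caller_speech_ms = 0
            return False
        self._caller_speech_ms += frame_ms
        if self._caller_speech_ms >= self.min_speech_ms:
            self.state = State.LISTENING
            return True
        return False
```

The threshold is the design choice that matters: too low and background noise silences the agent, too high and the caller feels talked over.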

Miami adds one more trust lever: language. A caller who opens in Spanish should hear Spanish back on the same call, without a hold or a transfer. Trust is hard enough to earn in someone’s first language.

Holding all of this steady comes down to how the agent’s instructions are layered, and that deserves its own field guide. I already wrote it: my breakdown of prompt architecture for production voice agents covers the exact stack.

A friendly call that never files is a failed call.

Gate three: get filed

Filed means a signed retainer or a case created in the firm’s system. That is the only number that counts. Not talk time, not completion rates, not how human it sounded.
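
To make the three gates measurable, each call can be scored gate by gate. The sketch below is one possible scoring, not the article's own: the record fields are hypothetical, and "stayed past the intro" is an assumed stand-in for trust that a real system would need to define for itself.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    answered: bool
    stayed_past_intro: bool  # assumed proxy for "trusted"
    filed: bool              # signed retainer or case created


def gate_rates(calls: list[CallRecord]) -> dict[str, float]:
    """Conversion at each gate, plus the end-to-end filed-per-dial rate."""
    dialed = len(calls)
    answered = sum(c.answered for c in calls)
    trusted = sum(c.answered and c.stayed_past_intro for c in calls)
    filed = sum(c.answered and c.stayed_past_intro and c.filed for c in calls)

    def rate(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "answered": rate(answered, dialed),
        "trusted": rate(trusted, answered),
        "filed": rate(filed, trusted),
        "filed_per_dial": rate(filed, dialed),
    }
```

Because the end-to-end rate is the product of the three gate rates, a zero at any gate zeroes the whole funnel, which is the point of reporting filed per dial instead of talk time.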

I design every call backward from that outcome. Start at the signature and walk in reverse: what has to be true one minute before it, and one minute before that, all the way back to hello. Every question earns its place by moving the file forward or disqualifying fast.

Disqualifying fast is a win, not a failure. A polite twenty-minute conversation with someone who has no claim costs you money and wastes their evening. Warmth that never becomes a case is a cost center with good manners.

Backward design changes the questions too. Intake scripts love biography, but a filed case needs a handful of facts that establish the claim. The agent goes after those first and lets the story arrive around them.

The handoff is the step most teams get wrong. My system does not close. It screens, qualifies, and answers the early questions.

Then it hands the qualified caller to a human while they are still on the line, a live transfer instead of a callback that never connects. The AI does what no human team can do at volume, which is instant callbacks for every lead at every hour. The human does the one thing people still do best, the final yes.

Transfer timing matters as much as the transfer. Hand off too early and your people spend the day on unqualified calls. Hand off too late and the caller’s yes cools while the robot keeps talking.

Designing backward also keeps the scoreboard honest. I built Claresto, my Ad Operations Command Center, on the same rule: pick the number that pays and make every screen answer to it. For the deeper method, my piece on measuring AI ROI walks through choosing that number and defending it.

The failure modes I paid for

Every rule above is a scar. Three mistakes cost me the most, and I am listing them so they cost you less.

Over-automation came first. Early on I pushed the agent to run the whole call, from hello to close. The filed number told the truth: qualified callers stalled at the last step.

People will give facts to a machine all day, but many still want a person at the moment of commitment, especially on a legal matter. I pulled the close back to humans, and the system went back to its real job, which is feeding them.

Rigid scripts came second. A script is a guess about how a conversation will go, and real callers break the guess inside the first minute.

They answer question three while the agent is asking question one. They mention the injured person is their mother, not them. A script that cannot bend loses exactly the callers with real cases, because real cases are messy.

The fix was building the agent around goals and listening instead of lines. It knows what it needs to learn, not what it is supposed to say next.
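
A goal-driven agent can be sketched as a decision over what is already known, instead of a pointer into a script. The fact names and disqualifying answers below are hypothetical placeholders; real criteria come from the firm, and extracting facts from speech is left out entirely.

```python
from typing import Optional

# Hypothetical facts the agent needs before a handoff, in preferred asking order.
REQUIRED_FACTS = ["used_product", "diagnosed_injury", "already_represented"]

# Hypothetical answers that end screening early.
DISQUALIFIERS = {
    "used_product": False,
    "diagnosed_injury": False,
    "already_represented": True,
}


def next_action(known: dict[str, bool]) -> tuple[str, Optional[str]]:
    """Decide the next move from what the caller has already said.

    Returns ("disqualify", fact), ("ask", fact), or ("transfer", None).
    """
    # Disqualify fast: check every known answer before asking anything else.
    for fact, bad_value in DISQUALIFIERS.items():
        if fact in known and known[fact] == bad_value:
            return ("disqualify", fact)
    # Ask only for what is still missing, whatever order the caller answered in.
    for fact in REQUIRED_FACTS:
        if fact not in known:
            return ("ask", fact)
    # Everything needed is known and nothing disqualifies: hand off to a human.
    return ("transfer", None)
```

A caller who volunteers the third fact during the first question simply shortens the call: the agent skips what it already knows and asks for the next gap.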

Ignoring abandonment data came third, and this one embarrasses me. I did big-data marketing science at Fashion Nova, so I know what hides in drop-off data, and I still spent my first months studying only the completed calls.

The hang-ups held more information. Line up abandoned transcripts second by second and they point to the exact sentence where people quit. We rewrite that sentence, watch the next batch, and repeat.

A hang-up transcript is a free consultation from your toughest critic. Read every one.

All three failures share one root. I tuned the system to sound impressive instead of tuning it to pass the gates. The gates never cared how it sounded.

The checklist I run on every vendor demo

You do not need to build a reply engine to use any of this. You need it the day a vendor plays you a polished demo call. I walk owners through this list at GP Tuesdays, the free weekly AI training I run for Miami entrepreneurs, and it holds up against any pitch.

A vendor that survives that list is selling a system. A vendor that only survives the demo is selling a recording.

Run the same list on my system. A builder who designed for the gates has nothing to hide, and an owner who tests the gates cannot be sold a demo.

None of this is legal-industry magic, either. Swap the signed retainer for a booked job, a scheduled estimate, or a paid deposit, and the three gates hold for any business that lives on inbound leads.

This is the operating model I laid out in my essay on the fractional Chief of AI: embed with the team, ship the system, hand it off running. Voice is simply the surface where that discipline shows fastest, because the phone punishes sloppy work in seconds.

If leads fill out your forms and nobody calls back within minutes, that is the first system we fix together. I’m the Fractional Chief of AI for owner-led businesses from $5M to $50M that know they’re behind, and I take you from watching AI happen to running on it in 90 days. Book a strategy call.

Micah Berkley

Fractional Chief of AI in Miami. Ex-Google Cloud Architect, ex-BMW ML. I help companies put AI to work, and teach the next generation to build with it.
