Dylan Fox didn’t just build another speech-to-text API. He’s building the future of audio intelligence — and he’s not afraid to show you the benchmarks.
As the Founder and CEO of AssemblyAI, Fox leads one of the most advanced Speech AI platforms in the world. With clients like The Wall Street Journal, Spotify, and NBCUniversal, AssemblyAI processes over 30 million inference calls per day. And with over $115 million raised, it’s among the best-funded companies quietly reshaping voice tech.
But here’s what makes Fox unique — he takes people behind the curtain.
As seen in Millionaire MNL, his recent LinkedIn posts read like a public lab notebook. You won’t find fluff. You’ll find benchmark comparisons, product drops, and hard truths about AI development in 2025.
“We just open sourced the strongest Whisper model ever trained”
In one of his most talked-about posts, Fox announced the open-sourcing of WhisperX-Large-V3, AssemblyAI’s new state-of-the-art model, fine-tuned from OpenAI’s Whisper. The benchmarks he posted showed improvements on several fronts, including:
- 3.7% lower Word Error Rate than OpenAI’s Whisper-Large
- 16% fewer hallucinations
- Better diarization across noisy audio
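For readers unfamiliar with the metric, Word Error Rate is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not AssemblyAI's evaluation code, the function name `word_error_rate` and the sample sentences are invented for illustration, and real benchmarks usually normalize text (punctuation, numerals, casing) more carefully than this does.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: lowercases and splits on whitespace, with no
    punctuation or numeral normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word missing from hypothesis
                curr[j - 1] + 1,                  # extra word in hypothesis
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of six in the reference.
wer = word_error_rate(
    "the patient takes ten milligrams daily",
    "the patient takes two milligrams daily",
)
print(f"WER: {wer:.3f}")
```

A claim such as "3.7% lower" can mean either 3.7 percentage points or a 3.7% relative reduction. The post as quoted here does not say which, so the figure is best read alongside the underlying benchmark tables.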
Why did they open source it? As Fox explained:
“We want to push the ecosystem forward — even if it means helping our competitors improve.”
The post went viral in the AI research community, with top engineers and founders resharing the benchmarks. It wasn’t just a product update. It was a power move.
Slam-1: A promptable Speech Language Model for developers
Another standout post revealed Slam-1, AssemblyAI’s first promptable speech model. Fox called it the “ChatGPT of voice” — but for structured developer use.
Slam-1 supports custom prompts like “return all numbers mentioned” or “list medications,” improving rare word accuracy by 66% and reducing formatting errors by 27%.
As Fox put it:
“We believe speech models should be programmable, not static.”
The release sparked conversations about how AI developers need more control, not more black-box APIs — positioning AssemblyAI as the antidote to generic LLM integration.
“We beat Google, AWS, and Azure on transcription accuracy”
If there’s one thing Fox is not afraid of, it’s competition.
In a post highlighting Universal-1, AssemblyAI’s flagship transcription model, he broke down how it outperformed Big Tech:
- Trained on 12.5M hours of audio
- 22% better accuracy than Google Cloud Speech-to-Text
- Robust across accents, file formats, and domains
Fox wrote:
“Accuracy isn’t optional. It’s the entire product.”
The post showed side-by-side error comparisons and prompted dozens of developers to test AssemblyAI for the first time. It wasn’t just a flex — it was a conversion funnel disguised as a transparency drop.
“Fast Company just named us one of the most innovative companies of 2025”
One of Fox’s most personal recent posts celebrated a company milestone: AssemblyAI’s inclusion in Fast Company’s Top 50 Most Innovative Companies list for 2025.
While most founders post a press quote and move on, Fox used the moment to reflect:
“We almost shut down three times in the early days. Now we’re powering 30M+ daily API calls. Thank you to our team.”
The post was a reminder that despite the technical breakthroughs, AssemblyAI is still a startup — and Fox still sees himself as an operator first, CEO second.
Why Dylan Fox might be leading the most important AI company no one talks about
AssemblyAI isn’t building consumer-facing magic tricks. It’s building infrastructure — the kind used by hospitals, newsrooms, financial institutions, and software companies that need speech intelligence at scale.
From speaker detection and summarization to sentiment analysis and audio redaction, AssemblyAI is powering features that most end users never see — but feel.
Fox’s posts make it clear: he’s not chasing buzz. He’s setting benchmarks.
Millionaire MNL News is a global news platform spotlighting business developments and remarkable individuals in entrepreneurship and lifestyle.