
LiveKit raised $100 million in Series C funding at a $1 billion valuation to expand its real-time voice and video infrastructure that powers ChatGPT's Advanced Voice Mode and handles over 3 billion voice AI calls annually. Index Ventures led the round with participation from Salesforce Ventures, Hanabi Capital, Altimeter Capital Management, and Redpoint Ventures. The funding arrives just 10 months after LiveKit's Series B in April 2025, reflecting accelerating demand for voice AI infrastructure.

Founded in 2021 by Russ d'Sa and David Zhao, LiveKit began as an open-source project for building real-time audio and video applications during the pandemic Zoom era. The company transitioned to offering managed cloud services when enterprises demonstrated strong demand for professionally supported infrastructure. LiveKit now serves over 100,000 developers and powers voice applications across OpenAI, xAI, Salesforce, Tesla, Meta, Spotify, 911 emergency operators, and mental health providers.

The startup's technology addresses infrastructure challenges specific to voice AI applications. Traditional web applications use HTTP, a protocol designed for moving text data in stateless, independent requests. Voice AI requires continuous real-time connections in which agents maintain conversational context across sessions lasting minutes or hours. LiveKit built a global network of data centers that functions as a unified fabric optimized for ultra-low-latency voice and video routing between AI agents and users worldwide.

The company recently partnered with telephony carriers to link LiveKit's network directly to public switched telephone networks, reducing latency for phone-based voice agent interactions. This infrastructure orchestrates the multiple AI models required for each conversational turn: speech-to-text conversion, turn detection to identify when users finish speaking, language model processing, and text-to-speech generation. Managing these components with minimal latency determines whether voice interactions feel natural or frustratingly delayed.
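The four stages of a conversational turn described above can be sketched as a simple pipeline with per-stage timing. This is an illustrative sketch only, not LiveKit's API: every function name here (`speech_to_text`, `end_of_turn`, `language_model`, `text_to_speech`, `run_turn`) is a hypothetical stand-in, and the "models" are trivial placeholders so the example runs on its own.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TurnResult:
    transcript: str
    reply_text: str
    audio: bytes
    stage_ms: dict = field(default_factory=dict)  # latency per stage, in ms


# Hypothetical placeholders standing in for real models.
def speech_to_text(audio_frames):
    # Assumption: each "frame" is already a recognized word.
    return " ".join(audio_frames)


def end_of_turn(transcript):
    # Toy turn detector: the user is done if the text ends a sentence.
    return transcript.rstrip().endswith((".", "?", "!"))


def language_model(transcript):
    return f"You said: {transcript}"


def text_to_speech(text):
    # Placeholder: real TTS would return synthesized audio samples.
    return text.encode("utf-8")


def run_turn(audio_frames):
    """Run one conversational turn and record how long each stage took."""
    stage_ms = {}

    def timed(name, fn, *args):
        start = time.perf_counter()
        out = fn(*args)
        stage_ms[name] = (time.perf_counter() - start) * 1000.0
        return out

    transcript = timed("stt", speech_to_text, audio_frames)
    if not timed("turn_detection", end_of_turn, transcript):
        return None  # user is still speaking; wait for more audio
    reply = timed("llm", language_model, transcript)
    audio = timed("tts", text_to_speech, reply)
    return TurnResult(transcript, reply, audio, stage_ms)
```

In a sketch like this, the sum of the values in `stage_ms` approximates the delay a user would perceive before hearing a reply, which is why each stage's latency matters.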

LiveKit released its Agents framework in September 2023 alongside OpenAI's ChatGPT Voice Mode launch, giving developers tools to build custom voice AI applications on the same infrastructure that powers ChatGPT. According to company statements, when LiveKit raised its Series A, investors told the founders that voice interfaces to AI models were "three to five years out." The landscape changed dramatically after OpenAI's GPT-4o demonstration in May 2024, when voice AI went from speculative future technology to immediate industry priority.

The company identifies two broad categories of voice agents being built on its platform. Open-ended agents engage in meandering conversations covering unpredictable topics without predetermined structure. Closed-loop agents follow specific workflows with defined objectives, such as processing insurance claims, scheduling appointments, or conducting structured interviews. LiveKit's infrastructure supports both patterns through its developer framework, which lets Python or Node.js programs join real-time communication sessions as participants.
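A closed-loop agent of the kind described above can be pictured as a small state machine that walks through a fixed list of steps. The following is a minimal sketch under stated assumptions, not code from LiveKit's framework: the `AppointmentFlow` class, its step names, and its prompts are all hypothetical, and a real agent would receive transcribed speech rather than typed strings.

```python
from dataclasses import dataclass, field

# Hypothetical workflow: the ordered fields an appointment agent must collect.
STEPS = ["name", "date", "time"]
PROMPTS = {
    "name": "What name is the appointment under?",
    "date": "What date would you like?",
    "time": "What time works best?",
}


@dataclass
class AppointmentFlow:
    answers: dict = field(default_factory=dict)

    def is_complete(self):
        return all(step in self.answers for step in STEPS)

    def next_prompt(self):
        """Return the question for the first unanswered step, or None."""
        for step in STEPS:
            if step not in self.answers:
                return PROMPTS[step]
        return None

    def handle(self, user_text):
        """Store the user's answer for the current step and return the reply."""
        for step in STEPS:
            if step not in self.answers:
                self.answers[step] = user_text.strip()
                break
        prompt = self.next_prompt()
        if prompt is not None:
            return prompt
        a = self.answers
        return f"Booked {a['name']} on {a['date']} at {a['time']}."
```

An open-ended agent would have no such `STEPS` list; each reply would depend only on the conversation so far rather than on progress toward a defined objective.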

Enterprise deployments demonstrate the technology's business impact. Hello Patient uses LiveKit voice agents to manage hospital workflows. Salient deploys voice agents for automotive loan servicing. Podium implements AI employees handling sales, scheduling, marketing, and customer support across organizations. These implementations signal that voice AI has progressed beyond experimental pilots into production systems managing critical business operations.

The $1 billion valuation reflects investor conviction that 2026 represents voice AI's inflection point for broad deployment across thousands of use cases. CEO statements indicate the company anticipates voice becoming the default computer interface, positioning LiveKit as backbone infrastructure for this paradigm shift. The financing will expand compute, storage, and network services while scaling infrastructure for voice-driven and computer vision applications.
