
AI infrastructure startup Inferact launched Wednesday with $150 million in seed funding at an $800 million valuation to commercialize vLLM, an open-source project that dramatically cuts the cost and increases the speed of deploying large language models. The round was co-led by Andreessen Horowitz and Lightspeed Venture Partners with participation from Sequoia Capital, Altimeter Capital, Redpoint Ventures, ZhenFund, Databricks Ventures, and the UC Berkeley Chancellor's Fund.
The company was founded by the team behind vLLM, including computer science professor and Databricks co-founder Ion Stoica, who directs the University of California, Berkeley's Sky Computing Lab, where vLLM was originally developed in 2023. Co-founder Woosuk Kwon serves as technical lead for the project, which has grown to more than 2,000 code contributors since its inception. The founding team includes several Berkeley AI researchers who recognized that AI inference, the process of running trained models to generate outputs, represents a critical bottleneck limiting enterprise AI adoption.
vLLM accelerates inference by improving how large language models use GPU memory and compute. Its main target is the key-value (KV) cache that a model builds up while generating a response; naive management of that cache wastes memory and throttles throughput, so handling it more efficiently yields significantly faster processing and lower operational costs. Companies including Stripe have reportedly reduced inference costs by over 70 percent using vLLM, demonstrating tangible economic benefits that attracted substantial venture investment despite the project's open-source nature.
The tool optimizes GPU memory usage through a technique called PagedAttention, which borrows the paging idea from operating-system virtual memory. Instead of reserving one large contiguous region sized for the longest possible output, much of which typically goes unused, it allocates the KV cache in small fixed-size blocks on demand as each response grows. This allows more concurrent requests to be processed on the same hardware, effectively multiplying the capacity of existing infrastructure. Complementary techniques such as continuous batching, which interleaves tokens from many requests in each model step, and speculative decoding, which drafts several tokens at a time, further reduce latency for end users.
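To make the contrast concrete, here is a minimal sketch of the paged-allocation idea in Python. It is illustrative only, not vLLM's implementation: the block size, class names, and free-list allocator are assumptions chosen for clarity.

```python
# Toy sketch of the paged-allocation idea behind PagedAttention.
# Not vLLM's code: block size, names, and the free-list allocator
# are illustrative assumptions.

BLOCK_SIZE = 16  # tokens stored per KV-cache block

class BlockAllocator:
    """Hands out fixed-size cache blocks from a shared pool."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("KV cache exhausted; request must wait")
        return self.free.pop()

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)

class Sequence:
    """One in-flight request; grows its block table a token at a time."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical -> physical block map
        self.num_tokens = 0

    def append_token(self) -> None:
        # Claim a new block only when the current one fills up, so no
        # memory is reserved for output the model never generates.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

allocator = BlockAllocator(num_blocks=1024)
seq = Sequence(allocator)
for _ in range(40):       # generate 40 tokens
    seq.append_token()
print(seq.block_table)    # 3 blocks used, not a worst-case slab
```

Because freed blocks return to a shared pool, memory released by one finished request is immediately available to the next, which is where the concurrency gains come from.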
Inferact's emergence reflects a fundamental shift in AI industry priorities from training increasingly large models to deploying them cost-effectively at scale. Model training dominated venture investment and technical focus through 2024 as companies competed to build more capable foundation models. However, as models reached commercial viability, the economics of serving billions of inference requests became the determining factor in profitability. Organizations deploying AI applications discovered that inference costs often exceeded training expenses over a product's lifetime, creating urgent demand for optimization technologies.
The startup plans to launch a paid serverless version of vLLM that automates infrastructure provisioning, updates, and operational management while keeping the core project open-source. This commercialization strategy mirrors successful precedents including MongoDB and Redis, which built large developer communities through open-source projects before monetizing enterprise-grade managed services. Woosuk Kwon wrote in a blog post that Inferact envisions making AI deployment "as simple as spinning up a serverless database" rather than requiring dedicated infrastructure teams.
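For context, this is roughly what self-hosting the open-source project looks like today, using vLLM's documented offline Python API. The model name is just an example, and nothing here depicts the unreleased serverless product; the GPU provisioning the snippet presupposes is precisely the burden a managed tier would absorb.

```python
# Self-hosting with open-source vLLM's offline API (pip install vllm).
# You supply and manage the GPUs; a serverless tier would hide that.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model; loads onto local GPUs
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```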
The $800 million valuation substantially exceeds typical seed-stage benchmarks, reflecting investor conviction that inference optimization addresses a systemic bottleneck with trillion-dollar market implications. A 2025 Carta report found AI startups command median seed valuations of $19 million, compared with $15 million for non-AI companies; Inferact's figure is more than 40 times the AI median. Venture firms are betting that whoever controls the infrastructure layer enabling cost-effective AI deployment will capture enormous value as the technology scales across industries.
The investment comes as multiple AI inference startups attract substantial capital. Inferact's launch mirrors the recent commercialization of SGLang as RadixArk, which secured funding at a $400 million valuation led by Accel according to sources familiar with the transaction. Both projects emerged from UC Berkeley's AI research ecosystem and address similar inference optimization challenges. The parallel commercialization efforts suggest investors view the inference infrastructure market as large enough to support multiple well-funded competitors rather than winner-take-all dynamics.
Enterprise adoption patterns validate investor enthusiasm for inference optimization. A 2026 Dynatrace report found 88 percent of organizations now use AI in at least one business function, but only one-third have scaled AI enterprise-wide. The primary barriers cited include legacy system integration complexity, data governance concerns, and prohibitively high inference costs. Inferact's technology directly addresses the cost barrier by improving hardware utilization and reducing per-request expenses, potentially accelerating the transition from proof-of-concept deployments to production-scale implementations.
Cloud infrastructure providers are simultaneously investing in inference optimization, reinforcing the strategic importance of the technology category. Amazon Web Services introduced tools including Amazon Bedrock AgentCore and Trainium3 UltraServers at re:Invent 2025, emphasizing scalable deployment for AI agents. While AWS has not partnered with Inferact, the alignment in strategic priorities suggests the inference bottleneck has become a central concern for the entire cloud computing industry as AI workloads proliferate.
vLLM currently operates under the governance of the PyTorch Foundation, which manages the project independently from Inferact's commercial operations. The startup will continue supporting the open-source community while building proprietary enterprise features including advanced security controls, compliance certifications, and dedicated support contracts. This separation aims to maintain developer trust in the open-source project while creating differentiation for Inferact's commercial offerings targeting large enterprises with stringent operational requirements.
The transition from open-source project to venture-backed startup represents a common maturation path for infrastructure technologies that achieve critical mass. Maintaining the free community version ensures continued adoption by developers and small companies while establishing vLLM as a de facto standard. Enterprise customers requiring higher service levels, integration support, or compliance certifications then become candidates for Inferact's paid offerings, creating a funnel from open-source users to commercial customers without alienating the community that built the project's initial momentum.