“Give me a token. Give me a token. I want it fast. I want it cheap. I want it now.”
This is the mantra of developers building software on generative AI models, or at least that's what Parasail CEO Mike Henry is hearing. Parasail provides cloud computing services to companies running AI models for inference, and Henry told TechCrunch that the company serves 500 billion tokens per day. Call it tokenmaxxing.
Henry previously built cloud products at Groq, the LLM-focused chipmaker, where he saw early on that developers building software on AI models wanted cloud processing tailored to their needs. Parasail, which came out of stealth a year ago, has now raised a $32 million Series A to meet that demand at scale.
Although Henry has a background in chip design, Parasail is not keen on owning its own hardware. Rather than manufacturing GPUs, the company rents processing time across 40 data centers in 15 countries, buys additional capacity on spot markets, and orchestrates everything behind the scenes to reduce the cost of inference requests.
By allocating workloads wisely and smoothing out demand peaks, the company aims to compete with providers that own their own silicon but may be constrained by existing customer commitments and workloads.
The company's potential depends on continued adoption of open source models and agents outside of the frontier labs, a shift Parasail's executives and investors attribute to the rising cost and friction of using products from companies like Anthropic and OpenAI.
Instead, hybrid architectures are emerging, according to Andreas Stuhlmüller, CEO of Elicit, a startup that raised a $22 million Series A to build a research assistant for scientific literature. Its customers, including leading pharmaceutical companies, use LLM-based tools to review and analyze data from tens of thousands of scientific papers.
“Sending 100,000 requests to an API endpoint is a huge pain, so we moved more towards an open model,” Stuhlmüller told TechCrunch. That is especially true now that the company relies on agents that break tasks apart and work more strategically over longer stretches of time. An open model handles the initial screening to keep costs down before a more capable frontier model produces the final answer.
Agents are becoming an increasingly common part of software development, and the resulting proliferation of model queries is driving investment in companies like Parasail that provide infrastructure for inexpensive inference. Samir Kumar, a partner at Touring Capital who co-led the round, told TechCrunch that he expects inference to eventually account for at least 20% of the cost of building software.
How much of that market Parasail can capture is another question. In the crowded cloud computing space, Henry argues that his company's focus on inference (it doesn't handle training) and its willingness to take on startup customers without long-term contracts set it apart from larger clouds focused on enterprise business, and even from better-funded competitors in cloud inference such as Fireworks AI and Baseten.
Of course, a different kind of risk comes with a customer base made up of seed- to Series B-stage startups in the unpredictable AI space.
Steve Jang, a partner at Kindred Ventures, which also co-led the round, said the economics of model deployment make a compute intermediary like Parasail necessary, and that's before models for content generation and robotics are widely used.
“Everyone wonders whether there's an AI bubble. There is no AI bubble,” he told TechCrunch. “From where we sit, demand far exceeds supply.”
