As LLMs have become more powerful, hallucinations have proved stubbornly difficult to avoid. Even the smartest models still make errors. There are ways to detect these errors, but the industry is still figuring out the best methods.
Perhaps the company, which just raised $9 million in seed funding from Andreessen Horowitz, is trying to build a more rigorous method to spot these mistakes.
According to founder Peter Elias (pictured above), the company’s goal is to prevent hallucinations and simple factual errors from reaching users and to achieve 99.99% accuracy, a level that is common in deterministic systems but much harder to reach with AI. Ultimately, bringing LLMs to that level of precision requires rethinking many of the fundamental assumptions of AI engineering.
Perhaps the company’s first product is a data science tool built to derive quick answers from complex datasets. Each result is accompanied by a citation and audit trail about how it was developed. This is an increasingly common technique among AI tools.
But preventing errors from creeping into these summaries required an elaborate harness system that Elias describes as a “data science mech suit.” The LLM’s first-pass answers are checked against a deterministic validation system, and results that do not match the dataset are rejected and sent back. Importantly, the LLM is trained on the validators and the entire system is optimized for fast and accurate answers, the company said.
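The company has not published how its harness works, so the following is only a minimal sketch of the general pattern described above: a model drafts an answer, a deterministic check recomputes it from the dataset, and mismatches are rejected. Every name here (`ROWS`, `validate`, `answer_with_harness`, `stub_model`) is hypothetical, the model is replaced by a stub, and the retry-with-feedback loop is an assumption about what "sent back" means.

```python
from statistics import mean
from typing import Callable, Optional

# Hypothetical dataset: a tiny sales table standing in for a real one.
ROWS = [
    {"region": "east", "revenue": 120.0},
    {"region": "west", "revenue": 80.0},
    {"region": "east", "revenue": 100.0},
]


def validate(claim: dict, rows: list[dict]) -> bool:
    """Deterministically recompute the claimed average and compare it."""
    values = [r["revenue"] for r in rows if r["region"] == claim["region"]]
    if not values:
        return False  # the claim refers to data that does not exist
    return abs(mean(values) - claim["value"]) < 1e-9


def answer_with_harness(
    ask_model: Callable[[Optional[str]], dict],
    rows: list[dict],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Return the first draft that passes validation, or None if none does."""
    feedback = None
    for _ in range(max_attempts):
        claim = ask_model(feedback)
        if validate(claim, rows):
            return claim
        # Assumption: a rejected draft goes back to the model with a reason.
        feedback = f"rejected, does not match the dataset: {claim}"
    return None


def stub_model() -> Callable[[Optional[str]], dict]:
    """Stand-in for an LLM: the first draft is wrong, the second is right."""
    drafts = iter([
        {"region": "east", "value": 120.0},
        {"region": "east", "value": 110.0},
    ])

    def ask(feedback: Optional[str]) -> dict:
        return next(drafts)

    return ask


result = answer_with_harness(stub_model(), ROWS)
```

In this sketch the first draft is rejected because the recomputed average for the east region is 110.0, not 120.0, and the second draft passes. A model that never produces a matching answer yields `None` rather than an unverified result, which is the property the harness is meant to guarantee.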
“What we learned building this is that the better the harness engineering, the weaker the model can be,” Elias says. “If you can adjust the context enough, the model doesn’t have to work as hard to do the right thing. Essentially, this is an exercise in reducing ambiguity.”
This will likely allow the company’s data science tools to run on significantly smaller AI models. Elias said the current version runs on a model that is “four classes weaker than the Frontier model,” which means it can run on local hardware (i.e., on desktop computers rather than in a data center), significantly reducing the token costs associated with using AI.
This is a welcome idea at a time when token costs are rising and many customers are rethinking their AI budgets. And Elias’s ideas go beyond data science. The same engine can be extended to cover use cases such as accounting and medical services (or, in Elias’s words, “any use case where accuracy is important”).
“I think it’s very interesting that no major AI lab has even tried this,” Elias says. “We are incentivized not to do that because the more times we modify the model, the more profitable it is.”
