Bringing order to the Wild West of autonomous agents

Untested autonomous agents pose a grave threat to AI users and companies alike.
Brooke Hopkins founded Coval to run large-scale simulations that identify potential issues and improve AI agent performance before deployment.
The company has raised $3.3 million to strengthen its simulation and evaluation platform for voice and chat agents.

When one AI agent performs well, it raises the potential for others to succeed too, because people start to see that it’s possible. That’s why we’re so passionate about enabling high-quality agents—because when the tide rises, it helps everyone and more people will be able to trust and use them.

Brooke Hopkins

Founder, Coval

We are on the brink of an explosion in AI agents, yet many of these systems remain woefully under-tested. As that wave arrives, critical questions follow: ‘What will happen when untested agents are everywhere?’ and ‘Are we heading toward agent-induced chaos?’ One company is setting out to change course, using real-world data and rigorous testing to bring calm to the agent boom.

We spoke with Brooke Hopkins, Founder of Coval and former Senior Software Engineer at Waymo, about how Coval aims to bring order to the Wild West of AI agents.

Validation is vital: "I think agents are going to explode," Hopkins says. "There are really two types of agents: internal agents, which exist within their own systems, and autonomous agents that interact with external systems and users. We're going to see a big explosion of autonomous agents especially."

Autonomous agents, in particular, pose a significant risk if left under-tested. "When they're acting on behalf of people, it's crucial that they're performing tasks correctly," Hopkins explains. "The risk is that people aren’t testing agents enough, and there's a general skepticism about them because of a lack of validation."

Putting agents to the test: This is where Coval steps in. The company operates a simulation and evaluation platform designed specifically for AI agents, focusing primarily on voice and chat agents. With $3.3M in new funding from MaC Venture Capital, General Catalyst, Y Combinator, Fortitude Ventures, Pioneer Fund, Lombardstreet Ventures, and angel investors, Coval now has the capital to expand what its platform can do for agent testing.

Similarities with self-driving cars: Drawing on her experience at Waymo, where she led the evaluation infrastructure team, Hopkins realized that the challenges facing autonomous agents are strikingly similar to those encountered in the world of self-driving cars. "When you have an autonomous agent trying to accomplish a task, you need to understand all the possible paths it can take to reach the goal, and where it might go off course or behave unpredictably," she says. "This is very similar to the problems I was solving at Waymo."

Coval’s platform runs large-scale simulations that put AI agents through real-world scenarios before they are deployed, helping teams identify potential issues and improve the system before launch. Once an agent is in production, Coval continues to monitor its interactions, flagging transcripts that may require review and feeding those insights back into the simulation process to refine the agent's performance.
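To make the simulate-then-flag loop concrete, here is a minimal, generic sketch in Python. It is not Coval's platform or API, which the article does not describe; every name in it (`toy_agent`, `simulated_user`, `run_simulation`, `evaluate`) and both scenarios are hypothetical. It assumes a toy scheduling agent, a simulated caller whose replies vary from run to run, and a simple success check, then runs many conversations per scenario, reports a goal-completion rate, and collects the failed transcripts for review.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Transcript:
    scenario: str
    turns: list = field(default_factory=list)  # (user message, agent reply) pairs
    goal_reached: bool = False


def toy_agent(user_message: str) -> str:
    """Hypothetical agent under test: books an appointment once the user names a day."""
    for day in ("monday", "tuesday", "wednesday"):
        if day in user_message.lower():
            return f"Booked for {day.capitalize()}."
    return "Which day works for you?"


def simulated_user(scenario: str, rng: random.Random) -> str:
    """Hypothetical simulated caller; replies are drawn at random, so each run differs."""
    if scenario == "cooperative":
        return rng.choice(["Monday please", "How about Tuesday?"])
    # An evasive caller often never names a day, which can leave the agent stuck.
    return rng.choice(["Not sure yet", "Whenever", "Maybe Wednesday"])


def run_simulation(scenario: str, rng: random.Random, max_turns: int = 3) -> Transcript:
    """Play one simulated conversation and record whether the booking succeeded."""
    transcript = Transcript(scenario)
    for _ in range(max_turns):
        user = simulated_user(scenario, rng)
        reply = toy_agent(user)
        transcript.turns.append((user, reply))
        if reply.startswith("Booked"):
            transcript.goal_reached = True
            break
    return transcript


def evaluate(scenarios, runs_per_scenario: int, seed: int = 0):
    """Run many simulations per scenario; return success rates and failed transcripts."""
    rng = random.Random(seed)  # fixed seed so a test run can be reproduced
    rates, flagged = {}, []
    for scenario in scenarios:
        runs = [run_simulation(scenario, rng) for _ in range(runs_per_scenario)]
        rates[scenario] = sum(t.goal_reached for t in runs) / runs_per_scenario
        flagged.extend(t for t in runs if not t.goal_reached)  # queue for human review
    return rates, flagged


if __name__ == "__main__":
    rates, needs_review = evaluate(["cooperative", "evasive"], runs_per_scenario=200)
    print(rates)
    print(f"{len(needs_review)} transcripts flagged for review")
```

In this toy setup the cooperative caller always succeeds, while the evasive caller exposes a path the agent never recovers from: it simply repeats its question until the turn limit. Surfacing those failed transcripts, rather than only an average score, is the kind of "off course" behavior that a single scripted test would likely miss.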


It's complicated: The complexity of testing agents goes beyond simply inputting commands and receiving expected outputs. "With agents, there's no clear-cut input and output. You have inputs, but the responses are unpredictable due to the myriad events occurring in real-world interactions," Hopkins explains. "That's why we focus on autonomous agents interacting with external systems or users. The dynamic nature of these interactions makes it very difficult to test without simulated, dynamically generated responses."

Failure isn't an option: Despite the complexity of stress-testing agents, Hopkins is optimistic. "I see a lot of parallels with self-driving cars," she says. "The real risk wasn't necessarily that one company would fail, but that any failure could undermine the entire industry. If a company messes up, it creates skepticism about whether the technology will work at all. Similarly, when one AI agent performs well, it raises the potential for others to succeed too, because people start to see that it's possible. That's why we're so passionate about enabling high-quality agents—because when the tide rises, it helps everyone and more people will be able to trust and use them."