It all started with a frustrating week.
For days, we had been trying to build an AI agent application to help Ironclad users answer complex questions about the data locked in their legal contracts. We were inspired by libraries like LangChain and LlamaIndex, but we could not use common techniques like Retrieval-Augmented Generation (RAG). The “retrieval” part of RAG does not work well when your documents (legal contracts) are mostly identical and you want to answer questions that span several of them at once. So we also turned to agent projects like AutoGPT and BabyAGI.
The work began well enough; we were able to chain together a few prompts and seemed to be on the right track. Then we hit a wall: the agent had become too complex to manage. We were spending far too much time setting debug breakpoints and trying to figure out where things had gone wrong, and every new prompt layer added complexity that made the program harder to follow and reason about. I was about to shut down the project, resigned to this being one of those promising ideas that we couldn’t bring to life.
And that’s when Andy Brenneke, one of our engineers, came to me with something surprising. He had built a tool to visualize recursive prompt chains and used it to rebuild our agent. And suddenly, it all just clicked. The issues in reasoning became clear, the complexity of the recursive prompt chain became manageable, and we were able to build the solution in just a few weeks.
A common response to a common need
We realized that the need we had hit upon, a visualization tool built for AI agent design, was so basic that others were likely looking for the same thing. So we started asking friends and peers at other companies, and sure enough, they had the same requirements. Those conversations also made it clear that visualization alone, while helpful, was not enough.
In its rough initial form, our tool was suited to building a standalone agent, but that wasn’t the only challenge facing us and other development teams. We also needed a way to embed an agent within an existing application and connect it properly to a large base of existing code.
Our prototype also lacked other capabilities critical to modern development on large teams. Enterprise software isn’t built by one or two people working in a vacuum, but by diverse groups of engineers and experts collaborating closely to take a solution from idea all the way to production. What we had was good for following logic and flow, but little else.
To build a tool suitable for our common context and needs, we knew it would need to include the following capabilities:
- Debuggability. In building Ironclad Contract AI (yes… we’re calling our new agent CAI 🙂), the first challenge we ran into was not knowing what was going wrong in the prompt chain. Being able to connect to a remote debugger endpoint increased our iteration speed several times over. Just as critically, it meant we could build a far more complex agent than before.
- Code review and collaboration. The team working on CAI needed to be able to contribute, make changes, and review each other’s work. By storing the agent’s prompts and chain in YAML and versioning them in our repository, we could review prompts and other agent changes in GitHub exactly as we review any other code.
- Unit testing and evaluation. Adding more features is great, but you also need to be sure that you aren’t breaking something else in the process. We needed a testing framework that supported unit-level tests, which let us add new capabilities to CAI with confidence.
Meet Rivet
Meet Rivet: a new visual programming environment for building with generative AI, designed specifically for enterprise development teams.
Rivet lets you visualize complex chains of logic as you build and debug them, which is a game-changer when working with LLMs. It also supports remote debugging: you can connect to a debugging endpoint in your application (either locally or in a staging environment) and watch the chain execute in real time. You can also run test graphs remotely, which is especially useful for testing graphs that call functions within your application.
The tool is built for collaboration. Embedding Rivet in your application requires importing the “Rivet-core” package and committing the Rivet graph (a YAML file) to your repository. This keeps dependencies light and lets changes to the graph, such as prompt tweaks or model changes, go through code review like any other change. We envision “Rivet-core” being ported to other languages, but we started with TypeScript, Ironclad’s language of choice.
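This post doesn’t show Rivet-core’s actual API, so what follows is only a hypothetical TypeScript sketch of the architecture described above: a prompt graph kept as plain data (roughly what a committed YAML graph might look like once parsed) and a small runtime that executes it from application code. Every name here (`Graph`, `GraphNode`, `runGraph`, the node kinds, the `onNode` callback) is invented for illustration and is not part of Rivet-core. The model call is injected as a function, and `onNode` stands in for the kind of per-node events a remote debugger could stream in real time.

```typescript
// Hypothetical sketch, NOT Rivet-core's API: a prompt graph stored as plain data
// (as a committed YAML file might look once parsed) plus a tiny runtime for it.

type NodeId = string;

interface GraphNode {
  id: NodeId;
  kind: "input" | "prompt" | "llm" | "output";
  // "prompt" nodes: template with {{name}} placeholders filled from upstream values.
  template?: string;
  // Upstream node ids feeding this node, keyed by the name this node uses for them.
  inputs?: Record<string, NodeId>;
}

interface Graph {
  name: string;
  nodes: GraphNode[];
}

// Injected model call, so tests can swap in a deterministic stub.
type ModelFn = (prompt: string) => Promise<string>;

// Emitted after each node runs; a debugger UI could display these as they arrive.
interface NodeEvent {
  nodeId: NodeId;
  output: string;
}

async function runGraph(
  graph: Graph,
  graphInputs: Record<string, string>,
  model: ModelFn,
  onNode: (event: NodeEvent) => void = () => {},
): Promise<Record<NodeId, string>> {
  const byId = new Map<NodeId, GraphNode>();
  for (const node of graph.nodes) byId.set(node.id, node);
  const results = new Map<NodeId, string>();

  // Depth-first evaluation with memoization; assumes the graph has no cycles.
  const evaluate = async (id: NodeId): Promise<string> => {
    const cached = results.get(id);
    if (cached !== undefined) return cached;
    const node = byId.get(id);
    if (!node) throw new Error(`Unknown node: ${id}`);

    // Resolve every upstream value before running this node.
    const upstream: Record<string, string> = {};
    const wiring = node.inputs ?? {};
    for (const name of Object.keys(wiring)) {
      upstream[name] = await evaluate(wiring[name]);
    }

    let output: string;
    switch (node.kind) {
      case "input":
        output = graphInputs[node.id] ?? "";
        break;
      case "prompt":
        output = (node.template ?? "").replace(
          /\{\{(\w+)\}\}/g,
          (_match: string, key: string) => upstream[key] ?? "",
        );
        break;
      case "llm":
        output = await model(upstream["prompt"] ?? "");
        break;
      case "output":
        output = upstream["value"] ?? "";
        break;
      default:
        throw new Error("Unsupported node kind");
    }

    results.set(id, output);
    onNode({ nodeId: id, output });
    return output;
  };

  // Run everything reachable from the graph's output nodes.
  for (const node of graph.nodes) {
    if (node.kind === "output") await evaluate(node.id);
  }

  const out: Record<NodeId, string> = {};
  results.forEach((value, key) => {
    out[key] = value;
  });
  return out;
}
```

Injecting the model call keeps this toy runtime independent of any particular LLM client, which is also what makes a graph like this straightforward to unit-test with a deterministic stub in place of a real model.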
Let’s get to work!
At Ironclad, we believe in the power of open source and community-based innovation, so we decided to share Rivet in the hope that others would both benefit from and improve on the tool. Ultimately, we all have an interest in developing high-quality common standards, tools, and practices for AI. Wouldn’t we get far more done, and create something much more capable and enduring, if we engaged with this as a community? Why should each of us separately invest time and resources into meeting the same needs?
For several months, we’ve worked closely with teams at other companies on how to make the solution more effective. The early feedback from these development partners has been encouraging:
- Willow Servicing, a mortgage servicing platform, used Rivet to build their virtual servicing agent. “Rivet really addressed some limitations that we were hitting up against… and some we didn’t know we had,” says their CTO Teddy Coleman. “The visualization makes a big difference when working through agentic logic. But the ability to debug and collaborate across the team made a huge difference as well.”
- Attentive, an SMS marketing platform, is using Rivet on several projects. “Rivet’s visual programming environment is a game-changer,” says Todd Berman, CTO. “The visual nature of the tool, paired with how collaborative it is, allows us to create complex chains for AI agents in drastically less time than it would take in other environments. It completely opens up the black box on complex prompting – we’ve tried other graph-based prompt tools, and they don’t hold a candle to Rivet. It’s the best tool out there.”
- Sourcegraph, a code AI platform that makes it easy to read, write, and fix code, also embraced the tool. “Rivet is a super slick and compelling tool for prompt construction and LLM composition, particularly when you’re trying to combine AI with many existing tools and APIs,” says Beyang Liu, CTO. “I can see this becoming a popular tool for those working on robust and reliable AI applications.”
Too many people have contributed their insights, ideas, and help to this project to name them all! I do, however, want to send a HUGE thank you to these amazing people:
- Harrison Chase at LangChain
- Lauren Reeder and Jess Lee at Sequoia Capital
- Mike Knoop at Zapier
- Sarah Guo at Conviction
- Teddy Coleman at Willow
- Todd Berman and the Attentive team
- Domenic Donato and the AssemblyAI team
- Beyang Liu and the Sourcegraph team
- JJ Zhuang, Zain Adil, and the Instacart team
- Ankur Goyal at BrainTrust
- Caitlin Colgrove and Bryan Bischof at Hex
- Alan Pierce and Ashu Singhal at Benchling
- Joe Chrzanowski and Adam Evans at Airkit
- Dennis Xu at Mem Labs
- Curtis Liu and Joe Reeves at Amplitude
I hope this tool is useful to you and your teams. I can’t wait to see what changes and improvements the AI community can make to this tool! You can get started with Rivet here.
Ironclad is not a law firm, and this post does not constitute or contain legal advice. To evaluate the accuracy, sufficiency, or reliability of the ideas and guidance reflected here, or the applicability of these materials to your business, you should consult with a licensed attorney. Use of and access to any of the resources contained within Ironclad’s site do not create an attorney-client relationship between the user and Ironclad.