“A Moonshot Effort”: How We Built Contract AI

September 19, 2023

Here’s something I love about this new era of generative AI: anything feels possible. The technology is still so new, but there’s a sense of almost unlimited potential.

That sense of possibility exists for just about every industry and every company, but especially for legal. After all, so much of legal work is tied up in reading, writing, and reasoning. And now, we suddenly have computers that can read, write, and reason… and they can do it in human language. So of course this is an incredibly exciting technology in our space.

At Ironclad, we have long anticipated the rise of AI. Back when we started the company in 2015, Jason Boehmig and I knew that realizing our full vision of transforming the way companies create, share, and leverage contracts would require a breakthrough in machine intelligence. (Our first web domain was even Ironclad.ai!) Many of our customers have huge, complex repositories of contracts, often with differing structures, taxonomies, and formats. It was clear that AI would have to be a foundational piece of any platform that could manage all of that data and extract insight from it.

Now, with the launch of Contract AI (CAI), the first-ever AI agent capable of complex contract analysis, we are taking a huge step forward in fulfilling that vision.

It was a hard journey to get here, and I thought that sharing what we learned might be useful to others facing similar challenges.

“A moonshot effort”

In early 2022, as generative AI really began to emerge, we got to work on what would become AI Assist. This was a set of AI-powered tools aimed at helping customers take drudgery and manual work out of contract negotiations and adjustments. This was a great beginning, but we quickly realized that AI was capable of far more.

We knew that it could, in principle, act as an intelligent agent that could read hundreds of contracts at a time, using data visualization, search, and analysis capabilities to spot opportunities or threats. In theory, this agent could act less as a piece of software executing a carefully defined task and more as a reasoning colleague with the ability to perform open-ended exploration and assessment.

In our discussions, we thought of this agent as a “talented intern”: capable and motivated, bringing knowledge, logic, and imagination to the task. Just like any intern, it would lack perspective and context, but it could serve as a virtual partner to humans, extending the reach and depth of their analysis dramatically.

The potential value of such a capability was clearly high, but so was the challenge of building it. To be effective, the agent would need to strategize and reason, and to use tools intelligently and autonomously. It would also need to be accurate, which meant we had to find a way to address hallucination.

Like so many others, I was excited by the potential of AutoGPT when it first came out. The idea of self-managing AI suggested so many promising new directions and possibilities. Then, once it became clear that no one could really build anything effective on AutoGPT, the concept of self-managing and self-generating AI seemed to fade with it.

Except we still felt the idea had potential, and with CAI we were essentially revisiting it. Could we harness the power of self-managing AI within the parameters of our customers’ needs? And could we deliver the level of precision required for its output to be useful and reliable?

The project quickly grew in every dimension, requiring more resources and more people. It felt like a moonshot effort, and we were certainly not sure we could pull it off… but we were excited about what it could become.

A long and winding path

Once we started working on the project, the obstacles multiplied. We would work our way through one technical challenge, only to stumble onto another. In the end, we confronted three primary areas of difficulty.

1. Stretching the limits of self-managing generative AI

Because of AutoGPT, many people concluded that LLMs were not that good at coming up with novel things on their own. Such agents seemed to hit a limit on both the number of tools they could use and the number of steps they could take. According to conventional wisdom, once you went beyond a single tool and a single step, reliability and performance dropped dramatically.
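A back-of-the-envelope model shows why that belief was plausible. Assume, purely for illustration (these are not numbers from our experiments), that each step of an agent succeeds independently with probability p. An n-step run then succeeds only if every step does:

```latex
P(\text{run succeeds}) = \prod_{i=1}^{n} p = p^{n},
\qquad p = 0.95,\; n = 10 \;\Rightarrow\; 0.95^{10} \approx 0.60
```

Independence is a simplifying assumption. In a feedback loop, one bad step can also corrupt the steps that follow it, so real-world degradation can be steeper than this model suggests.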

In early experimentation, we kept hitting a “hallucination wall.” The AI agent we were building was essentially an LLM feedback loop, feeding the output of one LLM call into the next in a circular fashion. Because feedback loops amplify whatever passes through them, any hallucination was amplified too. We needed to find a way to break the cycle.

In a sense, we needed the AI to recognize when the ideas it came up with were good, and when they were not. It needed to be able to filter out the “bad” ideas at each stage. We borrowed ideas from Google Research’s ReAct paper, and they helped us find the right balance between diversity and stabilization. If our settings allowed too much diversity, the agent tipped over into hallucination; too little, and it lost power and effectiveness.
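The idea of filtering out bad ideas at each stage can be sketched as a ReAct-style loop (thought, then action, then observation) with an extra filter between proposing a step and acting on it. This is a minimal, hypothetical sketch, not Ironclad’s implementation: `propose` stands in for sampling several candidate steps from an LLM, `temperature` is an assumed stand-in for the diversity setting, and the contract store and `read` tool are toy placeholders.

```python
import random
from dataclasses import dataclass

# Hypothetical toy contract store standing in for a real repository.
CONTRACTS = {
    "acme-msa": "Term: 3 years. Auto-renews annually.",
    "globex-nda": "Term: 2 years. No auto-renewal.",
}

# Tools the agent may call: name -> function(arg) -> observation.
TOOLS = {"read": lambda name: CONTRACTS[name]}


@dataclass
class Step:
    thought: str
    tool: str
    arg: str


def propose(pending: list[str], rng: random.Random, n: int, temperature: float) -> list[Step]:
    """Hypothetical stand-in for sampling n candidate next steps from an LLM.

    With probability `temperature`, a candidate is "creative" but ungrounded:
    it calls a tool that does not exist or names a contract that is not there.
    """
    if not pending:
        return [Step("Every contract has been read.", "finish", "")]
    candidates = []
    for _ in range(n):
        if rng.random() < temperature:
            candidates.append(Step("Maybe there is a SOW too?", rng.choice(["lookup", "read"]), "initech-sow"))
        else:
            candidates.append(Step(f"Next, read {pending[0]}.", "read", pending[0]))
    return candidates


def is_grounded(step: Step) -> bool:
    """Filter stage: keep only steps that use a known tool on a known contract."""
    if step.tool == "finish":
        return True
    return step.tool in TOOLS and step.arg in CONTRACTS


def run_agent(targets: list[str], temperature: float, n: int = 5, max_steps: int = 10, seed: int = 0):
    """ReAct-style loop: propose (thought + action), filter, act, observe."""
    rng = random.Random(seed)
    pending = list(targets)
    trace = []    # (thought, tool, arg, observation) for every executed step
    rejected = 0  # candidates dropped by the filter
    for _ in range(max_steps):
        candidates = propose(pending, rng, n, temperature)
        kept = [c for c in candidates if is_grounded(c)]
        rejected += len(candidates) - len(kept)
        if not kept:
            continue  # nothing survived this round; resample on the next iteration
        step = kept[0]
        if step.tool == "finish":
            return trace, rejected, True
        observation = TOOLS[step.tool](step.arg)
        trace.append((step.thought, step.tool, step.arg, observation))
        pending.remove(step.arg)
    return trace, rejected, False
```

With `temperature=0.0`, `run_agent(["acme-msa", "globex-nda"], temperature=0.0)` reads both contracts and finishes with nothing rejected; with `temperature=1.0`, every candidate is ungrounded, nothing survives the filter, and the run never finishes. This toy task needs no creativity, so low diversity costs nothing here; a real agent would also lose useful exploration, which is why the balance matters.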

We had to test the agent carefully at each loop to make sure we had the balance right, but after much trial and error, we found that our agent could use multiple tools and plan and execute multiple steps while remaining reliable.

2. Overcoming the logical complexity of multi-loop agentic programming

The next big hurdle we hit lay in tooling. Agentic AI development requires designing multi-step logical loops. With every step, the complexity skyrockets, making it nearly impossible to follow the flow and troubleshoot it. After weeks spent trying to find a way around the issue, I was ready to pull the plug on the entire project.

That’s when Andy Brenneke on my team surprised us with a clutch breakthrough: a promising new prototype tool that let us visualize the logical flows. The change was dramatic; suddenly, we were able to follow and improve the agentic logic. It was a game changer for our team and a foundational piece of the project.
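Visualizing agent logic pays off because the logic can be treated as a graph whose intermediate values you can inspect. Rivet’s actual graph format and APIs are not shown here; the following is a generic, hypothetical sketch in which each node is a function of named inputs, the graph runs in dependency order using Python’s standard `graphlib`, and `to_dot` emits Graphviz DOT text so every node’s value appears next to the flow. The node names and the sample contract are invented for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class Node:
    name: str
    fn: Callable[..., object]                         # computes this node's value
    inputs: list[str] = field(default_factory=list)   # names of upstream nodes


def run_graph(nodes: list[Node], initial: dict) -> dict:
    """Evaluate every node after its inputs; `initial` supplies external inputs."""
    by_name = {n.name: n for n in nodes}
    order = TopologicalSorter({n.name: n.inputs for n in nodes}).static_order()
    values = dict(initial)
    for name in order:
        if name in values:
            continue  # external input, already provided
        node = by_name[name]
        values[name] = node.fn(*(values[i] for i in node.inputs))
    return values


def to_dot(nodes: list[Node], values: dict) -> str:
    """Render the graph, with each node's computed value in its label, as DOT text."""
    out = ["digraph flow {"]
    for node in nodes:
        shown = str(values.get(node.name, ""))[:40].replace('"', '\\"')
        out.append(f'  "{node.name}" [label="{node.name}\\n{shown}"];')
        for src in node.inputs:
            out.append(f'  "{src}" -> "{node.name}";')
    out.append("}")
    return "\n".join(out)


# Hypothetical three-node flow over a single contract's text.
nodes = [
    Node("term", lambda text: text.split(".")[0], ["contract"]),
    Node("auto_renews", lambda text: "auto-renew" in text.lower(), ["contract"]),
    Node("summary", lambda term, renews: f"{term}; auto-renews={renews}", ["term", "auto_renews"]),
]
values = run_graph(nodes, {"contract": "Term: 3 years. Auto-renews annually."})
print(to_dot(nodes, values))
```

Rendering the printed text with Graphviz’s `dot` tool draws the flow with each node’s value in its label. The sketch only handles acyclic graphs; an agent loop would need either one graph run per iteration or an executor that supports cycles.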

In fact, we liked this tool so much that it took on a life of its own… We expanded its capabilities, shared it with development teams at some of our partner companies, and collaborated with them to make it better. This tool became Rivet, an exciting new open source offering that we hope will advance the state of AI tooling in our industry.

3. Getting the user experience right

If our agent was to be useful to our users and deliver real impact, the UX had to be dialed in to their needs. We were influenced by the recent Emergence paper on the topic, which offered a helpful framework for implementing AI to maximize user benefit.

The focus of our UX design was to provide a capable co-pilot and coach to the user. The goal was not to enable an unfocused exploratory chat; it was to help users get something specific done. So we tipped the agent’s balance toward being a bit more opinionated, providing specific guidance and direction. It also needed to be fast, responsive, and intuitive.

We also realized another key requirement: The agent would need to be able to “show its work.” Given the sensitive, business-critical nature of contracts and contracting data, no thoughtful user would blindly trust the conclusion of a virtual agent (or a real intern, for that matter) without being able to understand and follow its sources and analysis. So we included a “Show reasoning” option that the user could select to understand what assumptions lay behind the results.
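One way to support a “Show reasoning” option is to have the agent return its conclusion together with a structured record of each step, the source it relied on, and the assumptions it made, and to render that record only when the user asks. This is a hypothetical sketch: the class and field names are assumptions, and the example data is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningStep:
    thought: str      # why the agent took this step
    action: str       # which tool it used
    source: str       # where the evidence came from (e.g. a contract ID)
    observation: str  # what the tool returned


@dataclass
class AgentAnswer:
    conclusion: str
    steps: list[ReasoningStep] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)

    def show_reasoning(self) -> str:
        """Render the conclusion, each sourced step, and the stated assumptions."""
        lines = [f"Conclusion: {self.conclusion}"]
        for i, s in enumerate(self.steps, 1):
            lines.append(f"{i}. {s.thought} -> {s.action} [{s.source}]: {s.observation}")
        if self.assumptions:
            lines.append("Assumptions:")
            lines.extend(f"- {a}" for a in self.assumptions)
        return "\n".join(lines)


answer = AgentAnswer(
    conclusion="1 of 2 contracts auto-renews.",
    steps=[
        ReasoningStep("Check renewal terms", "read", "acme-msa", "Auto-renews annually."),
        ReasoningStep("Check renewal terms", "read", "globex-nda", "No auto-renewal."),
    ],
    assumptions=["Only contracts in the current workspace were considered."],
)
print(answer.show_reasoning())
```

Keeping the sources attached to each step, rather than reconstructing them afterward, means the displayed reasoning reflects what the agent actually did.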

A journey with no end

On behalf of my team and of all of Ironclad, I want to welcome our customers to CAI!

We are proud of all the work that has gone into this, but we understand that this is a continuing journey. Ultimately, these types of agents are only valuable if they are seen as easy to use and trustworthy. I believe we accomplished both goals, but we will monitor and fine-tune the solution over time. Most of all, we will continue to listen to our customers, who are so important in helping us make the platform better.

On a final note, I love that, after so many years of being seen as slow to adopt new technology and ideas, legal is now on the front lines. It is a very exciting time for those of us who are passionate about the development, implementation, and use of legal technology. I really believe that other industries will be looking to legal for ideas and best practices!

To be amongst the first to access the Ironclad Contract AI beta, join the waitlist.

Ironclad is not a law firm, and this post does not constitute or contain legal advice. To evaluate the accuracy, sufficiency, or reliability of the ideas and guidance reflected here, or the applicability of these materials to your business, you should consult with a licensed attorney. Use of and access to any of the resources contained within Ironclad’s site do not create an attorney-client relationship between the user and Ironclad.


Cai GoGwilt is CTO and Co-Founder of Ironclad. Before founding Ironclad, he was a software engineer at Palantir Technologies. He holds a B.S. and M.Eng. in Computer Science from the Massachusetts Institute of Technology.
