As I recently wrote, we are entering the “age of agents.” This is an exciting time in the development of AI, as companies exploit the flexibility and power of AI agents to create novel solutions to business problems.
In the early days, AI development sometimes felt like a dark mystery, with lots of trial and error. Now, we are far enough along that some big “rules of the game” have emerged. So I asked some friends for their thoughts on the new rules of AI development.
1. Integrate target users deeply into every stage of development
“You should think a lot about your user” is hardly a new thought. But agentic AI demands that we do this with much more focus and depth.
Most of us have plenty of experience building tools. Now we find ourselves building agents that users will accept and collaborate with as virtual colleagues. It’s a distinction with vast implications. When you think about what it takes for a human being to understand, trust, and interact with an agent, the smallest details can matter profoundly. Get it wrong, and your solution can become useless.
“Don’t build a bauble,” says Bischof. “It’s easy to get distracted by all the possibilities AI offers us and end up building a beautiful and well-designed trinket that doesn’t add practical value. What precisely is the user trying to do? Why wasn’t it easy to build this capability before? If we use AI here, what should their interaction be with the solution? It’s impossible to answer these questions without engaging with early users.”
With AI agents, the devil is in the details. Seemingly minute shifts in the tone and format of output can make all the difference in generating useful results. Minor aspects of UX can either pull in or alienate users. Agentic AI projects shift the pressure on the development team from “we should have a detailed, deep understanding of our user” to “we need to get users’ direct opinions and input at every stage.”
This also demands that we reconsider the development lifecycle. We are used to treating beta feedback as a matter of refinement and bug fixes, but AI agents may need to evolve significantly, even drastically, during beta and even after general release. This is because you can’t really understand users’ needs until the solution is in use, something Krenzel emphasizes.
“The golden rule of ‘ship early and ship often’ applies more to agent development than any form of product development I’ve seen,” he says. “The industry hasn’t yet converged on familiar patterns that users can lean on; once you ship you’ll find some users who are absolutely daunted by the blank canvas of an empty conversation and conversely you’ll find other users trying to do things that are absolutely impossible. You’ll have to iterate on the user experience, guardrails, and the agent capabilities to converge towards your users’ expectations. Your intuition, alone, will not get you as far as you think.”
It is only once you have the solution out there in broad use that you really get the comprehensive input you need. There is no finish line anymore.
2. Find the balance between capability and consistency
Not all users are the same. Your agent must have the openness and range of capability to serve tech-oriented experts, as well as the guidance and predictability to be approachable to new users. It can be hard to get the balance right.
“One of the beautiful things about building products that leverage LLMs is the flexibility to work with the unexpected,” says Krenzel. “The trade-off is that this flexibility can lead to uncertainty and confusion. Is the agent actually going to do what it says it’s going to? Will it do so reliably? You need to give your users confidence. Finding the right balance between handling anything a user might want to do versus consistently performing a set of tasks will be a constant trade-off that every product will face.”
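One common way to manage this trade-off, offered here as an illustration rather than anything Krenzel prescribes, is to route recognized requests to a small set of well-tested handlers and reserve the open-ended agent as a fallback. A minimal sketch, with hypothetical handlers:

```python
# A sketch of the trade-off: recognized requests go to fixed, well-tested
# handlers; everything else falls back to the open-ended agent. All handler
# logic here is a hypothetical placeholder.

def summarize_document(text: str) -> str:
    return "summary: " + text[:200]  # constrained, heavily tested path

def open_ended_agent(text: str) -> str:
    return "agent response to: " + text  # flexible path, weaker guarantees

KNOWN_TASKS = {"summarize": summarize_document}

def handle(user_input: str) -> str:
    for keyword, handler in KNOWN_TASKS.items():
        if keyword in user_input.lower():   # stand-in for a real intent classifier
            return handler(user_input)      # consistent, predictable behavior
    return open_ended_agent(user_input)     # capable, but less predictable
```

The narrower the set of known tasks, the more consistent the product feels; the wider the fallback, the more it can do. Where to draw that line is exactly the judgment call Krenzel describes.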
Predictability is important since users will only embrace an agent if they understand and trust its results. As Bischof points out, the challenge of delivering consistent outcomes is heightened by the interconnected nature of AI, which introduces massive complexity.
“Traditional software gives the benefit of tight contracts between components, which significantly improves the ability to build reliable systems,” he says. “Machine learning systems produce much greater entropy. Doing object detection in some image pipeline? It could yield 2-2000 bounding boxes, each must be handled. So when building composable systems, your goal is to make the interfaces or types consistent while keeping the contents of each step easily iterable.”
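To make Bischof’s advice concrete, here is a sketch of a detection pipeline in which every step shares one fixed contract (a list of boxes in, a list of boxes out), so the contents of each step stay easy to swap. The types and steps are illustrative, not from Bischof:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float
    score: float = 0.0

# The one contract every step obeys: a list of boxes in, a list of boxes out.
Step = Callable[[List[Box]], List[Box]]

def drop_low_confidence(boxes: List[Box]) -> List[Box]:
    return [b for b in boxes if b.score >= 0.5]

def keep_largest(boxes: List[Box]) -> List[Box]:
    return sorted(boxes, key=lambda b: b.w * b.h, reverse=True)[:10]

def run_pipeline(boxes: List[Box], steps: List[Step]) -> List[Box]:
    for step in steps:       # steps can be reordered, replaced, or A/B tested
        boxes = step(boxes)  # the interface never changes between steps
    return boxes             # downstream code handles 0 to N boxes uniformly
```

Whether a step returns 2 boxes or 2,000, downstream code handles the same type, which is what keeps the entropy of each step from leaking into the whole system.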
It might seem like the goal is to create a fully predictable system. As Fulford noted in a recent presentation at NeurIPS 2023, however, it is critical to understand that extreme predictability comes at the cost of imagination and creativity.
“We probably don’t want models that never hallucinate, because you can think of it as the model being creative,” she points out. “We just want models that hallucinate in the right context. In some contexts, it is ok to hallucinate (for example, if you’re asking for help with creative writing or new creative ways to address a problem), while in other cases it isn’t.”
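One coarse knob for this, offered as a hedged illustration rather than anything Fulford specifically recommends, is sampling temperature: raise it where surprising output is welcome, lower it where it isn’t. A sketch using the OpenAI Python SDK (the model name and task labels are placeholders, and temperature is a blunt instrument, not a hallucination switch):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TEMPERATURE_BY_TASK = {
    "creative_writing": 1.0,  # welcome novel, surprising output
    "factual_lookup": 0.0,    # prefer conservative, repeatable output
}

def complete(task: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=TEMPERATURE_BY_TASK.get(task, 0.3),
    )
    return response.choices[0].message.content
```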
3. Build an open, flexible application that can evolve over time
Change is the only constant in AI development. Because the public LLMs that we are building on are constantly shifting and evolving, you need to build an application that will tolerate and even benefit from future updates.
“Ultimately, hosted models are evolving in-time with the API and the infrastructure capabilities,” says Bischof. “This means that there are material differences in how the models behave that correlate with things like new infrastructure. It also means that old prompts and strategies may degrade and have performance regressions when the model ‘updates.’ This means that your posture should be: ‘How can I ensure it’s easy to modify/experiment with any part of my pipeline?’”
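One practical consequence, my illustration rather than Bischof’s: keep a small regression suite of prompts and cheap behavioral checks, and run it whenever the hosted model changes, so a silent “update” surfaces as a failing check instead of a production surprise. A sketch (the complete() helper is the hypothetical one from the earlier temperature example, and the cases are illustrative only):

```python
# A minimal regression harness: fixed prompts with cheap behavioral checks,
# run whenever the hosted model changes. complete() is the hypothetical
# helper from the earlier temperature sketch.

REGRESSION_CASES = [
    ("Reply with only valid JSON: {\"status\": \"ok\"}",
     lambda out: out.strip().startswith("{")),
    ("Answer with one word, yes or no: is 7 a prime number?",
     lambda out: "yes" in out.lower()),
]

def run_regressions() -> None:
    failures = [prompt for prompt, check in REGRESSION_CASES
                if not check(complete("factual_lookup", prompt))]
    if failures:
        raise RuntimeError(f"model behavior drifted on: {failures}")
```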
Another reason flexibility is so important in AI design is prompt optimization. Developers sometimes write prompt templates directly into the code, a short-sighted approach that ties your hands later. If you do that, the only way to test how a change performs is to launch the app and try it. And as these apps grow more complex, a single user input may pass through multiple prompt templates; with no way to isolate any one of them, you will struggle to understand what is doing what.
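Even without a dedicated tool, the minimum is to get templates out of the code path. A rough sketch, with hypothetical file and key names:

```python
import json

# A sketch of externalized prompts: store templates in a versioned file,
# load by name, and render with explicit variables. The file name and
# template keys are hypothetical.

def load_templates(path: str = "prompts.json") -> dict:
    # e.g. {"summarize_v2": "Summarize {doc} in {n} bullet points."}
    with open(path) as f:
        return json.load(f)

def render(templates: dict, name: str, **variables) -> str:
    return templates[name].format(**variables)

# Each template can now be tested against sample inputs in isolation,
# without launching the full application:
templates = load_templates()
print(render(templates, "summarize_v2", doc="<document text>", n=3))
```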
Better still is a tool that helps you manage and iterate on your prompts and provides observability at a granular level. A tool like Rivet, which I wrote about recently, makes it much easier to iterate and evolve your solution without getting bogged down in complexity and confusion.
Share your own rules!
I hope that you find these principles helpful, but they are just a beginning. Things are still so new in agentic AI development, and no one has fully cracked the code on how best to design, structure, and execute these projects. We all have much to learn from each other. What has worked for you? What do you believe matters most?