
Key Principles in Designing Agents for Production – Part III: Design Principles for Agent Tooling

Understand exactly how to design and implement an agent's tools for a full picture of Camunda agentic design.

By Niall Deehan

When building an agent in Camunda, the LLM configuration is intended to be independent of the configuration of individual tools: there should be no configuration-level dependency between the LLM and any specific tool. The reason is that at run time the engine itself gathers all the tools that have been added to the scope of the agent and passes them along to the LLM.

The idea behind this is that the LLM's own configuration may have something like, "Make sure you communicate any status change to the customer," while the tool set available could have one that says, "Use this tool in order to send an email to a customer," and another that says, "Use this tool to send a text message to the customer." This means that we're putting the agent in the position where it needs to use its reasoning in order to work out which tool would be best to use given the current context.

So when configuring tools themselves, it should always be enough to simply explain what the tool really does rather than explain anything about why the tool should be used.
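To make the separation concrete, here is a minimal Python sketch of the idea, not Camunda's actual implementation. The names Tool and build_llm_request are hypothetical and exist only for illustration; the prompt and tool descriptions are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical stand-in for a tool placed in the agent's scope."""
    name: str
    documentation: str  # describes what the tool does, never why to use it


def build_llm_request(system_prompt, tools):
    """Combine the system prompt with whatever tools are currently in scope.

    Illustrative only: this models the engine gathering the tools at run
    time, so neither side needs to reference the other in its configuration.
    """
    return {
        "system": system_prompt,
        "tools": [{"name": t.name, "description": t.documentation} for t in tools],
    }


SYSTEM_PROMPT = "Make sure you communicate any status change to the customer."
TOOLS = [
    Tool("send_email", "Use this tool in order to send an email to a customer."),
    Tool("send_text_message", "Use this tool to send a text message to the customer."),
]

request = build_llm_request(SYSTEM_PROMPT, TOOLS)
```

In this sketch, adding or removing a tool changes only the "tools" list, and rewording the system prompt changes only the "system" entry, which is the independence the principle asks for.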

What is a tool in Camunda's context?

When building an agent in Camunda, you might assume that an agent tool is always a single BPMN task. While that is sometimes the case, a single task is only one of the options available as agent tools. A tool can also be a series of tasks, a subprocess, or even an event to be triggered or waited for. The options within an agent's tool set are vast because it's an implementation of BPMN, and specifically because the Zeebe engine, rather than the LLM itself, does the work of executing the tools.

The illustration below showcases some of the options that you should be aware of. (Note: the dash-dot line here only illustrates the scope of a given tool; it's not required for the agent to run.)

Overview of tool options available within a Camunda agent's ad hoc subprocess

Configuring the tools

Any given tool, when added to the ad hoc subprocess (and therefore the agent's context), needs up to three different configurations. That's true no matter what the tool does or how the tool is built. Those three configurations are as follows:

Element Documentation: This could be considered an addition to the system prompt, but instead of defining concepts at a high level, it's a low-level natural language description of what the tool is used for and any pertinent requirements or restrictions for its use.

It can be as simple as "Use this tool in order to send a communication to the client," or depending on the requirements could also include prerequisites: "Use this tool in order to send an email to the client. Ensure the customer data has been validated before using this tool and all communication procedures have been read and understood." The key here is that the element documentation should be limited in scope to information that could theoretically exist in an agent without needing to understand that agent's overall goal. That way, if an improvement or update is required in the system prompt, the tool and its description shouldn't need to be updated. Try not to have any high-level requirements embedded here; those should always be maintained in the system prompt.

Agent Input: The vast majority of tools in the agent's context will require some kind of input when called. In some cases, a tool will require the agent to supply specific data itself (e.g., search criteria). This means that the agent has found some data, either in the user prompt or in the output of another tool, and is recontextualizing it as input for this tool. Before I get into more detail about how an agent should create this data, always ask whether the input needs to come from the agent at all.

Because we're talking about a process here, there are two buckets of data we need to think about: the global process scope and a subset of that, the agent context. The process instance will always have access to all data known by the process, while the agent context only knows what you decided to tell it. This means that you often don't need the agent to supply the data at all; you can read it straight from the global process scope. Let's take the example of a tool that lets you search for client data by name. If the client's name had already been added to the process previously, you should simply configure the tool to always access that variable. It means that the agent itself has one less thing to worry about (i.e., fail at) and leads to far more predictable executions within the agent context.
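The choice between the two buckets can be sketched in a few lines of Python. This is a simplified model rather than how the engine resolves variables, and resolve_tool_inputs, the "process"/"agent" labels, and the sample variables are all hypothetical.

```python
def resolve_tool_inputs(input_sources, process_variables, agent_arguments):
    """Build a tool's inputs from the process scope or from the agent.

    input_sources maps each input name to "process" (read an existing
    process variable) or "agent" (the agent must supply the value).
    """
    resolved = {}
    for name, source in input_sources.items():
        if source == "process":
            # Deterministic: the agent is never asked for this value.
            resolved[name] = process_variables[name]
        elif source == "agent":
            # Only here does the result depend on what the agent produced.
            resolved[name] = agent_arguments[name]
        else:
            raise ValueError(f"unknown input source: {source!r}")
    return resolved


# Hypothetical process data: the client's name is already known.
process_variables = {"clientName": "Ada Lovelace", "orderId": "A-17"}

inputs = resolve_tool_inputs(
    {"clientName": "process"},
    process_variables,
    agent_arguments={},
)
```

Because clientName is read from the process scope, the search tool gets the same input on every run regardless of what the agent does, which is where the extra predictability comes from.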

But once you've established that the agent is indeed required in order to populate the input of the tool, you need to understand the FEEL function fromAi(). It is a way of telling the agent how to create the variable that's needed. For example, a tool needs the name of the customer:

fromAi(toolCall.customerName, "this is the name of the customer you're searching for")

But this function can be far more useful, because it is also a great place to put safeguards, restrictions, and requirements. When an agent reads the element documentation and decides on a tool to use, it also checks that it has the required inputs. This check means you can define specific formatting instructions, either in natural language or by giving a required JSON schema that must be adhered to. You should always take advantage of this feature, because clarifying the boundaries of one or more input variables can lead to far more predictable behavior from the agent.
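As an illustration of the kind of boundary a schema can express, here is a small hand-written Python check. It is a sketch under stated assumptions: the schema contents, the violations helper, and the supported keywords (type, minLength, pattern) are hypothetical and cover only a tiny subset of JSON Schema; this is not how Camunda or the LLM performs the check.

```python
import re

# Hypothetical constraints for the customer name input.
CUSTOMER_NAME_SCHEMA = {
    "type": "string",
    "minLength": 2,
    "pattern": r"^[A-Za-z][A-Za-z .'-]*$",
}


def violations(value, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    if schema.get("type") == "string" and not isinstance(value, str):
        # Further string checks make no sense on a non-string value.
        return ["expected a string"]
    problems = []
    if len(value) < schema.get("minLength", 0):
        problems.append("shorter than minLength")
    pattern = schema.get("pattern")
    if pattern and not re.search(pattern, value):
        problems.append("does not match pattern")
    return problems


ok = violations("Ada Lovelace", CUSTOMER_NAME_SCHEMA)
wrong_type = violations(42, CUSTOMER_NAME_SCHEMA)
empty = violations("", CUSTOMER_NAME_SCHEMA)
```

The narrower the accepted shape of an input, the fewer ways the agent has to produce something the tool cannot use.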

Tool Output: It's easy to take for granted that when a tool finishes you can simply dump the result of the tool back to the LLM by putting it in a variable called "toolCallResult." What can often be forgotten are the details of how that variable is returned, because this can give you a lot of power over what the LLM actually can see.

Once a tool finishes executing, the process engine passes the value of the "toolCallResult" variable to the LLM. This gives you an opportunity to catch the tool's output and parse it so that only the pieces relevant to the LLM's goal are passed on. Passing only relevant tokens to the LLM also saves money, so even at a very basic level it's a good idea. But this isn't the only use case.

This feature is particularly useful when dealing with sensitive data that you may not want passed to the LLM. Let's say you're doing a customer search: you enter the customer's name, and what the LLM needs is just the customer number — but lo and behold, the result of the tool is actually a whole bunch of data like the address, job title, and medical records associated with that customer. Well, in cases like this you can parse the results for just the data you actually want and leave the rest either as a process variable that the LLM will never see, or decide not to store it at all.
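The filtering step might look like the following Python sketch. In a real process this mapping would live in the tool's output configuration; the function to_tool_call_result, the field names, and the sample record are hypothetical and only model the idea of an allowlist.

```python
def to_tool_call_result(raw_result, allowed_fields):
    """Keep only allowlisted fields; everything else never reaches the LLM."""
    return {key: raw_result[key] for key in allowed_fields if key in raw_result}


# Hypothetical raw output of a customer search tool.
raw_result = {
    "customerNumber": "C-1042",
    "address": "12 Example Street",
    "jobTitle": "Engineer",
    "medicalRecords": ["(sensitive)"],
}

# Only this value would be exposed to the LLM as the tool's result.
tool_call_result = to_tool_call_result(raw_result, ("customerNumber",))
```

An allowlist is used here rather than a blocklist so that any new field the underlying service starts returning stays hidden from the LLM by default.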

In the next part of this series I'll talk in more detail about patterns that can help safeguard data and add guardrails to your agent for a safer feeling when deploying.
