The idea of separating the responsibilities of an agent into a decision-maker and an executor means that you need to have some awareness at design time of how to build your agent so that these responsibilities are not blurred, and that begins with understanding what the LLM component should and shouldn't be asked to do.
As mentioned, the dynamic execution component of a Camunda Process Agent is an ad hoc subprocess. Configuration of this component is really high level and intentionally leaves out certain details in order to make for more maintainable agents. So in this particular scope, the important parts to configure are:
- The LLM you want to use
- The system prompt describing the agent's goal
- The user prompt describing the specific request
Now obviously agents need a tool set to get the job done, but details of the tools are intentionally decoupled from the high-level configuration. This means that if you've designed your agent correctly, the agent's tools can be added, removed, or replaced without needing to update the high-level configuration of the agent itself. So with that in mind, how are these three configurations intended to be written and designed?
Choosing an LLM for your agent
The choice of LLM greatly depends on what exactly you require the LLM to achieve. Camunda lets you connect to a list of LLM providers (e.g., Bedrock and OpenAI), but you can also connect to your own locally hosted provider. In the grand scheme of things, the provider isn't that important, but the specific LLM that you choose certainly is.
The choice really depends on what the agent is being asked to do. All agents you build will be required to take in some user prompt and, based on their goal, choose one or more tools from a list that can get them closer to fulfilling that goal. They may also need to find or generate the inputs for those tools, and they also need to assess the results of the tools executed. In general, this is not a complicated requirement for an LLM. Things get more complicated when you take a look at the details required by the tools and the prompt.
Some things to consider are:
- Does this agent need to process documents?
- Will the user prompt of tools request the agent to read a PDF?
- Does the agent need to recontextualize data from one format to another?
- Maybe one tool produces JSON while another needs that data in Markdown?
- Does the agent need to deal with the potential of contradictory output from tools?
- There could be multiple datasources, public and private, that need to be accessed to get full context.
- Will the agent need to hold a large context to solve the problem?
- There could be a procedure and regulations document that the agent needs to understand.
When you're able to determine the full scope of what your LLM is responsible for, you can then evaluate the available LLMs. Generally speaking, you should pick more than one acceptable option and run some A/B tests on them when the agent is complete to zero in on which is the best option for cost and accuracy.
The system prompt
The system prompt of the agent contains its fundamental understanding of how it should understand its purpose. There's a lot to consider here, but this will focus on the high-level requirements.
System prompts built for Process Agents need to be contextualized slightly differently from the ways you may have done for other tools. This is really down to that fundamental principle of having the LLM's configuration and the configuration of the tools decoupled. It brings the requirements of the system prompt up in terms of granularity. You're not going to be telling it which tools it should use or even how it should try to solve the problem (if you're able to do that, just build a deterministic process). What you're trying to do here is use the system prompt to define the overarching goals, restrictions, and requirements associated with how to be successful in solving the problem presented. In general it should contain at least the following sections.
Explain any context it would need in order to understand the world it inhabits. Include here if you need a glossary of terms for it to be understood. It's also where you can broadly explain what the scope it should operate in. This ensures that any user prompt that requests something the agent shouldn't be dealing with is ignored.
You are **Financial Report Summary Agent**, a helpful, generic agent that will be given a financial report about a company and attempt to extract all the important and pertinant information from it, enrich that understanding with data sourced locally within the bank to help generate insights about the company in question. No other document types are relevent and if they are submitted the request can be considered out of scope and can be ignored.Define the overarching goal or goals:
You have specific goals when given a document and after all goals are acheived you need to produce a final report and deliver it to an Analyst. You've succeeded when the analyist is happy with the report, if the analyist isn't completly happy they will suggest changes or ask followup clarifications. Support the analyist with these followupsDefine the requirements or limitations in achieving those goals.
Your goals are as follows:
1. You also need to make sure that **EVERY** person or entities referened in the document are validated against existing data sources. If Any person or entity that cannot be verified, make sure to note your attempts in verification and why it didn't work.
2. Extract all relevent financial information. Find the list of required formula to run for this document type. run the computation of those formula with the discovered finalcial data and show this to the user
3. Generate a high-level summary on the financial soundness of the company and persons invovled based on what you've learned.Practical advice on how to achieve the goal:
You can call the same tool multiple times by providing different input values. Don't guess any tools which were not explicitly configured. If no tool matches the request, do not try to generate an answer. If you're not able to find a good answer, return with a message stating why you're not able to.Another important and unique aspect of Process Agents is that they should not be expected to create new data—they are allowed to re-contextualize existing data, but they should not be expected to create anything from thin air. All the new data they get should come from existing tools.
Tools are provided, you should prefer them instead of guessing an answer. In fact you should never ever need to create data from thin air, only use data that has resulted in tool execution. Only ever recontextualize data.
If there's some data that you need for tool execution that you don't know you cannot use that tool.Finally and incredibly important is to help the agent understand what it should do if it's unable to succeed in its goal.
If, after utilizing the entire toolset at your dispostal you're unable to acheive one or more of these goals - make sure to give a detailed summary of what you tried to do, why it could be acheived and you suggestions for next stepsThe easiest way to understand how to achieve this is by understanding what exactly to include and what specifically should be excluded.
A system prompt can include:
- Definitions of relevant terms
- Defining things like the "user" or the "request" will help it relate them better to the goal it's trying to achieve, e.g.: "A user in this case is an internal employee who needs support."
- Clearly defined goal(s) for completing the subprocess
- The agent's primary purpose is to succeed in whatever goal it's given. But make sure that goal is defined in a way that makes sense for the overall process. There's a big difference between "Create a report on a required company" and "Have a report on a company accepted by the end user."
- How to proceed when the goal(s) cannot be achieved
- As important for the agent as how to succeed is how to fail. If you do not clearly define for the agent what it's supposed to do if its goal is not achievable for any reason, it's going to do weird, unpredictable things like search for loopholes in the system prompt or maybe just make up new success criteria.
- Clear guidelines for judgment of success or failure
- Often there are specific ways in which success can be defined. With restrictions or requirements, make sure that the agent has all the guidance needed to understand what the criteria for success really are.
- Restrictions on introduction of new data to the process
- This is simple and important. Your agent should only get new information or data from the tools provided. Clearly state that it should not generate data, but it is okay for it to manipulate existing data into new formats.
- Clear instructions that it should only use available tools
- The list of tools and their definitions are provided to the LLM by Zeebe; it should only pick from that list when deciding what to do.
A system prompt should not include:
- Any references to specific tools
- Don't say things like "Make sure to use the email tool to send the result to the client." If there is a specific guideline, be more high-level: "Make sure the result is communicated to the client using the tools provided." Doing this makes it easier to either add or replace new methods of communication without needing to touch the system prompt.
- Unnecessary granular context for the goal
- Don't say: "When sending an email, make sure to use the same language as the client." You can either be more high-level — "Communication to the client should match their language" — or you can define the communication strategy in the tool definition itself.
- Dependencies between different tools
- Don't tell the LLM that a certain tool always needs to be run after another—this can be modeled in BPMN as a specific dependency within the subprocess. It's much more reliable to model it.
Generally speaking, the system prompt should also be hard-coded rather than dynamically populated by a process variable. This prevents any kind of injection at run time, but it also makes it easier to debug when the agent does weird stuff—which will happen depending on the LLM. It's good to have some constants when dealing with dynamic execution, and the system prompt is a great thing to keep constant for all instances of a specific version.
The user prompt
The user prompt is the least complicated part of the configuration. It's the initial request that the agent needs to find a solution for, and generally speaking it's either plain text from a human or maybe a document, or perhaps both. Because Process Agents are best used in situations where deterministic execution isn't possible, the user prompt is often a little unpredictable in terms of format or content. This is not a problem for an LLM, and you just need to be aware that you don't necessarily need too much "pre-processing" of the user prompt before handing it over to the agent. That said, there can be legitimate reasons to do this; preventing the agent from wasting time and money on irrelevant requests or routing only requests from pre-cleared users are some examples. This is also the place where attempts at prompt injection can take place — if your prompt is directly exposed to the web or unverified users you should implement a prompt firewall to catch bad actors before the prompt is passed along to the agent for processing.
Configurations to consider
The LLM has some additional configurations. You should really understand all of them before going into production, but there are some pretty important ones that'll be mentioned here because they'll help make sure the agent is running both efficiently and more predictably.
Maximum tokens: The maximum number of tokens to allow in the generated response. The default value is the maximum allowed value for the model that you are using. You might want to lower this to make the agent cheaper to run, especially in scenarios where the responses from the agent never really need to be long or complex.
Temperature and Top P: Both of these settings control the creativity the agent will use in its response. It differs based on the model, but when generating for an Agentic Orchestrator you're not going to want a lot of creativity.
Include Agent Context and Agent Context: If you're interested in persisting the agent's understanding of the process it's working within the process, you should select the Include Agent Context setting. Then you need to define how it is stored. That's what the Agent Context field is for — what you enter here will be the name of the variable containing the context. Details about the LLM's context of the problem, tool execution, and conversations are stored here and are very useful when debugging and later assessing how the agent performed. By default this variable is stored as a process variable, but often these contexts can get quite big — so if you're expecting the agent's context to go over about 5 MB we suggest changing the Memory storage type from In Process to Camunda Document. Then it'll be kept in an external document store where file size limits aren't an issue.
Maximum Model Calls: This is an important guardrail for the LLM to prevent it looping infinitely while trying to solve the goal. It's generally the answer to the question, "How many model calls should it take to solve the problem?" While the default is 10, you should really consider this question for the use case and select an appropriate number. After the calls are used up, it'll stop the agent and throw an incident.
In the next post I'll go into detail about how exactly the tools needed by the agent should be designed and implemented. This should give a full picture of Camunda Agent design.
