Why We Re-Implemented BPMN Multi-Instance Support in 7.3

Have you ever experienced bugs with multi-instance activities? You may choose from any of these: CAM-986, CAM-1731, CAM-2075, CAM-2338, CAM-2787, CAM-2897, CAM-3851, CAM-3925.
From the engine’s beginnings, its multi-instance implementation was more of a quick hack than a durable solution. Yet it was carried from release to release, with the pile of bugs, and of hair lost by desperate developers, growing steadily. With Camunda BPM 7.3, we have fundamentally refactored multi-instance, drying up one of the largest sources of bugs and fighting developer baldness.
This post provides insight into the engine’s execution model, two alternative ways of treating multi-instance in that model, and why we believe our recent changes have dramatically improved the situation.

On Process Execution

In order to understand the implementation of multi-instance, we have to make a quick excursion into how the process engine executes a process model. Let us consider the following process (without multi-instance):

In order to execute an instance of this process, the process engine needs two things:

Activity Model: A representation of the process model that allows reasoning about the causality of activities and other execution-relevant aspects

Execution State Model: A representation of process instance state, like tokens

For the first aspect, the process engine parses the BPMN 2.0 XML and creates a ProcessDefinition that contains representations of all the activities in the process model. This is not a loose collection of activities. Instead, it maintains the relations between activities that are required for process execution. These relations are represented either as sequence flow (internally called transitions) in case of direct causality, or as a parent-child relationship in case an activity is contained within another. The example process is represented as follows:

Activities are represented as blue boxes and may be related by a happens-before or a parent-child relationship. These relations do not suffice to represent all aspects relevant to execution, which is why activities have further properties. The most important property is the activity behavior (yellow boxes), which implements what the activity means in the BPMN diagram, such as creating a task in a user’s task list.
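To make the activity model more tangible, here is a minimal Java sketch of the blue and yellow boxes. The classes `Activity` and `ActivityBehavior`, their fields, and the example process (a hypothetical sequence start -> Write Blog Post -> end, with ids `start`, `writeBlogPost`, and `end`) are assumptions for illustration only; the engine's real classes are considerably richer.

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately simplified activity model. All class and field names are
// hypothetical; they are not Camunda's internal classes.
class ActivityModelSketch {

    // What an activity means at runtime, e.g. creating a task in a task list.
    interface ActivityBehavior {
        void execute(String executionId);
    }

    static final class Activity {
        final String id;
        final Activity parent;                              // parent-child relationship
        final List<Activity> children = new ArrayList<>();  // activities contained in this scope
        final List<Activity> outgoing = new ArrayList<>();  // transitions (sequence flow)
        ActivityBehavior behavior;                          // the "yellow box"

        Activity(String id, Activity parent) {
            this.id = id;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        // Records a happens-before relation: this activity precedes target.
        void transitionTo(Activity target) {
            outgoing.add(target);
        }
    }

    // Builds a hypothetical process: start -> writeBlogPost -> end.
    static Activity buildExampleProcess() {
        Activity process = new Activity("blogPostProcess", null);
        Activity start = new Activity("start", process);
        Activity write = new Activity("writeBlogPost", process);
        Activity end = new Activity("end", process);
        start.transitionTo(write);
        write.transitionTo(end);
        write.behavior = executionId ->
            System.out.println("create task 'Write Blog Post' for " + executionId);
        return process;
    }

    public static void main(String[] args) {
        Activity process = buildExampleProcess();
        for (Activity activity : process.children) {
            List<String> targets = new ArrayList<>();
            for (Activity target : activity.outgoing) {
                targets.add(target.id);
            }
            System.out.println(activity.id + " -> " + targets);
        }
    }
}
```

The process itself is modeled as the parent of all top-level activities, so containment and sequence flow are kept as two separate kinds of relations, just like in the diagram.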
For the execution state model, i.e. to represent which activities are currently active, the process engine has a concept called executions. An execution can be understood as something in between an activity instance (meaning that for every active activity, there is always at least one execution) and a token (meaning that executions can move from one activity to the next).
These two concepts allow us to define a simplified model of process execution:

When a process instance is started, the process engine creates an initial execution on the start event of the process model

An execution can be used to execute an activity and temporarily represent the corresponding activity instance

When an activity instance has ended, the process engine evaluates the activity’s outgoing sequence flow and executes the next activity (potentially re-using the current execution; token-like behavior)

When a scope is executed, new executions are created that execute activities contained within that scope (the current execution cannot be re-used; activity-instance-like behavior)
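The four rules can be illustrated with a toy interpreter. The sketch below assumes activities with at most one outgoing transition and no wait states, and all names (`ExecutionSketch`, `Activity`, `Execution`) are hypothetical. It shows an execution being re-used along sequence flow and a new child execution being created when a scope is entered.

```java
import java.util.ArrayList;
import java.util.List;

// Toy interpreter for the four rules above. Names are hypothetical and the
// real PVM is far more involved (wait states, parallelism, listeners, ...).
class ExecutionSketch {

    static final class Activity {
        final String id;
        final List<Activity> children = new ArrayList<>(); // non-empty => scope
        Activity next;                                     // single outgoing transition or null

        Activity(String id) {
            this.id = id;
        }

        boolean isScope() {
            return !children.isEmpty();
        }
    }

    static final class Execution {
        final int id;
        final Execution parent; // null for the process instance's execution
        Activity activity;      // the activity instance this execution currently represents

        Execution(int id, Execution parent) {
            this.id = id;
            this.parent = parent;
        }
    }

    private int nextExecutionId = 1;
    final List<String> trace = new ArrayList<>();

    // Rule 1: starting a process instance creates an initial execution on the start activity.
    void start(Activity startActivity) {
        run(new Execution(nextExecutionId++, null), startActivity);
    }

    private void run(Execution execution, Activity activity) {
        while (activity != null) {
            // Rule 2: the execution temporarily represents this activity instance.
            execution.activity = activity;
            trace.add("execution " + execution.id + " @ " + activity.id);
            if (activity.isScope()) {
                // Rule 4: a scope gets a new child execution for its contained activities.
                run(new Execution(nextExecutionId++, execution), activity.children.get(0));
            }
            // Rule 3: follow the outgoing transition, re-using the same execution.
            activity = activity.next;
        }
    }

    static List<String> demo() {
        Activity start = new Activity("start");
        Activity subProcess = new Activity("subProcess");
        Activity innerStart = new Activity("innerStart");
        Activity innerTask = new Activity("innerTask");
        Activity end = new Activity("end");
        start.next = subProcess;
        subProcess.next = end;
        subProcess.children.add(innerStart);
        subProcess.children.add(innerTask);
        innerStart.next = innerTask;

        ExecutionSketch engine = new ExecutionSketch();
        engine.start(start);
        return engine.trace;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In the resulting trace, execution 1 moves from `start` through `subProcess` to `end` (token-like), while the new execution 2 handles the activities inside the sub process (activity-instance-like).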

These tasks are implemented in the very core of the Camunda engine, also referred to as the Process Virtual Machine (PVM). Note that the second task in the above list is much more complex than it looks at first sight. In detail, the following steps need to be performed:
Before executing the actual behavior (referred to as the preparation phase in the following):

Initialize the activity instance

Create a job for asynchronous continuation (asyncBefore)

Invoke execution listeners for the activity instance’s start event

Execute the activity’s input variable mappings

If this activity defines events (e.g. a boundary event), create the necessary event subscriptions and jobs

Create a history event for the new activity instance

Executing the actual behavior (the execution phase):

Invoke the activity’s implementation of org.camunda.bpm.engine.impl.pvm.delegate.ActivityBehavior

After executing the actual behavior (the finalization phase):

Invoke execution listeners for the activity instance’s end event

Execute the activity’s output variable mappings

Delete event subscriptions and jobs created before executing the behavior

Create a history update event for the finished activity instance

Create a job for asynchronous continuation (asyncAfter)

Tear down the activity instance
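The following sketch shows, under simplifying assumptions, how these phases wrap around an activity's behavior. Only some of the steps above are modeled, as log entries, and the behavior is assumed to complete immediately; for a wait state such as a user task, the finalization phase would only run once the task is completed. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the three phases wrapped around every activity instance.
// Hypothetical names; only some of the listed steps are modeled, and the
// behavior is assumed to complete immediately (no wait state).
class PhasesSketch {

    interface ActivityBehavior {
        void execute(List<String> log);
    }

    static final class Activity {
        final String id;
        final ActivityBehavior behavior;

        Activity(String id, ActivityBehavior behavior) {
            this.id = id;
            this.behavior = behavior;
        }
    }

    // The same cross-cutting steps run for every activity instance,
    // regardless of the activity's type or behavior.
    static void executeActivity(Activity activity, List<String> log) {
        // Preparation phase
        log.add("init instance: " + activity.id);
        log.add("start listeners: " + activity.id);
        log.add("input mappings: " + activity.id);
        log.add("history start: " + activity.id);
        // Execution phase
        activity.behavior.execute(log);
        // Finalization phase
        log.add("end listeners: " + activity.id);
        log.add("output mappings: " + activity.id);
        log.add("history end: " + activity.id);
        log.add("tear down instance: " + activity.id);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        executeActivity(new Activity("writeBlogPost", l -> l.add("behavior: create user task")), log);
        log.forEach(System.out::println);
    }
}
```

Whatever the behavior does, the eight surrounding log entries stay the same, which is what makes these concerns cross-cutting.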

All of these concerns are cross-cutting. Regardless of the type and behavior of an activity, they need to be executed for every single activity instance. The PVM implements these concerns in a mostly solid and clean way. Speaking in terms of the activity model diagram above, the PVM is designed to execute these aspects whenever the blue boxes are instantiated.
With multi-instance the game becomes a little more complicated.

Multi-Instance in the PVM Model

There are different understandings of how multi-instance fits into the PVM’s execution model. From 7.2 to 7.3, we revised our understanding and re-implemented multi-instance based on a different view. The two concepts are:

Pre 7.3: Multi-instance is an aspect of the activity’s behavior

7.3: Multi-instance is represented by a dedicated scope in the activity model, like an embedded sub process

For explanation, let’s consider a slightly changed process model where the activity Write Blog Post is now a parallel multi-instance activity:

blog post process with parallel multi-instance activity

Pre 7.3 Multi-Instance

In Camunda versions prior to 7.3, multi-instance is understood and implemented as an aspect of an activity’s ActivityBehavior. This means that the actual ActivityBehavior (e.g., the behavior of invoking a web service in case of a service task) is wrapped in a multi-instance-specific behavior. In the activity model, this looks as follows:

activity model with a multi-instance-specific behavior

When this behavior is executed, it creates as many activity instances (= executions) as are configured in the multi-instance loop characteristics and triggers them to execute the wrapped behavior.
However, this solution does not fit well with the PVM’s execution model: As mentioned above, the execution of an activity instance is divided into (1) preparation, (2) execution, and (3) finalization phases and therefore spans much more than the invocation of the activity behavior. Let us consider what happens when executing an instance of Write Blog Post with this model:

1. An execution (token) encounters the activity Write Blog Post; the PVM executes the preparation phase in the context of that execution and creates a new activity instance

2. The PVM executes the execution phase and accordingly the multi-instance activity behavior

3. The multi-instance activity behavior has to evaluate how many instances are configured and generate that many additional executions

4. The multi-instance activity behavior has to perform the user task activity behavior in the context of these executions

5. The multi-instance activity behavior must join these executions when they have finished and trigger process continuation when the last one has finished

6. The PVM executes the finalization phase in the context of the execution leaving the activity

The problem with this sequence lies in steps 3 to 5. Aspects that are part of the preparation and finalization phases, and thus covered by the PVM, must now be performed by the multi-instance activity behavior. For example, it must ensure that execution listeners are invoked for each of the configured instances. This is not as trivial as it sounds, since the listeners for the first instance have already been invoked during the regular preparation phase in step 1. It is as if the PVM regularly executes a single instance of the multi-instance activity and leaves it to the activity behavior to realize:

Wait a second. This is multi-instance. I should create some more instances.

Similar to the issue with listeners, there are problems with each of the aspects executed in the preparation/finalization phases, resulting in a lot of code duplication and a lively source of bugs.
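A minimal sketch can illustrate why the pre-7.3 approach bred bugs. It is an assumption-laden simplification: instances run synchronously, only start and end listeners are modeled, and all names (`Pre73Sketch`, `MultiInstanceBehavior`, `pvmExecute`) are hypothetical rather than taken from the 7.2 code base. The wrapper behavior duplicates the start listener for the additional instances but, like any hand-written copy of PVM logic, can easily miss a concern; here, it never fires end listeners for the inner instances.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the pre-7.3 approach: multi-instance as a behavior that wraps
// the real behavior. Hypothetical names; synchronous and heavily simplified.
class Pre73Sketch {

    interface ActivityBehavior {
        void execute(String executionId, List<String> log);
    }

    // The PVM runs preparation and finalization once for the activity it executes.
    static void pvmExecute(String activityId, String executionId,
                           ActivityBehavior behavior, List<String> log) {
        log.add("start listener " + activityId + " on " + executionId);
        behavior.execute(executionId, log);
        log.add("end listener " + activityId + " on " + executionId);
    }

    // Decorator: creates the instances itself and runs the wrapped behavior in them.
    static final class MultiInstanceBehavior implements ActivityBehavior {
        private final String activityId;
        private final int nrOfInstances;
        private final ActivityBehavior inner;

        MultiInstanceBehavior(String activityId, int nrOfInstances, ActivityBehavior inner) {
            this.activityId = activityId;
            this.nrOfInstances = nrOfInstances;
            this.inner = inner;
        }

        @Override
        public void execute(String executionId, List<String> log) {
            for (int i = 1; i <= nrOfInstances; i++) {
                String childExecution = executionId + "." + i;
                // The PVM already fired the start listener once (on the parent execution),
                // so the wrapper re-implements it for the additional instances only.
                if (i > 1) {
                    log.add("start listener " + activityId + " on " + childExecution);
                }
                inner.execute(childExecution, log);
                // End listeners, mappings, history, jobs, ... would all need the same
                // duplicated treatment; this sketch deliberately omits them.
            }
        }
    }

    static List<String> demo(int nrOfInstances) {
        List<String> log = new ArrayList<>();
        ActivityBehavior userTask = (executionId, l) -> l.add("create task on " + executionId);
        pvmExecute("writeBlogPost", "e1",
                   new MultiInstanceBehavior("writeBlogPost", nrOfInstances, userTask), log);
        return log;
    }

    static long count(List<String> log, String prefix) {
        return log.stream().filter(line -> line.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        List<String> log = demo(3);
        log.forEach(System.out::println);
        System.out.println(count(log, "create task") + " tasks, "
            + count(log, "start listener") + " start listeners, "
            + count(log, "end listener") + " end listeners");
    }
}
```

For three instances, `demo(3)` produces three tasks and three start listener entries (the first on the parent execution `e1`, the others on child executions) but only a single end listener entry.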

Multi-Instance in 7.3

In Camunda 7.3, we fundamentally changed the notion of multi-instance in the core engine. Our change is based on the concept of a multi-instance body. A multi-instance body is a scope that contains the actual activity for which multi-instance is configured (referred to as the inner activity in the following).

Representing the body explicitly as a scope in the activity model is a convenient way of leveraging the PVM’s execution model of preparation, execution, and finalization phases for multi-instance. When an activity instance of the body is executed in the context of an execution, the multi-instance activity behavior now only creates executions as configured in the loop characteristics and then tells the PVM to execute the inner activity as often as needed. All activity instances, both the instance of the body and the instances of the inner activity, are now handled by the core PVM.
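The same scenario under the 7.3 model can be sketched as follows, again with hypothetical names, synchronous execution, and only listeners standing in for the cross-cutting concerns. The body id follows the `#multiInstanceBody` suffix that, to our knowledge, Camunda uses for body activities. The body's behavior contains no listener code at all; it merely asks the PVM to execute the inner activity once per instance.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the 7.3 approach: the multi-instance body is a scope activity of
// its own. Hypothetical names; synchronous and heavily simplified.
class Body73Sketch {

    interface ActivityBehavior {
        void execute(Pvm pvm, Activity activity, String executionId);
    }

    static final class Activity {
        final String id;
        final ActivityBehavior behavior;
        final Activity inner; // only set for a multi-instance body

        Activity(String id, ActivityBehavior behavior, Activity inner) {
            this.id = id;
            this.behavior = behavior;
            this.inner = inner;
        }
    }

    static final class Pvm {
        final List<String> log = new ArrayList<>();

        // Identical phases for every activity instance, whether body or inner activity.
        void execute(Activity activity, String executionId) {
            log.add("start listener " + activity.id + " on " + executionId);
            activity.behavior.execute(this, activity, executionId);
            log.add("end listener " + activity.id + " on " + executionId);
        }
    }

    // The body's behavior only creates child executions and hands the
    // inner activity back to the PVM; it implements no cross-cutting concern.
    static ActivityBehavior multiInstanceBody(int nrOfInstances) {
        return (engine, body, executionId) -> {
            for (int i = 1; i <= nrOfInstances; i++) {
                engine.execute(body.inner, executionId + "." + i);
            }
        };
    }

    static List<String> demo(int nrOfInstances) {
        Activity writeBlogPost = new Activity("writeBlogPost",
            (engine, activity, executionId) -> engine.log.add("create task on " + executionId), null);
        Activity body = new Activity("writeBlogPost#multiInstanceBody",
            multiInstanceBody(nrOfInstances), writeBlogPost);
        Pvm pvm = new Pvm();
        pvm.execute(body, "e1");
        return pvm.log;
    }

    static long count(List<String> log, String prefix) {
        return log.stream().filter(line -> line.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        demo(3).forEach(System.out::println);
    }
}
```

With three instances, the log shows one start/end listener pair for the body and one pair for each of the three inner instances, all emitted by the same `Pvm.execute` method.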
As a side note: The multi-instance body is not something we have made up ourselves. The BPMN 2.0 specification mentions it in exactly one line (Section 10.4.7, page 281):

BPMN has the following model elements with scope characteristics:

Choreography
Pool
Sub-Process
Task
Activity
Multi-instances body

Scopes are used to define the semantics of:

Visibility of Data Objects (including DataInput and DataOutput)
Event resolution
Starting/stopping of token execution

So the BPMN specification does foresee that multi-instance activities need an extra scope in which events or variables can be defined. Sadly, though, it does not define the concept of a multi-instance body any further. Whether it is meant to be an actual activity (with a proper instance lifecycle) is left to the reader’s imagination.

What do we gain?

Apart from improved code quality and maintainability, treating multi-instance body and inner activity as two separate things allows us to differentiate between them when executing any of the cross-cutting concerns of activity execution. To be more precise:
Activity instances: Have a look at the following process instance as shown in Cockpit:

In the activity instance tree, it is now possible to represent the multi-instance body and relate single instances of the inner activity to instances of the body. It looks as follows in Camunda 7.3:

The following shows the same process state in Camunda 7.2:

In 7.2 and earlier, it is impossible to tell if both instances of MI Subprocess belong to one multi-instance activity instance with two inner instances or to two multi-instance activity instances with one inner instance each.
History: Similar to the previous point, the multi-instance body is now logged in the process engine history, including its start time, end time, and duration. This makes it easy to determine how long all instances have taken together.
Process Instance Modification: The re-implementation made it possible to modify active multi-instance activities with process instance modification, our new 7.3 feature. We had literally no idea how to build this with the pre-7.3 multi-instance concept.
Asynchronous Continuation: It is going to be possible to make either the multi-instance body or the inner activity asynchronous. Asynchronous continuation already works for the body but is not yet implemented for the inner activity. The latter is a useful addition in cases of true parallelism, since synchronization of the inner instances can then be performed asynchronously after their actual work is done.
Explaining Multi-Instance: Summing up the previous points, it is now much easier to relate execution aspects to either the multi-instance body or the inner activity instances. Multi-instance and its behavior in Camunda can now be more easily communicated to and understood by both users and developers.
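The difference in the activity instance tree from the first point above can be sketched with a toy tree. The ids `process` and `miSubprocess` are hypothetical, and `ActivityInstanceTreeSketch` is not Camunda's API; in the engine, the real tree is returned by `RuntimeService#getActivityInstance(processInstanceId)` and navigated via `ActivityInstance#getChildActivityInstances()`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy activity instance tree; not Camunda's ActivityInstance API.
class ActivityInstanceTreeSketch {

    static final class ActivityInstance {
        final String activityId;
        final List<ActivityInstance> children = new ArrayList<>();

        ActivityInstance(String activityId) {
            this.activityId = activityId;
        }

        ActivityInstance add(ActivityInstance child) {
            children.add(child);
            return this;
        }
    }

    // Renders one line per activity instance, indented by nesting depth.
    static List<String> render(ActivityInstance root) {
        List<String> lines = new ArrayList<>();
        render(root, "", lines);
        return lines;
    }

    private static void render(ActivityInstance instance, String indent, List<String> lines) {
        lines.add(indent + instance.activityId);
        for (ActivityInstance child : instance.children) {
            render(child, indent + "  ", lines);
        }
    }

    // 7.3 style: both inner instances hang below one body instance.
    static ActivityInstance withBody() {
        return new ActivityInstance("process")
            .add(new ActivityInstance("miSubprocess#multiInstanceBody")
                .add(new ActivityInstance("miSubprocess"))
                .add(new ActivityInstance("miSubprocess")));
    }

    // 7.2 style: the inner instances are siblings; grouping information is lost.
    static ActivityInstance withoutBody() {
        return new ActivityInstance("process")
            .add(new ActivityInstance("miSubprocess"))
            .add(new ActivityInstance("miSubprocess"));
    }

    public static void main(String[] args) {
        render(withBody()).forEach(System.out::println);
        System.out.println("---");
        render(withoutBody()).forEach(System.out::println);
    }
}
```

In the flat 7.2-style rendering, the two `miSubprocess` lines are indistinguishable siblings, so nothing tells whether they belong to one multi-instance activity instance or to two.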
