Why we Re-Implemented BPMN Multi-Instance Support in 7.3

Have you ever experienced bugs with multi-instance activities? You may choose from any of these: CAM-986, CAM-1731, CAM-2075, CAM-2338, CAM-2787, CAM-2897, CAM-3851, CAM-3925.
From the engine’s early beginnings, its multi-instance implementation was more of a quick hack than a durable solution. Yet, it was carried from release to release, the pile of bugs and hair lost by desperate developers growing steadily. With Camunda BPM 7.3, we have fundamentally refactored multi-instance, drying up one of the largest sources of bugs and fighting developer bald-headedness.
This post provides insight into the engine’s execution model, two alternative ways of treating multi-instance in that model, and why we believe our recent changes have dramatically improved the situation.

On Process Execution

To understand the implementation of multi-instance, we have to make a quick excursion into how the process engine executes a process model. Let us consider the following process (without multi-instance):

In order to execute an instance of this process, the process engine needs two things:

  • Activity Model: A representation of the process model that allows reasoning about the causality of activities and other execution-relevant aspects
  • Execution State Model: A representation of process instance state, like tokens

For the first of these, the process engine parses the BPMN 2.0 XML and creates a ProcessDefinition that contains representations of all the activities in the process model. This is not a loose collection of activities. Instead, it maintains the relations between activities that are required for process execution. These relations are represented either as sequence flows (internally called transitions) in case of direct causality or as a parent-child relationship when one activity is contained within another. The example process is represented as follows:

Activities are represented as blue boxes and may be related by a happens-before or a parent-child relationship. These relations do not suffice to represent all aspects relevant to execution, which is why activities have further properties. The most important property is the activity behavior (yellow boxes) that implements what the activity means in the BPMN diagram, such as creating a task in a user’s task list.
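To make the two kinds of relations concrete, here is a minimal sketch of such an activity model in plain Java. It is a toy, not the engine's code: the class names (Activity, Behavior), the activity ids, and the shape of the example process (start event, one user task, end event) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy activity model; names are illustrative, not the engine's real PVM classes. */
final class ActivityModelSketch {

    /** What an activity means when it is executed (a yellow box). */
    interface Behavior {
        String describe();
    }

    /** One node of the activity model (a blue box). */
    static final class Activity {
        final String id;
        final Behavior behavior;
        final Activity parent;                              // parent-child relation (containment)
        final List<Activity> children = new ArrayList<>();
        final List<Activity> outgoing = new ArrayList<>();  // transitions (sequence flow)

        Activity(String id, Behavior behavior, Activity parent) {
            this.id = id;
            this.behavior = behavior;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        /** Adds a transition and returns the target so that calls can be chained. */
        Activity transitionTo(Activity target) {
            outgoing.add(target);
            return target;
        }
    }

    /** Builds start -> writeBlogPost -> end, all contained in one process scope (assumed shape). */
    static Activity buildExample() {
        Activity process = new Activity("process", () -> "process definition scope", null);
        Activity start = new Activity("start", () -> "none start event", process);
        Activity task = new Activity("writeBlogPost", () -> "create user task", process);
        Activity end = new Activity("end", () -> "none end event", process);
        start.transitionTo(task).transitionTo(end);
        return process;
    }

    public static void main(String[] args) {
        for (Activity activity : buildExample().children) {
            System.out.println(activity.id + " [" + activity.behavior.describe() + "], outgoing: "
                    + activity.outgoing.size());
        }
    }
}
```

The point of the sketch is only that causality (outgoing) and containment (parent, children) are stored separately from the behavior, which is the split the rest of this post relies on.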
For the execution state model, i.e. to represent which activities are currently active, the process engine has a concept called executions. An execution can be understood as something in between an activity instance (meaning that for every active activity, there is always at least one execution) and a token (meaning that executions can move from one activity to the next).
These two concepts allow us to define a simplified model of process execution:

  • When a process instance is started, the process engine creates an initial execution on the start event of the process model
  • An execution can be used to execute an activity and temporarily represent the corresponding activity instance
  • When an activity instance has ended, the process engine evaluates the activity’s outgoing sequence flow and executes the next activity (potentially re-using the current execution; token-like behavior)
  • When a scope is executed, new executions are created that execute activities contained within that scope (the current execution cannot be re-used; activity-instance-like behavior)
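The four rules above can be sketched as a tiny interpreter. Everything here is hypothetical (class names, the numeric execution ids, the single outgoing flow per activity), and wait states are ignored: activities complete immediately. The sketch only shows when an execution is re-used and when a new one is created.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy interpreter for the simplified execution rules; not the real PVM. */
final class ExecutionSketch {

    static final class Activity {
        final String id;
        final List<Activity> innerStart = new ArrayList<>(); // non-empty means: this activity is a scope
        Activity next;                                        // single outgoing sequence flow, or null

        Activity(String id) {
            this.id = id;
        }
    }

    static final class Execution {
        final int id;
        final Execution parent;

        Execution(int id, Execution parent) {
            this.id = id;
            this.parent = parent;
        }
    }

    private int nextExecutionId = 1;
    final List<String> trace = new ArrayList<>();

    /** Rule 1: starting a process instance creates an initial execution on the start event. */
    void startProcessInstance(Activity startEvent) {
        run(new Execution(nextExecutionId++, null), startEvent);
    }

    private void run(Execution execution, Activity activity) {
        while (activity != null) {
            // Rule 2: the execution temporarily represents the activity instance.
            trace.add("execution " + execution.id + " executes " + activity.id);
            // Rule 4: a scope gets new child executions for its inner activities.
            for (Activity inner : activity.innerStart) {
                run(new Execution(nextExecutionId++, execution), inner);
            }
            // Rule 3: follow the outgoing flow, re-using the same execution (token-like).
            activity = activity.next;
        }
    }

    /** start -> subProcess (contains innerTask) -> end */
    static List<String> demo() {
        Activity start = new Activity("start");
        Activity subProcess = new Activity("subProcess");
        Activity innerTask = new Activity("innerTask");
        Activity end = new Activity("end");
        start.next = subProcess;
        subProcess.innerStart.add(innerTask);
        subProcess.next = end;

        ExecutionSketch engine = new ExecutionSketch();
        engine.startProcessInstance(start);
        return engine.trace;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In the trace, execution 1 moves through start, subProcess, and end like a token, while innerTask is executed by a separate child execution.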

These tasks are implemented in the very core of the Camunda engine, also referred to as the Process Virtual Machine (PVM). Note that the second task of the above list is much more complex than it looks at first sight. In detail, the following steps need to be performed:
Before executing the actual behavior (called preparation phase in the following):

Executing the actual behavior (called execution phase):

  • Invoke the activity’s implementation of org.camunda.bpm.engine.impl.pvm.delegate.ActivityBehavior

After executing the actual behavior (called finalization phase):

  • Invoke execution listeners for the activity instance’s end event
  • Execute the activity’s output variable mappings
  • Delete event subscriptions and jobs created before executing the behavior
  • Create a history update event for the finished activity instance
  • Create a job for asynchronous continuation (asyncAfter)
  • Tear down the activity instance

All of these concerns are cross-cutting. Regardless of the type and behavior of an activity, they need to be executed for every single activity instance. The PVM implements these concerns in a mostly solid and clean way. Speaking in terms of the activity model diagram above, the PVM is designed to execute these aspects when the blue boxes are instantiated.
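As a rough sketch of this structure, the three phases can be pictured as a fixed wrapper around the behavior. The Behavior interface below is a hypothetical stand-in for the engine's ActivityBehavior, the preparation phase is reduced to a single placeholder entry (an assumption), and the finalization entries simply mirror the list above as log lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the three phases around an activity behavior; not the real PVM. */
final class ActivityPhasesSketch {

    /** Hypothetical stand-in for the engine's ActivityBehavior interface. */
    interface Behavior {
        void execute(List<String> log);
    }

    /** Runs one activity instance; the same wrapper applies to every activity type. */
    static List<String> executeActivityInstance(String activityId, Behavior behavior) {
        List<String> log = new ArrayList<>();

        // Preparation phase (placeholder; the concrete steps are not modelled here)
        log.add("prepare " + activityId);

        // Execution phase: the only part that differs between activity types
        behavior.execute(log);

        // Finalization phase, in the order listed above
        log.add("invoke end listeners for " + activityId);
        log.add("execute output variable mappings for " + activityId);
        log.add("delete event subscriptions and jobs for " + activityId);
        log.add("create history event for " + activityId);
        log.add("create asyncAfter job for " + activityId);
        log.add("tear down " + activityId);
        return log;
    }

    static List<String> demo() {
        return executeActivityInstance("writeBlogPost", log -> log.add("behavior: create user task"));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The behavior contributes exactly one entry; everything else is added by the wrapper, regardless of which behavior is plugged in.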
With multi-instance the game becomes a little more complicated.

Multi-Instance in the PVM Model

There are different understandings of how multi-instance fits into the PVM’s execution model. From 7.2 to 7.3, we have revised our understanding and re-implemented multi-instance based on a different view. The two concepts are:

  • Pre 7.3: Multi-instance is an aspect of the activity’s behavior
  • 7.3: Multi-instance is represented by a dedicated scope in the activity model, like an embedded sub process

For explanation, let’s consider a slightly changed process model where the activity Write Blog Post is now a parallel multi-instance activity:

Pre 7.3 Multi-Instance

In Camunda versions prior to 7.3, multi-instance is understood and implemented as an aspect of an activity’s ActivityBehavior. That means, the actual ActivityBehavior (e.g. the behavior of invoking a web service in case of a service task) is wrapped in a multi-instance-specific behavior. In the activity model, this looks as follows:

When this behavior is executed, it creates as many activity instances (= executions) as are configured in the multi-instance loop characteristics and triggers them to execute the wrapped behavior.
However, this solution does not fit well with the PVM’s execution model: As mentioned above, the execution of an activity instance is divided into (1) preparation, (2) execution, and (3) finalization phase and therefore spans much more than the invocation of the activity behavior. Let us consider what happens when executing an instance of Write Blog Post with this model:

  1. An execution (token) encounters the activity Write Blog Post; the PVM executes the preparation phase in the context of that execution and creates a new activity instance
  2. The PVM executes the execution phase and accordingly the multi-instance activity behavior
  3. The multi-instance activity behavior has to evaluate how many instances are configured and generate as many additional executions
  4. The multi-instance activity behavior has to perform the user task activity behavior in the context of these executions
  5. The multi-instance activity behavior must join these executions when they have finished execution and trigger process continuation when the last one has finished
  6. The PVM executes the finalization phase in the context of the execution leaving the activity

The problem with this sequence lies in steps 3 to 5. Aspects that are part of the preparation and finalization phases and thus covered by the PVM must now be performed by the multi-instance activity behavior. For example, it must ensure that execution listeners are invoked for each of the configured instances. This is not as trivial as it sounds, since the listeners for the first instance have already been invoked during the regular preparation phase in step 1. It is as if the PVM regularly executes a single instance of the multi-instance activity and leaves it to the activity behavior to realize:

Wait a second. This is multi-instance. I should create some more instances.

Similar to the issue with listeners, there are problems with each of the aspects executed in the preparation/finalization phases, resulting in a lot of code duplication and a lively source of bugs.
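The listener problem can be reproduced in a few lines. The following is a deliberately simplified model, not the pre-7.3 engine code: all names are hypothetical, only listeners are modelled, and the instances run one after another instead of in parallel. It shows how a wrapping behavior ends up re-implementing work that the PVM has already done once.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of multi-instance as a wrapping behavior (pre-7.3 idea); not engine code. */
final class LegacyMultiInstanceSketch {

    interface Behavior {
        void execute(String activityId, List<String> log);
    }

    /** Stand-in for the PVM: prepares, executes, and finalizes exactly one activity instance. */
    static List<String> pvmExecute(String activityId, Behavior behavior) {
        List<String> log = new ArrayList<>();
        log.add("PVM: start listeners " + activityId);
        behavior.execute(activityId, log);
        log.add("PVM: end listeners " + activityId);
        return log;
    }

    /** Wraps the actual behavior and has to compensate for the PVM knowing only one instance. */
    static Behavior multiInstance(int instances, Behavior wrapped) {
        return (activityId, log) -> {
            for (int i = 0; i < instances; i++) {
                if (i > 0) {
                    // Duplicated concern: the PVM fired the start listeners for the first instance only.
                    log.add("MI behavior: start listeners " + activityId);
                }
                wrapped.execute(activityId, log);
                if (i < instances - 1) {
                    // Duplicated concern: the PVM will fire the end listeners for the last instance only.
                    log.add("MI behavior: end listeners " + activityId);
                }
            }
        };
    }

    static List<String> demo() {
        Behavior userTask = (activityId, log) -> log.add("behavior: create user task");
        return pvmExecute("writeBlogPost", multiInstance(3, userTask));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Every "MI behavior:" line in the output is a PVM concern that leaked into the behavior. In this toy there is one such concern; in the real engine the same has to be repeated for variable mappings, event subscriptions, jobs, and history.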

Multi-Instance in 7.3

In Camunda 7.3, we changed the notion of multi-instance in the core engine fundamentally. Our change is based on the notion of a multi-instance body. A multi-instance body is a scope that contains the actual activity for which multi-instance is configured (in the following referred to as the inner activity).

Representing the body explicitly as a scope in the activity model is a convenient way of leveraging the PVM’s execution model of preparation, execution, and finalization phases for multi-instance. When an activity instance of the body is executed in the context of an execution, the multi-instance activity behavior now only creates executions as configured in the loop characteristics and then tells the PVM to execute the inner activity as often as needed. All activity instances, the instance of the body and the instances of the inner activity, are now handled by the core PVM.
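The same toy setup illustrates the idea of the body as a scope. Again, this is a sketch under assumptions rather than the 7.3 implementation: the names are hypothetical, the "#multiInstanceBody" suffix is merely the naming used in this sketch, instances run sequentially, and the phases are collapsed into "prepare" and "finalize" without modelling which concrete concerns apply to the body and which to the inner activity.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of multi-instance as a dedicated body scope (7.3 idea); not engine code. */
final class MultiInstanceBodySketch {

    interface Behavior {
        void execute(List<String> log);
    }

    static final class Activity {
        final String id;
        final Behavior behavior;

        Activity(String id, Behavior behavior) {
            this.id = id;
            this.behavior = behavior;
        }
    }

    /** Stand-in for the PVM: the only place where preparation and finalization happen. */
    static void pvmExecute(Activity activity, List<String> log) {
        log.add("PVM: prepare " + activity.id);
        activity.behavior.execute(log);
        log.add("PVM: finalize " + activity.id);
    }

    /** The body is an ordinary scope whose behavior asks the PVM to run the inner activity n times. */
    static Activity multiInstanceBody(Activity inner, int instances) {
        return new Activity(inner.id + "#multiInstanceBody", log -> {
            for (int i = 0; i < instances; i++) {
                pvmExecute(inner, log); // each inner instance gets the full PVM treatment
            }
        });
    }

    static List<String> demo() {
        Activity userTask = new Activity("writeBlogPost", log -> log.add("behavior: create user task"));
        List<String> log = new ArrayList<>();
        pvmExecute(multiInstanceBody(userTask, 3), log);
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Compared with the wrapping approach, the multi-instance behavior no longer contains any preparation or finalization code of its own: every such entry in the trace comes from pvmExecute, once for the body and once per inner instance.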
As a side note: The multi-instance body is not something we have made up ourselves. The BPMN 2.0 specification mentions it in exactly one line (Section 10.4.7, page 281):

BPMN has the following model elements with scope characteristics:

  • Choreography
  • Pool
  • Sub-Process
  • Task
  • Activity
  • Multi-instances body

Scopes are used to define the semantics of:

  • Visibility of Data Objects (including DataInput and DataOutput)
  • Event resolution
  • Starting/stopping of token execution

So the BPMN specification does foresee that multi-instance activities need an extra scope in which events or variables can be defined. Sadly enough though, it does not define the concept of a multi-instances body any further. Whether it is meant to be an actual activity (with a proper instance lifecycle) is left to the reader’s imagination.

What do we gain?

Apart from improved code quality and maintainability, treating multi-instance body and inner activity as two separate things allows us to differentiate between them when executing any of the cross-cutting concerns of activity execution. To be more precise:
Activity instances: Have a look at the following process instance as shown in Cockpit:

In the tree of activity instances, it is now possible to represent the multi-instance body and relate single instances of the inner activity to instances of the body. It looks as follows in Camunda 7.3:

The following shows the same process state in Camunda 7.2:

In 7.2 and earlier, it is impossible to tell if both instances of MI Subprocess belong to one multi-instance activity instance with two inner instances or to two multi-instance activity instances with one inner instance each.
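The difference between the two trees can also be written down as data. The sketch below uses a hypothetical Node class (not the engine's ActivityInstance API) and made-up labels; only the shape of the two trees follows the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy activity instance trees for the 7.3 and 7.2 representations; labels are illustrative. */
final class ActivityInstanceTreeSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... children) {
            this.name = name;
            this.children.addAll(Arrays.asList(children));
        }
    }

    /** 7.3: the body is a node of its own that groups the inner instances. */
    static Node tree73() {
        return new Node("Process Instance",
                new Node("MI Subprocess (multi-instance body)",
                        new Node("MI Subprocess"),
                        new Node("MI Subprocess")));
    }

    /** 7.2: the inner instances hang directly below the process instance. */
    static Node tree72() {
        return new Node("Process Instance",
                new Node("MI Subprocess"),
                new Node("MI Subprocess"));
    }

    /** Renders the tree with two spaces of indentation per level. */
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    private static void render(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.name).append('\n');
        for (Node child : node.children) {
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(tree73()));
        System.out.print(render(tree72()));
    }
}
```

In the flat 7.2 shape, two multi-instance activity instances with one inner instance each would produce exactly the same tree, which is the ambiguity the body node removes.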
History: Similar to the previous point, the multi-instance body is now logged in the process engine history including start time, end time, and duration. This makes it easy to determine how long all instances have taken.
Process Instance Modification: The re-implementation made it possible to modify active multi-instance activities with our new 7.3 feature. We had literally no idea how to build this with the pre-7.3 multi-instance concept.
Asynchronous Continuation: The multi-instance body can already be made asynchronous. Making the inner activity asynchronous does not work yet, but the new model makes it possible to implement. The latter is a useful addition in cases of true parallelism since synchronization of inner instances can then be performed asynchronously after their actual work is done.
Explaining Multi-Instance: Summing up the previous points: It is now much easier to relate execution aspects to either the multi-instance body or the inner activity instances. Multi-instance and its behavior in Camunda can now be communicated and understood more easily by both users and developers.
