One concrete scenario is worth looking at, as customers stumble upon it regularly: batch processing via BPMN with a high number of parallel activities in one process instance.
The important characteristics are:

- The activity is modeled as a parallel Multiple Instance (MI).
- The number of elements for the MI is high (> 1000).
- There are wait states or save points within the parallel branch.
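A minimal sketch of such a model in BPMN XML, assuming Camunda 7 extension attributes (the element id, the `items` collection, and the delegate bean name are illustrative, not taken from the original):

```xml
<!-- A parallel multi-instance service task: one instance per element of ${items}. -->
<bpmn:serviceTask id="processItem" name="Process item"
                  camunda:delegateExpression="${processItemDelegate}">
  <bpmn:multiInstanceLoopCharacteristics
      isSequential="false"
      camunda:collection="${items}"
      camunda:elementVariable="item" />
</bpmn:serviceTask>
```

With more than 1000 elements in `${items}`, this single activity alone produces a large execution tree.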
This scenario is supported by Camunda, but you can run into problems you need to keep a close eye on.
The basic problem is that the execution tree gets really big in this scenario. In most situations, the engine has to load the whole tree in order to do anything, even if the work happens in only one parallel path. This not only hurts performance but also adds load to the database. Turning off execution pre-fetching (available as an internal process engine configuration property) is not recommended, as it can cause other trouble. Cockpit also suffers from the huge data chunks and becomes slow.
If you add additional scopes, like the BPMN subprocess ((2)), additional executions are created. Every embedded subprocess doubles the size of the execution tree, so avoid subprocesses in this situation.
The described problems only arise if you have wait states or save points in your process model, as only then does the engine need to persist the process instance to the database. If you run through the multiple instances in one transaction, an internal optimization removes almost all runtime database update statements, so almost nothing needs to be done (except for the history).
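Whether such save points exist is largely a modeling decision: in Camunda 7, an async continuation on the activity introduces a transaction boundary. A hedged sketch, with illustrative ids and delegate names:

```xml
<!-- Synchronous: all MI instances run through in one transaction,
     so almost no runtime state is persisted along the way. -->
<bpmn:serviceTask id="processItem" name="Process item"
                  camunda:delegateExpression="${processItemDelegate}" />

<!-- camunda:asyncBefore="true" creates a save point: each instance becomes
     its own job and transaction, and the full execution tree is persisted. -->
<bpmn:serviceTask id="processItem" name="Process item"
                  camunda:asyncBefore="true"
                  camunda:delegateExpression="${processItemDelegate}" />
```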
There is one very specific scenario you need to avoid. When a parallel activity finishes and you want to collect its result in a list, you might use a process variable storing that list ((4)). When running many instances in parallel, several might finish at the same time and try to change that process variable simultaneously, leading to optimistic lock exceptions. These typically lead to retries. Even though this situation can heal itself, it increases the load on the database. Now assume you serialize that list as reasonably big XML (growing to several megabytes) in the process variables. Camunda then sends this chunk of data to the database in every transaction, and might even lose the commit because of the optimistic lock. That situation fuels itself: commit times increase with the big chunks of data, so more parallel activities finish within that time frame, leading to even more optimistic lock exceptions.
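The read-modify-write race can be simulated with plain Java, without any Camunda dependency: a versioned holder stands in for the shared process variable, and a failed compare-and-set plays the role of the `OptimisticLockingException` that forces a retry. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticLockDemo {
    // Versioned holder mimicking a process variable row guarded by optimistic locking.
    record Versioned(int version, List<String> results) {}

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Versioned> variable =
                new AtomicReference<>(new Versioned(0, List.of()));
        AtomicInteger retries = new AtomicInteger();

        int parallelInstances = 100;
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < parallelInstances; i++) {
            final String result = "result-" + i;
            Thread t = new Thread(() -> {
                while (true) {
                    Versioned current = variable.get();            // read variable + version
                    List<String> updated = new ArrayList<>(current.results());
                    updated.add(result);                           // append this instance's result
                    Versioned next = new Versioned(current.version() + 1, updated);
                    if (variable.compareAndSet(current, next)) {   // commit only if unchanged
                        break;
                    }
                    retries.incrementAndGet();                     // "optimistic lock" -> retry
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();

        System.out.println("results collected: " + variable.get().results().size());
        System.out.println("retries caused by write conflicts: " + retries.get());
    }
}
```

All 100 results eventually arrive, but only through retries; in the real engine every retry additionally re-sends the (potentially multi-megabyte) variable payload to the database.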
In this situation, the best approach is not to collect any results, at least not in Camunda itself. You can still leverage a simple database table into which every instance inserts a new row with its result. This avoids the locking problem and is very simple to set up.
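The "one row per instance" idea can be sketched with a plain-Java stand-in for that table (in a real setup this would be an `INSERT` into your own table, e.g. via JDBC; the map and all names below are illustrative): because every instance writes only its own row, there is no shared state to conflict on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class PerRowResults {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a result table: one entry ("row") per finished MI instance.
        ConcurrentHashMap<String, String> resultTable = new ConcurrentHashMap<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final String itemId = "item-" + i;
            // Each instance inserts its own row keyed by its item id -> no write conflicts.
            Thread t = new Thread(() -> resultTable.put(itemId, "result for " + itemId));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();

        System.out.println("rows written: " + resultTable.size());
    }
}
```

Unlike the shared-list variable, no instance ever has to retry here, regardless of how many finish at the same moment.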
In any case, the situation improves if you don’t wait for the parallel processing to finish, which avoids much of the problem described here. You can also use workarounds like polling for all subprocesses to finish. Obviously, this is not only harder to understand from a business perspective but also requires more effort to develop, so it should only be used if you run into serious performance trouble.