Dealing With Problems and Exceptions

Try to carefully study and fully understand the concepts of wait states (save points) acting as 'transaction borders' for technical (ACID) transactions. In case of technical failures, these transactions are rolled back by default and need to be retried either by the user or by the background job executor. It's also important to distinguish between this kind of technical reaction and a business reaction predefined in the process, which typically, but not necessarily, deals with purely business-related issues.

Understanding Transactions in Processes

Technical (ACID) Transactions

Every time we use the Camunda API to ask the process engine to do something (e.g. starting a process, completing a task, or signaling an execution), the engine will advance in the process until it reaches wait states on each active path of execution.

1 User Tasks and Receive Tasks are wait states …​
2 …​ so are all Intermediate Catching Events and …​
3 …​ the Event Based Gateway - which offers the possibility of reacting to one of multiple Intermediate Catching Events.
4 Furthermore several task types (Service, Send, Business Rule Tasks) …​
5 …​ as well as the Throwing Message Events might be implemented as External Tasks, which are then wait states, too.

At a wait state, any further process execution must wait for some trigger. Wait states will therefore always be persisted to the database: within a single database transaction, the process engine will cover the distance from one transaction boundary of persisted wait states to the next such boundary. However, you have fine-grained control over these transaction boundaries by introducing additional save points using the "async before" and "async after" attributes. A background job executor will then make sure that the process continues asynchronously.

Learn more about Transactions in Processes in general and Asynchronous Continuations in the User Guide.

Business Transactions

Sometimes when we refer to "transactions" in processes we refer to a very different concept, which must be clearly distinguished from "technical" database transactions. A business transaction marks a section in a process for which 'all or nothing' semantics similar to a technical transaction should apply, but from a pure business perspective.

1 A Transaction Subprocess marks a long running "business transaction", meaning here that in case …​
2 …​ the approval for the vacation is withdrawn at least four weeks in advance, we must not go on vacation. However …​
3 …​ we will want to cancel the hotel we already booked. With this task, which will just show up in our task list in case the approval for the vacation was withdrawn, we "roll back" the business transaction, in other words compensate what we already have done.

The borders of "business transactions" are not at all related to technical transactions. It’s really just a possibility to compensate the scope of a sub process from a business perspective.

Learn more about Transaction Subprocesses in the User Guide.
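
The idea of compensating a business transaction can be illustrated with a small plain-Java toy model. This is only a sketch and not the Camunda API: the classes BusinessTransaction and VacationExample, their methods, and the "book flight" step are hypothetical names introduced for illustration. The sketch assumes that completed steps are compensated in reverse order of completion, which is a design choice of this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a business transaction; NOT the Camunda API.
// All names are hypothetical and chosen for illustration only.
class BusinessTransaction {
    // Compensation handlers of the steps completed so far, most recent first.
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    // Runs a step and remembers how to undo it from a business perspective.
    void complete(Runnable step, Runnable compensation) {
        step.run();
        compensations.push(compensation);
    }

    // "Rolls back" the business transaction by compensating the completed
    // steps, most recently completed step first.
    void cancel() {
        while (!compensations.isEmpty()) {
            compensations.pop().run();
        }
    }
}

class VacationExample {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        BusinessTransaction vacation = new BusinessTransaction();
        vacation.complete(() -> log.add("book hotel"), () -> log.add("cancel hotel"));
        vacation.complete(() -> log.add("book flight"), () -> log.add("cancel flight"));
        // The approval is withdrawn: compensate what has already been done.
        vacation.cancel();
        return log;
    }
}
```

Note that nothing is rolled back in a database sense here: the bookings really happened and are undone by further business activities, which is exactly the difference to a technical transaction.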

Demarcating Custom Transaction Borders

Using Additional Save Points

You have fine-grained control over transaction borders by introducing additional optional "save points" on top of the obligatory "wait states". Use the asyncBefore='true' and asyncAfter='true' attributes in your process definition BPMN XML. The process state will then be persisted at these points and a background job executor will make sure that it is continued asynchronously.

1 A user task is an obligatory wait state for the process engine. After the creation of the user task, the process state will be persisted and committed to the database. The engine will wait for user interaction.
2 This service task is executed "synchronously" (by default), in other words within the same thread and the same database transaction with which a user attempts to complete the "Write tweet" user task. When we assume that this service fails in cases in which the language used is deemed to be too explicit, the database transaction rolls back and the user task will therefore remain uncompleted. The user must re-attempt, e.g. by correcting the tweet.
3 This service task is executed "asynchronously". By setting the asyncBefore='true' attribute we introduce an additional save point at which the process state will be persisted and committed to the database. A separate job executor thread will continue the process asynchronously by using a separate database transaction. In case this transaction fails the service task will be retried and eventually marked as failed - in order to be dealt with by a human operator.
Pay special attention to the consequence of these save points with regards to retrying. A retry for a job may be required if there are any failures during the transaction which follows the save point represented by the job. Depending on your subsequent transaction boundaries this may very well be much more than just the service task which you configured to be asyncBefore='true'! The process instance will always roll back to its last known save point.
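
The effect of save points on retries can be made tangible with a plain-Java toy model. This is a sketch under simplifying assumptions and not the Camunda engine: Step, SavePointDemo, and the three step names are hypothetical, a process is reduced to a linear list of steps, and a failing step fails a fixed number of times before succeeding.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of save points and rollback; NOT the Camunda engine.
// All names are hypothetical and chosen for illustration only.
class Step {
    final String name;
    final boolean savePointBefore; // models a wait state or asyncBefore='true'
    int failuresLeft;              // how often this step still throws

    Step(String name, boolean savePointBefore, int failuresLeft) {
        this.name = name;
        this.savePointBefore = savePointBefore;
        this.failuresLeft = failuresLeft;
    }
}

class SavePointDemo {
    // Runs one transaction starting at index 'from' and returns the index at
    // which the process state is persisted afterwards. Every invoked step is
    // recorded in 'attempted', including steps that are rolled back later.
    static int runTransaction(List<Step> steps, int from, List<String> attempted) {
        for (int i = from; i < steps.size(); i++) {
            Step step = steps.get(i);
            if (i > from && step.savePointBefore) {
                return i; // commit: the state is persisted at this save point
            }
            attempted.add(step.name);
            if (step.failuresLeft > 0) {
                step.failuresLeft--;
                return from; // rollback to the save point the transaction started at
            }
        }
        return steps.size(); // end of process reached, transaction committed
    }

    // Executes the whole process, retrying after each rollback.
    static List<String> demo(boolean savePointBeforeLastStep) {
        List<Step> steps = List.of(
            new Step("charge credit card", true, 0),
            new Step("send confirmation", false, 0),
            new Step("update CRM", savePointBeforeLastStep, 1));
        List<String> attempted = new ArrayList<>();
        int position = 0;
        while (position < steps.size()) {
            position = runTransaction(steps, position, attempted);
        }
        return attempted;
    }
}
```

In this model, demo(false) invokes all three steps twice, because the failure in the last step rolls the instance back to the save point before the first step. With demo(true) only the last step is repeated.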

Marking Every Service Task as Asynchronous

A typical rule of thumb, especially when doing a lot of service orchestration, is to mark every service task as asynchronous.

If you want to know more about the underlying reasons for marking service tasks as asynchronous - or not - we strongly recommend reading the next section about Knowing Typical Do’s and Don’ts for Save Points.

The downside is that the jobs slightly increase the overall resource consumption. But this is often worth it, as it has a couple of advantages for operations:

  • The process stops at the service task causing the specific error.

  • You can configure a meaningful retry strategy for every service task.

  • You can leverage the suspension features for service tasks.

Camunda Platform does not offer a direct configuration option to change the default "async" behavior for all service tasks at once. However, you can achieve that by implementing a custom ProcessEnginePlugin introducing a BpmnParseListener which adds async flags on-the-fly (possibly combined with custom BPMN extension attributes to control this behavior). Compare a full code example for a similar scenario on GitHub.

Knowing Typical Do’s and Don’ts for Save Points

Aside from a general strategy to mark service tasks as being save points, you will often want to configure typical save points.

Do Configure a SavePoint After

  • User Tasks

    This savepoint allows users to complete their tasks without waiting for expensive subsequent steps and without seeing an unexpected rollback of their user transaction to the wait state before the user task.

Sometimes, e.g. when validating user input by means of a subsequent step, you want exactly that: rolling back the user transaction to the user task wait state. In that case you might want to introduce a savepoint right after the validation step.
  • Service Tasks (or other steps) causing Non-idempotent Side Effects (Service Task, Script Task, Send Task, Message Intermediate Event, Message End Event)

    This savepoint makes sure that a side effect which must not happen more often than once is not accidentally repeated because any subsequent steps might roll back the transaction to a savepoint well before the affected step. End Events should be included if the process can be called from other processes.

  • Service Tasks (or other steps) executing Expensive Computations (Service Task, Script Task, Send Task, Message Intermediate Event, Message End Event)

    This savepoint makes sure that a computationally expensive step does not have to be repeated just because any subsequent steps might roll back the transaction to a savepoint well before the affected step. End Events should be included if the process can be called from other processes.

  • Receive Tasks (or other steps) catching external events, possibly with payload (Receive Task, Message Intermediate Event, Signal Intermediate Event)

    This savepoint makes sure that an external event like a message is persisted as soon as possible. It cannot get lost just because any subsequent steps might roll back the transaction to a savepoint well before the affected step. This applies also to External Service Tasks.

Do Configure a SavePoint Before

  • Start Events (None Start Event, Message Start Event, Signal Start Event, Timer Start Event)

    This savepoint allows the engine to immediately return a process instance object to the user thread creating it - well before anything happens in the process instance.

  • Service Tasks (or other steps) invoking Remote Systems (Service Task, Script Task, Send Task, Message Intermediate Event, Message End Event)

    This savepoint makes sure that you always transactionally separate the potentially more often failing remote calls from anything that happens before such a step. If a service call fails, you will see the process instance waiting in the corresponding service task in Cockpit.

  • Parallel Joins (Parallel Join, Inclusive Join, Multi-instance Task)

    Parallel joins synchronize separate process paths, which is why one of two path executions arriving at a parallel join at the same time will be rolled back with an optimistic locking exception and must be retried later on. Therefore such a savepoint makes sure that the path synchronization will be taken care of by Camunda’s internal job executor. Note that for multi-instance activities, there is a dedicated "multi instance asynchronous after" flag which saves every single instance of those multiple instances directly after their execution, hence still "before" their technical synchronization.

The Camunda JobExecutor works (by default) with exclusive jobs, meaning that just one exclusive job per process instance may be executed at once. Hence, job executor threads will by default not cause optimistic locking exceptions at parallel joins "just by themselves", but other threads using the Camunda API might cause them - either for themselves or also for the job executor.
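
Why one of two concurrently arriving paths must be rolled back can be sketched with a plain-Java toy model of optimistic locking. This is not Camunda code: JoinRow, ToyOptimisticLockingException, and ParallelJoinDemo are hypothetical names, the "row" is just an in-memory object with a revision counter, and the two paths are interleaved by hand so that the outcome is deterministic.

```java
// Toy model of optimistic locking at a parallel join; NOT Camunda code.
// All names are hypothetical and chosen for illustration only.
class ToyOptimisticLockingException extends RuntimeException {
    ToyOptimisticLockingException(String message) {
        super(message);
    }
}

// Models the persisted state of the join: a revision plus the arrived tokens.
class JoinRow {
    private int revision = 0;
    private int arrivedTokens = 0;

    synchronized int readRevision() {
        return revision;
    }

    // Commits "one more token arrived" only if nobody changed the row since
    // the caller read it. Otherwise the caller's transaction must roll back.
    synchronized void commitArrival(int expectedRevision) {
        if (expectedRevision != revision) {
            throw new ToyOptimisticLockingException(
                "expected revision " + expectedRevision + " but found " + revision);
        }
        arrivedTokens++;
        revision++;
    }

    synchronized int arrivedTokens() {
        return arrivedTokens;
    }
}

class ParallelJoinDemo {
    static String run() {
        JoinRow join = new JoinRow();
        int seenByA = join.readRevision();
        int seenByB = join.readRevision(); // both paths read revision 0
        join.commitArrival(seenByA);       // path A commits first
        StringBuilder log = new StringBuilder();
        try {
            join.commitArrival(seenByB);   // path B still expects revision 0
            log.append("B committed");
        } catch (ToyOptimisticLockingException e) {
            log.append("B rolled back; ");
            join.commitArrival(join.readRevision()); // retry with fresh state
            log.append("B retried");
        }
        return log + ", tokens=" + join.arrivedTokens();
    }
}
```

In the engine, the retry in the catch block corresponds to what the job executor does for a job created by a save point before the join. Without such a save point, the caller of the Camunda API would have to deal with the exception.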

Don’t Configure Save Points Before

  • User Tasks and other Wait States (User Task, Receive Task, Message Intermediate Event, Signal Intermediate Event, Timer Intermediate Event, Event Based Gateway), including steps configured as External Tasks (Service Task, Send Task, Business Rule Task, Message Intermediate Event, Message End Event)

Such savepoints just introduce overhead as wait-states such as user tasks, receive tasks, external service tasks and catching events like timer, message or signal always by definition finish the transaction and wait for external intervention anyway.

  • All Forking and Exclusively Joining Gateways (Exclusive Gateway, Parallel Join, Inclusive Join)

    There should just be no need to do that, unless execution listeners are configured at such points, which could fail and might need to be transactionally separated from other parts of the execution.

Bonus: Adding save points automatically to every model

If you agree on certain save points to be important in all your process definitions, you can add required BPMN XML attributes automatically by a Process Engine Plugin during deployment. Then you don’t have to add this configuration to each and every process definition yourself.

As a weaker alternative, the plugin could check for the existence of the correct configuration and log warnings or errors if save points are missing.

Take a look at this example for details.
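
The "check and warn" idea can also be tried outside the engine. The following sketch uses only the JDK's DOM parser to list service tasks that lack camunda:asyncBefore="true". It is an assumption-laden illustration, not the plugin referenced above: SavePointCheck and its method are hypothetical names, and the rule "every service task needs asyncBefore" is just one possible convention.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Stand-alone check of BPMN XML; hypothetical helper, not part of Camunda.
class SavePointCheck {
    static final String BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL";
    static final String CAMUNDA_NS = "http://camunda.org/schema/1.0/bpmn";

    // Returns the ids of all service tasks without camunda:asyncBefore="true".
    static List<String> serviceTasksWithoutAsyncBefore(String bpmnXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(bpmnXml.getBytes(StandardCharsets.UTF_8)));
            NodeList tasks = doc.getElementsByTagNameNS(BPMN_NS, "serviceTask");
            List<String> missing = new ArrayList<>();
            for (int i = 0; i < tasks.getLength(); i++) {
                Element task = (Element) tasks.item(i);
                // getAttributeNS returns "" when the attribute is absent.
                if (!"true".equals(task.getAttributeNS(CAMUNDA_NS, "asyncBefore"))) {
                    missing.add(task.getAttribute("id"));
                }
            }
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse BPMN XML", e);
        }
    }
}
```

Such a check could run in a build step or a unit test and fail or warn when the returned list is not empty.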

Dealing With Exceptions

Rolling Back a Transaction

It is important to understand that every non-handled, propagated exception happening during process execution rolls back the current technical transaction. Therefore the process instance will return to its last known wait state (or save point). The following image visualizes that default behavior.

1 When we ask the Camunda engine to complete a task …​
2 …​ it tries to advance the process within the borders of a technical transaction until it reaches wait states (or save points) again.
3 However, in cases where a non-handled exception occurs on the way, this transaction is rolled back and we find the user task we tried to complete to be still uncompleted.

From the perspective of a user trying to complete the task, it appears impossible to complete the task, because a subsequent service throws an exception. This can be unfortunate, and so you very well may want to introduce additional save points, e.g. here before the send task.

<sendTask id="send" name="Send invoice to customer" camunda:asyncBefore="true" camunda:class="my.SendDelegate" />

But preventing the user from completing the user task can also be just what you want. Consider e.g. the possibility to validate task form input via a subsequent service:

1 A user needs to provide data with a user task form. When trying to complete the form …​
2 …​ the subsequent synchronously executed service task finds a validation problem and throws an exception which rolls back the transaction and leaves the user task uncompleted.
Learn more about Rollback on Exceptions and the reasoning for this design in the User Guide.

Handling an Exception via the Process

As an alternative to rolling back a transaction, we can also handle an exception via the process which called the failing piece of code.

1 We decide that we want to deal with an exception in the process: in case the invoice cannot be sent automatically …​
2 …​ we assign a task to a human user, who is now in charge of taking care of delivering the invoice.
Please be aware of the following technical constraint: in case your transaction manager already marks the current transaction for rollback (as possible in Java transaction managers), handling the exception in this way is not possible as the process engine cannot commit its work in this transaction.
Learn more about the usage of Error Events in the User Guide.

Distinguishing between Exceptions and Results

As an alternative to throwing a Java exception, you can also write a problematic result into a process variable and model an XOR gateway later in the process flow to take a different path if that problem occurs.

From a business perspective the underlying problem then looks less like an error and more like a result of an activity. So as a rule of thumb, we deal with expected results of activities by means of gateways, but model exceptional errors, which prevent us from reaching the expected result, as boundary error events.

1 The task is to "check the customer’s creditworthiness", so we can reason that we expect as a result to know whether the customer is creditworthy or not.
2 We can therefore model an exclusive gateway working on that result and decide via the subsequent process flow what to do with a customer who is not creditworthy. Here we just consider the order to be declined.
3 However, it could be that we cannot reach a result, because while we are trying to obtain knowledge about the customer’s creditworthiness, we discover that the ID we have is not associated with any known real person. We can’t obtain the expected result and therefore model a boundary error event. In the example the consequence is just the same and we consider the order to be declined.
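
The distinction between a result and an error can be sketched in plain Java as follows. This is an illustration only: BusinessError is a local stand-in for a BPMN error so that the sketch compiles without the engine, and CreditCheck, the customer ids, the error code CUSTOMER_UNKNOWN, and the "order accepted" outcome are assumptions that do not appear in the process described above.

```java
import java.util.Map;

// Local stand-in for a BPMN error; hypothetical, not the Camunda class.
class BusinessError extends RuntimeException {
    final String errorCode;

    BusinessError(String errorCode) {
        super(errorCode);
        this.errorCode = errorCode;
    }
}

class CreditCheck {
    // Hypothetical customer data: customer id -> creditworthy?
    private static final Map<String, Boolean> KNOWN = Map.of("c1", true, "c2", false);

    // Both expected outcomes (creditworthy or not) are returned as a result.
    static boolean isCreditworthy(String customerId) {
        Boolean rating = KNOWN.get(customerId);
        if (rating == null) {
            // No result can be reached at all: signal an error instead.
            throw new BusinessError("CUSTOMER_UNKNOWN");
        }
        return rating;
    }

    // Mirrors the process: the conditional plays the role of the exclusive
    // gateway, the catch block the role of the boundary error event.
    static String handleOrder(String customerId) {
        try {
            return isCreditworthy(customerId) ? "order accepted" : "order declined";
        } catch (BusinessError e) {
            return "order declined"; // same consequence, but modeled as an error
        }
    }
}
```

Although both "not creditworthy" and "customer unknown" end in a declined order here, keeping them apart makes it easy to treat them differently later on.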

Throwing and Handling BPMN Errors

In BPMN process definitions, we can explicitly model an end event as an error.

1 In case the item is not available, we finish the process with an error end event.

It is crucial to understand that according to the BPMN spec, such a BPMN error is either handled via the process or terminates the process instance. It does not roll back the technical transaction! Therefore you can and normally should always handle the BPMN Error via the (parent) process scope calling it or embedding the process fragment throwing the error.

1 The boundary error event deals with the case that the item is unavailable. The details of the subprocess are shown in the diagram above.

Note that you can mimic a BPMN error in your Java code by explicitly throwing an exception of type org.camunda.bpm.engine.delegate.BpmnError. The consequences for the process are the same as if it were an explicit error end event. So, in case your 'purchase' activity is not a sub process, but a service task, it could throw a BPMN Error informing the process that the good is unavailable:

throw new BpmnError(GOOD_UNAVAILABLE);

Modeling for Easier Operations

Make sure you also understand how to model for easier operations of Camunda - in particular by understanding retry behaviour and incident management for service tasks.

No guarantee - The statements made in this publication are recommendations based on the practical experience of the authors. They are not part of Camunda’s official product documentation. Camunda cannot accept any responsibility for the accuracy or timeliness of the statements made. If examples of source code are shown, a total absence of errors in the provided source code cannot be guaranteed. Liability for any damage resulting from the application of the recommendations presented here, is excluded.

Copyright © Camunda Services GmbH - All rights reserved. The disclosure of the information presented here is only permitted with written consent of Camunda Services GmbH.