A new release for Zeebe 0.25.1 and Operate 0.25.0 is available now. You can grab the releases via the usual channels:
As usual, if you’d like to get started immediately, you can find information about it directly on the Zeebe & Operate documentation website.
It is possible to perform a rolling upgrade to Zeebe 0.25 from the 0.24.4 release but not from previous 0.24 patch releases. For Camunda Cloud users, 0.25.1 will become the new production version in the next week, and your existing clusters will be upgraded automatically.
We’d like to add a special shout out to @aivinog1 for all his contributions in the last few months. Thank you Alexey!
Here are some highlights:
- Many, many bug fixes
- Several configurations to tune Zeebe’s performance
- Improvements related to upgrading and compatibility
- Usability improvements in the clients
- LDAP Authentication
- Connect To Secured Elasticsearch
- Default Elasticsearch Indices Configuration
- Improved Migration/Archiving of Large Workflow Instances
- Liveness/Readiness Check
In the rest of this post, we’ll go into more details about the changes that the latest stable releases bring.
As seen from the highlights, this release focused both on continuing to improve stability as well as providing ways for users to configure Zeebe to get the best performance. Most bug fixes have already been released over the last quarter as patch releases for 0.23.x and 0.24.x.
By default, Raft followers will flush written entries to disk before acknowledging their writes to the leader. This is required to ensure consistency; otherwise a follower could lose an acknowledged entry, which would invalidate the quorum. Most users should not disable this, as it can cause logs to become inconsistent in replicated clusters, but it’s now possible to do so. This should only be used by advanced users who wish to exchange fault tolerance for a performance gain.
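To make the trade-off concrete, here is a minimal sketch of "flush before acknowledging" on a follower. It is not Zeebe's actual implementation: the `FollowerLog` class, its `flushBeforeAck` switch, and the file layout are hypothetical names chosen for illustration. Only the `FileChannel` calls are standard Java.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical follower-side log, for illustration only.
final class FollowerLog implements AutoCloseable {
  private final FileChannel channel;
  private final boolean flushBeforeAck;

  FollowerLog(Path file, boolean flushBeforeAck) throws IOException {
    this.channel =
        FileChannel.open(
            file,
            StandardOpenOption.CREATE,
            StandardOpenOption.WRITE,
            StandardOpenOption.APPEND);
    this.flushBeforeAck = flushBeforeAck;
  }

  /** Appends an entry and returns the index that may be acknowledged to the leader. */
  long append(long index, String entry) throws IOException {
    ByteBuffer buffer = ByteBuffer.wrap(entry.getBytes(StandardCharsets.UTF_8));
    while (buffer.hasRemaining()) {
      channel.write(buffer);
    }
    if (flushBeforeAck) {
      // Force the entry to disk so that an acknowledged entry survives a crash.
      channel.force(true);
    }
    // Without the flush, the entry may still sit in the OS page cache at this point.
    return index;
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("follower-log", ".bin");
    try (FollowerLog log = new FollowerLog(file, true)) {
      System.out.println("acknowledged index " + log.append(1L, "entry-1"));
    }
    Files.delete(file);
  }
}
```

Skipping the `force` call is what buys the performance gain, and it is also what makes it possible to lose an entry that the leader already counted towards the quorum.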
Taking snapshots allows Zeebe to truncate its log and reduce its disk usage. Normally, this happens automatically after a configurable period. This release introduces an API to trigger snapshots without waiting for the snapshot interval. This can be helpful for testing or to reduce the number of records that have to be reprocessed when upgrading.
It may happen that reprocessing fails after upgrading to a new version due to changes in the workflow engine’s logic. To mitigate the impact of this, the broker will now inform the user if any such issues will occur when upgrading and what can be done to solve them.
It’s now possible to configure the RocksDB column family options in the Zeebe configurations. These options may be useful to tune RocksDB’s performance for specific use cases. An example of these configurations can be found in the template for the broker configuration in the distribution’s
It’s now possible to use now() in FEEL expressions in timer events, which evaluates to the current date-time.
Making improvements and bug fixes in the Raft implementation often requires changes to data that is either persisted or sent through the network. This posed an issue, as it implied breaking backwards compatibility. This release fixes this issue by introducing a way to make backwards compatible changes. Enabling this feature required some preparation in order to maintain backwards compatibility with previous releases, which is why the rolling upgrade is only possible from 0.24.4: it is the only release which supports both the old and the new format.
By default, Zeebe uses file channels to read its log segments. Although previous releases had already introduced an optional optimization which makes use of memory mapped log segments, it was unsafe to use it in replicated clusters. This release makes it safe to do so, although it should still be considered experimental as it’s not fully mature. This setting can be enabled under
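For readers unfamiliar with the two approaches, the following sketch contrasts reading a file through explicit `FileChannel` reads with reading it through a memory mapping. It only uses standard `java.nio` calls. The `SegmentReaders` class and the idea of reading a whole segment at once are assumptions made for illustration and say nothing about how Zeebe's log reader is structured.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class SegmentReaders {

  /** Reads the whole segment with explicit positional reads on the channel. */
  static byte[] readWithFileChannel(Path segment) throws IOException {
    try (FileChannel channel = FileChannel.open(segment, StandardOpenOption.READ)) {
      ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
      long position = 0;
      while (buffer.hasRemaining()) {
        int read = channel.read(buffer, position);
        if (read < 0) {
          break; // end of file reached
        }
        position += read;
      }
      return buffer.array();
    }
  }

  /** Reads the whole segment through a read-only memory mapping. */
  static byte[] readWithMemoryMap(Path segment) throws IOException {
    try (FileChannel channel = FileChannel.open(segment, StandardOpenOption.READ)) {
      MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
      byte[] bytes = new byte[mapped.remaining()];
      mapped.get(bytes); // pages are loaded by the OS on access, no explicit read call
      return bytes;
    }
  }

  public static void main(String[] args) throws IOException {
    Path segment = Files.createTempFile("segment", ".log");
    Files.write(segment, "record-1|record-2".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(readWithFileChannel(segment), StandardCharsets.UTF_8));
    System.out.println(new String(readWithMemoryMap(segment), StandardCharsets.UTF_8));
  }
}
```

Both methods return the same bytes; the difference lies in who drives the I/O. With a mapping, the operating system pages data in on access, which can be faster for read-heavy workloads.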
Raft duplicate leader bug fixes
We’ve also fixed two serious issues in our Raft implementation. The first would cause a recently deposed leader to commit its uncommitted entries without acquiring a quorum when receiving new entries from the new leader. The second would cause a node to vote for two candidates in the same term, which means two leaders could be elected for the same term. Both of these bugs could result in logs diverging and becoming corrupted. It’s interesting to note that both were discovered by our new randomized tests for the Raft implementation. In every CI build, these tests generate new random operation sequences which, over time, allow us to test many different executions that would be impossible to test manually.
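The invariant behind the second fix is that a node grants at most one vote per term. A minimal sketch of that rule, under simplifying assumptions, could look as follows. The `VoteState` class is hypothetical and is not taken from Zeebe's code base; a real Raft node would additionally check that the candidate's log is up to date and would persist the term and vote before replying.

```java
// Hypothetical, simplified vote bookkeeping for a single Raft node.
final class VoteState {
  private long currentTerm = 0;
  private String votedFor = null; // candidate voted for in currentTerm, if any

  /** Returns true if the vote for the given term is granted to the candidate. */
  synchronized boolean requestVote(long term, String candidateId) {
    if (term < currentTerm) {
      return false; // request from a stale term
    }
    if (term > currentTerm) {
      // A newer term starts with a fresh vote.
      currentTerm = term;
      votedFor = null;
    }
    if (votedFor == null || votedFor.equals(candidateId)) {
      votedFor = candidateId;
      return true;
    }
    return false; // already voted for someone else in this term
  }

  public static void main(String[] args) {
    VoteState node = new VoteState();
    System.out.println(node.requestVote(1, "A")); // prints true
    System.out.println(node.requestVote(1, "B")); // prints false, A already has the vote for term 1
    System.out.println(node.requestVote(2, "B")); // prints true, new term
  }
}
```

Since a candidate needs votes from a majority of nodes, and each node votes only once per term, two candidates cannot both win the same term.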
Restarting a node often took a long time due to the need for a snapshot to be fully replicated and installed. Optimizing this process has brought the restart time down by several orders of magnitude. There are also new Grafana metrics to help monitor restart performance.
Previously, exporters were not updating the exported position if records were filtered. Fixing this bug prevents Zeebe’s log from growing without being compacted in a low load scenario where only filtered records are written.
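The idea behind the fix can be sketched as follows: the exported position advances for every record the exporter has seen, including the ones its filter rejects. The `FilteringExporter` class, its string-based record types, and the in-memory list are hypothetical simplifications for illustration, not the exporter API that Zeebe provides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical exporter with a record-type filter, for illustration only.
final class FilteringExporter {
  private final Set<String> acceptedTypes;
  private final List<Long> exported = new ArrayList<>();
  private long lastExportedPosition = -1;

  FilteringExporter(Set<String> acceptedTypes) {
    this.acceptedTypes = acceptedTypes;
  }

  void onRecord(long position, String valueType) {
    if (acceptedTypes.contains(valueType)) {
      exported.add(position); // stands in for sending the record to an external system
    }
    // Advance the position for filtered records too. Otherwise the log could
    // never be compacted past a run of records that the filter rejects.
    lastExportedPosition = position;
  }

  long lastExportedPosition() {
    return lastExportedPosition;
  }

  List<Long> exported() {
    return exported;
  }

  public static void main(String[] args) {
    FilteringExporter exporter = new FilteringExporter(Collections.singleton("JOB"));
    exporter.onRecord(1, "JOB");
    exporter.onRecord(2, "TIMER");
    exporter.onRecord(3, "TIMER");
    System.out.println("exported records: " + exporter.exported());
    System.out.println("log can be compacted up to: " + exporter.lastExportedPosition());
  }
}
```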
Improved workflow validation
It’s possible to configure parameters that ensure a minimum amount of free disk space is maintained. To avoid violating this limit, a Zeebe broker will step down.
You can now specify a custom resource name when deploying workflows with zbctl. This is helpful to prevent deploying duplicate workflows when using different clients since, for duplicate workflows to be filtered, they must have the same resource name.
When publishing a message, the response to the command will contain an identifying key for the published message.
Both the Java and Go clients now add information about their type and version in the user-agent of the auth requests. This information is useful when investigating issues that stem from the interaction between Zeebe and specific clients.
The JobWorker in the Java client now implements the AutoCloseable interface, which makes it possible to use a JobWorker in a try-with-resources block.
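As a reminder of what try-with-resources provides, here is a small self-contained sketch. To keep it runnable without the Zeebe client on the classpath, it uses a hypothetical `StubJobWorker` stand-in. With the real client, the resource would be the JobWorker returned when opening a worker, for example via `client.newWorker().jobType(...).handler(...).open()`.

```java
// Hypothetical stand-in for a job worker; the real JobWorker comes from the Zeebe Java client.
final class StubJobWorker implements AutoCloseable {
  private boolean open = true;

  boolean isOpen() {
    return open;
  }

  @Override
  public void close() {
    open = false; // a real worker would stop polling for jobs here
  }
}

final class TryWithResourcesDemo {

  /** Returns {open inside the block, open after the block}. */
  static boolean[] run() {
    StubJobWorker worker = new StubJobWorker();
    boolean openInside = false;
    try (StubJobWorker w = worker) {
      openInside = w.isOpen(); // jobs would be handled while the block runs
    }
    // close() has been called automatically, even if the block had thrown.
    return new boolean[] {openInside, worker.isOpen()};
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("open inside block: " + result[0]); // prints true
    System.out.println("open after block: " + result[1]); // prints false
  }
}
```

The benefit is mostly about cleanup: the worker is closed on every exit path without an explicit finally block.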
LDAP Authentication
With Operate 0.25 we added support to connect Operate with your own LDAP to
allow authentication of users. You can read more about how to configure the
LDAP connection in our documentation.
Connect To Secured Elasticsearch
As often requested by users, and to align with the capability of Zeebe, Operate
can now be configured to connect to a secured Elasticsearch instance.
See the configuration section of our documentation.
Default Elasticsearch Indices Configuration
It is now possible to set the default number of shards and replicas of the
Operate Elasticsearch indices in the configuration of Operate. See the
documentation for more information.
Improved Migration/Archiving of Large Workflow Instances
In Zeebe, a workflow instance can have a large number of activities, for example if
a loop or multi-instance sub-process is involved. Therefore, migrating or
archiving one of these instances can take a noticeable amount of time. In
previous versions of Operate this could lead to requests timing out between
Operate and Elasticsearch. With the latest version of Operate, we now use
the Elasticsearch Task API to better handle such long-running data modifications.
Liveness/Readiness Check
With the new version, we adjusted the exposed liveness and readiness checks to be
aligned with the common best practices in Spring Boot. See the
documentation to see what changed.
Note about Zeebe 0.25.0
As you might have noticed, in this announcement we are referring to Zeebe 0.25.1, and not 0.25.0. The Zeebe 0.25.0 release contains a feature which was intended to detect anomalies while upgrading a Zeebe cluster. Before announcing the release to the public, we discovered an issue with this feature which could lead to a degraded user experience during normal runs. This feature was originally built to help us ensure that upgrades were safe and, as such, should not impact the normal usage of Zeebe. As a compromise, we’ve added a feature flag in 0.25.1 which lets you turn the detection on and off. By default it is off, and we recommend turning it on during upgrades. This can be done via an experimental configuration flag
zeebe.broker.experimental.detectReprocessingInconsistency = true (or an environment variable
You can read more about the recommended upgrade procedure in our documentation.
Get In Touch
There are a number of ways to get in touch with the Zeebe community to ask questions and give us feedback.
- Join the Zeebe user forum
- Join the Zeebe Slack community
- Reach out to dev advocate Josh Wulf on Twitter
- Reach out to dev advocate Mauricio Salatino on Twitter
We hope to hear from you!