New patch releases for Zeebe are now available: 0.22.4 and 0.23.3. Both contain various bug fixes as well as minor enhancements. You can grab the releases via the usual channels.
Zeebe 0.23.3 is fully compatible with 0.23.2, as is 0.22.4 with 0.22.3. This means it is possible to perform a rolling upgrade for your existing clusters. For Camunda Cloud users, Zeebe 0.23.3 is already the latest production version, meaning your existing clusters should have been migrated by the time you see this post.
Without further ado, here is a list of the notable changes.
The 0.22.4 patch contains 4 bug fixes, 3 of which are also part of 0.23.3: the fix that allows exporting custom job headers to Elasticsearch, the fix for an NPE when triggering timers, and the fix for a race condition when cancelling a workflow instance. You can read more about these in the 0.23.3 release notes below.
There was a bug reported by user strawhat5, where a broker would reject all incoming requests and throw an IllegalStateException: Failed to recover broker exception on restart or failover. This was due to a race condition and the reuse of a mutable LogStream object, which resulted in an inconsistent log, one of the highest severity bugs that can affect the Zeebe broker. After some investigation, this was fixed with the following PR (later backported to 0.22).
While it’s always been possible to export custom job headers to Elasticsearch, there was an issue with the index template that was previously used, resulting in an error for some specific values (specifically when the header contained a period in its name), as reported by user eetay. The fix implemented was to disable indexing custom headers – the headers will still be stored in Elasticsearch, but not searchable via queries.
Here we see the result of improving the Log4J2 Stackdriver layout – this bug was found thanks to it, allowing the development team to use Google Cloud’s Error Reporting tool to pre-emptively find and fix bugs.
This bug was the result of a race condition: a timer could be triggered after the element it referred to had already been left during process execution, but before the timer could be cancelled.
Here there were actually two bugs with the same cause: #4400 and #4352. Both were caused by a race condition when cancelling/interrupting running workflow instances, which would result in stuck instances that could neither make progress nor be cancelled, leaving behind “garbage” data and wasting resources. Or, as our developer saig0 put it:
On terminating the sub-process, a new token is spawned for the sequence flow that is waiting on the joining parallel gateway. As a result, the flow scope of the sub-process can not be completed or terminated.
The fix here was to publish only deferred boundary events when the sub-process is terminated, so that no other events are published.
We’re aware that this is not a bug fix, and normally this wouldn’t be included in a patch release. However, in this case we decided to include it, as it brings better integration with Stackdriver-related tools, such as Error Reporting. We run most of our tests and benchmarks on Google Cloud, and as such good integration with Stackdriver tools increases our ability to diagnose issues, both retroactively and preemptively.
The enhancement does not include all possible features (e.g. adding custom labels or operation grouping), but focuses on making sure that errors are properly reported. So if you’re using the layout, be aware that it has now changed a little:
- It is now called simply StackdriverLayout, as opposed to StackdriverJSONLayout
- While not garbage free in the Log4J2 sense, it tries to be as close as possible to it
- You can configure a service context so your errors are properly grouped in Error Reporting:
  - The service name can be configured via the system property log.stackdriver.serviceName, or the environment variable ZEEBE_LOG_STACKDRIVER_SERVICENAME. If omitted, this defaults to “zeebe”.
  - The service version can be configured via the system property log.stackdriver.serviceVersion, or the environment variable ZEEBE_LOG_STACKDRIVER_SERVICEVERSION. If omitted, this defaults to “development”.
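
To make the lookup rules for the service context concrete, here is a minimal sketch of how such a resolution could work. It is not the actual Zeebe implementation: the class ServiceContextResolver and its methods are hypothetical, and the precedence (system property first, then environment variable, then default) is an assumption, since the notes above only name the two sources and the defaults.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, NOT part of Zeebe: illustrates one possible way to
// resolve the Stackdriver service context described in the list above.
final class ServiceContextResolver {

  // Assumed precedence: system property, then environment variable, then default.
  static String resolve(
      Properties properties,
      Map<String, String> environment,
      String propertyKey,
      String environmentKey,
      String fallback) {
    String fromProperty = properties.getProperty(propertyKey);
    if (fromProperty != null && !fromProperty.isEmpty()) {
      return fromProperty;
    }
    String fromEnvironment = environment.get(environmentKey);
    if (fromEnvironment != null && !fromEnvironment.isEmpty()) {
      return fromEnvironment;
    }
    return fallback;
  }

  // Keys and defaults are taken from the release notes.
  static String serviceName(Properties properties, Map<String, String> environment) {
    return resolve(
        properties,
        environment,
        "log.stackdriver.serviceName",
        "ZEEBE_LOG_STACKDRIVER_SERVICENAME",
        "zeebe");
  }

  static String serviceVersion(Properties properties, Map<String, String> environment) {
    return resolve(
        properties,
        environment,
        "log.stackdriver.serviceVersion",
        "ZEEBE_LOG_STACKDRIVER_SERVICEVERSION",
        "development");
  }
}
```

In a running JVM you would call it as ServiceContextResolver.serviceName(System.getProperties(), System.getenv()). Passing the two sources in as parameters keeps the sketch easy to test without touching the real process environment.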
When starting the Zeebe broker or standalone gateway, one of the first things we do is print out the effective configuration if the log level is DEBUG or lower. One issue with this is that it would print out the Elasticsearch credentials, which is sensitive information.
The fix in the end was a broader one: any configuration field called “username” or “password” is now printed out as 3 asterisks (i.e. “***”).
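
The masking rule can be illustrated with a small sketch. This is not the code used in Zeebe: the ConfigMasker class is hypothetical, it assumes the configuration is available as nested maps, and it assumes field names are matched exactly and case-sensitively.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Zeebe: returns a copy of a nested
// configuration map in which sensitive fields are replaced before printing.
final class ConfigMasker {

  static final String MASK = "***";

  // Assumption: only the exact field names "username" and "password" are masked.
  private static boolean isSensitive(String key) {
    return "username".equals(key) || "password".equals(key);
  }

  static Map<String, Object> mask(Map<String, ?> config) {
    Map<String, Object> masked = new LinkedHashMap<>();
    for (Map.Entry<String, ?> entry : config.entrySet()) {
      String key = entry.getKey();
      Object value = entry.getValue();
      if (isSensitive(key)) {
        // Replace the value regardless of its type or content.
        masked.put(key, MASK);
      } else if (value instanceof Map) {
        // Recurse so that nested sections (e.g. exporter arguments) are covered.
        @SuppressWarnings("unchecked")
        Map<String, ?> nested = (Map<String, ?>) value;
        masked.put(key, mask(nested));
      } else {
        masked.put(key, value);
      }
    }
    return masked;
  }
}
```

Because the sketch builds a masked copy and leaves the input untouched, the effective configuration used by the application keeps its real credentials while only the printed representation is redacted.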
Get In Touch
There are a number of ways to get in touch with the Zeebe community to ask questions and give us feedback.
- Join the Zeebe user forum
- Join the Zeebe Slack community
- Reach out to dev advocate Josh Wulf on Twitter
- Reach out to dev advocate Mauricio Salatino on Twitter
We hope to hear from you!