A sincere thanks to Bernd Rücker for his feedback during the writing of this blog post.
This is part 1 of 2 in a 2-part blog post series. Part 2 is available here.
We’re building Zeebe to be a next-generation workflow engine for emerging use cases such as microservices orchestration–use cases that may require an engine to handle hundreds of thousands (or millions) of new workflow instances per second.
And to do that, we’re using a graphical modeling standard that’s been around for almost 15 years: BPMN (Business Process Model and Notation).
Even though BPMN is a battle-tested ISO standard, it’s possible that many of you have never been hands-on with it or maybe haven’t even heard of it.
Or worse, you’ve heard of BPMN, but you’ve written it off as a legacy technology that’s only relevant in a monolithic or SOA world.
Last month, we came across this quote from a Hacker News commenter (who we promise is not an incognito member of the Zeebe team):
"BPMN is the most underappreciated technology in our field IMO."
We couldn’t agree more, and in that spirit, we’ll be publishing a two-part blog series about BPMN and microservices orchestration–and more specifically, why BPMN is a great fit for up-and-coming use cases in the workflow world.
In this part 1, we’ll:
- Provide a quick intro to BPMN
- Make the case for why a well-established standard that thrived in the past can thrive in the future, too
- Review common orchestration patterns supported by BPMN
- Discuss the current state and future plans for BPMN in Zeebe
In part 2, we’ll:
- Dive into BPMN’s graphical models (and other ways to define workflows)
- Look at examples where using a graphical model instead of a code-based model greatly simplifies workflow definition
A short primer on BPMN
BPMN is a widely-used modeling standard for defining and executing business processes. First released in 2004 (with the modern BPMN 2.0 specification following in 2011–this is what Zeebe uses), BPMN has been an ISO standard since 2013.
BPMN is used to define a graphical model and so-called execution semantics. In other words, the visual model is stored as an XML file that can be executed directly on an engine that keeps persistent state of running workflow instances.
To give an example, the model below is expressed with this XML.
It’s important to say that BPMN involves no code generation and no transformation! The XML is itself the source code. And BPMN is only concerned with the flow–you can use normal code for all other aspects of your solution.
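To make the "XML is the source code" idea concrete, here is a minimal sketch in Python. The XML is a hand-written, simplified stand-in for an order workflow, not the exact file behind the diagram in this post: the element IDs and the `execution_order` helper are our own illustrative assumptions, and a real engine does far more than walk sequence flows. Only the BPMN 2.0 model namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

# The BPMN 2.0 model namespace defined by the standard.
BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical, simplified order workflow (IDs are illustrative assumptions).
ORDER_PROCESS_XML = """
<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
                  targetNamespace="http://example.com/order">
  <bpmn:process id="order-process" isExecutable="true">
    <bpmn:startEvent id="order-placed" />
    <bpmn:serviceTask id="collect-money" name="Collect Money" />
    <bpmn:serviceTask id="fetch-items" name="Fetch Items" />
    <bpmn:serviceTask id="ship-parcel" name="Ship Parcel" />
    <bpmn:endEvent id="order-delivered" />
    <bpmn:sequenceFlow id="f1" sourceRef="order-placed" targetRef="collect-money" />
    <bpmn:sequenceFlow id="f2" sourceRef="collect-money" targetRef="fetch-items" />
    <bpmn:sequenceFlow id="f3" sourceRef="fetch-items" targetRef="ship-parcel" />
    <bpmn:sequenceFlow id="f4" sourceRef="ship-parcel" targetRef="order-delivered" />
  </bpmn:process>
</bpmn:definitions>
"""


def execution_order(xml_text):
    """Follow the sequence flows from the start event and list the element IDs visited."""
    process = ET.fromstring(xml_text.strip()).find("bpmn:process", BPMN_NS)
    # Map each element to its successor (this sketch assumes a purely linear flow).
    next_of = {
        flow.get("sourceRef"): flow.get("targetRef")
        for flow in process.findall("bpmn:sequenceFlow", BPMN_NS)
    }
    current = process.find("bpmn:startEvent", BPMN_NS).get("id")
    order = [current]
    while current in next_of:
        current = next_of[current]
        order.append(current)
    return order
```

Calling `execution_order(ORDER_PROCESS_XML)` reads the order of steps straight from the model, with no generated code in between.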
That’s a key point for microservices orchestration, where external workers carry out the tasks in your workflow. When combined with the right engine, BPMN makes it easy to connect tasks in a workflow to microservices and to do so in a way that doesn’t violate the principles of loose coupling and service independence.
Extending the sample order workflow above, we can build 3 distinct microservices to handle payments, inventory, and shipping. The workflow engine is responsible for sending work to the right service at the right point in the process.
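As a rough illustration of that division of labor, the sketch below uses a toy in-memory "engine" that knows only the order of task types, while the business logic lives in independently registered workers. Everything here (`TinyEngine`, the task type names, the payload fields) is a hypothetical stand-in that we made up for this post; it is not Zeebe's client API.

```python
class TinyEngine:
    """Toy engine: it owns the flow (the order of task types), not the business logic."""

    def __init__(self, task_types):
        self.task_types = list(task_types)
        self.workers = {}

    def register_worker(self, task_type, handler):
        # A worker subscribes to exactly one task type and knows nothing about the others.
        self.workers[task_type] = handler

    def run(self, payload):
        # Send the work to the right worker at the right point in the process,
        # merging each worker's result into the workflow instance payload.
        for task_type in self.task_types:
            result = self.workers[task_type](payload)
            payload = {**payload, **result}
        return payload


# Three independent "microservices", reduced to plain functions for this sketch.
def payment_service(payload):
    return {"paid": True}


def inventory_service(payload):
    return {"itemsFetched": True}


def shipping_service(payload):
    return {"shipped": payload["paid"] and payload["itemsFetched"]}


engine = TinyEngine(["payment", "inventory", "shipment"])
engine.register_worker("payment", payment_service)
engine.register_worker("inventory", inventory_service)
engine.register_worker("shipment", shipping_service)

final_payload = engine.run({"orderId": 42})
```

Note that none of the three services references another one; only the engine knows the sequence, which is what keeps the services loosely coupled.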
Lastly, there’s BPMN’s maturity. BPMN is popular and well-established, and it’s proven its value in many workflow automation projects at companies both large and small. For this reason, there’s already a lot of experienced BPMN talent in the market as well as tutorials and books that make it easy for newcomers to learn the standard.
That all sounds good. But can BPMN handle my fancy new architecture?
Let’s go into metaphor mode for a moment.
While the specifics around the history of the automobile are up for debate, many give credit to Karl Benz for building the first car and taking it for a spin around Mannheim, Germany in 1886. In the past 130 or so years, the capabilities of the automobile–in this case, literally the “engine”–have evolved quite a bit.
The “flow”, however, has remained fairly static. The roadways, signage, and laws that help us get safely from point A to point B still follow many of the same patterns that were implemented many decades ago. This might change eventually (see: Hyperloop, drones), but the point is that a given “flow pattern” can support many significant “engine” advancements.
So when we evaluate a well-established standard like BPMN, we have to distinguish between “flow requirements” and “engine requirements”.
BPMN has distilled a number of patterns around flows that are timeless. Carrying out a series of activities in a sequence or in parallel applies just as well to a more traditional BPMN use case such as human task management as it does to calling serverless functions in AWS. Waiting for incoming copies of printed and signed documents is comparable pattern-wise to correlating multiple messages in your event-streaming architecture.
What does indeed change is the throughput (number of workflow instances) as well as performance and scalability requirements. These are problems that can be solved with a new engine executing the same flow language–and that’s the approach we’re taking with Zeebe, which can scale to millions of new workflow instances per second.
The alternative is to build a new engine and invent a new flow language while you’re at it. But with a new flow language, you’re inevitably going to spend time solving problems that have already been solved in BPMN. And you might not be able to solve these problems as effectively or in a way that’s easily understandable by all stakeholders.
We’ll elaborate on this idea in part 2 of this series, where we’ll compare a complex real-world workflow pattern as expressed in BPMN versus Amazon States Language.
For now, let’s review examples of common workflow patterns to help demonstrate why we’re so confident that BPMN is the right flow language for microservices orchestration and other next-gen workflow use cases.
Defining Orchestration Patterns in BPMN
We’re not going to provide a complete BPMN tutorial in this post. Our goal is for you to understand a subset of the building blocks at your disposal and give a few examples of how you can put them to use.
That shouldn’t stop you from going deeper on BPMN if you’d like to. Camunda’s BPMN tutorial that we mentioned above is a good place to get started, as is our BPMN reference.
You can also start to get hands-on with our Zeebe-specific graphical modeling tool, and we’ll talk a lot more about graphical models in part 2 of this series.
Now onto the patterns.
Sequence flow, decisions, and parallel processing
At the core of BPMN is sequence flow, which defines the order in which the steps in a workflow are carried out.
As you might imagine, limiting a workflow to a simple one-after-the-other sequence of tasks leaves a lot of real-world business logic unaddressed. In addition to tasks (units of work), BPMN workflows consist of gateways (which steer the flow) and events (which represent things that happen that a workflow can react to or notify other systems about).
BPMN provides constructs for routing a workflow instance to exactly one of several outgoing sequence flows based on associated data (the exclusive gateway) and for activating multiple sequence flows that need to be carried out in parallel (the parallel gateway).
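If you think in code, the two gateways behave roughly like the functions below. This is a simplified sketch under our own assumptions: the function names, the `orderValue` field, and the flow names are invented for illustration, and a real engine persists state between steps rather than running everything in one process.

```python
from concurrent.futures import ThreadPoolExecutor


def exclusive_gateway(payload, conditional_flows, default_flow):
    """Pick exactly one outgoing flow: the first whose condition holds, else the default."""
    for condition, target in conditional_flows:
        if condition(payload):
            return target
    return default_flow


def parallel_gateway(payload, branches):
    """Start every branch, then wait (join) until all of them have finished."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, payload) for branch in branches]
        # Results come back in the order the branches were listed.
        return [future.result() for future in futures]


# Exclusive gateway: route on data attached to the workflow instance.
next_step = exclusive_gateway(
    {"orderValue": 150},
    [(lambda p: p["orderValue"] >= 100, "ship-with-insurance")],
    default_flow="ship-without-insurance",
)

# Parallel gateway: both branches run, and the flow continues once both are done.
branch_results = parallel_gateway(
    {"orderId": 42},
    [
        lambda p: f"invoice sent for order {p['orderId']}",
        lambda p: f"parcel packed for order {p['orderId']}",
    ],
)
```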
Message correlation with timeout
BPMN’s receive task is one way that the standard supports message correlation. This very powerful feature moves a waiting workflow instance forward (or takes some other action) only when an incoming message can be correctly matched (“correlated”), via a common identifier, with the specific workflow instance that’s waiting for it.
This is a feature that would be especially difficult to build from scratch and then support at scale–and with BPMN combined with the right engine, you get it out-of-the-box.
BPMN’s support for both sending and receiving messages means that models can integrate seamlessly with a message-driven architecture, an architecture that is particularly common in the microservices world.
Workflows can be started by certain types of messages; they can also emit a message to be used by a downstream system. Or a workflow instance can end based on a received message. For example, an in-flight workflow instance–such as an order fulfilment process in an e-commerce company–can be terminated in response to an incoming order cancellation message that’s associated with a specific order.
As you’ll see (and we’ll repeat frequently), the ease with which different elements can be combined is what makes BPMN so powerful.
For example, the receive task can be combined with a Timer event so that if the required message doesn’t arrive within 4 hours, the task “times out” and the workflow instance follows a different path.
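To show what the engine is doing for you here, the following sketch models a receive task with a timeout as a small in-memory registry. The class name, the method names, and the correlation keys are all hypothetical, and time is passed in explicitly as seconds so the behavior is easy to follow; a production engine would persist this state and handle it at scale.

```python
FOUR_HOURS = 4 * 60 * 60  # timeout in seconds


class ReceiveTaskRegistry:
    """Tracks workflow instances that are waiting for a message."""

    def __init__(self):
        # correlation key -> (workflow instance ID, deadline in seconds)
        self.waiting = {}

    def wait_for_message(self, instance_id, correlation_key, now, timeout_seconds):
        self.waiting[correlation_key] = (instance_id, now + timeout_seconds)

    def correlate(self, correlation_key, now):
        """Return the instance to move forward, or None if nothing matches in time."""
        entry = self.waiting.get(correlation_key)
        if entry is None or now >= entry[1]:
            return None
        del self.waiting[correlation_key]
        return entry[0]

    def expire(self, now):
        """Return the instances whose timer fired; these follow the timeout path."""
        expired_keys = [key for key, (_, deadline) in self.waiting.items() if now >= deadline]
        return [self.waiting.pop(key)[0] for key in expired_keys]


registry = ReceiveTaskRegistry()
registry.wait_for_message("instance-1", "order-42", now=0, timeout_seconds=FOUR_HOURS)
registry.wait_for_message("instance-2", "order-43", now=0, timeout_seconds=FOUR_HOURS)

unmatched = registry.correlate("order-99", now=60)        # no instance waits for this key
matched = registry.correlate("order-42", now=3600)        # arrives after one hour
timed_out = registry.expire(now=FOUR_HOURS)               # order-43 never got its message
too_late = registry.correlate("order-43", now=FOUR_HOURS + 1)
```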
Correlation of multiple messages
Correlating one message with a workflow instance is helpful, but what if you need to correlate two, or three, or ten?
BPMN has this pattern covered, too. You can wait for two or more messages to sync and to merge their payloads before moving a workflow instance forward by combining the receive task with the parallel gateway.
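In code, the join behind this pattern might look like the sketch below. The `MessageJoin` class and the message names are assumptions made for illustration; the point is only that the instance stays put until every expected message has arrived, at which point the payloads are merged.

```python
class MessageJoin:
    """Waits for a fixed set of named messages, then releases their merged payloads."""

    def __init__(self, expected_names):
        self.expected = set(expected_names)
        self.received = {}

    def on_message(self, name, payload):
        """Record a message; return the merged payload once all have arrived, else None."""
        if name in self.expected:
            self.received[name] = payload
        if set(self.received) != self.expected:
            return None  # still waiting for at least one message
        merged = {}
        for message_name in sorted(self.received):
            merged.update(self.received[message_name])
        return merged


join = MessageJoin({"payment-received", "items-fetched"})
after_first = join.on_message("payment-received", {"paid": True})
after_second = join.on_message("items-fetched", {"items": 3})
```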
Let’s take it one step further and combine this pattern with a timeout.
Again, the addition of the timer and subprocess in the diagram below is just one example of how different BPMN elements can be combined to express a complex flow; while there are, of course, certain combinations that won’t make logical sense, there’s essentially no limit to how you can connect BPMN symbols to define workflows.
Waiting for an arbitrary number of messages
In some cases, we might not know in advance how many messages a given workflow instance needs to wait for.
Consider an example where we need to receive an itemAvailable message for every item in an order before we move forward with the workflow. The number of items in each order might vary widely, and we can account for that in our model using BPMN’s multiple-instance activity.
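Here is a minimal sketch of that idea: the number of messages to wait for is derived from the order itself at runtime rather than fixed in the model. The `make_item_tracker` helper and the item IDs are hypothetical names we chose for this example.

```python
def make_item_tracker(order_items):
    """Create a handler that expects one itemAvailable message per item in the order."""
    pending = set(order_items)

    def on_item_available(item_id):
        # Tick off the item; report True once every item in the order is available.
        pending.discard(item_id)
        return not pending

    return on_item_available


# The same model handles a three-item order and a one-item order.
on_item_available = make_item_tracker(["book", "lamp", "mug"])
progress = [on_item_available(item) for item in ["lamp", "book", "mug"]]

single_item_done = make_item_tracker(["chair"])("chair")
```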
Error handling
There might be certain “business logic errors” that you’ll need to design for in your workflows. Here, we aren’t talking about errors where a service fails for a technical reason, but instead, cases where a workflow can’t proceed due to a business problem that we can plan for in advance. BPMN’s error boundary event was designed for this particular case.
In this example, we try to charge a customer’s credit card. If the charge is declined due to insufficient funds in the customer’s account, we’ll notify the customer about the issue.
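A rough code analogy for this pattern is shown below. The `BusinessError` class, the error code, and the step names are assumptions for illustration only. The important distinction is that a planned business error is routed to an alternative path in the workflow, while a technical failure is not caught by the boundary event and surfaces separately.

```python
class BusinessError(Exception):
    """A problem we planned for in the model, identified by an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def charge_credit_card(payload):
    if payload["balance"] < payload["amount"]:
        raise BusinessError("insufficient-funds")
    return "charged"


def run_with_error_boundary(task, payload, boundary_flows, normal_flow):
    """Run a task; return the name of the next step in the workflow."""
    try:
        task(payload)
    except BusinessError as error:
        if error.code in boundary_flows:
            return boundary_flows[error.code]  # follow the error boundary path
        raise  # no boundary event is attached for this code
    # Any other exception (a technical failure) propagates untouched.
    return normal_flow


boundary_flows = {"insufficient-funds": "notify-customer"}
declined = run_with_error_boundary(
    charge_credit_card, {"balance": 10, "amount": 50}, boundary_flows, "ship-order"
)
accepted = run_with_error_boundary(
    charge_credit_card, {"balance": 100, "amount": 50}, boundary_flows, "ship-order"
)
```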
We Could Keep Going…
When it comes to patterns supported by BPMN, we’ve only just scratched the surface. We hope you have an idea of how many different use cases can be expressed with these BPMN symbols alone.
What we’re showing in these examples isn’t meant to be prescriptive or tell you exactly how you should use BPMN. Rather, we aim to spark your imagination about the types of models you can build.
The State of BPMN in Zeebe
Hopefully, you’ve arrived at this point in the post with an understanding of BPMN’s possibilities when it comes to defining and executing complex workflows.
But the real question is: how much of BPMN do we support in Zeebe?
In the long term, Zeebe will support all symbols that make sense for workflow automation, just as we’ve done with the Camunda BPMN Workflow Engine.
As for right now, Zeebe 0.11 (the most recent release) supports:
Of course, that’s a limited scope, and up to this point, we’ve focused primarily on Zeebe’s engine–that is, making sure that Zeebe has the scalability and performance to handle high-throughput use cases.
But we’ve been investing heavily in BPMN support for Zeebe this quarter, and we’ll be ready to support message correlation in the near future:
As we get Zeebe ready for production in 2018, we plan to add support for more symbols such as:
- Timers,
- Scopes (subprocesses), and
- Parallel execution
In 2019, we’ll expand symbol support based on user feedback and what we know about the use cases that Zeebe will address.
We’re at the end of part 1 of our BPMN series. There’s a really important aspect of BPMN that we only mentioned in passing in this post: the fact that models can be defined graphically then executed directly by an engine.
So we’ll be back soon with part 2, where we’ll go into the many benefits of graphical models in BPMN, particularly when compared to other flow languages built for the orchestration use case. We’ll also address concerns of developers who have had negative experiences with graphical models.
If you have any questions or comments about this post, we’d love to hear from you. Find us on Twitter (@ZeebeHQ) or on the Zeebe forum and Slack community.