Build a Kafka Cluster on an Oracle Kubernetes EngineGepubliceerd: Auteur: Michel Schildmeijer Categorie: Oracle
Deze blog is alleen beschikbaar in het Engels
Messaging is an important part of Middleware platforms. It is in fact an essential part for integrating and distributing data amongst source and target systems. With the rise of application and data integration patterns, message brokers became an important part in handling and transferring data in a solid and secure matter, in the most optimal way with a minimum of loss of data during transfer.
Traditional ways of messaging and message brokers can be divided into several parts:
JMS – Java Messaging
JMS offers a set of APIs for messaging: put a message on a queue and someone else, sometime later, perhaps a far distance from each other, takes the message off the queue and processes it. It is decoupled in time and location of the message provider and consumer. Even if the message consumer happens to be down for a time, messages can be reproduced.
JMS also has a publish/subscribe capability where the producer puts the message to a "topic”. Any interested parties can subscribe to that topic, receiving messages as and when they are produced, but for now I will focus on the queue capability. There is some decoupling of the relationship between provider and consumer. However, some coupling remains.
There are a few opensource message brokers which use the JMS protocol: Oracle WebLogic and IBM WebSphere MQ, which is a technology that can also be used to transport JMS messages. Furthermore, IBM WebSphere MQ is also a native IBM queueing mechanism.
Enterprise Service Bus:
An ESB is a Content-Based Routing capability. But the ESB also uses the fundamental queuing capabilities offered by JMS.
A second way of coupling for an ESB is possible if the consumer and producer agree about the message format. That's fine in simple cases. But when you have many producers who all send messages to the same queue, you will have versioning problems. New message formats are introduced but you don't want to change all the existing providers.
Message transformation is another ESB capability; to talk to multiple formats of data.
New and evolved ways of messaging
In our ever-changing, 24/7, “always on” world, the need for real-time, scalable and high-performance communication drives companies to think of new ways to provide these capabilities.
One of the most popular and highly adopted technologies around is an open-source technology called Kafka, provided by the Apache Software Foundation. It was originally written by LinkedIn and donated to the Opensource Community in 2011.
Kafka is a “topic based” platform that provides a near real-time, low-latency, high-throughput data stream processing platform capable of managing almost real-time data feeds. It is mainly written in Java and Scala.
The platform holds a set of APIs (connectors) and topics (message containers) that are all held together within a cluster to provide high availability, high performance messaging and scalability. In the picture, you can see a very simple representation of a Kafka platform:
As noted, this is a very simple representation. Over the years, Kafka has been enhanced with newer APIs called connectors, which can be used to connect any flavor of technology.
This article is not specifically written to deep dive into Kafka, but I think a little context is necessary. There are already many books, articles and courses around Kafka and streams technology. In this article, I would like to focus on how to build a Kafka Cluster in a Cloud and Container Native topology.
Kafka in the Cloud
If you say Kafka, the name Confluent immediately pops up. Confluent is a company that “does Big Data” and was established by some of the original Kafka founders. They provide a complete solution build on Kafka and all the tools a data integration and streaming platform needs.
You can run Kafka in your own datacenter or on premises as part of your infrastructure. However, with Cloud strategies these days, more companies run Kafka in the Cloud in some form of Cloud subscription. All the known technology players have an end-to-end PaaS solution build with Kafka:
- Microsoft Azure Events Hub and Confluent
- Amazon Managed Streams for Kafka (MSK)
- Google Cloud – Integrated and hosting Confluent Cloud
- Confluent Cloud
- Oracle OCI Event Hub
All these solutions have their own implementations, infrastructures, connectors and more, but in the end, they are more or less the same.
I do not want to discuss these solutions for now. Instead, I want to focus on how to build a Kafka Cluster on a Kubernetes Platform in the Oracle Cloud.
Why Kafka on Kubernetes (in the Cloud)?
The managed solutions I discussed are all fine, but when your situation is a hybrid between Cloud and on premises, maybe consider having some control. Apache Kafka on Kubernetes is perfect for cloud-native development. Cloud-native applications are independent, loosely coupled and distributed services that deliver high scalability via the cloud. This is also applicable for event-driven applications built on Kafka; they are designed to scale across a distributed hybrid cloud environment. Another benefit of running Kafka on Kubernetes is infrastructure abstraction: it can be configured once and run everywhere. This is also a benefit if you build your infrastructure with the “Configuration as Code” method.
The flexible scalability of Kubernetes is suited for a solution like Kafka. Kubernetes allows applications to scale resources up and down with a simple command, or scale automatically based on usage. Kubernetes also has the option to span across different platform implementations (Cloud, on-premises, hybrid clouds, and different operating systems).
Kafka on Oracle Kubernetes Engine (OKE)
You can run your Kubernetes Clusters on premises, but I chose for a hybrid solution. Meaning: a company is not completely Cloud-based. A hybrid situation will always exist. In fact, they will occur more and more in the coming years.
There are a few options to build a Kafka Cluster on Kubernetes:
- Build it yourself completely – which means building compute instances, storage, container images, and a Kubernetes cluster
- Use a Kubernetes Operator, that does a lot of work for you. There are a few, such as:
o Strimzi Operator
o KUDO Operator
For building Kafka on OKE, I used the following components:
- OCI Cloud Infrastructure EU-Frankfurt
- Block Storage and NFS
- 4 Compute instances (as part of the OKE Cluster)
- OKE Node Pool of 4 instances
- Cloud Networking and Load Balancer
- Kubernetes version 1.19.7
- KUDO Operator
An operator in Kubernetes terms is an extension API for a Kubernetes cluster. It handles all non-Kubernetes business, manages it, and can cooperate with all the Kubernetes features. Many operators have been developed. You can even write your own if required. In this case, the operator handles the creation of all things Kafka and Zookeeper. Zookeeper is a component within Kafka that holds all the Kafka configuration.
Setup and Configuration
An OKE Cluster is already configured as explained previously. Next, download and install the KUDO operator:
Now that the KUDO CLI is installed, we can install Kafka.
It is important to use the proper storage class for the persistent volume setup. For the KUDO Kafka implementation on OCI, we’re using the following class oci-bv, which represents the blockvolume.csi.oraclecloud.com provisioner.
As seen, for each OKE member block storage is configured:
Finally, the command to create Zookeeper and Kafka instance pods is:
Creation will be handled by the KUDO operator.
In my case, I did not use a Loadbalancer controller but configured it on the OCI Loadbalancer. So, set it to NodePort:
And do a quick verification if things are working
K8skafka is the runtime container in the POD instance.
The next step will be creating Kafka connectors. This will come in a next blog, where we set up Autonomous Transaction Processing in combination with Kafka and WebLogic 14 JMS.