Build a Kafka Cluster on an Oracle Kubernetes Engine

Messaging is an important part of middleware platforms. It is in fact essential for integrating and distributing data between source and target systems. With the rise of application and data integration patterns, message brokers have become an important component for handling and transferring data in a solid and secure manner, with minimal data loss during transfer.

Traditional approaches to messaging and message brokers can be divided into two categories:

Less intelligent

JMS – Java Message Service

JMS offers a set of APIs for messaging: a producer puts a message on a queue and someone else, sometime later, perhaps far away, takes the message off the queue and processes it. Producer and consumer are decoupled in both time and location. Even if the message consumer happens to be down for a while, messages are retained and can still be processed once it comes back up.

A simple JMS Queue

JMS also has a publish/subscribe capability, where the producer puts the message on a "topic". Any interested parties can subscribe to that topic, receiving messages as and when they are produced, but for now I will focus on the queue capability. This decouples the relationship between provider and consumer to some degree; however, some coupling remains.

There are several message brokers that implement the JMS API, such as Oracle WebLogic JMS and IBM WebSphere MQ. WebSphere MQ is IBM's native queueing technology, which can also be used to transport JMS messages.

More Intelligent:

Enterprise Service Bus:

An ESB adds content-based routing capabilities, but it still builds on the fundamental queuing capabilities offered by JMS.

A second form of coupling remains with an ESB: the consumer and producer have to agree on the message format. That is fine in simple cases, but when you have many producers all sending messages to the same queue, you will run into versioning problems: new message formats are introduced, but you don't want to change all the existing providers.

Message transformation is another ESB capability, used to translate between multiple data formats.

New and evolved ways of messaging

In our ever-changing, 24/7, “always on” world, the need for real-time, scalable and high-performance communication drives companies to think of new ways to provide these capabilities.

One of the most popular and widely adopted technologies around is Kafka, an open-source technology provided by the Apache Software Foundation. It was originally written at LinkedIn and donated to the open-source community in 2011.

Kafka is quite popular, according to this chart by Datalabs©

Kafka is a "topic-based" platform: a low-latency, high-throughput stream-processing platform capable of handling near real-time data feeds. It is written mainly in Java and Scala.

The platform consists of a set of APIs (connectors) and topics (message containers), held together within a cluster to provide high availability, high-performance messaging and scalability. The picture shows a very simple representation of a Kafka platform:

As noted, this is a very simple representation. Over the years, Kafka has been extended with newer APIs called connectors, which can be used to integrate with almost any technology.
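To make the topic model concrete, here is a minimal sketch using the standard command-line tools that ship with every Kafka distribution (the topic name "orders" and the broker address are just examples; adjust the paths to your installation):

    # Create a topic with three partitions (assumes a broker on localhost:9092)
    bin/kafka-topics.sh --create --topic orders \
      --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

    # Publish messages to the topic (each line typed becomes a message)
    bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092

    # Any number of consumers can subscribe to the same topic independently
    bin/kafka-console-consumer.sh --topic orders \
      --bootstrap-server localhost:9092 --from-beginning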

This article is not meant as a deep dive into Kafka, but I think a little context is necessary. There are already many books, articles and courses about Kafka and streams technology. In this article, I would like to focus on how to build a Kafka cluster in a cloud and container-native topology.

Kafka in the Cloud

If you say Kafka, the name Confluent immediately pops up. Confluent is a company that "does Big Data" and was founded by some of the original Kafka creators. They provide a complete solution built on Kafka, with all the tools a data integration and streaming platform needs.

You can run Kafka in your own datacenter or on premises as part of your infrastructure. However, with today's cloud strategies, more and more companies run Kafka in the cloud through some form of cloud subscription. All the major technology players have an end-to-end PaaS solution built with Kafka:

  • Microsoft Azure Event Hubs and Confluent
  • Amazon Managed Streaming for Apache Kafka (MSK)
  • Google Cloud – Integrated and hosting Confluent Cloud
  • Confluent Cloud
  • Oracle OCI Event Hub

All these solutions have their own implementations, infrastructures, connectors and more, but in the end, they are more or less the same.

I do not want to discuss these solutions for now. Instead, I want to focus on how to build a Kafka Cluster on a Kubernetes Platform in the Oracle Cloud.

Why Kafka on Kubernetes (in the Cloud)?

The managed solutions discussed above are all fine, but when your situation is a hybrid between cloud and on premises, you may want to keep some control yourself. Apache Kafka on Kubernetes is a perfect fit for cloud-native development. Cloud-native applications are independent, loosely coupled and distributed services that deliver high scalability via the cloud, and the same applies to event-driven applications built on Kafka: they are designed to scale across a distributed hybrid cloud environment. Another benefit of running Kafka on Kubernetes is infrastructure abstraction: it can be configured once and run everywhere. This also pays off if you build your infrastructure using the "Configuration as Code" method.

The flexible scalability of Kubernetes suits a solution like Kafka well. Kubernetes allows applications to scale resources up and down with a simple command, or automatically based on usage, as shown below. Kubernetes can also span different platform implementations (cloud, on-premises, hybrid clouds, and different operating systems).
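As a sketch of how simple that is (the deployment name my-app is hypothetical):

    # Scale a workload to five replicas with a single command
    kubectl scale deployment my-app --replicas=5

    # Or let Kubernetes scale it automatically based on CPU usage
    kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10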

Kafka on Oracle Kubernetes Engine (OKE)

You can run your Kubernetes clusters on premises, but I chose a hybrid solution. Most companies are not completely cloud-based; hybrid situations will always exist, and they will become more and more common in the coming years.

There are a few options to build a Kafka Cluster on Kubernetes:

  • Build it yourself completely, which means setting up compute instances, storage, container images, and a Kubernetes cluster
  • Use a Kubernetes operator, which does a lot of the work for you. There are a few, such as:
    o   Strimzi Operator
    o   KUDO Operator

For building Kafka on OKE, I used the following components:

  • Oracle Cloud Infrastructure (OCI), EU-Frankfurt region
  • Block Storage and NFS
  • 4 Compute instances (as part of the OKE Cluster)
  • OKE Node Pool of 4 instances
  • Cloud Networking and Load Balancer
  • Kubernetes version 1.19.7
  • KUDO Operator

Schematic overview of the architecture

An operator, in Kubernetes terms, is an extension of the Kubernetes API. It handles and manages everything that is not native to Kubernetes, while cooperating with all the standard Kubernetes features. Many operators have been developed, and you can even write your own if required. In this case, the operator handles the creation of all things Kafka and ZooKeeper. ZooKeeper is the coordination service that Kafka uses to store its cluster configuration and state.

Setup and Configuration

An OKE cluster is already configured, as explained previously. Next, download and install the KUDO operator:
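A minimal sketch of the installation, assuming the krew plugin manager for kubectl is available (the KUDO CLI can also be installed with Homebrew or from the GitHub release binaries):

    # Install the KUDO CLI as a kubectl plugin
    kubectl krew install kudo

    # Verify the CLI works
    kubectl kudo version

    # Initialize KUDO in the cluster (installs the KUDO manager and its CRDs)
    kubectl kudo init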

Now that the KUDO CLI is installed, we can install Kafka.

Storage class

It is important to use the proper storage class for the persistent volume setup. For the KUDO Kafka implementation on OCI, we use the storage class oci-bv, which represents the blockvolume.csi.oraclecloud.com provisioner.
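You can check that the class is available in the cluster (on recent OKE versions, oci-bv is present by default):

    # List the storage classes known to the cluster
    kubectl get storageclass

    # Inspect the oci-bv class and its provisioner
    kubectl describe storageclass oci-bv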

As shown, block storage is configured for each OKE cluster member:
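The claims and the block volumes they are bound to can be listed with standard kubectl commands:

    # Show the persistent volume claims and their bound volumes
    kubectl get pvc
    kubectl get pv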

Finally, the commands to create the ZooKeeper and Kafka instance pods:
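A sketch following the KUDO Kafka documentation; the instance names are examples, and the STORAGE_CLASS parameter is an assumption here, so check the operator's parameter list for your version:

    # Install ZooKeeper first; Kafka depends on it
    kubectl kudo install zookeeper --instance=zookeeper-instance -p STORAGE_CLASS=oci-bv

    # Then install Kafka itself
    kubectl kudo install kafka --instance=kafka-instance -p STORAGE_CLASS=oci-bv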

Creation will be handled by the KUDO operator.

In my case, I did not use a load balancer controller, but configured the listener on the OCI load balancer instead. So the Kafka service is set to NodePort:
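One way to do this is to put a dedicated NodePort service in front of the brokers. This is a sketch: the selector below assumes KUDO's default instance label, so verify it with kubectl get pods --show-labels. Save the manifest as kafka-external.yaml:

    # kafka-external.yaml: NodePort service in front of the Kafka brokers
    apiVersion: v1
    kind: Service
    metadata:
      name: kafka-external
    spec:
      type: NodePort
      selector:
        kudo.dev/instance: kafka-instance
      ports:
        - port: 9092
          targetPort: 9092
          nodePort: 30092

Then apply it:

    kubectl apply -f kafka-external.yaml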

And do a quick verification to see if things are working:
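For example (the pod name and Kafka paths follow the KUDO Kafka defaults; adjust them if yours differ):

    # The ZooKeeper and Kafka pods should all be Running
    kubectl get pods

    # Produce a test message from inside a broker pod
    kubectl exec -it kafka-instance-kafka-0 -c k8skafka -- bash -c \
      "echo hello | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test"

    # And consume it again
    kubectl exec -it kafka-instance-kafka-0 -c k8skafka -- bash -c \
      "/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --max-messages 1"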

k8skafka is the name of the runtime container inside the Kafka pod.

Next Steps

The next step will be creating Kafka connectors. This will be covered in a future blog post, in which we will set up Autonomous Transaction Processing in combination with Kafka and WebLogic 14 JMS.

About the author Michel Schildmeijer

Having made his start in the pharmacy sector, Michel transitioned to IT in 1996, working on a UNIX TTY terminal-based system and the MUMPS language. He currently works as a solutions architect at Qualogy, with a focus on middleware, application integration and service-oriented architecture. His passion for middleware started in 2000 when working as a support analyst for a financial institute with BEA WebLogic and Tuxedo. Michel is an expert on the WebLogic platform. He serves customers in his role as architect and advises them in all aspects of their IT landscape. He became an Oracle ACE in 2012 and wrote two books about WebLogic: Oracle WebLogic Server 11gR1 PS2: Administration Essentials and Oracle WebLogic Server 12c: First Look. He is a well-known speaker at national and international conferences and is recognised as an official Oracle Speaker. Read his blog: https://community.oracle.com/blogs/mnemonic
