Sep 30, 2015 - This article will get you part of the way there by describing how to deploy Kafka locally using Docker and test it using kafkacat.
What is Kafka? Kafka is an open-source, very scalable, distributed messaging platform by Apache. It is designed to handle large volumes of data in real-time efficiently.
Kafka works on the concept of a “publish-subscribe” methodology:. “Producers” will push content to the Kafka cluster, to a destination “topic”. A Kafka cluster is managed by Zookeeper, and can contain one or more “Brokers”. Brokers manage topics and the associated ingress and egress messages. “Consumers” subscribe to specific topics and will receive a continuous stream of data from the cluster for as long as there is a Producer pushing content to that topic. Topics have two key properties: partitions and replication factors.
Partitions logically separate the messages within a topic into small “groups” which are distributed between all available brokers. For example, if there are 3 brokers in a cluster, and a topic has 6 partitions, Broker #1 might be assigned Partition #3 & #6, Broker #2 - Partition #1 & #5, and Broker #3 - Partition #2 & #4. A replication factor helps add redundancy to a topic and its partitions across the available Brokers. Using the example above, if the same topic has a replication factor of 3, then each broker would have a copy of the partitions managed by other brokers in addition to its own. Hence, if a broker goes down, the messages stored in the partitions held by it are not lost.
The cool thing about Kafka consumers is that, when they are grouped together and used in conjunction with partitions, they can automatically load balance incoming messages between them! This is really handy for avoiding Consumer bottlenecks if you are dealing with real-time data in high volumes. If a Consumer cannot process the incoming messages fast enough, it will fall behind and you will begin to notice an increasing delay between the current time and the incoming message. If a second Consumer is added to the same group as the first, and both are subscribed to a topic with 6 partitions, then Consumer #1 might be assigned messages from Partition #1, #2, #5, and Consumer #2 might be assigned messages from Partition #3, #4, #6. This is instead of Consumer #1 bottlenecking by having to process Partitions #1-6 by itself.
Deploying Kafka in Docker greatly simplifies deployment as we do not need to manually configure each broker individually! We can use single Docker Compose file to deploy Kafka to multiple server instances using Docker Swarm in a single command.
Additionally, Docker allows us to easily scale up our cluster to additional nodes in the future if we require. Requirements When deployed via Docker, Kafka tends to use approximately 1.3 GB - 1.5 GB of RAM per broker - so make sure your server instances have enough memory allocated and available.
Process Before proceeding, make sure you have Docker installed on ALL of the servers you wish to deploy Kafka to. Part 1 - Create a Docker Swarm To begin, we need to initialize a Docker Swarm. A Swarm will consist of at least one Master Node, and one or more Worker Nodes (or, if you just have a single server instance, a single Master Node). Initialize the Swarm & Master Node On the server instance you have designated to be the Master, initialize the Swarm. This will set the current instance as the Master node: docker swarm init -advertise-addr MANAGER-IP:2377 Where MANAGER-IP is the static IP address of the server instance.
You can find this using ifconfig. This will generate a unique command which you will run on all other nodes in the next section in order to designate them as Worker nodes.
Save this command in a safe place. Example ( your command and token will be different!): docker swarm join -token SWMT.ewrp MANAGER-IP:2377 2.
Check your firewall Swarm nodes use port 2377 to communicate with the Master node which must not be blocked by any firewall. Additionally, Swarm daemons use ports 7946 (tcp/udp) and 4789 (udp) to communicate.
Make sure your all of your instances are able to communicate:. TCP/UDP traffic on ports 2377 and 7946. UDP traffic on port 4789. Configure the Worker Nodes On all other server instances you wish to deploy Kafka on, run the command generated in the step #1 above. This will add these instances to the Docker Swarm as Worker Nodes. If you are having issues:. As per Step #2 above, make sure all your instances can communicate on the specified ports.
Make sure all of your instances have static IP addresses assigned to them. Verify the Swarm On the Master Node only, run the following command to check the status of all nodes in the swarm: docker node ls Part 2 - Deploying Kafka 1.