

March 16, 2023
Veeramani T

This article explains how parallelization can be achieved with Kafka when a sync job is overloaded.


In our organization, an important order sync job runs at regular intervals. During peak times, this sync job slows down due to overload. To improve order sync performance, we parallelized it with Kafka.

This blog will give you an idea of how parallelization is achieved in Kafka. If you already know Kafka, it will be easy to follow. Otherwise, here is a simple introduction to Kafka.

Apache Kafka

Apache Kafka is a durable, event-streaming (publish-subscribe) messaging system, that is, a system that sends messages between processes, applications, and servers.

Event streaming is an implementation of the pub/sub pattern but with these changes:

  • Events occur instead of messages.
  • Events are ordered, typically by time.
  • Consumers can read events from a specific point on the topic.
  • Events are durably retained for a period of time.

Old Architecture:

We have two systems: ERP and CaratLane. At specific intervals, we poll data from the ERP and sync it to the CaratLane system. This sync is fully event-based, driven by events such as order creation, order updation, and invoice creation. Every order has these events, which must be synced to the CaratLane system in chronological order.

Brief about events:

Order creation — we can see this event when an order is created in the ERP system

Order updation — we can see this event when order detail is updated in the ERP system

Invoice creation — we can see this event when an invoice is generated in the ERP system

Apache Events

What are we doing in this sync?

We fetch records from the ERP system based on the order creation, order updation, and invoice creation events.

If an entry is already present in the CaratLane system, we perform an update; otherwise, we create a new entry.
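The create-or-update step above can be sketched as a simple upsert (a minimal sketch: the `sync_record` helper and the in-memory dict standing in for the CaratLane database are hypothetical, not the actual sync code):

```python
# Minimal upsert sketch: update the entry if it exists, else create it.
# The dict stands in for the CaratLane database (hypothetical).
caratlane_db = {}

def sync_record(order_id, payload):
    """Create or update an order entry keyed by order_id."""
    if order_id in caratlane_db:
        caratlane_db[order_id].update(payload)   # existing entry: update
        return "updated"
    caratlane_db[order_id] = dict(payload)       # new entry: create
    return "created"
```

The same upsert runs for every event type, which is why the events of one order must arrive in chronological order.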

Problems with the old architecture:

Generally, the sync runs at regular intervals. However, when the ERP is under heavy load, issues appear in the CaratLane system: the sync gets delayed, and parallel processing can break the required sequence of order creation, order updation, and invoice creation.

For Example,

A problem may occur if the order creation and order updation events of the same order are processed by two different processors at the same time.

To avoid this, we can use the Kafka producer-consumer approach.

New Architecture:


As you may know, Kafka follows a producer-consumer architecture. The producer publishes the order event data to Kafka, and the consumer reads the published messages and writes them to the CaratLane system.

Apache Kafka Architecture diagram

Secondly, we maintain three topics (online, store, and international) in this sync. If an order is placed online, it is published to the online topic; similarly, store and international orders are published to the store and international topics.

Once a message is published to a Kafka topic, consumers read it and sync it to the CaratLane system.
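The channel-based topic routing described above can be sketched as follows (a broker-free sketch: the channel names match this article, but the `route_topic` helper and the in-memory topic lists are hypothetical stand-ins for real Kafka topics):

```python
# Route each order to a topic based on its sales channel,
# mirroring the online/store/international split described above.
from collections import defaultdict

topics = defaultdict(list)  # in-memory stand-in for Kafka topics

def route_topic(order):
    """Pick the topic name from the order's channel
    ('online', 'store', or 'international')."""
    return f"{order['channel']}_orders"

def publish(order):
    topics[route_topic(order)].append(order)

publish({"order_id": 1, "channel": "online"})
publish({"order_id": 2, "channel": "store"})
```

In production, `publish` would be a call to a real Kafka producer; the routing decision itself stays the same.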

Problem with the New Architecture:

In the new architecture, we may still face an issue if different events for the same order are published to different partitions.

For Example,

Let’s assume an order creation event message is published to partition 1, and the order updation event message for the same order is published to partition 2.

If consumers read these messages at the same time, the events may be processed out of order, which creates a problem in the CaratLane system.

To avoid this issue, we use the partition key concept. The partition key is a built-in Kafka feature. When we use a common ID as the partition key, Kafka publishes all event data for that order to the same partition while still load-balancing messages across partitions. Kafka's topic partitioning thus ensures that messages are ordered correctly within each partition.

Message 1 - {sync_id:1, data: 'this data should sync to MySQL db', event: 'order_creation'}
Message 2 - {sync_id:1, data: 'this data should sync to MySQL db', event: 'order_updation'}
Message 3 - {sync_id:1, data: 'this data should sync to MySQL db', event: 'invoice_creation'}
Message 4 - {sync_id:2, data: 'this data should sync to MySQL db', event: 'order_creation'}
Message 5 - {sync_id:2, data: 'this data should sync to MySQL db', event: 'order_updation'}

The Kafka publisher will publish:

  • Message 1 to partition 1
  • Message 2 to partition 1
  • Message 3 to partition 1
  • Message 4 to partition 2
  • Message 5 to partition 2

Messages 1, 2, and 3 must land in one partition, and messages 4 and 5 in another. If these messages were scattered across partitions, ordering problems could occur.
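This keyed routing can be sketched with the common `hash(key) % num_partitions` scheme (a simplified sketch of the idea only; Kafka's default partitioner actually uses murmur2 hashing of the key bytes):

```python
# Sketch of key-based partitioning: messages sharing a sync_id
# always map to the same partition, preserving per-order ordering.
NUM_PARTITIONS = 2

def choose_partition(sync_id, num_partitions=NUM_PARTITIONS):
    # Simplified stand-in for Kafka's key hashing (murmur2 in the real client).
    return hash(str(sync_id)) % num_partitions

messages = [
    {"sync_id": 1, "event": "order_creation"},
    {"sync_id": 1, "event": "order_updation"},
    {"sync_id": 1, "event": "invoice_creation"},
    {"sync_id": 2, "event": "order_creation"},
    {"sync_id": 2, "event": "order_updation"},
]

partitions = [choose_partition(m["sync_id"]) for m in messages]
```

Because the partition is a pure function of the key, all five messages for a given `sync_id` land together, no matter how many partitions the topic has.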

“Parallelization is simple on the consumer side. When consumers read data from their subscribed topics, they publish it to the CaratLane system. We can create two or more consumers for a single topic to speed up this sync process, since each partition is consumed by only one consumer in a group.”
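The scale-out idea above can be sketched as partition assignment within a consumer group (a minimal sketch: real assignment is handled by Kafka's group coordinator, and the round-robin helper and consumer names below are hypothetical):

```python
# Sketch of consumer-group parallelism: the partitions of one topic are
# divided among consumers, so each partition's messages stay ordered
# while different partitions are processed in parallel.
def assign_partitions(partitions, consumers):
    """Round-robin partitions across consumers (simplified stand-in
    for Kafka's group coordinator)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Four partitions shared by two consumers in one group.
assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
```

Note the limit this implies: adding more consumers than partitions gains nothing, because the extra consumers sit idle.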

Meet the team!


Veeramani T


Ramyaraghavan R

We at CaratLane are solving some of the most intriguing challenges to make our mark in the relatively uncharted omnichannel jewellery industry. If you are interested in tackling such obstacles, feel free to drop your updated resume/CV to