IMPROVING THE PERFORMANCE OF ORDER SYNC
By Veeramani T — “A skilled techie”
In our organization, an important sync job runs at regular intervals. During peak times, this job slows down due to overload. To improve the order sync performance, we parallelized it with Kafka.
This blog will give you an idea of how parallelization is achieved with Kafka. If you already know Kafka, this blog will be easy to follow. Otherwise, here is a simple introduction to Kafka.
Apache Kafka is a durable, event-streaming (publish/subscribe) messaging system; that is, a messaging system that sends messages between processes, applications, and servers.
Event streaming is an implementation of the pub/sub pattern, with the difference that events are stored durably and can be re-read by consumers whenever needed.
Currently, we have two kinds of systems (ERP and CaratLane). At specific intervals, we poll data from the ERP and sync it to the CaratLane system. This sync is entirely based on events such as order creation, order updation, and invoice creation. Every order has these events, and they need to be synced to the CaratLane system in chronological order.
Order creation — we see this event when an order is created in the ERP system
Order updation — we see this event when an order's details are updated in the ERP system
Invoice creation — we see this event when an invoice is generated for the order in the ERP system
We fetch records from the ERP system based on these order creation, order updation, and invoice creation events.
If an entry is already present in the CaratLane system, we perform an update; otherwise, we create a new entry.
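As a rough sketch, this create-or-update step could look like the following. This is not our actual code: the `orders(sync_id, data)` table, the connection details, and the mysql-connector client are all illustrative assumptions, and the MySQL `INSERT ... ON DUPLICATE KEY UPDATE` statement stands in for whatever upsert logic the real sync uses.

```python
import mysql.connector  # assumed MySQL driver; any DB-API connector would work

def upsert_order(conn, sync_id, data):
    """Create the order entry if it is new, otherwise update the existing one.
    `orders(sync_id, data)` is an illustrative table, not the real schema."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO orders (sync_id, data) VALUES (%s, %s) "
        "ON DUPLICATE KEY UPDATE data = VALUES(data)",
        (sync_id, data),
    )
    conn.commit()
    cursor.close()

# Example usage (connection details are placeholders):
conn = mysql.connector.connect(
    host="localhost", user="app", password="***", database="caratlane"
)
upsert_order(conn, 1, "this data should sync to MySQL db")
```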
Under normal load the sync keeps up. However, when there is a heavy load on the ERP, the sync gets delayed, and simply parallelizing it creates a new issue: the creation, updation, and invoice creation events of an order are no longer guaranteed to be processed sequentially.
For Example,
A problem may occur if the order creation and order updation events of the same order are processed by two different workers at the same time; the update could be applied before the order even exists.
To avoid this, we use the Kafka producer-consumer approach.
As you may know, Kafka follows a producer-consumer architecture: the producer publishes the order event data to Kafka, and the consumer reads the published messages and writes them into the CaratLane system.
Secondly, we maintain three kinds of topics (online, store, and international) in this sync. If an order is placed online, it is published to the online topic; similarly, store and international orders are published to the store and international topics.
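A minimal sketch of such a producer is shown below, using the kafka-python client. The broker address, the `channel` field, and the `publish_order_event` helper are assumptions for illustration; only the three topic names come from the setup described above.

```python
import json
from kafka import KafkaProducer  # kafka-python client, used here as an example

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_order_event(order_event):
    """Publish one ERP order event to the topic of its sales channel."""
    # 'channel' is an assumed field naming where the order was placed:
    # 'online', 'store', or 'international' -- matching the three topics above.
    topic = order_event["channel"]
    producer.send(topic, value=order_event)

publish_order_event({"sync_id": 1, "channel": "online",
                     "data": "this data should sync to MySQL db",
                     "event": "order_creation"})
producer.flush()  # make sure buffered messages are actually sent
```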
Once a message is published to the Kafka topic, consumers read it and sync it to the CaratLane system.
In the new architecture, we may still face an issue if different events for the same order are published to different partitions.
For Example,
Let’s assume an order creation event message is published to partition 1 and an order updation event message for the same order is published to partition 2.
If consumers read these messages at the same time, the update may be applied before the creation, which creates a problem in the CaratLane system.
To avoid this issue, we use the partition key concept. The partition key is a built-in Kafka feature: when we use a common id as the partition key, Kafka publishes all the event data for that order to the same partition, while still load-balancing orders across partitions. Hence, Kafka topic partitioning ensures the ordering of messages is maintained.
Message 1 - {sync_id:1, data: 'this data should sync to MySQL db', event: 'order_creation'}
Message 2 - {sync_id:1, data: 'this data should sync to MySQL db', event: 'order_updation'}
Message 3 - {sync_id:1, data: 'this data should sync to MySQL db', event: 'invoice_creation'}
Message 4 - {sync_id:2, data: 'this data should sync to MySQL db', event: 'order_creation'}
Message 5 - {sync_id:2, data: 'this data should sync to MySQL db', event: 'order_updation'}
The Kafka producer will publish them as:
Message 1 to partition 1.
Message 2 to partition 1.
Message 3 to partition 1.
Message 4 to partition 2.
Message 5 to partition 2, and so on.
Messages 1, 2, and 3 above end up in one partition, and messages 4 and 5 in another. If the messages for one order were scattered across different partitions, it would create a problem.
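In code, this just means passing the sync_id as the message key when producing. Here is a sketch with the kafka-python client, reusing the five example messages above; the broker address and serializers are assumptions, and Kafka's default partitioner hashes the key to pick the partition.

```python
import json
from kafka import KafkaProducer  # kafka-python client, used here as an example

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # assumed broker address
    key_serializer=lambda k: str(k).encode("utf-8"),          # sync_id -> bytes
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

events = [
    {"sync_id": 1, "data": "this data should sync to MySQL db", "event": "order_creation"},
    {"sync_id": 1, "data": "this data should sync to MySQL db", "event": "order_updation"},
    {"sync_id": 1, "data": "this data should sync to MySQL db", "event": "invoice_creation"},
    {"sync_id": 2, "data": "this data should sync to MySQL db", "event": "order_creation"},
    {"sync_id": 2, "data": "this data should sync to MySQL db", "event": "order_updation"},
]

for event in events:
    # key = partition key: Kafka hashes it, so all events with sync_id 1 land on
    # one partition and all events with sync_id 2 land on another.
    producer.send("online", key=event["sync_id"], value=event)

producer.flush()
```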
Parallelization is simple on the consumer side. When consumers read data from the subscribed topics, they write it into the CaratLane system. We can run two or more consumers against a single topic to speed up the sync process.
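A sketch of such a consumer with kafka-python follows. Running two or more copies of this script with the same group_id lets Kafka split the topic's partitions among them, which is where the parallelism comes from; the group id, broker address, and `sync_to_caratlane` stub are illustrative assumptions, not our production code.

```python
import json
from kafka import KafkaConsumer  # kafka-python client, used here as an example

consumer = KafkaConsumer(
    "online",                            # topic to consume; store/international work the same way
    bootstrap_servers="localhost:9092",  # assumed broker address
    group_id="order-sync",               # every parallel consumer uses the same group id
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

def sync_to_caratlane(event):
    """Placeholder for the actual create/update logic against the CaratLane system."""
    print("syncing", event["event"], "for order", event["sync_id"])

# Within a partition, messages arrive in order, so the creation, updation, and
# invoice events of a single order are applied to the CaratLane system in sequence.
for message in consumer:
    sync_to_caratlane(message.value)
```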