Correct Kafka producer choice for REST endpoint under heavy load

Hi, everyone! Thank you for trying to help, I will be brief.

To learn Kafka, I am trying the following:

  • I have a REST endpoint that pushes JSON request data into Kafka.
  • I want publishing into Kafka to be asynchronous
  • this way HTTP handler can immediately return HTTP response 202

I have read about Kafka, and watched Confluent’s video on YouTube.
I am still not able to make a confident decision between using sync or async Kafka producer.
This is where I need your help. Before I continue, allow me to provide some code:

func SomeGinHandler(c *gin.Context) {
    // assume we extracted data from request's JSON
    // data is stuffed into someValue byte array

	/* what comes is a very sensitive spot, 
       because Publish() may place data on the queue, 
       but still somehow fail; now we have "dirty" data in the queue 
       and I see no way to remove it from queue at this moment
    */

	err = kafkaProducer.Publish(c.Request.Context(), []byte(someKey), someValue))
	if err != nil {
		c.AbortWithStatusJSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}

    // the point is that above Publish() call should not block but also be reliable
    // I may not lose message from this HTTP handler
    // reading through Kafka so far, I fear this is not possible 
    // I guess I must make a tradeoff, but what is the correct choice??
    // stick with synchronous producer, with ack = waitALL ??
    // or maybe async producer can somehow work here ?? how ??

	c.JSON(http.StatusAccepted, gin.H{"message" : "request accepted"})
}

As I have described in the code comments, I want Kafka publishing code to immediately return, so REST handler doesn’t get blocked.
.
I will process data in a separate microservice, using Kafka consumer group.

I may not afford message loss, and after reading about async producer, I found out it just “fires and forgets” the message, so message loss is possible.

Apparently, I must make a tradeoff, so what approach do you suggest in the above scenario, taking into consideration that REST endpoint will be under heavy load?

Regarding libraries, I am leaning towards franz-go, but sarama seems ok too. I would like to avoid CGO dependency, but if needed confluent’s library will do too. If you have some other library in mind I will take it into consideration as well. The nature of the problem I face is of design issue, so I do not expect library will magically solve it.

This sounds unrelated to what the two ends are. The basic logic is this:

new req -> handle some time -> to db -> make resp

Generally speaking, for the sake of strong data consistency, strict processing will be adopted. Maybe you should try some methods such as wal to record the treatment plan if the writing fails.

I guess synchronous producer is the way to go, because reliability beats asynchronicity in my case. I guess it will always be a tradeoff, so I have decided to follow along your advice and chose reliability because high load could be solved with horizontal scaling, thus Kafka sync producer could become fast enough to not become a bottleneck.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.