What's the best way to make thousands of API requests in parallel in Go?

Hey gophers,

I need to fetch 5,000+ items from the same endpoint simultaneously. The API endpoint returns 500 items at a time as JSON, and I have to iterate over the offset parameter to fetch the rest. Each item contains information about a specific product, resulting in 5,000+ unique products.

I have two main requirements:
a) I want to make all the requests in parallel.
b) I want to explore the best methods for processing the data, including how to save the data and the most suitable data structure.

I think this is probably too vague to get a super detailed response. But often, APIs that return an offset do so because they’re meant to be called serially. Is there a way you can get a count beforehand, so you can determine how many sets of 500 items you need to pull?
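For example, if a total count is available somewhere (I’m just hard-coding a made-up number below), you can precompute every offset up front and then treat each page as an independent fetch:

```go
package main

import "fmt"

const pageSize = 500

func main() {
	// Hypothetical: total comes from a preliminary request or a
	// dedicated count endpoint; hard-coded here for illustration.
	total := 5234

	// Precompute every offset so each page can be fetched
	// independently instead of walking the offsets serially.
	var offsets []int
	for off := 0; off < total; off += pageSize {
		offsets = append(offsets, off)
	}
	fmt.Println(offsets) // [0 500 1000 ... 5000]
}
```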

Doing things concurrently is a relatively simple problem to solve. Check out this simple example from The Go Programming Language. You can even read this part of that book for free. Scroll to page 36 of that PDF (page 17 of the book). I have recommended that book often and for good reason; it is excellent.
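To give you the gist of that pattern here (a rough, untested sketch; the URL and the limit/offset parameter names are placeholders for your real endpoint):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	ch := make(chan string)
	offsets := []int{0, 500, 1000} // precomputed as above

	// One goroutine per page; each reports its result on ch.
	for _, off := range offsets {
		go fetch(off, ch)
	}
	for range offsets {
		fmt.Println(<-ch) // collect results as they arrive
	}
}

// fetch GETs a single page and sends a one-line summary on ch.
func fetch(offset int, ch chan<- string) {
	url := fmt.Sprintf("https://api.example.com/products?limit=500&offset=%d", offset)
	resp, err := http.Get(url)
	if err != nil {
		ch <- err.Error()
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		ch <- err.Error()
		return
	}
	ch <- fmt.Sprintf("offset %d: %d bytes", offset, len(body))
}
```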

In terms of data structures, if you’re talking about Go structs to represent unmarshalled JSON, take some example JSON and paste it into the JSON-to-Go tool to generate a struct. That will at least be a good starting point.
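For instance, if your responses looked something like the made-up JSON below, the generated structs would come out roughly like this, and unmarshalling is a single call:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Product and Page are guesses at the shape; paste your real JSON
// into the generator to get the actual fields and tags.
type Product struct {
	ID    int     `json:"id"`
	Name  string  `json:"name"`
	Price float64 `json:"price"`
}

type Page struct {
	Items []Product `json:"items"`
}

func main() {
	data := []byte(`{"items":[{"id":1,"name":"widget","price":9.99}]}`)
	var page Page
	if err := json.Unmarshal(data, &page); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", page.Items[0]) // {ID:1 Name:widget Price:9.99}
}
```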

I have a lot of experience interacting with external APIs, and I can tell you this: getting the best performance here is probably going to involve some trial and error. You might try to get too clever with concurrency and find yourself rate limited, or saturate your/their connection. My best advice is: start with a simple solution that works and don’t worry about performance too much yet. 5,000 products doesn’t sound like a huge amount of data to me, so performance might not even be a concern.
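If you do run into limits, a common middle ground is to cap the number of in-flight requests using a buffered channel as a semaphore. Untested sketch; the cap of 10 and the URL are placeholders you’d tune by trial and error:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	offsets := []int{0, 500, 1000, 1500, 2000}

	// Buffered channel as a counting semaphore: at most 10
	// requests in flight at once.
	sem := make(chan struct{}, 10)
	var wg sync.WaitGroup

	for _, off := range offsets {
		wg.Add(1)
		go func(off int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done

			// Placeholder URL; substitute your real endpoint.
			url := fmt.Sprintf("https://api.example.com/products?limit=500&offset=%d", off)
			resp, err := http.Get(url)
			if err != nil {
				fmt.Println(err)
				return
			}
			resp.Body.Close()
			fmt.Println(url, resp.Status)
		}(off)
	}
	wg.Wait()
}
```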

In summary: keep it simple. Make something that works. Tweak it over time to achieve better performance. But that’s just my $.02, and somebody else might have more specific advice! It might also help if you mention which API you’re using.


The endpoint is private access. I can get around the rate limits using whitelisted proxies.

By data structures, I mean I will have to save the JSON response in some form (Go slices, arrays, or even Go structs like you said) so that I can put each product through a set of conditions and business logic to filter them.

I did what you recommended in Python a few weeks back, and it got very complicated when I wanted to improve performance. This is why I switched to Go. In your opinion, is it much easier to pivot in Go when I need the extra performance boost?

OK, in that case, that link I gave you above is going to be really useful, I would imagine. Just paste your example JSON in and you will have a great starting point. It was built by the author of Caddy, and I use it all the time.

I don’t know that I’d say that. But in general, refactoring in statically typed languages is easier and less error-prone. And easy concurrency is certainly one of Go’s selling points; if you want a good book on this, Katherine Cox-Buday’s Concurrency in Go is excellent. Try to focus on keeping your functions short and sweet where you can, because refactoring small functions is easier than refactoring a giant chunk of spaghetti code.

But once again, I’d suggest just getting things working and then deciding how/where you need to optimize. Do you even know that pulling data from the API is your bottleneck? Maybe whatever processing you do to the products is the slow part. Maybe you need to synchronously pull data from the API and send the products on a channel to be asynchronously processed/saved/whatever. My point with this is: I don’t know, and you don’t know either. Yet. :)
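A concrete version of that pipeline idea might look like this (everything here, the Product type, the fake producer, the filter condition, is a placeholder for your actual data and logic):

```go
package main

import "fmt"

// Product is a placeholder for whatever your unmarshalled JSON gives you.
type Product struct {
	ID    int
	Price float64
}

func main() {
	products := make(chan Product)

	// Producer: pull pages from the API (faked here) and send each
	// product down the channel as soon as it arrives.
	go func() {
		defer close(products)
		for id := 1; id <= 5; id++ {
			products <- Product{ID: id, Price: float64(id) * 3}
		}
	}()

	// Consumer: apply the business-logic filter as items stream in,
	// instead of waiting for all 5,000 to be fetched first.
	var kept []Product
	for p := range products {
		if p.Price < 10 { // placeholder condition
			kept = append(kept, p)
		}
	}
	fmt.Printf("kept %d products: %+v\n", len(kept), kept)
}
```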

Also, one other argument for Go: testing/benchmarking is built into the language and incredibly easy (and there is a culture of benchmarking). Want to know if something is slow? Benchmark it.
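For example, a benchmark is just a function in a _test.go file, run with go test -bench=. (the filter function here is a made-up stand-in for your per-product processing):

```go
package main

import "testing"

// filter is a stand-in for whatever per-product processing you do.
func filter(prices []float64) []float64 {
	var out []float64
	for _, p := range prices {
		if p < 10 {
			out = append(out, p)
		}
	}
	return out
}

// BenchmarkFilter times filter over a 5,000-element slice,
// roughly the size of the dataset in question.
func BenchmarkFilter(b *testing.B) {
	prices := make([]float64, 5000)
	for i := range prices {
		prices[i] = float64(i % 20)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		filter(prices)
	}
}
```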
