Kicking off Goroutines as fast as possible?

I have been working on a terminal app and need to load data from 1,250 URLs. Currently I am doing the following:

var wg sync.WaitGroup
var lock sync.RWMutex

for _, text := range urls {
    time.Sleep(time.Millisecond * 1) // crude throttle so connections don't pile up
    wg.Add(1)
    go parseData(text, &wg, &lock)
}
wg.Wait()

func parseData(url string, wg *sync.WaitGroup, lock *sync.RWMutex) {
    defer wg.Done()

    htmlBody := readHTML(url)

    lock.Lock()
    defer lock.Unlock()
    addToMap(htmlBody)
}

Is there a better way to spin up these Goroutines so they execute as fast as the connection allows? Without the sleep call, the Goroutines are created much too fast and I get a "too many connections" error.

However, if I hold the lock around the readHTML call and let them all spin up "at once", it takes far too long to get all the data.

Something I noticed with the time.Sleep() approach is that the HTML fetch still occasionally errors out. It's rare, maybe 1 in 100, but still undesirable.

Any and all thoughts are welcome, thanks!

Find out what your connection limit is and use a pool of workers.

My assumption is that you are hitting the operating system's file-descriptor limit, or that the other side is denying connections because you are too fast.
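
As a rough sketch (it reuses the readHTML, addToMap, lock, and urls from your post, and the worker count is a placeholder you should tune to your actual limit):

const numWorkers = 20 // placeholder; set to your measured connection limit

jobs := make(chan string)
var wg sync.WaitGroup

// Start a fixed number of workers; at most numWorkers fetches run at once.
for i := 0; i < numWorkers; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for url := range jobs {
            htmlBody := readHTML(url)
            lock.Lock()
            addToMap(htmlBody)
            lock.Unlock()
        }
    }()
}

// Feed the pool, then close the channel so the workers exit.
for _, url := range urls {
    jobs <- url
}
close(jobs)
wg.Wait()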


Edit

Instead of a worker pool you could also use other means of rate limiting, e.g. a token bucket implementation.
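
A minimal sketch using golang.org/x/time/rate (import it along with context; the rate and burst values here are made up, tune them to what the servers tolerate):

limiter := rate.NewLimiter(rate.Limit(50), 10) // ~50 requests/s, bursts of up to 10

for _, url := range urls {
    // Wait blocks until a token is available, pacing the goroutine launches.
    if err := limiter.Wait(context.Background()); err != nil {
        break
    }
    wg.Add(1)
    go parseData(url, &wg, &lock)
}
wg.Wait()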

Instead of the mutex, you may want to use sync.Map, and you should use some rate limiter anyway, as @NobbZ said, so you don't saturate your resources as the list of URLs grows.
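
For example, a sketch assuming you key the results by URL (your addToMap may store something else):

var pages sync.Map // URL -> HTML body; safe for concurrent use without an explicit lock

func parseData(url string, wg *sync.WaitGroup) {
    defer wg.Done()
    pages.Store(url, readHTML(url))
}

Reading the results back later can be done with pages.Range(func(key, value any) bool { ... }), returning true to keep iterating.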

Thanks @NobbZ and @acim, I'll read up on these solutions and give them a try.

#AlwaysLearning :nerd_face:
