Kicking off Goroutines as fast as possible?


(Jaron R. Swab) #1

I have been working on a terminal app and need to load data from 1,250 URLs. Currently I am doing the following:

var wg sync.WaitGroup
var lock sync.RWMutex

for _, text := range urls {
    time.Sleep(time.Millisecond * 1)
    wg.Add(1)
    go parseData(text, &wg, &lock)
}
wg.Wait()

// WaitGroup and RWMutex must be passed by pointer; copying them
// (as in the original) breaks their internal state.
func parseData(url string, wg *sync.WaitGroup, lock *sync.RWMutex) {
    defer wg.Done()

    htmlBody := readHTML(url)

    lock.Lock()
    defer lock.Unlock()
    addToMap(htmlBody)
}

Is there a better way to spin up these Goroutines so they could execute as fast as the connection allows? Without the sleep function it creates the Goroutines much too fast and gives a “too many connections” error.

However, if I have the lock around the readHTML function and let them spin up all “at once” it takes far too long to get all the data.

Something I noticed with the time.Sleep() function in this usage is that occasionally the HTML still errors out. It’s rare, maybe 1 in 100, but still undesirable.

Any and all thoughts are welcome, thanks!


(Norbert Melzer) #2

Find out what your connection limit is and use a pool of workers.
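A minimal sketch of such a worker pool: a fixed number of goroutines pull URLs from a shared channel, so at most `workers` connections are open at once. `fetchAndParse` here is a stand-in for the original `readHTML` + `addToMap` so the example is self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAndParse stands in for readHTML + addToMap; it just returns
// a fake body so this example runs without a network.
func fetchAndParse(url string) string {
	return "body of " + url
}

// crawl processes urls with at most `workers` concurrent goroutines
// and collects the results into a map guarded by a mutex.
func crawl(urls []string, workers int) map[string]string {
	jobs := make(chan string)
	results := make(map[string]string)

	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range jobs {
				body := fetchAndParse(url)
				mu.Lock()
				results[url] = body
				mu.Unlock()
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	res := crawl([]string{"a", "b", "c", "d", "e"}, 2)
	fmt.Println(len(res)) // 5
}
```

With 1,250 URLs you would tune `workers` to whatever your connection and the remote server tolerate; the lock is only held around the map write, not the fetch.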

My assumption is that you are hitting the operating system's file descriptor limit, or the other side is denying connections because you are connecting too fast.


Edit

Instead of a worker pool you could also use other means of rate limiting, e.g. a token bucket implementation.


(Boban Acimovic) #3

Instead of the mutex, you may want to use sync.Map, and you should use a rate limiter anyway, as @NobbZ said, so you don't saturate your resources as the list of URLs grows.


(Jaron R. Swab) #4

Thanks @NobbZ and @acim I’ll read up on these solutions and give them a try.

#AlwaysLearning :nerd_face:


(system) closed #5

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.