Thank you @petrus, that works. Here’s the full code segment.
sums := make([]uint, pairscnt)
lastwins := make([]uint, pairscnt)
var wg sync.WaitGroup
for i, r_hi := range restwins {
	wg.Add(1)
	go func(i, r_hi int) {
		defer wg.Done()
		l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
		lastwins[i] = l
		sums[i] = c
		fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
	}(i, r_hi)
}
wg.Wait()
fmt.Printf("\r%d of %d twinpairs done", pairscnt, pairscnt)
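For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch of the same WaitGroup fan-out. It is not the real program: `sieveStub` and `runAll` are hypothetical names, and `sieveStub` just returns made-up values in place of `twins_sieve`. The one deliberate difference is the progress line, which uses an atomic counter so it reports how many goroutines have actually finished rather than the index `i + 1` of whichever one printed last.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sieveStub is a hypothetical stand-in for twins_sieve. It returns fake
// "last twin" and "count" values derived from its input so the sketch runs.
func sieveStub(rHi int) (last, count uint) {
	return uint(rHi), uint(rHi % 7)
}

// runAll starts one goroutine per residue and collects results by index.
func runAll(restwins []int) (lastwins, sums []uint) {
	pairscnt := len(restwins)
	lastwins = make([]uint, pairscnt)
	sums = make([]uint, pairscnt)

	var done int64 // number of goroutines finished so far
	var wg sync.WaitGroup
	for i, rHi := range restwins {
		wg.Add(1)
		go func(i, rHi int) {
			defer wg.Done()
			l, c := sieveStub(rHi)
			// Each goroutine writes only its own slot, so no mutex is needed.
			lastwins[i] = l
			sums[i] = c
			fmt.Printf("\r%d of %d twinpairs done", atomic.AddInt64(&done, 1), pairscnt)
		}(i, rHi)
	}
	wg.Wait() // block until every goroutine has called Done
	fmt.Println()
	return lastwins, sums
}

func main() {
	lastwins, sums := runAll([]int{11, 17, 29, 41})
	fmt.Println(lastwins, sums)
}
```

Because every goroutine writes to a distinct index of `lastwins` and `sums`, the slices themselves need no locking; only the shared progress counter does, which is why it goes through `sync/atomic`.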
FYI, here are some timings for the single- and multi-threaded versions, on Linux with a 3.5 GHz i7 (8 threads).
input             | single-threaded | multi-threaded
__________________|_________________|_______________
100_000_000_000   | 29.1 secs       | 5.8 secs
500_000_000_000   | 145.6 secs      | 29.9 secs
1_000_000_000_000 | 316.9 secs      | 59.9 secs
These times are from a “noisy” system, with a browser streaming video, etc., running at the same time.
But they show the relative difference between the two versions.
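To put a number on that relative difference, the timings in the table can be turned into speedup ratios. This is just arithmetic on the figures I posted (the `speedup` helper is a name I made up for the sketch), and it works out to roughly 5x for each input.

```go
package main

import "fmt"

// speedup returns how many times faster the multi-threaded run was.
func speedup(single, multi float64) float64 {
	return single / multi
}

func main() {
	// Timings in seconds, copied from the table above.
	runs := []struct {
		input         string
		single, multi float64
	}{
		{"100_000_000_000", 29.1, 5.8},
		{"500_000_000_000", 145.6, 29.9},
		{"1_000_000_000_000", 316.9, 59.9},
	}
	for _, r := range runs {
		fmt.Printf("%-18s %.1fx\n", r.input, speedup(r.single, r.multi))
	}
}
```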
However, the versions in Rust, Nim, etc. that do true parallelism are much faster for this algorithm.