Making this code section concurrent/parallel

I’m implementing an algorithm in Go I’ve done in various other languages.
I’ve first done a single-threaded GO version to ensure I’ve got it working.
Now I want to do a multi-threaded version, as I’ve done with the other languages.
I’m using Go 1.16.

Below is the single-threaded Go code I needed to make multi-threaded.

sums := make([]uint, pairscnt) 
lastwins := make([]uint, pairscnt)

for i, r_hi := range restwins { // sieve twinpair restracks
   l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, 
                       modpg, primes, resinvrs)
   lastwins[i] = l; sums[i] = c
   fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
}

Here I use a goroutine to make it multi-threaded.
It compiles, but seems to only executes one time, and gives incorrect numerical results.

sums := make([]uint, pairscnt)
lastwins := make([]uint, pairscnt)
  
for i, r_hi := range restwins {
   go func() {
      l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, 
                          modpg, primes, resinvrs)
      lastwins[i] = l; sums[i] = c
      fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
   }()
}

Here’s the same (working) snippet in Crystal, which has a similar concurrency model.

sums = Array(UInt64).new(pairscnt, 0)
lastwins = Array(UInt64).new(pairscnt, 0)
done = Channel(Nil).new(pairscnt)

restwins.each_with_index do |r_hi, i|
  spawn do
    lastwins[i], sums[i] = twins_sieve(r_hi, kmin, kmax, kb, start_num, 
                           end_num, modpg, primes, resinvrs)
    print "\r#{i + 1} of #{pairscnt} twinpairs done"
    done.send(nil)
end end
pairscnt.times { done.receive }

Here’s the Rust snippet, which does true parallel execution, and is fastest of all versions.

let (lastwins, sums): (Vec<_>, Vec<_>) = {
  let counter = RelaxedCounter::new();
  restwins.par_iter().map( |r_hi| {
    let out = twins_sieve(*r_hi, kmin, kmax, kb, start_num, end_num,
                          modpg, &primes.to_vec(), &resinvrs);
    print!("\r{} of {} twinpairs done", counter.increment(), pairscnt);
    out
  }).unzip()
};

I hope a Go guru can tell me what I need to do to get the code to execute correctly.

The issue is here:

for i, r_hi := range restwins {
   go func() {
      // using i and r_hi
   }()
}

In Go, functions that close over variables access them “by reference.” The for loop creates a new goroutine for each element in restwins but those goroutines share the actual i and r_hi variables, so you might encounter scenarios such as:

  • The goroutines you create within the loop don’t actually get scheduled and started until after the loop completes. If this happens, then all the goroutines see the same final value of i and r_hi.
  • Maybe the goroutines are batched together and you see some number of them observe one value for i and then some other goroutines see some other one value for i, etc.

The solution is to either:

  • Pass i and r_hi as parameters to your inner goroutine function:

    for i, r_hi := range restwins {
       go func(i, r_hi uint) {
          // ...
       }(i, r_hi)
    }
    

    The benefit to doing it this way is if you don’t have to close over any other values, your function gets turned into a “normal” function without any additional processing overhead that closures can require.

  • Or duplicate the variables outside of the function:

    for i, r_hi := range restwins {
       i, r_hi := i, r_hi
       go func() {
          // ...
       }()
    }
    

    The benefit to doing it this way is that you don’t have to repeat the type names, so if you ever change, say, from uint to uint64, you don’t need to change the types in the parameters to the function.

Go: Frequently Asked Questions (FAQ)

What happens with closures running as goroutines

Both techniques compile, and using htop I see the threads working.
However, the output values from twins_sieve aren’t being stored in the arrays,
so I don’t get the correct output values.


A Tour of Go

The Go Programming Language Specification

The Go Programming Language, Alan A. A. Donovan · Brian W. Kernighan

The Go Blog: Concurrency is not parallelism


package main

import (
	"fmt"
	"sync"
)

func main() {
	pairscnt := 7
	restwins := make([]uint, pairscnt)
	sums := make([]uint, pairscnt)
	lastwins := make([]uint, pairscnt)

	var wg sync.WaitGroup
	for i, r_hi := range restwins {
		wg.Add(1)
		go func(i int, r_hi uint) {
			defer wg.Done()
			l, c := uint(i), uint(r_hi+1)
			lastwins[i] = l
			sums[i] = c
			fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
		}(i, r_hi)
	}
	wg.Wait()
	fmt.Println()
	fmt.Println(lastwins, sums)
}

https://play.golang.org/p/sTPp7-j0iFs

7 of 7 twinpairs done
3 of 7 twinpairs done
2 of 7 twinpairs done
5 of 7 twinpairs done
4 of 7 twinpairs done
6 of 7 twinpairs done
1 of 7 twinpairs done
[0 1 2 3 4 5 6] [1 1 1 1 1 1 1]

While this runs, it make not be optimal.

Fastest Prime Sieve, in Nim

Thank you @petrus, that works. Here’s the full code segment.

sums := make([]uint, pairscnt)
lastwins := make([]uint, pairscnt)

var wg sync.WaitGroup
for i, r_hi := range restwins {
  wg.Add(1)
  go func(i, r_hi int) {
    defer wg.Done()
    l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
    lastwins[i] = l; sums[i] = c
    fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
  }(i, r_hi)
}
wg.Wait()
fmt.Printf("\r%d of %d twinpairs done", pairscnt, pairscnt)

FYI, some numbers for single|multi-threaded versions; Linux, I7-3.5GHz, 8 threads.

     input        | single-threaded | multi-threaded
__________________|_________________|_______________
  100_000_000_000 |     29.1 secs   |     5.8 secs
  500_000_000_000 |    145.6 secs   |    29.9 secs
1_000_000_000_000 |    316.9 secs   |    59.9 secs

These times from “noisy” system, browser streaming video, et al going on too.
But it shows relative differences between the versions.

However, Rust, Nim, etc, that do true parallelism are much faster for this algorithm.