Making this code section concurrent/parallel

jzakiya · February 20, 2021, 7:56pm

I’m implementing an algorithm in Go I’ve done in various other languages.
I’ve first done a single-threaded GO version to ensure I’ve got it working.
Now I want to do a multi-threaded version, as I’ve done with the other languages.
I’m using Go 1.16.

Below is the single-threaded Go code I needed to make multi-threaded.

sums := make([]uint, pairscnt) 
lastwins := make([]uint, pairscnt)

for i, r_hi := range restwins { // sieve twinpair restracks
   l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, 
                       modpg, primes, resinvrs)
   lastwins[i] = l; sums[i] = c
   fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
}

Here I use a goroutine to make it multi-threaded.
It compiles, but seems to only executes one time, and gives incorrect numerical results.

sums := make([]uint, pairscnt)
lastwins := make([]uint, pairscnt)
  
for i, r_hi := range restwins {
   go func() {
      l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, 
                          modpg, primes, resinvrs)
      lastwins[i] = l; sums[i] = c
      fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
   }()
}

Here’s the same (working) snippet in Crystal, which has a similar concurrency model.

sums = Array(UInt64).new(pairscnt, 0)
lastwins = Array(UInt64).new(pairscnt, 0)
done = Channel(Nil).new(pairscnt)

restwins.each_with_index do |r_hi, i|
  spawn do
    lastwins[i], sums[i] = twins_sieve(r_hi, kmin, kmax, kb, start_num, 
                           end_num, modpg, primes, resinvrs)
    print "\r#{i + 1} of #{pairscnt} twinpairs done"
    done.send(nil)
end end
pairscnt.times { done.receive }

Here’s the Rust snippet, which does true parallel execution, and is fastest of all versions.

let (lastwins, sums): (Vec<_>, Vec<_>) = {
  let counter = RelaxedCounter::new();
  restwins.par_iter().map( |r_hi| {
    let out = twins_sieve(*r_hi, kmin, kmax, kb, start_num, end_num,
                          modpg, &primes.to_vec(), &resinvrs);
    print!("\r{} of {} twinpairs done", counter.increment(), pairscnt);
    out
  }).unzip()
};

I hope a Go guru can tell me what I need to do to get the code to execute correctly.

skillian · February 20, 2021, 8:05pm

The issue is here:

for i, r_hi := range restwins {
   go func() {
      // using i and r_hi
   }()
}

In Go, functions that close over variables access them “by reference.” The for loop creates a new goroutine for each element in restwins but those goroutines share the actual i and r_hi variables, so you might encounter scenarios such as:

The goroutines you create within the loop don’t actually get scheduled and started until after the loop completes. If this happens, then all the goroutines see the same final value of i and r_hi.
Maybe the goroutines are batched together and you see some number of them observe one value for i and then some other goroutines see some other one value for i, etc.

The solution is to either:

Pass i and r_hi as parameters to your inner goroutine function:
```
for i, r_hi := range restwins {
   go func(i, r_hi uint) {
      // ...
   }(i, r_hi)
}
```
The benefit to doing it this way is if you don’t have to close over any other values, your function gets turned into a “normal” function without any additional processing overhead that closures can require.
Or duplicate the variables outside of the function:
```
for i, r_hi := range restwins {
   i, r_hi := i, r_hi
   go func() {
      // ...
   }()
}
```
The benefit to doing it this way is that you don’t have to repeat the type names, so if you ever change, say, from uint to uint64, you don’t need to change the types in the parameters to the function.

petrus · February 20, 2021, 8:45pm

Go: Frequently Asked Questions (FAQ)

What happens with closures running as goroutines

jzakiya · February 20, 2021, 10:49pm

Both techniques compile, and using htop I see the threads working.
However, the output values from twins_sieve aren’t being stored in the arrays,
so I don’t get the correct output values.

petrus · February 20, 2021, 11:54pm

A Tour of Go

The Go Programming Language Specification

The Go Programming Language, Alan A. A. Donovan · Brian W. Kernighan

The Go Blog: Concurrency is not parallelism

package main

import (
	"fmt"
	"sync"
)

func main() {
	pairscnt := 7
	restwins := make([]uint, pairscnt)
	sums := make([]uint, pairscnt)
	lastwins := make([]uint, pairscnt)

	var wg sync.WaitGroup
	for i, r_hi := range restwins {
		wg.Add(1)
		go func(i int, r_hi uint) {
			defer wg.Done()
			l, c := uint(i), uint(r_hi+1)
			lastwins[i] = l
			sums[i] = c
			fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
		}(i, r_hi)
	}
	wg.Wait()
	fmt.Println()
	fmt.Println(lastwins, sums)
}

7 of 7 twinpairs done
3 of 7 twinpairs done
2 of 7 twinpairs done
5 of 7 twinpairs done
4 of 7 twinpairs done
6 of 7 twinpairs done
1 of 7 twinpairs done
[0 1 2 3 4 5 6] [1 1 1 1 1 1 1]

While this runs, it make not be optimal.

Fastest Prime Sieve, in Nim

jzakiya · February 21, 2021, 2:37am

Thank you @petrus, that works. Here’s the full code segment.

sums := make([]uint, pairscnt)
lastwins := make([]uint, pairscnt)

var wg sync.WaitGroup
for i, r_hi := range restwins {
  wg.Add(1)
  go func(i, r_hi int) {
    defer wg.Done()
    l, c := twins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
    lastwins[i] = l; sums[i] = c
    fmt.Printf("\r%d of %d twinpairs done", (i + 1), pairscnt)
  }(i, r_hi)
}
wg.Wait()
fmt.Printf("\r%d of %d twinpairs done", pairscnt, pairscnt)

FYI, some numbers for single|multi-threaded versions; Linux, I7-3.5GHz, 8 threads.

     input        | single-threaded | multi-threaded
__________________|_________________|_______________
  100_000_000_000 |     29.1 secs   |     5.8 secs
  500_000_000_000 |    145.6 secs   |    29.9 secs
1_000_000_000_000 |    316.9 secs   |    59.9 secs

These times from “noisy” system, browser streaming video, et al going on too.
But it shows relative differences between the versions.

However, Rust, Nim, etc, that do true parallelism are much faster for this algorithm.

system · May 22, 2021, 2:51am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.