Passing array by Reference vs Values

jzakiya · January 1, 2022, 11:35pm

I have working code whose memory use I’m trying to reduce.

I have a primes slice that I pass to a go function for parallel processing. It can be very large (up to 100s of millions of primes). It’s read-only after it’s created and is never changed.

The go function eats up a lot of memory compared to the Rust version of the same program, which passes primes by reference rather than by value (I originally passed copies of the primes array until I figured out how to do it by reference, and memory use dropped significantly).

The Go version’s memory use behaves like Rust’s did before I made that change, so I’ve looked at the docs and tried to pass primes by reference according to this doc:

Here’s a snippet of the original code.

  var wg sync.WaitGroup
  for i, r_hi := range rescousins {
    wg.Add(1)
    go func(i, r_hi int) {
      defer wg.Done()
      l, c := cousins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, primes, resinvrs)
      lastcousins[i] = l; ■■■■[i] = c
      fmt.Printf("\r%d of %d cousinpairs done", (i + 1), pairscnt)
    }(i, r_hi)
  }
  wg.Wait()

And here’s the modded version to pass primes by reference.

  refprimes := &primes
  var wg sync.WaitGroup
  for i, r_hi := range rescousins {
    wg.Add(1)
    go func(i, r_hi int) {
      defer wg.Done()
      l, c := cousins_sieve(r_hi, kmin, kmax, kb, start_num, end_num, modpg, *refprimes, resinvrs)
      lastcousins[i] = l; ■■■■[i] = c
      fmt.Printf("\r%d of %d cousinpairs done", (i + 1), pairscnt)
    }(i, r_hi)
  }
  wg.Wait()

In both cases it seems the primes data is still being copied into each thread.

Can this be written so that the primes data is shared by all threads rather than copied for each one?

gulliet · January 2, 2022, 9:56am

If I am not mistaken, you are not passing the array by reference.

Here is an example:

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	arr := [5]int{1, 2, 4, 6, 10}
	fmt.Println(arr)

	wg.Add(1)
	go func() {
		defer wg.Done()
		toPrimes(&arr)
	}()

	wg.Wait()
	fmt.Println(arr)
}

func toPrimes(ptr *[5]int) {
	for i := 0; i < len(*ptr); i++ {
		ptr[i] += 1
	}
}

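A better approach may be to use a slice of int rather than an array of int. A slice value is a small header that refers to the underlying array, so passing it does not copy the elements.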


package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	si := []int{1, 2, 4, 6, 10}
	fmt.Println(si)

	wg.Add(1)
	go func() {
		defer wg.Done()
		toPrimes(si)
	}()

	wg.Wait()
	fmt.Println(si)
}

func toPrimes(primes []int) {
	for i := 0; i < len(primes); i++ {
		primes[i] += 1
	}
}

Hope this helps!
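To make the difference between the two examples concrete, here is a minimal sketch. The names bumpArray and bumpSlice are made up for illustration and are not from the original program. An array passed by value is copied in full, while a slice passed by value copies only its header and shares the elements with the caller:

```go
package main

import "fmt"

// bumpArray receives a copy of the whole array, so the caller's
// array is not affected by the increments.
func bumpArray(a [5]int) {
	for i := range a {
		a[i]++
	}
}

// bumpSlice receives a copy of the slice header only; the elements
// live in the backing array shared with the caller.
func bumpSlice(s []int) {
	for i := range s {
		s[i]++
	}
}

func main() {
	arr := [5]int{1, 2, 4, 6, 10}
	si := []int{1, 2, 4, 6, 10}

	bumpArray(arr)
	bumpSlice(si)

	fmt.Println(arr) // prints [1 2 4 6 10]: the caller's array is unchanged
	fmt.Println(si)  // prints [2 3 5 7 11]: the shared elements were incremented
}
```

So if primes is declared as a slice, passing it to a function or capturing it in a goroutine closure should not duplicate the prime values.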

GonzaSaya · January 2, 2022, 12:21pm

Yes!
*&primes is the same as the primes variable itself.

In your code:
refprimes is equal to &primes
*refprimes is equal to *&primes
So
*refprimes is equal to primes, and both versions of your code pass the same slice.
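A small sketch to check this, assuming primes is a slice of uint64 (the element type and the helper firstElem are assumptions for illustration only). It compares the address of the first element reached through primes and through *refprimes, and prints the size of the slice header next to the size of the data it refers to:

```go
package main

import (
	"fmt"
	"unsafe"
)

// firstElem returns the address of the first element of the
// backing array that s refers to. s must not be empty.
func firstElem(s []uint64) *uint64 {
	return &s[0]
}

func main() {
	primes := make([]uint64, 1_000_000) // stand-in for the real primes data
	refprimes := &primes

	// Both expressions denote the same slice value, so both calls
	// see the same backing array.
	fmt.Println(firstElem(primes) == firstElem(*refprimes)) // prints true

	// Only the header (pointer, length, capacity) is copied on a call.
	fmt.Println("header bytes:", unsafe.Sizeof(primes))
	fmt.Println("data bytes:  ", len(primes)*8)
}
```

If that holds for the real program too, the extra pointer changes nothing, and the memory is more likely going somewhere other than copies of primes.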

jzakiya · January 2, 2022, 10:16pm

I’m using Go 1.17.5.

Those don’t work, and no matter what I do to get the program to compile, every variant uses the same amount of memory.

So it seems that, no matter what, the data in primes is not shared but copied in full to each thread, which is not what I want.

So the ultimate question is: does Go allow the data in primes to be shared by each thread, and if so, how?

GonzaSaya · January 3, 2022, 12:48am

Can you share the definition of the primes variable?
Or can you share the full script?
I believe you are defining it as an array instead of a slice.

jzakiya · January 3, 2022, 1:05am

Here’s the whole code.

https://gist.github.com/jzakiya/0ea756a8f6fd09f56cd9374d0dcf4197

cousinprimes_ssoz.go

// This Go source file is a multiple threaded implementation to perform an
// extremely fast Segmented Sieve of Zakiya (SSoZ) to find Cousin Primes <= N.

// Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
// Output is the number of cousin primes <= N, or in range N1 to N2; the last
// cousin prime value for the range; and the total time of execution.

// This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
// 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
// probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

I create primes in func sozpg and use it as input in go func cousins_sieve.

I have 16 GB of system memory, so I can run the following example.

➜  go-projects echo 11844600000000000 11844601500991000 | ./cousinprimes_ssoz
threads = 8
using Prime Generator parameters for P11
segment size = 262144 resgroups; seg array is [1 x 4096] 64-bits
cousinprime candidates = 87720435; resgroups = 649781
each of 135 threads has nextp[2 x 6240199] array
setup time = 261.943245ms 
perform cousinprimes ssoz sieve
135 of 135 cousinpairs done
sieve time = 14.021829717s
total time = 14.283803133s
last segment = 125493 resgroups; segment slices = 3
total cousins = 1446744; last cousin = 11844601500989267/-4%

I use htop to monitor thread and memory use (on an i7-6700HQ 4C|8T, 2.6-3.5 GHz, Linux laptop).

This input takes about 14.5 GB max mem with Go; the Rust version takes < 4 GB max.
Here’s the Rust code.

https://gist.github.com/jzakiya/8879c0f4dfda543eaf92a3186de554d7

cousinprimes_ssoz.rs

// This Rust source file is a multiple threaded implementation to perform an
// extremely fast Segmented Sieve of Zakiya (SSoZ) to find Cousin Primes <= N.

// Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
// Output is the number of cousin primes <= N, or in range N1 to N2; the last
// cousin prime value for the range; and the total time of execution.

// This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
// 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
// probably needed to optimize for other hardware systems (ARM, PowerPC, etc).
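One thing in the Go output above may be worth checking. It reports "each of 135 threads has nextp[2 x 6240199] array". If each goroutine allocates its own nextp array, and if its elements are 8 bytes (both are assumptions, since the full source is truncated here), that is 2 x 6240199 x 8 = 99,843,184 bytes per goroutine. The snippet in the first post starts all 135 goroutines at once, which would add up to 135 x 99,843,184 = 13,478,829,840 bytes, about 13.5 GB, close to the 14.5 GB observed. With only 8 running at a time it would be about 0.8 GB. The sketch below shows one common way to cap the number of goroutines running at once, using a buffered channel as a semaphore. runBounded is a hypothetical helper and the work function is a placeholder, not the real sieve:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runBounded calls work(i) for every i in [0, n) with at most limit calls
// running at once. It returns the highest number of calls seen running together.
func runBounded(n, limit int, work func(i int)) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	sem := make(chan struct{}, limit) // counting semaphore
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are active
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call ends

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			work(i)

			mu.Lock()
			running--
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	const perWorker = 2 * 6240199 * 8 // bytes per goroutine, assuming 8-byte elements
	results := make([]int, 135)
	peak := runBounded(len(results), runtime.NumCPU(), func(i int) {
		results[i] = i * i // placeholder for the per-residue sieve call
	})
	fmt.Printf("peak concurrency %d, estimated scratch memory %d bytes\n",
		peak, peak*perWorker)
}
```

Whether this accounts for the gap to the Rust version depends on how that version schedules its work, which the truncated listing does not show.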

system · April 3, 2022, 1:06am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.