Concurrency slower?


(Marc Zahn) #1

Hi,

I am still new to Go and I wanted to try out the potential of concurrency. I created a simple program for creating lists of lotto numbers:

package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

var list = [][]int{}

func main() {

    rand.Seed(time.Now().UnixNano())

    rec := make(chan []int, 10000000)

    go func(c chan []int) {
        for {
            list = append(list, <-c)
        }
    }(rec)

    start := time.Now()
    fmt.Println("Start")
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        func() {
            defer wg.Done()
            for i := 0; i < 1000000; i++ {
                add(rec)
            }
        }()
    }
    wg.Wait()
    d := time.Now().Sub(start)
    fmt.Println("End ", len(list), " ", d.Seconds())
}

func add(c chan []int) {
    c <- []int{
        rand.Intn(49),
        rand.Intn(49),
        rand.Intn(49),
        rand.Intn(49),
        rand.Intn(49),
        rand.Intn(49),
        rand.Intn(49),
    }
}

In fact it is pretty simple. I have a list which I want to fill with 10,000,000 seven-number slices. In each iteration I create a slice of random numbers and write it into the channel, and the reading goroutine appends it to the list.

Some things I found out so far:

  • Initially on my system it took 8.5 s (i7)

  • Giving the channel such a large buffer reduces the used time to ~4.5 s

  • make’ing the array list with the final size increases the used time by ~20%

  • Running

    func() {
        defer wg.Done()
        for i := 0; i < 1000000; i++ {
            add(rec)
        }
    }()

as a goroutine increases the used time to ~13 s - WTH? Shouldn’t this be faster?

Could someone explain to me why the latter happens? I just do not get it…

Strange thing on top: after running the program the list rarely contains the full 10,000,000 items - usually ~20 - 100 are missing.

EDIT: What I just found out: if I avoid the channel and append the values directly in the add function, I go down to ~3.7 s


(Jakob Borg) #2

Channel operations and goroutines have a cost. It’s not high, but if you’re doing almost no work per channel operation or goroutine it adds up. Also, the rand package level convenience functions use locking for serialization, preventing you from getting any parallel work done there.
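For illustration, a minimal sketch of giving each goroutine its own *rand.Rand so the package-level lock is never taken (worker is a hypothetical helper, not code from the program above):

```go
package main

import (
	"math/rand"
	"time"
)

// worker uses its own *rand.Rand, so it never touches the
// package-level source, whose convenience functions serialize
// access with a mutex.
func worker(seed int64, out []int) {
	r := rand.New(rand.NewSource(seed)) // per-goroutine source, no shared lock
	for i := range out {
		out[i] = r.Intn(49)
	}
}

func main() {
	nums := make([]int, 10)
	worker(time.Now().UnixNano(), nums)
}
```

Each goroutine would get its own seed and its own slice segment, so no synchronization is needed at all.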


(Marc Zahn) #3

Ok,

I guess I found the fastest possible solution - but maybe someone with more experience could improve even this:

package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

var parallelCount = 10
var iterations = 1000000
var list = make([][]int, parallelCount * iterations)

var wg sync.WaitGroup
func main() {
    start := time.Now()
    fmt.Println("Start")
    wg.Add(parallelCount)
    for i := 0; i < parallelCount; i++ {
        go func() {
            defer wg.Done()

            r := rand.New(rand.NewSource(time.Now().UnixNano()))

            for i := 0; i < iterations; i++ {
                list[parallelCount * iterations - 1] = add(r)
            }
        }()
    }
    wg.Wait()
    d := time.Now().Sub(start)
    fmt.Println("End ", len(list), " ", d.Seconds())
}

func add(r *rand.Rand) []int {
    return []int{
        r.Intn(49),
        r.Intn(49),
        r.Intn(49),
        r.Intn(49),
        r.Intn(49),
        r.Intn(49),
        r.Intn(49),
    }
}

On the same system as yesterday this takes only ~0.6 s. The funny thing is that I accidentally found the best combination of goroutine count and iteration count: both 5/2000000 and 20/500000 are ~20% slower. Does someone have an idea about this?

What I found out about the lost messages: the program quit before the buffer was completely written into the list.
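One way to avoid losing those buffered items would be to close the channel once the producers are done and wait for the reader to drain it - a rough sketch (collect is a made-up helper, not the original program):

```go
package main

import (
	"fmt"
	"sync"
)

// collect starts producers and one consumer, closes the channel
// when the producers finish, and waits for the consumer to drain
// the buffer before returning.
func collect(workers, perWorker int) [][]int {
	rec := make(chan []int, 100)
	done := make(chan struct{})
	var list [][]int

	// Consumer: range ends only after rec is closed AND empty.
	go func() {
		for v := range rec {
			list = append(list, v)
		}
		close(done)
	}()

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				rec <- []int{j}
			}
		}()
	}

	wg.Wait()
	close(rec) // no more sends; lets the consumer's range terminate
	<-done     // buffer fully drained into list
	return list
}

func main() {
	fmt.Println(len(collect(10, 1000))) // always 10000, nothing lost
}
```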

All in all I learned now that sophisticated solutions are not always the best :smiley:


(nathankerr) #4

Based on the testing you are doing, the fastest solution (based on your fastest possible solution) is:

package main

import (
	"fmt"
	"time"
)

var parallelCount = 10
var iterations = 1000000
var list = make([][]int, parallelCount*iterations)

func main() {
	start := time.Now()
	fmt.Println("Start")
	d := time.Now().Sub(start)
	fmt.Println("End ", len(list), " ", d.Seconds())
}

When I run it, I get:

Start
End  10000000   0.000399549

Yes, I did just delete most of the code you worked so hard to write. There was also another bug:

list[parallelCount * iterations - 1] = add(r)

Only one item of list was ever set. While this really helps with cache hits, it doesn’t help with providing correct results. Of course it is really hard to test that a random set of data is correct. Since you are manually checking things anyway, viewing a random sample will work well enough:

for i := 0; i < 10; i++ {
	j := rand.Intn(len(list))
	fmt.Println(j, list[j])
}

Adding that to the end of main and printing len(list) at the beginning of your fastest version produces:

Start 10000000
End  10000000   4.770166086
8498081 []
9727887 []
7131847 []
9984059 []
1902081 []
4941318 []
954425 []
6122540 []
8240456 []
6203300 []

A Serial Version

Working back from the two versions provided and applying the improved test ideas, I think the serial (i.e., no concurrency, no parallelism) version of your program would be something like:

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const n = 10000000 // number of lotto numbers to produce
	list := make([][]int, 0, n)
	rand.Seed(time.Now().UnixNano())

	start := time.Now()
	fmt.Printf("Start\t%v\n", len(list))

	for i := 0; i < n; i++ {
		list = append(list, []int{
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
		})
	}

	fmt.Printf("End\t%v %v\n", len(list), time.Since(start))
	for i := 0; i < 10; i++ {
		j := rand.Intn(len(list))
		fmt.Println(j, list[j])
	}
}

New versions should be in separate files and only change main between the Start and End Printfs.

I ran the program with go run serial.go and got:

Start	0
End	10000000 12.957932904s
3872754 [7 34 41 36 13 32 33]
9294501 [27 45 14 19 1 15 38]
3055379 [43 9 47 29 5 1 8]
3537380 [35 40 38 21 41 43 6]
2400376 [8 1 15 4 23 19 13]
610863 [34 15 18 3 34 19 26]
5844596 [17 22 5 33 48 29 13]
4260860 [45 45 13 11 15 45 40]
7587553 [4 15 38 4 24 12 13]
9757221 [30 41 11 20 33 7 46]

A Parallel Version

Start by finding the minimal unit of work:

list = append(list, []int{
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
})

This is the smallest unit of work the program does. The rest of the code basically says do this unit 10,000,000 times. Since the program is simple, the minimal unit is also simple. In more complicated programs the minimal unit is usually a function or could be refactored into a function.

If the minimal unit were safe for concurrent operation (in this case it is not), a goroutine could be run for each unit of work the program required. However, as @calmh pointed out, there is a cost to doing this. All of this extra work is overhead as it was not needed in the serial version.

The trick to minimizing overhead while gaining the benefits of parallelizing the work is to find the minimal scalable unit. By scalable I mean that the amount of work done by each goroutine is adjustable, potentially from the minimal amount to the entire amount. In this case, the minimal scalable unit is:

for i := 0; i < amountOfWork; i++ {
	list = append(list, []int{
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
	})
}

The next step is to make the minimal scalable unit safe for concurrent execution. The main problem here is append. Since I learned to write code on distributed systems (i.e., many separate computers), my first instinct is to build up a local list and then send it to another goroutine that combines the lists together. While this method works, building local lists, sending them around, and combining them together is all extra work that does not need to be done because each goroutine can just change its part of the list. This can be checked as safe by using the race detector. That changes the minimal scalable unit to:

for i := begin; i < end; i++ {
	list[i] = []int{
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
	}
}

Where begin and end are the part of the work to be accomplished by that unit.

The problem with this approach is the limitation that only code between the Start and End Printfs can be changed. Since list was created with the full capacity it needed, but its length was zero (to fulfill the criterion that len(list) needs to start at 0), assigning list[i] would panic. To fix this problem, I simply remake list as make([][]int, n).

The next problem to solve, as pointed out by @calmh, is serialized access to the default random number source. Since most of the work done in this program is getting random numbers, this serialization limits the program’s parallel performance. Random number generators (RNGs) generate a cyclical sequence of numbers. Seeding an RNG tells the generator where it should start in the sequence. This is why using the same seed produces the same sequence of “random” numbers. Using overlapping sequences, caused by seeds that are not sufficiently separated, reduces the randomness of the numbers used. The way to keep the seeds separated is to use a common seed plus an offset for the number of random numbers consumed by the previous goroutines.

Solving these problems resulted in my parallel version:

package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"sync"
	"time"
)

func main() {
	const n = 10000000 // number of lotto numbers to produce
	list := make([][]int, 0, n)
	rand.Seed(time.Now().UnixNano())

	start := time.Now()
	fmt.Printf("Start\t%v\n", len(list))

	list = make([][]int, n) // work around the entire list not being accessible
	var wg sync.WaitGroup
	workers := runtime.GOMAXPROCS(-1) // one for each proc
	seed := time.Now().UnixNano()

	for i := 0; i < workers; i++ {
		work := n / workers
		begin := work * i
		end := begin + work

		if i == workers-1 {
			end += n % workers
		}

		r := rand.New(rand.NewSource(seed + 7*int64(begin)))

		wg.Add(1)
		go func() {
			defer wg.Done()

			for i := begin; i < end; i++ {
				list[i] = []int{
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
				}
			}
		}()
	}

	wg.Wait()

	fmt.Printf("End\t%v %v\n", len(list), time.Since(start))
	for i := 0; i < 10; i++ {
		j := rand.Intn(len(list))
		fmt.Println(j, list[j])
	}
}

Since computation is the limiting factor for this program, I chose to run one goroutine for each processor (controlled with GOMAXPROCS, defaults to the number of cores on the machine). Running on my two core computer gives:

Start	0
End	10000000 5.377756077s
4047617 [10 21 47 8 25 42 44]
9164640 [16 31 15 33 1 33 43]
3197595 [40 8 39 22 13 36 20]
1374711 [12 4 16 20 17 21 33]
2543575 [9 46 25 23 2 20 28]
9157311 [10 2 1 21 28 2 18]
7488333 [0 37 24 24 4 45 38]
269903 [21 40 15 48 5 41 11]
3858409 [33 46 37 35 44 35 27]
8445215 [40 8 36 14 1 32 37]

Reducing the runtime from 12.957932904s to 5.377756077s.

Other Comments

If I were doing this for production code, I would want a better way to test that my changes improved performance while not changing the results. The first thing I would do is move the code out of main into a function with a signature like:

func lottoNumbers(n int, seed int64) [][]int

Notice that seed is an argument, allowing a test harness to use the same seed for each version. The results could then be compared in their entirety. Such a comparison is the only way to know that the random numbers used are distributed as expected and that the changes have not produced incorrect results.


(Marc Zahn) #5

Hi Nathan,

thank you very much for your detailed answer. I really appreciate!


(Dave Cheney) #6

Protip: use the block profile to uncover parallelism bottlenecks.


(nathankerr) #7

I found a problem with my explanation of Random Number Generators.

Imagine there is a function, rands, that takes a seed and returns an infinite slice of random values.

For deterministic RNGs (like math/rand's), rands(seed)[n] == rands(seed)[n] is true for every integer n.

I thought that rands(seed)[n+1] == rands(seed+1)[n] was true for every integer n. While that holds for some RNGs, the RNG implemented in math/rand “cooks” the seed, making this assumption untrue.

Why does this matter?

I was trying to test the correctness of my parallel version by comparing it to the results of my serial version. The test passed when there was only one worker. When using more workers (i.e., increasing parallelism), the comparison failed at the start of the second worker’s output because of my mistaken assumption. Since I can’t control the randomness by controlling the seed, I need to rethink my test.

This also means that:

r := rand.New(rand.NewSource(seed + 7*int64(begin)))

is overly complicated (though not really incorrect) and can be reduced to:

r := rand.New(rand.NewSource(seed + int64(i)))

Note that the seeds still need to be different, otherwise the workers would all be doing the same work.


(nathankerr) #8

Ever since I first responded to this thread I wanted to expand my response so that it demonstrated both how to parallelize code and how to use Go’s tooling to do so.

I finally managed to get the tutorial completely written and posted. I show how the testing and benchmarking tools motivate and justify every change that is made. I even used @dfc’s protip of using the block profile to show that math/rand's default source was a bottleneck.

The end result is even faster. Generating 10,000,000 lotto numbers now takes 4.770699064s instead of the 5.377756077s my parallel version posted here did.

Read how I did it! https://pocketgophers.com/concurrency-slower/


(Thomas) #9

Hi! Nice post. I have some suggestions to improve the performance.

First, in your code rand.Intn(49) generates numbers in [0:49), i.e. [0:48]. It should be 50 if you want numbers in [0:49], but that doesn’t change performance.

Consider using [][7]uint8 as your list type; this saves a lot of memory allocation - for 10,000,000 draws the memory consumption drops from ~500 MB to ~70 MB. Removing the allocation in the worker’s loop also makes it faster.

Since rand.Intn is the bottleneck, for fun I implemented a wrapper that generates 7 numbers of 6 bits each from (almost) only one random int64. Using a whole int64 to extract only 6 bits (2⁵-1 < 49 < 2⁶-1) is a waste of random bits.

Benchmarks show that my rand wrapper is around 2x the speed of using 7 rand.Intn calls.
Using [7]uint8 speeds things up by approximately 1.5x.

So overall it is around 3x faster and uses much less memory (11x less). My machine has 8 cores (4 without hyperthreading). I obtained the same difference on a 2009 MacBook Pro Core 2 Duo.

BenchmarkNaiveSP-8            	10000000	       165 ns/op	      88 B/op	       1 allocs/op
BenchmarkNaiveSPuint8-8       	20000000	       101 ns/op	       7 B/op	       0 allocs/op

BenchmarkHisLotto-8           	30000000	        44.6 ns/op	      80 B/op	       0 allocs/op
BenchmarkLottoSP-8            	30000000	        50.9 ns/op	       7 B/op	       0 allocs/op
BenchmarkLottoMP_1-8          	30000000	        50.7 ns/op	       7 B/op	       0 allocs/op
BenchmarkLottoMP_2-8          	50000000	        25.5 ns/op	       7 B/op	       0 allocs/op
BenchmarkLottoMP_4-8          	100000000	        14.0 ns/op	       7 B/op	       0 allocs/op
BenchmarkLottoMP_8-8          	100000000	        13.9 ns/op	       7 B/op	       0 allocs/op
BenchmarkLottoMP_16-8         	100000000	        11.2 ns/op	       7 B/op	       0 allocs/op
BenchmarkLottoMP_32-8         	100000000	        11.3 ns/op	       7 B/op	       0 allocs/op
BenchmarkLottoMP_MAXPROCS-8   	100000000	        13.5 ns/op	       7 B/op	       0 allocs/op

BenchmarkRand7Intn-8          	20000000	       109 ns/op	       0 B/op	       0 allocs/op
BenchmarkRand7Int31n-8        	20000000	        96.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkRand7Int63n-8        	10000000	       172 ns/op	       0 B/op	       0 allocs/op
BenchmarkMyRand7Numbers-8     	20000000	        70.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkMyRandDraw-8         	30000000	        47.9 ns/op	       0 B/op	       0 allocs/op

The code:

package main

import (
	"math/rand"
	"sync"
	"time"
)

// NumberOfBall is the upper bound of the value a ball can be picked from: [0:NumberOfBall)
const NumberOfBall = 50

type draw [7]uint8

func GenerateLottoNumberSP(N int) []draw {
	list := make([]draw, 0, N)
	rnd := newMyRand(time.Now().UnixNano())

	for i := 0; i < N; i++ {
		list = append(list, rnd.nextDraw())
	}
	return list
}

func GenerateLottoNumberMP(N int, numWorker int) []draw {

	wg := sync.WaitGroup{}
	list := make([]draw, N)

	step := N / numWorker

	for i := 0; i < numWorker; i++ {
		begin := i * step
		end := begin + step
		if i == numWorker-1 {
			end = N // last worker also covers the remainder when N%numWorker != 0
		}
		seed := time.Now().UnixNano() + int64(i)
		wg.Add(1)
		go func(workerTab []draw, seed int64) {
			rnd := newMyRand(seed)
			for i := 0; i < len(workerTab); i++ {
				workerTab[i] = rnd.nextDraw()
			}
			wg.Done()
		}(list[begin:end], seed)
	}

	wg.Wait()

	return list
}

type myRand struct {
	rnd *rand.Rand
	n   int
	gen [10]uint8
}

func newMyRand(seed int64) *myRand {
	r := &myRand{}
	r.rnd = rand.New(rand.NewSource(seed))
	r.fill()

	return r
}

func (rnd *myRand) fill() {
	r := rnd.rnd.Uint64()
	rnd.gen[0] = uint8(r & 63)
	rnd.gen[1] = uint8((r >> 6) & 63)
	rnd.gen[2] = uint8((r >> 12) & 63)
	rnd.gen[3] = uint8((r >> 18) & 63)
	rnd.gen[4] = uint8((r >> 24) & 63)
	rnd.gen[5] = uint8((r >> 30) & 63)
	rnd.gen[6] = uint8((r >> 36) & 63)
	rnd.gen[7] = uint8((r >> 42) & 63)
	rnd.gen[8] = uint8((r >> 48) & 63)
	rnd.gen[9] = uint8((r >> 54) & 63)
	// we only use 60 bits out of 64
	rnd.n = 0
}

func (rnd *myRand) nextUint8() uint8 {
	// refill
	if rnd.n >= len(rnd.gen) {
		rnd.fill()
	}
	next := rnd.gen[rnd.n]
	rnd.n++
	return next
}

func (rnd *myRand) nextNumber() uint8 {
	next := rnd.nextUint8()
	for next >= NumberOfBall { // To ensure uniform distribution
		next = rnd.nextUint8()
	}
	return next
}

func (rnd *myRand) nextDraw() draw {
	r := rnd.rnd.Uint64()
	n := draw{
		uint8(r & 63),
		uint8((r >> 6) & 63),
		uint8((r >> 12) & 63),
		uint8((r >> 18) & 63),
		uint8((r >> 24) & 63),
		uint8((r >> 30) & 63),
		uint8((r >> 36) & 63),
		// we only use 40 bits out of 64
	}
	for i := 0; i < 7; i++ {
		if n[i] >= NumberOfBall {
			n[i] = rnd.nextNumber()
		}
	}
	return n
}

The test & benchmark. It needs your lottoNumbers implementation for the comparison in BenchmarkHisLotto:

package main

import (
	"math"
	"math/rand"
	"runtime"
	"testing"
	"time"
)

func BenchmarkNaiveSP(b *testing.B) {
	list := make([][]int, 0, b.N)
	rnd := rand.New(rand.NewSource(time.Now().UnixNano()))

	for i := 0; i < b.N; i++ {
		list = append(list, []int{
			rnd.Intn(NumberOfBall),
			rnd.Intn(NumberOfBall),
			rnd.Intn(NumberOfBall),
			rnd.Intn(NumberOfBall),
			rnd.Intn(NumberOfBall),
			rnd.Intn(NumberOfBall),
			rnd.Intn(NumberOfBall),
		})
	}
}

func BenchmarkNaiveSPuint8(b *testing.B) {
	list := make([][7]uint8, 0, b.N)
	rnd := rand.New(rand.NewSource(time.Now().UnixNano()))

	for i := 0; i < b.N; i++ {
		list = append(list, [7]uint8{
			uint8(rnd.Int31n(NumberOfBall)),
			uint8(rnd.Int31n(NumberOfBall)),
			uint8(rnd.Int31n(NumberOfBall)),
			uint8(rnd.Int31n(NumberOfBall)),
			uint8(rnd.Int31n(NumberOfBall)),
			uint8(rnd.Int31n(NumberOfBall)),
			uint8(rnd.Int31n(NumberOfBall)),
		})
	}
}

func BenchmarkHisLotto(b *testing.B) {
	lottoNumbers(b.N)
}

func BenchmarkLottoSP(b *testing.B) {
	GenerateLottoNumberSP(b.N)
}

func BenchmarkLottoMP_1(b *testing.B) {
	GenerateLottoNumberMP(b.N, 1)
}
func BenchmarkLottoMP_2(b *testing.B) {
	GenerateLottoNumberMP(b.N, 2)
}
func BenchmarkLottoMP_4(b *testing.B) {
	GenerateLottoNumberMP(b.N, 4)
}
func BenchmarkLottoMP_8(b *testing.B) {
	GenerateLottoNumberMP(b.N, 8)
}
func BenchmarkLottoMP_16(b *testing.B) {
	GenerateLottoNumberMP(b.N, 16)
}
func BenchmarkLottoMP_32(b *testing.B) {
	GenerateLottoNumberMP(b.N, 32)
}

func BenchmarkLottoMP_MAXPROCS(b *testing.B) {
	GenerateLottoNumberMP(b.N, runtime.GOMAXPROCS(-1))
}

func TestMyRandNumber(t *testing.T) {
	// Warning: this is an empirical test that checks the generated distribution

	const N = 10000000 * 7

	rnd := newMyRand(time.Now().UnixNano())

	dist := make([]int64, NumberOfBall)
	for i := 0; i < N; i++ {
		n := rnd.nextNumber()
		if n < 0 || n >= NumberOfBall {
			t.Fatalf("found non valid number : %d, after %d iteration", n, i)
		}
		dist[n]++
	}

	avg := N / float64(NumberOfBall)
	for i, d := range dist {
		if math.Abs(avg-float64(d)) > avg/100 {
			t.Errorf("rand apparently not uniform %d, %d, %f", i, d, avg)
		}
		// fmt.Printf("%2d %10d\n", i, d)
	}
}

func TestMyRandDraw(t *testing.T) {
	// Warning: this is an empirical test that checks the generated distribution

	const N = 10000000

	rnd := newMyRand(time.Now().UnixNano())

	dist := make([]int64, NumberOfBall)
	for i := 0; i < N; i++ {
		sevenNumbers := rnd.nextDraw()
		for _, n := range sevenNumbers {
			if n < 0 || n >= NumberOfBall {
				t.Fatalf("found non valid number : %d, after %d iteration", n, i)
			}
			dist[n]++
		}
	}

	avg := 7 * N / float64(NumberOfBall)
	for i, d := range dist {
		if math.Abs(avg-float64(d)) > avg/100 {
			t.Errorf("rand apparently not uniform %d, %d, %f", i, d, avg)
		}
		// fmt.Printf("%2d %10d\n", i, d)
	}
}

func TestStdRand(t *testing.T) {
	// Warning: this is an empirical test that checks the generated distribution

	const N = 10000000 * 7

	rnd := rand.New(rand.NewSource(time.Now().UnixNano()))

	dist := make([]int64, NumberOfBall)
	for i := 0; i < N; i++ {
		n := rnd.Intn(NumberOfBall)
		if n < 0 || n >= NumberOfBall {
			t.Fatalf("found non valid number : %d, after %d iteration", n, i)
		}
		dist[n]++
	}

	avg := N / float64(NumberOfBall)
	for i, d := range dist {
		if math.Abs(avg-float64(d)) > avg/100 {
			t.Errorf("rand apparently not uniform %d, %d, %f", i, d, avg)
		}
		// fmt.Printf("%2d %10d\n", i, d)
	}
}

func BenchmarkRand7Intn(b *testing.B) {
	rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < b.N; i++ {
		rnd.Intn(NumberOfBall)
		rnd.Intn(NumberOfBall)
		rnd.Intn(NumberOfBall)
		rnd.Intn(NumberOfBall)
		rnd.Intn(NumberOfBall)
		rnd.Intn(NumberOfBall)
		rnd.Intn(NumberOfBall)
	}
}
func BenchmarkRand7Int31n(b *testing.B) {
	rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < b.N; i++ {
		rnd.Int31n(NumberOfBall)
		rnd.Int31n(NumberOfBall)
		rnd.Int31n(NumberOfBall)
		rnd.Int31n(NumberOfBall)
		rnd.Int31n(NumberOfBall)
		rnd.Int31n(NumberOfBall)
		rnd.Int31n(NumberOfBall)
	}
}
func BenchmarkRand7Int63n(b *testing.B) {
	rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < b.N; i++ {
		rnd.Int63n(NumberOfBall)
		rnd.Int63n(NumberOfBall)
		rnd.Int63n(NumberOfBall)
		rnd.Int63n(NumberOfBall)
		rnd.Int63n(NumberOfBall)
		rnd.Int63n(NumberOfBall)
		rnd.Int63n(NumberOfBall)
	}
}

func BenchmarkMyRand7Numbers(b *testing.B) {
	rnd := newMyRand(time.Now().UnixNano())
	for i := 0; i < b.N; i++ {
		rnd.nextNumber()
		rnd.nextNumber()
		rnd.nextNumber()
		rnd.nextNumber()
		rnd.nextNumber()
		rnd.nextNumber()
		rnd.nextNumber()
	}
}

func BenchmarkMyRandDraw(b *testing.B) {
	rnd := newMyRand(time.Now().UnixNano())
	for i := 0; i < b.N; i++ {
		rnd.nextDraw()
	}
}

(nathankerr) #10

Nice work, @Tomu!

To make it even easier to compare your results with the results in my post, I changed NumberOfBall to 49 (so that my tests would pass) and wrapped GenerateLottoNumberMP:

func lottoNumbers(n int) []draw {
	return GenerateLottoNumberMP(n, runtime.GOMAXPROCS(-1))
}

Since my tests already converted the results to float64, I didn’t have to make any changes to the test to get this to compile.

I ran the benchmark on my machine with the same setup as my other runs:

$ go test -bench=. -count 10 -benchmem| tee toma.txt
BenchmarkLottoNumbers-2   	      10	 108976820 ns/op	 7015555 B/op	       9 allocs/op
BenchmarkLottoNumbers-2   	      10	 101535082 ns/op	 7015254 B/op	       8 allocs/op
BenchmarkLottoNumbers-2   	      10	 111754857 ns/op	 7015606 B/op	       9 allocs/op
BenchmarkLottoNumbers-2   	      10	 110160809 ns/op	 7015337 B/op	       8 allocs/op
BenchmarkLottoNumbers-2   	      20	 116207870 ns/op	 7015337 B/op	       8 allocs/op
BenchmarkLottoNumbers-2   	      10	 103640660 ns/op	 7015542 B/op	       8 allocs/op
BenchmarkLottoNumbers-2   	      10	 114540844 ns/op	 7015212 B/op	       8 allocs/op
BenchmarkLottoNumbers-2   	      20	  94104806 ns/op	 7015088 B/op	       8 allocs/op
BenchmarkLottoNumbers-2   	      10	 118753843 ns/op	 7015088 B/op	       8 allocs/op
BenchmarkLottoNumbers-2   	      20	  95803664 ns/op	 7015092 B/op	       8 allocs/op
PASS
ok  	concurrency-slower	23.357s

Sadly this forum won’t let me use benchstat’s HTML output like in my post, so we have to live with the text output:

$ benchstat reduce-allocs.txt toma.txt
name            old time/op    new time/op    delta
LottoNumbers-2     336ms ± 8%     108ms ±12%  -68.01%  (p=0.000 n=8+10)

name            old alloc/op   new alloc/op   delta
LottoNumbers-2    80.0MB ± 0%     7.0MB ± 0%  -91.23%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
LottoNumbers-2      5.00 ± 0%      8.00 ± 0%  +60.00%  (p=0.000 n=9+8)

$ benchstat serial.txt toma.txt
name            old time/op    new time/op    delta
LottoNumbers-2     1.26s ± 7%     0.11s ±12%   -91.47%  (p=0.000 n=10+10)

name            old alloc/op   new alloc/op   delta
LottoNumbers-2    88.0MB ± 0%     7.0MB ± 0%   -92.03%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
LottoNumbers-2     1.00M ± 0%     0.00M ± 0%  -100.00%  (p=0.000 n=10+8)
