Based on the testing you are doing, the fastest solution is:
package main

import (
	"fmt"
	"time"
)

var parallelCount = 10
var iterations = 1000000
var list = make([][]int, parallelCount*iterations)

func main() {
	start := time.Now()
	fmt.Println("Start")
	d := time.Now().Sub(start)
	fmt.Println("End ", len(list), " ", d.Seconds())
}
When I run it, I get:
Start
End 10000000 0.000399549
Yes, I did just delete most of the code you worked so hard to write. There was also another bug:
list[parallelCount * iterations - 1] = add(r)
Only one item of list was ever set. While this really helps with cache hits, it doesn't help with providing correct results. Of course, it is really hard to test that a random set of data is correct. Since you are manually checking things anyway, viewing a random sample will work well enough:
for i := 0; i < 10; i++ {
	j := rand.Intn(len(list))
	fmt.Println(j, list[j])
}
Adding that to the end of main and printing len(list) at the beginning of your fastest version produces:
Start 10000000
End 10000000 4.770166086
8498081 []
9727887 []
7131847 []
9984059 []
1902081 []
4941318 []
954425 []
6122540 []
8240456 []
6203300 []
A Serial Version
Working back from the two versions provided and applying the improved test ideas, I think the serial (i.e., no concurrency, no parallelism) version of your program would be something like:
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	const n = 10000000 // number of lotto numbers to produce
	list := make([][]int, 0, n)
	rand.Seed(time.Now().UnixNano())
	start := time.Now()
	fmt.Printf("Start\t%v\n", len(list))
	for i := 0; i < n; i++ {
		list = append(list, []int{
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
			rand.Intn(49),
		})
	}
	fmt.Printf("End\t%v %v\n", len(list), time.Since(start))
	for i := 0; i < 10; i++ {
		j := rand.Intn(len(list))
		fmt.Println(j, list[j])
	}
}
New versions should be in separate files and should only change main between the Start and End Printfs.
I ran the program with go run serial.go and got:
Start 0
End 10000000 12.957932904s
3872754 [7 34 41 36 13 32 33]
9294501 [27 45 14 19 1 15 38]
3055379 [43 9 47 29 5 1 8]
3537380 [35 40 38 21 41 43 6]
2400376 [8 1 15 4 23 19 13]
610863 [34 15 18 3 34 19 26]
5844596 [17 22 5 33 48 29 13]
4260860 [45 45 13 11 15 45 40]
7587553 [4 15 38 4 24 12 13]
9757221 [30 41 11 20 33 7 46]
A Parallel Version
Start by finding the minimal unit of work:
list = append(list, []int{
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
	rand.Intn(49),
})
This is the smallest unit of work the program does. The rest of the code basically says do this unit 10,000,000 times. Since the program is simple, the minimal unit is also simple. In more complicated programs the minimal unit is usually a function or could be refactored into a function.
If the minimal unit were safe for concurrent operation (in this case it is not), a goroutine could be run for each unit of work the program required. However, as @calmh pointed out, there is a cost to doing this. All of this extra work is overhead, as it was not needed in the serial version.
The trick to minimizing overhead while gaining the benefits of parallelizing the work is to find the minimal scalable unit. By scalable I mean that the amount of work done by each goroutine is adjustable, potentially from the minimal amount to the entire amount. In this case, the minimal scalable unit is:
for i := 0; i < amountOfWork; i++ {
	list = append(list, []int{
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
	})
}
The next step is to make the minimal scalable unit safe for concurrent execution. The main problem here is append. Since I learned to write code on distributed systems (i.e., many separate computers), my first instinct is to build up a local list and then send it to another goroutine that combines the lists together. While this method works, building local lists, sending them around, and combining them together is all extra work that does not need to be done, because each goroutine can just change its own part of the list. This can be checked as safe with the race detector. That changes the minimal scalable unit to:
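For comparison, the local-list-and-combine approach I describe might look something like the sketch below. The names generate, workers, and perWorker are mine, and the per-worker seeds are simplified for brevity:

```go
package main

import (
	"fmt"
	"math/rand"
)

// generate runs one goroutine per worker; each builds a private
// slice and sends it on a channel, and the receiver concatenates
// the pieces, so no slice is ever shared between goroutines.
func generate(workers, perWorker int) [][]int {
	results := make(chan [][]int, workers)
	for w := 0; w < workers; w++ {
		go func(r *rand.Rand) {
			local := make([][]int, 0, perWorker)
			for i := 0; i < perWorker; i++ {
				local = append(local, []int{r.Intn(49), r.Intn(49), r.Intn(49)})
			}
			results <- local // hand the private slice to the combiner
		}(rand.New(rand.NewSource(int64(w))))
	}
	var list [][]int
	for w := 0; w < workers; w++ {
		part := <-results
		list = append(list, part...)
	}
	return list
}

func main() {
	fmt.Println(len(generate(4, 5))) // 20
}
```

Note the extra allocation, channel traffic, and final append pass; none of that exists in the shared-slice version below, which is why I did not go this way.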
for i := begin; i < end; i++ {
	list[i] = []int{
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
		rand.Intn(49),
	}
}
where begin and end delimit the part of the work to be accomplished by that unit.
The problem with this approach is the limitation that only code between the Start and End Printfs can be changed. Since list was created with the full capacity it needed, but its length was zero (to fulfill the criterion that len(list) start at 0), assigning to list[i] would panic. To fix this problem, I simply remake list as make([][]int, n).
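The panic and the fix can be seen in a few lines. This is a toy demonstration; the helper assignAt is mine, added only to catch the panic:

```go
package main

import "fmt"

// assignAt reports whether list[i] = v succeeds, recovering
// from the index-out-of-range panic if it does not.
func assignAt(list [][]int, i int, v []int) (ok bool) {
	defer func() { ok = recover() == nil }()
	list[i] = v
	return true
}

func main() {
	// Full capacity but zero length: indexing panics, because
	// bounds checks are against length, not capacity.
	list := make([][]int, 0, 4)
	fmt.Println(assignAt(list, 0, []int{1})) // false
	// Remade with the full length: every index is assignable.
	list = make([][]int, 4)
	fmt.Println(assignAt(list, 0, []int{1})) // true
}
```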
The next problem to solve, as pointed out by @calmh, is serialized access to the default random number source. Since most of the work done in this program is getting random numbers, this serialization limits the program's parallel performance. Random number generators (RNGs) generate a cyclical sequence of numbers. Seeding an RNG tells the generator where it should start in the sequence. This is why using the same seed produces the same sequence of "random" numbers. If the seeds are not sufficiently separated, the goroutines draw from overlapping sequences, which reduces the randomness of the numbers used. A simple way to keep the seeds apart is to use a common seed plus an offset accounting for the numbers consumed by the previous goroutines.
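The determinism that motivates the offset can be checked directly. A small sketch (the helper sequence is my own name):

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequence returns the first n Intn(49) values from a source
// seeded with the given seed.
func sequence(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	nums := make([]int, n)
	for i := range nums {
		nums[i] = r.Intn(49)
	}
	return nums
}

func main() {
	// The same seed always restarts the cycle at the same point,
	// so two sources seeded identically emit identical values...
	fmt.Println(sequence(42, 3), sequence(42, 3))
	// ...which is why per-goroutine sources need well-separated
	// seeds to avoid handing out overlapping runs of numbers.
}
```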
Solving these problems resulted in my parallel version:
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"sync"
	"time"
)

func main() {
	const n = 10000000 // number of lotto numbers to produce
	list := make([][]int, 0, n)
	rand.Seed(time.Now().UnixNano())
	start := time.Now()
	fmt.Printf("Start\t%v\n", len(list))
	list = make([][]int, n) // work around the entire list not being accessible
	var wg sync.WaitGroup
	workers := runtime.GOMAXPROCS(-1) // one for each proc
	seed := time.Now().UnixNano()
	for i := 0; i < workers; i++ {
		work := n / workers
		begin := work * i
		end := begin + work
		if i == workers-1 {
			end += n % workers
		}
		r := rand.New(rand.NewSource(seed + 7*int64(begin)))
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := begin; i < end; i++ {
				list[i] = []int{
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
					r.Intn(49),
				}
			}
		}()
	}
	wg.Wait()
	fmt.Printf("End\t%v %v\n", len(list), time.Since(start))
	for i := 0; i < 10; i++ {
		j := rand.Intn(len(list))
		fmt.Println(j, list[j])
	}
}
Since computation is the limiting factor for this program, I chose to run one goroutine per processor (controlled with GOMAXPROCS, which defaults to the number of cores on the machine). Running on my two-core computer gives:
Start 0
End 10000000 5.377756077s
4047617 [10 21 47 8 25 42 44]
9164640 [16 31 15 33 1 33 43]
3197595 [40 8 39 22 13 36 20]
1374711 [12 4 16 20 17 21 33]
2543575 [9 46 25 23 2 20 28]
9157311 [10 2 1 21 28 2 18]
7488333 [0 37 24 24 4 45 38]
269903 [21 40 15 48 5 41 11]
3858409 [33 46 37 35 44 35 27]
8445215 [40 8 36 14 1 32 37]
This reduces the runtime from 12.957932904s to 5.377756077s.
Other Comments
If I were doing this for production code, I would want a better way to test that my changes improved performance while not changing the results. The first thing I would do is move the code out of main into a function with a signature like:
func lottoNumbers(n int, seed int64) [][]int
Notice that seed is an argument, allowing a test harness to use the same seed for each version. The results could then be compared in their entirety. Such a comparison is the only way to know that the random numbers used are distributed as expected and that the changes have not produced incorrect results.