Go slower to execute due to cache misses

Hi,

I have develop small code as selection-sort and use slices for go program and vector for c++.
Note: C++ code compile with O2 flag.
Why comparing ? just to know what level of speed can be achieved by go.

Here what I found that go take 10 sec to sort while c++ vector take only 5 sec. (almost half of go)
Then I tried to run using perf tool to see whether issue with cache or not I found below stats

cache-misses:
for go -> 15,397,360
for c++ -> 1,808,180

Am I missing here some compiler flag for go to optimize code ?
“go build -o go_test selection_sort.go”
Is there any API to prefetch buffer e.g. __builting_prefetch in gcc ? or compiler handle that.

Thanks,
~Rohit

could you provide the code? It will be easier to help you out.

Please correct me if something wrong.

-------------- c++ code ----------------------

void updateVector(vector &ptr, int count) {
for (int i = 0; i < count; i++)
{
ptr.push_back(count - i);
// cout << i << " -> " << ptr[i] << endl;
}

}

void selection_sort(vector &ptr) {

clock_t start, stop;
double total;
long nsec;
long sec;
int min = 0;

cout << "Vector size : " << ptr.size() << endl;

start = clock();

for (int i = 0; i < (ptr.size() - 1); i++)
{
	min = i;
	for (int j = min + 1; j < ptr.size(); j++) {
		if(ptr[j] < ptr[min]) {
			min = j;
		}
	}

	int temp = ptr[min];
	ptr[min] = ptr[i];
	ptr[i] = temp;
}

stop = clock();

total = (double)(stop - start) / CLOCKS_PER_SEC;

cout << "CPP Execution time : " << total << endl;

}

from main first call update function and then sort.

----------------------------------------- go code ---------------------------------------------------

func buildBuffer(count int) []int {
var buffer []int

buffer = make([]int, count)

for i := 0; i < count; i++ {
	buffer[i] = count - i
	//fmt.Printf("%d -> %d \n", i, buffer[i])
}

return buffer

}

func selection_sort(buf []int, count int) {
min := 0
t1 := time.Now()

for i := 0; i < count-1; i++ {
	min = i
	for j := i + 1; j < count; j++ {
		if buf[min] > buf[j] {
			min = j
		}
	}
	buf[i], buf[min] = buf[min], buf[i]
}

t2 := time.Now()
elapse := t2.Sub(t1)
fmt.Printf("go Execution time : %f \n", elapse.Seconds())

}


I tried in C code also it gives same performance like go wiht -O2 flag, but when i change flag to -O3 it take less time similar to C++. Then with optimization flag in C code I tired using __builtin_preftech code and found performance got improved.

This indicate me go compiler is not adding prefetch instruction which are important in today’s world.
Please let me know if any such mechanism available in golang.

oid sort_ints(int * ptr, int count) {
int temp, min;
clock_t start, stop;
double total;

start = clock();

for (int i = 0; i < count - 1; i++)
{
    min = i;
    __builtin_prefetch(&ptr[min], 1, 3);
    for (int j = min + 1; j < count; j++)
    {
        __builtin_prefetch(&ptr[j+1], 0, 0);
        if(ptr[j] < ptr[min]) {
            min = j;
            __builtin_prefetch(&ptr[min], 1, 2);
        }        
    }

    temp = ptr[min];
    ptr[min] = ptr[i];
    ptr[i] = temp;
}

stop = clock();
total = (double)(stop -start) / CLOCKS_PER_SEC;

printf("C Execution time : %f \n", total);

}

Is any mechanism to do prefetch of data in Go ? e.g. in gcc compiler is __builtin_prefetch(…) or _mm_prefetch(…).

I’m afraid you cannot modify the prefetch behaviour. Go is very limited in this area. I think that these line cases it:

buf[i], buf[min] = buf[min], buf[i]

Even if it’s prefetched, it invalidates it next time.
BTW, why do you implement the sort algorithm by yourself? Is the sort.Sort too slow? https://golang.org/pkg/sort/#Sort

Thanks, it always better to use sort library to sort element.
But I took selection sort to see go performance compare to C, reason just want to explore and get to know capability of go. It seen to me interesting.