Split slice in sub-slices by group defined by a second slice

Pisistrato · October 5, 2021, 12:50pm

I am a casual gopher and I would need some help/tips on how to improve the following task.

I have two slices:

the first slice s contains strings, those are not unique, as from time to time the same string is repeated twice or multiple times.
the second slice n contains integers, its content refers to the iteration in which each string was generated from a loop above.

As a minimal example:

s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"} //the strings
n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}                      // the iteration that produced the strings

Essentially, the string “a” was generated during the iteration 1, same for “b”, while “c” was generated in iterations 1, 2, and 3…

I need now to split the slice n into sub-slices defined by the ‘grouped’ slice s:

//my end results would be
r := [][]int{{1}, {1}, {1, 2, 3}, {4}, {2, 3, 4}}

Essentially, the firs sub-slice is based on the string “a”, which is generated only in iteration 1, the second sub-slice is “b” also generated only in 1, then there is “c” generated at 1,2, and 3…

I thought to make use of a map since keys are unique.
Again:

//my map end result
m := make(map[string][]int) 
m["a"] = []int{1}
m["b"] = []int{1}
m["c"] = []int{1, 2, 3}
m["d"] = []int{4}
m["e"] = []int{2, 3, 4}

The following code works just fine, but it is super slow when the input slices (s and n) are in the range of several thousand elements.

package main

import (
	"fmt"
)

func main() {
	s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"}
	n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}

	computedm := make(map[string][]int)
	for i := 0; i < len(s); i++ {
		x := indexof(s, s[i])
		var y []int
		for _, p := range x {
			y = append(y, n[p])
		}
		computedm[s[i]] = y
	}
	fmt.Println(computedm)
}

func indexof(ss []string, s string) []int {
	var res []int
	for i, j := range ss {
		if j == s {
			res = append(res, i)
		}
	}
	return res
}

How can I make it faster? lightning fast…

christophberger · October 6, 2021, 4:56pm

Without looking closer or running any profiling, I’d guess that the slice creation and append operations inside the loops are the most costly ones. If you can pre-allocate the space you need, this might speed up things.

GonzaSaya · October 7, 2021, 1:40am

Hi @Pisistrato, Hope you are doing good.

I had run the follow script in my laptop and took between 6.000 and 10.000 nanoseconds.

package main

import (
	"fmt"
	"time"
)

func main() {
	s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"}
	n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}

	computedm := make(map[string][]int)

	startTime := time.Now()
	for i := 0; i < len(s); i++ {
		x := indexof(s, s[i])
		var y []int
		for _, p := range x {
			y = append(y, n[p])
		}
		computedm[s[i]] = y
	}
	fmt.Println(time.Since(startTime).Nanoseconds())
	fmt.Println(computedm)
}

func indexof(ss []string, s string) []int {
	var res []int
	for i, j := range ss {
		if j == s {
			res = append(res, i)
		}
	}
	return res
}

Also I had run the following script and took between 3.000 and 5.000 nanoseconds.

package main

import (
	"fmt"
	"time"
)

func main() {
	s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"}
	n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}
	computedm := make(map[string]map[int]bool)

	startTime := time.Now()

	for i, x := range s {
		if compute, found := computedm[x]; found {
			compute[n[i]] = true
		} else {
			computedm[x] = map[int]bool{n[i]: true}
		}
	}
	fmt.Println(time.Since(startTime).Nanoseconds())

	fmt.Println(computedm)
}

Let me know if the solution is correct.

Regards!
Saya