Split slice in sub-slices by group defined by a second slice

I am a casual :upside_down_face: gopher and I would need some help/tips on how to improve the following task.

I have two slices:

  • the first slice s contains strings, those are not unique, as from time to time the same string is repeated twice or multiple times.
  • the second slice n contains integers, its content refers to the iteration in which each string was generated from a loop above.

As a minimal example:

s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"} //the strings
n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}                      // the iteration that produced the strings

Essentially, the string β€œa” was generated during the iteration 1, same for β€œb”, while β€œc” was generated in iterations 1, 2, and 3…

I need now to split the slice n into sub-slices defined by the β€˜grouped’ slice s:

//my end results would be
r := [][]int{{1}, {1}, {1, 2, 3}, {4}, {2, 3, 4}}

Essentially, the firs sub-slice is based on the string β€œa”, which is generated only in iteration 1, the second sub-slice is β€œb” also generated only in 1, then there is β€œc” generated at 1,2, and 3…

I thought to make use of a map since keys are unique.
Again:

//my map end result
m := make(map[string][]int) 
m["a"] = []int{1}
m["b"] = []int{1}
m["c"] = []int{1, 2, 3}
m["d"] = []int{4}
m["e"] = []int{2, 3, 4}

The following code works just fine, but it is super slow when the input slices (s and n) are in the range of several thousand elements.

package main

import (
	"fmt"
)

func main() {
	s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"}
	n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}

	computedm := make(map[string][]int)
	for i := 0; i < len(s); i++ {
		x := indexof(s, s[i])
		var y []int
		for _, p := range x {
			y = append(y, n[p])
		}
		computedm[s[i]] = y
	}
	fmt.Println(computedm)
}

func indexof(ss []string, s string) []int {
	var res []int
	for i, j := range ss {
		if j == s {
			res = append(res, i)
		}
	}
	return res
}

How can I make it faster? lightning fast…

Without looking closer or running any profiling, I’d guess that the slice creation and append operations inside the loops are the most costly ones. If you can pre-allocate the space you need, this might speed up things.

Hi @Pisistrato, Hope you are doing good.

I had run the follow script in my laptop and took between 6.000 and 10.000 nanoseconds.

package main

import (
	"fmt"
	"time"
)

func main() {
	s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"}
	n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}

	computedm := make(map[string][]int)

	startTime := time.Now()
	for i := 0; i < len(s); i++ {
		x := indexof(s, s[i])
		var y []int
		for _, p := range x {
			y = append(y, n[p])
		}
		computedm[s[i]] = y
	}
	fmt.Println(time.Since(startTime).Nanoseconds())
	fmt.Println(computedm)
}

func indexof(ss []string, s string) []int {
	var res []int
	for i, j := range ss {
		if j == s {
			res = append(res, i)
		}
	}
	return res
}

Also I had run the following script and took between 3.000 and 5.000 nanoseconds.

package main

import (
	"fmt"
	"time"
)

func main() {
	s := []string{"a", "b", "c", "c", "e", "c", "e", "d", "e"}
	n := []int{1, 1, 1, 2, 2, 3, 3, 4, 4}
	computedm := make(map[string]map[int]bool)

	startTime := time.Now()

	for i, x := range s {
		if compute, found := computedm[x]; found {
			compute[n[i]] = true
		} else {
			computedm[x] = map[int]bool{n[i]: true}
		}
	}
	fmt.Println(time.Since(startTime).Nanoseconds())

	fmt.Println(computedm)
}

Let me know if the solution is correct.

Regards!
Saya

1 Like