Algorithm: map[string][]string -> reversed map[string][]string

You have a dataset that represents the relationships between dogs and cats. The dataset is structured as a map, where each dog is associated with a list of cat names. The cat names for a given dog are unique. Overall, all cat names and dog names are unique, although 2 dogs may belong the same cat, and two cats may belong to the same dog.

Problem: Your goal is to generate a new dataset that reverses the relationships, where each cat is associated with a list of unique dog names.

How would you solve this efficiently ? What do you do with large data ? Does Go offer any support/library data structures for this scenario ? What is the idiomatic way to approach this problem ?

PS: Iâ€™ve already experimented with the â€śiterate through the input map, for each value add it to the new map, at its corresponding key, if not already thereâ€ť. Looking for something more efficient than this. Similarly, the â€śuse an auxiliary map to check if a dog already belongs to this catâ€ť.

``````  dogscats := make(map[string][]string)  // map each dog to a catlist.
for cat, doglist := range catsdogs {
for _, dog := range doglist {
dogscats[dog] = append(dogscats[dog], cat)
}
}
``````

For many cats per dog, you may be reallocating `catlist`s. If thatâ€™s a performance problem,
loop through `catsdogs` once to count the number of cats per dog so you can allocate the `catlist`s to the correct initial size. Another performance problem may come from growing the map. You can count the unique dogs to pre-allocate the size of `dogscats`, but doing that youâ€™ll need a set of dogs (as a map) that would need to grow similarly to `dogscats`, so it may not be worthwhile.

Just for fun, hereâ€™s @mje 's algorithm in a runnable environment: https://goplay.tools/snippet/6UoNk3saTo8

Iâ€™m curious what more efficient stuff people come up with. Maybe even something with memory arenaâ€™s or what not?

in case there are multiple duplicates, have an auxiliary data structure, dogCats is a map[string]map[string]interface{}. After you build this auxiliary data structure, you build your result from it.

Yet it is highly dependent on how many duplicates, you allocate more memory than needed, etc.

Hereâ€™s another version but this time with the dummy data initialised in stead of built with `append`: https://goplay.tools/snippet/r5lLSEkJTRW

Underlying data structure is `map[string][]string`.

The most efficient solution is probably encoding the dog-cats relationship in a `[][]bool`-matrix. No need to alter the matrix since you can just iterate over it with columns and rows switched (in effect transposing the matrix).

Running example:

`````` dogs := [3]string{"dog1", "dog2", "dog3"}
cats := [4]string{"cat1", "cat2", "cat3", "cat4"}

dogm := [3][4]bool{
[4]bool{true, true, true, true},    // dog1 has cat1, cat2, cat3, cat4
[4]bool{true, false, true, false},  // dog2 has cat1, cat3
[4]bool{true, false, false, false}, // dog3 has cat1
}

// print source data
printmatrix(dogs[:], cats[:], dogm)
``````

Currently Iâ€™m still trying to write a (perhaps generic) `printmatrix`-function in stead of the 2 separate ones that avoids copying the matrix (with hard-coded dimensions) to slices.

Your idea is a clear example of thinking out of the box, congratulations to you sir. Having said that, I do not think I can use it: if you have N millions of dogs and M millions of cats, or even more, you need to consider the costs of storing a N*M bool matrix in memory, in this case it seems too much.

Not directly related, if a dog entity is just a simple small string, like â€śdog1â€ť, the difference between storing a bool or the string itself, is not big enough to merit memoization. A bool is a byte, while a string is something like 3 ints (one of them being a pointer to the immutable rune slice itself).

Keep coming with innovative ideas, I am sure this problem can be solved more efficiently than a pair of maps.

The dog/cat example is likely to very sparse. If your real problem is also very sparse, consider a sparse matrix implementation like CSR. Sparse matrix implementations are optimized for (relatively) fast matrix/matrix operations, so youâ€™ll pay a runtime penalty for random element access.

You can also reduce space by using an int64 to hold 64 matrix elements, but this could work against, and certainly complicate, a sparse implementation.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.