Super fast execution time (we are talking under 2 secs) for counting occurrences of a char in large number of strings


(Vatsal Mishra) #1

Hi, just got into Go. Been playing with Go from past 2 years, only following tutorials and writing throwaway codes. Not on production but I am at a senior level in my career and planning to make full time switch to Go. Coming from Nodejs for about 5 years writing production code.

I wanted to test out Go as my tool of choice on some interesting online puzzle/challenge based exercises. What I like about Go is it is almost like C to run and compile and is very fast. But I am stuck at one point I really need some help. I have a code that is already quite fast but I need it even faster. And it is related to breaking strings even some large strings like:

DXqzxJrCnsAUSmkUkQCedQTcyYHIUPixMbyLKQoAKOuDDgxNWBRCrmErkNafCKeQnfBYxPahgumJclYAMPcFbizNVbuxnbnBgrbDIbmqXbZZOzzdQeItmJhiKdFYevbMMcRWdGnajOZXoaaeGlyTUhbhkifRfgftKzqNfDNQToNhQqmxBmqXtTefDLWJtEEddUsiZfYTVnCMDqjGCbgxUMDZazkVdPZglTQlZqGUEKiJbrSfUYDOKgqKhAGNphTBCmoBCwYIggUJaMPUmUlyadLhDTwTCqWeeMiiigtQOncqBtNNnANPDyygrVBTNxYRjIBTOUmehoXkHmXwMLlpLlXdgbyvyYmkGOdWIpDsfACMMcKapxRaJOFrPeNQcZClphhAZKueMWtaAdaPBtQMKvdbbQtGsVRjdAJcNFgzezcEfEWiYoUIlXlnqFTIAKtDMFhzkoZopDvWhreHFARQadFqIyAiKToxyXWllxzdCUBKNGrUIjLImqAwragDwWawUzHHyhKbDOnTuENEpandTpzkSipoeqKYrUGrbSsgGFBCMphDRWHKlIwdQMTVEgvxaDODwJlEQNhCEQIvMpgzzefqQANWGKtGoiL

into all the possible string combinations and finding occurrence of a set of chars in each sub-string. If you are curious the result is 7811003.

As I said the program is already fast enough but is there any scope to make it even faster. Here is the pseudo code

func(words []string){
    var charset = []string{.........}
    count := 0
    for word :=  range words{
        for _, char := range charset{
             count = count + strings.Count(word, char)
        }
    }
}

The above code is the part that I think is the bottleneck in performance just to be sure I am taking the ‘count’ variable as big.Int to calculate bigger values. I have around 10 input strings that have to be processed within the 1.1 sec time and 64 kb memory usage as that is the C benchmark.


(Christophe Meessen) #2

This code can be optimized by changing the algorithm.

If I understood correctly, you want to compute the total number of occurence of the characters of charset in the input words.

You could iterate on each character of each word and increment an int counter in a map whose key is the character in the charset. When you have done that, your final count is the sum of int values of the map.

You don’t need a big.Int because int is 64 bit and count wont exceed the number of letters in words. Using big.Int is much slower than int.

If the charset is limited to ASCII letters, you can use a slice of 256 int and use the ASCII code of the letter as index of the slice. Then, for each letter of each word, you increment the corresponding int in the slice. When done, you sum the int values of the characters in the charset. This will be the fastest.