Why does this modification (changing the right shift) improve performance so much? In my tests the gain depends on the data set: the test file data/e.txt shows a large improvement, but for ordinary text files the effect is only average. What is the reason?
Anybody interested?
The change: when calculating the hash value, use a different shift amount so that keys spread over more hash positions, avoiding storage conflicts (collisions).
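For reference, the hash I am changing has roughly this shape (a minimal sketch with assumed constant names, not the package's exact code); the "shift number" is the amount the key is shifted right before being XORed back into itself:

package main

import "fmt"

const (
	tableSize = 1 << 14 // assumed table size
	tableMask = tableSize - 1
)

// hash maps a 20-bit key (12-bit code prefix, 8-bit byte suffix) to a table
// slot. The shift controls which high bits of the key are folded into the
// low bits; this is the number the modification changes.
func hash(key uint32, shift uint) uint32 {
	return (key>>shift ^ key) & tableMask
}

func main() {
	key := uint32(0x123)<<8 | uint32('A') // code 0x123 followed by byte 'A'
	for _, s := range []uint{8, 12, 13} {
		fmt.Printf("shift %2d -> slot %d\n", s, hash(key, s))
	}
}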
LZW compression works by finding matches in the previous input, and those matches are found through a hash table built from that input.
From the source:
// table is the hash table from 20-bit keys to 12-bit values. Each table
// entry contains key<<12|val and collisions resolve by linear probing.
// The keys consist of a 12-bit code prefix and an 8-bit byte suffix.
// The values are a 12-bit code.
table [tableSize]uint32
The linear probing part means that when a hash collision is detected, the lookup probes forward through the table one slot at a time. So if the hash function distributes keys poorly, a lookup degrades from an O(1) operation to an O(n) one.
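A minimal sketch of such a probing lookup, assuming the key<<12|val entry layout from the quoted comment (the identifiers here are placeholders, not the encoder's actual code):

package main

import "fmt"

const (
	tableSize    = 1 << 14 // assumed: several times the number of possible codes, so it never fills
	tableMask    = tableSize - 1
	invalidEntry = 0
)

var table [tableSize]uint32

// insert stores a 12-bit val under a 20-bit key, probing forward past
// occupied slots on a collision.
func insert(key, val uint32) {
	h := (key>>12 ^ key) & tableMask
	for table[h] != invalidEntry {
		h = (h + 1) & tableMask
	}
	table[h] = key<<12 | val
}

// lookup returns the value stored for key. On a collision it walks forward
// one slot at a time; a hash that clusters keys turns this O(1) probe into
// a long linear scan.
func lookup(key uint32) (uint32, bool) {
	for h := (key>>12 ^ key) & tableMask; table[h] != invalidEntry; h = (h + 1) & tableMask {
		if table[h]>>12 == key {
			return table[h] & 0xFFF, true
		}
	}
	return 0, false
}

func main() {
	insert(0x12341, 0x200)
	fmt.Println(lookup(0x12341)) // 512 true
}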
This change improves the hash function. Fewer collisions mean less linear probing, which means faster performance. How much you gain depends on how badly the old shift happened to cluster the keys produced by your particular input, which is why different data sets (data/e.txt versus plain text) show different improvements.
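One way to see why the shift matters, under the same assumed hash shape as above: hash a set of synthetic (code, byte) keys that resembles your input and count how many land on an already-occupied slot for each candidate shift. This is a hypothetical experiment for illustration, not a benchmark of the package:

package main

import "fmt"

const tableMask = 1<<14 - 1

// collisions counts how many keys hash to a slot that an earlier key
// already claimed, for a given right-shift amount.
func collisions(keys []uint32, shift uint) int {
	seen := make(map[uint32]bool)
	n := 0
	for _, k := range keys {
		h := (k>>shift ^ k) & tableMask
		if seen[h] {
			n++
		}
		seen[h] = true
	}
	return n
}

func main() {
	// Synthetic keys: 12-bit code prefixes paired with a handful of ASCII
	// byte suffixes, roughly what a plain-text input produces.
	var keys []uint32
	for code := uint32(256); code < 1280; code++ {
		for _, b := range []byte("etaoin ") {
			keys = append(keys, code<<8|uint32(b))
		}
	}
	for _, s := range []uint{8, 12, 13} {
		fmt.Printf("shift %2d: %d collisions\n", s, collisions(keys, s))
	}
}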