Why does this modification (changing the right shift) improve performance so much? In my tests the gain depends on the data set: the test file data/e.txt shows a large improvement, but for ordinary text files the effect is only average. What is the reason?
Anybody interested?
The change: when calculating the hash value, use a different shift amount so that keys spread over more hash positions, avoiding storage conflicts (collisions).
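For reference, the hash I am changing has roughly this shape (a minimal sketch with assumed constant names, not the package's exact code); the "shift number" is the amount the key is shifted right before being XORed back into itself:

package main

import "fmt"

const (
	tableSize = 1 << 14 // assumed table size
	tableMask = tableSize - 1
)

// hash maps a 20-bit key (12-bit code prefix, 8-bit byte suffix) to a table
// slot. The shift controls which high bits of the key are folded into the
// low bits; this is the number the modification changes.
func hash(key uint32, shift uint) uint32 {
	return (key>>shift ^ key) & tableMask
}

func main() {
	key := uint32(0x123)<<8 | uint32('A') // code 0x123 followed by byte 'A'
	for _, s := range []uint{8, 12, 13} {
		fmt.Printf("shift %2d -> slot %d\n", s, hash(key, s))
	}
}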
LZW compression works by finding matches in the previous input, and those matches are found through a hash table built from that input.
From the source:
// table is the hash table from 20-bit keys to 12-bit values. Each table
// entry contains key<<12|val and collisions resolve by linear probing.
// The keys consist of a 12-bit code prefix and an 8-bit byte suffix.
// The values are a 12-bit code.
table [tableSize]uint32
The linear probing part means that when a hash collision is detected, the lookup probes forward through the table one slot at a time. So if the hash function distributes keys poorly, a lookup degrades from an O(1) operation to an O(n) one.
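A minimal sketch of such a probing lookup, assuming the key<<12|val entry layout from the quoted comment (the identifiers here are placeholders, not the encoder's actual code):

package main

import "fmt"

const (
	tableSize    = 1 << 14 // assumed: several times the number of possible codes, so it never fills
	tableMask    = tableSize - 1
	invalidEntry = 0
)

var table [tableSize]uint32

// insert stores a 12-bit val under a 20-bit key, probing forward past
// occupied slots on a collision.
func insert(key, val uint32) {
	h := (key>>12 ^ key) & tableMask
	for table[h] != invalidEntry {
		h = (h + 1) & tableMask
	}
	table[h] = key<<12 | val
}

// lookup returns the value stored for key. On a collision it walks forward
// one slot at a time; a hash that clusters keys turns this O(1) probe into
// a long linear scan.
func lookup(key uint32) (uint32, bool) {
	for h := (key>>12 ^ key) & tableMask; table[h] != invalidEntry; h = (h + 1) & tableMask {
		if table[h]>>12 == key {
			return table[h] & 0xFFF, true
		}
	}
	return 0, false
}

func main() {
	insert(0x12341, 0x200)
	fmt.Println(lookup(0x12341)) // 512 true
}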
This change improves the hash function. Fewer collisions mean less linear probing, which means faster performance. How much you gain depends on how badly the old shift happened to cluster the keys produced by your particular input, which is why different data sets (data/e.txt versus plain text) show different improvements.
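One way to see why the shift matters, under the same assumed hash shape as above: hash a set of synthetic (code, byte) keys that resembles your input and count how many land on an already-occupied slot for each candidate shift. This is a hypothetical experiment for illustration, not a benchmark of the package:

package main

import "fmt"

const tableMask = 1<<14 - 1

// collisions counts how many keys hash to a slot that an earlier key
// already claimed, for a given right-shift amount.
func collisions(keys []uint32, shift uint) int {
	seen := make(map[uint32]bool)
	n := 0
	for _, k := range keys {
		h := (k>>shift ^ k) & tableMask
		if seen[h] {
			n++
		}
		seen[h] = true
	}
	return n
}

func main() {
	// Synthetic keys: 12-bit code prefixes paired with a handful of ASCII
	// byte suffixes, roughly what a plain-text input produces.
	var keys []uint32
	for code := uint32(256); code < 1280; code++ {
		for _, b := range []byte("etaoin ") {
			keys = append(keys, code<<8|uint32(b))
		}
	}
	for _, s := range []uint{8, 12, 13} {
		fmt.Printf("shift %2d: %d collisions\n", s, collisions(keys, s))
	}
}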