Unicode collation using golang.org/x/text/collate

I am implementing a library for data representation.
Refer: https://github.com/bnclabs/gson

One of the features supported by this package is compiling composite
data (JSON is supported) into a binary format that can be sorted using
memcmp.

As part of this binary-collation feature, I need to support string
sorting based on the ICU collation standard. After a bit of googling, I
came across this awesome package: golang.org/x/text/collate


I am using Collator.Key() to compile a string into an ICU sort key
that can be compared with memcmp.

I wrote a couple of test cases for this, and it works fine.

My question is:

After compiling a string value into a binary-comparable sort key, can
I get the original string back from its sort key? If not, is that a
limitation of the ICU standard or of golang.org/x/text/collate?

Thanks,

I have run into the same issue. One possible solution is to store the original string alongside the sort key (perhaps separated by a known delimiter, or as a tuple) in the “encode” phase and read the original string back in the “decode” phase. However, this adds the overhead of extra space.

Like you mentioned, I was unable to find a reverse mapping from a sort key to the original string. From a quick look at the code, golang.org/x/text/collate is implemented in pure Go rather than by calling libicu through cgo, but neither it nor ICU itself appears to provide a function that maps a sort key back to its string.

Ideally, someone with collation expertise can advise us on the best practices to follow in this scenario.

Following up on my previous comment, here are some relevant links suggesting that you do in fact need to store both the sort key and the original data; there doesn’t seem to be an inverse function that maps a sort key back to the original string.


Thanks,
Aman Achpal
