Unicode "Is" functions - which are most likely the best to use?

Reg · October 27, 2020, 8:44am

I understand that this could be somewhat subjective so I’ll try to outline the parts that I’m mostly sure about to narrow the focus as much as possible.

I want to do some rudimentary login name checking. My goal is to allow every Unicode character that could appear in a name, including names in non-English languages, plus numbers and the basic punctuation a name can contain. Writing punctuation such as ! and ? should be rejected, as should control characters such as CR and LF. White space is allowed in this case.

The obvious things to allow would be unicode.IsDigit/unicode.IsNumber and unicode.IsLetter, and the obvious thing to reject would be anything matching unicode.IsControl. I am unclear about IsMark, though: it looks like one of its subcategories would be useful but another would not. I have no idea about IsGraphic, and I am unsure about IsPunct.
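
One way to get a feel for these predicates is to print which of them report true for a handful of sample runes. The sketch below is only an exploration aid: the helper name classify and the sample runes are my own choices, while the unicode.Is* functions are the standard library ones.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify returns the names of the unicode predicates that report true for r.
// The helper and its labels are illustrative, not part of the unicode package.
func classify(r rune) []string {
	checks := []struct {
		name string
		fn   func(rune) bool
	}{
		{"Letter", unicode.IsLetter},
		{"Mark", unicode.IsMark},
		{"Digit", unicode.IsDigit},
		{"Number", unicode.IsNumber},
		{"Punct", unicode.IsPunct},
		{"Space", unicode.IsSpace},
		{"Control", unicode.IsControl},
		{"Graphic", unicode.IsGraphic},
	}
	var names []string
	for _, c := range checks {
		if c.fn(r) {
			names = append(names, c.name)
		}
	}
	return names
}

func main() {
	// Samples: a letter, a precomposed accented letter, a combining accent,
	// a digit, a vulgar fraction, a hyphen-minus, '!', a space and a line feed.
	samples := []rune{'a', '\u00E9', '\u0301', '5', '\u00BD', '-', '!', ' ', '\n'}
	for _, r := range samples {
		fmt.Printf("%U %v\n", r, classify(r))
	}
}
```

Two things stand out when running this: a combining accent such as U+0301 is a Mark rather than a Letter, and a line feed matches both IsSpace and IsControl.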

Has anyone delved into Unicode enough to have an idea of what would be a reasonable selection of allowed characters?

Update:
After playing around with words from different countries I have so far come up with this:

  switch {
  case unicode.IsDigit(chr), unicode.IsLetter(chr), unicode.IsSpace(chr),
    unicode.Is(unicode.Dash, chr), unicode.Is(unicode.Hyphen, chr):
    // Allowed: digits, letters, white space, dashes and hyphens.
  case unicode.IsPunct(chr):
    // The only other punctuation allowed is an apostrophe.
    if !strings.ContainsRune("’'ʼ՚ߴߵߵ’＇", chr) { // Apostrophes
      logger.Error(`Message`)
    }
  default:
    logger.Error(`Message`)
  }
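
For comparison, here is a self-contained sketch of the same idea as a function, with two changes that are my own assumptions rather than settled answers. It accepts only space separators (category Zs) instead of everything matched by unicode.IsSpace, because IsSpace is also true for tab, CR and LF, and it accepts unicode.IsMark so that names written with combining accents pass. The names allowedInName and firstInvalid and the apostrophe set are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// apostrophes is an assumed set of punctuation apostrophes to accept:
// U+0027, U+2019, U+FF07 and U+055A. The modifier letter apostrophe
// U+02BC is a letter, so unicode.IsLetter already accepts it.
const apostrophes = "'\u2019\uFF07\u055A"

// allowedInName reports whether r may appear in a login name under the
// rules sketched above. Anything not listed is rejected.
func allowedInName(r rune) bool {
	switch {
	case unicode.IsLetter(r), unicode.IsMark(r), unicode.IsDigit(r):
		return true
	case unicode.Is(unicode.Zs, r):
		// Space separators only; tab, CR and LF are not in Zs.
		return true
	case unicode.Is(unicode.Dash, r), unicode.Is(unicode.Hyphen, r):
		return true
	case strings.ContainsRune(apostrophes, r):
		return true
	}
	return false
}

// firstInvalid returns the byte offset and value of the first disallowed
// rune in name, or -1 and 0 if every rune is allowed.
func firstInvalid(name string) (int, rune) {
	for i, r := range name {
		if !allowedInName(r) {
			return i, r
		}
	}
	return -1, 0
}

func main() {
	names := []string{"Zo\u00EB O'Connor-Smith", "Jose\u0301", "bob\n", "what?"}
	for _, name := range names {
		i, r := firstInvalid(name)
		if i < 0 {
			fmt.Printf("%q: ok\n", name)
			continue
		}
		fmt.Printf("%q: rejected %U at byte %d\n", name, r, i)
	}
}
```

Whether to use IsDigit or the broader IsNumber, and which dashes and apostrophes to accept, remains a judgment call; the structure above just makes each choice a single line to change.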
