Invalid rune : Ä

I am getting an invalid rune value for the Swedish letter Ä, according to this site…

…the ASCII code should be 196, but the above code returns 195, which is Ã?

This is the first time I am messing around with runes, so I might have misunderstood something fundamental here. Why doesn’t it return 196?

Ok, so if I do it like this, it works:

Guess I had misunderstood something fundamental :slight_smile:

ASCII has only 7 bits. Windows-1252 is not ASCII; it is one of many ASCII-compatible 8-bit character encodings.

Go, however, uses the full Unicode range, in which “Latin Capital Letter A with Diaeresis” (Ä) has the code point 196 (U+00C4).

Go strings hold UTF-8 encoded bytes, so this character actually takes up 2 bytes (0xC3 0x84). Reading the first byte of the string rather than the first rune gives 0xC3, which is 195, and that is Ã in Windows-1252.
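To make the byte/rune difference concrete, here is a minimal sketch. It is not the code from the question (that is not shown here); `firstByte` and `firstRune` are hypothetical helper names I made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstByte returns the first byte of s. Indexing a string yields bytes,
// not runes. It panics on an empty string.
func firstByte(s string) byte {
	return s[0]
}

// firstRune decodes the first UTF-8 sequence in s and returns its code point.
// For an empty string it returns utf8.RuneError.
func firstRune(s string) rune {
	r, _ := utf8.DecodeRuneInString(s)
	return r
}

func main() {
	s := "\u00c4" // Ä, written as an escape so the file encoding does not matter

	fmt.Println(firstByte(s)) // 195 (0xC3, the first UTF-8 byte)
	fmt.Println(firstRune(s)) // 196 (U+00C4, the code point)
	fmt.Printf("% x\n", s)    // c3 84

	// Ranging over a string decodes runes, so r is 196 here as well.
	for i, r := range s {
		fmt.Println(i, r)
	}
}
```

Converting with `[]rune(s)[0]` should give 196 too; `utf8.DecodeRuneInString` just avoids allocating a slice.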

There are other ways to build the character commonly represented as Ä. You already found out about U+00C4, but it can also be composed from A (U+0041) and ◌̈ (U+0308). In UTF-8 this form requires 3 bytes to encode (1 for A and 2 for ◌̈).
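A small sketch comparing the two forms, using only the standard library. The names `precomposed`, `decomposed` and `describe` are my own and purely illustrative.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	precomposed = "\u00c4"  // Ä as the single code point U+00C4
	decomposed  = "A\u0308" // A (U+0041) followed by combining diaeresis (U+0308)
)

// describe returns the length of s in bytes and in runes (code points).
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{precomposed, decomposed} {
		b, r := describe(s)
		// %+q prints the string with non-ASCII characters escaped.
		fmt.Printf("%+q: %d bytes, %d runes, bytes: % x\n", s, b, r, s)
	}

	// Both render as Ä, but Go compares strings byte by byte.
	a, b := precomposed, decomposed
	fmt.Println(a == b) // false
}
```

If you need the two forms to compare equal, you would have to normalize them first; as far as I know that lives outside the standard library, in `golang.org/x/text/unicode/norm`.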

Also, please remember that Ä is not used only in the Swedish alphabet, but in the Finnish and German ones as well.

In general, the lowercase variants are used in a lot of alphabets, as the diaeresis (◌̈) above a letter usually marks some kind of vocal pause. In German this is not true, though, as the umlauts evolved from ligatures like Æ, which had a different sound than an A followed by an E.

If you want to deal with any input that is not UTF-8 encoded (original 7-bit ASCII is a subset of UTF-8), you need to convert between the encodings. Unicode code points are often used as a universal intermediate mapping for any kind of re-encoding of characters.
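As a rough illustration of such a conversion, here is a sketch that assumes the input is ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point. `latin1ToUTF8` is a hypothetical helper, not a standard library function. Windows-1252 differs from Latin-1 in the range 0x80 to 0x9F, so for real Windows-1252 input you would want a proper decoder such as the one in `golang.org/x/text/encoding/charmap`.

```go
package main

import "fmt"

// latin1ToUTF8 decodes ISO-8859-1 bytes into a Go string (UTF-8).
// Assumption: the input really is ISO-8859-1, so each byte maps directly
// to the code point with the same number.
func latin1ToUTF8(in []byte) string {
	runes := make([]rune, len(in))
	for i, b := range in {
		runes[i] = rune(b) // byte value -> Unicode code point
	}
	return string(runes) // converting []rune to string encodes as UTF-8
}

func main() {
	in := []byte{0xC4} // Ä as a single byte (196) in ISO-8859-1
	out := latin1ToUTF8(in)

	// The same character now occupies two bytes in UTF-8.
	fmt.Printf("%+q, bytes: % x\n", out, out)
}
```

The code point 196 is the in-between value here: it comes from the single input byte and is then re-encoded as the two UTF-8 bytes 0xC3 0x84.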


Yes, you still appear to be missing something fundamental.

The Go Blog: Strings, bytes, runes and characters in Go

About the Unicode Standard


This link explains most of what I was missing, thank you…
