Invalid rune: Ä

I am getting an invalid rune value for the Swedish letter Ä, according to this site…

…the ASCII code should be 196, but the code above returns 195, which is Ã?

https://play.golang.org/p/6qC7x7bI2pg

This is the first time I am messing around with runes, so I might have misunderstood something fundamental here. Why doesn’t it return 196?

Ok, so if I do it like this, it works:

https://play.golang.org/p/vxrNX6VsIjr

Guess I had misunderstood something fundamental 🙂

ASCII uses only 7 bits. Windows-1252 is not ASCII; it is one of many ASCII-compatible 8-bit character encodings.

Go, however, uses the full Unicode range, in which “Latin Capital Letter A with Diaeresis” (Ä) has the code point 196 (U+00C4).

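Go source files and string literals are UTF-8 encoded, so inside a Go string this character takes up 2 bytes (0xC3 0x84). Indexing a string returns a single byte, and 0xC3 is 195, which is most likely where your 195 came from; a rune holds the decoded code point, 196.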
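
A minimal sketch of the byte/rune distinction (not the original poster's code, which is only available through the playground link). `firstByte` and `firstRune` are hypothetical helper names used for illustration; `utf8.DecodeRuneInString` is from the standard `unicode/utf8` package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstByte returns s[0]; indexing a Go string yields a byte, not a character.
func firstByte(s string) byte {
	return s[0]
}

// firstRune decodes the first UTF-8 sequence in s into a Unicode code point.
func firstRune(s string) rune {
	r, _ := utf8.DecodeRuneInString(s)
	return r
}

func main() {
	s := "Ä" // held in the string as the UTF-8 bytes 0xC3 0x84

	fmt.Println(len(s))              // 2 (bytes, not characters)
	fmt.Printf("% X\n", s)           // C3 84
	fmt.Println(firstByte(s))        // 195, i.e. 0xC3
	fmt.Println(firstRune(s))        // 196
	fmt.Printf("%U\n", firstRune(s)) // U+00C4
	fmt.Println([]rune(s)[0])        // 196: converting to []rune also decodes

	// range over a string decodes runes; i is the byte offset.
	for i, r := range s {
		fmt.Printf("offset %d: %c = %d\n", i, r, r)
	}
}
```

So `s[0]` gives you the first UTF-8 byte, while `[]rune(s)`, `range`, or `utf8.DecodeRuneInString` give you the code point.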

There is more than one way to build the character commonly displayed as Ä. You already found U+00C4, but it can also be composed from A (U+0041) followed by the combining diaeresis ◌̈ (U+0308). In UTF-8 that sequence takes 3 bytes (1 for A and 2 for ◌̈).
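
A small sketch comparing the two spellings using only the standard library; the constant names `precomposed` and `decomposed` are illustrative. Treating the two as equal requires Unicode normalization, which is not in the standard library (the `golang.org/x/text/unicode/norm` package provides it, e.g. `norm.NFC.String`).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Two spellings of the character displayed as Ä.
const (
	precomposed = "\u00C4"  // U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
	decomposed  = "A\u0308" // U+0041 A + U+0308 COMBINING DIAERESIS
)

func main() {
	for _, s := range []string{precomposed, decomposed} {
		// len counts bytes; utf8.RuneCountInString counts code points.
		fmt.Printf("%q: %d bytes, %d runes, bytes % X\n",
			s, len(s), utf8.RuneCountInString(s), s)
	}
	// == compares bytes, not what the reader sees, so this prints false.
	fmt.Println(precomposed == decomposed)
}
```

The precomposed form is 2 bytes and 1 rune (C3 84); the decomposed form is 3 bytes and 2 runes (41 CC 88), even though both usually render the same.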

Also, please remember that Ä isn’t used only in the Swedish alphabet, but in others such as German and Finnish as well.

In general, the lowercase forms (such as ä) appear in many alphabets, where a diaeresis (◌̈) above a vowel usually indicates that it is pronounced separately from the vowel before it. This is not true in German, though, as the umlauts evolved from ligatures like Æ, which represented a different sound than an A followed by an E.

If you need to handle input that is not UTF-8 encoded (the original 7-bit ASCII is a subset of UTF-8), you have to convert between the encodings. Unicode code points are commonly used as the universal intermediate representation when re-encoding characters from one encoding to another.
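
A minimal sketch of such a conversion, assuming the input is ISO-8859-1 (Latin-1), where every byte value happens to equal its Unicode code point. `latin1ToUTF8` is a hypothetical helper; it is not correct for Windows-1252, which maps bytes 0x80-0x9F to other characters. For Windows-1252 and other legacy encodings, the `golang.org/x/text/encoding/charmap` package (for example `charmap.Windows1252`) provides ready-made decoders.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// latin1ToUTF8 is a hypothetical helper that decodes ISO-8859-1 bytes into
// a Go string (UTF-8). Each Latin-1 byte value is its Unicode code point,
// so the rune acts as the intermediate value between the two encodings.
// NOT valid for Windows-1252, which differs in the range 0x80-0x9F.
func latin1ToUTF8(b []byte) string {
	var sb strings.Builder
	for _, c := range b {
		sb.WriteRune(rune(c)) // byte -> code point -> UTF-8 bytes
	}
	return sb.String()
}

func main() {
	raw := []byte{0xC4, 'p', 'p', 'l', 'e'} // "Äpple" encoded as Latin-1

	// false: 0xC4 is not followed by a UTF-8 continuation byte
	fmt.Println(utf8.Valid(raw))

	s := latin1ToUTF8(raw)
	fmt.Println(s)                // Äpple
	fmt.Println(len(raw), len(s)) // 5 6: Ä grows from 1 byte to 2
	fmt.Printf("% X\n", s)        // C3 84 70 70 6C 65

	fmt.Println(utf8.ValidString(s)) // true
}
```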


Yes, you still appear to be missing something fundamental.

The Go Blog: Strings, bytes, runes and characters in Go

About the Unicode Standard


This link explains most of what I was missing, thank you…
