String encoding. Is this a bug?

Hi.

I am writing my own JSON parser and was looking at String encoding/decoding.

I think I have found a bug but as I am reasonably new to Go I would like to clarify the issue first. I am familiar with i18n from years of Java programming so I think this is an issue.

The problem occurs with the \x escape sequence, for example \xNN, where NN are two hex digits.

The \uNNNN encoding seems to be OK.

It seems to work fine up to \x7F. As you can see from the code, everything higher than \x7F results in a rune with a value of 65533.

Regards

Stuart

This is my code:

package main
import (
	"fmt"
	"strings"
)
func main() {
	fmt.Println(bytesString("Test '\x40' @"))
	fmt.Println(bytesString("Test '\x7F' Not sure but OK"))
	fmt.Println(bytesString("Test '\x80' This is bad?"))
	fmt.Println(bytesString("Test '\x81' This is bad?"))
	fmt.Println(bytesString("Test '\x88' This is bad?"))
}
func bytesString(inStr string) string {
	var sb strings.Builder
	for _, c := range inStr {
		if c < 32 || c > 127 {
			sb.WriteString(fmt.Sprintf("(%d)", c))
		} else {
			sb.WriteString(fmt.Sprintf("%c", c))
		}
	}
	return sb.String()
}

The output from this is

Test '@' @
Test '' Not sure but OK
Test '(65533)' This is bad?
Test '(65533)' This is bad?
Test '(65533)' This is bad?

There is a rune displayed for \x7F as shown in the image below:

[Image: Screenshot from 2021-10-02 15-53-25]

'\x80' is not a valid UTF-8 encoded string.

65533 is U+FFFD, the replacement character for undecodable sequences.

Often visualised as .
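
To make the difference concrete, here is a small sketch, not a definitive fix. It assumes you want to see the raw byte values rather than decoded runes; bytesStringByByte is a hypothetical variant of the bytesString function from the question that indexes the string byte by byte instead of ranging over it. It also shows that "\x80" is a single raw byte, while "\u0080" is the code point U+0080 encoded as two bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// bytesStringByByte is a hypothetical variant of bytesString. It walks the
// string byte by byte, so a byte that is not valid UTF-8 is reported with
// its actual value instead of being decoded to U+FFFD.
func bytesStringByByte(inStr string) string {
	var sb strings.Builder
	for i := 0; i < len(inStr); i++ {
		b := inStr[i]
		if b < 32 || b > 126 {
			// Control bytes, DEL and everything from 0x80 up are shown in hex.
			fmt.Fprintf(&sb, "(0x%02X)", b)
		} else {
			sb.WriteByte(b)
		}
	}
	return sb.String()
}

func main() {
	s := "\x80" // one raw byte, 0x80, which is not valid UTF-8 on its own
	fmt.Println(len(s), utf8.ValidString(s)) // 1 false

	// range decodes UTF-8; an undecodable byte yields utf8.RuneError (U+FFFD).
	for _, r := range s {
		fmt.Println(r == utf8.RuneError) // true
	}

	u := "\u0080" // the code point U+0080, stored as the two bytes 0xC2 0x80
	fmt.Println(len(u), utf8.ValidString(u)) // 2 true

	fmt.Println(bytesStringByByte("Test '\x80'")) // Test '(0x80)'
}
```

So the string itself still holds the byte 0x80; it is only the rune-by-rune range loop that substitutes 65533 when it cannot decode it.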

Ok that’s great.

Thanks for the quick response.

Stuart