How to store text correctly?

Hi there,

I’m new to Go and want to write a very simple tool. Or at least I thought it would be simple.

All the program should do is save some text (which I type into the command line) to a text file. This works so far, but I have a couple of issues. Here is the code:

package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"os"

	aurora "github.com/logrusorgru/aurora"
)

func save(text *[]byte) error {
	filename := "data/test.txt"

	return ioutil.WriteFile(filename, *text, 0600)
}

func main() {
	var input string
	var b []byte
	scanner := bufio.NewScanner(os.Stdin)

	for {
		fmt.Print(aurora.Magenta(">> "))
		if !scanner.Scan() {
			// Stop on EOF or a read error instead of looping forever.
			return
		}
		input = scanner.Text()

		if input == "fin" {
			err := save(&b)
			if err != nil {
				fmt.Println(err.Error())
			}
			return
		}

		b = append(b, scanner.Bytes()...)
		b = append(b, '\n')
	}
}

As I said, it works so far: it saves the file. When I open the file in Visual Studio Code, it shows up as I expected. But when I open it in editor.exe, the newlines are gone and everything is on one line. When I open it in Notepad, the newlines are there but umlauts don’t show up correctly.

I don’t know anything about character encoding, so this is very confusing for me. Is there a way to save text files so that every text editor shows them correctly?

Yes, and you are doing it correctly. But the editors you use to open the file need to let you configure their input encoding, because a plain-text file has no way to declare its own encoding.

You then need to set the editor to the same encoding that was used when writing the file. That should be the encoding your terminal uses.

The newlines are still there, but editor.exe is not displaying them. editor.exe needs Windows line endings to actually show a line break. Windows line endings are "\r\n".
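
For what it’s worth, here is a minimal sketch of converting the line endings before saving. The helper name `toCRLF` is made up for this example, and it assumes the text only contains "\n" or "\r\n" line endings:

```go
package main

import (
	"fmt"
	"strings"
)

// toCRLF (hypothetical helper) rewrites every line ending in s as "\r\n".
// Existing "\r\n" pairs are normalized to "\n" first so they are not doubled.
func toCRLF(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\n", "\r\n")
}

func main() {
	text := "first line\nsecond line\n"
	// %q makes the otherwise invisible line-ending bytes visible.
	fmt.Printf("%q\n", toCRLF(text))
}
```

In the program from the question, the simpler route is probably to append "\r\n" instead of "\n" after each line.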

I probably don’t know much more than you do, but as a rule of thumb:

If encodings differ, output will as well.

The best option is to stick with UTF-8 and Unicode, which Go uses by default, and use better Windows software.

However, a lot of older Windows software will expect text files to be encoded in ISO-8859-1 (Latin 1). Here’s an article about converting character encodings:

http://technosophos.com/2016/03/09/go-quickly-converting-character-encodings.html
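
As a rough sketch of what such a conversion involves (not the code from the article): ISO-8859-1 maps the code points U+0000 to U+00FF directly onto single bytes, so a hand-rolled encoder using only the standard library is short. The function name `toLatin1` is made up for this example; in real code the `golang.org/x/text/encoding/charmap` package is likely the better choice.

```go
package main

import "fmt"

// toLatin1 (hypothetical helper) encodes a UTF-8 string as ISO-8859-1.
// It returns an error for any character outside U+0000..U+00FF, because
// Latin-1 has no byte for it. Invalid UTF-8 input is also rejected, since
// Go decodes it as U+FFFD.
func toLatin1(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("character %q cannot be represented in ISO-8859-1", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

func main() {
	b, err := toLatin1("Grüße")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Each umlaut is now a single byte instead of two UTF-8 bytes.
	fmt.Printf("% x\n", b)
}
```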

Some Windows software won’t handle UTF-8, and hence will also need the files to be written in UTF-16 for things like emoji to work. Here’s some info about how Go handles that:

https://ipfs.io/ipfs/QmfYeDhGH9bZzihBUDEQbCbTc5k5FZKURMUoUvfmc27BwL/encoding/utf-16_and_go.html
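
A small sketch of writing UTF-16 with the standard library’s `unicode/utf16` package, in case it helps. The helper name `toUTF16LE` is made up here, and the choice of little-endian byte order with a leading byte order mark is an assumption about what the receiving Windows program expects:

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16LE (hypothetical helper) encodes s as little-endian UTF-16,
// prefixed with the byte order mark FF FE.
func toUTF16LE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, 2+2*len(units))
	out = append(out, 0xFF, 0xFE) // byte order mark for little-endian
	for _, u := range units {
		// Low byte first, then high byte.
		out = append(out, byte(u), byte(u>>8))
	}
	return out
}

func main() {
	// Characters outside the Basic Multilingual Plane, such as emoji,
	// become a surrogate pair, i.e. four bytes.
	fmt.Printf("% x\n", toUTF16LE("ä😀"))
}
```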

You forget that even the terminal on Windows doesn’t necessarily use UTF-anything.

And reading from stdin just gives you the bytes the system hands you; there is no magic conversion into UTF-8.
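
One way to see this for yourself is to check whether the bytes that arrive are valid UTF-8 at all. This is only a sketch: `describe` is a made-up helper, and a failed check does not tell you which encoding the terminal actually used.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"unicode/utf8"
)

// describe (hypothetical helper) reports whether line is valid UTF-8.
func describe(line []byte) string {
	if utf8.Valid(line) {
		return "valid UTF-8"
	}
	return "not valid UTF-8"
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Bytes()
		// %q shows non-UTF-8 bytes as \x escapes, so they are easy to spot.
		fmt.Printf("%q: %s\n", line, describe(line))
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Note that pure ASCII input always passes the check, so try it with a line that contains an umlaut.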

Thanks for the help. I will just not use WordPad (like every sane person). I also replaced the \n with \r\n and it works just fine now.

Thanks, guys, for this info!
