bufio.Scanner reads only last line of text file

Background:

  • Go version go1.11 on Windows 10
  • The input text file was produced by Python, and each line ends with only a ‘\r’ character. Opened in a text editor, the file displays line by line, and Python code reads it line by line properly.
  • With scanner.Split(bufio.ScanLines), only the last line of the file appears to be read, and no error is reported.
  • After editing the file in a text editor and adding more lines, Go starts behaving weirdly; e.g., returning the last line with characters from the previous line.

Can you please post a code snippet that shows exactly how you are reading the file? If it is part of a long program, then write a short program in Go that does the file reading and shows the behavior you wrote about.

You're correct, Scanner can't handle files with only \r as line endings.

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	f, _ := os.Open("test.txt")
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fmt.Println(">", scanner.Text())
	}
}

run on a text file containing

Hello, it's me
I was wondering if after all these years you'd like to meet
To go over everything
They say that time's supposed to heal ya
But I ain't done much healing
Hello, can you hear me
I'm in California dreaming about who we used to be
When we were younger and free
I've forgotten how it felt before the world fell at our feet

will output just the last line, but without the >:

I’ve forgotten how it felt before the world fell at our feet

Hi @jayts

Here is Python code writing to a file:

    with open('c:/temp/test.txt', 'w') as out:
        out.write('Hello Golang!\rI am a Python programmer.\rI think I like to Go\r')

Go code to read the file (as @johandalabacka posted):

package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	f, _ := os.Open("test.txt")
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fmt.Println(">", scanner.Text())
	}
}

I took the ScanLines function from the source code and changed it so it works with \r but now it will only work with \r and not \n or


package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
)

func ScanLinesWithCR(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, '\r'); i >= 0 {
		// We have a full carriage return-terminated line.
		return i + 1, data[0:i], nil
	}
	// If we're at EOF, we have a final, non-terminated line. Return it.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}

func main() {
	f, _ := os.Open("test.txt")
	defer f.Close()
	scanner := bufio.NewScanner(f)
	scanner.Split(ScanLinesWithCR)
	for scanner.Scan() {
		fmt.Println(">", scanner.Text())
	}
}

But really the standard function should handle this case. I'll see if I can report it.

That's not a line end.

A line ends in \n (Mac OS X and Linux) or \r\n (Windows). A lone \r was only used on old computers from the eighties and early nineties. You should normally not see it in the wild anymore.

And actually the scanner does scan the full content, not only the last “line” as you call it. But since most terminal emulators simply move the cursor to the beginning of the line when they encounter a \r, the following characters overwrite the previous content. Just check it out:

XXXXXXXXXXXXXXXXXXX one
Greetings from line

(You will probably need to replace the line endings manually.)
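To make the overwriting visible without relying on the terminal, here is a minimal sketch that prints each token with %q, so carriage returns show up as an escaped \r. The helper name scanAll is made up for this example, and the input string is the content of the Python-generated file from earlier in the thread, read from memory instead of from test.txt.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanAll is a hypothetical helper: it runs a Scanner with the default
// split function (bufio.ScanLines) over s and collects every token.
func scanAll(s string) []string {
	var tokens []string
	scanner := bufio.NewScanner(strings.NewReader(s))
	for scanner.Scan() {
		tokens = append(tokens, scanner.Text())
	}
	return tokens
}

func main() {
	// Same bytes the Python snippet writes: "lines" separated by a bare \r.
	input := "Hello Golang!\rI am a Python programmer.\rI think I like to Go\r"
	for _, tok := range scanAll(input) {
		// %q escapes the carriage returns instead of letting the terminal act on them.
		fmt.Printf("> %q\n", tok)
	}
}
```

This should print a single token that still contains the inner \r characters (only the trailing one is stripped), which is why a terminal shows what looks like just the last line.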


edit

The documentation of bufio.ScanLines even tells you, in regex notation and in a plain English sentence, what it expects a line ending to look like:

ScanLines is a split function for a Scanner that returns each line of text, stripped of any trailing end-of-line marker. The returned line may be empty. The end-of-line marker is one optional carriage return followed by one mandatory newline. In regular expression notation, it is \r?\n. The last non-empty line of input will be returned even if it has no newline.
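The documented rule can be checked by calling bufio.ScanLines directly, the way a Scanner would. This is only a sketch: firstToken is a helper invented for the example, and the inputs are made-up strings rather than the file from this thread.

```go
package main

import (
	"bufio"
	"fmt"
)

// firstToken is a hypothetical helper: it calls bufio.ScanLines once and
// reports how far the Scanner would advance and which token it would get.
// The error is ignored because ScanLines always returns a nil error.
func firstToken(data string, atEOF bool) (int, string) {
	advance, token, _ := bufio.ScanLines([]byte(data), atEOF)
	return advance, string(token)
}

func main() {
	cases := []struct {
		data  string
		atEOF bool
	}{
		{"one\r\ntwo", false}, // \r\n is a line ending, the \r is dropped
		{"one\ntwo", false},   // \n is a line ending
		{"one\rtwo", false},   // bare \r: no line ending found, more data requested
		{"one\rtwo", true},    // at EOF everything left is returned as one token
	}
	for _, c := range cases {
		advance, token := firstToken(c.data, c.atEOF)
		fmt.Printf("%q atEOF=%v -> advance=%d token=%q\n", c.data, c.atEOF, advance, token)
	}
}
```

In the bare \r cases the carriage return is never treated as a separator; it simply stays inside the token.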


It is a “user-error” kind of situation in which the generated files contain invalid line endings. What we are discussing here is how Go behaves in such situations. Note that text editors and Python (I have not yet tried Java) can read such files.

The function behaves as documented. If you have different needs you need to code something that works for your specification.
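As one possible way to "code something", here is a sketch of a split function that accepts \n, \r\n, and a bare \r as line endings. ScanAnyLineEnding and splitLines are names invented for this example and are not part of bufio; treat it as a starting point under those assumptions, not as a drop-in replacement for ScanLines.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// ScanAnyLineEnding is a hypothetical split function (not part of bufio)
// that treats \n, \r\n, and a bare \r as end-of-line markers.
func ScanAnyLineEnding(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexAny(data, "\r\n"); i >= 0 {
		if data[i] == '\n' {
			return i + 1, data[:i], nil
		}
		// data[i] is '\r': it may be the first half of \r\n.
		if i+1 < len(data) {
			if data[i+1] == '\n' {
				return i + 2, data[:i], nil
			}
			return i + 1, data[:i], nil
		}
		// The \r is the last byte seen so far. At EOF it ends the line;
		// otherwise wait for more data to see whether a \n follows.
		if atEOF {
			return i + 1, data[:i], nil
		}
		return 0, nil, nil
	}
	// No line ending found. At EOF, return the final unterminated line.
	if atEOF {
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}

// splitLines is a hypothetical helper that scans s with ScanAnyLineEnding.
func splitLines(s string) []string {
	var lines []string
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(ScanAnyLineEnding)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines
}

func main() {
	input := "Hello Golang!\rI am a Python programmer.\rI think I like to Go\r"
	for _, line := range splitLines(input) {
		fmt.Println(">", line)
	}
}
```

To read a real file, the same split function can be passed to scanner.Split on a Scanner created from the opened file. The main design point is the case where \r is the last byte in the buffer: the function asks for more data rather than guessing, so a \r\n pair that straddles two reads is not reported as two line endings.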

@NobbZ you're correct, ScanLines works as specified. I assumed it was a bug… :confused:

The end-of-line marker is one optional carriage return followed by one mandatory newline. In regular expression notation, it is \r?\n
