bufio.NewReaderSize with 1M size corrupts memory unless scanner.Buffer also called


(Robert h Cowham) #1

I got memory corruption using the equivalent of the following code:

	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	fp := newFileParser()
	reader := bufio.NewReaderSize(file, 1024*1024)
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		line := scanner.Bytes()
		fp.parseLine(line)
	}

The corruption was removed when I created an appropriately sized buffer for use in the scanner:

	fp := newFileParser()
	const maxCapacity = 1024 * 1024
	buf := make([]byte, maxCapacity)
	reader := bufio.NewReaderSize(file, maxCapacity)
	scanner := bufio.NewScanner(reader)
	scanner.Buffer(buf, maxCapacity)
	for scanner.Scan() {
		line := scanner.Bytes()
		fp.parseLine(line)
	}

I discovered the default buffer is 64k. But I would expect the Scanner to be able to discover how big a buffer is available, and for example panic instead of silently corrupting memory (overwriting a data structure with the contents of the data read from the file) - are my expectations unreasonable?

Regards
Robert


(Constantin Konstantinidis) #2

The behaviour is in line with the documentation at https://golang.org/pkg/bufio/#pkg-variables, which says the scanner stops without an error. There is also a related issue, https://github.com/golang/go/issues/26431, about the way bufio stops.


(Robert h Cowham) #3

Maybe I am missing something: the scanner didn’t stop, with or without an error; it just carried on, and it overwrote memory.

It just raises concerns in my mind; it wouldn’t surprise me in C/C++, but it does a little in Go!

When debugging this, on one particular file I could trace the corruption happening reliably after a fixed number of iterations of the scanner.Scan() loop.


(Constantin Konstantinidis) #4

Sorry for missing the point. Did you get a Go error message, or was the data inconsistent?


(Jakob Borg) #5

I’m pretty sure this shouldn’t happen. Can you make a reproducible test case?

“Equivalent code” isn’t great. The devil is in the details - in your original paste I’d take a close look at what fp.parseLine does exactly. (That is, verify that you are acting in accordance with what the documentation says about the return value of scanner.Bytes.)


(Robert h Cowham) #6

Sure, I’ll have a go. The current full working version (it’s a log parser matching begin and end records; still a work in progress):

Maybe I’m doing other silly things (it’s my first real Go program). I need to sanitise a log file first.


(Johan Dahl) #7

Hi.

In the function

func (block *Block) addLine(line []byte, lineNo int64) {

you need to copy the line before appending it. Otherwise it will be overwritten with new data on the next read.

Create a new slice of the same length and use the built-in copy function. See https://golang.org/ref/spec#Appending_and_copying_slices

With kind regards


(George Calianu) #8

There is nothing wrong in what happened. The source code of bufio says:

	MaxScanTokenSize is the maximum size used to buffer a token
	unless the user provides an explicit buffer with Scanner.Buffer.
	The actual maximum token size may be smaller as the buffer
	may need to include, for instance, a newline.

	MaxScanTokenSize = 64 * 1024

You did well by extending the initial buffer with scanner.Buffer(buf, maxCapacity).


(Robert h Cowham) #9

Thanks Johan - that was the silly thing! Problem solved.