bufio.NewReaderSize with 1M size corrupts memory unless scanner.Buffer also called

I got memory corruption using the equivalent of the following code:

	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	fp := newFileParser()
	reader := bufio.NewReaderSize(file, 1024*1024)
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		line := scanner.Bytes()
		fp.parseLine(line)
	}

The corruption was removed when I created an appropriately sized buffer for use in the scanner:

	fp := newFileParser()
	const maxCapacity = 1024 * 1024
	buf := make([]byte, maxCapacity)
	reader := bufio.NewReaderSize(file, maxCapacity)
	scanner := bufio.NewScanner(reader)
	scanner.Buffer(buf, maxCapacity)
	for scanner.Scan() {
		line := scanner.Bytes()
		fp.parseLine(line)
	}

I discovered the default buffer is 64K. But I would expect the Scanner to be able to discover how big a buffer is available and, for example, panic instead of silently corrupting memory (overwriting a data structure with the contents of the data read from the file). Are my expectations unreasonable?


The behaviour is in line with the documentation, which says here https://golang.org/pkg/bufio/#pkg-variables that the scanner stops without an error. There is also a related issue https://github.com/golang/go/issues/26431 about the way bufio stops.

Maybe I am missing something - the scanner didn’t stop, with or without an error; it just carried on, but it overwrote memory.

It just raises concerns in my mind; it wouldn’t surprise me in C/C++, but it does a little in Go!

When debugging this, on a particular file I traced the corruption happening reliably after a fixed number of iterations of the scanner.Scan() loop.

Sorry for missing the point. Did you get a Go error message, or was the data inconsistent?

I’m pretty sure this shouldn’t happen. Can you make a reproducible test case?

“Equivalent code” isn’t great. The devil is in the details - in your original paste I’d take a close look at what fp.parseLine does exactly. (That is, verify that you are acting in accordance with what the documentation says about the return value of scanner.Bytes.)

Sure, will have a go. Current full working version (it’s a log parser matching begin and end records; still a work in progress):

Maybe I’m doing other silly things (first real Go program). I need to sanitise a log file.


In function

func (block *Block) addLine(line []byte, lineNo int64) {

You need to copy the line before appending it. Otherwise it will be overwritten with new data on the next read.

Create a new slice with the same length and use the copy built-in. See https://golang.org/ref/spec#Appending_and_copying_slices

With kind regards


There is nothing wrong in what happened. The source code of bufio says:

	// MaxScanTokenSize is the maximum size used to buffer a token
	// unless the user provides an explicit buffer with Scanner.Buffer.
	// The actual maximum token size may be smaller as the buffer
	// may need to include, for instance, a newline.
	MaxScanTokenSize = 64 * 1024

You did very well by extending the initial buffer with scanner.Buffer(buf, maxCapacity).


Thanks Johan - that was the silly thing! Problem solved.

