bufio.NewReaderSize with 1M size corrupts memory unless scanner.Buffer also called


(Robert h Cowham) #1

I got memory corruption using the equivalent of the following code:

	file, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	fp := newFileParser()
	reader := bufio.NewReaderSize(file, 1024*1024)
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		line := scanner.Bytes()
		fp.parseLine(line)
	}

The corruption was removed when I created an appropriately sized buffer for use in the scanner:

	fp := newFileParser()
	const maxCapacity = 1024 * 1024
	buf := make([]byte, maxCapacity)
	reader := bufio.NewReaderSize(file, maxCapacity)
	scanner := bufio.NewScanner(reader)
	scanner.Buffer(buf, maxCapacity)
	for scanner.Scan() {
		line := scanner.Bytes()
		fp.parseLine(line)
	}

I discovered the default buffer is 64k. But I would expect the Scanner to be able to discover how big a buffer is available, and for example panic instead of silently corrupting memory (overwriting a data structure with the contents of the data read from the file) - are my expectations unreasonable?

Regards
Robert


(Constantin Konstantinidis) #2

The behaviour is in line with the documentation at https://golang.org/pkg/bufio/#pkg-variables, which says the scanner stops without an error. There is also a related issue, https://github.com/golang/go/issues/26431, about the way bufio stops.


(Robert h Cowham) #3

Maybe I am missing something: the scanner didn’t stop, with or without an error; it just carried on, and it overwrote memory.

It just raises concerns in my mind; it wouldn’t surprise me in C/C++, but it does a little in Go!

When debugging this, on one particular file I could trace the corruption happening reliably after a fixed number of iterations of the scanner.Scan() loop.


(Constantin Konstantinidis) #4

Sorry for missing the point. Did you get a Go error message, or was the data inconsistent?


(Jakob Borg) #5

I’m pretty sure this shouldn’t happen. Can you make a reproducible test case?

“Equivalent code” isn’t great. The devil is in the details - in your original paste I’d take a close look at what fp.parseLine does exactly. (That is, verify that you are acting in accordance with what the documentation says about the return value of scanner.Bytes.)


(Robert h Cowham) #6

Sure, I’ll have a go. The current full working version (it’s a log parser matching begin and end records; still a work in progress):

Maybe I’m doing other silly things (it’s my first real Go program). I need to sanitise a log file first.


(Johan Dahl) #7

Hi.

In the function

func (block *Block) addLine(line []byte, lineNo int64) {

you need to copy the line before appending it. Otherwise it will be overwritten with new data on the next read.

Create a new slice of the same length and use the built-in copy function. See https://golang.org/ref/spec#Appending_and_copying_slices

With kind regards


(George Calianu) #8

There is nothing wrong in what happened. The source code of bufio says:

	MaxScanTokenSize is the maximum size used to buffer a token
	unless the user provides an explicit buffer with Scanner.Buffer.
	The actual maximum token size may be smaller as the buffer
	may need to include, for instance, a newline.

	MaxScanTokenSize = 64 * 1024

You did well by extending the initial buffer with scanner.Buffer(buf, maxCapacity).


(Robert h Cowham) #9

Thanks Johan - that was the silly thing! Problem solved.