Problem with bytes.Buffer

Hello everyone,

I am a Go beginner :slight_smile: from Germany, and I hope you can help me.

At the moment I am working on a little backup/copy program. Nothing special, just copying.
You can find the code snippet here:

There is nothing magic in the whole program, just a little bit more than the snippet.
(For example, creating the folder before copying …)

The implementation of the FileOpener interface does an os.Open(filename) and os.Stat(filename), computes a checksum, and returns the file as an io.ReadCloser.

The implementation of the FileWriter interface makes an HTTP POST request with the file as the request body.
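The interfaces themselves are not shown in the thread; based on the two descriptions above, a plausible minimal shape could look like this (the method signatures here are my guesses, not the original code):

```go
package main

import (
	"fmt"
	"io"
)

// FileOpener opens a file, stats it, computes a checksum, and hands the
// contents back as an io.ReadCloser (hypothetical signature).
type FileOpener interface {
	Open(name string) (rc io.ReadCloser, size int64, checksum string, err error)
}

// FileWriter sends a stream somewhere, e.g. as the body of an HTTP POST
// request (hypothetical signature).
type FileWriter interface {
	Write(name string, body io.Reader) error
}

func main() {
	fmt.Println("interface sketch compiles")
}
```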

The whole program works fine.
Even when I copy bigger files, for example 1 GB, the program needs no more than 10 MB, very nice.
That is the version working only with the io.ReadCloser interface :smile:
func copyFileNormal(opener FileOpener, writer FileWriter, src, dst string) error {…}

Now I was experimenting a little bit with buffering the file first before writing it:
func copyFileBuffered(opener FileOpener, writer FileWriter, src, dst string) error {…}

How do I do this?

Depending on the version inside the function, the program now needs 3 × the file size in memory.
When I copy a 1 GB file, the program now needs 3 GB, which is not so good.
The whole file size plus a little overhead would be ok, but not 2 × or 3 × the file size.

What am I doing wrong?
Or is that normal behavior of the GC?
Which version is the best, or is there a better solution?
Or what is the problem here?

Greetings and thanks

I’m not sure how you are measuring memory use and there isn’t enough code to run it myself…

I would start by benchmarking the different versions and running the benchmarks with -benchmem to get a GC-aware memory measurement. Here are some resources to get started:

Happy to help if you have problems getting this to work.


Thanks for the tips, this was a good starting point.
So, I did a memory pprof session :slight_smile:

And I have found the reason :v:

The capacity of the buffer was too small.

So the program was running into this section:

	if free := cap(b.buf) - len(b.buf); free < MinRead {
		// not enough space at end
		newBuf := b.buf
		if b.off+free < MinRead {
			// not enough space using beginning of buffer;
			// double buffer capacity
			newBuf = makeSlice(2*cap(b.buf) + MinRead)   // !!!!!!!!!!!!!!!!!!!!!!!!
		}
		copy(newBuf, b.buf[b.off:])
		b.buf = newBuf[:len(b.buf)-b.off]
		b.off = 0
	}

So when I do this, everything is ok :smile:

buf := bytes.NewBuffer(make([]byte, 0, file.Size+bytes.MinRead))
n, err := buf.ReadFrom(file)

So the big difference is the +bytes.MinRead.

Does this mean that the capacity of the slice/buffer must always be greater than the underlying data that comes later?

I think I must go back and read the section on “make allocation” :v:


Does this mean that the capacity of the slice/buffer must always be greater than the underlying data that comes later?

It means that bytes.Buffer needs to think there is enough space in the underlying []byte to successfully Read the next chunk. It doesn’t know how much is coming in the next Read or the rest of the loop.

I suspect you could write a more efficient implementation based on your knowledge of file.Size (which buffer.ReadFrom does not have).


Like this?

c := make([]byte, file.Size)
n, err := io.ReadFull(file, c)
log.Printf("Read %v bytes", n)
if err != nil {
	log.Printf("Copy error, err: %s", err)
}

Maybe. What does benchmarking it say?


In my opinion, if you want to read all the data in one single operation, you don’t have any memory constraints, and you know the size of the data ahead of time, it is a really good idea to set the capacity (preallocate it).

If you make a slice with a capacity of 10 bytes, then when you need to insert a new item and have hit the maximum capacity, it will allocate a new backing array with double the size (20 bytes) (or whatever the current implementation does).

That is why you got such a performance increase just by preallocating.
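That doubling behavior is easy to watch directly by appending one element at a time and printing the capacity whenever it changes (the exact growth sequence is an implementation detail of the Go runtime and may differ between versions):

```go
package main

import "fmt"

func main() {
	var s []byte
	lastCap := -1
	for i := 0; i < 100; i++ {
		s = append(s, 0)
		// Report each reallocation of the backing array.
		if cap(s) != lastCap {
			lastCap = cap(s)
			fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
		}
	}
}
```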

