This looked interesting because later in the development of my project, I may need to read in whole files too. I decided to look into it a bit.
To get memory statistics, top may not be the best tool. The Go package runtime has functions for reporting memory statistics, and you can get a lot more detail. In the code sample below, I’m using runtime.ReadMemStats() for this purpose.
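As a minimal sketch of the idea, the snippet below reads the statistics before and after a piece of work and reports the difference. The helper name allocDelta and the variable sink are made up for this illustration; only runtime.ReadMemStats and the MemStats fields come from the runtime package.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the test allocation reachable so it is really made on the heap.
var sink []byte

// allocDelta is a hypothetical helper: it runs f and reports how many bytes
// and how many heap objects were allocated while f ran.
func allocDelta(f func()) (nbytes, nobjects uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	// TotalAlloc and Mallocs are cumulative counters, so the differences
	// cover only what was allocated between the two readings.
	return after.TotalAlloc - before.TotalAlloc, after.Mallocs - before.Mallocs
}

func main() {
	b, n := allocDelta(func() { sink = make([]byte, 1<<20) })
	fmt.Printf("allocated at least %d bytes in %d object(s)\n", b, n)
}
```

The exact figures vary from run to run, because the runtime itself allocates a little in the background.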
Not just in Go, but generally, it is a lot more efficient to read in a whole file in a single gulp rather than a line at a time, but I was surprised to see how much difference it makes. I wrote a little program to compare gojo’s read() function with the one provided by clbanning.
Here is the full program, and directions for usage follow the code:
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io/ioutil"
	"os"
	"runtime"
	"strconv"
)

/* gojo (original poster): read a line at a time, skipping empty lines */
func read1(path string) (*[]string, error) {
	var lines []string
	file, err := os.OpenFile(path, os.O_RDONLY, os.ModePerm)
	if err != nil {
		return nil, err
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if len(line) == 0 {
			continue
		}
		// Text() is called a second time here, as in the original;
		// each call allocates a new string
		lines = append(lines, scanner.Text())
	}
	// Scan() returns false on a read error as well as at end of file
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return &lines, nil
}

/* Charles Banning's (clbanning): read the whole file, then split on newlines */
func read2(path string) ([][]byte, error) {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return bytes.Split(data, []byte("\n")), nil
}

const (
	gojo = iota + 1
	clbanning
)

// usage prints a usage message and exits with the given status
func usage(exitval int) {
	fmt.Fprintf(os.Stderr, "usage: %s <filename> <version>\n", os.Args[0])
	fmt.Fprintf(os.Stderr, "\tversion is either 1 (for gojo) or 2 (for clbanning)\n")
	os.Exit(exitval)
}

func main() {
	var ms runtime.MemStats
	var err error
	var version int

	if len(os.Args) > 3 {
		fmt.Fprintf(os.Stderr, "%s: too many arguments\n", os.Args[0])
		usage(1)
	}
	if len(os.Args) < 3 {
		fmt.Fprintf(os.Stderr, "%s: missing argument(s)\n", os.Args[0])
		usage(1)
	}
	version, err = strconv.Atoi(os.Args[2])
	if err != nil {
		fmt.Fprintf(os.Stderr, "%s: version error\n", os.Args[0])
		usage(2)
	}

	switch version {
	case gojo:
		// non-testing version: ignore return value
		_, err = read1(os.Args[1])
		// for testing: print the first line of the file
		// var s1 *[]string
		// s1, err = read1(os.Args[1])
		// fmt.Printf("%s\n", (*s1)[0]) // Print first line of s1
	case clbanning:
		// non-testing version: ignore return value
		_, err = read2(os.Args[1])
		// for testing: print the first line of the file
		// var s2 [][]byte
		// s2, err = read2(os.Args[1])
		// fmt.Printf("%s\n", string(s2[0]))
	default:
		fmt.Fprintf(os.Stderr, "%s: version error\n", os.Args[0])
		os.Exit(2)
	}
	if err != nil {
		// the error may come from opening or from reading the file
		fmt.Fprintf(os.Stderr, "%s: can't read file \"%s\": %v\n", os.Args[0], os.Args[1], err)
		os.Exit(2)
	}

	runtime.ReadMemStats(&ms)
	fmt.Printf("\n")
	fmt.Printf("Alloc: %d MB, TotalAlloc: %d MB, Sys: %d MB\n",
		ms.Alloc/1024/1024, ms.TotalAlloc/1024/1024, ms.Sys/1024/1024)
	fmt.Printf("Mallocs: %d, Frees: %d\n",
		ms.Mallocs, ms.Frees)
	fmt.Printf("HeapAlloc: %d MB, HeapSys: %d MB, HeapIdle: %d MB\n",
		ms.HeapAlloc/1024/1024, ms.HeapSys/1024/1024, ms.HeapIdle/1024/1024)
	fmt.Printf("HeapObjects: %d\n", ms.HeapObjects)
	fmt.Printf("\n")
}
To use it, name the file “readfile.go”, then use the ‘go build’ command to build it:
go build readfile.go
To do the tests, use the readfile program like this:
readfile input_file 1
or
readfile input_file 2
where input_file is the name of a file.
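One thing to keep in mind when comparing the two versions: they do not return the same lines. The line-at-a-time version drops empty lines, while the split version keeps them and also yields a trailing empty element when the data ends in a newline. Here is a small sketch on in-memory data; scanLines and splitLines are stand-in names for this illustration, not part of the program itself.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// scanLines mirrors the line-at-a-time approach on in-memory data:
// it skips empty lines.
func scanLines(data []byte) []string {
	var lines []string
	scanner := bufio.NewScanner(bytes.NewReader(data))
	for scanner.Scan() {
		if line := scanner.Text(); len(line) > 0 {
			lines = append(lines, line)
		}
	}
	return lines
}

// splitLines mirrors the read-everything approach: a plain split on "\n".
func splitLines(data []byte) [][]byte {
	return bytes.Split(data, []byte("\n"))
}

func main() {
	data := []byte("alpha\n\nbeta\n")
	fmt.Printf("%q\n", scanLines(data))  // ["alpha" "beta"]
	fmt.Printf("%q\n", splitLines(data)) // ["alpha" "" "beta" ""]
}
```

If you need identical output from both, you would have to filter the empty elements out of the split result yourself.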
When I tested a 570 MB file, I got these results:
Reading a line at a time (gojo):
Alloc: 651 MB, TotalAlloc: 1192 MB, Sys: 812 MB
Mallocs: 1166446, Frees: 518919
HeapAlloc: 651 MB, HeapSys: 767 MB, HeapIdle: 74 MB
HeapObjects: 647527
Reading all at once (clbanning):
Alloc: 583 MB, TotalAlloc: 583 MB, Sys: 662 MB
Mallocs: 192, Frees: 13
HeapAlloc: 583 MB, HeapSys: 639 MB, HeapIdle: 55 MB
HeapObjects: 179
You can easily see that reading the file a line at a time required many more memory allocations and frees, and left many more objects on the heap. (The numbers you get on your own data depend on how many newlines are in the file.)
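Part of the reason for the gap is that bytes.Split does not copy the lines: it returns subslices that point into the one buffer holding the file, whereas Scanner.Text() returns a newly allocated string on every call. The sketch below demonstrates the sharing; sharesBuffer is a made-up helper for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// sharesBuffer reports whether the first line returned by bytes.Split
// still points into the original data rather than into a copy of it.
func sharesBuffer(data []byte) bool {
	lines := bytes.Split(data, []byte("\n"))
	if len(lines[0]) == 0 {
		return false // nothing to compare
	}
	original := data[0]
	data[0] ^= 0xFF                  // change the first byte of the buffer
	shared := lines[0][0] == data[0] // is the change visible through the line?
	data[0] = original               // restore the buffer
	return shared
}

func main() {
	data := []byte("first line\nsecond line\n")
	fmt.Println(sharesBuffer(data)) // true
}
```

The flip side of the sharing is that the whole file stays in memory for as long as any one of the lines is still referenced.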
To understand the numbers better, see the documentation for package runtime, type MemStats.
There are many more fields in the MemStats struct than I used, and you can modify the code to your liking to look at other things. Have fun.
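As a starting point, here is a small sketch that checks one relationship from the MemStats documentation against the results above (the number of live objects is Mallocs minus Frees) and prints a few fields the program does not show. The helper liveObjects is a made-up name, and the choice of extra fields is only a suggestion.

```go
package main

import (
	"fmt"
	"runtime"
)

// liveObjects computes the number of live heap objects from the two
// cumulative counters Mallocs and Frees.
func liveObjects(mallocs, frees uint64) uint64 {
	return mallocs - frees
}

func main() {
	// Figures from the two test runs reported above; both match HeapObjects.
	fmt.Println(liveObjects(1166446, 518919)) // 647527
	fmt.Println(liveObjects(192, 13))         // 179

	// A few more fields that may be worth a look.
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("NumGC: %d, PauseTotalNs: %d\n", ms.NumGC, ms.PauseTotalNs)
	fmt.Printf("HeapInuse: %d MB, HeapReleased: %d MB\n",
		ms.HeapInuse/1024/1024, ms.HeapReleased/1024/1024)
}
```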