Is there a faster alternative to ioutil.ReadFile?

Hey,
I am trying to write a program that finds duplicate files by their MD5 checksums.
I'm not sure whether I'm missing something, but when this function reads the Xcode installer app (about 8 GB), it uses 16 GB of RAM.

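```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io/ioutil"
	"log"
	"os"
)

// search prints every regular file in the current directory whose MD5
// digest matches a file seen earlier.
func search() {
	unique := make(map[string]string) // hex digest -> first file name seen
	files, err := ioutil.ReadDir(".")
	if err != nil {
		log.Println(err)
		return
	}

	for _, file := range files {
		fileName := file.Name()
		fmt.Println("CHECKING:", fileName)
		fi, err := os.Stat(fileName)
		if err != nil {
			fmt.Println(err)
			continue
		}
		if fi.Mode().IsRegular() {
			// ReadFile loads the entire file into memory at once.
			data, err := ioutil.ReadFile(fileName)
			if err != nil {
				fmt.Println(err)
				continue
			}
			sum := md5.Sum(data)
			hexDigest := hex.EncodeToString(sum[:])
			if _, ok := unique[hexDigest]; !ok {
				unique[hexDigest] = fileName
			} else {
				fmt.Println("DUPLICATE:", fileName)
			}
		}
	}
}

func main() {
	search()
}
```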

From my debugging, the problem is the file reading.
Is there a better approach?
Thanks

Yes, you should definitely avoid reading the whole file into memory. Instead, create a hasher, open the file, and use io.Copy to copy from the file into the hasher.

```go
fd, err := os.Open(filename)
// handle error
defer fd.Close()

h := md5.New()
_, err = io.Copy(h, fd)
// handle error

sum := h.Sum(nil) // read the digest from the hasher, not from md5.Sum
hexDigest := ...
```

Awesome, thanks!

Also, you could use a faster hash function such as BLAKE2. MD5 is obsolete: collisions can be produced deliberately, so it is no longer considered secure.
