The Go Programming Language (gopl) - Exercise 1.4

I have been going through The Go Programming Language book, and I got stuck trying to come up with a solution for one of the exercises.

Exercise 1.4, available here, starts with dup2, a program that reads input either from stdin or from a sequence of named files and prints the count and text of every line that appears more than once. Dup2 can be found here.

For example:

$ ./dup2 file.txt file2.txt
5 cat
5 dog

Here the two input files contain:

$ cat file.txt && echo && cat file2.txt
cat
cat
dog
dog

cat
cat
cat
dog
dog
dog

In the exercise the reader is asked to

Modify dup2 to print the names of all the files in which each duplicated line occurs.

So far I have come up with this:

// Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan.
// License: https://creativecommons.org/licenses/by-nc-sa/4.0/

// See page 10.
//!+

// Dup2 prints the count and text of lines that appear more than once
// in the input.  It reads from stdin or from a list of named files.
package main

import (
	"bufio"
	"fmt"
	"os"
)

var files = os.Args[1:]

func main() {
	counts := make(map[string]int)
	if len(files) == 0 {
		countLines(os.Stdin, counts)
	} else {
		for _, arg := range files {
			f, err := os.Open(arg)
			if err != nil {
				fmt.Fprintf(os.Stderr, "dup2: %v\n", err)
				continue
			}
			fmt.Printf("Filename:%s\n", arg)
			countLines(f, counts)
			f.Close()
		}
	}
	for line, n := range counts {
		if n > 1 {
			fmt.Printf("%d\t%s\n", n, line)
		}
	}
}

func countLines(f *os.File, counts map[string]int) {
	input := bufio.NewScanner(f)
	for input.Scan() {
		counts[input.Text()]++
	}
	// NOTE: ignoring potential errors from input.Err()
}

//!-

It prints the name of every file passed to the program, even files that contain no duplicate lines.

What can I do so that it prints only the names of files that contain duplicate lines?

I think the idea is to keep, for each line, the set of files in which it occurs. You would need to change the counts map (currently string -> int) to something that maps each line to a structure holding both the count and the set of file names.

Hi @calmh :smile:
So I thought about how I could build such a structure, did some research, and concluded that the structure you are describing is a nested map!
With that I created a nested map of type map[string]map[string]int, keyed first by file name and then by line, with the line count as the value. Now I can get the name of each file together with its duplicate line counts, and print only the files that contain duplicate lines.
Here is the full code:

package main

import (
	"bufio"
	"fmt"
	"os"
)

var files = os.Args[1:]
var linesStdin = map[string]int{}
var counts = map[string]map[string]int{}

func main() {
	if len(files) == 0 {
		countStdin(os.Stdin, linesStdin)
		for line, n := range linesStdin {
			if n > 1 {
				fmt.Printf("%s\t%d\n", line, n)
			}
		}
	} else {
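		for _, arg := range files {
			f, err := os.Open(arg)
			if err != nil {
				fmt.Fprintf(os.Stderr, "dup2: %v\n", err)
				continue
			}
			// Create the inner map only for files that opened successfully;
			// countLines writes into it, and writing to a nil map would panic.
			counts[arg] = map[string]int{}
			countLines(f, counts)
			f.Close()
		}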
		for fileName, innerMap := range counts {
			for line, n := range innerMap {
				if n > 1 {
					fmt.Printf("%s\t%s\t%d\n", fileName, line, n)
				}
			}
		}
	}
}

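func countStdin(f *os.File, linesStdin map[string]int) {
	input := bufio.NewScanner(f)
	for input.Scan() {
		linesStdin[input.Text()]++
	}
	if err := input.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "dup2: %v\n", err)
	}
}

// countLines keys the outer map by f.Name(), which returns the name passed
// to os.Open, so it matches the key main used to create the inner map.
func countLines(f *os.File, counts map[string]map[string]int) {
	input := bufio.NewScanner(f)
	for input.Scan() {
		counts[f.Name()][input.Text()]++
	}
	if err := input.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "dup2: %v\n", err)
	}
}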

What do you think about it, and are there any improvements I can make?
