How to use Go to parse text files

I am new to Go programming.

Having watched Rob Pike tell the story of Go's invention, along with Go discussions involving Ken Thompson, I believe I have a good overview of what they expect Go to be used for. Go is both a low-level language that compiles to native machine code and a language for server-side production projects. Where C syntax can be difficult to understand, Go's syntax was simplified (I assume relative to C) to be easier to read.

My takeaway from this research: Go builds on the power of the C language, simplifies the syntax, and adds features that help where many developers work, the “server room”. It is as if C and Erlang got married and had a Go child.

I would like to learn how to work with files: open, read, close.

This is an example from the “Clojure in a Nutshell” talk.

Download a book as text from Project Gutenberg
Convert the book to a string
Provide a result containing word characters only, including apostrophes for possessives
Return the first 20 words in the book
20 most frequently used words
20 longest words
Longest palindrome

How would this program function when run on non-English texts? What results do you expect?

First off, check out:

And you could get pretty far with a naïve implementation using strings.Split:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

func main() {
	// Download a book
	res, err := http.Get("https://www.gutenberg.org/cache/epub/1513/pg1513.txt")
	if err != nil {
		log.Fatal(err)
	}
	// Close the body once we are done reading it
	defer res.Body.Close()
	// Read the whole body into memory
	body, err := io.ReadAll(res.Body)
	if err != nil {
		log.Fatal(err)
	}
	// Make into a string
	converted := string(body)
	// Split on newlines to get all lines in the book
	lines := strings.Split(converted, "\n")
	// Keep track of the longest word.
	longest := ""
	for _, line := range lines {
		// Fields splits on any run of whitespace, so empty strings
		// and a stray "\r" at the end of a line are skipped
		allWords := strings.Fields(line)
		for _, v := range allWords {
			// Note: len counts bytes, not characters
			if len(v) > len(longest) {
				longest = v
			}
		}
	}

	fmt.Println("Longest word:", longest)
}

Note this is completely untested and meant to be used as a starting point to get you going. It’s also pretty inefficient. So once you get going with this, check out the following examples of scanners. Scan lines:
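A minimal sketch of line scanning with bufio.Scanner could look something like this. The function name countLines and the sample text are made up for illustration, and strings.NewReader is only standing in for a real source such as res.Body or an opened file. Treat it as a starting point rather than the one right way to do it.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"strings"
)

// countLines scans text one line at a time and reports how many lines it
// holds and which one is longest (measured in bytes).
// The name and signature are hypothetical, chosen for this sketch.
func countLines(text string) (count int, longest string, err error) {
	// strings.NewReader stands in for any io.Reader, such as res.Body.
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		// Text returns the current line without its line ending.
		line := scanner.Text()
		count++
		if len(line) > len(longest) {
			longest = line
		}
	}
	// Err reports any failure that stopped the scan early.
	return count, longest, scanner.Err()
}

func main() {
	sample := "Two households, both alike in dignity,\nIn fair Verona, where we lay our scene,\n"
	n, longest, err := countLines(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d lines, longest: %q\n", n, longest)
}
```

The default split function is bufio.ScanLines, which also drops a trailing "\r", so files with Windows line endings should come out clean.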

And how to scan words:
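Here is a rough sketch of word scanning, assuming whitespace-separated words are good enough for a first pass. The function name longestWord is hypothetical. It compares words by rune count instead of byte length, which starts to matter as soon as the text contains non-ASCII characters (your non-English question).

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"strings"
	"unicode/utf8"
)

// longestWord scans text one whitespace-separated word at a time and
// returns the word with the most characters (runes). On a tie, the word
// seen first wins. The name is hypothetical, chosen for this sketch.
func longestWord(text string) (string, error) {
	scanner := bufio.NewScanner(strings.NewReader(text))
	// Switch the scanner from its default (lines) to words.
	scanner.Split(bufio.ScanWords)

	longest, longestLen := "", 0
	for scanner.Scan() {
		word := scanner.Text()
		// Count runes, not bytes, so accented letters count once.
		if n := utf8.RuneCountInString(word); n > longestLen {
			longest, longestLen = word, n
		}
	}
	return longest, scanner.Err()
}

func main() {
	word, err := longestWord("From forth the fatal loins of these two foes")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Longest word:", word)
}
```

Note that ScanWords only splits on whitespace, so punctuation stays attached to the word ("dignity," keeps its comma). You would need to trim or filter that yourself.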

So you could do the GET and then do your processing as you scan lines or words into strings, instead of converting the entire body to a string and then using strings.Split (again, that is a pretty memory-intensive way of doing things). Will efficiency matter in this case? Probably not, but if you run into memory problems you could try a scanner. Anyway, hope this gets you going on the right path!
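To make that concrete, here is one possible sketch of counting word frequencies while scanning, without ever holding the whole book as one string. The names countWords, topWords and fetchCounts are hypothetical, and the cleanup rule (trim anything that is not a letter from both ends, then lowercase) is just one assumption about what counts as a word. It keeps apostrophes inside a word but drops them at the edges.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"net/http"
	"sort"
	"strings"
	"unicode"
)

// countWords streams r word by word and tallies how often each
// lowercased word appears.
func countWords(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(r)
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		// Trim leading and trailing characters that are not letters.
		word := strings.TrimFunc(scanner.Text(), func(c rune) bool {
			return !unicode.IsLetter(c)
		})
		if word != "" {
			counts[strings.ToLower(word)]++
		}
	}
	return counts, scanner.Err()
}

// topWords returns up to n words, most frequent first.
// Ties are broken alphabetically so the result is stable.
func topWords(counts map[string]int, n int) []string {
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Slice(words, func(i, j int) bool {
		if counts[words[i]] != counts[words[j]] {
			return counts[words[i]] > counts[words[j]]
		}
		return words[i] < words[j]
	})
	if len(words) > n {
		words = words[:n]
	}
	return words
}

// fetchCounts downloads url and counts words straight from the
// response body as it is read.
func fetchCounts(url string) (map[string]int, error) {
	res, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	return countWords(res.Body)
}

func main() {
	// An in-memory sample keeps this runnable offline. For a real
	// book, call fetchCounts with the book's URL instead.
	counts, err := countWords(strings.NewReader("The cat and the hat. The end!"))
	if err != nil {
		log.Fatal(err)
	}
	for _, w := range topWords(counts, 20) {
		fmt.Println(counts[w], w)
	}
}
```

The map of counts still grows with the number of distinct words, but that is usually far smaller than the book itself.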
