[Solved] Getting data races and need help with a solution

Hello,

I just started learning about goroutines and channels, which prompted me to write a web scraper.
In this function I take a URL from a slice and use goquery to pull out the bits of information I need. Those go into a slice, which is appended to a 2D slice, which is later passed to a function that writes a CSV file so the data can be viewed in Excel.

func getWinery(j string) {
	winery := []string{}
	w, _ := goquery.NewDocument(j)

	w.Find(".col-lg-9").Each(func(index int, item *goquery.Selection) {
		name := item.Find("h1").Text()
		r, _ := regexp.Compile("^.*\r?\n((?:.*\r?\n){2})")
		address := r.FindStringSubmatch(strings.TrimSpace(item.Find("address").Text()))[1]
		url, _ := item.Find("a").Attr("href")

		winery = append(winery, name, address, url)
	})
	wineries = append(wineries, winery)
}

The data race is happening at
wineries = append(wineries, winery)
My initial thought was to use channels, but at the moment I’m not sure how to do that.
The full code is at https://github.com/MattJBrowning/SuperWineryScraper/blob/master/main.go


I’m guessing that .Each tries to run the callback in parallel, so it ends up executing for every result at the same time.

You could try putting a sync.Mutex around the wineries = append(wineries, winery) line.

But if I can be a little more fatalistic (or realistic, depending on your POV), I think you’ll have a lot of trouble using goquery safely with an API like that, so it might be a good idea to investigate an alternative solution.

Maybe the Map function would work better.

I had a look at the code for Each and cannot see any parallelism in there. Can you please post the entire race report? That should make it easy to figure out where the race is coming from.

I believe this is the report. Line 73 is wineries = append(wineries, winery)

==================
WARNING: DATA RACE
Read at 0x00000155f0a0 by goroutine 24:
  main.getWinery()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x3aa
  main.worker()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97

Previous write at 0x00000155f0a0 by goroutine 17:
  main.getWinery()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x45d
  main.worker()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97

Goroutine 24 (running) created at:
  main.main()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249

Goroutine 17 (running) created at:
  main.main()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
==================
==================
WARNING: DATA RACE
Read at 0x00c42051a9a8 by goroutine 24:
  runtime.growslice()
      /usr/local/Cellar/go/1.8.3/libexec/src/runtime/slice.go:82 +0x0
  main.getWinery()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x51b
  main.worker()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97

Previous write at 0x00c42051a9a8 by goroutine 17:
  main.getWinery()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x40f
  main.worker()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97

Goroutine 24 (running) created at:
  main.main()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249

Goroutine 17 (running) created at:
  main.main()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
==================
==================
WARNING: DATA RACE
Write at 0x00c420154400 by goroutine 24:
  main.getWinery()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x40f
  main.worker()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97

Previous write at 0x00c420154400 by goroutine 22:
  main.getWinery()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x40f
  main.worker()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97

Goroutine 24 (running) created at:
  main.main()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249

Goroutine 22 (running) created at:
  main.main()
      /Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
==================
Found 3 data race(s)
exit status 66

I removed the .Each after your comment on it. Realized that it wasn’t needed in that situation. The updated function is:

func getWinery(j string) {
	winery := []string{}
	w, _ := goquery.NewDocument(j)

	item := w.Find(".col-lg-9")
	name := item.Find("h1").Text()
	r, _ := regexp.Compile("^.*\r?\n((?:.*\r?\n){2})")
	address := r.FindStringSubmatch(strings.TrimSpace(item.Find("address").Text()))[1]
	url, _ := item.Find("a").Attr("href")

	winery = append(winery, name, address, url)

	wineries = append(wineries, winery)
}

Your data race is on this global variable:

https://github.com/MattJBrowning/SuperWineryScraper/blob/master/main.go#L17

You will need a mutex around any access or update to that variable.
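
A minimal sketch of what that could look like, using only the standard library. The names mu, addWinery and scrapeAll are made up for illustration, and the goquery scraping is replaced by a placeholder record, so treat this as an outline rather than the exact change made in the repository:

```go
package main

import (
	"fmt"
	"sync"
)

// wineries is the shared result slice; mu guards every read and write of it.
var (
	mu       sync.Mutex
	wineries [][]string
)

// addWinery (hypothetical helper) appends one record while holding the lock,
// so concurrent workers never modify the slice at the same time.
func addWinery(winery []string) {
	mu.Lock()
	defer mu.Unlock()
	wineries = append(wineries, winery)
}

// scrapeAll (hypothetical driver) starts one goroutine per URL, waits for
// all of them, and returns how many records the shared slice now holds.
func scrapeAll(urls []string) int {
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			// Placeholder for the real goquery scraping in getWinery.
			addWinery([]string{"name for " + u, "address", u})
		}(u)
	}
	wg.Wait()

	// Reads of the shared slice take the lock as well.
	mu.Lock()
	defer mu.Unlock()
	return len(wineries)
}

func main() {
	fmt.Println(scrapeAll([]string{"a", "b", "c"}))
}
```

Running the program with go run -race is a quick way to check that the report no longer appears.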


That seemed to do the trick and is also good to know for future use. Thank you for the help!
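
For anyone landing here later: the channel approach mentioned in the original question can also work. Below is a rough sketch, assuming each worker sends its record on a channel and a single goroutine does all the appending, so no lock is needed. fetchWinery and collect are hypothetical names, and the scraping itself is stubbed out:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchWinery is a hypothetical stand-in for the goquery scraping;
// it returns one record without touching any shared state.
func fetchWinery(url string) []string {
	return []string{"name for " + url, "address", url}
}

// collect fans the URLs out to goroutines. Each worker sends its record
// on the results channel, and only the calling goroutine appends.
func collect(urls []string) [][]string {
	results := make(chan []string)
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			results <- fetchWinery(u)
		}(u)
	}

	// Close the channel once every worker has sent its record,
	// which ends the range loop below.
	go func() {
		wg.Wait()
		close(results)
	}()

	var wineries [][]string
	for w := range results {
		wineries = append(wineries, w)
	}
	return wineries
}

func main() {
	fmt.Println(len(collect([]string{"a", "b", "c"})))
}
```

The order of the records depends on which worker finishes first, so sort them before writing the CSV if the order matters.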

