MattsLab
(Matt Browning)
June 22, 2017, 12:57am
Hello,
I just started learning about goroutines and channels, which prompted me to write a web scraper.
In this function I’m getting a URL from a slice and using goquery to extract the bits of information I need. Those values go into a slice, which is appended to a 2D slice, which is then passed to a function of mine that writes a CSV file so the data can be viewed in Excel.
func getWinery(j string) {
	winery := []string{}
	// Fetch and parse the page at URL j (the error is ignored here).
	w, _ := goquery.NewDocument(j)
	w.Find(".col-lg-9").Each(func(index int, item *goquery.Selection) {
		name := item.Find("h1").Text()
		// Capture the two address lines that follow the first line.
		r, _ := regexp.Compile("^.*\r?\n((?:.*\r?\n){2})")
		address := r.FindStringSubmatch(strings.TrimSpace(item.Find("address").Text()))[1]
		url, _ := item.Find("a").Attr("href")
		winery = append(winery, name, address, url)
	})
	// Add this winery's row to the 2D slice that is later written to CSV.
	wineries = append(wineries, winery)
}
The data race is happening at:
wineries = append(wineries, winery)
My initial thought was to use channels, but at the moment I’m not sure how to do that.
The full code is at https://github.com/MattJBrowning/SuperWineryScraper/blob/master/main.go
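For the channel approach, one option is to have each worker send its row over a results channel and let a single goroutine do all of the appending, so the shared slice is never touched concurrently. This is a minimal sketch, not the program’s actual structure: it assumes getWinery is changed to return its row instead of appending to wineries, and collectRows and fakeScrape are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// collectRows fans URLs out to a fixed number of worker goroutines and gathers
// each scraped row over a channel. Only the calling goroutine appends to the
// result slice, so no mutex is needed.
func collectRows(urls []string, workers int, scrape func(string) []string) [][]string {
	jobs := make(chan string)
	results := make(chan []string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- scrape(u) // send the row instead of appending to a global
			}
		}()
	}

	// Feed the jobs channel, then close it so the workers' range loops end.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results once every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	var rows [][]string
	for row := range results {
		rows = append(rows, row)
	}
	return rows
}

func main() {
	urls := []string{"https://example.com/a", "https://example.com/b", "https://example.com/c"}
	// fakeScrape is a hypothetical stand-in for a getWinery that returns its row.
	fakeScrape := func(u string) []string { return []string{"name", "address", u} }
	rows := collectRows(urls, 2, fakeScrape)
	fmt.Println(len(rows)) // 3
}
```

Because the collector appends in completion order, the rows will not necessarily match the order of the input URLs; if order matters, one option is to send an index along with each row.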
dfc
(Dave Cheney)
June 22, 2017, 1:09am
I’m guessing that the .Each function runs its callback in parallel, which would cause the callback to run for every result at the same time.
You could try putting a sync.Mutex around that line.
But if I can be a little more fatalistic (or realistic, depending on your POV), I think you’ll have a lot of trouble using goquery safely with an API like that, so it might be a good idea to investigate an alternative solution.
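A sketch of the mutex suggestion, assuming wineries is a package-level [][]string as the snippet implies; wineriesMu and addWinery are hypothetical names. Every append happens while the lock is held, so two workers can no longer read and write the slice header at the same time.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	wineries   [][]string
	wineriesMu sync.Mutex // guards wineries (hypothetical name)
)

// addWinery appends one row while holding the lock, so concurrent callers
// cannot interleave their reads and writes of the shared slice.
func addWinery(row []string) {
	wineriesMu.Lock()
	defer wineriesMu.Unlock()
	wineries = append(wineries, row)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			addWinery([]string{fmt.Sprint("winery ", n)})
		}(i)
	}
	// wg.Wait guarantees all appends have finished before we read the slice.
	wg.Wait()
	fmt.Println(len(wineries)) // 100
}
```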
dfc
(Dave Cheney)
June 22, 2017, 1:15am
Maybe the Map function would work better.
I had a look at the code for Each and cannot see any parallelism in there. Can you please post the entire race report? That should make it easy to figure out where the race is coming from.
MattsLab
(Matt Browning)
June 22, 2017, 1:26am
I believe this is the report. Line 73 is wineries = append(wineries, winery)
==================
WARNING: DATA RACE
Read at 0x00000155f0a0 by goroutine 24:
main.getWinery()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x3aa
main.worker()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97
Previous write at 0x00000155f0a0 by goroutine 17:
main.getWinery()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x45d
main.worker()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97
Goroutine 24 (running) created at:
main.main()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
Goroutine 17 (running) created at:
main.main()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
==================
==================
WARNING: DATA RACE
Read at 0x00c42051a9a8 by goroutine 24:
runtime.growslice()
/usr/local/Cellar/go/1.8.3/libexec/src/runtime/slice.go:82 +0x0
main.getWinery()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x51b
main.worker()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97
Previous write at 0x00c42051a9a8 by goroutine 17:
main.getWinery()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x40f
main.worker()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97
Goroutine 24 (running) created at:
main.main()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
Goroutine 17 (running) created at:
main.main()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
==================
==================
WARNING: DATA RACE
Write at 0x00c420154400 by goroutine 24:
main.getWinery()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x40f
main.worker()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97
Previous write at 0x00c420154400 by goroutine 22:
main.getWinery()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:73 +0x40f
main.worker()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:46 +0x97
Goroutine 24 (running) created at:
main.main()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
Goroutine 22 (running) created at:
main.main()
/Users/matt/Code/Go/src/github.com/MattJBrowning/SuperWineryScraper/main.go:29 +0x249
==================
Found 3 data race(s)
exit status 66
I removed the .Each after your comment on it; I realized it wasn’t needed in that situation. The updated function is:
func getWinery(j string) {
	winery := []string{}
	w, _ := goquery.NewDocument(j)
	item := w.Find(".col-lg-9")
	name := item.Find("h1").Text()
	r, _ := regexp.Compile("^.*\r?\n((?:.*\r?\n){2})")
	address := r.FindStringSubmatch(strings.TrimSpace(item.Find("address").Text()))[1]
	url, _ := item.Find("a").Attr("href")
	winery = append(winery, name, address, url)
	wineries = append(wineries, winery)
}
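Separately from the race, the updated function recompiles the address regexp on every call and indexes [1] on the result of FindStringSubmatch, which panics if the pattern does not match (FindStringSubmatch returns nil in that case). Below is a sketch of hoisting the pattern to a package-level regexp.MustCompile and checking for a nil match. parseAddress and errNoAddress are hypothetical names, and only the address step is shown so the example runs without goquery.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// addressRe is the pattern from getWinery, compiled once at package level
// instead of on every call. It skips the first line of the address block and
// captures the next two lines, each including its trailing newline.
var addressRe = regexp.MustCompile("^.*\r?\n((?:.*\r?\n){2})")

// errNoAddress is returned when the text does not have the expected layout.
var errNoAddress = errors.New("address text did not match the expected layout")

// parseAddress returns the captured lines, or an error instead of panicking
// when there is no match.
func parseAddress(text string) (string, error) {
	m := addressRe.FindStringSubmatch(strings.TrimSpace(text))
	if m == nil {
		return "", errNoAddress
	}
	return m[1], nil
}

func main() {
	addr, err := parseAddress("Tasting Room\n123 Vine St\nNapa, CA\nUSA")
	fmt.Printf("%q %v\n", addr, err) // "123 Vine St\nNapa, CA\n" <nil>

	_, err = parseAddress("only one line")
	fmt.Println(err) // address text did not match the expected layout
}
```

Because strings.TrimSpace removes the final newline, an address block of exactly three lines does not match this pattern either; returning an error makes that case visible instead of crashing a worker goroutine.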
dfc
(Dave Cheney)
June 22, 2017, 1:37am
Your data race is on the global variable wineries:
https://github.com/MattJBrowning/SuperWineryScraper/blob/master/main.go#L17
You will need a mutex around any access to, or update of, that variable.
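Locking only the append is not enough if something else, such as the CSV-writing step, reads wineries while workers are still running. One way to make "any access or update" hard to get wrong is to keep the slice and its mutex together behind methods. This is a sketch under that assumption: wineryList, Add, Snapshot and writeCSV are illustrative names, not part of the original program, and it assumes the CSV step can use the standard encoding/csv package.

```go
package main

import (
	"encoding/csv"
	"io"
	"os"
	"sync"
)

// wineryList bundles the rows with the mutex that guards them, so every read
// and write has to go through a locked method.
type wineryList struct {
	mu   sync.Mutex
	rows [][]string
}

// Add appends a row under the lock.
func (l *wineryList) Add(row []string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.rows = append(l.rows, row)
}

// Snapshot returns a copy of the outer slice taken under the lock, so the
// caller can range over it while other goroutines keep calling Add.
func (l *wineryList) Snapshot() [][]string {
	l.mu.Lock()
	defer l.mu.Unlock()
	out := make([][]string, len(l.rows))
	copy(out, l.rows)
	return out
}

// writeCSV writes a snapshot of the rows, so the CSV step also reads the data
// through the lock rather than touching the slice directly.
func writeCSV(w io.Writer, l *wineryList) error {
	cw := csv.NewWriter(w)
	return cw.WriteAll(l.Snapshot()) // WriteAll flushes and reports any error
}

func main() {
	var list wineryList
	var wg sync.WaitGroup
	for _, name := range []string{"Winery A", "Winery B"} {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			list.Add([]string{n, "123 Main St", "https://example.com"})
		}(name)
	}
	wg.Wait()
	if err := writeCSV(os.Stdout, &list); err != nil {
		panic(err)
	}
}
```

Snapshot copies only the outer slice; that is enough here because each row is not modified after it has been added.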
MattsLab
(Matt Browning)
June 22, 2017, 1:45am
That seemed to do the trick and is also good to know for future use. Thank you for the help!
system
(system)
Closed
September 20, 2017, 1:46am
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.