Hey Nathan, I want to thank your deliberate answer, appreciate a lot.
I try to read your code and learn a lot.
I try to use different implementation to deal with this same question, and think about what will be more idiomatic, pros, cons and trade-offs.
Contexts : Crawler all the links given specific URL
First solution is
//common part:
type Links struct {
Level int
Link string
}
var tokens = make(chan struct{}, 1)
func main() {
start := time.Now()
c := utility.New()
c.ParsingPath = "https://ryanfakir.github.io"
linkList := make(chan Links)
out := make(chan Links)
wg := &sync.WaitGroup{}
depth := 3
//differece begins Approach One
go func() {
// start first link
linkList <- Links{1, c.ParsingPath}
tokens <- struct{}{}
}()
var res []Links
for i := 0; i < 2; i++ {
go func() {
for v := range linkList {
fmt.Printf(" runtime.NumGoroutine()= %+v\n", runtime.NumGoroutine())
if v.Level >= depth {
continue
}
select {
case <-tokens:
// append untouch urls from res
var items []Links
for _, unTouchedLinks := range res {
for _, url := range c.Parser(unTouchedLinks.Link) {
items = append(items, Links{unTouchedLinks.Level + 1, url})
}
}
for _, url := range c.Parser(v.Link) {
items = append(items, Links{v.Level + 1, url})
}
res = append(res, items...)
// go func(v Links, res []string) {
for _, item := range res {
out <- item
}
default:
res = append(res, v)
}
//}(v, res)
}
}()
}
m := make(map[string]bool)
var i int
for link := range out {
if !m[link.Link] && link.Level <= depth {
i++
fmt.Printf("time.Since(start) = %+v\n", time.Since(start))
fmt.Printf("i = %+v\n", i)
m[link.Link] = true
go func() {
tokens <- struct{}{}
linkList <- link
}()
}
} // this is end of main() }
// Approach Two
wg := &sync.WaitGroup{}
go func() {
linkList <- Links{1, c.ParsingPath}
wg.Add(1)
}()
for i := 0; i < 2; i++ {
go func() {
for v := range linkList {
fmt.Printf(" runtime.NumGoroutine()= %+v\n", runtime.NumGoroutine())
if v.Level >= depth {
continue
}
res := c.Parser(v.Link)
go func(v Links, res []string) {
for _, item := range res {
wg.Add(1)
out <- Links{v.Level + 1, item}
}
}(v, res)
}
}()
}
m := make(map[string]bool)
var i int
for link := range out {
if !m[link.Link] && link.Level <= depth {
wg.Done()
fmt.Printf("time.Since(start) = %+v\n", time.Since(start))
// How many link been processed
i++
fmt.Printf("i = %+v\n", i)
m[link.Link] = true
linkList <- link
}
}
// This won't work because out channel never end, so it dead lock, even though all links within depth finished processing
wg.Wait()
fmt.Println("done") // this is end of main() }
To make it easier understand, allow me to draw pictures:
<img src="/uploads/default/original/2X/3/34bedb7ecc6cb4a558cbd423b86e5f54c00963a2.png" width="690" height="365">
**Observation**:
1. Given only one goroutine created in for-loop(graph shows 2 goroutines created)
Approach one need take 56s to finish 3 depths with 137 links totally(using 153 goroutines to finish the work), while Approach two only take 2.5s to finished same amount of work(using 12 goroutines).
1. Given two goroutines, Approach one finish same amount of work, it takes 2,67s. While two use 1.62s.
1. Keep the variable state at main goroutine is good idea(map, res). res is shared by multi-goroutines, from the result show the slice is concurrency-safe.
1. Less goroutine created, less time consuming, more robust.
Finally, Here is my few question:
Q1: Given ONE using tokens to control how many goroutines able to process the Url in the same time
Q2: Using both approach I still not solving DEAD-LOCK problem, any idea on that?
Q3: better solution? any idiomatic way to solve this case?
Thanks again for all the patience and kindness to answer questions.