Question about Goroutine

:heart:First, I want thank you for your time to look at this thread, and patiently read thru end. :heart:

type Links struct {
 Level int
 Link  string
}

var tokens = make(chan struct{}, 10)

func main() {
 linkList := make(chan Links)
 go func() {
  for _, v := range os.Args[1:] {
   linkList <- Links{1, v}
  }
 }()
 depth := 4
 m := make(map[string]bool)
 for link := range linkList {
  fmt.Printf(" runtime.NumGoroutine()= %+v\n", runtime.NumGoroutine())
  if !m[link.Link] && link.Level < depth {
   m[link.Link] = true
   go func(link Links) {
    fmt.Printf("link = %+v\n", link)
    tokens <- struct{}{}
    res := utility.Parser(link.Link)
    <-tokens
    for _, v := range res {
     linkList <- Links{link.Level + 1, v}
    }
   }(link)
  }
 }
}

Context is I want to parse all links on the page, just practice to familiar with golang
Q1: Here is my code, I found goroutine jump really high to 180+, and then finally diminish to 35, but I already set 10 as maximum, but happened out there?

 linkList := make(chan string)
 go func() {
  linkList <- os.Args[1]
 }()
 m := make(map[string]bool)
 for link := range linkList {
  if !m[link] {
    m[link] = true
    go func(link string) {
     linkList <- utility.Parse(link)...
    }(link)
  }
 }

Q2: I am wondering any syntax sugar on linkList <- utility.Parse(link)..., compiler is not happy with this line.
Q3: Is there anyway in Forum I can highlight the code snippet, so the reader can have better way to scan code?

I assume you mean you set the GOMAXPROCS environment variable. This sets the number of active (unblocked) operating system threads, not the number of goroutines. Many goroutines can run on a single thread.

One way of limiting the number of goroutines you use is with worker pools: https://gobyexample.com/worker-pools

Hey Nathan,

fmt.Printf(" runtime.NumGoroutine()= %+v\n", runtime.NumGoroutine()) in the program. which print the goroutine up to 180+ during running, and decrease to 35 at end of program.
Will check the goByExample as well. Thanks!:tiger:

I took a closer look and discovered two things.

First,

tokens &lt;- struct{}{}
res := utility.Parser(link.Link)
&lt;-tokens

limits concurrent access to utility.Parser, not the number of goroutines. Practically, this might be the resource that should be limited. However, something like a worker pool could limit both goroutines and concurrent access to ‘utility.Parser’

Second, main always deadlocks.

I make a running version https://play.golang.org/p/p5_SqVvj2C that simulates utility.Parser with FakeLinks, moved all the tunable values to constants at the beginning, and made it so it will not deadlock.

FakeLinks takes a configurable amount of time, sleepDuration a random number (at most maxToGenerate) of random “links”. It also prints the number of concurrent FakeLinks that are running.

To beat the deadlock in main, I used a sync.WaitGroup that holds the number of links that still need to be processed. This is a bit nuanced because the number of links to process is unknown until after all the links are processed. I think I got right.

I think this leaves you in a position to use a worker pool. Please post your solution and any more questions you have.

Hey Nathan, I want to thank your deliberate answer, appreciate a lot.:cherry_blossom:
I try to read your code and learn a lot.

I try to use different implementation to deal with this same question, and think about what will be more idiomatic, pros, cons and trade-offs.

Contexts : Crawler all the links given specific URL
First solution is

//common part:
type Links struct {
	Level int
	Link  string
}
var tokens = make(chan struct{}, 1)

func main() {
	start := time.Now()
	c := utility.New()
	c.ParsingPath = "https://ryanfakir.github.io"
	linkList := make(chan Links)
	out := make(chan Links)
	wg := &sync.WaitGroup{}
	depth := 3
//differece begins Approach One 
go func() {
	// start first link
	linkList <- Links{1, c.ParsingPath}
	tokens <- struct{}{}
}()
var res []Links
for i := 0; i < 2; i++ {
	go func() {
		for v := range linkList {
			fmt.Printf(" runtime.NumGoroutine()= %+v\n", runtime.NumGoroutine())
			if v.Level >= depth {
				continue
			}
			select {
			case <-tokens:
				// append untouch urls from res
				var items []Links
				for _, unTouchedLinks := range res {
					for _, url := range c.Parser(unTouchedLinks.Link) {
						items = append(items, Links{unTouchedLinks.Level + 1, url})
					}
				}
				for _, url := range c.Parser(v.Link) {
					items = append(items, Links{v.Level + 1, url})
				}
				res = append(res, items...)
				// go func(v Links, res []string) {
				for _, item := range res {
					out <- item
				}
			default:
				res = append(res, v)
			}
			//}(v, res)
		}
	}()
}

m := make(map[string]bool)
var i int
for link := range out {
	if !m[link.Link] && link.Level <= depth {
		i++
		fmt.Printf("time.Since(start) = %+v\n", time.Since(start))
		fmt.Printf("i = %+v\n", i)
		m[link.Link] = true
		go func() {
			tokens <- struct{}{}
			linkList <- link
		}()

	}
} // this is end of main() }
          
                                 
    // Approach Two
	wg := &sync.WaitGroup{}
	go func() {
		linkList <- Links{1, c.ParsingPath}
		wg.Add(1)
	}()
	for i := 0; i < 2; i++ {
		go func() {
			for v := range linkList {
				fmt.Printf(" runtime.NumGoroutine()= %+v\n", runtime.NumGoroutine())
				if v.Level >= depth {
					continue
				}
				res := c.Parser(v.Link)
				go func(v Links, res []string) {
					for _, item := range res {
						wg.Add(1)
						out <- Links{v.Level + 1, item}
					}
				}(v, res)
			}
		}()
	}

	m := make(map[string]bool)
	var i int
	for link := range out {
		if !m[link.Link] && link.Level <= depth {
			wg.Done()
			fmt.Printf("time.Since(start) = %+v\n", time.Since(start))
			// How many link been processed
			i++
			fmt.Printf("i = %+v\n", i)
			m[link.Link] = true
			linkList <- link
		}
	}
	// This won't work because out channel never end, so it dead lock, even though all links within depth finished processing
	wg.Wait()
	fmt.Println("done") // this is end of main() }

To make it easier understand, allow me to draw pictures: 
<img src="/uploads/default/original/2X/3/34bedb7ecc6cb4a558cbd423b86e5f54c00963a2.png" width="690" height="365">

**Observation**:

1. Given only one  goroutine created in for-loop(graph shows 2 goroutines created)
Approach one need take 56s to finish 3 depths with 137 links totally(using 153 goroutines to finish the work), while Approach two only take 2.5s to finished same amount of work(using 12 goroutines).

1. Given two goroutines, Approach one finish same amount of work, it takes 2,67s. While two use 1.62s.

1. Keep the variable state at main goroutine is good idea(map, res). res is shared by multi-goroutines, from the result show the slice is concurrency-safe.

1. Less goroutine created, less time consuming, more robust.

Finally, Here is my few question:
Q1: Given ONE using tokens to control how many goroutines able to process the Url in the same time
Q2: Using both approach I still not solving DEAD-LOCK problem, any idea on that?
Q3: better solution?  any idiomatic way to solve this case?

Thanks again for all the patience and kindness to answer questions.

Thanks for providing code and pictures :slight_smile:

Observation 3. Keep the variable state at main goroutine is good idea(map, res). res is shared by multi-goroutines, from the result show the slice is concurrency safe.

The slice is not safe for concurrency. You were lucky because the way your tokens worked serialized access to res by not letting more than one goroutine run at a time.

Observation 4. Less goroutine created, less time consuming, more robust.

If this were true, then you should not be using goroutines. In general, goroutines are not expensive resources. I usually only control the number of goroutines to control access to some other resource, for example, some memory or cpu heavy process.

Q1: Given ONE using tokens to control how many goroutines able to process the Url in the same time

I think that only one goroutine will be active at a time because tokens can only hold one item (because you defined it that way) and it is received in one goroutine and then sent to in another.

Q2: Using both approach I still not solving DEAD-LOCK problem, any idea on that?

Both ONE and TWO loop over out. This loop will never end because out is never closed.

If you look at my solution in https://play.golang.org/p/p5_SqVvj2C1, which does not deadlock, main blocks on wg.Wait(). The careful use of wg.Add and wg.Done tracks how many links remain to be processed, and hits 0 (triggering the end of wg.Wait, when all the links have been processed.

Q3: better solution? Any idiomatic way to solve this case?

I think a manager/worker approach would work.

The manager would queue up links to process from linkListand then send them to workers in a worker pool. When the manager has no more links that need to be processed and none of the workers are busy, then stop the program.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.