Stop goroutines when a condition is met in one of them

I am learning how concurrency works in Go and how to work with it.
I am now facing the challenge of stopping running goroutines when a given condition is met in one of them.

Take the following dummy example (this is probably not the best way to accomplish the task, but it is good enough for me to understand how to work with goroutines, I hope :D)

Say you have a slice of N slices (with N being quite large), where each inner slice contains M integers (M can be any number from 1 to 1000+). You now want to check whether a given value (say 2) is present in any of these inner slices.

To leverage concurrency, you would iterate over the elements of the outer slice in parallel, and in each goroutine iterate over the integer values to check if 2 is present. To avoid waiting for the remaining inner iterations to complete, you could use a stop channel to signal the goroutines that a 2 has already been found.

Here is my code for a smaller data set:

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// main builds a random slice of slices and searches it concurrently for the value 2.
func main() {
	rand.Seed(time.Now().UnixNano())
	min := 1
	max := 50

	sofs := make([][]int, 50)

	for i := 0; i < 50; i++ {
		newLength := rand.Intn(max-min+1) + min

		sofs[i] = make([]int, newLength)

		for j := 0; j < newLength; j++ {
			sofs[i][j] = rand.Intn(max-min+1) + min
		}
	}

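	// achan carries the "found" result; it is buffered so senders never block.
	achan := make(chan bool, len(sofs))
	var wg sync.WaitGroup

	// stopCh is closed to tell all goroutines to stop searching.
	stopCh := make(chan struct{})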

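	// once guarantees stopCh is closed exactly once, even if several
	// goroutines find a 2 at the same time (a second close would panic).
	var once sync.Once

	for _, x := range sofs {
		wg.Add(1)
		go func(s []int) {
			defer wg.Done()
			for l, vs := range s {
				select {
				case <-stopCh:
					// Another goroutine already found a 2: stop searching.
					return
				default:
					if vs == 2 {
						once.Do(func() { close(stopCh) })
						achan <- true
						fmt.Println(s)
						fmt.Println(l)
						return
					}
				}
			}
		}(x)
	}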

	wg.Wait()
	close(achan)

	for v := range achan {
		if v {
			fmt.Println("2 is in sofs!")
			break
		}
	}

}

Although it works (if randomness gives you a 2 in one of the inner slices), the inner iteration

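for l, vs := range s {
	select {
	case <-stopCh:
		return
	default:
		if vs == 2 {
			// once is a sync.Once shared by all goroutines, so that
			// stopCh is never closed twice.
			once.Do(func() { close(stopCh) })
			achan <- true
			fmt.Println(s)
			fmt.Println(l)
			return
		}
	}
}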

has to finish its current task (is vs == 2?) before it can check whether a stop signal has been sent on stopCh. This means that if, instead of a simple check (vs == 2), you have a more demanding task, which you can simulate with

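for l, vs := range s {
	select {
	case <-stopCh:
		return
	default:
		if vs == 2 {
			time.Sleep(1000 * time.Second) // simulates a long-running task
			// once is a sync.Once shared by all goroutines, so that
			// stopCh is never closed twice.
			once.Do(func() { close(stopCh) })
			achan <- true
			fmt.Println(s)
			fmt.Println(l)
			return
		}
	}
}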

you will waste time…
Is there a way to somehow kill the goroutines without waiting for them to check stopCh?

Hi @Pisistrato,

Killing goroutines pre-emptively does not seem like a good idea to me. An operation that gets interrupted at an unpredictable point may cause data corruption or worse things.

In other words, the code that is to be interrupted should be aware of that interruption and should get a chance to clean up and leave a consistent state behind.

If a select case contains a really long-running operation, try splitting the operation into independent steps with a well-defined outcome for each step (and maybe a rollback strategy). Then the select block can check for a stop signal more often while the code completes the given task incrementally.
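A minimal sketch of that idea, assuming the long task can be cut into independent steps. The helper name processInSteps and the step callback are made up for illustration; they are not from the original code. The stop channel is checked before every step, so a stop signal costs at most one step of extra work.

```go
package main

import (
	"fmt"
	"sync"
)

// processInSteps is a hypothetical helper: it runs up to `steps` units of
// work and checks the stop channel before each one. It returns the number
// of steps that were actually executed.
func processInSteps(steps int, stop <-chan struct{}, step func(i int)) int {
	for i := 0; i < steps; i++ {
		select {
		case <-stop:
			// Stop was requested: leave before starting the next step.
			return i
		default:
		}
		step(i)
	}
	return steps
}

func main() {
	stop := make(chan struct{})
	var once sync.Once // makes sure stop is closed only once

	done := processInSteps(10, stop, func(i int) {
		if i == 3 {
			// Simulate the "condition met" event during step 3.
			once.Do(func() { close(stop) })
		}
	})
	fmt.Println("steps completed:", done) // prints "steps completed: 4"
}
```

The smaller each step is, the sooner a goroutine notices the stop signal, at the price of checking the channel more often.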

And do some benchmarks first, to see if the code needs improvement at all. Premature optimization can be a waste of time and lead to code that is harder to read and maintain.


Cancelling goroutines is common in web servers and DB microservices, usually by passing a deadline or a context.Context to the goroutine. In fact, the “context” package was added to the stdlib to facilitate exactly this kind of cancellation. (Yes, you may need a ‘defer’ in the goroutine for cleanup, but that’s what ‘defer’ is for.)


Your goroutines need to check more often whether they have been cancelled. There is no other way.

Context cancellation, mentioned by @clbanning, does the same thing: your code and many library functions check from time to time whether the context has been cancelled, and return if so. There is no instant return; a goroutine keeps consuming resources from the moment of cancellation until its next check.

Thanks for the awesome information.
