Questions about semaphore vs pool worker patterns

Hello,

I’ve been exploring the semaphore and worker pool patterns, and tried to implement the same program, a port scanner, using both.

The program using a semaphore:

package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
	"sync"
	"time"
)

func main() {
	const ports = 65535
	const limit = 256

	var wg sync.WaitGroup
	var mutex sync.Mutex

	sem := make(chan bool, limit)
	openPorts := []int{}

	for i := 1; i <= ports; i++ {
		i := i

		sem <- true

		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() {
				<-sem
			}()

			open := isPortOpen(i)
			if open {
				mutex.Lock()

				openPorts = append(openPorts, i)

				mutex.Unlock()
			}
		}()
	}

	wg.Wait()

	sort.Ints(openPorts)
	fmt.Println(openPorts)
}

func isPortOpen(port int) bool {
	addr := "localhost" + ":" + strconv.Itoa(port)
	conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
	if err != nil {
		return false
	}

	defer conn.Close()

	return true
}

// Output:
// go run -race main.go  34.58s user 11.69s system 541% cpu 8.550 total

The program using a worker pool:

package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
	"sync"
	"time"
)

var openPorts = []int{}

func main() {
	const ports = 65535
	const workers = 256

	var wg sync.WaitGroup
	var mutex sync.Mutex

	jobs := make(chan int, ports)

	wg.Add(ports)
	for i := 0; i < workers; i++ {
		go worker(jobs, &wg, &mutex)
	}

	for i := 1; i <= ports; i++ {
		jobs <- i
	}
	close(jobs)

	wg.Wait()

	sort.Ints(openPorts)
	fmt.Println(openPorts)
}

func worker(ports chan int, wg *sync.WaitGroup, mutex *sync.Mutex) {
	for p := range ports {
		open := isPortOpen(p)
		if open {
			mutex.Lock()

			openPorts = append(openPorts, p)

			mutex.Unlock()
		}

		wg.Done()
	}
}

func isPortOpen(port int) bool {
	addr := "localhost" + ":" + strconv.Itoa(port)
	conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
	if err != nil {
		return false
	}

	defer conn.Close()

	return true
}

// Output:
// go run -race main.go  34.05s user 12.37s system 638% cpu 7.270 total

Both programs are run with time go run -race main.go.

I’m curious why the semaphore pattern is slower than the worker pool but uses less CPU. Where do the patterns differ? Is there a preferred approach?

Additionally, if I’m correct, the maximum number of active goroutines in this case is bounded by the machine’s maximum number of open file descriptors, which can be checked with ulimit -n. Is it safe to use that number? Can it be known from within the program?

Hello,

Understanding how channels work should help you (I guess). Please check the link below.

Since a goroutine starts with about 2 KB of stack memory, you can estimate how many goroutines you can start given your resources.

I hope someone adds more answers to this question :slight_smile: