Hello,
I’ve been exploring the semaphore and worker pool patterns, and tried to implement the same program, a port scanner, using both.
The program using a semaphore:
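package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
	"sync"
	"time"
)

func main() {
	const ports = 65535
	const limit = 256
	var wg sync.WaitGroup
	var mutex sync.Mutex
	// Buffered channel used as a counting semaphore: at most limit dials in flight.
	sem := make(chan bool, limit)
	openPorts := []int{}
	// Scan ports 1..65535, the same range as the worker pool version.
	for i := 1; i <= ports; i++ {
		i := i
		sem <- true // blocks while limit goroutines are running
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() {
				<-sem // release the slot
			}()
			open := isPortOpen(i)
			if open {
				mutex.Lock()
				openPorts = append(openPorts, i)
				mutex.Unlock()
			}
		}()
	}
	wg.Wait()
	// Print the result so the fmt and sort imports are used.
	sort.Ints(openPorts)
	fmt.Println(openPorts)
}

// isPortOpen reports whether a TCP connection to localhost:port succeeds.
func isPortOpen(port int) bool {
	addr := "localhost" + ":" + strconv.Itoa(port)
	conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
	if err != nil {
		return false
	}
	defer conn.Close()
	return true
}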
// Output:
// go run -race main.go 34.58s user 11.69s system 541% cpu 8.550 total
The program using a worker pool:
package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
	"sync"
	"time"
)

var openPorts = []int{}

func main() {
	const ports = 65535
	const workers = 256
	var wg sync.WaitGroup
	var mutex sync.Mutex
	jobs := make(chan int, ports)
	wg.Add(ports) // one Done per scanned port
	for i := 0; i < workers; i++ {
		go worker(jobs, &wg, &mutex)
	}
	for i := 1; i <= ports; i++ {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops terminate
	wg.Wait()
	// Print the result so the fmt and sort imports are used.
	sort.Ints(openPorts)
	fmt.Println(openPorts)
}

// worker scans ports received from the channel until it is closed.
func worker(ports chan int, wg *sync.WaitGroup, mutex *sync.Mutex) {
	for p := range ports {
		open := isPortOpen(p)
		if open {
			mutex.Lock()
			openPorts = append(openPorts, p)
			mutex.Unlock()
		}
		wg.Done()
	}
}

// isPortOpen reports whether a TCP connection to localhost:port succeeds.
func isPortOpen(port int) bool {
	addr := "localhost" + ":" + strconv.Itoa(port)
	conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
	if err != nil {
		return false
	}
	defer conn.Close()
	return true
}
// Output:
// go run -race main.go 34.05s user 12.37s system 638% cpu 7.270 total
Both programs are run with time go run -race main.go.
I’m curious why the semaphore pattern is slower than the worker pool (8.55s vs 7.27s total) but shows lower CPU utilization (541% vs 638%). Where do the patterns differ? Is there a preferred approach?
Additionally, if I’m correct, the maximum number of active goroutines in this case is bounded by the machine’s maximum number of open file descriptors, which can be checked with ulimit -n. Is it safe to use that number? Can it be known from within the program?