[os/exec] How to make sure that Wait() always completes?


(Julien) #1

Hello, I have a weird issue: I spawn a process, and cmd.Wait() never returns even though the spawned Linux process is dead. I saw this in the docs:

If any of c.Stdin, c.Stdout or c.Stderr are not an *os.File, Wait also waits for the respective I/O loop copying to or from the process to complete.

Wait waits for the command to exit and waits for any copying to stdin or copying from stdout or stderr to complete.

In my code I have this:

cmd.Stdout = os.Stdout
cmd.Stderr = os.Stdout

How can I make sure that the goroutines doing the I/O copy always return when the process dies, even if the copy has not completed? To add a bit more context: I think the subprocess forks itself when it crashes, so that might cause an issue with stderr / stdout.

Thanks,


(Johan Dahl) #2

Hi. Do you have any example code?


(Julien) #3

It looks something like this:

// Start the server
err = srv.Start()
if err != nil {
	return
}

// Setup traps for Signals
// TODO: maybe we need a larger buffer for the channel?
signalCh := make(chan os.Signal, 10)
signal.Notify(signalCh, syscall.SIGINT, syscall.SIGHUP, syscall.SIGTERM, syscall.SIGUSR1, syscall.SIGCHLD)

go func() {
	// Named sig rather than signal to avoid shadowing the os/signal package.
	for sig := range signalCh {
		switch sig {
		case syscall.SIGINT:
			srv.sendSignal(syscall.SIGINT)
		case syscall.SIGHUP:
			srv.sendSignal(syscall.SIGHUP)
		case syscall.SIGTERM:
			srv.sendSignal(syscall.SIGTERM)
		case syscall.SIGUSR1:
			srv.sendSignal(syscall.SIGUSR1)
		default:
			logger.Warn(fmt.Sprintf("received an unhandled signal: %v", sig))
		}
	}
}()

// Wait for the server to crash / exit gracefully
err = srv.cmd.Wait()
if err != nil {
	// do A
} else {
	// do B
}

srv.cmd.Wait() never returns even though the spawned process crashed (I checked on Linux using ps). After re-reading the Wait() doc: since os.Stdout is an *os.File, it shouldn’t block on I/O copying, so it’s probably a different issue.


(Johan Dahl) #4

What happens if you run the command with CombinedOutput()? Does it return if the program crashes?
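For reference, a minimal sketch of what running a command through CombinedOutput looks like. The runCombined helper and the shell script are made up for illustration. Note that CombinedOutput collects output into an in-memory buffer rather than an *os.File, so the documented wait for I/O copying applies to it.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runCombined (hypothetical helper) runs a shell script and returns its
// merged stdout and stderr together with its exit code.
func runCombined(script string) (string, int, error) {
	out, err := exec.Command("sh", "-c", script).CombinedOutput()

	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The process ran but did not exit successfully. ExitCode is -1
		// if it was terminated by a signal.
		return string(out), exitErr.ExitCode(), nil
	}
	if err != nil {
		// The command could not be started, or an I/O error occurred.
		return string(out), -1, err
	}
	return string(out), 0, nil
}

func main() {
	out, code, err := runCombined("echo to-stdout; echo to-stderr 1>&2; exit 3")
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Printf("exit code %d, output %q\n", code, out)
}
```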


(Julien) #5

It’s actually very hard to reproduce. I’ve seen it three times, and I can’t reproduce it on my side; it works 99% of the time.


(Johan Dahl) #6

Hard to test any solutions then. One way to do it on Unix systems is to get the pid from the command’s Process field:

pid := srv.cmd.Process.Pid

and then check whether the process exists by sending signal 0 to it. No signal is actually delivered, but an error is returned if the pid doesn’t exist.

killErr := syscall.Kill(pid, syscall.Signal(0))
procExists := killErr == nil
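A slightly fuller sketch of the signal-0 probe, with two caveats worth keeping in mind. An EPERM error means the process exists but cannot be signalled by the caller, and a child that has exited but has not yet been reaped by Wait is a zombie that still counts as existing. processExists and probeAroundWait are hypothetical helpers, and the code is Unix only.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// processExists (hypothetical helper) reports whether a process with the
// given pid exists, using the signal-0 probe.
func processExists(pid int) bool {
	err := syscall.Kill(pid, syscall.Signal(0))
	// EPERM means the process exists but we are not allowed to signal it.
	return err == nil || errors.Is(err, syscall.EPERM)
}

// probeAroundWait (hypothetical helper) starts a short-lived command and
// probes its pid before and after Wait.
func probeAroundWait() (beforeWait, afterWait bool, err error) {
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		return false, false, err
	}
	pid := cmd.Process.Pid

	// Give the child time to exit. Until Wait reaps it, it remains a
	// zombie, and the probe still reports that the pid exists.
	time.Sleep(200 * time.Millisecond)
	beforeWait = processExists(pid)

	err = cmd.Wait()
	afterWait = processExists(pid)
	return beforeWait, afterWait, err
}

func main() {
	before, after, err := probeAroundWait()
	if err != nil {
		fmt.Println("command failed:", err)
		return
	}
	fmt.Printf("exists before Wait: %v, after Wait: %v\n", before, after)
}
```

Because of the zombie case, a successful probe on its own does not show that the child is still running while Wait has not returned. The probe can also be fooled if the pid has been reused by an unrelated process.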