Timeouts in systemd with goinggo/workpool

gerbenjacobs · March 29, 2016, 9:26pm

I’ve created a small Go binary that I want to have a systemd service for.
Running the binary manually works, but when I start it with systemctl, it does its initial non-workpool setup and then times out: holdoff time over, scheduling restart. Before that happens I would have expected the first workers to be done with the first job, but I never see their output either.

I then tested the same script with a Hello World webserver binary and that works, which suggests that goinggo/workpool is probably involved.

I’m not really familiar with systemd. I did also try setting its Type to forking, but that didn’t work.

So my question is: does anyone have experience with systemd (or similar) and the workpool package (or goroutines in general), and have you managed to get systemd to not time out with goroutines?

dfc · March 29, 2016, 9:41pm

If the program crashed there should be information in stderr which systemd will have squirreled away in its journal. That’s the extent of my knowledge with systemd. If you can find and post the complete crash report, maybe we can figure it out.

gerbenjacobs · March 29, 2016, 9:57pm

It’s not crashing and there is not much logging.

Mar 29 23:20:46 raspberrypi systemd[1]: Started HW Runner Service.
Mar 29 23:20:47 raspberrypi hwrunner[22850]: [Collected 16 names] Took: 1.180100723s
Mar 29 23:21:07 raspberrypi systemd[1]: hwrunner.service holdoff time over, scheduling restart.

The first output is coming from just before I start a go func() {} that is using a sync.WaitGroup and the Workpool package.

I forgot about the WaitGroup, maybe that can also be the cause.

Am I correct in stating that a Go binary with goroutines runs entirely in one process?

dfc · March 29, 2016, 10:08pm

That’s dmesg, not the output I asked for.

What is the problem you are having? Does your application not start? Or does it start but not interact with systemd as you wish?

Can you post a code sample that demonstrates the problem?

gerbenjacobs · March 29, 2016, 10:21pm

The application starts, but somehow it does not signal to systemd that it’s up and running.

This is the main loop in the code; collectNames() will return a slice of Job structs that PostWork requires.

func runner(n int) {
	start := time.Now()
	jobs, err := collectNames(n)
	if err != nil {
		fmt.Println("Can not collect names. Trying again in 3 seconds..")
		time.Sleep(3 * time.Second)
		runner(n)
	}

	go func() {
		for i, _ := range jobs {
			wg.Add(1)
			if err := workPool.PostWork(QueueName, &jobs[i]); err != nil {
				fmt.Printf("ERROR: %s\n", err)
				time.Sleep(3 * time.Second)
			}
		}
		wg.Wait()
		elapsed := time.Since(start)
		logger(fmt.Sprintf("[Done!] Total time: %s\n", elapsed))
		client.Trigger("updater", "update", map[string]string{"msg": logString})
		logString = ""
		runner(n)
	}()
}

And the actual main:

func main() {
	workerCount := 2
	queueLength := int32(1000)
	nameLength := 16

	client = pusher.Client{
		AppId:  os.Getenv("PUSHER_APP_ID"),
		Key:    os.Getenv("PUSHER_APP_KEY"),
		Secret: os.Getenv("PUSHER_APP_SECRET"),
	}

	QueueName = "hwrunner"
	workPool = workpool.New(workerCount, queueLength)

	runner(nameLength)

	reader := bufio.NewReader(os.Stdin)
	reader.ReadString('\n')
}

The Stdin reader bit was part of the WorkPool example, to stop the program from exiting while its queues are still full.
My system knowledge is not that great, but I understand that this is not really a proper way to exit. Then again, the program is not supposed to exit; it is meant to run as a daemon.

dfc · March 29, 2016, 11:54pm

Why does the application need to signal systemd to do anything? Can you use a mode where systemd just checks that the child process is running, i.e. has not exited yet?

I don’t know what reading from stdin is going to do for you. When running under an init system (not just systemd), stdin is connected to /dev/null by default, so ReadString will immediately return io.EOF. You’re not checking that error.

My guess is your main function returns instantly, and when main returns, the program will exit.
