The shell script kills the go process, causing the script to be killed as well

I implement automatic upgrades of a resident Go service on our servers by having a script replace the installed service.

The general logic is as follows:

#!/bin/bash

version=$1
md5=$2
log_path=$3
echo "version:$version; md5:$md5" >> $log_path

## 1. remove the old server rpm
rm_cmd="sudo rpm -e server-20240202.rpm"
exec_ret=$($rm_cmd 2>&1)
exec_status=$?
if [ ${exec_status} != 0 ]; then
    echo "Failed to remove rpm, exec_ret: $exec_ret" >> $log_path
    exit 1
fi
echo "1、remove rpm successfully." >> $log_path
sleep 5

## 2. install new rpm
install_rpm_cmd="sudo rpm -ivh server-20240228.rpm"
exec_ret=$($install_rpm_cmd 2>&1)
exec_status=$?
if [ ${exec_status} != 0 ]; then
    echo "Failed to install new rpm, exec_ret: $exec_ret" >> $log_path
    exit 1
fi

echo "2、install rpm successfully." >> $log_path
sleep 5

## 3. kill old process
command_of_kill="ps -ef | grep server| grep -v update.sh | grep -v grep | awk '{print \$2}' |xargs sudo kill -9"
exec_ret=$(/bin/bash -c "$command_of_kill" 2>&1)
exec_status=$?
if [ ${exec_status} != 0 ]; then
    echo "Failed to kill all server's process, exec_ret: $exec_ret" >> $log_path
    exit 1
fi

echo "3、kill server's process successfully."  >> $log_path
sleep 5

## 4. start server
command_of_server_start="export PATH=\$PATH:/usr/sbin/:/sbin/; cd /usr/local/server/; sudo ./start.sh start"

exec_ret=$(/bin/bash -c "$command_of_server_start" 2>&1)
exec_status=$?
if [ ${exec_status} != 0 ]; then
    echo "Failed to start server, exec_ret: $exec_ret" >> $log_path
    exit 1
fi

echo "4、start server successfully." >> $log_path
sleep 5

echo "all、Update server successfully." >> $log_path

The Go service downloads the upgrade script and the new rpm, and then executes the script with the following code:

	upgradeAbsPath := "/tmp/update.sh"
	logPath := fmt.Sprintf("/tmp/update-%s-%s.log", version, md5)
	// Run the script in the background through bash so that Run() returns immediately.
	exec_script := fmt.Sprintf("nohup %s %s %s %s > /dev/null 2>&1 &", upgradeAbsPath, version, md5, logPath)
	cmd := exec.Command("/bin/bash", "-c", exec_script)
	log.Infof("exec command: /bin/bash -c %s", exec_script)
	err = cmd.Run()
	if err != nil {
		return fmt.Errorf("cmd Run failed, err: %w", err)
	}
	log.Infof("exec log path %s", logPath)

As far as I understand, this should upgrade the service normally, but judging from the log, the script stopped right after killing the old server process:

1、remove rpm successfully.
2、install rpm successfully.
3、kill server's process successfully.

The service files were replaced with the new version, but the new service was never started.

I see the same thing when I print the relevant processes every second:

========start========
2024-02-28 16:17:03
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
========end========
========start========
2024-02-28 16:17:04
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
root     26815 17337  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26894 26815  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26895 26894  0 16:17 pts/6    00:00:00 sudo rpm -ivh server-20240228.rpm
root     26896 26895  0 16:17 pts/6    00:00:00 rpm -ivh server-20240228.rpm
root     26897 26896  0 16:17 pts/6    00:00:00 rpm -ivh server-20240228.rpm
========end========
========start========
2024-02-28 16:17:05
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
root     26815 17337  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26894 26815  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26895 26894  0 16:17 pts/6    00:00:00 sudo rpm -ivh server-20240228.rpm
root     26896 26895 99 16:17 pts/6    00:00:01 rpm -ivh server-20240228.rpm
========end========
========start========
2024-02-28 16:17:06
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
root     26815 17337  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26894 26815  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26895 26894  0 16:17 pts/6    00:00:00 sudo rpm -ivh server-20240228.rpm
root     26896 26895 99 16:17 pts/6    00:00:02 rpm -ivh server-20240228.rpm
========end========
========start========
2024-02-28 16:17:07
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
root     26815 17337  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26894 26815  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26895 26894  0 16:17 pts/6    00:00:00 sudo rpm -ivh server-20240228.rpm
root     26896 26895 79 16:17 pts/6    00:00:03 rpm -ivh server-20240228.rpm
========end========
========start========
2024-02-28 16:17:08
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
root     26815 17337  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26894 26815  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26895 26894  0 16:17 pts/6    00:00:00 sudo rpm -ivh server-20240228.rpm
root     26896 26895 84 16:17 pts/6    00:00:04 rpm -ivh server-20240228.rpm
========end========
========start========
2024-02-28 16:17:09
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
root     26815 17337  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26894 26815  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26895 26894  0 16:17 pts/6    00:00:00 sudo rpm -ivh server-20240228.rpm
root     26896 26895 88 16:17 pts/6    00:00:05 rpm -ivh server-20240228.rpm
========end========
========start========
2024-02-28 16:17:10
root     23413     1  0 16:16 ?        00:00:00 /usr/local/server/server
root     23437 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin1 -f /usr/local/server/plugin/plugin1/plugin1.conf
root     23439 23413  0 16:16 ?        00:00:00 /usr/local/server/plugins/plugin2 -f /usr/local/server/plugin/plugin1/plugin2.conf
root     26815 17337  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26894 26815  0 16:17 pts/6    00:00:00 /bin/bash /tmp/update.sh 1.0.2 md5_fake /tmp/update.log
root     26895 26894  0 16:17 pts/6    00:00:00 sudo rpm -ivh server-20240228.rpm
root     26896 26895 90 16:17 pts/6    00:00:06 rpm -ivh server-20240228.rpm
========end========
========start========
2024-02-28 16:17:11
========end========
========start========
2024-02-28 16:17:12
========end========
========start========
2024-02-28 16:17:13
========end========

Judging by the process IDs, the upgrade script process has already been separated from its parent, the server process, so I don't understand why it was killed.

However, when I logged in to the server and ran the following command manually, the script ran to completion, that is, the upgrade succeeded.

nohup /tmp/update.sh 1.0.2 md5_fake /tmp/update.log > /dev/null 2>&1 &

Both the process list and the log confirm this.

update.log

1、remove rpm successfully.
2、install rpm successfully.
3、kill server's process successfully.
4、start server successfully.
all、Update server successfully.

So there seems to be nothing wrong with the script itself. What I don't understand is the difference between launching the upgrade script from Go and launching it manually, and why the script cannot run to completion in the first case.

I really can't find the problem and would appreciate any help.

Look at the process relationships first: the Go server starts the shell script, so the script is a descendant of the Go process, not its parent. Killing a process with kill -9 does not by itself make the operating system terminate that process's parent or children. What can take the script down along with the server is anything that acts on a whole group of processes at once, such as a signal sent to the process group or session, or a service manager cleaning up everything the service started.

This is more of a Linux process question than a Go question. In essence, nohup is not enough: it only makes the child process ignore the SIGHUP signal. You will need to decouple the script process from the Go process completely.

One way to do that is to use systemd-run, which starts a process as a transient unit in its own control group, detached from your parent process.
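As a rough illustration of the systemd-run idea, the Go side could build the command like this. This is only a sketch: it assumes a host that has systemd and a caller allowed to create transient units, and the helper name systemdRunCommand is made up for the example. The script path and arguments are the ones from the question.

```go
package main

import (
	"fmt"
	"os/exec"
)

// systemdRunCommand (hypothetical helper) builds a systemd-run invocation
// that asks systemd to start the update script as a transient service,
// outside the caller's own control group.
func systemdRunCommand(script, version, md5, logPath string) *exec.Cmd {
	return exec.Command("systemd-run", script, version, md5, logPath)
}

func main() {
	cmd := systemdRunCommand("/tmp/update.sh", "1.0.2", "md5_fake", "/tmp/update.log")
	// Only print the argument list here so the sketch is safe to try anywhere.
	// On a systemd host the service would call cmd.Run() instead.
	fmt.Println(cmd.Args)
}
```

systemd-run returns once the transient unit has been started, so cmd.Run() would not block for the duration of the upgrade.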

I tried to use the following code to separate the script into a separate process group.

cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
if err := cmd.Start(); err != nil {
    return err
}

This did not seem to solve the problem: the script was still killed at the same moment as the server. How can I prevent the script from being killed?

I'm not much of a Linux sysadmin (maybe try posting this on a Linux forum?). That said: is your Go program the thing executing the script? Is it supposed to be self-updating? I've managed to create self-updating Go services on Linux by using systemctl: just swap the new executable in while the old one is still running, then call systemctl restart myservice. If you want to try setting up a systemd service, this is a good guide:

My service needs to be compatible with different Linux versions. Most of the machines run CentOS 6 or CentOS 7, a few run Red Hat, and there may be other distributions. Some of these versions do not seem to support the systemctl command, which is why I use a script in the hope of being more portable. Is there a systemctl-like command that works on most Linux systems? Looking forward to your reply.

It looks like you could use the service command:

Again, you might have better luck on a Linux forum, but that looks like it might fit the bill. The main thing (and this is just my experience on later versions of CentOS and Ubuntu) is that the last command you run needs to stop AND start your executable in one command, otherwise the script will stop executing before the "start" command fires.
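A minimal sketch of that "stop and start in one command" idea from the Go side, assuming the server is registered as an init or systemd service. The service name "server" and the helper restartCommand are assumptions for illustration, not something taken from the setup in the question.

```go
package main

import (
	"fmt"
	"os/exec"
)

// restartCommand (hypothetical helper) builds a single "service <name> restart"
// call, so that stopping and starting happen in one command.
func restartCommand(serviceName string) *exec.Cmd {
	return exec.Command("service", serviceName, "restart")
}

func main() {
	cmd := restartCommand("server") // "server" is an assumed service name
	// Only print the argument list; a real updater would call cmd.Run()
	// as its very last step, after the new files are in place.
	fmt.Println(cmd.Args)
}
```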

I have a feeling that the trailing & is not doing what you expect here. os/exec itself does not interpret shell syntax; the & only works because the command is passed to /bin/bash -c, and bash returns exit code 0 as soon as the background job has been launched, so cmd.Run() only tells you that the detach succeeded. The spawned process can still go down together with the Go binary. cmd has an underlying Process field for controlling the spawned process, and it has a Release() method that can be used instead of Wait(). As far as I recall, it releases the resources associated with the process, so it may be worth trying together with cmd.Start().
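To make the Start/Release suggestion concrete, here is a small sketch that starts a command in its own session and releases it without waiting. It is not a confirmed fix for the problem above. The helper name startDetached is made up, and using Setsid (a new session) rather than Setpgid is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startDetached (hypothetical helper) starts a command in a new session and
// returns its PID without waiting for it to finish.
func startDetached(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	// Stdin, Stdout and Stderr are left nil, so os/exec connects them to the null device.
	// Setsid creates a new session, and with it a new process group, for the child.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}

	if err := cmd.Start(); err != nil {
		return 0, err
	}
	pid := cmd.Process.Pid
	// Release instead of Wait: the caller does not intend to collect the exit status.
	if err := cmd.Process.Release(); err != nil {
		return pid, err
	}
	return pid, nil
}

func main() {
	pid, err := startDetached("sleep", "1")
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("started detached process with PID", pid)
}
```

In the updater this would be called as startDetached("/tmp/update.sh", version, md5, logPath). Note that a new session does not move the child out of a systemd control group, so if the server runs as a systemd service this alone may not be enough.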