Goroutine hangs and doesn't finish execution

Hi, I am new to Golang and I am trying to ssh into hundreds of machines and check for a successful connection. I am using goroutines and channels for this. My code runs fine on a small sample, but on larger ones it only completes sometimes and often hangs.

        results := make(chan string)
        errors := make(chan string)
        for i := 0; i < len(hosts); i++ {
                host := hosts[i]
                go func(host string) {
                        conn, err := ssh.Dial("tcp", fmt.Sprintf("%s:%s", host, port), sshConfig)
                        if err != nil {
                                errors <- "Connection Failed " + host
                        } else if conn != nil {
                                results <- host
                                conn.Close()
                        }

                }(host)
        }

        for i := 0; i < len(hosts); i++ {
                select {
                case res := <-results:
                        resList = append(resList, res)
                case err := <-errors:
                        fmt.Println(err)
                }
        }
        close(results)
        close(errors)

Any help is appreciated!

At first glance, I would try closing conn before sending host to results, to avoid open connections piling up (and perhaps hitting some limit) while the goroutines wait for the unbuffered results channel to become available.

                            conn.Close()
                            results <- host

Thanks for the suggestion. I am trying that now, but I have a feeling that is not the reason. I am attempting to connect to around 650 machines and expect only two successful hosts. The output is printed very quickly at first, and then the program hangs once most or all of the output has appeared.

Edit: still running into this problem. It hangs when it reaches the last few hosts.

There are several design glitches here: using an indexed for loop instead of range, an “else if” without a real reason (the first branch can just return), and a Close that could be deferred. But none of this has anything to do with the problem.

I would write this:

                        conn, err := ssh.Dial("tcp", fmt.Sprintf("%s:%s", host, port), sshConfig)
                        if err != nil {
                                errors <- "Connection Failed " + host
                        } else if conn != nil {
                                results <- host
                                conn.Close()
                        }

as

                        conn, err := ssh.Dial("tcp", fmt.Sprintf("%s:%s", host, port), sshConfig)
                        if err != nil {
                                errors <- "Connection Failed " + host
                                return
                        }
                        // Defer only after the error check: conn is nil when err != nil.
                        defer conn.Close()
                        results <- host

And the loops should be like:

for _, host := range hosts {

Do you use a timeout in your sshConfig? Could it be that some connections just hang for too long?

@Josh_Longhi,

I am also interested in testing ssh connections to a large group of hosts.

Would you mind posting the complete code for another Golang beginner?

I would like to use your example code plus the goexpect/gexpect library to build an ssh login verification tool.

Thanks

Thanks, I am new to Go and will take these suggestions into consideration.

Yes, I have a timeout, but it does not seem to help.

The only part I omitted from the ssh portion is the config. Here it is:

sshConfig := &ssh.ClientConfig{
	User: "user",
	Auth: []ssh.AuthMethod{
		ssh.Password("pass"),
	},
	HostKeyCallback: ssh.InsecureIgnoreHostKey(),
	Timeout:         20 * time.Second,
}
port := "22"

Try putting debugging output before and after ssh.Dial to see whether the call itself hangs. If it does not, the problem is somewhere else in the Go code; otherwise it is ssh.Dial. You could also limit the number of simultaneous goroutines using a buffered channel, although I suppose you would get an OS error if you were using too many resources.

Hi All

@Josh_Longhi’s code with goroutines was too much for me to understand at first, so I created sshgo1.go with the logic for connecting to a single host, which works in my test lab. It has logging to show the error message.

Next I will finish sshgo2.go by adding a for loop,
and finally sshgo3.go with goroutines.

It seems that all my failed connections finish dialing, and then the program hangs without showing the two successful dials I expect. Could this be a channel issue?

# provide  wrong password
[me@fedora01 sshgo]$ go run sshgo3.go
Connection Failed fedora01.test.lan
Connection Failed fedora01.test.lan
Connection Failed fedora01.test.lan
[                                                                                                                                                                                                                                                                                                           ]
# supply correct password.
[me@fedora01 sshgo]$ go run sshgo3.go
[                                                                                                                                                                                                                                                                                                            fedora01.test.lan fedora01.test.lan fedora01.test.lan]
[me@fedora01 sshgo]$

  • todo next after break
    • understand goroutine more
    • expand the hosts list.

@Josh_Longhi, maybe you can add log.Fatal in the goroutine section of the code after the line `errors <- "Connection Failed " + host`; the program will then exit and show which host failed.

Thanks again for bringing up this topic and encouraging me to learn Golang. I hope we all end up with the ssh login check program we need.

No, the channels should be fine, but you can try removing them and putting a sleep at the end, maybe 30 minutes or so, until the goroutines finish.

Or try to time out the goroutine using time.After or context.WithTimeout. You can find some examples here:

http://dahernan.github.io/2015/02/04/context-and-cancellation-of-goroutines/

I suspect connectivity problems.
