I was working on some code that returns an io.Reader for data extraction rather than the full output as a slice of bytes, which would take up much more memory for very large data.
The question is posed against an SSH client interface, but it could apply to any reader whose underlying resources get closed at some point.
The sample test/demo code: Go Playground - The Go Programming Language. It runs locally on my computer and in the Go Playground, though sometimes the playground sandbox execution fails. I also tested the code earlier against a real SSH connection rather than this demo code, which uses a test/mock SSH server; I'm showing this version here because it is easier to demo with.
package main

import (
	"fmt"
	"io"
	"net"

	"github.com/metarsit/sshtest"
	"golang.org/x/crypto/ssh"
)

func main() {
	addr := "localhost:2222"

	// for a real demo, data should ideally be a very large string or byte
	// slice, hundreds of MB in size
	data := "supposedly some very large data being streamed for I/O processing"

	// init dummy test server to interface with the SSH client code being tested
	hp := sshtest.NewHoneyPot(addr)

	// Start the server in the background
	go func() {
		hp.ListenAndServe()
	}()
	defer hp.Close()
	hp.SetReturnString(data)

	// init the SSH client dependency to test exec cmd & getting the output stream
	cfg := &ssh.ClientConfig{
		User: "jdoe",
		Auth: []ssh.AuthMethod{
			ssh.Password("secret"),
		},
		HostKeyCallback: ssh.HostKeyCallback(
			func(hostname string, remote net.Addr, key ssh.PublicKey) error {
				return nil
			},
		),
	}

	outs, err := runSshCommand(addr, cfg, "echo \"hello world!\"")
	if err != nil {
		fmt.Printf("%v\n", err)
		return
	}

	//result, err := io.ReadAll(outs)
	result := make([]byte, 20) // demo arbitrary partial read against reader

	// NOTE/TODO: what happens when fully reading the returned "outs" reader
	// takes a long time? Can the SSH session or client connection close
	// on the remote end and cause the stream read to fail?
	//
	// And more importantly, do the deferred closes within runSshCommand
	// (for the session and client) affect reading of the returned stream on
	// the caller side after the function has already returned but the reader
	// hasn't been completely read yet? There seems to be no effect in the
	// latter case from this simple demo?
	_, err = io.ReadFull(outs, result)
	if err != nil {
		fmt.Printf("%v\n", err)
		return
	}
	fmt.Printf("main/caller output:\n%s\n", result)
}

func runSshCommand(addr string, cfg *ssh.ClientConfig, cmd string) (io.Reader, error) {
	client, err := ssh.Dial("tcp", addr, cfg)
	if err != nil {
		return nil, fmt.Errorf("Create client failed %v", err)
	}
	defer client.Close()

	// open session
	session, err := client.NewSession()
	if err != nil {
		return nil, fmt.Errorf("Create session failed %v", err)
	}
	defer session.Close()

	stderr, err := session.StderrPipe()
	if err != nil {
		err = fmt.Errorf("cannot open stderr pipe for cmd '%s': %s", cmd, err)
		return nil, err
	}
	stdout, err := session.StdoutPipe()
	if err != nil {
		err = fmt.Errorf("cannot open stdout pipe for cmd '%s': %s", cmd, err)
		return nil, err
	}

	err = session.Run(cmd)
	if err != nil {
		err = fmt.Errorf("cannot run cmd '%s': %s", cmd, err)
		return nil, err
	}

	combinedOutputStream := io.MultiReader(stdout, stderr)
	return combinedOutputStream, nil
}
I was unsure of the code's outcome when I first worked with it (the same thoughts as my colleague, which we'll get to in a bit), but the sample code works. When I posted similar code for review at work, a colleague posed this question: the io.Reader's underlying source is stdout from the SSH session, and the called function has deferred close statements on the SSH resources (the session and the client connection), so on function exit, wouldn't the deferred closes also close the stdout linked to the returned reader, and thus wouldn't we fail to read the data?
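
To make the concern concrete outside of SSH, here is a minimal hypothetical analogue (the openAndReturn helper and the file path are made up for illustration): with a plain file, the deferred Close runs when the function returns, so the caller's read fails with a "file already closed" error, which is what we expected to happen to the SSH stdout as well.

package main

import (
	"fmt"
	"io"
	"os"
)

// openAndReturn mirrors the shape of runSshCommand: it defers Close on the
// underlying resource and hands back only an io.Reader.
func openAndReturn(path string) (io.Reader, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // runs before the caller ever reads
	return f, nil
}

func main() {
	r, err := openAndReturn("/etc/hostname") // hypothetical path for the demo
	if err != nil {
		fmt.Println(err)
		return
	}
	buf := make([]byte, 16)
	_, err = io.ReadFull(r, buf)
	fmt.Println(err) // prints a "file already closed" error
}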
So the questions for code review here are:
- Why does the sample code still work? Did my colleague and I have mistaken assumptions about how stdout operates relative to the SSH client session?
- Under what conditions would the code not work? How could I alter the example to highlight the problem cases?
I’m assuming one train of thought would be to return or pass back references to the resources being closed, so that the caller closes them when done reading from the associated reader (or on erroring out), rather than deferring the closes inside the called function; that way the reader doesn't depend on resources that went out of scope once it was passed back to the caller. But I would think doing it this way could get cumbersome and complicated for the caller, who now also has to manage the closes and possibly do some kind of async processing. None of this would be a worry in the simplified case where the function reads all the stdout data and returns a slice of bytes instead of the reader, but that simplification comes at the expense of memory.
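
For reference, here is a rough sketch of that direction, assuming I switch from session.Run to session.Start so the command keeps producing output while the caller reads. startSshCommand and sshReadCloser are names I made up here, the sketch reuses the same imports as the demo above (fmt, io, golang.org/x/crypto/ssh), and proper handling of the remote exit status (session.Wait) is left out for brevity.

// Hypothetical sketch (not the code under review): wrap the output stream
// together with the SSH session and client so the caller owns the closes.
type sshReadCloser struct {
	io.Reader
	session *ssh.Session
	client  *ssh.Client
}

// Close releases the SSH resources once the caller is done reading.
// Error handling here is simplified for the sketch.
func (rc *sshReadCloser) Close() error {
	serr := rc.session.Close()
	cerr := rc.client.Close()
	if serr != nil {
		return serr
	}
	return cerr
}

func startSshCommand(addr string, cfg *ssh.ClientConfig, cmd string) (io.ReadCloser, error) {
	client, err := ssh.Dial("tcp", addr, cfg)
	if err != nil {
		return nil, fmt.Errorf("create client failed: %w", err)
	}
	session, err := client.NewSession()
	if err != nil {
		client.Close()
		return nil, fmt.Errorf("create session failed: %w", err)
	}
	stdout, err := session.StdoutPipe()
	if err != nil {
		session.Close()
		client.Close()
		return nil, fmt.Errorf("cannot open stdout pipe for cmd %q: %w", cmd, err)
	}
	// Start (not Run) returns as soon as the command has begun, so the
	// session stays open and output keeps flowing while the caller reads.
	if err := session.Start(cmd); err != nil {
		session.Close()
		client.Close()
		return nil, fmt.Errorf("cannot start cmd %q: %w", cmd, err)
	}
	return &sshReadCloser{Reader: stdout, session: session, client: client}, nil
}

The idea is that the deferred closes move out of the function entirely, and the caller decides when the session and client connection go away.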
What is the Go best practice for something like this, where you want to read/transfer a lot of data (like over SSH) but don't want to use up memory (or temp files and disk space) in doing so, and instead rely on something like the io.Reader interface? Am I heading in the right direction, or are there other ways to do it or enhancements I could make here?
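
For comparison, the caller side I have in mind would look roughly like this (streamCommand is a hypothetical name that reuses the startSshCommand sketch above), copying the output straight to a destination writer so only io.Copy's small internal buffer is held in memory at any time.

// Hypothetical usage of the startSshCommand sketch: stream the remote
// command's output to dst without buffering the whole thing in memory.
func streamCommand(addr string, cfg *ssh.ClientConfig, cmd string, dst io.Writer) error {
	rc, err := startSshCommand(addr, cfg, cmd)
	if err != nil {
		return err
	}
	defer rc.Close() // caller owns the SSH resources now

	_, err = io.Copy(dst, rc)
	return err
}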