I am using curl -T <FILE> to upload a file, which could be as large as 5GB or even larger. The server implementation could be an HTTP server in Go, Rust, or Java, based on HTTP/1.1.
I need to calculate the checksums of the file on the server side: MD5, SHA1, and SHA256. That is three calculations, so ideally they would run in parallel. I must not read the whole file into memory, because that would make the service OOM, and I also don’t want to write temporary files. However, the biggest problem we encountered is that an HTTP request body cannot be read multiple times: after reading it once to calculate a checksum, any further read returns nothing. We have tried using Go’s io.Pipe() for this, but it doesn’t seem very efficient.
After calculating the checksums, we need to decide, based on business logic, whether to upload the file to S3, which requires the file stream again.
Is this requirement possible to implement? Could you please give some specific code?
You can’t re-read the body, just as you can’t re-read from os.Stdin. You’re reading directly from a stream (in the case of stdin, whatever the user types at the keyboard or is piped in from another application; in the case of the request, whatever the client is sending in its request body), and once it’s been consumed, it’s gone. You have to store it somewhere if you want to re-read it, or re-request the information (e.g. read it once to hash it, then request it again and send that directly to S3).
Can you clarify what you mean by it not seeming very efficient?