Reading HTTP client response twice

Hi,

Here I need to read a client response twice so that I can:

  1. return it from the Get handler.
  2. dump it to a file using the Log function.

What is the most memory-efficient way to do this, please? I looked into TeeReader but got confused. I am also not sure where and how to close the body.

Thank you

package api

func Get(w http.ResponseWriter, r *http.Request) {
	req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, "https://api.fake.io/id-1", nil)
	if err != nil {
		w.WriteHeader(500)
		return
	}

	res, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		w.WriteHeader(500)
		return
	}

	defer audit.Log(req, res)

	dat, err := io.ReadAll(res.Body)
	if err != nil {
		w.WriteHeader(500)
		return
	}
	defer res.Body.Close()

	w.Write(dat)
}

package audit

func Log(req *http.Request, res *http.Response) error {
	dump, err := httputil.DumpResponse(res, true)
	if err != nil {
		return err
	}

	buf := &bytes.Buffer{}
	buf.WriteString(fmt.Sprintf("%s %s HTTP/%d.%d\n", req.Method, req.URL.RequestURI(), req.ProtoMajor, req.ProtoMinor))
	buf.Write(dump)

	file, err := os.OpenFile(uuid.NewString() + ".log", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, os.FileMode(0600))
	if err != nil {
		return err
	}
	defer file.Close()

	if _, err := file.Write(buf.Bytes()); err != nil {
		return err
	}

	return nil
}

func Get(w http.ResponseWriter, r *http.Request) {
	req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, "https://api.fake.io/id-1", nil)
	if err != nil {
		w.WriteHeader(500)
		return
	}

	res, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		w.WriteHeader(500)
		return
	}
	defer res.Body.Close()
	dat, err := io.ReadAll(res.Body)
	if err != nil {
		w.WriteHeader(500) // Because you need to check whether the read succeeded before responding, you cannot achieve the minimal memory overhead of a streaming io.Copy.
		_ = Log(req, res, dat)
		return
	}
	_ = Log(req, res, dat)
	_, _ = w.Write(dat)
}

func Log(req *http.Request, res *http.Response, data []byte) error {
	dump, err := httputil.DumpResponse(res, false)
	if err != nil {
		return err
	}
	file, err := os.OpenFile(uuid.NewString()+".log", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, os.FileMode(0600))
	if err != nil {
		return err
	}
	defer file.Close()
	_, err = file.WriteString(fmt.Sprintf("%s %s HTTP/%d.%d\n", req.Method, req.URL.RequestURI(), req.ProtoMajor, req.ProtoMinor))
	if err != nil {
		return err
	}
	_, err = file.Write(dump)
	if err != nil {
		return err
	}
	_, err = file.WriteString("\r\n\r\n")
	if err != nil {
		return err
	}
	_, err = file.Write(data)
	if err != nil {
		return err
	}
	return nil
}

Because you have logic there to determine whether the read succeeded, you cannot achieve the minimum memory overhead of a streaming io.Copy.

From your code, just remove the defer here:

DumpResponse makes a copy of the body internally.

Both answers are miles away from the question.

I don’t think so. I think the reason @GonzaSaya suggested you avoid defer there was that DumpResponse creates a copy of your response body, so you don’t have the problem of the body being empty on the second read. Relevant code here:

https://cs.opensource.google/go/go/+/refs/tags/go1.23.3:src/net/http/httputil/dump.go;l=281
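
In practice that means a body dumped with body=true can still be read afterwards; a minimal sketch (error handling elided):

res, _ := http.Get("https://api.fake.io/id-1")
defer res.Body.Close()

// body=true: DumpResponse drains res.Body and replaces it with an
// in-memory copy, so it can be read again afterwards.
dump, _ := httputil.DumpResponse(res, true)

// Still readable thanks to that internal copy.
body, _ := io.ReadAll(res.Body)
fmt.Printf("dump: %d bytes, body: %d bytes\n", len(dump), len(body))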

So you could run that NOT deferred, and to simplify copying res.Body to w, instead of using io.ReadAll you can just use io.Copy like so:

_, err = io.Copy(w, res.Body)

So your pseudocode would become something like this (probably won’t compile; writing this in text editor here so tweak as needed):

func Get(w http.ResponseWriter, r *http.Request) {
	req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, "https://api.fake.io/id-1", nil)
	if err != nil {
		w.WriteHeader(500) // You should probably do something with err
		return
	}
	res, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		w.WriteHeader(500) // Also do something with err here
		return
	}
	// Log the request; reads from res but creates copy of res.Body
	// so we can read it again later.
	audit.Log(req, res)
	// Copy our copied res.Body to w
	_, err = io.Copy(w, res.Body)
	if err != nil {
		w.WriteHeader(500) // Also do something with err here
	}
	// You should probably also write headers, etc. on w here in the happy path.
	res.Body.Close()
}

audit.Log is purposely meant to run in the background without blocking the flow, hence it is deferred. I could have removed it myself as well, but that is not applicable.

Also, I am asking “What is the most memory-efficient way, please?”, not “How can I achieve this without creating a copy of the response?”.

For these two particular reasons, I consider the answers redundant.

No. Because you used io.Copy, the first write to w already sends a 200 OK status, so it doesn’t make sense to return a status code of 500 afterwards.

As I said in my answer, it solves both of your questions. If it isn’t clear why I wrote it that way, walk through the logic.
Maximum memory savings means sharing the underlying data to the greatest extent possible.

Sorry, I still don’t understand how it answers the question when I have said that removing defer is not an option.

Maximum memory savings means sharing the underlying data to the greatest extent possible.

Let’s start again. How do you achieve this statement without removing defer, please? If it is not achievable, please show me another example where defer is still in place.

Well, defer-ing things isn’t the only way to keep a debug task from blocking your web request. You could just spin up a goroutine and do the audit.Log work there. I often run side effects of web handlers in their own goroutines for exactly that reason.
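
For instance, a rough sketch (assuming Log takes the already-read bytes, so the goroutine only touches data the handler no longer writes to):

go func(data []byte) {
	// Fire-and-forget: the handler returns without waiting for the audit.
	if err := audit.Log(req, res, data); err != nil {
		log.Printf("audit log failed: %v", err)
	}
}(dat)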

Your original code is doing io.ReadAll(res.Body), which reads the entire response into memory. If you want to be able to handle large responses, you would need to stream res.Body into a buffer and do something with it. But generally speaking, in web servers we just limit the request size to sane defaults using http.MaxBytesReader and don’t worry about it unless it’s a handler where we expect a large response. And in that case, yes, you have to stream it.
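
For reference, capping an incoming body with http.MaxBytesReader looks roughly like this (the 1 MiB limit is just an example):

func handler(w http.ResponseWriter, r *http.Request) {
	// Reads beyond 1 MiB fail, and the server closes the connection.
	r.Body = http.MaxBytesReader(w, r.Body, 1<<20)
	// ... read r.Body as usual ...
}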

Are you expecting large responses and thus need to stream them? If so, check out this answer:

You could accomplish something like that but mix it with io.MultiWriter.
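
Roughly like this (a sketch only; the local file stands in for wherever the audit goes, and error handling is abbreviated):

defer res.Body.Close()

file, err := os.Create(uuid.NewString() + ".log")
if err != nil {
	// handle the error
}
defer file.Close()

// Stream the upstream body to the client and the audit file at the
// same time; the full body is never held in memory.
if _, err := io.Copy(io.MultiWriter(w, file), res.Body); err != nil {
	// Too late for a 500: the first successful write already sent 200.
	log.Printf("copy failed: %v", err)
}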

All this having been said, if you are not expecting large responses, it’s probably more hassle than it’s worth at this point. Have you profiled your app? Is it using too much memory? Do you know this is the problematic area of the code?


True. I copy/pasted that code, but you’re correct that returning a 500 after io.Copy is incorrect. I guess unless it fails before its first write and thus never writes the 200 status.

This is what I actually do behind the scenes:

  1. Send an HTTP request to a remote server.
  2. Do two things with the HTTP response coming from the remote server:
    1- Read it, parse it into a struct and give it to my client in the handler.
    2- Dump the whole response (URI, headers and body …) somewhere, because I am auditing all the responses:
    a) a TXT file, if the application is running in a local env.
    b) an S3 bucket, if the application is not running in a local env.

I just didn’t want (a) and (b) to delay my response to my client, because both sometimes take a long time. That’s why I used defer.

@Dean_Davidson So I guess what you are saying is:

  1. instead of using defer, use something like go audit.Log(...) and manage the DATA RACE that this currently creates.
  2. stream it rather than using io.ReadAll(res.Body). I guess you mean var buf bytes.Buffer followed by io.Copy(&buf, res.Body)?

By the way if you can think of any better solution, I am happy to listen. Thank you.

If I were asked to implement this feature, I would do something like this:

func Get(w http.ResponseWriter, r *http.Request) {
	req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, "https://api.fake.io/id-1", nil)
	if err != nil {
		w.WriteHeader(500)
		return
	}

	res, err := http.DefaultTransport.RoundTrip(req)
	if err != nil {
		w.WriteHeader(500)
		return
	}
	defer res.Body.Close()

	bs := new(bytes.Buffer)
	writer := io.MultiWriter(w, bs)
	_, _ = io.Copy(writer, res.Body) // the first write to w sends a 200 status

	// Plan A (recommended): flush the response to the client, then log synchronously.
	if flusher, ok := w.(http.Flusher); ok { // the MultiWriter itself is not a Flusher
		flusher.Flush()
	}
	_ = Log(req, res, bs)

	// Plan B (alternative to Plan A): log in a goroutine, or a goroutine pool.
	// With plain goroutines, a burst of requests can incur a large amount of
	// memory overhead.
	// go func() {
	// 	_ = Log(req, res, bs)
	// }()
}

func Log(req *http.Request, res *http.Response, bs *bytes.Buffer) error {
	dump, err := httputil.DumpResponse(res, false)
	if err != nil {
		return err
	}
	file, err := os.OpenFile(uuid.NewString()+".log", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, os.FileMode(0600))
	if err != nil {
		return err
	}
	defer file.Close()
	_, err = file.WriteString(fmt.Sprintf("%s %s HTTP/%d.%d\n", req.Method, req.URL.RequestURI(), req.ProtoMajor, req.ProtoMinor))
	if err != nil {
		return err
	}
	_, err = file.Write(dump)
	if err != nil {
		return err
	}
	_, err = file.WriteString("\r\n\r\n")
	if err != nil {
		return err
	}
	_, err = io.Copy(file, bs)
	if err != nil {
		return err
	}
	return nil
}

After going through all the examples/suggestions here, I am not sure how to apply them to the code below. The body can be any size.

package xhttp

import (
	"context"
	"io"
	"net/http"
)

type Client struct {
	Client http.RoundTripper
}

func (c Client) Request(ctx context.Context, met, url string, bdy io.Reader, hdrs map[string]string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, met, url, bdy)
	if err != nil {
		return nil, err
	}

	for k, v := range hdrs {
		req.Header.Add(k, v)
	}

	// go LogRequest()

	res, err := c.Client.RoundTrip(req)
	if err != nil {
		return nil, err
	}

	// go LogResponse()

	return res, nil
}

// LogRequest writes whole HTTP request to a file.
func LogRequest() {
	// GET /api/v1/postcodes?active=false HTTP/1.1
	// Host: 0.0.0.0:8080
	// Accept: */*
	// Accept-Encoding: gzip, deflate, br
	// Cache-Control: no-cache
	// Connection: keep-alive
	// Content-Length: 29
	// Content-Type: application/json
	// User-Agent: PostmanRuntime/7.42.0
	
	// {
	// 	"postcode": "SW1A0AA"
	// }
}

// LogResponse writes whole HTTP response to a file.
func LogResponse() {
	// GET /api/v1/postcodes?active=false HTTP/1.1
	// X-Request-Id: 1f532ad9-4062-46d3-9e2f-9fd5f3c74d9a
	// Content-Type: application/json
	//
	// Status: 200 OK
	// 
	// {
	//  "exists": true
	// }
}

1- Read it, parse it into a struct and give it to my client in the handler.

If you parse the whole response into a struct, it will be completely read into memory anyway. As @Dean_Davidson said, streaming will not provide any measurable benefit in your case. It is better to just use the byte array, which you have created anyway, and send a pointer to it to your audit method.

The best way is probably to read the request URL/method and so on synchronously (if you want, by using defer), but to do the actual write operation to S3 in a separate goroutine. This goroutine will not have a data race, since it only reads constant data (the byte array and the string you created from the request).

Putting the actual writing of the audit data into a separate goroutine is helpful, since the HTTP connection is only released when the original handler returns. When using only defer (and no extra goroutine), the handler still blocks and does not return from the call until the deferred method has finished. Thus your HTTP request and the underlying connection stay blocked until the audit log is written.
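
Concretely, that split could look like this (writeAudit here is a placeholder for your file/S3 write):

// Read the body exactly once, synchronously.
dat, err := io.ReadAll(res.Body)
res.Body.Close()
if err != nil {
	w.WriteHeader(500)
	return
}

// Capture everything the audit needs while still in the handler. Neither
// value is mutated afterwards, so the goroutine only reads constant data.
line := fmt.Sprintf("%s %s HTTP/%d.%d\n",
	req.Method, req.URL.RequestURI(), req.ProtoMajor, req.ProtoMinor)

go func() {
	// The expensive write (TXT file or S3) happens off the request path.
	if err := writeAudit(line, dat); err != nil {
		log.Printf("audit failed: %v", err)
	}
}()

_, _ = w.Write(dat)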

Good call. I missed that in the original question.

Yeah, though since it’s a slice, if you don’t pass a pointer you’re just passing the slice header, not the underlying array. So I don’t think you’d see a big difference between these two in terms of allocations/memory footprint:

// Passing a pointer to our slice header.
func DoSomething(b *[]byte)
// Passing slice header; but slice header contains pointer to underlying
// array, so array isn't copied.
func DoSomething(b []byte)

Anyway, I totally agree with you. Modify the code to read the response body only once. Then use a goroutine to do the expensive part (uploading to S3). Pull out the relevant parts of DumpResponse to avoid creating multiple copies of the response body. Call it a day.
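
Something like this illustrative helper would capture the status line and headers without touching the body bytes you already hold:

// dumpHeader serializes just the status line and the headers; the body
// is left alone, so no extra copy of it is ever made.
func dumpHeader(res *http.Response) ([]byte, error) {
	var buf bytes.Buffer
	fmt.Fprintf(&buf, "HTTP/%d.%d %s\r\n", res.ProtoMajor, res.ProtoMinor, res.Status)
	if err := res.Header.Write(&buf); err != nil { // headers in wire format
		return nil, err
	}
	buf.WriteString("\r\n")
	return buf.Bytes(), nil
}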


I took what you all said into account and ended up with the following, if you would like to review it just in case I misunderstood.