Decompress gzip stream of unknown length

noone · April 22, 2017, 3:48pm

I think the easiest way of explaining is pseudo code:

ungz_stream = ?                           // some kind of gz decompress object
tmp := make([]byte, 64)
plain_buf := make([]byte, 1024)
for {
    l, _ = con.Read(tmp)                  // receive a chunk of gzipped data
    ungz_stream.Write(tmp[:l])            // add the compressed gz chunk
    l, _ = ungz_stream.ReadPlain(plain_buf)   // read all available plain data into the buffer

    plain_buf[:l]   // now I would like to have the uncompressed plain bytes in my plain buffer
}

My only requirements are that I can continuously add data without knowing the final length and
that I would like to be able to keep as much plain data as possible in case of a connection loss and therefore damaged gzip data.

I don’t really know how gzip works internally. The compressing part is currently using C# GZipOutputStream().

I already experimented a bunch with gzip.Reader() but it seems like it requires a stream that can be read to the end.
(There are a bunch of examples out there with file objects.)

calmh · April 22, 2017, 11:51pm

Typically, you’d

var conn net.Conn // this is where your data comes from. Could be any io.Reader
// ...
gr, err := gzip.NewReader(conn) // this step expects a gzip header at the head of conn
if err != nil {
    // there was a problem with the header
}

buf := make([]byte, 1024)
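for {
    n, err := gr.Read(buf)
    // you got n bytes of uncompressed data, and/or an error;
    // use buf[:n] before looking at err
    if err != nil {
        break // io.EOF at the end of the stream, anything else is a failure
    }
}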

What problem or error are you running into? Presumably the C# side writes gzip chunks, which should be fine as I understand it.

dfc · April 23, 2017, 12:29am

Use io.Copy with an io.MultiWriter consisting of your gz writer for the compressed content and a bytes.Buffer for the plain uncompressed output.

For extra safety you should wrap the source reader in an io.LimitReader to ensure your program is not sabotaged by large inputs.

noone · April 23, 2017, 11:04am

The problem is that I manually decrypt the data in between. So my actual gzip data doesn’t come directly from the convenient net.Conn reader but from a cipher.BlockMode’s CryptBlocks. I left this part out to not overcomplicate the example. I implemented the whole encryption part manually because I have to work with an old .NET version in which crypto libraries practically don’t exist, so it’s nice to have full control over the protocol.

calmh · April 23, 2017, 11:08am

So you need something in between. A bytes.Buffer could work, where you Write() the decrypted bytes and the gzip.Reader reads them.

noone · April 23, 2017, 11:11am

The problem is that the gzip.Reader expects the passed io.Reader to already hold data and immediately starts reading and therefore immediately encounters EOF and finalizes itself. I need some kind of io.Reader that doesn’t return EOF or an error if no data is available. The docs discourage reader behaviour like that but don’t strictly prohibit it. And I’ve just seen that you even incorporated that header problem in your first example, sorry.

calmh · April 23, 2017, 11:22am

Perhaps you can write something around an io.Pipe to connect reads from the gzip.Reader with reads from the net.Conn and/or calls to CryptBlocks. You’ll need to buffer the data in between, as the gzip.Reader will likely not read a full block at a time. Alternatively, use some sort of framing on the wire, so you have a header saying “here comes a 123456-byte gzip chunk” and then you know what to expect.

noone · April 23, 2017, 11:29am

In the end I just hoped I wouldn’t have to manually control the gzip stream input somehow inside my socket reading/decrypting code. I could just implement everything in a way that the reader gets instantiated after I have 11 bytes and gzip.Reader.Read only gets called if data is available. The whole thing is a big hassle and results in really dirty code that doesn’t adapt well and is overly complicated. I just hoped that it would be possible to wrap the gzip reader in something stream-aware, but I guess not. Many thanks for the help.

noone · April 25, 2017, 3:13pm

I forgot to post my actual solution that seems to work fine so far.
It is not very pretty and there is a lot missing but it gets the job done for me.

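type GzipStream struct {
	buf  *bytes.Buffer // compressed bytes written but not yet consumed
	r    *gzip.Reader  // created lazily, once the gzip header has arrived
	init bool
}

func NewGzipStream() *GzipStream {
	gzs := new(GzipStream)
	gzs.buf = new(bytes.Buffer)
	return gzs
}

// Write appends a chunk of compressed data.
func (g *GzipStream) Write(b []byte) (int, error) {
	return g.buf.Write(b)
}

// Read returns (0, nil) while no compressed input is pending.
// Caveat: if the buffer runs empty in the middle of a compressed block,
// g.r.Read reports io.ErrUnexpectedEOF, and the gzip.Reader keeps
// returning that error on later calls.
func (g *GzipStream) Read(b []byte) (int, error) {
	if !g.init {
		if g.buf.Len() < 11 {
			return 0, nil
		}
		r, err := gzip.NewReader(g.buf)
		if err != nil {
			return 0, err // bad or incomplete gzip header
		}
		g.r = r
		g.init = true
	}
	if g.buf.Len() == 0 {
		return 0, nil
	}
	return g.r.Read(b)
}

func (g *GzipStream) Close() error {
	if g.r == nil {
		return nil // no header was ever read, so there is nothing to close
	}
	return g.r.Close()
}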

Improvements are always welcome.

dfc · April 25, 2017, 11:09pm

Your solution looks quite complicated, especially g.init, that’s a code smell to me. From looking at the code your requirements are:

Write gzip’d data to an io.Writer; read uncompressed data from an io.Reader

Where does the gzip’d data being written to the io.Writer come from? I think this can probably be accomplished more cleanly with io.Copy.

Tamas_Gulacsi · April 26, 2017, 5:18am

Use io.Pipe to turn reading into writing.
Write in your decipherer, and wrap the io.PipeReader in gzip.NewReader.
That’s all.
