I’m working on a program I call “standpipe” which takes data from standard input, writes it into a temporary file and then feeds that data into standard output. The idea is to allow the source program (writing to stdout) to write as fast as it can even if the target program (reading from stdin) can’t keep up. My (somewhat contrived) use case is that when using memory-intensive compression algorithms like LZMA2 in 7z and piping the data to some target over the network, I’d like 7z to complete as fast as possible to release that memory, even if my Internet can’t keep up. Of course there are alternatives I could take, but as a learning experience, I’d like to figure out why what I’m trying isn’t working.
My repository is here: https://github.com/skillian/standpipe
After cloning/downloading, changing to the sp directory and running go build, the usage for the sp command is:
    usage: sp [ -f CACHEFILE ] [ --log-level LOGLEVEL ] [ -s PAGESIZE ]

    standpipe to cache output from one command before piping it into a slower
    command.

    optional arguments:
      -f, --cache-file  Custom cache file name. If not used, a temp file is
                        created instead.
      --log-level       Specify a custom logging level (useful for debugging).
      -s, --page-size   Page size within the standpipe file. Pages are updated
                        in random locations within the standpipe file so to
                        reduce the amount of seeking, this value should be as
                        large as possible. There are two pages always kept in
                        memory at a time: One for reading and one for writing,
                        so this value is a balancing act between reduced seeks
                        and memory usage
Here’s a set of commands I’m using to alpha test the program before I write my test cases (I’m one of those people that write my tests afterward):
    dd if=/dev/urandom bs=32768 count=8192 of=~/test.dat
    cat ~/test.dat | sp -f ~/test.sp -s 1048576 | gzip -c -9 > ~/test.dat.gz
When I compile and run my program, the standpipe file header is generated and the cache file is loaded up with data fast (I have not yet determined if the data being written is valid or if the same buffer is being written, etc. That will be one of my next steps to test).
My issue is that no data seems to ever be read from the cache file to be written to stdout. I thought I was using sync.Cond correctly, but my guess right now is I’m missing something crucial with the locking/signalling. I would consider using a chan instead of a slice of offsets except that:
- I don’t know how long V1Pipe.offs can grow (10, 100, 1000, 10,000, 100,000, etc.).
- I need to be able to flush what’s in the pipe when the program is interrupted, so I’d need to make sure that the V1Pipe.Close function gets all of the offsets in V1Pipe.offs and another goroutine listening on a conceptual chan doesn’t steal one while we’re closing.
Can I bother any of you gophers to review what I have and see if I’m doing something wrong with goroutines/locking/something else that I should try to fix? Even though I admit I don’t need this to work, I’d like to understand why what I have doesn’t work and what I have to change to make it work to make me a better programmer.
Suggestions to improve my programming “style” are also welcome! Thanks for your consideration!