Hi, all,
Background:
I’m working on a program I call “standpipe” which takes data from standard input, writes it into a temporary file, and then feeds that data to standard output. The idea is to let the source program (writing to stdout) write as fast as it can even if the target program (reading from stdin) can’t keep up. My (somewhat contrived) use case: when using memory-intensive compression algorithms like LZMA2 in xz and/or 7z and piping the data to some target over the network, I’d like xz/7z to complete as fast as possible to release that memory, even if my Internet connection can’t keep up. Of course there are alternatives I could take, but as a learning experience, I’d like to figure out why what I’m trying isn’t working.
Details:
My repository is here: https://github.com/skillian/standpipe
After cloning/downloading, changing to the sp directory, and running go build, the usage for the sp command is:
usage: sp [ -f CACHEFILE ] [ --log-level LOGLEVEL ] [ -s PAGESIZE ]
standpipe to cache output from one command before piping it into a slower command.
optional arguments:
-f, --cache-file
Custom cache file name. If not used, a temp file is created
instead.
--log-level Specify a custom logging level (useful for debugging).
-s, --page-size
Page size within the standpipe file. Pages are updated at random
locations within the standpipe file, so to reduce the amount of
seeking, this value should be as large as possible. Two pages are
always kept in memory at a time (one for reading and one for
writing), so this value is a balancing act between reduced seeks
and memory usage.
Here’s the set of commands I’m using to alpha-test the program before I write my test cases (I’m one of those people who writes their tests afterward):
dd if=/dev/urandom bs=32768 count=8192 of=~/test.dat
cat ~/test.dat | sp -f ~/test.sp -s 1048576 | gzip -c -9 > ~/test.dat.gz
When I compile and run my program, the standpipe file header is generated and the cache file fills with data quickly. (I have not yet verified that the data being written is valid, or whether the same buffer is being written repeatedly; that will be one of my next steps.)
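Once reads start working, one way I plan to check that the data survives the round trip intact is to compare checksums of the original file and the decompressed output (paths match the commands above):

```shell
# Round-trip check: hash the original input, then decompress the
# pipeline's output and hash that. Identical digests mean sp passed
# the bytes through unmodified.
sha256sum ~/test.dat
gzip -dc ~/test.dat.gz | sha256sum
```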
Issue:
My issue is that no data ever seems to be read from the cache file and written to stdout. I thought I was using sync.Cond correctly, but my guess right now is that I’m missing something crucial in the locking/signalling. I would consider using a chan instead of a slice of offsets, except that:
- I don’t know how long V1Pipe.offs can grow (10, 100, 1,000, 10,000, 100,000, etc.).
- I need to be able to flush what’s in the pipe when the program is interrupted, so I’d need to make sure that the V1Pipe.Close function gets all of the offsets in V1Pipe.offs, and that another goroutine listening on a conceptual V1Pipe.offs chan doesn’t steal one while we’re closing.
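For reference, this is the sync.Cond producer/consumer pattern I believe applies here (a minimal sketch with hypothetical names, not my actual code): the reader must re-check the condition in a for-loop while holding the mutex, the writer signals after appending, and Close broadcasts so waiters can observe the closed flag while remaining offsets are still drained.

```go
package main

import (
	"fmt"
	"sync"
)

// pipe is a stripped-down stand-in for V1Pipe: a FIFO of page offsets
// guarded by a mutex, with a sync.Cond to wake the reader when offsets
// arrive and a closed flag so Close can still be drained afterward.
type pipe struct {
	mu     sync.Mutex
	cond   *sync.Cond
	offs   []int64
	closed bool
}

func newPipe() *pipe {
	p := &pipe{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// put appends an offset and wakes one waiting reader.
func (p *pipe) put(off int64) {
	p.mu.Lock()
	p.offs = append(p.offs, off)
	p.mu.Unlock()
	p.cond.Signal()
}

// get blocks until an offset is available or the pipe is closed and
// drained. The for-loop (not an if) around Wait is essential: Wait
// releases the mutex, so the condition must be rechecked on wakeup.
func (p *pipe) get() (int64, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.offs) == 0 && !p.closed {
		p.cond.Wait()
	}
	if len(p.offs) == 0 {
		return 0, false // closed and fully drained
	}
	off := p.offs[0]
	p.offs = p.offs[1:]
	return off, true
}

// close marks the pipe closed and wakes every waiter; offsets already
// queued can still be read, so nothing is lost on shutdown.
func (p *pipe) close() {
	p.mu.Lock()
	p.closed = true
	p.mu.Unlock()
	p.cond.Broadcast()
}

func main() {
	p := newPipe()
	done := make(chan struct{})
	go func() {
		defer close(done)
		for {
			off, ok := p.get()
			if !ok {
				return
			}
			fmt.Println("read offset", off)
		}
	}()
	for i := int64(0); i < 3; i++ {
		p.put(i * 4096)
	}
	p.close()
	<-done
}
```

Two pitfalls this pattern avoids: calling Wait under an if instead of a for (spurious or stale wakeups then return an empty queue), and using Signal instead of Broadcast in close (Signal wakes only one waiter, so others can block forever).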
Question:
Can I bother any of you gophers to review what I have and see whether I’m doing something wrong with goroutines, locking, or anything else? Even though I admit I don’t strictly need this to work, I’d like to understand why what I have doesn’t work and what I need to change to fix it; understanding that will make me a better programmer.
Suggestions to improve my programming “style” are also welcome! Thanks for your consideration!