Deadlocking (maybe) when reading and writing to same *os.File


(Sean Killian) #1

Hi, all,

Background:

I’m working on a program I call “standpipe” which takes data from standard input, writes it into a temporary file and then feeds that data into standard output. The idea is to allow the source program (writing to stdout) to write as fast as it can even if the target program (reading from stdin) can’t keep up. My (somewhat contrived) use case is that when using memory-intensive compression algorithms like LZMA2 in xz and/or 7z and piping the data to some target over the network, I’d like xz/7z to complete as fast as possible to release that memory, even if my Internet can’t keep up. Of course there are alternatives I could take, but as a learning experience, I’d like to figure out why what I’m trying isn’t working.

Details:

My repository is here: https://github.com/skillian/standpipe

After cloning or downloading, change to the sp directory and run go build. The usage for the sp command is:

usage: sp [ -f CACHEFILE ] [ --log-level LOGLEVEL ] [ -s PAGESIZE ]

standpipe to cache output from one command before piping it into a slower command.

optional arguments:
  -f, --cache-file
                Custom cache file name.  If not used, a temp file is created
                instead.
  --log-level   Specify a custom logging level (useful for debugging).
  -s, --page-size
                Page size within the standpipe file. Pages are updated in random
                locations within the standpipe file so to reduce the amount of
                seeking, this value should be as large as possible.  There are
                two pages always kept in memory at a time:  One for reading and
                one for writing, so this value is a balancing act between
                reduced seeks and memory usage

Here’s the set of commands I’m using to alpha test the program before I write my test cases (I’m one of those people who writes tests afterward):

dd if=/dev/urandom bs=32768 count=8192 of=~/test.dat
cat ~/test.dat | sp -f ~/test.sp -s 1048576 | gzip -c -9 > ~/test.dat.gz

When I compile and run my program, the standpipe file header is generated and the cache file is loaded up with data fast (I have not yet determined if the data being written is valid or if the same buffer is being written, etc. That will be one of my next steps to test).

Issue:

My issue is that no data seems to ever be read from the cache file to be written to stdout. I thought I was using sync.Cond correctly, but my guess right now is I’m missing something crucial with the locking/signalling. I would consider using a chan instead of a slice of offsets except that:

  1. I don’t know how long V1Pipe.offs can grow (10, 100, 1000, 10,000, 100,000 entries, etc.), so I can’t pick a channel buffer size.
  2. I need to be able to flush what’s in the pipe when the program is interrupted, so I’d need to make sure that the V1Pipe.Close function gets all of the offsets in V1Pipe.offs and another goroutine listening on a conceptual V1Pipe.offs chan doesn’t steal one while we’re closing.
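
For anyone following along, here is a minimal sketch of the pattern described above: an unbounded slice of offsets guarded by a sync.Mutex and sync.Cond, with a Close that takes all remaining offsets under the lock so no reader can grab one during shutdown. This is not the code from the standpipe repository; offsetQueue, Push, Pop and Close are hypothetical names chosen for illustration. The key detail is that cond.Wait is called inside a loop that rechecks the condition.

```go
package main

import (
	"fmt"
	"sync"
)

// offsetQueue is a hypothetical stand-in for V1Pipe.offs: an unbounded
// FIFO of file offsets protected by a mutex and a condition variable.
type offsetQueue struct {
	mu     sync.Mutex
	cond   *sync.Cond
	offs   []int64
	closed bool
}

func newOffsetQueue() *offsetQueue {
	q := &offsetQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Push appends an offset and wakes one waiting reader.
func (q *offsetQueue) Push(off int64) {
	q.mu.Lock()
	q.offs = append(q.offs, off)
	q.mu.Unlock()
	q.cond.Signal()
}

// Pop blocks until an offset is available or the queue is closed.
// ok is false once the queue is closed and empty.
func (q *offsetQueue) Pop() (off int64, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	// Wait must sit in a loop: the condition is rechecked after every wakeup.
	for len(q.offs) == 0 && !q.closed {
		q.cond.Wait()
	}
	if len(q.offs) == 0 {
		return 0, false
	}
	off = q.offs[0]
	q.offs = q.offs[1:]
	return off, true
}

// Close marks the queue closed, takes every remaining offset while holding
// the lock, and wakes all blocked readers so they can return.
func (q *offsetQueue) Close() []int64 {
	q.mu.Lock()
	rest := q.offs
	q.offs = nil
	q.closed = true
	q.mu.Unlock()
	q.cond.Broadcast()
	return rest
}

func main() {
	q := newOffsetQueue()
	done := make(chan int64)
	go func() {
		off, _ := q.Pop() // blocks until the Push below
		done <- off
	}()
	q.Push(4096)
	fmt.Println("reader got", <-done)
	q.Push(8192)
	q.Push(12288)
	fmt.Println("drained on close:", q.Close())
}
```

Because Close swaps the slice out while holding the mutex, a concurrent Pop either gets its offset before Close runs or sees an empty, closed queue afterward. That covers the second concern without needing a chan.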

Question:

Can I bother any of you gophers to review what I have and see if I’m doing something wrong with goroutines/locking/something else that I should try to fix? Even though I admit I don’t need this to work, I’d like to understand why what I have doesn’t work and what I have to change to make it work to make me a better programmer.

Suggestions to improve my programming “style” are also welcome! Thanks for your consideration!


(Sean Killian) #2

I did some significant refactoring and I’m not deadlocking any more. My issue seemed to be in my own bytes.Buffer-like implementation, which never grows so that it matches the fixed-size blocks within the standpipe cache file. I was returning a nil error and 0 bytes when the buffer was full, so callers had no way to tell that they needed to move on to another buffer. I refactored so that I return a “buffer full” error when writing into a full buffer and a “buffer empty” error when reading from an empty one. The standpipe file’s Read and Write functions then handle switching buffers as I empty and fill the underlying buffers, respectively.
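
In case it helps anyone else, here is a rough sketch of the idea, not the actual code in the repository. fixedBuffer, errBufferFull and errBufferEmpty are names made up for this example. It follows the io.Writer contract, which requires a non-nil error whenever Write returns n < len(p), and it avoids returning 0 bytes with a nil error from Read, which the io.Reader documentation discourages.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the real program may name these differently.
var (
	errBufferFull  = errors.New("buffer full")
	errBufferEmpty = errors.New("buffer empty")
)

// fixedBuffer is a fixed-capacity buffer: unlike bytes.Buffer it never grows.
type fixedBuffer struct {
	data []byte // backing storage; len(data) is the capacity
	r, w int    // read and write positions, r <= w <= len(data)
}

func newFixedBuffer(size int) *fixedBuffer {
	return &fixedBuffer{data: make([]byte, size)}
}

// Write copies as much of p as fits. If not all of p fits, it reports
// errBufferFull instead of silently returning a short count with a nil error.
func (b *fixedBuffer) Write(p []byte) (int, error) {
	n := copy(b.data[b.w:], p)
	b.w += n
	if n < len(p) {
		return n, errBufferFull
	}
	return n, nil
}

// Read copies unread bytes into p. It reports errBufferEmpty when nothing
// is left to read, so the caller knows to switch to the next buffer.
func (b *fixedBuffer) Read(p []byte) (int, error) {
	if b.r == b.w {
		return 0, errBufferEmpty
	}
	n := copy(p, b.data[b.r:b.w])
	b.r += n
	return n, nil
}

func main() {
	b := newFixedBuffer(4)
	n, err := b.Write([]byte("hello"))
	fmt.Println(n, err) // 4 buffer full

	p := make([]byte, 8)
	n, err = b.Read(p)
	fmt.Println(string(p[:n]), err) // hell <nil>

	_, err = b.Read(p)
	fmt.Println(err) // buffer empty
}
```

With distinct errors, the caller can check for errBufferFull or errBufferEmpty and swap in the next page rather than retrying the same call forever.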

It still doesn’t work yet but I’m not (as) stuck any more.

