Hi everyone, I’m new to Go development and I have a small question for you today.
I need to develop a small package that can store files in an archive. Once the files are written to the archive and the main program closes, the next time I run the program I will need to add more data to the archive, not only as new files but also by appending new data to files already in the archive.
My first thought was to go with the “archive/zip” library, only to find out that it supports neither adding new files to an existing zip nor appending content to existing files within the zip. There is even a GitHub issue about this with some ideas/suggestions (https://github.com/golang/go/issues/15626). The same seems to apply to tar and gzip.
The question is: is there a best practice or recommended way to achieve the above? Should I extend the “archive/zip” to handle the cases I need?
As far as I understand the mechanics of compressed archives, you need to re-compress every time you modify or delete payloads if you want effective compression.
Hence, that’s why append/delete is considered a costly “good-to-have” feature. What you can do is design wrapper functions on top of the standard library (if you really need them). The wrapper should follow this sequence: read the existing archive, write its entries (plus your modifications and any new files) into a fresh archive, then replace the old archive with the new one.
Thanks for the answer. I feel like I’m approaching this all the wrong way and that a compressed archive might not be the right tool for it. But since I will have many different files, I thought it might be kinda nice to have everything inside a “container” instead of having everything lying around in a folder.
It depends on the requirements and the nature of the payloads. If the total uncompressed size is small (<5GB) and modifications are infrequent, you can use your method.
Otherwise, you might need to consider other approaches, like splitting the archive into multiple small archives (reorganization). It’s more about strategizing your approach.
It boils down to:
What are the average and edge-case file sizes for each payload change?
How frequently will you change the payload?
How will your customers access the payload, given your compression choice?
That’s a different story. On Linux (not sure about Mac/Windows), we can mount an encrypted storage drive as a “container” to store everything inside. This method only needs os/exec for a one-time mount.
This method makes it even easier to modify payloads, without needing to pack them into a compressed archive. And if archiving is still a requirement, you can easily archive that mounted directory at will.
The costs are:
it is not platform-independent
it requires you to learn a few things like RAID 1 (optional), cryptsetup, and LVM.
I believe Windows already has a solution like this. (I haven’t been a Windows user for >5 years, so I can’t say for sure.)
What I’m doing is saving HTTP requests and responses coming into a proxy. I am not considering SQLite because I don’t really need a relational database, and it was suggested I consider a simple JSON file to which I would append the requests and responses. The thing is that this seems to create a large file very easily: a history of roughly 244 items is already a couple of MB.
It boils down to how your customer queries the information to update their local “database” at a particular time. This is a “database” and “design” question.
For example, for large tracker data like an IoT sensor dump per second, I would organize the data into chained archives keyed by 30-minute timestamps (similar to a logging mechanism, or Git in software development). Then on the client side, there is a function that simply reconstructs these archives into a cacheable database after receiving them one at a time.
This is done for 2 reasons:
Small archives are a lot easier to transmit and distribute over a network.
Clients can consume the data one stage at a time (e.g. with or without a particular update).
Large archive files will at some point be sliced into multiple small fragments for effective network transmission anyway, so maintaining one big file is redundant effort. Also, in case you need to optimize or secure the “database”, a large file means a lot of wait time on a consumer-level laptop.
If you really do need to go with a single file and grow beyond 5GB (assuming a peak of 5TB), I would suggest you use some kind of transactional store, such as a NoSQL database.