Help me untangle my API

I have written an API for mounting an S3 bucket here: https://github.com/VertebrateResequencing/wr/tree/develop/minfys

It’s currently “good enough”, but not “good”. How would you reorganise things to make things cleaner and nicer?

In particular:

  1. Most of the code specific to connecting to the remote system is in remote.go which defines remote struct and methods, but it would be nice if these remote objects weren’t hard-coded for this particular remote system, but instead someone could easily “plug in” code for a different remote system. (remote is used by the main MinFys object to do its work)

  2. file.go ends up using constructs like f.r.fs.mutex.Lock() which seems quite bad. fs is a MinFys which creates and stores on itself a remote ®, which in turn keeps a reference back to fs. r then creates a CachedFile (f), which keeps a reference back to r. Then I’m accessing properties and calling methods on the grand-parent of f. How can this be done without duplicating method code or unnecessarily duplicating and coping property values down the hierarchy?

I have only taken a brief look at this. Most of my time was spent looking at an ERD generated with https://github.com/gmarik/go-erd.

The components that are problematic are: MinFys, remote, CachedFile, and S3File. To untangle the mess, I think they should only know about each other as implementations of pathfs.FileSystem and nodefs.File. Each implementation would then focus on a single concern. For example:

  • MuxFS would multiplex between many pathfs.FileSystems
  • S3FS would provide direct, immediate access to the contents of its configured bucket (i.e., no caching)
  • CacheFS would provided cached access to a pathfs.FileSystem such as S3FS

None of these implementation should have access to its parent. For example, it should not be possible to get from an S3FS to a MuxFS. The separation of concerns should limit duplicating method core or property values because the code and properties belong to the only thing that needs to deal with them.

The nodefs.File implementations should follow a similar pattern.

Separating concerns like this has you implementing more FileSystems and Files, but each implementation is simpler.

Thanks for the suggestions. I’ll need some time to figure out exactly how to implement as you suggest. I’m unsure as to how to resolve this though:

My CachedFile needs to record what it has done in some global variable (because multiple CachedFile objects for the same file can get created and used by a user simultaneously or sequentially, but I must coordinate their actions to avoid downloading the same data more than once - see my usage of f.r.fs.downloaded). I currently store this information in my MinFys, hence requiring access to the grandparent which you say (and I agree) shouldn’t be done. But how else can I do this?

downloaded is not a term the cache even needs. The cache should not care anything about how the cached FS works, but just have a strategy for caching.

The cache itself coordinates access to cached things, using the underlying FS and Files as needed. I expect that CacheFS and CacheFile interact with a shared cache. The cache then interacts with the underlying FS.

Trying to understand your suggestion… can you offer a little text diagram to clarify? Is it something like:

MuxFS -> CacheFS -> cache <-> CacheFile
cache -> S3FS -> S3File

(I don’t think I got that right)

That’s pretty much what I’m thinking. Just make sure that the separate parts (Mux, Cache, S3) only interact through the pathfs.FileSystem and nodes.File interfaces.

I would start with MuxFS and S3FS and then figure out CacheFS (since it is the most tricky).

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.