Born again newbie question on binary file reading

Hello, I should explain that by “born again newbie”, I’m referring to the fact that I studied Go quite extensively a few years ago, following the Todd McLeod courses. Since then I have been devoting my time to electronic music production and have evidently lost a lot of my Golang chops. Now, through my music, a situation has arisen where I need to use my former Go powers, with some help from you guys!

I have a file format which is a mixture of binary and text. The text looks as if it can be converted to a dictionary, with a series of keys such as “comment”, “author” etc. The text appears to be written as Pascal-style strings, where the text is preceded by a byte (or more) indicating the length of the string.
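As a starting point, the simplest reading of “Pascal-style string” is a single length byte followed by that many bytes of text. The sketch below assumes exactly that one-byte prefix; later in this thread the length at 0x7f turns out to be a 4-byte big-endian integer, so treat this as the simplest case only. `readPascalString` and the sample bytes are made up for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// readPascalString reads a string whose length is given by the single byte at
// data[pos] (assumed one-byte prefix, so at most 255 bytes of text). It returns
// the string and the position of the first byte after it.
func readPascalString(data []byte, pos int) (string, int, error) {
	if pos < 0 || pos >= len(data) {
		return "", pos, errors.New("length byte out of range")
	}
	n := int(data[pos])
	start := pos + 1
	end := start + n
	if end > len(data) {
		return "", pos, errors.New("string runs past end of data")
	}
	return string(data[start:end]), end, nil
}

func main() {
	// Made-up sample: length 7 + "comment", then length 2 + "hi".
	data := []byte{7, 'c', 'o', 'm', 'm', 'e', 'n', 't', 2, 'h', 'i'}
	pos := 0
	for pos < len(data) {
		s, next, err := readPascalString(data, pos)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%q\n", s)
		pos = next // continue right after the string just read
	}
}
```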

The position of the text marked “comment” at 0x83 in the attached picture seems to be constant across all files of this type (note the length byte at 0x82: I wonder whether it is part of a larger number that also covers the previous 3 bytes, making it a 32-bit length). However, the position of each subsequent piece of text can vary according to the content. All the files have the same set of keys.

All the text I am interested in extracting is contained in the first 0x0200 bytes. Ideally, I’d like code that extracts the key/value pairs into a dictionary structure and prints it out.

I’m not looking for someone to provide me with an off-the-shelf solution. Instead, if anyone is willing to spend the time giving me a series of guiding prods to lead me in the right direction, I would be very grateful!

I’ve managed to get a file into a slice using ioutil.ReadFile but am unsure how to continue. I suspect I’m going to have to use different interfaces to allow me to read the numeric data as well as the text data.

So you have successfully read the file into a byte slice.
Now you want to decode that slice into some sort of data structure that holds the data in a meaningful form.

Maybe start by defining this structure: what format do you want the data presented in?
Then you can start writing code which steps through the byte slice and populates this structure.
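Following that advice, a destination type could be as simple as a struct wrapping a map of key/value strings, filled by a function that steps through the slice with an index. This is a sketch under loud assumptions: `Preset` and `decodePreset` are hypothetical names, every key and value is taken to be a 4-byte big-endian length plus text, and the numeric fields mixed into the real header are not handled, so a real decoder would need to skip or decode those too.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// Preset is a hypothetical destination type for the decoded header.
type Preset struct {
	Fields map[string]string
}

// decodePreset walks data from pos, reading alternating key and value strings
// until the data runs out. Each string is assumed to be a 4-byte big-endian
// length followed by that many bytes of text.
func decodePreset(data []byte, pos int) (*Preset, error) {
	p := &Preset{Fields: make(map[string]string)}
	next := func() (string, error) {
		if pos+4 > len(data) {
			return "", errors.New("truncated length")
		}
		n := int(binary.BigEndian.Uint32(data[pos : pos+4]))
		pos += 4
		if n < 0 || pos+n > len(data) {
			return "", errors.New("truncated string")
		}
		s := string(data[pos : pos+n])
		pos += n
		return s, nil
	}
	for pos < len(data) {
		key, err := next()
		if err != nil {
			return nil, err
		}
		value, err := next()
		if err != nil {
			return nil, err
		}
		p.Fields[key] = value
	}
	return p, nil
}

func main() {
	// Made-up sample: "author" -> "me", "comment" -> "hi".
	data := []byte{
		0, 0, 0, 6, 'a', 'u', 't', 'h', 'o', 'r',
		0, 0, 0, 2, 'm', 'e',
		0, 0, 0, 7, 'c', 'o', 'm', 'm', 'e', 'n', 't',
		0, 0, 0, 2, 'h', 'i',
	}
	p, err := decodePreset(data, 0)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	keys := make([]string, 0, len(p.Fields))
	for k := range p.Fields {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Printf("%s: %s\n", k, p.Fields[k])
	}
}
```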

Yes! Thanks @amnon! Even though I am not sure yet how they will be populated, there is no reason why I cannot start defining and coding the data structures that will store the extracted information. From a psychological point of view this will definitely feel like progress, and of course, in the process of doing it, the answer may jump out of the depths of my subconscious and surprise me :wink:

You could read chunks of data from specific positions in the file with (*os.File).ReadAt, e.g. something like this:

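package main
import (
    "fmt"
    "log"
    "os"
)

// readFromFile reads size bytes from file, starting at byte offset.
func readFromFile(file *os.File, offset, size int) ([]byte, error) {
    res := make([]byte, size)
    if _, err := file.ReadAt(res, int64(offset)); err != nil {
        return nil, err
    }
    return res, nil
}

func main() {
    f, err := os.Open("./dataFile.dat")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    chunk, err := readFromFile(f, 64, 20) // read 20 bytes in the file from byte 64
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("% x\n", chunk) // print the bytes in hex
}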

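…then, once you have a small chunk of data, you could deserialize the bytes into a struct with something like this, provided your struct’s fields match, in order and size, exactly what the chunk contains (binary.Read fills the fields one after another with no padding, so they must all be fixed-size types):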

var target someStructType // placeholder: a struct whose fields match the chunk's layout
buf := bytes.NewReader(chunk)
err := binary.Read(buf, binary.LittleEndian, &target)
if err != nil {
    fmt.Println("binary.Read failed:", err)
}

The key to making this idea work is understanding the structure of the data in the file(s) well enough to map chunks of file bytes accurately onto structs.

I don’t know whether the above will compile btw because I just typed it in here… but it should be close enough to give you some ideas how you could go about this :wink:

Hi @ukhobo, thanks for the reply! I’ll check out your code tomorrow. I’ve got a pretty good idea of the structure of the file: which bits are numeric and which bits are text. Methods like file.ReadAt certainly sound as if they will be useful.

I’m making some progress. I’ve been concentrating on non-coding issues like reacquainting myself with git and GitHub, and structuring the code into cmd and output packages. One question has sprung up: I’ve successfully created a go.mod, but I do not seem to have an accompanying go.sum. At what point is the go.sum created?

The repo is at https://github.com/carlca/gowig by the way :wink:

Hi @ukhobo, I’m pleased to report some progress, with your help. This is my code…

	streamPos := 0x7f
	chunk, err := readFromFile(f, streamPos, 4) // read 4 bytes in the file from byte 0x7f
	if err != nil {
		log.Fatal(err)
	}

	var size int32
	buf := bytes.NewReader(chunk)
	err = binary.Read(buf, binary.BigEndian, &size)
	if err != nil {
		fmt.Println("binary.Read failed:", err)
	}
	fmt.Println(size)

I’m grabbing 4 bytes of data (0, 0, 0 and 7) into chunk, which is then wrapped in buf. I had to change size from an int64 to an int32 to match the 4-byte length of buf (I was getting unexpected EOFs otherwise, since an int64 needs 8 bytes). Finally, I had to change LittleEndian to BigEndian. What gave this away was that size was being output as 117440512, aka 0x07000000! Buoyed by this success, I’m going to call it a night and start afresh tomorrow. One thing I learnt in my former profession as a developer was that at this time of night, it’s best to quit while you’re ahead :wink:

Thanks for your help so far. This is just the kind of help I was hoping for :grinning:

Good progress, it’s nice to hear that my bug-ridden idea has been somewhat useful (so far) :blush:

On the go.mod / go.sum front: go.mod declares your module path, the Go version and the modules your code depends on, while go.sum contains cryptographic hashes of those external modules. If at some point you add an import of an external package to your code, you should then see go.mod gain a require line for that module and go.sum populated with its hashes (older Go versions did this automatically on go build; since Go 1.16 you run go get or go mod tidy to update them). If you don’t import any external packages, go.mod will contain little more than the module line and the go directive, and go.sum won’t exist because it isn’t needed.

I’ve made some more progress. Here is part of my latest code…

func ProcessPreset(filename string) error {
	f, err := os.Open(filename)
	if err != nil {
		return err
	}
	defer f.Close()

	var streamPos int32 = 0x7f
	var size int32
	var text string

	if streamPos, size, err = readIntChunk(f, streamPos); err != nil {
		return err
	}
	fmt.Println("size: ", size)
	fmt.Println("streamPos: ", streamPos)

	if streamPos, size, text, err = readTextChunk(f, streamPos, size); err != nil {
		return err
	}

	fmt.Println("size: ", size)
	fmt.Println("streamPos: ", streamPos)
	fmt.Println("text: ", text)

	streamPos++

	if streamPos, size, err = readIntChunk(f, streamPos); err != nil {
		return err
	}
	fmt.Println("size: ", size)
	fmt.Println("streamPos: ", streamPos)

	if streamPos, size, text, err = readTextChunk(f, streamPos, size); err != nil {
		return err
	}

	fmt.Println("size: ", size)
	fmt.Println("streamPos: ", streamPos)
	fmt.Println("text: ", text)

	return nil
}

As you can see, I have chosen to use the slightly flattened approach to handling errors, except for right at the start of the function. I just cannot work out how to do this without the compiler complaining. Any ideas?

What do you think of inverting the err != nil logic to flatten out the code even more? As long as the non-error route doesn’t become too indented, I think it’s quite elegant, though I may be horribly wrong :wink:

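func readNextSizeAndChunk(f *os.File, streamPos int32) (int32, int32, string, error) {
	var size int32
	var text string
	var err error
	// Assign with = (not :=) so that err here is the same variable returned
	// at the bottom; with := each if would declare a new, shadowing err.
	if streamPos, size, err = readIntChunk(f, streamPos); err == nil {
		if streamPos, size, text, err = readTextChunk(f, streamPos, size); err == nil {
			return streamPos, size, text, nil
		}
	}
	return 0, 0, "", err
}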
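One pitfall with the inverted style is variable shadowing: declaring values with `:=` inside the if statements creates a new err scoped to those statements, so an outer err returned at the end is still nil and the failure is silently lost. A minimal sketch with a hypothetical always-failing `step()` shows the difference between `:=` and `=`:

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical operation that always fails.
func step() (int, error) {
	return 0, errors.New("step failed")
}

// shadowed declares n and err with := inside the if statement. That err is a
// new variable scoped to the if, so the outer err is still nil when returned.
func shadowed() error {
	var err error
	if n, err := step(); err == nil {
		_ = n
		return nil
	}
	return err // always nil: the failure was lost
}

// assigned reuses the outer variables with =, so the failure is returned.
func assigned() error {
	var n int
	var err error
	if n, err = step(); err == nil {
		_ = n
		return nil
	}
	return err
}

func main() {
	fmt.Println(shadowed()) // <nil>
	fmt.Println(assigned()) // step failed
}
```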

From a personal pov I struggle a little to read, follow and immediately understand the logic of your 2nd code snip tbh but that might be because I can’t see all of your code.

Sometimes people say that Go code should not try and be too clever and that it’s better to write code that is easily understandable even if that makes it more verbose. On the other hand this thinking might only apply to code that will need to be read and maintained by many people and I’d say that if this is a personal project, it’s fine to implement whatever code pattern will work for you / future you.

Your 1st code snip appears to compile and run without any error in the playground: https://play.golang.org/p/3fKyT5gffxH