Parsing nested XML with multiple namespaces

crghilardi · August 8, 2022, 12:30am

I’m looking to parse out a subset of values from a nested XML. I’m hoping to not have to bring in any dependencies and just use the standard library xml package.

The full example is nested and has a few different namespaces at play. I saw there are a number of open issues regarding XML namespaces, so I made a small MWE with what I thought was the minimum level of detail, including a namespace, and that worked for what I was trying to extract.

small example - Go Playground

...
//abbreviated, see go playground link for full data structure

	reader := strings.NewReader(dat1)
	decoder := xml.NewDecoder(reader)

	//https://stackoverflow.com/questions/59615418/how-to-parse-xml-in-slice-format
	type StuffList struct {
		Stuff []string `xml:"SubLevel1>SubLevel2"`
	}

	var sl StuffList

	err := decoder.Decode(&sl)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("rl:", sl.Stuff)

With that in hand, I tried to apply it to the full document, which has additional levels of nesting and namespaces, and it returns an empty slice.

Full example - Go Playground link

1. Can I actually read this with Decode? I’m not sure where the line is between using Decode and needing an Unmarshal call. Do I need more levels of detail in the struct tag to be able to “see” the tree correctly? I have tried adding one or two additional levels with no success.
2. When is a struct of structs needed, versus just reading into a slice? I thought specifying the proper path in the tag would essentially save me from having to create structs for the full tree.
3. Is there a way to introspect why the smaller MWE works and where the larger example fails? I don’t quite understand what is going on between the two. I tried Delve a little bit but did not manage to get anything useful.

skillian · August 8, 2022, 12:07pm

After you embed your SubLevel1 and SubLevel2 into additional layers, the path in the tag on your Stuff field must become longer to “reach” down into that level. I changed:

type StuffList struct {
	Stuff []string `xml:"SubLevel1>SubLevel2"`
}

to

type StuffList struct {
	Stuff []string `xml:"Response2>TopLevel>SubLevel1>SubLevel2"`
}

and it worked: Go Playground link

So, to answer your questions:

1. Yes, Decode can read this.
2. I would recommend adding layers of structs and/or slices of structs when you need more information from more levels of the XML. If all you’re looking for is the values at one specific element path, then what you have here is fine. If you need not just multiple paths, but also need to know which SubLevel2 corresponds to which SubLevel1, then you might need to define a slice of SubLevel1 structs which each contain a slice of strings for the SubLevel2 values.
3. I’m not sure if there’s a way to do it or not. I just scrolled through and noticed that the layers in your second example are nested deeper, but the paths were the same, so I started adding prefixes, “TopLevel” and then “Response2”, until it returned values. I’m not sure if there’s a way to debug.
