Partially unmarshalling an XML file

jfgratton · November 6, 2022, 5:11pm

I need to extract a few fields from an XML file and wonder how to go about it.

First, some context:
I need to fetch a libvirt VM’s snapshot information, which is encoded in XML.

I do this with
data, _ := snap.GetXMLDesc(0)

(data is a string var… so yes, libvirt returns ±8kb files in a single string var).

If I write that string to an XML file, it amounts to 188 lines (7.8 kB) of fields and attributes. I only need 3 of them, really.

From various readings on the net, I gather that to unmarshal the XML file, I would need to create a data struct mapping every field and attribute. Surely there’s a better way, when I need about 3 lines of that XML file?

One workaround I thought about is to dump that string var to a file and “grep” within that file to get my info, but I find that inelegant. There must be a way to map only the info I need from that document?

Here’s a sample of the XML file. Let’s say I wanted only the “parent”, “creationTime” and “type arch” fields and attributes. Besides dumping the XML to a file, I do not see my way around that.

<description>vmman-generated snapshot</description>
  <state>shutoff</state>
  <parent>
    <name>0.clear</name>
  </parent>
  <creationTime>1667742695</creationTime>
  <memory snapshot='no'/>
  <disks>
    <disk name='vda' snapshot='internal'/>
  </disks>
  <domain type='kvm'>
    <name>alpinedev</name>
    <uuid>055e3a07-533a-41ee-ae94-cbee5ff404f0</uuid>
    <metadata>
      <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
        <libosinfo:os id="http://alpinelinux.org/alpinelinux/3.16"/>
      </libosinfo:libosinfo>
    </metadata>
    <memory unit='KiB'>786432</memory>
    <currentMemory unit='KiB'>786432</currentMemory>
    <vcpu placement='static'>1</vcpu>
    <os>
      <type arch='x86_64' machine='pc-q35-7.0'>hvm</type>
      <boot dev='hd'/>
    </os>

NobbZ · November 6, 2022, 7:19pm

You might take a look at “SAX” or “stream” parsing of XML.

Alternatively XPath.

If the library you decide to use is implemented well, memory consumption will be minimal, apart from some garbage-collectable allocations that generally occur while reading through the input.
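For reference, a stream-parsing pass can be sketched with the standard library’s encoding/xml Decoder, without any third-party package. This is only a minimal sketch: the snapshotInfo type, the scanSnapshot function and the trimmed sample document are made up for illustration, and it assumes the document root is domainsnapshot and that each value arrives as a single run of character data.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Trimmed, hypothetical sample; the real libvirt output is much longer.
const sample = `<domainsnapshot>
  <name>1.hello</name>
  <parent>
    <name>0.clear</name>
  </parent>
  <creationTime>1667742695</creationTime>
  <domain type='kvm'>
    <name>alpinedev</name>
    <os>
      <type arch='x86_64' machine='pc-q35-7.0'>hvm</type>
    </os>
  </domain>
</domainsnapshot>`

// snapshotInfo is a hypothetical holder for the three values of interest.
type snapshotInfo struct {
	Parent       string
	CreationTime string
	Arch         string
}

// scanSnapshot reads the document token by token and keeps only three values.
func scanSnapshot(doc string) (snapshotInfo, error) {
	var info snapshotInfo
	var path []string // names of the elements that are currently open
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return info, nil
		}
		if err != nil {
			return info, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			path = append(path, t.Name.Local)
			if strings.Join(path, "/") == "domainsnapshot/domain/os/type" {
				for _, a := range t.Attr {
					if a.Name.Local == "arch" {
						info.Arch = a.Value
					}
				}
			}
		case xml.EndElement:
			path = path[:len(path)-1]
		case xml.CharData:
			switch strings.Join(path, "/") {
			case "domainsnapshot/parent/name":
				info.Parent = strings.TrimSpace(string(t))
			case "domainsnapshot/creationTime":
				info.CreationTime = strings.TrimSpace(string(t))
			}
		}
	}
}

func main() {
	info, err := scanSnapshot(sample)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```

Tracking the path of open elements is what keeps the snapshot’s own name, the parent’s name and the domain’s name apart, since all three use the same tag.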

jfgratton · November 6, 2022, 7:32pm

Oh… I guess I’ve missed XPath, for some reason?! It is very similar to a Python3 lib I used a while back (must be a port/fork). This is the closest I’ve come to a solution today.

I was going the workaround way I’ve mentioned in my OP, pinching my nose all the way down

Thanks, @NobbZ . I do not know why I’ve missed XPath in my search ! Do not care, I’ve a solution, now.

clbanning · November 8, 2022, 12:41pm

You could have just defined a struct with the tags you wanted to read from the data: Go Playground - The Go Programming Language
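As a rough illustration of that idea (not the code behind the playground link, which is not reproduced here): encoding/xml ignores every element that has no matching struct field, so the struct only needs the fields you care about. The snapshotFields type, the extractFields function and the trimmed sample below are hypothetical names and data for this sketch. A tag such as "parent>name" reaches into nested elements; an attribute needs a small nested struct, because a path cannot be combined with the attr flag.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Trimmed, hypothetical sample; unmapped elements are simply skipped.
const sample = `<domainsnapshot>
  <name>1.hello</name>
  <description>vmman-generated snapshot</description>
  <state>shutoff</state>
  <parent>
    <name>0.clear</name>
  </parent>
  <creationTime>1667742695</creationTime>
  <memory snapshot='no'/>
  <domain type='kvm'>
    <name>alpinedev</name>
    <os>
      <type arch='x86_64' machine='pc-q35-7.0'>hvm</type>
      <boot dev='hd'/>
    </os>
  </domain>
</domainsnapshot>`

// snapshotFields maps only the three values of interest.
type snapshotFields struct {
	Parent       string `xml:"parent>name"`  // <parent><name>...</name></parent>
	CreationTime int64  `xml:"creationTime"` // <creationTime>...</creationTime>
	OSType       struct {
		Arch string `xml:"arch,attr"` // arch='...' attribute of <type>
	} `xml:"domain>os>type"`
}

// extractFields unmarshals doc into the partial struct.
func extractFields(doc string) (snapshotFields, error) {
	var f snapshotFields
	err := xml.Unmarshal([]byte(doc), &f)
	return f, err
}

func main() {
	f, err := extractFields(sample)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(f.Parent, f.CreationTime, f.OSType.Arch)
}
```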

jfgratton · November 9, 2022, 11:26pm

Easier yet than XPath, @clbanning, thanks. I did not think you could “partially map” between a struct and an XML doc. I was going to test it in the playground, and well, got carried away with @nobbz’s solution.

Thanks, that works.

jfgratton · November 10, 2022, 12:52am

Ok, @clbanning, I thought it worked, but it does not (note to self: compiles != works). I’m not sure if it’s my so-far limited knowledge of Go, or my rusty remembrance of XML docs. Here’s an edited version of the XML I need to parse:

<domainsnapshot>
  <name>1.hello</name>
  <description>vmman-generated snapshot</description>
  <state>shutoff</state>
  <parent>
    <name>0.clear</name>
  </parent>
  <creationTime>1667742695</creationTime>
</domainsnapshot>

Now, I need the snapshot name, parent name (if present) and creationTime.

I’ve built the following structs:

type ParentElement struct {
	XMLName xml.Name `xml:"parent"`
	Name    string   `xml:"parentname"`
}
type SnapshotXMLstruct struct {
	SnapshotName string        `xml:"snapname"`
	Creationdate uint64        `xml:"creationdate"`
	ParentName   ParentElement `xml:"parent",omitempty`
}

ParentName has omitempty set as this tag might be missing in the XML doc.
My code to retrieve the XML and append the 3 needed fields in my own struct is thus:

	var snapXMLdata SnapshotXMLstruct
	var snaps []SnapshotXMLstruct
	<snip>
	snapshots, _ := domain.ListAllSnapshots(0)

	for _, snap := range snapshots {
		data, _ := snap.GetXMLDesc(0)
		err := xml.Unmarshal([]byte(data), &snapXMLdata)
		if err != nil {
			fmt.Println("Error: ", err)
			os.Exit(1)
		}
		snaps = append(snaps, snapXMLdata)
	}

The data var is non-empty, so it’s not a question of not fetching a valid XML doc.
… yet, snapXMLdata is empty, and err == nil.

I guess I’m not getting that simple a concept :-/

clbanning · November 10, 2022, 12:52pm

Here’s an example using your data - Go Playground - The Go Programming Language.
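The playground code is not reproduced in the thread, so what follows is only a sketch of one way the structs above could be made to work, not necessarily what the link shows. The key point is that the name inside each xml tag must be the element name as it appears in the document (name, creationTime, parent), not a name of your choosing; tags that match nothing are silently skipped, which is why the result was empty with err == nil. Options such as omitempty belong inside the quotes and only affect marshalling, so the tag is dropped here; a missing parent simply leaves the field empty. The parseSnapshots helper and the two sample documents are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Hypothetical samples: one snapshot with a parent, one without.
const withParent = `<domainsnapshot>
  <name>1.hello</name>
  <parent>
    <name>0.clear</name>
  </parent>
  <creationTime>1667742695</creationTime>
</domainsnapshot>`

const withoutParent = `<domainsnapshot>
  <name>0.clear</name>
  <creationTime>1667742000</creationTime>
</domainsnapshot>`

type ParentElement struct {
	Name string `xml:"name"` // <name> inside <parent>
}

type SnapshotXMLstruct struct {
	XMLName      xml.Name      `xml:"domainsnapshot"` // expected root element
	SnapshotName string        `xml:"name"`
	Creationdate uint64        `xml:"creationTime"`
	ParentName   ParentElement `xml:"parent"`
}

// parseSnapshots unmarshals each document into a fresh struct value.
func parseSnapshots(docs []string) ([]SnapshotXMLstruct, error) {
	var snaps []SnapshotXMLstruct
	for _, doc := range docs {
		// Declared inside the loop: Unmarshal leaves fields untouched when
		// the element is absent, so a reused variable would carry the
		// previous snapshot's parent over to one that has none.
		var s SnapshotXMLstruct
		if err := xml.Unmarshal([]byte(doc), &s); err != nil {
			return nil, err
		}
		snaps = append(snaps, s)
	}
	return snaps, nil
}

func main() {
	snaps, err := parseSnapshots([]string{withParent, withoutParent})
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	for _, s := range snaps {
		fmt.Printf("name=%s parent=%q created=%d\n",
			s.SnapshotName, s.ParentName.Name, s.Creationdate)
	}
}
```

In the original loop the strings would come from snap.GetXMLDesc(0) instead of constants; checking that call’s error rather than discarding it with _ would also help rule out an empty document.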

jfgratton · November 10, 2022, 3:39pm

I see what I was doing wrong here… Thank you !

system · February 8, 2023, 3:40pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.