Hi, I'd like to perform some basic validation on HTML files in order to sanitise some input (i.e. check that standard HTML tags are in the correct places, properly terminated, etc.). However, it seems that the package golang.org/x/net/html doesn't really do any significant high-level validation like this, and accepts invalid HTML as input. What is the easiest way to achieve such validation?
I would expect the Parse function to error out on invalid HTML input. It might not meet all of your specific requirements (e.g. if you want to validate against a particular HTML standard) but it would at least be a start.
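It seems the Parse() function tries to be very forgiving: it follows the HTML5 parsing algorithm, which recovers from malformed markup instead of returning an error.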
I did a quick web search for “golang html validator” (and variations thereof) without results, but searching GitHub for “html checker language:go” returned BlackEspresso/htmlcheck, which appears to be what you are looking for. (The terse readme is not very helpful; it only demonstrates how the tool detects a missing closing tag.)
Yes, thanks for that, I hadn’t come across that library. However, it doesn’t look very mature or recently maintained, so I’m not overly keen to use it. It seems there is an opportunity here for a good library to do proper W3C validation.
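Another idea that just came to me: if the HTML is written as well-formed XML (i.e. XHTML-style markup), the encoding/xml package can provide the desired level of validation. Unlike the html package, encoding/xml's decoder does error out on things like missing closing tags. One caveat: HTML in general is not an application of XML, so the decoder will also reject perfectly valid HTML such as an unclosed <br> or an unquoted attribute value.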
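For what it's worth, here is a minimal sketch of the encoding/xml idea. It only checks XML well-formedness, not conformance to any HTML standard. The function name checkWellFormed and the sample inputs are made up for illustration. Setting Entity to xml.HTMLEntity is my own assumption about what would be wanted here, so that named entities like &nbsp; don't trigger errors.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// checkWellFormed (hypothetical helper) returns the first XML
// well-formedness error found in input, or nil if the whole input
// tokenizes cleanly. It does not validate against any HTML standard.
func checkWellFormed(input string) error {
	dec := xml.NewDecoder(strings.NewReader(input))
	// Assumption: accept HTML named entities such as &nbsp;,
	// which plain XML does not define.
	dec.Entity = xml.HTMLEntity
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil // reached the end with every element closed
		}
		if err != nil {
			return err // e.g. mismatched or missing closing tag
		}
	}
}

func main() {
	samples := []string{
		"<html><body><p>Hello</p></body></html>", // well-formed
		"<html><body><p>Hello</body></html>",     // <p> is never closed
		"<p>line one<br>line two</p>",            // valid HTML, but not well-formed XML
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, checkWellFormed(s))
	}
}
```

The third sample is the catch: a bare <br> is fine in HTML but gets rejected here, so this approach only suits input that is expected to be XHTML-style markup.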