HTML validation

Hi, I'd like to perform some basic validation on HTML files in order to sanitise some input (i.e. check that standard HTML tags are in the correct places, properly terminated, etc.). However, the golang.org/x/net/html package doesn't seem to do any significant high-level validation like this, and it accepts invalid HTML as input. What is the easiest way to achieve such validation?

Thanks

Andy


I would expect the Parse function to error out on invalid HTML input. It might not meet all of your specific requirements (e.g. if you want to validate against a particular HTML standard) but it would at least be a start.

Well, everything I throw at it appears to be accepted without error. What input would cause it to return an error?

For example:

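	package main

	import (
		"fmt"
		"strings"

		"golang.org/x/net/html"
	)

	func main() {
		// Deliberately malformed input: an unknown root element, a bare "&#",
		// and an <a> tag that is never properly closed.
		input := "<!DOCTYPE html><nothtml>&#<body><a href=\"unclosed tag\"</html>"
		n, err := html.Parse(strings.NewReader(input))
		fmt.Println(n)   // a parsed document tree is still returned
		fmt.Println(err) // no error is reported for the malformed markup
	}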

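It seems the Parse() function is designed to be very forgiving. golang.org/x/net/html implements the HTML5 parsing algorithm, which specifies how to recover from malformed markup, so in practice Parse returns an error only if reading the input itself fails.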

I did a quick web search for “golang html validator” (and variations thereof) without results, but searching GitHub for

html checker language:go

returned BlackEspresso/htmlcheck, which appears to be what you are looking for. (The terse README is not very helpful; it only demonstrates how the tool detects a missing closing tag.)

Yes, thanks for that, I hadn't come across that library. However, it doesn't look very mature or recently maintained, so I'm not overly keen to use it. It seems there is an opportunity here for a good library to do proper W3C validation.

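Another idea that just came to me: if the input is XHTML (HTML written as well-formed XML), the encoding/xml package can provide the desired level of checking. Unlike the html package, encoding/xml's decoder does error out on things like missing closing tags. Note, however, that ordinary HTML is not XML: markup such as a <br> without a closing tag or an unquoted attribute value is valid HTML but is rejected by the XML decoder, and the decoder checks only well-formedness, not whether elements are used correctly.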
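A minimal sketch of that idea, assuming the input is meant to be well-formed XHTML: checkWellFormed is a hypothetical helper that pulls every token from an encoding/xml Decoder and returns the first syntax error. Decoder.Strict is already true by default; it is set explicitly here because setting it to false (for example together with xml.HTMLAutoClose) makes the decoder lenient again, which would defeat the purpose.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// checkWellFormed (hypothetical helper) reads all tokens from s with a strict
// encoding/xml decoder and returns the first syntax error, or nil if s is
// well-formed XML. It does not check HTML semantics (allowed elements,
// attributes or nesting rules).
func checkWellFormed(s string) error {
	d := xml.NewDecoder(strings.NewReader(s))
	d.Strict = true // the default; a non-strict decoder tolerates mismatched tags
	for {
		_, err := d.Token()
		if errors.Is(err, io.EOF) {
			return nil // reached the end with every element closed
		}
		if err != nil {
			return err // e.g. "element <p> closed by </body>" or "unexpected EOF"
		}
	}
}

func main() {
	inputs := []string{
		"<html><body><p>ok</p></body></html>",
		"<html><body><p>unclosed</body></html>",     // <p> closed by </body>
		"<html><body><p>still open</p></body>",      // <html> never closed
		"<html><body>line<br>break</body></html>",   // valid HTML, but not XML
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %v\n", in, checkWellFormed(in))
	}
}
```

The last input illustrates the trade-off: it is perfectly valid HTML, but the XML decoder rejects it, so this approach only suits input that is expected to be XHTML.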
