Hello, I was recently working on something that required me to search through some HTML, and I ran into a strange issue. I am new to Go, coming from other languages, so apologies if I am missing some nuance here.
The Data field of an x/net/html.Token is a string, but I cannot get any kind of string comparison against it to work?
MWE:
package main

import (
	"fmt"
	"io"
	"log"
	"reflect"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	dat := `<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><apm_do_not_touch><body onload="document.forms[0].submit()"><noscript> <p> <strong> Your browser does not support JavaScript, press the Continue button once to proceed. </strong></p> </noscript> <form action="https://website.com" method="post"> <input type="hidden" name="SomeName" value="SomeValue"/><noscript> <div> <input type="submit" value="DoStuff"/></div> </noscript> </form></body></apm_do_not_touch></html>`
	reader := strings.NewReader(dat)
	tokenizer := html.NewTokenizer(reader)
	// largely from https://drstearns.github.io/tutorials/tokenizing/
	for {
		// get the next token type
		tokenType := tokenizer.Next()
		// if it's an error token, we either reached
		// the end of the file, or the HTML was malformed
		if tokenType == html.ErrorToken {
			err := tokenizer.Err()
			if err == io.EOF {
				// end of the file, break out of the loop
				break
			}
			// otherwise, there was an error tokenizing,
			// which likely means the HTML was malformed.
			// since this is a simple command-line utility,
			// we can just use log.Fatalf() to report the error
			// and exit the process with a non-zero status code
			log.Fatalf("error tokenizing HTML: %v", tokenizer.Err())
		}
		// process the token according to the token type...
		// input is a self-closing tag
		if tokenType == html.SelfClosingTagToken {
			fmt.Println(tokenizer.Token().Data)
			fmt.Println("input")
			data := tokenizer.Token().Data
			// these seem to match
			fmt.Println("t1:", reflect.TypeOf(tokenizer.Token().Data))
			fmt.Println("t2:", reflect.TypeOf("input"))
			// these seem to match
			fmt.Println("len1:", len(tokenizer.Token().Data))
			fmt.Println("len2:", len("input"))
			fmt.Println("len3:", len(data))
			fmt.Println("compare1:", "input" == tokenizer.Token().Data)
			fmt.Println("compare2:", strings.Compare(tokenizer.Token().Data, "input") == 0)
			fmt.Println("compare3:", "" == tokenizer.Token().Data)
			fmt.Println("compare4:", strings.Compare(tokenizer.Token().Data, "") == 0)
			fmt.Println("compare5:", "input" == data)
			fmt.Println("compare6:", strings.Compare(data, "input") == 0)
			if tokenizer.Token().Data == "input" {
				fmt.Println("We never reach this")
			}
		}
	}
}
According to the docs, Data should be a string (which the reflect.TypeOf output seems to confirm), but it does not show any length, and I can't check equality against it. Any idea what's going on here? I'm assuming that if the docs were incorrect and I was actually getting a pointer or a reference or something, it would show up in the TypeOf check, but everything just says string.