Can't compare x/net/html Token.Data to a standard string

Hello, I was recently working on something that required me to search through some HTML, and I ran into a strange issue. I'm new to Go, coming from other languages, so apologies if I'm missing some nuance here.

The Data field of an x/net/html Token is a string, so why can't I do any kind of string comparison against it?

MWE:


```go
package main

import (
	"fmt"
	"io"
	"log"
	"reflect"
	"strings"

	"golang.org/x/net/html"
)

func main() {

	dat := `<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><apm_do_not_touch><body onload="document.forms[0].submit()"><noscript> <p>    <strong> Your browser does not support JavaScript, press the Continue button once to proceed. </strong></p> </noscript> <form action="https://website.com" method="post"> <input type="hidden" name="SomeName" value="SomeValue"/><noscript> <div>  <input type="submit" value="DoStuff"/></div> </noscript> </form></body></apm_do_not_touch></html>`

	reader := strings.NewReader(dat)
	tokenizer := html.NewTokenizer(reader)

	//largely from https://drstearns.github.io/tutorials/tokenizing/
	for {
		//get the next token type
		tokenType := tokenizer.Next()

		//if it's an error token, we either reached
		//the end of the file, or the HTML was malformed
		if tokenType == html.ErrorToken {
			err := tokenizer.Err()
			if err == io.EOF {
				//end of the file, break out of the loop
				break
			}
			//otherwise, there was an error tokenizing,
			//which likely means the HTML was malformed.
			//since this is a simple command-line utility,
			//we can just use log.Fatalf() to report the error
			//and exit the process with a non-zero status code
			log.Fatalf("error tokenizing HTML: %v", tokenizer.Err())
		}

		//process the token according to the token type...
		//input is a self closing token
		if tokenType == html.SelfClosingTagToken {

			fmt.Println(tokenizer.Token().Data)
			fmt.Println("input")

			data := tokenizer.Token().Data

			//these seem to match
			fmt.Println("t1:", reflect.TypeOf(tokenizer.Token().Data))
			fmt.Println("t2:", reflect.TypeOf("input"))

			//these seem to match
			fmt.Println("len1:", len(tokenizer.Token().Data))
			fmt.Println("len2:", len("input"))
			fmt.Println("len3:", len(data))

			fmt.Println("compare1:", "input" == tokenizer.Token().Data)
			fmt.Println("compare2:", strings.Compare(tokenizer.Token().Data, "input") == 0)

			fmt.Println("compare3:", "" == tokenizer.Token().Data)
			fmt.Println("compare4:", strings.Compare(tokenizer.Token().Data, "") == 0)

			fmt.Println("compare5:", "input" == data)
			fmt.Println("compare6:", strings.Compare(data, "input") == 0)

			if tokenizer.Token().Data == "input" {
				fmt.Println("We never reach this")
			}
		}
	}
}
```

According to the docs, Data is a string (which the reflect.TypeOf output confirms), but len reports 0 for it and equality checks against "input" fail. Any idea what's going on here? I assumed that if the docs were wrong and I were actually getting a pointer or a reference of some kind, it would show up in the TypeOf check, but everything just says string.
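A side note on debugging this kind of thing: reflect.TypeOf reports only a value's type, so an empty string reports string exactly as "input" does, and fmt.Println prints an empty string as nothing at all. Formatting with %q quotes the value, so an empty string shows up as "". A small stdlib-only sketch (describe is a hypothetical helper made up for illustration):

```go
package main

import (
	"fmt"
	"reflect"
)

// describe returns what the debugging prints in the question would report
// for s (its type and length), plus a %q-quoted form that makes "" visible.
func describe(s string) (typ string, length int, quoted string) {
	return reflect.TypeOf(s).String(), len(s), fmt.Sprintf("%q", s)
}

func main() {
	for _, s := range []string{"input", ""} {
		typ, n, q := describe(s)
		// Both values report type "string"; only the length and the quoted form differ.
		fmt.Printf("type=%s len=%d quoted=%s\n", typ, n, q)
	}
}
```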

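You can only call Token() once per token, that is, once per call to Next(). In the current implementation, Token() consumes the tag name as it reads it, so the first call to tokenizer.Token().Data (the Println at the top of your branch) returns "input", but every later call to tokenizer.Token() returns a Token whose Data is empty. That includes the call that initializes data, which is why len1 and len3 are 0 and every comparison against "input" fails (while compare3 and compare4, the comparisons against "", succeed). Call Token() once, store the result, and use that stored value instead of calling tokenizer.Token() again. The package documentation gives the valid call sequence per token as Next {Raw} [ Token | Text | TagName {TagAttr} ].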

Ahhh, OK, great, I see the issue now.

Here is the corrected block for completeness:

```go
if tokenType == html.SelfClosingTagToken {
	// Call Token() exactly once for this token and reuse the result.
	data := tokenizer.Token().Data

	//fmt.Println(tokenizer.Token().Data)
	fmt.Println("input")
	fmt.Println(data)

	fmt.Println("len2:", len("input"))
	fmt.Println("len3:", len(data))

	fmt.Println("compare5:", "input" == data)
	fmt.Println("compare6:", strings.Compare(data, "input") == 0)

	if data == "input" {
		fmt.Println("Now we reach this!")
	}
}
```

Output:

```
input
input
len2: 5
len3: 5
compare5: true
compare6: true
Now we reach this!
```
