JSON XSS unmarshal not working as expected

I have a struct which has XSS injected in it. In order to remove it, I json.Marshal it, then run json.HTMLEscape. Then I json.Unmarshal it into a new struct.

The problem is the new struct has XSS injected still.

I simply can’t figure out how to remove the XSS from the struct. I could write a function to do it on the field, but since json.HTMLEscape exists and we can Unmarshal the result back, I expected this to work. It doesn't.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
)

type Person struct {
    Name string `json:"name"`
}

func main() {
    var p, p2 Person
    // p.Name has XSS
    p.Name = "<script>alert(1)</script>"
    var tBytes bytes.Buffer

    // I marshal it so I can use json.HTMLEscape
    marshalledJson, _ := json.Marshal(p)
    json.HTMLEscape(&tBytes, marshalledJson)

    // here I insert it into a new struct; sadly the p2 struct still has the XSS
    err := json.Unmarshal(tBytes.Bytes(), &p2)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Print(p2)
}

The expected outcome is for p2.Name to be sanitized, like &lt;script&gt;alert(1)&lt;/script&gt;


That's what I got when I ran your code in the Go playground: (<script>alert(1)</script>)


It should be &lt;script&gt;alert(1)&lt;/script&gt; because the code above calls json.HTMLEscape.

If you run fmt.Print(tBytes.String()), the XSS is sanitized; however, p2 is not.
It's very strange to me.


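No, it shouldn’t be &lt;script&gt;alert(1)&lt;/script&gt;. The escaped JSON should contain \u003cscript\u003ealert(1)\u003c/script\u003e, as stated in the documentation for json.Marshal. Those \u003c and \u003e sequences are JSON string escapes, so json.Unmarshal decodes them back to < and >, which is why p2.Name comes out unchanged. The relevant documentation is quoted here: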

func Marshal
func Marshal(v interface{}) ([]byte, error)
Marshal returns the JSON encoding of v.

Marshal traverses the value v recursively. If an encountered value implements the Marshaler interface and is not a nil pointer, Marshal calls its MarshalJSON method to produce JSON. If no MarshalJSON method is present but the value implements encoding.TextMarshaler instead, Marshal calls its MarshalText method and encodes the result as a JSON string. The nil pointer exception is not strictly necessary but mimics a similar, necessary exception in the behavior of UnmarshalJSON.

Otherwise, Marshal uses the following type-dependent default encodings:

Boolean values encode as JSON booleans.

Floating point, integer, and Number values encode as JSON numbers.

String values encode as JSON strings coerced to valid UTF-8, replacing invalid bytes with the Unicode replacement rune. The angle brackets “<” and “>” are escaped to “\u003c” and “\u003e” to keep some browsers from misinterpreting JSON output as HTML. Ampersand “&” is also escaped to “\u0026” for the same reason. This escaping can be disabled using an Encoder that had SetEscapeHTML(false) called on it.
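
To make the round trip described above concrete, here is a minimal sketch. It assumes the goal is an HTML-entity-escaped Name, and it uses html.EscapeString from the standard library for that; the roundTrip and sanitize helpers are hypothetical names introduced only for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"html"
)

type Person struct {
	Name string `json:"name"`
}

// roundTrip (hypothetical helper) marshals p, runs json.HTMLEscape on the
// result, and unmarshals it again. It returns the escaped JSON text and the
// decoded struct.
func roundTrip(p Person) (string, Person, error) {
	raw, err := json.Marshal(p)
	if err != nil {
		return "", Person{}, err
	}
	var buf bytes.Buffer
	json.HTMLEscape(&buf, raw)

	var out Person
	if err := json.Unmarshal(buf.Bytes(), &out); err != nil {
		return "", Person{}, err
	}
	return buf.String(), out, nil
}

// sanitize (hypothetical helper) HTML-escapes the field value directly,
// turning < and > into &lt; and &gt;.
func sanitize(p Person) Person {
	p.Name = html.EscapeString(p.Name)
	return p
}

func main() {
	p := Person{Name: "<script>alert(1)</script>"}

	escaped, decoded, err := roundTrip(p)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(escaped)          // JSON text containing \u003c and \u003e escapes
	fmt.Println(decoded.Name)     // the original string, angle brackets restored
	fmt.Println(sanitize(p).Name) // HTML entities instead of angle brackets
}
```

json.HTMLEscape only changes how the characters are written inside the JSON text, so decoding restores them. If the value is eventually rendered through html/template, that package escapes it at render time, which may make escaping the stored field unnecessary.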
