Clean string from html content

cinematik · January 29, 2019, 1:41pm

There is a string, for instance

s := "<b>John</b> Thank you."

What is the best way to remove the

"<b>...</b>"

content from the string?

lutzhorn · January 29, 2019, 1:49pm

For Python there is the widely used Beautiful Soup.

For Go there is a port of this called soup. The method Text is what you are looking for.

cinematik · January 29, 2019, 2:17pm

Do I understand correctly that you are suggesting getting the content of the tag? What I want is to convert the source string into:

Thank you.

GonzaSaya · January 30, 2019, 3:52pm

Hi @cinematik
Please check this regular expression:

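package main

import (
	"fmt"
	"regexp"
)

// sample contains two <b>...</b> sections that should be removed.
const sample = `<b>John</b>Thank you.<b>delete this</b>`

func main() {
	// Match an opening <b> or <B>, the text inside it, and the closing tag.
	re := regexp.MustCompile(`<(b|B)>\b(([^<])*|(<[^b])*|(<b[^>])*)\b<\/(b|B)>`)
	s := re.ReplaceAllString(sample, ``)
	fmt.Println(s) // prints: Thank you.
}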
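
A shorter pattern may also be enough for simple input like this. The sketch below assumes the tags are plain <b> or <B> with no attributes and are never nested. The helper name stripBold is made up for illustration and does not come from any library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// boldTag matches a <b>...</b> pair together with everything between the tags.
// (?i) makes the match case-insensitive, (?s) lets . match newlines,
// and .*? is non-greedy so each pair is matched separately.
var boldTag = regexp.MustCompile(`(?is)<b>.*?</b>`)

// stripBold is a hypothetical helper: it removes every <b>...</b> section
// and trims the whitespace left at both ends of the string.
func stripBold(s string) string {
	return strings.TrimSpace(boldTag.ReplaceAllString(s, ""))
}

func main() {
	fmt.Println(stripBold("<b>John</b> Thank you.")) // prints: Thank you.
}
```

This will not match tags that carry attributes, and nested <b> tags would leave fragments behind, so for arbitrary HTML a real parser is the safer choice.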
