RegEx behaves differently with English vs Greek

I want to find whole words in Greek text using Golang.

Using \b as the word boundary for the regular expression pattern
works when I use English, but not Greek.

When I run the following, one occurrence of joy is found, which is correct.

But, for the Greek, nothing is returned.
If I remove the \b on each side of the Greek, two occurrences of ἀγαπη are found.

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// English: expect exactly one whole-word match of "joy".
	pattern := "\\bjoy\\b"
	text := "sing joyfully with joy"
	matcher, err := regexp.Compile(pattern)
	if err != nil {
		fmt.Println(err)
		return // matcher is nil when compilation fails
	}
	indexes := matcher.FindAllStringIndex(text, -1)
	expect := 1
	got := len(indexes)
	if expect != got {
		fmt.Printf("expected %d, got %d\n", expect, got)
	}

	// Greek: expect exactly one whole-word match of "ἀγαπη", but none is found.
	pattern = "\\bἀγαπη\\b"
	text = "Ὡς ἀγαπη τὰ σκηνώματά σου. Ὡς ἀγαπητὰ τὰ σκηνώματά σου."
	matcher, err = regexp.Compile(pattern)
	if err != nil {
		fmt.Println(err)
		return
	}
	indexes = matcher.FindAllStringIndex(text, -1)
	expect = 1
	got = len(indexes)
	if expect != got {
		fmt.Printf("expected %d, got %d\n", expect, got)
	}
}

\b is an ASCII word boundary; you probably want to use \B.

\B means it is not a word boundary.
I need a word boundary.

The suggestion here by kosmo works:

pattern = "(\\s+|\\w+)ἀγαπη(\\s+|\\w+)"
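
One thing to keep in mind with the pattern above: it needs at least one character on each side of the word, so it cannot match at the very start or end of the text, and \w is ASCII-only in Go. If that matters, a possible alternative is to skip the regex boundary altogether, find the literal and check the neighbouring runes with the unicode package. A rough sketch, where findWholeWord and isWordRune are hypothetical helpers and the definition of a word rune is an assumption:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isWordRune reports whether r is treated as part of a word here:
// a letter, combining mark, digit or underscore (an assumption).
func isWordRune(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsMark(r) || unicode.IsDigit(r) || r == '_'
}

// findWholeWord (hypothetical helper) returns the [start, end) byte offsets
// of each non-overlapping occurrence of word that has no word rune directly
// before or after it. The edges of the text count as boundaries.
func findWholeWord(text, word string) [][2]int {
	var hits [][2]int
	if word == "" {
		return hits
	}
	for from := 0; ; {
		i := strings.Index(text[from:], word)
		if i < 0 {
			break
		}
		start := from + i
		end := start + len(word)
		// A size of 0 means there is no rune on that side (edge of text).
		before, bsize := utf8.DecodeLastRuneInString(text[:start])
		after, asize := utf8.DecodeRuneInString(text[end:])
		leftOK := bsize == 0 || !isWordRune(before)
		rightOK := asize == 0 || !isWordRune(after)
		if leftOK && rightOK {
			hits = append(hits, [2]int{start, end})
		}
		from = end
	}
	return hits
}

func main() {
	text := "Ὡς ἀγαπη τὰ σκηνώματά σου. Ὡς ἀγαπητὰ τὰ σκηνώματά σου."
	fmt.Println(len(findWholeWord(text, "ἀγαπη")), "whole-word match(es)")
	fmt.Println(findWholeWord("sing joyfully with joy", "joy"))
}
```

Because strings.Index compares bytes, the word and the text need to use the same Unicode normalization form (precomposed vs. combining accents) for this to find anything.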

Yeah, sorry, I think in a hurry I misread the description and associated the “not” with “ASCII” rather than “word boundary”…
