Simplifying a bit for clarity here: Let’s say I want a function that strips non-word characters from a string (and lower-cases the result), and I use a regexp for that. It could look like this:
func essence(s string) string {
	re := regexp.MustCompile(`\W`)
	n := strings.ToLower(re.ReplaceAllString(s, ""))
	return n
}
Is my understanding correct, that re will be compiled again and again each time the function is called? That would (at least for more complex regexps) be expensive.
Where would good developers define re?
In main()?
At package level (i.e. before main())?
Does the concept of closures come in handy for this? (Not experienced in that.)
I know there is more than one way to do it, but what is elegant and most maintainable?
Thanks for any smart hints.
First off: I always think it’s important to benchmark things to see just how slow they are (and Go has great tooling to make it easy and this is a big part of Go culture IMO). So let’s benchmark it:
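// Save in a file ending in _test.go and run with:
//   go test -bench=. -benchmem
// (-benchmem adds the B/op and allocs/op columns).
package main

import (
	"regexp"
	"strings"
	"testing"
)

// n is a package-level sink so the compiler cannot discard the result.
var n string

// BenchmarkMyFunction compiles the regexp on every iteration,
// like the original essence function does on every call.
func BenchmarkMyFunction(b *testing.B) {
	for i := 0; i < b.N; i++ {
		re := regexp.MustCompile(`\W`)
		n = strings.ToLower(re.ReplaceAllString(" the string ", ""))
	}
}

// BenchmarkSingleCompile compiles the regexp once, outside the timed loop.
func BenchmarkSingleCompile(b *testing.B) {
	b.StopTimer()
	re := regexp.MustCompile(`\W`)
	b.StartTimer()
	for i := 0; i < b.N; i++ {
		n = strings.ToLower(re.ReplaceAllString(" the string ", ""))
	}
}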
Which produces this:
cpu: Apple M4
BenchmarkMyFunction-10 2152874 527.1 ns/op 935 B/op 16 allocs/op
BenchmarkSingleCompile-10 4599074 257.4 ns/op 56 B/op 4 allocs/op
So - it’s slower to keep recompiling the regular expression (roughly twice as slow here), but my laptop can still crank it out in 527 ns per op. That’s 0.000527 milliseconds. Do things add up, and is it important to write performant code? Yes. But you will likely have other, larger fish to fry in terms of performance. Unless you are calling this in a tight loop, it’s unlikely to matter much.
That said, I think this is a situation where a private global variable is fine. Even idiomatic if you look at the stdlib and other heavy-hitters from the ecosystem. Here are some examples:
So - my opinion is go with something like this:
// findNonWord is a regex to find non-word characters.
var findNonWord = regexp.MustCompile(`\W`)

// essence returns a lower-case version of `s` with all
// non-word characters removed.
func essence(s string) string {
	n := strings.ToLower(findNonWord.ReplaceAllString(s, ""))
	return n
}
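On the closure question from the original post: a closure can do the same job while keeping the compiled regexp out of the package namespace. The following is only a minimal sketch of that idea, not something proposed in the thread. The constructor name newEssence is hypothetical, and essence becomes a package-level function value instead of a plain function.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// newEssence is a hypothetical constructor (not from the thread).
// It compiles the pattern once and returns a closure that reuses
// the compiled regexp on every call.
func newEssence() func(string) string {
	re := regexp.MustCompile(`\W`)
	return func(s string) string {
		return strings.ToLower(re.ReplaceAllString(s, ""))
	}
}

// essence is built once, during package initialization.
// The variable re is reachable only from inside the closure.
var essence = newEssence()

func main() {
	fmt.Println(essence("Hello, World!")) // prints "helloworld"
}
```

For a single regexp this is arguably more ceremony than the private package-level variable, which stays the simpler choice. The closure mainly pays off when the pattern should not be visible to the rest of the package.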
Thank you very much; also for the testing / benchmarking bycatch.
I know that, even with a more complex regex and many calls to my function, it will not make a serious difference in overall performance. But as I prefer to be consistent and clean, I will avoid compiling the regexp again and again and make the regexp global within the package.
At the moment I am wondering if it would make sense to make such a regexp a (package) constant…
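A note on the constant idea: Go constants are limited to boolean, rune, integer, floating-point, complex, and string values, so a compiled *regexp.Regexp cannot be declared with const. What can be a constant is the pattern string itself. A minimal sketch, assuming the hypothetical name nonWordPattern for that constant:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonWordPattern is a hypothetical name; the pattern text is a plain
// string, so it is allowed to be a constant.
const nonWordPattern = `\W`

// findNonWord must be a var: regexp.MustCompile runs at program start
// and returns a *regexp.Regexp, which is not a valid constant type.
var findNonWord = regexp.MustCompile(nonWordPattern)

// essence returns a lower-case version of s with all
// non-word characters removed.
func essence(s string) string {
	return strings.ToLower(findNonWord.ReplaceAllString(s, ""))
}

func main() {
	fmt.Println(essence(" the string ")) // prints "thestring"
}
```

Sharing the variable across the package is safe in practice: the regexp package documents a Regexp as safe for concurrent use by multiple goroutines, apart from configuration methods such as Longest.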