I’ve the below 2 strings, that actually means the same:
GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO
And
Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent
I’ve more than 1000 sets like this, which is panic to do them manually, I tried to do them as:
package main
import (
"fmt"
"strings"
)
func main() {
var str1 = "Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent"
var str2 = "GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO"
cnt := 0
for _, i := range strings.Fields(str1) {
for _, j := range strings.Fields(str2) {
if strings.ToLower(i) == strings.ToLower(j) {
cnt += 1
}
}
}
fmt.Printf("str1 is: %d length, and str2 is: %d length, they have; %d common words.", len(str1), len(str2), cnt)
}
But the match is very low, I got:
str1 is: 197 length, and str2 is: 358 length, they have; 29 common words.
I tried also using Levenshtein_distance as:
// Levenshtein Distance in Golang
package main
import "fmt"
func levenshtein(str1, str2 []rune) int {
s1len := len(str1)
s2len := len(str2)
column := make([]int, len(str1)+1)
for y := 1; y <= s1len; y++ {
column[y] = y
}
for x := 1; x <= s2len; x++ {
column[0] = x
lastkey := x - 1
for y := 1; y <= s1len; y++ {
oldkey := column[y]
var incr int
if str1[y-1] != str2[x-1] {
incr = 1
}
column[y] = minimum(column[y]+1, column[y-1]+1, lastkey+incr)
lastkey = oldkey
}
}
return column[s1len]
}
func minimum(a, b, c int) int {
if a < b {
if a < c {
return a
}
} else {
if b < c {
return b
}
}
return c
}
func main(){
var str1 = []rune("Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent")
var str2 = []rune("GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO")
fmt.Println("Distance between str1 and str2:",levenshtein(str1,str2))
}
But the distance between them looks to be very long, I got:
Distance between str1 and str2: 304
Any idea how can i improve this?