Finding the match percentage between 2 statement

hyousef · October 20, 2020, 7:18pm

I’ve the below 2 strings, that actually means the same:

GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO

And

Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent

I’ve more than 1000 sets like this, which is panic to do them manually, I tried to do them as:

package main

import (
	"fmt"
	"strings"
)

func main() {
	var str1 = "Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent"
	var str2 = "GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO"

	cnt := 0
	for _, i := range strings.Fields(str1) {
		for _, j := range strings.Fields(str2) {
			if strings.ToLower(i) == strings.ToLower(j) {
				cnt += 1
			}
		}
	}
	fmt.Printf("str1 is: %d length, and str2 is: %d length, they have; %d common words.", len(str1), len(str2), cnt)
}

But the match is very low, I got:

str1 is: 197 length, and str2 is: 358 length, they have; 29 common words.

I tried also using Levenshtein_distance as:


// Levenshtein Distance in Golang
package main
import "fmt"
 
func levenshtein(str1, str2 []rune) int {
    s1len := len(str1)
    s2len := len(str2)
    column := make([]int, len(str1)+1)
 
    for y := 1; y <= s1len; y++ {
        column[y] = y
    }
    for x := 1; x <= s2len; x++ {
        column[0] = x
        lastkey := x - 1
        for y := 1; y <= s1len; y++ {
            oldkey := column[y]
            var incr int
            if str1[y-1] != str2[x-1] {
                incr = 1
            }
 
            column[y] = minimum(column[y]+1, column[y-1]+1, lastkey+incr)
            lastkey = oldkey
        }
    }
    return column[s1len]
}
 
func minimum(a, b, c int) int {
    if a < b {
        if a < c {
            return a
        }
    } else {
        if b < c {
            return b
        }
    }
    return c
}
 
func main(){
    var str1 = []rune("Neoprene Rubber Chemical Resistant Gloves, Size: 7; Length: 32 cm; Standard: BS EN 388; Resistant to wide range of Chemicals such as Ethylene Oxide. Make: Polyco, Model: Duraprene III or Equivalent")
    var str2 = []rune("GLOVES: LENGTH: 32 CM MATERIAL: NEOPRENE RUBBER FREE FLOW TEXT: RESISTANT TO WIDE RANGE OF GLOVES, TYPE: CHEMICAL RESISTANT, SIZE: 7, MATERIAL: NEOPRENE RUBBER, STANDARD: BS EN 388/BS EN 374, FFT: RESISTANT TO WIDE RANGE OF CHEMICALS SUCH AS ETHYLENE OXIDE IDEAL FOR LONG TERM HEAVY WORK IN CHEMICAL ENVIRONMENT MANUFACTURER REFERENCES: ORIGINAL_MNFR: POLYCO")
    fmt.Println("Distance between str1 and str2:",levenshtein(str1,str2))
}

But the distance between them looks to be very long, I got:

Distance between str1 and str2: 304

Any idea how can i improve this?

ermanimer · October 20, 2020, 7:33pm

I’d define a struct and write a custom parser.

hyousef · October 20, 2020, 8:46pm

Any example?

ermanimer · October 21, 2020, 4:43am

You can check any yaml, toml or json parser’s code. It is a simple task. For example put an integer field Size(int) in your struct, look for “size: 7” in your string and set size to 7. It will be easier to match.

system · January 19, 2021, 4:43am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.