[SOLVED] String size of 20 characters


(Oussama) #1

Hi,

I would like to create a struct with two string fields, each limited to 20 characters.

For example

type Name struct { firstName, lastName string }
I keep looking for a way to trim the values of firstName and lastName to 20 characters. Any ideas would be a great help!


(Holloway) #2

One easy way is to make those fields unexported (private) and offer getter and setter methods. Example:

package main

import "fmt"

const (
    maxLength = 20
)

type Name struct {
    first string
    last string
}

func (n *Name) Set(firstName string, lastName string) {
    n.first = firstName
    n.last = lastName

    if len(firstName) > maxLength {
        n.first = firstName[:maxLength]
    }

    if len(lastName) > maxLength {
        n.last = lastName[:maxLength]
    }
}

func (n *Name) GetWithFirst(withFirstName bool) string {
    if withFirstName {
        return n.last + ", " + n.first
    }
    return n.last
}

Then you can use the struct like this:

func main() {
	l := &Name{}
	l.Set("Dumitru Margareta Corneliu Leopold Blanca Karol Aeon Ignatius Raphael Maria Niketas A. Shilage", "Mihaly")
	s := l.GetWithFirst(true)
	fmt.Printf("My name is: %s\n", s)
}
// Output:
// My name is: Mihaly, Dumitru Margareta Co

(Oussama) #3

Great! Thank you, Holloway. Using getter and setter methods is indeed a very good approach.
This is exactly what I need.
:slight_smile:
I will mark this as solved in 24 hours; maybe someone has other solutions.


(Johan Dahl) #4

Hi. It works, but not with characters above code point 127. Slicing by bytes can also split a multi-byte character, leaving invalid bytes at the end of the string.

See this https://play.golang.org/p/uVu1egklEuL


(Holloway) #5

Thanks for pointing that out! My bad for not working at the rune level. I learnt something new. :sweat_smile:

Here’s the improved code: https://play.golang.org/p/-vBmeU2oNWg

// Contributed by: Chew, Kean Ho (Holloway), Johan Dahl
//
// main program is about trimming names to 20 characters
package main

import (
	"fmt"
)

const (
	maxLength = 20
)

type Name struct {
	first string
	last  string
}

func (n *Name) Set(firstName string, lastName string) {
	var rs []rune

	n.first = firstName
	if len(firstName) > maxLength {
		rs = []rune(firstName)
		n.first = string(rs[:maxLength])
	}

	n.last = lastName
	if len(lastName) > maxLength {
		rs = []rune(lastName)
		n.last = string(rs[:maxLength])
	}
}

func (n *Name) GetWithFirst(withFirstName bool) string {
	if withFirstName {
		return n.last + ", " + n.first
	}
	return n.last
}

func main() {
	l := &Name{}
	l.Set("Dumitru Margareta Ἄγγελος Leopold Blanca Karol Aeon Ignatius Raphael Maria Niketas A. Shilage", "Mihaly")
	s := l.GetWithFirst(true)
	fmt.Printf("My name is: %s\n", s)
}
// Output:
// My name is: Mihaly, Dumitru Margareta Ἄγ

(Johan Dahl) #6

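One more thing: you can’t use len, because it counts bytes, not characters. A name longer than 20 bytes but shorter than 20 runes would pass the check, and rs[:maxLength] would then reach past the end of the rune slice. You must use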

utf8.RuneCountInString(firstName)

(Oussama) #7

I learned something new too. I didn’t know there were multi-byte characters; I guess that explains what runes are for.


(Holloway) #8

Noted. There is a huge difference between len(..) and the rune count. Playground: https://play.golang.org/p/9zTJ3nB1ceq

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s := "Ἄγγελος"
	
	x := len(s)
	y := utf8.RuneCountInString(s)

	fmt.Printf("length: x=%v y=%v \n", x, y)
}
// Output:
// length: x=15 y=7 

Same here. Glad that I learnt something too! :smile:

Updated code is:

// Contributed by: Chew, Kean Ho (Holloway), Johan Dahl
//
// main program is about trimming names to 20 characters
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	maxLength = 20
)

type Name struct {
	first string
	last string
}

func (n *Name) Set(firstName string, lastName string) {
	var rs []rune

	n.first = firstName
	if utf8.RuneCountInString(firstName) > maxLength {
		rs = []rune(firstName)
		n.first = string(rs[:maxLength])
	}

	n.last = lastName
	if utf8.RuneCountInString(lastName) > maxLength {
		rs = []rune(lastName)
		n.last = string(rs[:maxLength])
	}
}

func (n *Name) GetWithFirst(withFirstName bool) string {
	if withFirstName {
		return n.last + ", " + n.first
	}
	return n.last
}

func main() {
	l := &Name{}
	l.Set("Dumitru Margareta Ἄγγελος Leopold Blanca Karol Aeon Ignatius Raphael Maria Niketas A. Shilage", "Mihaly")
	s := l.GetWithFirst(true)
	fmt.Printf("My name is: %s\n", s)
}

// Output:
// My name is: Mihaly, Dumitru Margareta Ἄγ

Playground: https://play.golang.org/p/HhlSevmcC0c


(Qi Yin) #9

I think we should find a better way to truncate strings.
I compared the performance of the following three methods:

package main

import (
	"testing"
	"unicode/utf8"
	"unsafe"
)

func SubStrA(s string, length int) string {
	if utf8.RuneCountInString(s) > length {
		rs := []rune(s)
		return string(rs[:length])
	}

	return s
}

func SubStrB(s string, length int) string {
	var size, n int
	for i := 0; i < length && n < len(s); i++ {
		_, size = utf8.DecodeRuneInString(s[n:])
		n += size
	}

	return s[:n]
}

func SubStrC(s string, length int) string {
	var size, n int
	for i := 0; i < length && n < len(s); i++ {
		_, size = utf8.DecodeRuneInString(s[n:])
		n += size
	}

	b := make([]byte, n)
	copy(b, s[:n])
	return *(*string)(unsafe.Pointer(&b))
}

var s = "Go语言是Google开发的一种静态强类型、编译型、并发型,并具有垃圾回收功能的编程语言。为了方便搜索和识别,有时会将其称为Golang。"

func BenchmarkSubStrA(b *testing.B) {
	for i := 0; i < b.N; i++ {
		SubStrA(s, 20)
	}
}

func BenchmarkSubStrB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		SubStrB(s, 20)
	}
}

func BenchmarkSubStrC(b *testing.B) {
	for i := 0; i < b.N; i++ {
		SubStrC(s, 20)
	}
}
goos: darwin
goarch: amd64
BenchmarkSubStrA-8        745708              1624 ns/op             336 B/op          2 allocs/op
BenchmarkSubStrB-8       9568920               122 ns/op               0 B/op          0 allocs/op
BenchmarkSubStrC-8       7274718               157 ns/op              48 B/op          1 allocs/op
PASS
ok      command-line-arguments  4.782s

The performance gap is huge. Whether the result should keep referring to the original string depends on your purpose; improper use may lead to memory leaks.

Even though SubStrC allocates a copy of the substring, it is still nearly ten times faster than SubStrA.
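If a detached copy is wanted but the unsafe.Pointer cast in SubStrC feels risky, a possible alternative (a sketch, not something benchmarked in this thread) is to find the cut point the same way as SubStrB and then copy with strings.Clone, which is available from Go 1.18. SubStrD is a hypothetical name chosen to fit the existing benchmark; a BenchmarkSubStrD in the same style could be added to compare it, but no results are claimed here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// SubStrD is a hypothetical variant of SubStrC: it walks the first
// length runes to find the byte offset, then copies that prefix with
// strings.Clone (Go 1.18+) instead of converting []byte via unsafe.
func SubStrD(s string, length int) string {
	n := 0
	for i := 0; i < length && n < len(s); i++ {
		_, size := utf8.DecodeRuneInString(s[n:])
		n += size
	}
	// The clone owns its memory, so it does not keep s alive.
	return strings.Clone(s[:n])
}

func main() {
	fmt.Println(SubStrD("Go语言是Google开发的一种静态强类型", 5)) // Go语言是
}
```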


(Holloway) #10

Wow, that’s a good finding. :+1: Here are the benchmark results from Linux amd64:

goos: linux
goarch: amd64
pkg: gosandbox
BenchmarkSubStrA-8        857563              1297 ns/op
BenchmarkSubStrB-8      10494722               114 ns/op
BenchmarkSubStrC-8       8502182               142 ns/op
PASS
ok      gosandbox       3.786s

(Oussama) #11

Amazing, guys! Thank you for your replies, especially for the benchmarks :slight_smile:


(Holloway) #12

Yeah. It’s an eye-opening experience for me too! Here’s the latest code applying Qi Yin’s solution:

// Contributed by: Chew, Kean Ho (Holloway), Johan Dahl, Qi Yin
//
// main program is about trimming names to 20 characters
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	maxLength = 20
)

type Name struct {
	first string
	last  string
}

func (n *Name) trim(s string, length int) string {
	var size, x int

	for i := 0; i < length && x < len(s); i++ {
		_, size = utf8.DecodeRuneInString(s[x:])
		x += size
	}

	return s[:x]
}

func (n *Name) Set(firstName string, lastName string) {
	n.first = n.trim(firstName, maxLength)
	n.last = n.trim(lastName, maxLength)
}

func (n *Name) GetWithFirst(withFirstName bool) string {
	if withFirstName {
		return n.last + ", " + n.first
	}

	return n.last
}

func main() {
	l := &Name{}
	l.Set("Dumitru Margareta Ἄγγελος Leopold Blanca Karol Aeon Ignatius Raphael Maria Niketas A. Shilage", "Mihaly")
	s := l.GetWithFirst(true)
	fmt.Printf("My name is: %s\n", s)
}

// Output:
// My name is: Mihaly, Dumitru Margareta Ἄγ

(Qi Yin) #13

This thread gave me new inspiration, and I used it to improve the performance of exstrings.Reverse, which made me feel very good.


(Holloway) #14

@thinkeridea, do you have case studies, details, or examples supporting this point? I’m getting a bit nervous about byte manipulation in Go. It looks as if, in the future, we can only use runes and not bytes.

By byte manipulation I mean manipulating streams of binary/hex data. As far as I understand, a rune is a character that can be encoded as several bytes (so chopping a byte slice is not the same as chopping by characters).


(Qi Yin) #15

Below is a simple example of taking a substring by slicing. No memory is copied; only a new reference to the same underlying data is created. Many functions in the standard library also return substrings this way.

This is simple and efficient because no new memory is allocated.

However, if the original string is very large and short-lived while the substring is very small and long-lived, the GC cannot release the original string’s memory for as long as the substring is alive, even though the rest of it is no longer used. This causes a memory leak.

Most real programs have such problems, but they are usually tolerable: minor, short-lived issues that can be ignored unless you are pursuing extreme performance.

We should still pay attention to this issue and deal with it when necessary, but that is rarely needed, because such extreme situations seldom occur.

Here is an introduction to strings, written in Chinese, that may help: [string 优化误区及建议 (String optimization pitfalls and suggestions)](https://blog.thinkeridea.com/201902/go/string_ye_shi_yin_yong_lei_xing.html)

package main

import (
	"fmt"
	"reflect"
	"strings"
	"unsafe"
)

func main() {
	s := strings.Repeat("A", 2000)

	s1 := s[5:10]

	fmt.Println((*reflect.StringHeader)(unsafe.Pointer(&s)).Data)  // e.g. 824634433536 (address varies per run)
	fmt.Println((*reflect.StringHeader)(unsafe.Pointer(&s1)).Data) // e.g. 824634433541 (always 5 bytes after s)

	// Viewing s as a []byte this way copies nothing; it reinterprets the string data stored in s as bytes.
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bs := *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
		Data: sh.Data,
		Len:  sh.Len,
		Cap:  sh.Len,
	}))

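	// Overwrite bytes 5 to 9, the same memory that s1 refers to.
	// (Demonstration only: writing to string memory through unsafe breaks Go's string immutability.)
	copy(bs[5:10], strings.Repeat("B", 5))

	// s1 has changed too, which shows that it shares memory with s.
	fmt.Println(s1) // BBBBB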
}


(Holloway) #16

I see. That’s how the leak appears.

In that case, I’m still safe using []byte manipulation as long as I’m not dealing with actual runes (e.g. UTF-8 text). Otherwise, I need to switch to []rune instead.

Some of the things I do with []byte are I/O processing of binary data streams (e.g. SPI/I2C).

Thank you!


(Qi Yin) #17

I designed the functions exutf8.RuneIndexInString and exutf8.RuneIndex, which return the byte index corresponding to a given number of characters; this speeds up finding the position at which to cut a string. On top of them I wrapped easier-to-use truncation functions, exutf8.RuneSubString and exutf8.RuneSub, which also have aliases that are easier to find: exstrings.SubString and exbytes.Sub.
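The idea of "rune count in, byte index out" can be sketched with a plain for-range loop, which yields the byte offset of each rune without an explicit decode call. This is not the go-extend implementation and not necessarily what the SubStrRange benchmark below uses; runeIndexInString and runeSubString are hypothetical names for illustration only.

```go
package main

import "fmt"

// runeIndexInString (hypothetical) returns the byte offset where the
// n-th rune (0-based) of s starts, or len(s) if s has n runes or fewer.
func runeIndexInString(s string, n int) int {
	if n <= 0 {
		return 0
	}
	count := 0
	// Ranging over a string yields the byte offset i of each rune.
	for i := range s {
		if count == n {
			return i
		}
		count++
	}
	return len(s)
}

// runeSubString (hypothetical) returns the first n runes of s as a
// slice of s, so no bytes are copied.
func runeSubString(s string, n int) string {
	return s[:runeIndexInString(s, n)]
}

func main() {
	s := "Ἄγγελος"
	fmt.Println(runeIndexInString(s, 2)) // 5
	fmt.Println(runeSubString(s, 2))     // Ἄγ
}
```

Because the result is a slice of s, it has the same lifetime trade-off discussed earlier in the thread.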

I tested and compared the performance of various methods in exunicode/exutf8/benchmark/sub_string_test.go. The test results are as follows:

$ go test -benchmem -bench="."                                                        
goos: darwin
goarch: amd64
pkg: github.com/thinkeridea/go-extend/exunicode/exutf8/benchmark
BenchmarkSubStrRunes-8                    875361              1511 ns/op             336 B/op          2 allocs/op
BenchmarkSubStrRange-8                  11738449                96.7 ns/op             0 B/op          0 allocs/op
BenchmarkSubStrDecodeRuneInString-8     11425912               111 ns/op               0 B/op          0 allocs/op
BenchmarkSubStrRuneIndexInString-8      14508450                82.0 ns/op             0 B/op          0 allocs/op
BenchmarkSubStrRuneSubString-8          14334190                82.3 ns/op             0 B/op          0 allocs/op
PASS
ok      github.com/thinkeridea/go-extend/exunicode/exutf8/benchmark     7.447s

Although RuneSubString is slightly slower than RuneIndexInString, it provides a more flexible and easier-to-use interface.

This is the best result I can achieve at present, and I hope it can be improved further.