Parse a String using delimiters

Hello and greetings,

I’m a new Gopher and I’ve started migrating some Python code to Go, but I got stuck at a dead end. Sorry for posting some .py lines here, but I have some doubts about the best (fastest) way to do this in Go.

Functions explanation:
Calling parsertoken("_My input.string", " _,.", 2) returns "input".
Calling parsercount("Go=-rocks!", " =-") returns 2.

def parsertoken(istring, idelimiters, iposition):
    """
    Return a specific token of a given input string,
    considering its position and the provided delimiters

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :param iposition: 1-based position of the token
    :return: token
    """
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    return vlist[iposition - 1]

def parsercount(istring, idelimiters):
    """
    Return the number of tokens in the input string
    considering the delimiters provided

    :param istring: raw input string
    :param idelimiters: delimiters to split the tokens
    :return: the number of tokens found
    """
    vlist = ''.join([s if s not in idelimiters else ' ' for s in istring]).split()
    return len(vlist)

Since I really care about speed, I’m thinking of changing the former API in my Go implementation, mainly because getting multiple tokens from a string means splitting the string every single time.

Cheers

The strings package should be all you need.

As for checking the speed/memory allocation of your implementation, I suggest you use the -bench=. and -benchmem flags when running your tests (go test ./... -bench=. -benchmem).

If you haven’t written a benchmark in Go yet, here’s an example:

  • Create a file called parser.go;
  • Create another file called parser_test.go in the same directory as parser.go. Note the _test suffix; it is required.
  • In parser.go, put your code, something like:
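package myparser

// ParseToken is a placeholder: replace the body with your real parsing logic.
func ParseToken(str string) string {
	theTokenParsed := str // do the stuff
	return theTokenParsed
}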
  • And then, in your parser_test.go, write the following:
package myparser

import (
	"testing"
)

// The Benchmark prefix is obligatory
func BenchmarkMyAwesomeTokenParser(b *testing.B) {
	exStr := "abc_def|123"

	for i := 0; i < b.N; i++ {
		ParseToken(exStr)
	}
}
  • Go will then run the loop b.N times, increasing b.N until it has enough data to report timing (and, with -benchmem, allocation) figures for your parser;
  • Run: go test ./... -bench=. -benchmem
  • Check the results, refactor and repeat.
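For a quick one-off measurement outside go test, the testing package also exposes testing.Benchmark, which runs a benchmark function from an ordinary program; its documentation says to call testing.Init first in that case. The sketch below uses a hypothetical parseFirst in place of ParseToken and stores each result in a package-level variable so the compiler cannot discard the call.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink receives each result so the call under test is not optimized away.
var sink string

// parseFirst is a hypothetical stand-in for ParseToken: it returns the first
// token delimited by '_' or '|', or "" if there is none.
func parseFirst(s string) string {
	fields := strings.FieldsFunc(s, func(r rune) bool { return r == '_' || r == '|' })
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

func main() {
	testing.Init() // registers the testing flags; needed outside `go test`

	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = parseFirst("abc_def|123")
		}
	})
	// Prints iterations and ns/op, then B/op and allocs/op.
	fmt.Println(res.String(), res.MemString())
}
```

The _test.go route above is still the usual workflow; this is only handy for throwaway comparisons.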

I hope that helps.

Hey @Alfred,

Unless you are trying to accomplish something else, this should be all you need to rewrite your code from Python to Go.

You should look into the strings.Split family of functions, and for counting results, just use the built-in len() function.

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Split functions: https://golang.org/pkg/strings/#Split.

	// parsertoken.
	//
	// Split a string on a `,` and return position 0.
	fmt.Println(strings.Split("Hello, World", ",")[0])

	// parsercount.
	//
	// Split a string on a `,` and return how many results
	// were returned.
	fmt.Println(len(strings.Split("Hello, World", ",")))
}

Edit: I just noticed you were actually splitting on multiple delimiters, so in that case, you could use something like the following for parsertoken:

func parsertoken(input, delims string, pos int) string {
	// Change all delims in string to spaces.
	for _, r := range delims {
		input = strings.Replace(input, string(r), " ", -1)
	}
	// Split the input on spaces.
	return strings.Split(input, " ")[pos]
}

And as for parsercount, you could also use the same technique and write it like so:

func parsercount(input, delims string) int {
	// Change all delims in string to spaces.
	for _, r := range delims {
		input = strings.Replace(input, string(r), " ", -1)
	}
	// Split the input on spaces and return the number of items.
	return len(strings.Split(input, " "))
}
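Note that splitting on a single space keeps empty elements: for the question's "Go=-rocks!" example, the two adjacent delimiters become two spaces, so parsercount above returns 3 rather than the expected 2. If a count is all that is needed, one option is to count runs of non-delimiter runes directly, which skips empty tokens and allocates nothing. countTokens is a hypothetical name for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// countTokens (hypothetical name) reports how many non-empty tokens input
// contains when split on any rune in delims. It walks the string once and
// does not allocate.
func countTokens(input, delims string) int {
	count := 0
	inToken := false
	for _, r := range input {
		if strings.ContainsRune(delims, r) {
			inToken = false // a delimiter ends the current token, if any
		} else if !inToken {
			inToken = true // first rune of a new token
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countTokens("Go=-rocks!", " =-"))       // 2
	fmt.Println(countTokens("_My input.string", " _,.")) // 3
}
```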

Edit 2: I just realized you might not always want to split on spaces too, as the functions I posted do, but I’m sure you get the gist of it, since it would be easy to fix using those types of functions lol.

So anyway, after my last edit I quickly wrote a different example. Here’s something else you could use; it’s fast and makes very few allocations when you re-use the same delimiters:

package main

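import (
	"fmt"
	"unicode/utf8"
)

// parserToken splits input on every rune found in delims and returns the
// token at index pos (0-based). Empty tokens are kept, so the leading "_"
// in "_My input.string" produces an empty first token.
func parserToken(input string, delims map[rune]struct{}, pos int) string {
	splits := []string{}

	prev := 0
	for i, r := range input {
		if _, ok := delims[r]; ok {
			splits = append(splits, input[prev:i])
			// Skip the whole delimiter, which may be more than one byte.
			_, width := utf8.DecodeRuneInString(input[i:])
			prev = i + width
		}
	}
	splits = append(splits, input[prev:])

	return splits[pos]
}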

func main() {
	input := "_My input.string"
	delims := " _,."

	delimsMap := make(map[rune]struct{})
	for _, r := range delims {
		delimsMap[r] = struct{}{}
	}
	fmt.Println(parserToken(input, delimsMap, 2))
}

Benchmarks: 2000000 492 ns/op 112 B/op 3 allocs/op

Even compared with my first example (if you still wanted to always split on spaces), this one is about as fast and makes fewer allocations and fewer bytes allocated per op.

Benchmarks from first example:
2000000 486 ns/op 144 B/op 9 allocs/op
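If only one token is needed per call, a variation of parserToken can stop as soon as it reaches the requested field instead of collecting every split into a slice, which should avoid the slice allocations. The sketch below keeps parserToken's semantics (0-based, empty fields count, so position 2 of "_My input.string" is still "input") but reports a missing position with a boolean instead of panicking. tokenAt is a hypothetical name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// tokenAt (hypothetical name) returns the field at index pos (0-based) of
// input, where fields are separated by any rune in delims. Empty fields
// count, as in parserToken. ok is false if there is no such field.
func tokenAt(input string, delims map[rune]struct{}, pos int) (token string, ok bool) {
	start, field := 0, 0
	for i, r := range input {
		if _, isDelim := delims[r]; !isDelim {
			continue
		}
		if field == pos {
			return input[start:i], true
		}
		field++
		// Skip the whole delimiter, whatever its byte width.
		_, width := utf8.DecodeRuneInString(input[i:])
		start = i + width
	}
	if field == pos {
		return input[start:], true
	}
	return "", false
}

func main() {
	delims := map[rune]struct{}{' ': {}, '_': {}, ',': {}, '.': {}}

	tok, ok := tokenAt("_My input.string", delims, 2)
	fmt.Println(tok, ok) // input true

	_, ok = tokenAt("_My input.string", delims, 9)
	fmt.Println(ok) // false
}
```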

Thank you all!

@Benjamin, your latest code rocks!

It’s faster than a Dlang counterpart and it’s as fast as Rust. I liked the way you did parserToken.

Cheers.

Cool, glad to help!
