Easy way for letter substitution (reverse complementary DNA sequence)

I want to get the reverse complement of a DNA sequence while preserving upper and lower case.
For example, this is easy in Perl, as shown below.

$sequence = "ATgcccA";
$reversecomplementary = reverse $sequence;
$reversecomplementary =~ tr/ATGCatgc/TACGtacg/;

This gives TgggcAT from ATgcccA.

First I tried to get just the complementary sequence, that is, TAcgggT from ATgcccA.
But when I use replace in golang one letter at a time (in A, T, G, C order, handling both cases at each step), the intermediate results are TTgcccT, AAgcccA, AAccccA, and finally AAggggA, instead of the TAcgggT I want.
I could probably work around this in some complicated way, but that is annoying.
Is there an easy way to substitute multiple letters at the same time in golang?
Thanks in advance.
Complementary: A > T, T > A, G > C, C > G
Reverse: reverse order
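
To illustrate the problem, here is a minimal sketch that applies strings.ReplaceAll one letter at a time. The helper name naiveComplement is made up for this example. Each pass overwrites the result of the previous one, so the A/T swap and the g/c swap each collapse into a single letter.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveComplement is a hypothetical helper that applies the substitutions
// one after another, which is the approach that goes wrong in the question.
func naiveComplement(s string) string {
	pairs := [][2]string{
		{"A", "T"}, {"T", "A"}, {"G", "C"}, {"C", "G"},
		{"a", "t"}, {"t", "a"}, {"g", "c"}, {"c", "g"},
	}
	for _, p := range pairs {
		// Every earlier replacement is visible to the later ones.
		s = strings.ReplaceAll(s, p[0], p[1])
	}
	return s
}

func main() {
	fmt.Println(naiveComplement("ATgcccA")) // AAggggA, not the wanted TAcgggT
}
```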


Hmm. I think that I should make slices and edit them.
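
A rough sketch of that slice idea, assuming the input is plain ASCII: walk the string once, complement each byte, and write it into a new byte slice from the back. The names complementByte and reverseComplement are invented here, and bytes other than A, T, G, C (for example N) are simply copied through unchanged.

```go
package main

import "fmt"

// complementByte returns the complement of a single base, preserving case.
// Any other byte is returned unchanged (an assumption of this sketch).
func complementByte(b byte) byte {
	switch b {
	case 'A':
		return 'T'
	case 'T':
		return 'A'
	case 'G':
		return 'C'
	case 'C':
		return 'G'
	case 'a':
		return 't'
	case 't':
		return 'a'
	case 'g':
		return 'c'
	case 'c':
		return 'g'
	}
	return b
}

// reverseComplement complements and reverses s in a single pass.
func reverseComplement(s string) string {
	out := make([]byte, len(s))
	for i := 0; i < len(s); i++ {
		// The byte at position i lands at the mirrored position.
		out[len(s)-1-i] = complementByte(s[i])
	}
	return string(out)
}

func main() {
	fmt.Println(reverseComplement("ATgcccA")) // TgggcAT
}
```

Because every byte is read from the original string and written to a separate slice, no substitution can be undone by a later one.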

See the Go strings package documentation for string functions: https://golang.org/pkg/strings/

Use the strings.Replacer type.

package main

import (
	"fmt"
	"strings"
)

var dnaComplement = strings.NewReplacer(
	"A", "T", "T", "A", "G", "C", "C", "G",
	"a", "t", "t", "a", "g", "c", "c", "g",
)

// rcDNA returns the reverse complement of a DNA sequence, preserving case.
func rcDNA(s string) string {
	c := dnaComplement.Replace(s)
	rc := make([]byte, len(c))
	for i, j := 0, len(rc)-1; i < len(rc); i, j = i+1, j-1 {
		rc[i] = c[j]
	}
	return string(rc)
}

func main() {
	// TgggcAT from ATgcccA
	fmt.Println(rcDNA("ATgcccA"))
}

https://play.golang.org/p/IXI6PY7XUXN

Thanks for your advice. I had read some documentation, but some of it was poor, or I couldn't understand it because I am not strong in computer science or golang.
I had to write this in golang instead of Perl because of the size of my data.
From now on I will search the documentation with more appropriate keywords.
Enjoy your weekend.

Since you say you have big data, I ran a benchmark to check performance. I have revised my answer so that the rcDNA function is significantly faster. NewReplacer now runs once at the start of the program rather than each time the rcDNA function is called.
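
For anyone who wants to measure this themselves, here is one possible sketch using testing.Benchmark from the standard library. It is not the benchmark referred to above: the function names complementShared and complementPerCall and the test sequence are assumptions made for illustration, and the timings will differ from machine to machine.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sharedReplacer is built once, when the program starts.
var sharedReplacer = strings.NewReplacer(
	"A", "T", "T", "A", "G", "C", "C", "G",
	"a", "t", "t", "a", "g", "c", "c", "g",
)

// complementShared reuses the package-level replacer.
func complementShared(s string) string {
	return sharedReplacer.Replace(s)
}

// complementPerCall builds a new replacer on every call.
func complementPerCall(s string) string {
	r := strings.NewReplacer(
		"A", "T", "T", "A", "G", "C", "C", "G",
		"a", "t", "t", "a", "g", "c", "c", "g",
	)
	return r.Replace(s)
}

func main() {
	seq := strings.Repeat("ATgcccA", 100) // arbitrary test input

	shared := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			complementShared(seq)
		}
	})
	perCall := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			complementPerCall(seq)
		}
	})

	fmt.Println("shared replacer:  ", shared)
	fmt.Println("per-call replacer:", perCall)
}
```

Both functions return the same string. Only the cost of constructing the replacer differs, and that cost matters most when the function is called many times on short sequences.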

Thank you very much. I will try your new answer. I checked the current size of my output file. Running times seem to vary widely depending on the algorithms and functions used.
