Generating 5million records in loop and inserting these records to couchbase db using golang to check performance

Hello All,

Am new to GO doing POC for my project .Can anyone please help me with the sample program to generate 5million records in loop and inserting these records to couchbase db using golang to check performance .

Any sample program to run and check the performance would be very helpful due to timeline constraint .

Do you have anythin already you can show use? You will need:

  • a Couchbase DB installation
  • a Couchbase DB client library for Go
  • a simple Go program to connect to the DB and generate and insert records

What exactly do you want to measure? The Go part will not be the problem. If anything affects performance, it is how fast the DB can insert records and how fast or slow your network connection is.

2 Likes

Benchmarking databases or performance testing is not an easy task, first you should have full understanding of the database/s. You need to define correct schemes/tables/collections/indexes/analysers/engines in the given db and then you can run your tests. If you don’t have strong knowledge on the dbs, I recommend you to make research, check existing benchmark in your use case and then consult someone who knows better.

If it is a fun project, start from somewhere, do your tests, try to increase performance, decrease resource usages, and then if you think this is all you can do, ask for help.

i already have completed all these steps . Below is my program . I have strong knowledge on database side but am new to GOLANG ,writting program for testing is taking more time for me than defined timeline for my project .Could you please help me to modify my code how can i insert 5lakhs record by configuring 5 threads , each thread inserts 1lakh records parallelly

package main

import (
    "fmt"
    "gopkg.in/couchbase/gocb.v1"
    "strconv"
    "github.com/zenthangplus/goccm"
    "time"
    "runtime"
)

var  (    
    bucket *gocb.Bucket
)
type Person struct {
	FirstName, LastName string
}

func main(){
    cluster, err := gocb.Connect("couchbase://ip") //Connects to the cluster
    if err != nil {
        fmt.Println(err.Error())
        return
    }
    cluster.Authenticate(gocb.PasswordAuthenticator{
        Username: "",
        Password: "",
    })
    fmt.Println("Cluster:%v",cluster)
        bucket, err = cluster.OpenBucket("", "") //Connects to the bucket
        if err != nil {
            fmt.Println(err.Error())
            return
        }
    fmt.Println("Bucket:%v",bucket)

    // Limit 3 goroutines to run concurrently.
    c := goccm.New(3)
        dt := time.Now()
    fmt.Println("Start Current date and time is: ", dt.String())
    fmt.Println(runtime.NumCPU())
    for i := 1; i <= 500000; i++ {
        fmt.Println("Starting new thread... %v",i )
        person := Person{FirstName: "Syeda", LastName: "Anjum"}
        test := "Go_Demo_"+strconv.Itoa(i)    
        // This function have to call before any goroutine
        c.Wait()
        go createDocument(test, &person)
        fmt.Println("Completed thread... %v",i)
        c.Done()
    }
    fmt.Println(runtime.NumCPU())
    // This function have to call to ensure all goroutines have finished after close the main program.
    c.WaitAllDone()
    fmt.Println("End Current date and time is: ", dt.String())
    fmt.Println(runtime.NumCPU())
} 

func createDocument(documentId string, person *Person) {
	fmt.Println("Upserting a full document...")
	_, error := bucket.Upsert(documentId, person, 0)
	if error != nil {
		fmt.Println(error.Error())
		return
	}
	fmt.Println("After Upsert fetching the document")
}

i need to check how much time golang takes to perform bulk insert and get operartion from to couchbase .How it is more efficient than other programming languages and how spawning threads i can implement through GO and check the performance as i have tested in c++ like below

Number of Requests per thread : 100000
Total Number of threads       : 5
WRITE PROPORTION              : 50%
READ  PROPORTION              : 50%
THREAD_ID 3 TOTAL WRITE : 50000, TOTAL READ : 50000, TOTAL OPERATION TIME : 30.13 S, AVG_LATENCY = 0.000301 S, OPS_PER_SECOND = 3319.12 
THREAD_ID 2 TOTAL WRITE : 50000, TOTAL READ : 50000, TOTAL OPERATION TIME : 30.22 S, AVG_LATENCY = 0.000302 S, OPS_PER_SECOND = 3309.02 
THREAD_ID 1 TOTAL WRITE : 50000, TOTAL READ : 50000, TOTAL OPERATION TIME : 30.25 S, AVG_LATENCY = 0.000303 S, OPS_PER_SECOND = 3305.39 
THREAD_ID 0 TOTAL WRITE : 50000, TOTAL READ : 50000, TOTAL OPERATION TIME : 30.29 S, AVG_LATENCY = 0.000303 S, OPS_PER_SECOND = 3301.78 
THREAD_ID 4 TOTAL WRITE : 50000, TOTAL READ : 50000, TOTAL OPERATION TIME : 30.31 S, AVG_LATENCY = 0.000303 S, OPS_PER_SECOND = 3298.83

Here are an example worker/thread implementation in go. You can’t control OS thread in go, but you can limit go scheduler to use only specified number go routine on the given jobs.

We need first limited number go routine to reach 5 concurrency as you desire:

for i := 0; i < 5; i++ {
	// create jobs per worker/thread
	workerJobs := make(chan interface{}, jobsCount)
	jobs = append(jobs, workerJobs)
	go worker(i, workerJobs, results)
}

and then send jobs to all created workers

for _, job := range jobs {
	for j := 1; j <= jobsCount; j++ {
		job <- j
	}
	close(job)
}

Wait all workers to complete their jobs

    // wait all workers to complete their jobs
	for a := 1; a <= jobsCount*len(jobs); a++ {
		//wait all the workers to return results
		<-results
	}

play link

1 Like

Thank you so much for your timely help ,can you please calrify few doubts?

  1. Am calling createDocument function as goroutine , shall i call it as normal function ?
  2. where do i need to call my function for createDocument when i use the code provided by you ?

Change the jobs from workerJobs := make(chan interface{}, jobsCount) to workerJobs := make(chan Person, jobsCount)

And call createDocument in worker:

func worker(id int, jobs <-chan Person, results chan<- interface{}) {
	// do your thread measurements here
	fmt.Printf("worker %d started \n", id)
	for person := range jobs {
        createDocument("randomdoc", person)
		results <- job
	}
	fmt.Printf("worker %d completed \n", id)
}
1 Like

modified my code as suggested ,but am stuck on this below error. Tried declaring j as struct but couldn’t get to use it later for for operation

command-line-arguments

./goprocs.go:54:8: cannot use j (type int) as type Person in send

package main

import (
    "fmt"
    "gopkg.in/couchbase/gocb.v1"
)   
    
var  (    
    bucket *gocb.Bucket
)   
type Person struct {
    FirstName, LastName string
}

func main() {
	// if you have more than 5 core, make it 5 to use 5 os thread
	//runtime.GOMAXPROCS(5)
	jobsCount := 50
    cluster, err := gocb.Connect("couchbase://") //Connects to the cluster
    if err != nil {
        fmt.Println(err.Error())
        return
    }
    cluster.Authenticate(gocb.PasswordAuthenticator{
        Username: "",
        Password: "",
    })
    fmt.Println("Cluster:%v",cluster)
        bucket, err = cluster.OpenBucket("", "") //Connects to the bucket
        if err != nil {
            fmt.Println(err.Error())
            return
        }
    fmt.Println("Bucket:%v",bucket)


    person := Person{FirstName: "Syeda", LastName: "Anjum"}
    var jobs []chan Person
    results := make(chan interface{}, jobsCount)

    //jobs := make(chan int, jobsCount)
    //results := make(chan int, jobsCount)

    fmt.Println("value of results: %v",&results)

	for i := 0; i < 5; i++ {
		// create jobs per worker/thread
		workerJobs := make(chan Person, jobsCount)
		jobs = append(jobs, workerJobs)
		go worker(i, workerJobs, results)
	}
	for _, job := range jobs {
		for j:= 1; j <= jobsCount; j++ {
			job <- j
		}
		close(job)
	}
	
	// wait all workers to complete their jobs
	for a := 1; a <= jobsCount*len(jobs); a++ {
		//wait all the workers to return results
		<-results
	}
	fmt.Println("finished")

}
func worker(id int, jobs <-chan Person, results chan<- interface{}) {
	// do your thread measurements here
	fmt.Printf("worker %d started \n", id)
	for person := range jobs {
        createDocument("randomdoc", &person)
		results <- person
	}
	fmt.Printf("worker %d completed \n", id)
}
func createDocument(documentId string, person *Person) {
    fmt.Println("Upserting a full document...")
    _, error := bucket.Upsert(documentId, person, 0)
    if error != nil {
        fmt.Println(error.Error())
        return
    }
    fmt.Println("After Upsert fetching the document")
}

must be

var jobs chan Person

Convert this:

 for _, job := range jobs {
 		for j := 1; j <= jobsCount; j++ {
 			job <- j
 		}
 		close(job)
 	}

to this:

for _, job := range jobs {
		for j := 1; j <= jobsCount; j++ {
			job <- Person{FirstName: "Syeda", LastName: "Anjum"}
		}
		close(job)
	}
1 Like
  1. variable job defined in main() is not being picked in function worker ,used the same function worker given by you . i edited it to jobs it worked ,but only one document inserted in COUCHBASE

command-line-arguments

./goprocs.go:63:14: undefined: job

  1. Didn’t understand what this below job does?

     for _, job := range jobs {
     for j := 1; j <= jobsCount; j++ {
     	job <- Person{FirstName: "Syeda", LastName: "Anjum"}
     }
     close(job)
    }
    

Could you please check the ## comments mentioned in code ?
Below is the complete code

package main

import (
    "fmt"
    "gopkg.in/couchbase/gocb.v1"
)   
    
var  (    
    bucket *gocb.Bucket
)   
type Person struct {
    FirstName, LastName string
}

func main() {
	// if you have more than 5 core, make it 5 to use 5 os thread runtime.GOMAXPROCS(5)
	jobsCount := 500
    cluster, err := gocb.Connect("couchbase://10.10.216.53") //Connects to the cluster
    if err != nil {
        fmt.Println(err.Error())
        return
    }
    cluster.Authenticate(gocb.PasswordAuthenticator{
        Username: "golang",
        Password: "mavenir",
    })
    fmt.Println("Cluster:%v",cluster)
        bucket, err = cluster.OpenBucket("golang", "") //Connects to the bucket
        if err != nil {
            fmt.Println(err.Error())
            return
        }
    fmt.Println("Bucket:%v",bucket)

    var jobs []chan Person
    results := make(chan interface{}, jobsCount)

	for i := 0; i < 5; i++ {
		// create jobs per worker/thread
		workerJobs := make(chan Person, jobsCount)
		jobs = append(jobs, workerJobs)
		go worker(i, workerJobs, results)    ########### jobs should be passed right instead of workerJobs? i tried passing jobs got error ./goprocs.go:43:18: cannot use jobs (type []chan Person) as type <-chan Person in argument to worker

	}
	for _, job := range jobs {
		for j := 1; j <= jobsCount; j++ {
			job <- Person{FirstName: "Syeda", LastName: "Anjum"}
		}
		close(job)
	}
// wait all workers to complete their jobs
	for a := 1; a <= jobsCount*len(jobs); a++ {
		//wait all the workers to return results
		<-results
	}
	fmt.Println("finished")

}

func worker(id int, jobs <-chan Person, results chan<- interface{}) {
	// do your thread measurements here
	fmt.Printf("worker %d started \n", id)
	for person := range jobs {
        createDocument("randomdoc", &person)
		results <- jobs
	}
	fmt.Printf("worker %d completed \n", id)
}
func createDocument(documentId string, person *Person) int {
    //fmt.Println("Upserting a full document...")
    _, error := bucket.Upsert(documentId, person, 0)
    if error != nil {
        fmt.Println(error.Error())
        return -1
    }
    return 1
}

I removed all the codes related to couch base:

https://play.golang.org/p/XKUyquqMSep

########### jobs should be passed right instead of workerJobs? i tried passing jobs got error

no, you should pass workerJobs because you want to run exactly same amount of jobs per worker/thread

1 Like

Thank you so much for your extreme help , Am able to insert records with multiple threads configured . Must appreciate your timely response for solving the problem :slight_smile: Thanks again

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.