Utility to insert a huge number of documents into Couchbase at once

Hi,

I want to create a utility which reads a huge number of JSON documents (approx. 1 lakh, i.e. 100,000) from a CSV/flat file and bulk-inserts them into Couchbase in a single pass. Since the array buffer size is limited, what I currently have to do is divide the data into smaller batches and push each one with a separate bucket.Do() operation. This has made the insertion very slow.

Can anyone suggest an alternative approach?

Why is this a problem? Can’t you just read the input from STDIN document by document? You don’t have to read everything into memory at once.
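For illustration, a minimal sketch of that line-by-line approach with bufio.Scanner, assuming the tab-separated ID/JSON format described in the next post (the print is just a placeholder for the per-document processing step):

package main

import (
    "bufio"
    "fmt"
    "log"
    "os"
    "strings"
)

func main() {
    // Read tab-separated "ID<TAB>JSON" records from STDIN one line at
    // a time, so only a single document is held in memory at once.
    scanner := bufio.NewScanner(os.Stdin)
    // Raise the per-line limit for large JSON documents (default is 64 KiB).
    scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)

    for scanner.Scan() {
        parts := strings.SplitN(scanner.Text(), "\t", 2)
        if len(parts) != 2 {
            continue // skip malformed lines
        }
        id, doc := parts[0], parts[1]
        fmt.Println(id, len(doc)) // handle one document here
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }
}

You would run it as go run main.go < results.txt.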

@lutzhorn, I'm a newbie in Go. My aim is to create load on the Couchbase server by inserting documents at a rate of 0.1 M writes/sec. I was planning to split the data into multiple files (0.1 M documents per file) and push them. Reading the file document by document would make it really slow. Plus, the bulk insert operation is only able to insert approx. 2100 records at once.
The files contain a tab-separated ID and JSON document on each line.

The code that I have managed to write so far is:

package main

import (
    "bufio"
    "encoding/csv"
    "fmt"
    "io"
    "log"
    "os"

    "gopkg.in/couchbase/gocb.v1"
)

func main() {
    cluster, err := gocb.Connect("couchbase://localhost")
    if err != nil {
        log.Fatal("ERROR CONNECTING TO CLUSTER:", err)
    }
    bucket, err := cluster.OpenBucket("example", "")
    if err != nil {
        log.Fatal("ERROR OPENING BUCKET:", err)
    }

    csvFile, err := os.Open("E:\\results.txt")
    if err != nil {
        log.Fatal("ERROR OPENING INPUT FILE:", err)
    }
    defer csvFile.Close()

    // Each line holds a tab-separated ID and JSON document.
    reader := csv.NewReader(bufio.NewReader(csvFile))
    reader.Comma = '\t'
    reader.LazyQuotes = true

    var items []gocb.BulkOp
    for {
        line, err := reader.Read()
        if err == io.EOF {
            break
        } else if err != nil {
            log.Fatal(err)
        }
        items = append(items, &gocb.UpsertOp{Key: line[0], Value: line[1]})
    }

    InsertData(bucket, items)
    if err := bucket.Close(); err != nil {
        fmt.Println("ERROR CLOSING COUCHBASE CONNECTION:", err)
    }
}

func InsertData(bucket *gocb.Bucket, items []gocb.BulkOp) {
    // bucket.Do reports only a top-level error; individual op failures
    // are recorded on each op's Err field and should be checked separately.
    if err := bucket.Do(items); err != nil {
        fmt.Println("ERROR PERFORMING BULK INSERT:", err)
    }
}

As the file grows, this code becomes increasingly inefficient, since the whole input is buffered into a single slice before anything is written.
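One common pattern for this situation is to stream the file and overlap the network round trips: cut the records into fixed-size batches and issue several bucket.Do() calls concurrently from a small worker pool. Below is a minimal sketch of that idea; insertConcurrently, batchSize, and numWorkers are illustrative names and values of my choosing, not part of gocb, and the input path is assumed.

package main

import (
    "bufio"
    "log"
    "os"
    "strings"
    "sync"

    "gopkg.in/couchbase/gocb.v1"
)

const (
    batchSize  = 1000 // illustrative; stays under the ~2100-op ceiling mentioned above
    numWorkers = 8    // illustrative; tune toward the target 0.1 M writes/sec
)

// insertConcurrently groups (ID, JSON) pairs into fixed-size batches and
// lets a small pool of goroutines issue bucket.Do() calls in parallel.
func insertConcurrently(bucket *gocb.Bucket, lines <-chan [2]string) {
    batches := make(chan []gocb.BulkOp, numWorkers)

    var wg sync.WaitGroup
    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for batch := range batches {
                if err := bucket.Do(batch); err != nil {
                    log.Println("bulk insert failed:", err)
                }
            }
        }()
    }

    batch := make([]gocb.BulkOp, 0, batchSize)
    for line := range lines {
        batch = append(batch, &gocb.UpsertOp{Key: line[0], Value: line[1]})
        if len(batch) == batchSize {
            batches <- batch
            batch = make([]gocb.BulkOp, 0, batchSize)
        }
    }
    if len(batch) > 0 {
        batches <- batch // flush the final partial batch
    }
    close(batches)
    wg.Wait()
}

func main() {
    cluster, err := gocb.Connect("couchbase://localhost")
    if err != nil {
        log.Fatal(err)
    }
    bucket, err := cluster.OpenBucket("example", "")
    if err != nil {
        log.Fatal(err)
    }
    defer bucket.Close()

    // Producer: stream the file line by line into a channel so memory
    // use stays bounded regardless of file size.
    lines := make(chan [2]string, batchSize)
    go func() {
        defer close(lines)
        f, err := os.Open("results.txt") // hypothetical input path
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        scanner := bufio.NewScanner(f)
        scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
        for scanner.Scan() {
            if parts := strings.SplitN(scanner.Text(), "\t", 2); len(parts) == 2 {
                lines <- [2]string{parts[0], parts[1]}
            }
        }
    }()

    insertConcurrently(bucket, lines)
}

Since bucket.Do only returns a top-level error, a real loader would also check each op's Err field after the call before discarding the batch.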
