I am creating JSON data (approx. 5000 records) in my SQL Server instance and trying to insert it into a Couchbase bucket using a bulk insert operation in Go. The problem is that every time I run the code after flushing the bucket, not all of the data is pushed; only a random number of records (somewhere between 2000 and 3000) gets inserted. I have also made sure that all the document IDs are unique.
The code is:
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/denisenkom/go-mssqldb"
	"gopkg.in/couchbase/gocb.v1"
)

func main() {
	var (
		ID       string
		JSONData string
	)
	var items []gocb.BulkOp
	cluster, _ := gocb.Connect("couchbase://localhost")
	bucket, _ := cluster.OpenBucket("example", "")
	condb, _ := sql.Open("mssql", "server=.\\SQLEXPRESS;port=62587; user id=<id>;password=<pwd>;")

	// Get approx. 5000 records from SQL Server in JSON format
	rows, err := condb.Query("Select id, JSONData From User")
	if err != nil {
		log.Fatal(err)
	}
	for rows.Next() {
		_ = rows.Scan(&ID, &JSONData)
		items = append(items, &gocb.InsertOp{Key: ID, Value: JSONData})
	}

	// Bulk-load the JSON documents into Couchbase
	err = bucket.Do(items)
	if err != nil {
		fmt.Println("ERROR PERFORMING BULK INSERT:", err)
	}
	_ = bucket.Close()
}
Please tell me where I went wrong here.
FYI, the ID and JSONData columns returned by the SQL query contain valid keys and valid JSON strings.
Your code has unchecked errors. Add error handling and run your program again; hopefully one of those unchecked errors will tell you what is going wrong.
I ran a loop over the items slice and printed out the Err field. It shows the error message ‘queue overflowed’ repeatedly, which means there is a limit on the number of items that can be queued in a single operation. How can I handle this situation? It would have been great if I could upload at least 10K items in a single operation. I guess there’s no option other than splitting the data into smaller batches and using multiple bulk inserts. If there is, please let me know.
So there’s no workaround to load the entire data set in a single operation? I did follow the split-and-insert approach: I split the data into 3 batches in SQL Server, appended them to 3 different items slices, and used 3 bulk insert operations. This way I am able to insert the complete data set, but the code runs slowly (and it will get slower as the amount of data that needs to be inserted gradually grows). I tried converting these operations into goroutines, but I am having a hard time synchronizing them as I am new to this.
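Rather than splitting the data on the SQL Server side, the batching can be done in Go after building the full slice. Here is a minimal sketch of that idea; the `chunk` helper and `batchSize` value are my own illustrative names (shown with `int` elements so it runs standalone, but it would work the same on `[]gocb.BulkOp`):

```go
package main

import "fmt"

// chunk splits a slice into consecutive batches of at most size elements.
// The sub-slices share the original backing array, so nothing is copied.
func chunk[T any](s []T, size int) [][]T {
	var batches [][]T
	for len(s) > size {
		batches = append(batches, s[:size])
		s = s[size:]
	}
	if len(s) > 0 {
		batches = append(batches, s)
	}
	return batches
}

func main() {
	ops := make([]int, 5000) // stand-in for the []gocb.BulkOp items slice
	batches := chunk(ops, 2000)
	fmt.Println(len(batches)) // 3 batches: 2000, 2000, 1000

	// In the real code, each batch would then get its own bulk call:
	// for _, b := range batches {
	//     if err := bucket.Do(b); err != nil { ... }
	// }
}
```

This keeps each `bucket.Do` call under whatever queue limit the client enforces, and the batch size can be tuned without touching the SQL side.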
I took a glance through the gocb code and could not figure out whether the bulk operations actually send the operations across the network in bulk. I do know that it does a lot of work on the client side. I just don’t know whether it is any better than calling bucket.Insert in goroutines.
I would try calling bucket.Insert directly in the rows.Next loop and see how it compares to using bulk operations. If it’s similar, parallelize that.
Also, from my quick read, I don’t think that parallelizing bulk operations will help anything.
Can we specify the size of an array explicitly? Maybe the default size (when you don’t specify one) is smaller than the actual upper limit. (I’m just taking a hunch here; I have no idea what the default size is anyway.)
Arrays always have an explicit size, so I’m guessing you mean slice. Yes.
var items []gocb.BulkOp
At this point items is nil. It has a length of 0 and a capacity of 0. When you append to items (e.g., items = append(items, ...)), the append function grows the capacity whenever the appended values would make the length exceed the capacity. Growing the slice actually means allocating a larger backing array and copying the old elements over to the new one.
If you know how large the list could be (e.g., the upper limit), then you can avoid the cost of growing the slice with:
items := make([]gocb.BulkOp, 0, upperLimit)
(I usually use the short declaration form for this.)
This makes a slice with a length of 0 and a capacity of upperLimit. You can continue using append as before.
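To make the length/capacity distinction concrete, here is a minimal runnable example using int elements in place of gocb.BulkOp:

```go
package main

import "fmt"

func main() {
	const upperLimit = 5000
	items := make([]int, 0, upperLimit) // length 0, capacity upperLimit

	fmt.Println(len(items), cap(items)) // 0 5000

	items = append(items, 1, 2, 3)
	fmt.Println(len(items), cap(items)) // 3 5000 — no reallocation occurred
}
```

Because the capacity is preallocated, every append up to upperLimit reuses the same backing array instead of triggering a grow-and-copy.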