I am creating JSON data (approx. 5000 records) in my SQL Server instance and trying to insert it into a Couchbase bucket using a bulk insert operation in Go. The problem is that every time I run the code after flushing the bucket, not all of the data is pushed; only a random number of records (somewhere between 2000 and 3000) gets inserted. I have also made sure that all the document IDs are unique.
The code is:
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/denisenkom/go-mssqldb"
	"gopkg.in/couchbase/gocb.v1"
)

func main() {
	var (
		ID       string
		JSONData string
	)
	var items []gocb.BulkOp
	cluster, _ := gocb.Connect("couchbase://localhost")
	bucket, _ := cluster.OpenBucket("example", "")
	condb, _ := sql.Open("mssql", "server=.\\SQLEXPRESS;port=62587; user id=<id>;password=<pwd>;")

	// Get approx. 5000 records from SQL Server in JSON format
	rows, err := condb.Query("Select id, JSONData From User")
	if err != nil {
		log.Fatal(err)
	}
	for rows.Next() {
		_ = rows.Scan(&ID, &JSONData)
		items = append(items, &gocb.InsertOp{Key: ID, Value: JSONData})
	}

	// Bulk-load the JSON documents into Couchbase
	err = bucket.Do(items)
	if err != nil {
		fmt.Println("ERROR PERFORMING BULK INSERT:", err)
	}
	_ = bucket.Close()
}
Please tell me where I went wrong here.
FYI, the ID and JSONData columns returned by the SQL query contain valid keys and valid JSON strings.
Your code has unchecked errors. Add error handling and run your program again; hopefully one of those unchecked errors will tell you what is going wrong.
I ran a loop over the items slice and printed out the Err field. It shows the error message ‘queue overflowed’ repeatedly, which means there is a limit on the number of items that can be queued in a single operation. How can I handle this situation? It would have been great if I could upload at least 10K items in a single operation. I guess there’s no option other than splitting the data into smaller batches and using multiple bulk inserts. If there is, please let me know.
So there’s no workaround to load the entire data set in a single operation? I did follow the split-and-insert approach: I split the data into 3 batches in SQL Server, appended them to 3 different items slices, and used 3 bulk insert operations. This way I am able to insert the complete data set, but the code runs slowly (and it will get slower as the amount of data that needs to be inserted gradually grows). I tried converting these operations into goroutines, but I am having a hard time synchronizing them as I am new to this.
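Rather than splitting the data on the SQL Server side, the batching can be done in Go after building the full slice. Here is a minimal sketch of that idea; the `chunk` helper and `batchSize` value are my own illustrative names (shown with `int` elements so it runs standalone, but it would work the same on `[]gocb.BulkOp`):

```go
package main

import "fmt"

// chunk splits a slice into consecutive batches of at most size elements.
// The sub-slices share the original backing array, so nothing is copied.
func chunk[T any](s []T, size int) [][]T {
	var batches [][]T
	for len(s) > size {
		batches = append(batches, s[:size])
		s = s[size:]
	}
	if len(s) > 0 {
		batches = append(batches, s)
	}
	return batches
}

func main() {
	ops := make([]int, 5000) // stand-in for the []gocb.BulkOp items slice
	batches := chunk(ops, 2000)
	fmt.Println(len(batches)) // 3 batches: 2000, 2000, 1000

	// In the real code, each batch would then get its own bulk call:
	// for _, b := range batches {
	//     if err := bucket.Do(b); err != nil { ... }
	// }
}
```

This keeps each `bucket.Do` call under whatever queue limit the client enforces, and the batch size can be tuned without touching the SQL side.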
I took a glance through the gocb code and could not figure out whether the bulk operations actually send the operations across the network in bulk. I do know that it does a lot of work on the client side. I just don’t know whether it is any better than calling bucket.Insert in goroutines.
I would try calling bucket.Insert directly in the rows.Next loop and see how it compares to using bulk operations. If it’s similar, parallelize that.
Also, from my quick read, I don’t think that parallelizing bulk operations will help anything.
Can we specify the size of an array explicitly? Maybe the default size (when you don’t specify one) is smaller than the actual upper limit. (I’m just taking a hunch here; I have no idea what the default size is anyway.)
Arrays always have an explicit size, so I’m guessing you mean slice. Yes.
var items []gocb.BulkOp
At this point items is nil. It has a length of 0 and a capacity of 0. When you append to items (e.g., items = append(items, ...)), the append function grows the capacity whenever the appended values would make the length exceed the capacity. Growing the slice actually means allocating a larger backing array and copying the old elements over to the new one.
If you know how large the list could be (e.g., the upper limit), then you can avoid the cost of growing the slice with:
items := make([]gocb.BulkOp, 0, upperLimit)
(I usually use the short declaration form for this.)
This makes a slice with a length of 0 and a capacity of upperLimit. You can continue using append as before.
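To make the length/capacity distinction concrete, here is a minimal runnable example using int elements in place of gocb.BulkOp:

```go
package main

import "fmt"

func main() {
	const upperLimit = 5000
	items := make([]int, 0, upperLimit) // length 0, capacity upperLimit

	fmt.Println(len(items), cap(items)) // 0 5000

	items = append(items, 1, 2, 3)
	fmt.Println(len(items), cap(items)) // 3 5000 — no reallocation occurred
}
```

Because the capacity is preallocated, every append up to upperLimit reuses the same backing array instead of triggering a grow-and-copy.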