Problem with cgo performance

aotto1968 · August 14, 2020, 8:56pm

Hi, I ran a benchmark on my C library using a Go frontend and found that Go loses a remarkable amount of performance when calling a C function. Example: calling my function “MqReadD”.

My C code (#1) takes 10ms, while the Go wrapper (#2) and the cgo wrapper (#3) together take 110ms. As you can see, only about 10% of the time is the real work and about 90% is Go/cgo overhead.

Are there any ideas for speeding up the Go code?

(I already use export GODEBUG=cgocheck=0 to get the best performance.)

(pprof) list ReadD
Total: 2.71s
ROUTINE ======================== MqReadD in ...NHI1/theLink/libmsgque/read_mq.c
         0       10ms (flat, cum)  0.37% of Total
         .          .   1204:  struct MqS * const context,
         .          .   1205:  MQ_DBL * const valP
         .          .   1206:)
         .          .   1207:{
         .          .   1208:  check_CTX(MQ_HDL_NULL_ERROR(MqC))
         .       10ms   1209:  return sReadA8(context, (union MqBufferAtomU * const) valP, MQ_DBLT);
         .          .   1210:}
         .          .   1211:
         .          .   1212:enum MqErrorE
         .          .   1213:MqReadC (
         .          .   1214:  struct MqS * const context,
ROUTINE ======================== gomsgque.ReadD..1gomsgque.MqC in /.../NHI1/theLink/gomsgque/src/gomsgque/MqC.go
         0      120ms (flat, cum)  4.43% of Total
         .          .   1282:}
         .          .   1283:
         .          .   1284:/// \refdoc{ReadD}
         .          .   1285:func (this *MqC) ReadD () float64 {
         .          .   1286:  hdl := this.getCTX()
         .       10ms   1287:  var val_out C.MQ_DBL
         .      110ms   1288:  var errVal C.enum_MqErrorE = C.MqReadD (hdl, &val_out)
         .          .   1289:  if (errVal > C.MQ_CONTINUE) { MqErrorC_Check(C.MQ_MNG(hdl), errVal) }
         .          .   1290:  return (float64)(val_out)
         .          .   1291:}
         .          .   1292:
         .          .   1293:/// \refdoc{ReadF}
ROUTINE ======================== gomsgque._Cfunc_MqReadD in /tmp/go-build/b001/_cgo_gotypes.go
         0      110ms (flat, cum)  4.06% of Total
         .          .   3028://extern _cgo_a12d8325d15e_Cfunc_MqReadC
         .          .   3029:func _cgo_a12d8325d15e_Cfunc_MqReadC(p0 _Ctype_MQ_CTX, p1 *_Ctype_MQ_CST) uint32
         .          .   3030:
         .          .   3031:func _Cfunc_MqReadD(p0 _Ctype_MQ_CTX, p1 *_Ctype_MQ_DBL) uint32 {
         .          .   3032:   defer syscall.CgocallDone()
         .       60ms   3033:   syscall.Cgocall()
         .       10ms   3034:   r := _cgo_a12d8325d15e_Cfunc_MqReadD(p0, p1)
         .          .   3035:   return r
         .       40ms   3036:}
         .          .   3037://extern _cgo_a12d8325d15e_Cfunc_MqReadD
         .          .   3038:func _cgo_a12d8325d15e_Cfunc_MqReadD(p0 _Ctype_MQ_CTX, p1 *_Ctype_MQ_DBL) uint32
         .          .   3039:
         .          .   3040:func _Cfunc_MqReadF(p0 _Ctype_MQ_CTX, p1 *_Ctype_MQ_FLT) uint32 {
         .          .   3041:   defer syscall.CgocallDone()

petrus · August 14, 2020, 10:01pm

From the expert:

C to Go calls taking a long time - is this cgo overhead or my mistake?
https://groups.google.com/d/msg/golang-nuts/B44pEq-uso8/uvL69eCxCgAJ

a plausible rule of
thumb is that a call from Go to C takes as long as ten function calls,
and calling from C to Go is worse. There are several reasons for
this, and there is certainly interest in making it faster, but it’s a
hard problem.

This unfortunately means that you should not design your program to
casually call between Go and C. Where possible you should batch calls
and you should try to build data structures entirely in one language
before passing them to the other language.

Ian

aotto1968 · August 15, 2020, 6:31am

Hi, thanks for the answer. I think the core problem is the cgo wrapper and its two functions “Cgocall” and “CgocallDone”:

         .          .    3031:func _Cfunc_MqReadD(p0 _Ctype_MQ_CTX, p1 *_Ctype_MQ_DBL) uint32 {
         .          .    3032:   defer syscall.CgocallDone()
         .       60ms    3033:   syscall.Cgocall()
         .       10ms    3034:   r := _cgo_a12d8325d15e_Cfunc_MqReadD(p0, p1)
         .          .    3035:   return r
         .       40ms    3036:}

I found some source at libgo/runtime/go-cgo.c (native_client/nacl-gcc, Git at Google).
The comment there says:

Prepare to call from code written in Go to code written in C or
C++. This takes the current goroutine out of the Go scheduler, as
though it were making a system call. Otherwise the program can
lock up if the C code goes to sleep on a mutex or for some other
reason. This idea is to call this function, then immediately call
the C/C++ function. After the C/C++ function returns, call
syscall_cgocalldone. The usual Go code would look like
    syscall.Cgocall()
    defer syscall.Cgocalldone()
    cfunction()
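
The bracketing described in that comment can be illustrated in plain Go. This is only a sketch of the call order, assuming nothing about the real runtime: cgocall, cgocallDone and cfunction are hypothetical stand-ins that just record when they run, and they do not touch the scheduler.

```go
package main

import "fmt"

// trace records the order in which the stand-in hooks run.
var trace []string

// cgocall and cgocallDone are hypothetical stand-ins for
// syscall.Cgocall and syscall.CgocallDone; they only log.
func cgocall()     { trace = append(trace, "leave scheduler") }
func cgocallDone() { trace = append(trace, "re-enter scheduler") }

// cfunction stands in for the real C function.
func cfunction() int {
	trace = append(trace, "C work")
	return 42
}

// wrapper mirrors the shape of the generated _Cfunc_MqReadD wrapper:
// the deferred hook runs after the C call has returned.
func wrapper() int {
	defer cgocallDone()
	cgocall()
	return cfunction()
}

func main() {
	r := wrapper()
	fmt.Println(r, trace)
}
```

Every call through the wrapper pays for both hooks plus the defer, regardless of how short the C function is, which is why the overhead dominates for small functions.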

OK, so would it be possible for Go to generate a wrapper without “Cgocall” and “CgocallDone”?
This could be a switch in the top-level Go code, as in the mock-up below, where the “noblock” annotation is the new part.

It would help short-running C functions that have no blocking problems:

         .          .   1284:/// \refdoc{ReadD}
         .          .   1285:func (this *MqC) ReadD () float64 {
         .          .   1286:  hdl := this.getCTX()
         .       10ms   1287:  var val_out C.MQ_DBL
                               // noblock            -> !! NEW !!
         .      110ms   1288:  var errVal C.enum_MqErrorE = C.MqReadD (hdl, &val_out)
         .          .   1289:  if (errVal > C.MQ_CONTINUE) { MqErrorC_Check(C.MQ_MNG(hdl), errVal) }
         .          .   1290:  return (float64)(val_out)
         .          .   1291:}
