CentOS 8: longer Syscall6 times in CPU profile compared to CentOS 7

This is an app that does multithreaded network I/O (a userspace NFS client written in Go). It should be able to transfer data at roughly line rate on a 10 Gb NIC. On CentOS 7 this is the case: 10 GiB in 10-11 seconds. On CentOS 8, however, it is consistently about 3 seconds slower, at ~14 seconds.
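To put those timings in NIC terms, here is a small sketch that converts "10 GiB in N seconds" into Gbit/s. The helper name `gbitPerSec` is made up for this illustration and is not part of fbcp. With the timings quoted above it gives roughly 8.59, 7.81 and 6.14 Gbit/s for 10, 11 and 14 seconds.

```go
package main

import "fmt"

// gbitPerSec converts a transfer of sizeGiB gibibytes that took the given
// number of seconds into decimal gigabits per second, the unit NIC speeds
// are usually quoted in. Hypothetical helper, for illustration only.
func gbitPerSec(sizeGiB, seconds float64) float64 {
	bits := sizeGiB * 1024 * 1024 * 1024 * 8
	return bits / seconds / 1e9
}

func main() {
	// Wall-clock times quoted in the post for a 10 GiB transfer.
	for _, secs := range []float64{10, 11, 14} {
		fmt.Printf("10 GiB in %2.0fs = %.2f Gbit/s\n", secs, gbitPerSec(10, secs))
	}
}
```

These figures ignore protocol overhead, so they only describe payload throughput.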

Both VMs have the same specs and run on the same physical host.

Project: sile16/fbcp on GitHub (FlashBlade Fast File transfer tool)

Command:

time ./fbcp -threads 16 -sizeMB 8 -profile pp.txt 172.19.0.20:/DEMO-NFS-1/10GRand |cat > /dev/null

Go version: 1.19.1

Here are the profiles for each.
CentOS 7
3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

File: fbcp
Type: cpu
Time: Sep 15, 2022 at 8:51am (CDT)
Duration: 11.22s, Total samples = 31.89s (284.32%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 28.68s, 89.93% of 31.89s total
Dropped 241 nodes (cum <= 0.16s)
Showing top 20 nodes out of 90
      flat  flat%   sum%        cum   cum%
    12.28s 38.51% 38.51%     12.28s 38.51%  runtime/internal/syscall.Syscall6
     5.19s 16.27% 54.78%      5.19s 16.27%  runtime.memclrNoHeapPointers
     4.66s 14.61% 69.39%      4.66s 14.61%  runtime.memmove
     2.68s  8.40% 77.80%      2.68s  8.40%  runtime.futex
     1.02s  3.20% 81.00%      1.02s  3.20%  runtime.epollwait
     0.64s  2.01% 83.00%      0.64s  2.01%  runtime.madvise
     0.43s  1.35% 84.35%      1.12s  3.51%  runtime.stealWork
     0.36s  1.13% 85.48%      0.36s  1.13%  runtime.(*randomEnum).next (inline)
     0.19s   0.6% 86.08%      0.20s  0.63%  runtime.casgstatus
     0.18s  0.56% 86.64%      6.14s 19.25%  runtime.findRunnable
     0.16s   0.5% 87.14%      0.21s  0.66%  runtime.lock2
     0.12s  0.38% 87.52%      9.39s 29.44%  internal/poll.(*FD).Read
     0.12s  0.38% 87.90%      0.16s   0.5%  runtime.checkTimers
     0.12s  0.38% 88.27%     12.40s 38.88%  syscall.RawSyscall6
     0.11s  0.34% 88.62%      1.22s  3.83%  runtime.notesleep
     0.11s  0.34% 88.96%      0.20s  0.63%  runtime.reentersyscall
     0.10s  0.31% 89.28%      1.16s  3.64%  runtime.netpoll
     0.08s  0.25% 89.53%      5.64s 17.69%  runtime.mallocgc
     0.07s  0.22% 89.75%      9.48s 29.73%  net.(*conn).Read
     0.06s  0.19% 89.93%      9.76s 30.61%  bufio.(*Reader).Read
(pprof) quit

CentOS 8
4.18.0-408.el8.x86_64 #1 SMP Mon Jul 18 17:42:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Type: cpu
Time: Sep 15, 2022 at 8:51am (CDT)
Duration: 14.32s, Total samples = 27.91s (194.87%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 25.39s, 90.97% of 27.91s total
Dropped 213 nodes (cum <= 0.14s)
Showing top 20 nodes out of 69
      flat  flat%   sum%        cum   cum%
    15.77s 56.50% 56.50%     15.77s 56.50%  runtime/internal/syscall.Syscall6
     3.15s 11.29% 67.79%      3.15s 11.29%  runtime.memclrNoHeapPointers
     3.14s 11.25% 79.04%      3.14s 11.25%  runtime.memmove
     1.24s  4.44% 83.48%      1.24s  4.44%  runtime.futex
     1.01s  3.62% 87.10%      1.01s  3.62%  runtime.epollwait
     0.19s  0.68% 87.78%      0.45s  1.61%  runtime.stealWork
     0.11s  0.39% 88.18%      3.43s 12.29%  runtime.findRunnable
     0.11s  0.39% 88.57%      1.15s  4.12%  runtime.netpoll
     0.11s  0.39% 88.96%     15.88s 56.90%  syscall.RawSyscall6
     0.08s  0.29% 89.25%      0.15s  0.54%  runtime.exitsyscall
     0.08s  0.29% 89.54%      3.41s 12.22%  runtime.mallocgc
     0.07s  0.25% 89.79%     10.97s 39.30%  bufio.(*Reader).Read
     0.06s  0.21% 90.00%     10.59s 37.94%  internal/poll.(*FD).Read
     0.06s  0.21% 90.22%     10.70s 38.34%  net.(*conn).Read
     0.05s  0.18% 90.40%     11.03s 39.52%  io.ReadAtLeast
     0.04s  0.14% 90.54%     10.30s 36.90%  syscall.read
     0.03s  0.11% 90.65%      0.16s  0.57%  github.com/rasky/go-xdr/xdr2.(*Encoder).encodeStruct
     0.03s  0.11% 90.76%     10.64s 38.12%  net.(*netFD).Read
     0.03s  0.11% 90.86%      3.61s 12.93%  runtime.schedule
     0.03s  0.11% 90.97%      0.52s  1.86%  runtime.stopm
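One way to put the two profiles side by side is to normalize them by wall-clock duration. The sketch below uses only the Duration, Total samples and Syscall6 flat values from the two pprof headers above; the `profile` type and its methods are made up for this comparison. It is just arithmetic on the posted numbers, not a diagnosis.

```go
package main

import "fmt"

// profile holds the headline numbers from one pprof run (all in seconds).
// Hypothetical type, used only for this comparison.
type profile struct {
	name     string
	wall     float64 // "Duration" reported by pprof
	samples  float64 // "Total samples" (CPU time across all threads)
	syscall6 float64 // flat time in runtime/internal/syscall.Syscall6
}

// coresBusy is the average number of cores kept busy during the run.
func (p profile) coresBusy() float64 { return p.samples / p.wall }

// syscallShare is the fraction of sampled CPU time spent in Syscall6.
func (p profile) syscallShare() float64 { return p.syscall6 / p.samples }

func main() {
	c7 := profile{name: "CentOS 7", wall: 11.22, samples: 31.89, syscall6: 12.28}
	c8 := profile{name: "CentOS 8", wall: 14.32, samples: 27.91, syscall6: 15.77}

	for _, p := range []profile{c7, c8} {
		fmt.Printf("%s: %.2f cores busy, Syscall6 = %.1f%% of samples\n",
			p.name, p.coresBusy(), 100*p.syscallShare())
	}
	fmt.Printf("extra Syscall6 time: %.2fs, extra wall time: %.2fs\n",
		c8.syscall6-c7.syscall6, c8.wall-c7.wall)
}
```

On these numbers, the CentOS 8 run keeps fewer cores busy on average (about 1.95 versus 2.84), and the additional flat time in Syscall6 (about 3.5s) is of the same order as the additional wall-clock time (about 3.1s).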

Hi, @sile16, welcome to the forum.

While you may find an answer here on the Go Bridge forum, I think that your issue is technical enough that it merits posting an issue in the Go project issue tracker on GitHub. Good luck!

EDIT: After searching a bit, I found a similar issue that was closed because the issue tracker is not a place for asking questions (in that case, about why memory usage differs between kernel versions). I suspect they may do the same with your question about the runtime. Of course you can try asking anyway, but StackOverflow or one of the other sites the Go team lists on their Questions page might be a better fit.

The last thing I want to do is discourage you from asking here in this forum, but because your issue seems to be specific to the Linux kernel version, your question may need more exposure to reach someone with an environment that can reproduce your scenario.