A colleague and I run a large project that has to control and receive data from a special-purpose PCI Express card running custom firmware. This is a data acquisition system for high-speed x-ray and gamma-ray detectors, arrays of superconducting sensors (just for context). I can’t say whether Go was the optimal choice for the project (my colleague still argues that Rust would have been better), but I know it’s worked extremely well since we launched this in 2017 (as a replacement for a C++ monstrosity).
We talk to the PCIe card by opening/closing and reading/writing device-special files provided (obviously) by a device driver. There are control registers for configuration and a scatter-gather DMA for transferring the high-speed data (typically 20 to 200 MB/second, depending on one’s instrument configuration) to the computer RAM.
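To make the access pattern concrete for readers without the hardware, here is a minimal sketch of a single blocking read from a device-special file. It is not the project's actual code: `readBlock` is a hypothetical helper name, `/dev/zero` stands in for the real device node, and the standard `syscall` package is used only so the sketch is self-contained.

```go
package main

import (
	"fmt"
	"syscall"
)

// readBlock (hypothetical helper) opens path read-only, issues one
// blocking read(2) for up to n bytes, and closes the descriptor again.
// It returns only the bytes that the single read call delivered.
func readBlock(path string, n int) ([]byte, error) {
	fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}
	defer syscall.Close(fd)

	buf := make([]byte, n)
	got, err := syscall.Read(fd, buf)
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", path, err)
	}
	return buf[:got], nil
}

func main() {
	// /dev/zero is an assumption used for illustration; the real
	// device node is created by the card's driver.
	data, err := readBlock("/dev/zero", 16)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("read", len(data), "bytes")
}
```

In the real system the read targets the DMA buffer exposed by the driver, and it is a call of this general shape that never returns under Go 1.20.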
I find that Go 1.16, 1.17, 1.18, and 1.19 all run our program just fine. When I build the program with Go 1.20, however, the program hangs. The build succeeds, and the configuration steps (appear to) work correctly at run time, setting up the scatter-gather DMA cycle. When we try to read from the DMA buffer the first time, the program hangs. And this happens only in Go 1.20!
I’m afraid I cannot provide a minimum reproducible example, owing to the fact that you’d need our specific hardware (running our specific firmware) and the corresponding device drivers. I understand that I can’t give enough information to solve the problem.
Still, maybe someone can offer ideas? Is there something special I should know about Go 1.20 that might help me track down the problem? I’ve read the 1.20 release notes a dozen times, but maybe I’m missing the significance of some key point in there?
Some system facts: Ubuntu 22.04, Go 1.20.3, 16 GB RAM. (The same problem has also been noted on a different PC running Ubuntu 20.04.)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
$ go version
go version go1.20.3 linux/amd64
$ free
               total        used        free      shared  buff/cache   available
Mem:        16331116     6773036      173288       50804     9384792     9178968
Swap:        2097148       39936     2057212
$ lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.3 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation Z97 Chipset LPC Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
01:00.0 Unassigned class [ff00]: Altera Corporation Device 0004 (rev 01)
02:00.0 VGA compatible controller: NVIDIA Corporation TU117GL [T400 4GB] (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10fa (rev a1)
04:00.0 Ethernet controller: Qualcomm Atheros Killer E220x Gigabit Ethernet Controller (rev 13)
The Altera (pci 01:00.0) is the PCIe device in question.
I have tried a few steps that seemed potentially relevant:
- removing the deprecated syscall package, replacing it with golang.org/x/sys/unix
- calling a function such as C.posix_memalign(...) directly from Go, versus calling a handwritten cgo wrapper function that in turn calls posix_memalign.
- calling C.read(...) directly versus calling unix.Read(fd, buffer) with buffer being the result of a C.GoBytes(...) call on a previously allocated C pointer.
They all leave the Go 1.16-1.19 versions working, and the 1.20 version hanging.
For now, the workaround is to panic when the user builds with Go 1.20 and tries to use this particular data source. That hardly seems like a long-term solution, though.
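A guard of this kind can be sketched roughly as follows. This is an illustration only, not the project's actual code: `isGo120` and `refuseGo120` are hypothetical names, and the check relies on `runtime.Version()` returning a string such as `go1.20.3` for release builds.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// isGo120 (hypothetical helper) reports whether a version string in the
// format returned by runtime.Version() names a Go 1.20 release, such as
// "go1.20", "go1.20.3", or "go1.20rc1". It rejects "go1.200" by requiring
// that the character after the prefix is not a digit.
func isGo120(version string) bool {
	const prefix = "go1.20"
	if !strings.HasPrefix(version, prefix) {
		return false
	}
	rest := version[len(prefix):]
	return rest == "" || rest[0] < '0' || rest[0] > '9'
}

// refuseGo120 (hypothetical helper) panics if the running binary was
// built with Go 1.20. It would be called when the affected data source
// is started.
func refuseGo120() {
	if isGo120(runtime.Version()) {
		panic("this data source hangs when built with Go 1.20; rebuild with Go 1.16-1.19")
	}
}

func main() {
	v := runtime.Version()
	fmt.Printf("built with %s, data source refused: %v\n", v, isGo120(v))
}
```

The check is deliberately narrow: it only recognizes Go 1.20, since that is the only version observed to hang so far.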