unix.Poll() help

I am currently in the process of porting blktrace userspace program into Golang.

Everything works fine except for poll().


From the blktrace source code linked above, poll() is used to poll the per-cpu buffers from relayfs endpoints (/sys/kernel/debug/block/DEVICE/trace_CPU#_).

The issue I’m having with my implementation is that poll just times out and returns no POLLIN event during normal operation. I am using the same timeout as blktrace (500). I have also run btrace alongside my example code, blktrace poll working as intended and my implementation is getting no POLLIN events.

package main

import (

func main() {
        //change path to your trace0 location
        path := "/sys/kernel/debug/block/vda1/trace0"
        timeout := 500

        f, err := syscall.Open(path, syscall.O_RDONLY|syscall.O_NONBLOCK, 0666)
        if err != nil {
                log.Fatalf("failed to open %q: %v (%#[2]v)", path, err)
        defer syscall.Close(f)

        start := time.Now()
        fds := []unix.PollFd{{Fd: int32(f), Events: unix.POLLIN}}

        for true {
                n, err := unix.Poll(fds, timeout)
                if fds[0].Revents & unix.POLLIN == unix.POLLIN {
                        fmt.Printf("n=%d err=%v delay=%v flags=%016b (POLLIN=%t)\n",
                                n, err, time.Since(start), fds[0].Revents, fds[0].Revents&unix.POLLIN != 0)

I changed the code to use syscall.Open due to os.Open adding the fd to epoll (found out during strace debugging), I thought that may have been the issue but it seems not.

Regarding the ‘no POLLIN event during normal operation’ quote, I have noticed that when I have been running btrace alongside my example code, that when I exit btrace I will get POLLIN events.

To note, I have verified the rest of the codebase works. I have read from the buffers using a for loop with read() enclosed, but causes high cpu usage. I have tried tail and inotify solutions, but relayfs endpoints are not seekable so tail wont work with them and inotify does not work for sysfs.

I have also tested the example code against other pollable files blktrace uses (dropped, msg) and both provide POLLIN events as intended. These are debugfs files instead of relay channels though.

Any ideas how I could get this to work, or ideas for another way of polling?

Thanks in advance!

If I am not wrong, you’ve answered your question already. Poll delivers event to one process only: if you run your Go replacement, you can’t run the original blktrace or it will steal the Poll events.

Maybe someone who knows Poll semantics can confirm, but some similar systemcalls like Accept work the same way (only one process Accepts).

To clarify, I was running blktrace alongside the example code, with and without strace to confirm the example code was working, which it does in a way as it does receive pollin events when blktrace exits and stops polling.

In the real application, blktrace is not running. I initiate tracing at the device level without using the blktrac userspace program from within my Go application and then setup Poll() on the trace0 file. Nothing else is reading from, nor Poll()'ing that file that I’m aware of, as the blktrace userspace program is not running alongside my real application. Infact, this is impossible as blktrace will error out if the device is already tracing as BLKTRACESETUP will fail. This is where I’m getting stuck, as in this scenario I should be getting POLLIN events, but I’m not. If I read() within a for loop I get the data, no EOF’s etc which makes me believe nothing else is stealing reads from the FD, but this is causing high cpu load for reading just a single trace file so not a usable solution.

I just went through my strace output files again, and noticed epoll is still called on the FD, seems I somehow missed that or looked at the wrong strace output (I had a C poll() test as well).

348164 openat(AT_FDCWD, "/sys/kernel/debug/block/vda1/trace0", O_RDONLY|O_NONBLOCK <unfinished ...>
348164 <... openat resumed> )           = 9

348155 epoll_ctl(4, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=707980592, u64=139750353857840}}) = 0
348158 fcntl(9, F_GETFL)                = 0 (flags O_RDONLY)
348158 fcntl(9, F_SETFL, O_RDONLY|O_NONBLOCK) = 0

348155 fcntl(9, F_GETFL)                = 0x1 (flags O_WRONLY)
348155 fcntl(9, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
348155 fcntl(9, F_GETFL)                = 0x801 (flags O_WRONLY|O_NONBLOCK)
348155 fcntl(9, F_SETFL, O_WRONLY)      = 0

348162 dup2(9, 2)                       = 2

348158 read(9,  <unfinished ...>
348158 <... read resumed> 0xc42121e200, 512) = -1 EAGAIN (Resource temporarily unavailable)

348276 read(9,  <unfinished ...>
348276 <... read resumed> "", 512)      = 0
348276 epoll_ctl(4, EPOLL_CTL_DEL, 9, {0, {u32=0, u64=0}}) = 0
348276 close(9 <unfinished ...>
348276 <... close resumed> )            = 0

Seems that epoll could still be causing the issue. I am not calling EPOLL_CTL, nor the read()'s on the FD, so it must be the Go runtime. Any ideas on how to stop this behaviour as I’m already opening via syscall.Open()?

I guess you could switch your syscall.Open to unix.Open, but I see no reason why the runtime should poll your fd, at least from the code you pasted above… Maybe you do anything else over that fd in code you didn’t paste?

I am currently opening, closing and reading the file via direct C calls to bypass the Go runtime.

From digging further into the strace files, I’ve noticed a pipe2() call.

399585 openat(AT_FDCWD, "/sys/kernel/debug/block/vda1/trace0", O_RDONLY|O_NONBLOCK) = 9
...other output here
399581 pipe2([8, 9], O_CLOEXEC)         = 0
...other output here
399581 epoll_ctl(4, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=4262047024, u64=140320843607344}}) = 0

9 being the FD for my open trace0 file being used at the write end.

From looking further into the strace files for blktrace, there is a sizeable number of poll timeouts, but it does also receive some POLLIN events. I was seeing this as normal behaviour as relayfs waits for a sub-buffer to fill before writing to the trace file, and I was simulating disk usage to fill those buffers up.

However, digging deeper into the blktrace code, a forceRead variable I overlooked is set for piped output and this is checked alongside the revents in the poll loop. So basically, blktrace is reading the trace file even if there is no POLLIN event. It seems this behaviour was throwing me off, after implementing this forceRead into my code I can “poll” the file for read without constantly high cpu usage.

It seems weird that Poll() is even used in this manner, but it does allow me to poll the trace files without high cpu usage so it works. It does cause issues with buffer padding through, relayfs should be removing the padding on read() interface but now it isn’t, most likely because I’m now read()'ing before a full sub-buffer is ready but I’m working around that.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.