qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Hanna Czenczek <hreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	hibriansong@gmail.com, eblake@redhat.com, qemu-block@nongnu.org
Subject: Re: [RESEND PATCH v5 00/13] aio: add the aio_add_sqe() io_uring API
Date: Mon, 3 Nov 2025 11:45:39 -0500	[thread overview]
Message-ID: <20251103164539.GA88691@fedora> (raw)
In-Reply-To: <aQiuenOKNLPzYtGb@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 7527 bytes --]

On Mon, Nov 03, 2025 at 02:30:34PM +0100, Kevin Wolf wrote:
> Am 03.11.2025 um 11:40 hat Kevin Wolf geschrieben:
> > Am 30.10.2025 um 19:11 hat Kevin Wolf geschrieben:
> > > Am 30.10.2025 um 16:21 hat Stefan Hajnoczi geschrieben:
> > > > v5:
> > > > - Explain how fdmon-io_uring.c differs from other fdmon implementations
> > > >   in commit message [Kevin]
> > > > - Move test-nested-aio-poll aio_get_g_source() removal into commit that touches test case [Kevin]
> > > > - Avoid g_source_add_poll() use-after-free in fdmon_poll_update() [Kevin]
> > > > - Avoid duplication in fdmon_epoll_gsource_dispatch(), use fdmon_epoll_wait() [Kevin]
> > > > - Drop unnecessary revents checks in fdmon_poll_gsource_dispatch() [Kevin]
> > > > - Mention in commit message that fdmon-io_uring.c is the new default [Kevin]
> > > > - Add comments explaining how to clean up resources in error paths [Kevin]
> > > > - Indicate error in return value from function with Error *errp arg [Kevin]
> > > > - Add patch to unindent fdmon_io_uring_destroy() [Kevin]
> > > > - Add patch to introduce FDMonOps->dispatch() callback [Kevin]
> > > > - Drop patch with hacky BH optimization for fdmon-io_uring.c [Kevin]
> > > > - Replace cqe_handler_bh with FDMonOps->dispatch() [Kevin]
> > > > - Rename AioHandler->cqe_handler field to ->internal_cqe_handler [Kevin]
> > > > - Consolidate fdmon-io_uring.c trace-events changes into this commit
> > > > - Reduce #ifdef HAVE_IO_URING_PREP_WRITEV2 code duplication [Kevin]
> > > 
> > > The changes look good to me.
> > > 
> > > However, the test cases are still failing. I just tried to see where
> > > test-aio is stuck, and while I looked for a backtrace first, I noticed
> > > that just attaching gdb to the process and immediately detaching again
> > > makes the test unstuck. Very strange.
> > > 
> > > This is the backtrace, maybe a bit unsurpring:
> > > 
> > > (gdb) bt
> > > #0  0x00007ffff7e6fec6 in __io_uring_submit () from /lib64/liburing.so.2
> > > #1  0x00005555556f4394 in fdmon_io_uring_wait (ctx=0x555556409950, ready_list=0x7fffffffcda0, timeout=749993088) at ../util/fdmon-io_uring.c:410
> > > #2  0x00005555556ed29f in aio_poll (ctx=0x555556409950, blocking=true) at ../util/aio-posix.c:699
> > > #3  0x0000555555681547 in test_timer_schedule () at ../tests/unit/test-aio.c:413
> > > #4  0x00007ffff6f30e7e in test_case_run (tc=0x55555640d340, test_run_name=0x55555640de10 "/aio/timer/schedule", path=<optimized out>) at ../glib/gtestutils.c:3115
> > > #5  g_test_run_suite_internal (suite=suite@entry=0x5555558696d0, path=path@entry=0x0) at ../glib/gtestutils.c:3210
> > > #6  0x00007ffff6f30df3 in g_test_run_suite_internal (suite=suite@entry=0x555555867480, path=path@entry=0x0) at ../glib/gtestutils.c:3229
> > > #7  0x00007ffff6f30df3 in g_test_run_suite_internal (suite=suite@entry=0x555555867720, path=path@entry=0x0) at ../glib/gtestutils.c:3229
> > > #8  0x00007ffff6f313aa in g_test_run_suite (suite=suite@entry=0x555555867720) at ../glib/gtestutils.c:3310
> > > #9  0x00007ffff6f31440 in g_test_run () at ../glib/gtestutils.c:2379
> > > #10 g_test_run () at ../glib/gtestutils.c:2366
> > > #11 0x000055555567e204 in main (argc=1, argv=0x7fffffffd488) at ../tests/unit/test-aio.c:872
> > > 
> > > And running it under strace shows that we're indeed hanging in the
> > > syscall:
> > > 
> > > write(1, "# Start of timer tests\n", 23) = 23
> > > eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 9
> > > io_uring_enter(7, 1, 0, 0, NULL, 8)     = 1
> > > clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffc239bec80) = 0
> > > io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8
> > > 
> > > Of course, if I start the test without strace and then attach strace to
> > > the running process, that gets it unstuck like attaching gdb (not very
> > > surprising, I guess, it's both just ptrace).
> > > 
> > > Finally I tried Ctrl-C while having strace logging to a file, and now
> > > the io_uring_enter() returns 1 (rather than EINTR or 0 or whatever):
> > > 
> > > io_uring_enter(7, 1, 1, IORING_ENTER_GETEVENTS, NULL, 8) = 1
> > > --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
> > > +++ killed by SIGINT +++
> > > 
> > > Not sure what to make of this.
> > > 
> > > I think you already said you run the same kernel version, but just to be
> > > sure, I'm running 6.17.5-200.fc42.x86_64.
> > 
> > I'm at the point where I'm bisecting compiler flags...
> > 
> > I have seen three different outcomes from test-aio:
> > 
> > 1. It hangs. This is what I saw in my normal clang build. This configure
> >    line seems to be enough to trigger it:
> >    ../configure '--target-list=x86_64-softmmu' '--cc=clang' '--cxx=clang++'
> > 
> > 2. An assertion failure. I haven't seen this in the actual QEMU tree
> >    with clang. With gcc, it seems to happen if you use -O0:
> >    ../configure '--target-list=x86_64-softmmu' '--enable-debug'
> > 
> >    Outside of the QEMU tree with a manual Makefile, I saw this behaviour
> >    with clang and -fstack-protector-strong, but without
> >    -ftrivial-auto-var-init=zero. Add the latter turns it into the hang.
> > 
> > 3. It just passes. This is what I saw in my default gcc build without
> >    --enable-debug. The test also passes with --disable-stack-protector
> >    added to both configure lines in 1 and 2.
> > 
> > Not sure yet where the flags make the difference, but I guess it does
> > hint at something going wrong on the stack.
> 
> Ok, that was quite some debugging, but I think I have it. The problem is
> add_timeout_sqe():
> 
> static void add_timeout_sqe(AioContext *ctx, int64_t ns)
> {
>     struct io_uring_sqe *sqe;
>     ts = (struct __kernel_timespec) {
>         .tv_sec = ns / NANOSECONDS_PER_SECOND,
>         .tv_nsec = ns % NANOSECONDS_PER_SECOND,
>     };
> 
>     sqe = get_sqe(ctx);
>     io_uring_prep_timeout(sqe, &ts, 1, 0);
>     io_uring_sqe_set_data(sqe, NULL);
> }
> 
> What io_uring_prep_timeout() does is that it just stores the ts pointer
> in the SQE, the timeout is never copied anywhere. Obviously, by the time
> that we submit the SQE, ts has been out of scope for a long time, so the
> kernel reads random data as a timeout.
> 
> # bpftrace -e 'kfunc:io_timeout { printf("%s: io_timeout %lld s + %lld ns\n", comm, ((struct io_timeout_data *)args.req->async_data)->ts.tv_sec, ((struct io_timeout_data *)args.req->async_data)->ts.tv_nsec ) }'
> Attaching 1 probe...
> test-aio: io_timeout 0 s + 140736377549872 ns
> 
> >>> hex(140736377549872)
> '0x7fffbdca7430'
> 
> That looked a bit suspicious for a timeout. :-)
> 
> After fixing this, we still have the problem that io_uring_enter() can
> return early without failing with EINTR when something like a signal
> arrives. This means that a blocking aio_poll(true) can actually return
> without any progress.  Not sure if it matters in practice, but it can
> make test cases fail.
> 
> Not completely sure when this happens, though. When running the aio-test
> under strace, kill -CONT makes it return early and fail the assertion,
> but without strace, I can't seem to reproduce the problem at the moment.
> Attaching strace or gdb to the running process that is waiting for the
> timeout also makes it return early and fail the assertion.

Hi Kevin,
Thank you for going through the effort of debugging this!

I'll see if I can track down the issue with io_uring_enter() returning
early.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2025-11-03 16:47 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-30 15:21 [RESEND PATCH v5 00/13] aio: add the aio_add_sqe() io_uring API Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 01/13] aio-posix: fix race between io_uring CQE and AioHandler deletion Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 02/13] aio-posix: keep polling enabled with fdmon-io_uring.c Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 03/13] tests/unit: skip test-nested-aio-poll with io_uring Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 04/13] aio-posix: integrate fdmon into glib event loop Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 05/13] aio: remove aio_context_use_g_source() Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 06/13] aio: free AioContext when aio_context_new() fails Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 07/13] aio: add errp argument to aio_context_setup() Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 08/13] aio-posix: gracefully handle io_uring_queue_init() failure Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 09/13] aio-posix: unindent fdmon_io_uring_destroy() Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 10/13] aio-posix: add fdmon_ops->dispatch() Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 11/13] aio-posix: add aio_add_sqe() API for user-defined io_uring requests Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 12/13] block/io_uring: use aio_add_sqe() Stefan Hajnoczi
2025-10-30 15:21 ` [RESEND PATCH v5 13/13] block/io_uring: use non-vectored read/write when possible Stefan Hajnoczi
2025-10-30 18:11 ` [RESEND PATCH v5 00/13] aio: add the aio_add_sqe() io_uring API Kevin Wolf
2025-11-03 10:40   ` Kevin Wolf
2025-11-03 13:30     ` Kevin Wolf
2025-11-03 16:45       ` Stefan Hajnoczi [this message]
2025-11-04  2:08       ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251103164539.GA88691@fedora \
    --to=stefanha@redhat.com \
    --cc=eblake@redhat.com \
    --cc=hibriansong@gmail.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).