All of lore.kernel.org
 help / color / mirror / Atom feed
From: Klaus Jensen <its@irrelevant.dk>
To: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Cc: qemu-devel@nongnu.org, kbusch@kernel.org,
	Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [PATCH v4] hw/nvme: Use ioeventfd to handle doorbell updates
Date: Tue, 26 Jul 2022 14:08:09 +0200	[thread overview]
Message-ID: <Yt/ZKVHjSTTt08MV@apples> (raw)
In-Reply-To: <Yt/Qs5PelXjX8E1v@apples>

[-- Attachment #1: Type: text/plain, Size: 8238 bytes --]

On Jul 26 13:32, Klaus Jensen wrote:
> On Jul 26 13:24, Klaus Jensen wrote:
> > On Jul 26 12:09, Klaus Jensen wrote:
> > > On Jul 26 11:19, Klaus Jensen wrote:
> > > > On Jul 26 15:55, Jinhao Fan wrote:
> > > > > at 3:41 PM, Klaus Jensen <its@irrelevant.dk> wrote:
> > > > > 
> > > > > > On Jul 26 15:35, Jinhao Fan wrote:
> > > > > >> at 4:55 AM, Klaus Jensen <its@irrelevant.dk> wrote:
> > > > > >> 
> > > > > >>> We have a regression following this patch that we need to address.
> > > > > >>> 
> > > > > >>> With this patch, issuing a reset on the device (`nvme reset /dev/nvme0`
> > > > > >>> will do the trick) causes QEMU to hog my host cpu at 100%.
> > > > > >>> 
> > > > > >>> I'm still not sure what causes this. The trace output is a bit
> > > > > >>> inconclusive still.
> > > > > >>> 
> > > > > >>> I'll keep looking into it.
> > > > > >> 
> > > > > >> I cannot reproduce this bug. I just start the VM and used `nvme reset
> > > > > >> /dev/nvme0`. Did you do anything before the reset?
> > > > > > 
> > > > > > Interesting and thanks for checking! Looks like a kernel issue then!
> > > > > > 
> > > > > > I remember that I'm using a dev branch (nvme-v5.20) of the kernel and
> > > > > > reverting to a stock OS kernel did not produce the bug.
> > > > > 
> > > > > I’m using 5.19-rc4 which I pulled from linux-next on Jul 1. It works ok on
> > > > > my machine.
> > > > 
> > > > Interesting. I can reproduce on 5.19-rc4 from torvalds tree. Can you
> > > > drop your qemu command line here?
> > > > 
> > > > This is mine.
> > > > 
> > > > /home/kbj/work/src/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
> > > >   -nodefaults \
> > > >   -display "none" \
> > > >   -machine "q35,accel=kvm,kernel-irqchip=split" \
> > > >   -cpu "host" \
> > > >   -smp "4" \
> > > >   -m "8G" \
> > > >   -device "intel-iommu" \
> > > >   -netdev "user,id=net0,hostfwd=tcp::2222-:22" \
> > > >   -device "virtio-net-pci,netdev=net0" \
> > > >   -device "virtio-rng-pci" \
> > > >   -drive "id=boot,file=/home/kbj/work/vol/machines/img/nvme.qcow2,format=qcow2,if=virtio,discard=unmap,media=disk,read-only=no" \
> > > >   -device "pcie-root-port,id=pcie_root_port1,chassis=1,slot=0" \
> > > >   -device "nvme,id=nvme0,serial=deadbeef,bus=pcie_root_port1,mdts=7" \
> > > >   -drive "id=null,if=none,file=null-co://,file.read-zeroes=on,format=raw" \
> > > >   -device "nvme-ns,id=nvm-1,drive=nvm-1,bus=nvme0,nsid=1,drive=null,logical_block_size=4096,physical_block_size=4096" \
> > > >   -pidfile "/home/kbj/work/vol/machines/run/null/pidfile" \
> > > >   -kernel "/home/kbj/work/src/kernel/linux/arch/x86_64/boot/bzImage" \
> > > >   -append "root=/dev/vda1 console=ttyS0,115200 audit=0 intel_iommu=on" \
> > > >   -virtfs "local,path=/home/kbj/work/src/kernel/linux,security_model=none,readonly=on,mount_tag=kernel_dir" \
> > > >   -serial "mon:stdio" \
> > > >   -d "guest_errors" \
> > > >   -D "/home/kbj/work/vol/machines/log/null/qemu.log" \
> > > >   -trace "pci_nvme*"
> > > 
> > > Alright. It was *some* config issue with my kernel. Reverted to a
> > > defconfig + requirements and the issue went away.
> > > 
> > 
> > And it went away because I didn't include iommu support in that kernel (and its
> > not enabled by default on the stock OS kernel).
> > 
> > > I'll try to track down what happended, but doesnt look like qemu is at
> > > fault here.
> > 
> > OK. So.
> > 
> > I can continue to reproduce this if the machine has a virtual intel iommu
> > enabled. And it only happens when this commit is applied.
> > 
> > I even backported this patch (and the shadow doorbell patch) to v7.0 and v6.2
> > (i.e. no SRIOV or CC logic changes that could be buggy) and it still exhibits
> > this behavior. Sometimes QEMU coredumps on poweroff and I managed to grab one:
> > 
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  nvme_process_sq (opaque=0x556329708110) at ../hw/nvme/ctrl.c:5720
> > 5720   NvmeCQueue *cq = n->cq[sq->cqid];
> > [Current thread is 1 (Thread 0x7f7363553cc0 (LWP 2554896))]
> > (gdb) bt
> > #0  nvme_process_sq (opaque=0x556329708110) at ../hw/nvme/ctrl.c:5720
> > #1  0x0000556326e82e28 in nvme_sq_notifier (e=0x556329708148) at ../hw/nvme/ctrl.c:3993
> > #2  0x000055632738396a in aio_dispatch_handler (ctx=0x5563291c3160, node=0x55632a228b60) at ../util/aio-posix.c:329
> > #3  0x0000556327383b22 in aio_dispatch_handlers (ctx=0x5563291c3160) at ../util/aio-posix.c:372
> > #4  0x0000556327383b78 in aio_dispatch (ctx=0x5563291c3160) at ../util/aio-posix.c:382
> > #5  0x000055632739d748 in aio_ctx_dispatch (source=0x5563291c3160, callback=0x0, user_data=0x0) at ../util/async.c:311
> > #6  0x00007f7369398163 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
> > #7  0x00005563273af279 in glib_pollfds_poll () at ../util/main-loop.c:232
> > #8  0x00005563273af2f6 in os_host_main_loop_wait (timeout=0x1dbe22c0) at ../util/main-loop.c:255
> > #9  0x00005563273af404 in main_loop_wait (nonblocking=0x0) at ../util/main-loop.c:531
> > #10 0x00005563270714d9 in qemu_main_loop () at ../softmmu/runstate.c:726
> > #11 0x0000556326c7ea46 in main (argc=0x2e, argv=0x7ffc6977f198, envp=0x7ffc6977f310) at ../softmmu/main.c:50
> > 
> > At this point, there should not be any CQ/SQs (I detached the device from the
> > kernel driver which deletes all queues and bound it to vfio-pci instead), but
> > somehow a stale notifier is called on poweroff and the queue is bogus, causing
> > the segfault.
> > 
> > (gdb) p cq->cqid
> > $2 = 0x7880
> > 
> > My guess would be that we are not cleaning up the notifier properly. Currently
> > we do this
> > 
> >     if (cq->ioeventfd_enabled) {
> >         memory_region_del_eventfd(&n->iomem,
> >                                   0x1000 + offset, 4, false, 0, &cq->notifier);
> >         event_notifier_cleanup(&cq->notifier);
> >     }
> > 
> > 
> > Any ioeventfd experts that has some insights into what we are doing
> > wrong here? Something we need to flush? I tried with a test_and_clear on
> > the eventfd but that didnt do the trick.
> > 
> > I think we'd need to revert this until we can track down what is going wrong.
> 
> One more thing - I now also triggered the coredump with just a `modprobe
> vfio-pci` following a `nvme reset /dev/nvme0`.
> 
> Similar backtrace.

Alright. Forget about the iommu, that was just a coincidence.

This patch seems to fix it. I guess it is the
event_notifier_set_handler(..., NULL) that does the trick, but I'd like
to understand why ;)


diff --git i/hw/nvme/ctrl.c w/hw/nvme/ctrl.c
index 533ad14e7a61..3bc3c6bfbe78 100644
--- i/hw/nvme/ctrl.c
+++ w/hw/nvme/ctrl.c
@@ -4238,7 +4238,9 @@ static void nvme_cq_notifier(EventNotifier *e)
     NvmeCQueue *cq = container_of(e, NvmeCQueue, notifier);
     NvmeCtrl *n = cq->ctrl;
 
-    event_notifier_test_and_clear(&cq->notifier);
+    if (!event_notifier_test_and_clear(e)) {
+        return;
+    }
 
     nvme_update_cq_head(cq);
 
@@ -4275,7 +4277,9 @@ static void nvme_sq_notifier(EventNotifier *e)
 {
     NvmeSQueue *sq = container_of(e, NvmeSQueue, notifier);
 
-    event_notifier_test_and_clear(&sq->notifier);
+    if (!event_notifier_test_and_clear(e)) {
+        return;
+    }
 
     nvme_process_sq(sq);
 }
@@ -4307,6 +4311,8 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
     if (sq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &sq->notifier);
+        event_notifier_set_handler(&sq->notifier, NULL);
+        nvme_sq_notifier(&sq->notifier);
         event_notifier_cleanup(&sq->notifier);
     }
     g_free(sq->io_req);
@@ -4697,6 +4703,8 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
     if (cq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &cq->notifier);
+        event_notifier_set_handler(&cq->notifier, NULL);
+        nvme_cq_notifier(&cq->notifier);
         event_notifier_cleanup(&cq->notifier);
     }
     if (msix_enabled(&n->parent_obj)) {

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2022-07-26 12:09 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-05 14:24 [PATCH v4] hw/nvme: Use ioeventfd to handle doorbell updates Jinhao Fan
2022-07-05 17:11 ` Klaus Jensen
2022-07-05 18:43   ` Keith Busch
2022-07-06 11:34     ` Jinhao Fan
2022-07-07  5:51       ` Klaus Jensen
2022-07-07  8:50         ` Jinhao Fan
2022-07-06 10:57   ` Jinhao Fan
2022-07-09  3:06 ` Jinhao Fan
2022-07-12 12:23   ` Klaus Jensen
2022-07-14  5:35     ` Klaus Jensen
2022-07-25 20:55 ` Klaus Jensen
2022-07-26  7:35   ` Jinhao Fan
2022-07-26  7:41     ` Klaus Jensen
2022-07-26  7:55       ` Jinhao Fan
2022-07-26  9:19         ` Klaus Jensen
2022-07-26 10:09           ` Klaus Jensen
2022-07-26 11:24             ` Klaus Jensen
2022-07-26 11:32               ` Klaus Jensen
2022-07-26 12:08                 ` Klaus Jensen [this message]
2022-07-27  7:06                   ` Klaus Jensen
2022-07-27  8:16                     ` Jinhao Fan
2022-07-27  8:21                       ` Klaus Jensen
2022-07-27 15:09                         ` Jinhao Fan
2022-07-26 12:35 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yt/ZKVHjSTTt08MV@apples \
    --to=its@irrelevant.dk \
    --cc=fanjinhao21s@ict.ac.cn \
    --cc=hreitz@redhat.com \
    --cc=kbusch@kernel.org \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.