qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] iothread and irqfd support
@ 2022-08-26 15:12 Jinhao Fan
  2022-08-26 15:12 ` [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd Jinhao Fan
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: its, kbusch, stefanha, Jinhao Fan

This patch series adds support for using a separate iothread for NVMe
IO emulation, which brings the potential of applying polling. The
first two patches implement support for irqfd, which solves thread
safety problems for interrupt emulation outside the main loop thread.

Changes since v1:
 - Avoid duplicate initialization of cq timer

Jinhao Fan (3):
  hw/nvme: support irq(de)assertion with eventfd
  hw/nvme: use KVM irqfd when available
  hw/nvme: add iothread support

 hw/nvme/ctrl.c       | 328 +++++++++++++++++++++++++++++++++++++++----
 hw/nvme/ns.c         |  21 ++-
 hw/nvme/nvme.h       |  12 +-
 hw/nvme/trace-events |   3 +
 4 files changed, 335 insertions(+), 29 deletions(-)

-- 
2.25.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd
  2022-08-26 15:12 [PATCH v2 0/3] iothread and irqfd support Jinhao Fan
@ 2022-08-26 15:12 ` Jinhao Fan
  2022-08-26 15:12 ` [PATCH v2 2/3] hw/nvme: use KVM irqfd when available Jinhao Fan
  2022-08-26 15:12 ` [PATCH v2 3/3] hw/nvme: add iothread support Jinhao Fan
  2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: its, kbusch, stefanha, Jinhao Fan, Klaus Jensen, open list:nvme

When the new option 'irq-eventfd' is turned on, the IO emulation code
signals an eventfd when it wants to (de)assert an irq. The main loop
eventfd handler does the actual irq (de)assertion.  This paves the way
for iothread support since QEMU's interrupt emulation is not thread
safe.

Asserting and deasserting irq with eventfd has some performance
implications. For small queue depth it increases request latency but
for large queue depth it effectively coalesces irqs.

Comparison (KIOPS):

QD            1   4  16  64
QEMU         38 123 210 329
irq-eventfd  32 106 240 364

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/ctrl.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------
 hw/nvme/nvme.h |   3 ++
 2 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..51792f3955 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -526,34 +526,57 @@ static void nvme_irq_check(NvmeCtrl *n)
     }
 }
 
+static void nvme_irq_do_assert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    if (msix_enabled(&(n->parent_obj))) {
+        trace_pci_nvme_irq_msix(cq->vector);
+        msix_notify(&(n->parent_obj), cq->vector);
+    } else {
+        trace_pci_nvme_irq_pin();
+        assert(cq->vector < 32);
+        n->irq_status |= 1 << cq->vector;
+        nvme_irq_check(n);
+    }
+}
+
 static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
 {
     if (cq->irq_enabled) {
-        if (msix_enabled(&(n->parent_obj))) {
-            trace_pci_nvme_irq_msix(cq->vector);
-            msix_notify(&(n->parent_obj), cq->vector);
+        if (cq->assert_notifier.initialized) {
+            event_notifier_set(&cq->assert_notifier);
         } else {
-            trace_pci_nvme_irq_pin();
-            assert(cq->vector < 32);
-            n->irq_status |= 1 << cq->vector;
-            nvme_irq_check(n);
+            nvme_irq_do_assert(n, cq);
         }
     } else {
         trace_pci_nvme_irq_masked();
     }
 }
 
+static void nvme_irq_do_deassert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    if (msix_enabled(&(n->parent_obj))) {
+        return;
+    } else {
+        assert(cq->vector < 32);
+        if (!n->cq_pending) {
+            n->irq_status &= ~(1 << cq->vector);
+        }
+        nvme_irq_check(n);
+    }
+}
+
 static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
 {
     if (cq->irq_enabled) {
-        if (msix_enabled(&(n->parent_obj))) {
-            return;
+        if (cq->deassert_notifier.initialized) {
+            /*
+             * The deassert notifier will only be initilized when MSI-X is NOT
+             * in use. Therefore no need to worry about extra eventfd syscall
+             * for pin-based interrupts.
+             */
+            event_notifier_set(&cq->deassert_notifier);
         } else {
-            assert(cq->vector < 32);
-            if (!n->cq_pending) {
-                n->irq_status &= ~(1 << cq->vector);
-            }
-            nvme_irq_check(n);
+            nvme_irq_do_deassert(n, cq);
         }
     }
 }
@@ -1338,6 +1361,50 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
     trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
 }
 
+static void nvme_assert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_do_assert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_deassert_notifier_read(EventNotifier *e)
+{
+    NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
+    if (event_notifier_test_and_clear(e)) {
+        nvme_irq_do_deassert(cq->ctrl, cq);
+    }
+}
+
+static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
+{
+    int ret;
+
+    ret = event_notifier_init(&cq->assert_notifier, 0);
+    if (ret < 0) {
+        return;
+    }
+
+    event_notifier_set_handler(&cq->assert_notifier,
+                                nvme_assert_notifier_read);
+
+    if (!msix_enabled(&n->parent_obj)) {
+        ret = event_notifier_init(&cq->deassert_notifier, 0);
+        if (ret < 0) {
+            event_notifier_set_handler(&cq->assert_notifier, NULL);
+            event_notifier_cleanup(&cq->assert_notifier);
+
+            return;
+        }
+
+        event_notifier_set_handler(&cq->deassert_notifier,
+                                   nvme_deassert_notifier_read);
+    }
+
+    return;
+}
+
 static void nvme_post_cqes(void *opaque)
 {
     NvmeCQueue *cq = opaque;
@@ -1377,8 +1444,10 @@ static void nvme_post_cqes(void *opaque)
         QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
     }
     if (cq->tail != cq->head) {
-        if (cq->irq_enabled && !pending) {
-            n->cq_pending++;
+        if (cq->irq_enabled) {
+            if (!pending) {
+                n->cq_pending++;
+            }
         }
 
         nvme_irq_assert(n, cq);
@@ -4705,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_set_handler(&cq->notifier, NULL);
         event_notifier_cleanup(&cq->notifier);
     }
+    if (cq->assert_notifier.initialized) {
+        event_notifier_set_handler(&cq->assert_notifier, NULL);
+        event_notifier_cleanup(&cq->assert_notifier);
+    }
+    if (cq->deassert_notifier.initialized) {
+        event_notifier_set_handler(&cq->deassert_notifier, NULL);
+        event_notifier_cleanup(&cq->deassert_notifier);
+    }
     if (msix_enabled(&n->parent_obj)) {
         msix_vector_unuse(&n->parent_obj, cq->vector);
     }
@@ -4734,7 +4811,7 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
         n->cq_pending--;
     }
 
-    nvme_irq_deassert(n, cq);
+    nvme_irq_do_deassert(n, cq);
     trace_pci_nvme_del_cq(qid);
     nvme_free_cq(cq, n);
     return NVME_SUCCESS;
@@ -4772,6 +4849,14 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
     }
     n->cq[cqid] = cq;
     cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+    /*
+     * Only enable irq eventfd for IO queues since we always emulate admin
+     * queue in main loop thread.
+     */
+    if (cqid && n->params.irq_eventfd) {
+        nvme_init_irq_notifier(n, cq);
+    }
 }
 
 static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
@@ -7671,6 +7756,7 @@ static Property nvme_props[] = {
     DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
     DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
     DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, false),
+    DEFINE_PROP_BOOL("irq-eventfd", NvmeCtrl, params.irq_eventfd, false),
     DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
     DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
                      params.auto_transition_zones, true),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..4850d3e965 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -398,6 +398,8 @@ typedef struct NvmeCQueue {
     uint64_t    ei_addr;
     QEMUTimer   *timer;
     EventNotifier notifier;
+    EventNotifier assert_notifier;
+    EventNotifier deassert_notifier;
     bool        ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
@@ -422,6 +424,7 @@ typedef struct NvmeParams {
     bool     auto_transition_zones;
     bool     legacy_cmb;
     bool     ioeventfd;
+    bool     irq_eventfd;
     uint8_t  sriov_max_vfs;
     uint16_t sriov_vq_flexible;
     uint16_t sriov_vi_flexible;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
  2022-08-26 15:12 [PATCH v2 0/3] iothread and irqfd support Jinhao Fan
  2022-08-26 15:12 ` [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd Jinhao Fan
@ 2022-08-26 15:12 ` Jinhao Fan
  2022-08-26 15:34   ` Keith Busch
  2022-08-26 15:12 ` [PATCH v2 3/3] hw/nvme: add iothread support Jinhao Fan
  2 siblings, 1 reply; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: its, kbusch, stefanha, Jinhao Fan, Klaus Jensen, open list:nvme

Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers since handler callbacks are called
in the same system call as the irqfd write.

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/ctrl.c       | 145 ++++++++++++++++++++++++++++++++++++++++++-
 hw/nvme/nvme.h       |   3 +
 hw/nvme/trace-events |   3 +
 3 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 51792f3955..396f3f0cdd 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -192,6 +192,7 @@
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
@@ -1377,8 +1378,115 @@ static void nvme_deassert_notifier_read(EventNotifier *e)
     }
 }
 
+static int nvme_kvm_vector_use(NvmeCtrl *n, NvmeCQueue *cq, uint32_t vector)
+{
+    KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+    int ret;
+
+    ret = kvm_irqchip_add_msi_route(&c, vector, &n->parent_obj);
+    if (ret < 0) {
+        return ret;
+    }
+
+    kvm_irqchip_commit_route_changes(&c);
+
+    cq->virq = ret;
+
+    return 0;
+}
+
+static int nvme_kvm_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+                                  MSIMessage msg)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+    int ret;
+
+    trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (cq->vector == vector) {
+            if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+                ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+                                                   pci_dev);
+                if (ret < 0) {
+                    return ret;
+                }
+
+                kvm_irqchip_commit_routes(kvm_state);
+
+                cq->msg = msg;
+            }
+
+            ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+                                                     &cq->assert_notifier,
+                                                     NULL, cq->virq);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+    }
+
+    return 0;
+}
+
+static void nvme_kvm_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_mask(vector);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (cq->vector == vector) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+        }
+    }
+}
+
+static void nvme_kvm_vector_poll(PCIDevice *pci_dev, unsigned int vector_start,
+                                 unsigned int vector_end)
+{
+    NvmeCtrl *n = NVME(pci_dev);
+
+    trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+    for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+        NvmeCQueue *cq = n->cq[i];
+
+        if (!cq) {
+            continue;
+        }
+
+        if (!msix_is_masked(pci_dev, cq->vector)) {
+            continue;
+        }
+
+        if (cq->vector >= vector_start && cq->vector <= vector_end) {
+            if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+                msix_set_pending(pci_dev, i);
+            }
+        }
+    }
+}
+
+
 static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     int ret;
 
     ret = event_notifier_init(&cq->assert_notifier, 0);
@@ -1386,12 +1494,27 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
         return;
     }
 
-    event_notifier_set_handler(&cq->assert_notifier,
-                                nvme_assert_notifier_read);
+    if (with_irqfd) {
+        ret = nvme_kvm_vector_use(n, cq, cq->vector);
+        if (ret < 0) {
+            event_notifier_cleanup(&cq->assert_notifier);
+
+            return;
+        }
+    } else {
+        event_notifier_set_handler(&cq->assert_notifier,
+                                   nvme_assert_notifier_read);
+    }
 
     if (!msix_enabled(&n->parent_obj)) {
         ret = event_notifier_init(&cq->deassert_notifier, 0);
         if (ret < 0) {
+            if (with_irqfd) {
+                kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                      &cq->assert_notifier,
+                                                      cq->virq);
+            }
+
             event_notifier_set_handler(&cq->assert_notifier, NULL);
             event_notifier_cleanup(&cq->assert_notifier);
 
@@ -4764,6 +4887,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
 
 static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
 {
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
     uint16_t offset = (cq->cqid << 3) + (1 << 2);
 
     n->cq[cq->cqid] = NULL;
@@ -4775,6 +4900,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
         event_notifier_cleanup(&cq->notifier);
     }
     if (cq->assert_notifier.initialized) {
+        if (with_irqfd) {
+            kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+                                                  &cq->assert_notifier,
+                                                  cq->virq);
+            kvm_irqchip_release_virq(kvm_state, cq->virq);
+        }
         event_notifier_set_handler(&cq->assert_notifier, NULL);
         event_notifier_cleanup(&cq->assert_notifier);
     }
@@ -6528,6 +6659,9 @@ static int nvme_start_ctrl(NvmeCtrl *n)
     uint32_t page_size = 1 << page_bits;
     NvmeSecCtrlEntry *sctrl = nvme_sctrl(n);
 
+    bool with_irqfd = msix_enabled(&n->parent_obj) &&
+                      kvm_msi_via_irqfd_enabled();
+
     if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
         trace_pci_nvme_err_startfail_virt_state(le16_to_cpu(sctrl->nvi),
                                                 le16_to_cpu(sctrl->nvq),
@@ -6617,6 +6751,12 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
     nvme_select_iocs(n);
 
+    if (with_irqfd) {
+        return msix_set_vector_notifiers(PCI_DEVICE(n), nvme_kvm_vector_unmask,
+                                         nvme_kvm_vector_mask,
+                                         nvme_kvm_vector_poll);
+    }
+
     return 0;
 }
 
@@ -7734,6 +7874,7 @@ static void nvme_exit(PCIDevice *pci_dev)
         pcie_sriov_pf_exit(pci_dev);
     }
 
+    msix_unset_vector_notifiers(pci_dev);
     msix_uninit(pci_dev, &n->bar0, &n->bar0);
     memory_region_del_subregion(&n->bar0, &n->iomem);
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 4850d3e965..b0b986b024 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
 
 #include "qemu/uuid.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
@@ -396,10 +397,12 @@ typedef struct NvmeCQueue {
     uint64_t    dma_addr;
     uint64_t    db_addr;
     uint64_t    ei_addr;
+    int         virq;
     QEMUTimer   *timer;
     EventNotifier notifier;
     EventNotifier assert_notifier;
     EventNotifier deassert_notifier;
+    MSIMessage  msg;
     bool        ioeventfd_enabled;
     QTAILQ_HEAD(, NvmeSQueue) sq_list;
     QTAILQ_HEAD(, NvmeRequest) req_list;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..b11fcf4a65 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -2,6 +2,9 @@
 pci_nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
 pci_nvme_irq_pin(void) "pulsing IRQ pin"
 pci_nvme_irq_masked(void) "IRQ is masked"
+pci_nvme_irq_mask(uint32_t vector) "IRQ %u gets masked"
+pci_nvme_irq_unmask(uint32_t vector, uint64_t addr, uint32_t data) "IRQ %u gets unmasked, addr=0x%"PRIx64" data=0x%"PRIu32""
+pci_nvme_irq_poll(uint32_t vector_start, uint32_t vector_end) "IRQ poll, start=0x%"PRIu32" end=0x%"PRIu32""
 pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
 pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64" eis_addr=0x%"PRIx64""
 pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/3] hw/nvme: add iothread support
  2022-08-26 15:12 [PATCH v2 0/3] iothread and irqfd support Jinhao Fan
  2022-08-26 15:12 ` [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd Jinhao Fan
  2022-08-26 15:12 ` [PATCH v2 2/3] hw/nvme: use KVM irqfd when available Jinhao Fan
@ 2022-08-26 15:12 ` Jinhao Fan
  2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: its, kbusch, stefanha, Jinhao Fan, open list:nvme

Add an option "iothread=x" to do emulation in a separate iothread.
This improves the performance because QEMU's main loop is responsible
for a lot of other work while iothread is dedicated to NVMe emulation.
Moreover, emulating in iothread brings the potential of polling on
SQ/CQ doorbells, which I will bring up in a following patch.

Iothread can be enabled by:
-object iothread,id=nvme0 \
-device nvme,iothread=nvme0 \

Performance comparisons (KIOPS):

QD         1   4  16  64
QEMU      41 136 242 338
iothread  53 155 245 309

Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
---
 hw/nvme/ctrl.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------
 hw/nvme/ns.c   | 21 +++++++++++++---
 hw/nvme/nvme.h |  6 ++++-
 3 files changed, 82 insertions(+), 12 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 396f3f0cdd..2d81f7c7f6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4458,7 +4458,13 @@ static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
         return ret;
     }
 
-    event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+    if (cq->cqid) {
+        aio_set_event_notifier(n->ctx, &cq->notifier, true, nvme_cq_notifier,
+                               NULL, NULL);
+    } else {
+        event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+    }
+
     memory_region_add_eventfd(&n->iomem,
                               0x1000 + offset, 4, false, 0, &cq->notifier);
 
@@ -4487,7 +4493,13 @@ static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
         return ret;
     }
 
-    event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+    if (sq->sqid) {
+        aio_set_event_notifier(n->ctx, &sq->notifier, true, nvme_sq_notifier,
+                               NULL, NULL);
+    } else {
+        event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+    }
+
     memory_region_add_eventfd(&n->iomem,
                               0x1000 + offset, 4, false, 0, &sq->notifier);
 
@@ -4503,7 +4515,12 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
     if (sq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &sq->notifier);
-        event_notifier_set_handler(&sq->notifier, NULL);
+        if (sq->sqid) {
+            aio_set_event_notifier(n->ctx, &sq->notifier, true, NULL, NULL,
+                                   NULL);
+        } else {
+            event_notifier_set_handler(&sq->notifier, NULL);
+        }
         event_notifier_cleanup(&sq->notifier);
     }
     g_free(sq->io_req);
@@ -4573,7 +4590,13 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
         sq->io_req[i].sq = sq;
         QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
     }
-    sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+
+    if (sq->sqid) {
+        sq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                  nvme_process_sq, sq);
+    } else {
+        sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+    }
 
     if (n->dbbuf_enabled) {
         sq->db_addr = n->dbbuf_dbs + (sqid << 3);
@@ -4896,7 +4919,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
     if (cq->ioeventfd_enabled) {
         memory_region_del_eventfd(&n->iomem,
                                   0x1000 + offset, 4, false, 0, &cq->notifier);
-        event_notifier_set_handler(&cq->notifier, NULL);
+        if (cq->cqid) {
+            aio_set_event_notifier(n->ctx, &cq->notifier, true, NULL, NULL,
+                                   NULL);
+        } else {
+            event_notifier_set_handler(&cq->notifier, NULL);
+        }
         event_notifier_cleanup(&cq->notifier);
     }
     if (cq->assert_notifier.initialized) {
@@ -4979,7 +5007,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
         }
     }
     n->cq[cqid] = cq;
-    cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+    if (cq->cqid) {
+        cq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+                                  nvme_post_cqes, cq);
+    } else {
+        cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+    }
 
     /*
      * Only enable irq eventfd for IO queues since we always emulate admin
@@ -7759,6 +7793,14 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
         stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
     }
+
+    if (n->params.iothread) {
+        n->iothread = n->params.iothread;
+        object_ref(OBJECT(n->iothread));
+        n->ctx = iothread_get_aio_context(n->iothread);
+    } else {
+        n->ctx = qemu_get_aio_context();
+    }
 }
 
 static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
@@ -7831,7 +7873,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         ns = &n->namespace;
         ns->params.nsid = 1;
 
-        if (nvme_ns_setup(ns, errp)) {
+        if (nvme_ns_setup(ns, n->ctx, errp)) {
             return;
         }
 
@@ -7862,6 +7904,15 @@ static void nvme_exit(PCIDevice *pci_dev)
     g_free(n->sq);
     g_free(n->aer_reqs);
 
+    aio_context_acquire(n->ctx);
+    blk_set_aio_context(n->namespace.blkconf.blk, qemu_get_aio_context(), NULL);
+    aio_context_release(n->ctx);
+
+    if (n->iothread) {
+        object_unref(OBJECT(n->iothread));
+        n->iothread = NULL;
+    }
+
     if (n->params.cmb_size_mb) {
         g_free(n->cmb.buf);
     }
@@ -7885,6 +7936,8 @@ static Property nvme_props[] = {
                      HostMemoryBackend *),
     DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
                      NvmeSubsystem *),
+    DEFINE_PROP_LINK("iothread", NvmeCtrl, params.iothread, TYPE_IOTHREAD,
+                     IOThread *),
     DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
     DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
     DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..eb9141a67b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -146,9 +146,11 @@ lbaf_found:
     return 0;
 }
 
-static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
+static int nvme_ns_init_blk(NvmeNamespace *ns, AioContext *ctx, Error **errp)
 {
     bool read_only;
+    AioContext *old_context;
+    int ret;
 
     if (!blkconf_blocksizes(&ns->blkconf, errp)) {
         return -1;
@@ -170,6 +172,17 @@ static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
         return -1;
     }
 
+    old_context = blk_get_aio_context(ns->blkconf.blk);
+    aio_context_acquire(old_context);
+    ret = blk_set_aio_context(ns->blkconf.blk, ctx, errp);
+    aio_context_release(old_context);
+
+    if (ret) {
+        error_setg(errp, "Set AioContext on BlockBackend failed");
+        return ret;
+    }
+
+
     return 0;
 }
 
@@ -482,13 +495,13 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
         return -1;
     }
 
-    if (nvme_ns_init_blk(ns, errp)) {
+    if (nvme_ns_init_blk(ns, ctx, errp)) {
         return -1;
     }
 
@@ -563,7 +576,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    if (nvme_ns_setup(ns, errp)) {
+    if (nvme_ns_setup(ns, n->ctx, errp)) {
         return;
     }
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index b0b986b024..224b73e6c4 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -22,6 +22,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "hw/block/block.h"
+#include "sysemu/iothread.h"
 
 #include "block/nvme.h"
 
@@ -276,7 +277,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
@@ -433,6 +434,7 @@ typedef struct NvmeParams {
     uint16_t sriov_vi_flexible;
     uint8_t  sriov_max_vq_per_vf;
     uint8_t  sriov_max_vi_per_vf;
+    IOThread *iothread;
 } NvmeParams;
 
 typedef struct NvmeCtrl {
@@ -464,6 +466,8 @@ typedef struct NvmeCtrl {
     uint64_t    dbbuf_dbs;
     uint64_t    dbbuf_eis;
     bool        dbbuf_enabled;
+    IOThread    *iothread;
+    AioContext  *ctx;
 
     struct {
         MemoryRegion mem;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
  2022-08-26 15:12 ` [PATCH v2 2/3] hw/nvme: use KVM irqfd when available Jinhao Fan
@ 2022-08-26 15:34   ` Keith Busch
  2022-08-26 15:45     ` Klaus Jensen
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Keith Busch @ 2022-08-26 15:34 UTC (permalink / raw)
  To: Jinhao Fan; +Cc: qemu-devel, its, stefanha, Klaus Jensen, open list:nvme

On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> Use KVM's irqfd to send interrupts when possible. This approach is
> thread safe. Moreover, it does not have the inter-thread communication
> overhead of plain event notifiers since handler callback are called
> in the same system call as irqfd write.
> 
> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>

No idea what's going on here... This one is causing the following assert
failure with --enable-kvm:

  qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
and linux kernel returns EINVAL in that case. It's never set that way without
this patch. Am I the only one seeing this?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
  2022-08-26 15:34   ` Keith Busch
@ 2022-08-26 15:45     ` Klaus Jensen
  2022-08-26 15:54       ` Keith Busch
  2022-08-26 16:04     ` Jinhao Fan
  2022-08-27  8:28     ` Jinhao Fan
  2 siblings, 1 reply; 10+ messages in thread
From: Klaus Jensen @ 2022-08-26 15:45 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jinhao Fan, qemu-devel, stefanha, Klaus Jensen, open list:nvme

[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]

On Aug 26 09:34, Keith Busch wrote:
> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > Use KVM's irqfd to send interrupts when possible. This approach is
> > thread safe. Moreover, it does not have the inter-thread communication
> > overhead of plain event notifiers since handler callback are called
> > in the same system call as irqfd write.
> > 
> > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> 
> No idea what's going on here... This one is causing the following assert
> failure with --enable-kvm:
> 
>   qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> 
> I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> and linux kernel returns EINVAL in that case. It's never set that way without
> this patch. Am I the only one seeing this?

Argh, sorry, I threw that patch together a bit too quickly. I was just
so pumped because I believed I had solved the issue hehe.

Are you missing the ioeventfd=on and irq-eventfd=on parameters by any
chance? Without those I'm also getting an assertion, but a different one

qemu-system-x86_64: ../hw/pci/msix.c:119: msix_fire_vector_notifier: Assertion `ret >= 0' failed.



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
  2022-08-26 15:45     ` Klaus Jensen
@ 2022-08-26 15:54       ` Keith Busch
  2022-08-26 15:58         ` Klaus Jensen
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2022-08-26 15:54 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: Jinhao Fan, qemu-devel, stefanha, Klaus Jensen, open list:nvme

On Fri, Aug 26, 2022 at 05:45:21PM +0200, Klaus Jensen wrote:
> On Aug 26 09:34, Keith Busch wrote:
> > On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > > Use KVM's irqfd to send interrupts when possible. This approach is
> > > thread safe. Moreover, it does not have the inter-thread communication
> > > overhead of plain event notifiers since handler callbacks are called
> > > in the same system call as irqfd write.
> > > 
> > > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > 
> > No idea what's going on here... This one is causing the following assert
> > failure with --enable-kvm:
> > 
> >   qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> > 
> > I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> > and linux kernel returns EINVAL in that case. It's never set that way without
> > this patch. Am I the only one seeing this?
> 
> Argh, sorry, I threw that patch together a bit too quickly. I was just
> so pumped because I believed I had solved the issue hehe.
> 
> Are you missing the ioeventfd=on and irq-eventfd=on parameters by any
> chance? Without those I'm also getting an assertion, but a different one

I had not enabled those yet. This was purely a regression test with my
previously working parameters for a sanity check.

If I enable those new nvme parameters, then it is successful.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
  2022-08-26 15:54       ` Keith Busch
@ 2022-08-26 15:58         ` Klaus Jensen
  0 siblings, 0 replies; 10+ messages in thread
From: Klaus Jensen @ 2022-08-26 15:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jinhao Fan, qemu-devel, stefanha, Klaus Jensen, open list:nvme

[-- Attachment #1: Type: text/plain, Size: 1746 bytes --]

On Aug 26 09:54, Keith Busch wrote:
> On Fri, Aug 26, 2022 at 05:45:21PM +0200, Klaus Jensen wrote:
> > On Aug 26 09:34, Keith Busch wrote:
> > > On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > > > Use KVM's irqfd to send interrupts when possible. This approach is
> > > > thread safe. Moreover, it does not have the inter-thread communication
> > > > overhead of plain event notifiers since handler callbacks are called
> > > > in the same system call as irqfd write.
> > > > 
> > > > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > > 
> > > No idea what's going on here... This one is causing the following assert
> > > failure with --enable-kvm:
> > > 
> > >   qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> > > 
> > > I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> > > and linux kernel returns EINVAL in that case. It's never set that way without
> > > this patch. Am I the only one seeing this?
> > 
> > Argh, sorry, I threw that patch together a bit too quickly. I was just
> > so pumped because I believed I had solved the issue hehe.
> > 
> > Are you missing the ioeventfd=on and irq-eventfd=on parameters by any
> > chance? Without those I'm also getting an assertion, but a different one
> 
> I had not enabled those yet. This was purely a regression test with my
> previously working parameters for a sanity check.
> 

Yeah, sorry, I just threw it out there with tunnel vision on the kvm
irq part, not doing my due diligence ;)

I'll fix it up!

> If I enable those new nvme parameters, then it is successful.

Great :)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
  2022-08-26 15:34   ` Keith Busch
  2022-08-26 15:45     ` Klaus Jensen
@ 2022-08-26 16:04     ` Jinhao Fan
  2022-08-27  8:28     ` Jinhao Fan
  2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 16:04 UTC (permalink / raw)
  To: Keith Busch; +Cc: qemu-devel, its, stefanha, Klaus Jensen, open list:nvme

at 11:34 PM, Keith Busch <kbusch@kernel.org> wrote:

> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
>> Use KVM's irqfd to send interrupts when possible. This approach is
>> thread safe. Moreover, it does not have the inter-thread communication
>> overhead of plain event notifiers since handler callbacks are called
>> in the same system call as irqfd write.
>> 
>> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
>> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> 
> No idea what's going on here... This one is causing the following assert
> failure with --enable-kvm:
> 
>  qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

My intuition is that if irq-eventfd is off we shouldn’t call
kvm_irqchip_commit_routes(). Probably we missed some check here.

> I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> and linux kernel returns EINVAL in that case. It's never set that way without
> this patch. Am I the only one seeing this?




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
  2022-08-26 15:34   ` Keith Busch
  2022-08-26 15:45     ` Klaus Jensen
  2022-08-26 16:04     ` Jinhao Fan
@ 2022-08-27  8:28     ` Jinhao Fan
  2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-27  8:28 UTC (permalink / raw)
  To: Keith Busch; +Cc: qemu-devel, its, stefanha, Klaus Jensen, open list:nvme

at 11:34 PM, Keith Busch <kbusch@kernel.org> wrote:

> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
>> Use KVM's irqfd to send interrupts when possible. This approach is
>> thread safe. Moreover, it does not have the inter-thread communication
>> overhead of plain event notifiers since handler callbacks are called
>> in the same system call as irqfd write.
>> 
>> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
>> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> 
> No idea what's going on here... This one is causing the following assert
> failure with --enable-kvm:
> 
>  qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> 
> I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> and linux kernel returns EINVAL in that case. It's never set that way without
> this patch. Am I the only one seeing this?

nvme_start_ctrl() registers MSI-X masking handlers without checking
irq-eventfd. This causes nvme_kvm_vector_unmask() to be called when it is
not supposed to.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-08-27  8:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-26 15:12 [PATCH v2 0/3] iothread and irqfd support Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 2/3] hw/nvme: use KVM irqfd when available Jinhao Fan
2022-08-26 15:34   ` Keith Busch
2022-08-26 15:45     ` Klaus Jensen
2022-08-26 15:54       ` Keith Busch
2022-08-26 15:58         ` Klaus Jensen
2022-08-26 16:04     ` Jinhao Fan
2022-08-27  8:28     ` Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 3/3] hw/nvme: add iothread support Jinhao Fan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).