* [PATCH v2 0/3] iothread and irqfd support
@ 2022-08-26 15:12 Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd Jinhao Fan
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
To: qemu-devel; +Cc: its, kbusch, stefanha, Jinhao Fan
This patch series adds support for using a separate iothread for NVMe
IO emulation, which opens up the possibility of polling. The first two
patches implement support for irqfd, which solves thread safety
problems for interrupt emulation outside the main loop thread.
Changes since v1:
- Avoid duplicate initialization of cq timer
Jinhao Fan (3):
hw/nvme: support irq(de)assertion with eventfd
hw/nvme: use KVM irqfd when available
hw/nvme: add iothread support
hw/nvme/ctrl.c | 328 +++++++++++++++++++++++++++++++++++++++----
hw/nvme/ns.c | 21 ++-
hw/nvme/nvme.h | 12 +-
hw/nvme/trace-events | 3 +
4 files changed, 335 insertions(+), 29 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd
2022-08-26 15:12 [PATCH v2 0/3] iothread and irqfd support Jinhao Fan
@ 2022-08-26 15:12 ` Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 2/3] hw/nvme: use KVM irqfd when available Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 3/3] hw/nvme: add iothread support Jinhao Fan
2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
To: qemu-devel
Cc: its, kbusch, stefanha, Jinhao Fan, Klaus Jensen, open list:nvme
When the new option 'irq-eventfd' is turned on, the IO emulation code
signals an eventfd when it wants to (de)assert an irq. The main loop
eventfd handler then does the actual irq (de)assertion. This paves the
way for iothread support, since QEMU's interrupt emulation is not
thread safe.
Asserting and deasserting the irq with an eventfd has some performance
implications. For small queue depths it increases request latency, but
for large queue depths it effectively coalesces irqs.
Comparison (KIOPS):
QD            1    4   16   64
QEMU         38  123  210  329
irq-eventfd  32  106  240  364
Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
hw/nvme/ctrl.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------
hw/nvme/nvme.h | 3 ++
2 files changed, 106 insertions(+), 17 deletions(-)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 87aeba0564..51792f3955 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -526,34 +526,57 @@ static void nvme_irq_check(NvmeCtrl *n)
}
}
+static void nvme_irq_do_assert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+ if (msix_enabled(&(n->parent_obj))) {
+ trace_pci_nvme_irq_msix(cq->vector);
+ msix_notify(&(n->parent_obj), cq->vector);
+ } else {
+ trace_pci_nvme_irq_pin();
+ assert(cq->vector < 32);
+ n->irq_status |= 1 << cq->vector;
+ nvme_irq_check(n);
+ }
+}
+
static void nvme_irq_assert(NvmeCtrl *n, NvmeCQueue *cq)
{
if (cq->irq_enabled) {
- if (msix_enabled(&(n->parent_obj))) {
- trace_pci_nvme_irq_msix(cq->vector);
- msix_notify(&(n->parent_obj), cq->vector);
+ if (cq->assert_notifier.initialized) {
+ event_notifier_set(&cq->assert_notifier);
} else {
- trace_pci_nvme_irq_pin();
- assert(cq->vector < 32);
- n->irq_status |= 1 << cq->vector;
- nvme_irq_check(n);
+ nvme_irq_do_assert(n, cq);
}
} else {
trace_pci_nvme_irq_masked();
}
}
+static void nvme_irq_do_deassert(NvmeCtrl *n, NvmeCQueue *cq)
+{
+ if (msix_enabled(&(n->parent_obj))) {
+ return;
+ } else {
+ assert(cq->vector < 32);
+ if (!n->cq_pending) {
+ n->irq_status &= ~(1 << cq->vector);
+ }
+ nvme_irq_check(n);
+ }
+}
+
static void nvme_irq_deassert(NvmeCtrl *n, NvmeCQueue *cq)
{
if (cq->irq_enabled) {
- if (msix_enabled(&(n->parent_obj))) {
- return;
+ if (cq->deassert_notifier.initialized) {
+ /*
+ * The deassert notifier will only be initialized when MSI-X is NOT
+ * in use. Therefore no need to worry about extra eventfd syscall
+ * for pin-based interrupts.
+ */
+ event_notifier_set(&cq->deassert_notifier);
} else {
- assert(cq->vector < 32);
- if (!n->cq_pending) {
- n->irq_status &= ~(1 << cq->vector);
- }
- nvme_irq_check(n);
+ nvme_irq_do_deassert(n, cq);
}
}
}
@@ -1338,6 +1361,50 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
}
+static void nvme_assert_notifier_read(EventNotifier *e)
+{
+ NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
+ if (event_notifier_test_and_clear(e)) {
+ nvme_irq_do_assert(cq->ctrl, cq);
+ }
+}
+
+static void nvme_deassert_notifier_read(EventNotifier *e)
+{
+ NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
+ if (event_notifier_test_and_clear(e)) {
+ nvme_irq_do_deassert(cq->ctrl, cq);
+ }
+}
+
+static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
+{
+ int ret;
+
+ ret = event_notifier_init(&cq->assert_notifier, 0);
+ if (ret < 0) {
+ return;
+ }
+
+ event_notifier_set_handler(&cq->assert_notifier,
+ nvme_assert_notifier_read);
+
+ if (!msix_enabled(&n->parent_obj)) {
+ ret = event_notifier_init(&cq->deassert_notifier, 0);
+ if (ret < 0) {
+ event_notifier_set_handler(&cq->assert_notifier, NULL);
+ event_notifier_cleanup(&cq->assert_notifier);
+
+ return;
+ }
+
+ event_notifier_set_handler(&cq->deassert_notifier,
+ nvme_deassert_notifier_read);
+ }
+
+ return;
+}
+
static void nvme_post_cqes(void *opaque)
{
NvmeCQueue *cq = opaque;
@@ -1377,8 +1444,10 @@ static void nvme_post_cqes(void *opaque)
QTAILQ_INSERT_TAIL(&sq->req_list, req, entry);
}
if (cq->tail != cq->head) {
- if (cq->irq_enabled && !pending) {
- n->cq_pending++;
+ if (cq->irq_enabled) {
+ if (!pending) {
+ n->cq_pending++;
+ }
}
nvme_irq_assert(n, cq);
@@ -4705,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
event_notifier_set_handler(&cq->notifier, NULL);
event_notifier_cleanup(&cq->notifier);
}
+ if (cq->assert_notifier.initialized) {
+ event_notifier_set_handler(&cq->assert_notifier, NULL);
+ event_notifier_cleanup(&cq->assert_notifier);
+ }
+ if (cq->deassert_notifier.initialized) {
+ event_notifier_set_handler(&cq->deassert_notifier, NULL);
+ event_notifier_cleanup(&cq->deassert_notifier);
+ }
if (msix_enabled(&n->parent_obj)) {
msix_vector_unuse(&n->parent_obj, cq->vector);
}
@@ -4734,7 +4811,7 @@ static uint16_t nvme_del_cq(NvmeCtrl *n, NvmeRequest *req)
n->cq_pending--;
}
- nvme_irq_deassert(n, cq);
+ nvme_irq_do_deassert(n, cq);
trace_pci_nvme_del_cq(qid);
nvme_free_cq(cq, n);
return NVME_SUCCESS;
@@ -4772,6 +4849,14 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
}
n->cq[cqid] = cq;
cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+ /*
+ * Only enable the irq eventfd for IO queues, since the admin queue is
+ * always emulated in the main loop thread.
+ */
+ if (cqid && n->params.irq_eventfd) {
+ nvme_init_irq_notifier(n, cq);
+ }
}
static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
@@ -7671,6 +7756,7 @@ static Property nvme_props[] = {
DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false),
DEFINE_PROP_BOOL("legacy-cmb", NvmeCtrl, params.legacy_cmb, false),
DEFINE_PROP_BOOL("ioeventfd", NvmeCtrl, params.ioeventfd, false),
+ DEFINE_PROP_BOOL("irq-eventfd", NvmeCtrl, params.irq_eventfd, false),
DEFINE_PROP_UINT8("zoned.zasl", NvmeCtrl, params.zasl, 0),
DEFINE_PROP_BOOL("zoned.auto_transition", NvmeCtrl,
params.auto_transition_zones, true),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..4850d3e965 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -398,6 +398,8 @@ typedef struct NvmeCQueue {
uint64_t ei_addr;
QEMUTimer *timer;
EventNotifier notifier;
+ EventNotifier assert_notifier;
+ EventNotifier deassert_notifier;
bool ioeventfd_enabled;
QTAILQ_HEAD(, NvmeSQueue) sq_list;
QTAILQ_HEAD(, NvmeRequest) req_list;
@@ -422,6 +424,7 @@ typedef struct NvmeParams {
bool auto_transition_zones;
bool legacy_cmb;
bool ioeventfd;
+ bool irq_eventfd;
uint8_t sriov_max_vfs;
uint16_t sriov_vq_flexible;
uint16_t sriov_vi_flexible;
--
2.25.1
* [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
2022-08-26 15:12 [PATCH v2 0/3] iothread and irqfd support Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd Jinhao Fan
@ 2022-08-26 15:12 ` Jinhao Fan
2022-08-26 15:34 ` Keith Busch
2022-08-26 15:12 ` [PATCH v2 3/3] hw/nvme: add iothread support Jinhao Fan
2 siblings, 1 reply; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
To: qemu-devel
Cc: its, kbusch, stefanha, Jinhao Fan, Klaus Jensen, open list:nvme
Use KVM's irqfd to send interrupts when possible. This approach is
thread safe. Moreover, it does not have the inter-thread communication
overhead of plain event notifiers, since the handler callback is
invoked in the same system call as the irqfd write.
Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
hw/nvme/ctrl.c | 145 ++++++++++++++++++++++++++++++++++++++++++-
hw/nvme/nvme.h | 3 +
hw/nvme/trace-events | 3 +
3 files changed, 149 insertions(+), 2 deletions(-)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 51792f3955..396f3f0cdd 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -192,6 +192,7 @@
#include "qapi/error.h"
#include "qapi/visitor.h"
#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
#include "sysemu/block-backend.h"
#include "sysemu/hostmem.h"
#include "hw/pci/msix.h"
@@ -1377,8 +1378,115 @@ static void nvme_deassert_notifier_read(EventNotifier *e)
}
}
+static int nvme_kvm_vector_use(NvmeCtrl *n, NvmeCQueue *cq, uint32_t vector)
+{
+ KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+ int ret;
+
+ ret = kvm_irqchip_add_msi_route(&c, vector, &n->parent_obj);
+ if (ret < 0) {
+ return ret;
+ }
+
+ kvm_irqchip_commit_route_changes(&c);
+
+ cq->virq = ret;
+
+ return 0;
+}
+
+static int nvme_kvm_vector_unmask(PCIDevice *pci_dev, unsigned vector,
+ MSIMessage msg)
+{
+ NvmeCtrl *n = NVME(pci_dev);
+ int ret;
+
+ trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
+
+ for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+ NvmeCQueue *cq = n->cq[i];
+
+ if (!cq) {
+ continue;
+ }
+
+ if (cq->vector == vector) {
+ if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
+ ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
+ pci_dev);
+ if (ret < 0) {
+ return ret;
+ }
+
+ kvm_irqchip_commit_routes(kvm_state);
+
+ cq->msg = msg;
+ }
+
+ ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+ &cq->assert_notifier,
+ NULL, cq->virq);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static void nvme_kvm_vector_mask(PCIDevice *pci_dev, unsigned vector)
+{
+ NvmeCtrl *n = NVME(pci_dev);
+
+ trace_pci_nvme_irq_mask(vector);
+
+ for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+ NvmeCQueue *cq = n->cq[i];
+
+ if (!cq) {
+ continue;
+ }
+
+ if (cq->vector == vector) {
+ kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+ &cq->assert_notifier,
+ cq->virq);
+ }
+ }
+}
+
+static void nvme_kvm_vector_poll(PCIDevice *pci_dev, unsigned int vector_start,
+ unsigned int vector_end)
+{
+ NvmeCtrl *n = NVME(pci_dev);
+
+ trace_pci_nvme_irq_poll(vector_start, vector_end);
+
+ for (uint32_t i = 1; i <= n->params.max_ioqpairs; i++) {
+ NvmeCQueue *cq = n->cq[i];
+
+ if (!cq) {
+ continue;
+ }
+
+ if (!msix_is_masked(pci_dev, cq->vector)) {
+ continue;
+ }
+
+ if (cq->vector >= vector_start && cq->vector <= vector_end) {
+ if (event_notifier_test_and_clear(&cq->assert_notifier)) {
+ msix_set_pending(pci_dev, i);
+ }
+ }
+ }
+}
+
+
static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
{
+ bool with_irqfd = msix_enabled(&n->parent_obj) &&
+ kvm_msi_via_irqfd_enabled();
int ret;
ret = event_notifier_init(&cq->assert_notifier, 0);
@@ -1386,12 +1494,27 @@ static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
return;
}
- event_notifier_set_handler(&cq->assert_notifier,
- nvme_assert_notifier_read);
+ if (with_irqfd) {
+ ret = nvme_kvm_vector_use(n, cq, cq->vector);
+ if (ret < 0) {
+ event_notifier_cleanup(&cq->assert_notifier);
+
+ return;
+ }
+ } else {
+ event_notifier_set_handler(&cq->assert_notifier,
+ nvme_assert_notifier_read);
+ }
if (!msix_enabled(&n->parent_obj)) {
ret = event_notifier_init(&cq->deassert_notifier, 0);
if (ret < 0) {
+ if (with_irqfd) {
+ kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+ &cq->assert_notifier,
+ cq->virq);
+ }
+
event_notifier_set_handler(&cq->assert_notifier, NULL);
event_notifier_cleanup(&cq->assert_notifier);
@@ -4764,6 +4887,8 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req)
static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
{
+ bool with_irqfd = msix_enabled(&n->parent_obj) &&
+ kvm_msi_via_irqfd_enabled();
uint16_t offset = (cq->cqid << 3) + (1 << 2);
n->cq[cq->cqid] = NULL;
@@ -4775,6 +4900,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
event_notifier_cleanup(&cq->notifier);
}
if (cq->assert_notifier.initialized) {
+ if (with_irqfd) {
+ kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+ &cq->assert_notifier,
+ cq->virq);
+ kvm_irqchip_release_virq(kvm_state, cq->virq);
+ }
event_notifier_set_handler(&cq->assert_notifier, NULL);
event_notifier_cleanup(&cq->assert_notifier);
}
@@ -6528,6 +6659,9 @@ static int nvme_start_ctrl(NvmeCtrl *n)
uint32_t page_size = 1 << page_bits;
NvmeSecCtrlEntry *sctrl = nvme_sctrl(n);
+ bool with_irqfd = msix_enabled(&n->parent_obj) &&
+ kvm_msi_via_irqfd_enabled();
+
if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
trace_pci_nvme_err_startfail_virt_state(le16_to_cpu(sctrl->nvi),
le16_to_cpu(sctrl->nvq),
@@ -6617,6 +6751,12 @@ static int nvme_start_ctrl(NvmeCtrl *n)
nvme_select_iocs(n);
+ if (with_irqfd) {
+ return msix_set_vector_notifiers(PCI_DEVICE(n), nvme_kvm_vector_unmask,
+ nvme_kvm_vector_mask,
+ nvme_kvm_vector_poll);
+ }
+
return 0;
}
@@ -7734,6 +7874,7 @@ static void nvme_exit(PCIDevice *pci_dev)
pcie_sriov_pf_exit(pci_dev);
}
+ msix_unset_vector_notifiers(pci_dev);
msix_uninit(pci_dev, &n->bar0, &n->bar0);
memory_region_del_subregion(&n->bar0, &n->iomem);
}
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 4850d3e965..b0b986b024 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -20,6 +20,7 @@
#include "qemu/uuid.h"
#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
#include "hw/block/block.h"
#include "block/nvme.h"
@@ -396,10 +397,12 @@ typedef struct NvmeCQueue {
uint64_t dma_addr;
uint64_t db_addr;
uint64_t ei_addr;
+ int virq;
QEMUTimer *timer;
EventNotifier notifier;
EventNotifier assert_notifier;
EventNotifier deassert_notifier;
+ MSIMessage msg;
bool ioeventfd_enabled;
QTAILQ_HEAD(, NvmeSQueue) sq_list;
QTAILQ_HEAD(, NvmeRequest) req_list;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..b11fcf4a65 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -2,6 +2,9 @@
pci_nvme_irq_msix(uint32_t vector) "raising MSI-X IRQ vector %u"
pci_nvme_irq_pin(void) "pulsing IRQ pin"
pci_nvme_irq_masked(void) "IRQ is masked"
+pci_nvme_irq_mask(uint32_t vector) "IRQ %u gets masked"
+pci_nvme_irq_unmask(uint32_t vector, uint64_t addr, uint32_t data) "IRQ %u gets unmasked, addr=0x%"PRIx64" data=0x%"PRIu32""
+pci_nvme_irq_poll(uint32_t vector_start, uint32_t vector_end) "IRQ poll, start=0x%"PRIu32" end=0x%"PRIu32""
pci_nvme_dma_read(uint64_t prp1, uint64_t prp2) "DMA read, prp1=0x%"PRIx64" prp2=0x%"PRIx64""
pci_nvme_dbbuf_config(uint64_t dbs_addr, uint64_t eis_addr) "dbs_addr=0x%"PRIx64" eis_addr=0x%"PRIx64""
pci_nvme_map_addr(uint64_t addr, uint64_t len) "addr 0x%"PRIx64" len %"PRIu64""
--
2.25.1
* [PATCH v2 3/3] hw/nvme: add iothread support
2022-08-26 15:12 [PATCH v2 0/3] iothread and irqfd support Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 1/3] hw/nvme: support irq(de)assertion with eventfd Jinhao Fan
2022-08-26 15:12 ` [PATCH v2 2/3] hw/nvme: use KVM irqfd when available Jinhao Fan
@ 2022-08-26 15:12 ` Jinhao Fan
2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 15:12 UTC (permalink / raw)
To: qemu-devel; +Cc: its, kbusch, stefanha, Jinhao Fan, open list:nvme
Add an option "iothread=x" to perform emulation in a separate iothread.
This improves performance because QEMU's main loop is responsible for a
lot of other work, while the iothread is dedicated to NVMe emulation.
Moreover, emulating in an iothread opens up the possibility of polling
on SQ/CQ doorbells, which I will bring up in a follow-up patch.
Iothread can be enabled by:
-object iothread,id=nvme0 \
-device nvme,iothread=nvme0 \
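Putting the options from this series together, a full invocation might
look like the following (the image path, serial, and drive id are
placeholders):

```shell
qemu-system-x86_64 \
    -enable-kvm -m 4G \
    -object iothread,id=nvme0 \
    -drive file=/path/to/nvme.img,if=none,id=nvm,format=raw \
    -device nvme,serial=deadbeef,drive=nvm,iothread=nvme0,ioeventfd=on,irq-eventfd=on
```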
Performance comparisons (KIOPS):
QD          1    4   16   64
QEMU       41  136  242  338
iothread   53  155  245  309
Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
---
hw/nvme/ctrl.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------
hw/nvme/ns.c | 21 +++++++++++++---
hw/nvme/nvme.h | 6 ++++-
3 files changed, 82 insertions(+), 12 deletions(-)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 396f3f0cdd..2d81f7c7f6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4458,7 +4458,13 @@ static int nvme_init_cq_ioeventfd(NvmeCQueue *cq)
return ret;
}
- event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+ if (cq->cqid) {
+ aio_set_event_notifier(n->ctx, &cq->notifier, true, nvme_cq_notifier,
+ NULL, NULL);
+ } else {
+ event_notifier_set_handler(&cq->notifier, nvme_cq_notifier);
+ }
+
memory_region_add_eventfd(&n->iomem,
0x1000 + offset, 4, false, 0, &cq->notifier);
@@ -4487,7 +4493,13 @@ static int nvme_init_sq_ioeventfd(NvmeSQueue *sq)
return ret;
}
- event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+ if (sq->sqid) {
+ aio_set_event_notifier(n->ctx, &sq->notifier, true, nvme_sq_notifier,
+ NULL, NULL);
+ } else {
+ event_notifier_set_handler(&sq->notifier, nvme_sq_notifier);
+ }
+
memory_region_add_eventfd(&n->iomem,
0x1000 + offset, 4, false, 0, &sq->notifier);
@@ -4503,7 +4515,12 @@ static void nvme_free_sq(NvmeSQueue *sq, NvmeCtrl *n)
if (sq->ioeventfd_enabled) {
memory_region_del_eventfd(&n->iomem,
0x1000 + offset, 4, false, 0, &sq->notifier);
- event_notifier_set_handler(&sq->notifier, NULL);
+ if (sq->sqid) {
+ aio_set_event_notifier(n->ctx, &sq->notifier, true, NULL, NULL,
+ NULL);
+ } else {
+ event_notifier_set_handler(&sq->notifier, NULL);
+ }
event_notifier_cleanup(&sq->notifier);
}
g_free(sq->io_req);
@@ -4573,7 +4590,13 @@ static void nvme_init_sq(NvmeSQueue *sq, NvmeCtrl *n, uint64_t dma_addr,
sq->io_req[i].sq = sq;
QTAILQ_INSERT_TAIL(&(sq->req_list), &sq->io_req[i], entry);
}
- sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+
+ if (sq->sqid) {
+ sq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+ nvme_process_sq, sq);
+ } else {
+ sq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_process_sq, sq);
+ }
if (n->dbbuf_enabled) {
sq->db_addr = n->dbbuf_dbs + (sqid << 3);
@@ -4896,7 +4919,12 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
if (cq->ioeventfd_enabled) {
memory_region_del_eventfd(&n->iomem,
0x1000 + offset, 4, false, 0, &cq->notifier);
- event_notifier_set_handler(&cq->notifier, NULL);
+ if (cq->cqid) {
+ aio_set_event_notifier(n->ctx, &cq->notifier, true, NULL, NULL,
+ NULL);
+ } else {
+ event_notifier_set_handler(&cq->notifier, NULL);
+ }
event_notifier_cleanup(&cq->notifier);
}
if (cq->assert_notifier.initialized) {
@@ -4979,7 +5007,13 @@ static void nvme_init_cq(NvmeCQueue *cq, NvmeCtrl *n, uint64_t dma_addr,
}
}
n->cq[cqid] = cq;
- cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+
+ if (cq->cqid) {
+ cq->timer = aio_timer_new(n->ctx, QEMU_CLOCK_VIRTUAL, SCALE_NS,
+ nvme_post_cqes, cq);
+ } else {
+ cq->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, nvme_post_cqes, cq);
+ }
/*
* Only enable irq eventfd for IO queues since we always emulate admin
@@ -7759,6 +7793,14 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
if (pci_is_vf(&n->parent_obj) && !sctrl->scs) {
stl_le_p(&n->bar.csts, NVME_CSTS_FAILED);
}
+
+ if (n->params.iothread) {
+ n->iothread = n->params.iothread;
+ object_ref(OBJECT(n->iothread));
+ n->ctx = iothread_get_aio_context(n->iothread);
+ } else {
+ n->ctx = qemu_get_aio_context();
+ }
}
static int nvme_init_subsys(NvmeCtrl *n, Error **errp)
@@ -7831,7 +7873,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
ns = &n->namespace;
ns->params.nsid = 1;
- if (nvme_ns_setup(ns, errp)) {
+ if (nvme_ns_setup(ns, n->ctx, errp)) {
return;
}
@@ -7862,6 +7904,15 @@ static void nvme_exit(PCIDevice *pci_dev)
g_free(n->sq);
g_free(n->aer_reqs);
+ aio_context_acquire(n->ctx);
+ blk_set_aio_context(n->namespace.blkconf.blk, qemu_get_aio_context(), NULL);
+ aio_context_release(n->ctx);
+
+ if (n->iothread) {
+ object_unref(OBJECT(n->iothread));
+ n->iothread = NULL;
+ }
+
if (n->params.cmb_size_mb) {
g_free(n->cmb.buf);
}
@@ -7885,6 +7936,8 @@ static Property nvme_props[] = {
HostMemoryBackend *),
DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
NvmeSubsystem *),
+ DEFINE_PROP_LINK("iothread", NvmeCtrl, params.iothread, TYPE_IOTHREAD,
+ IOThread *),
DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..eb9141a67b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -146,9 +146,11 @@ lbaf_found:
return 0;
}
-static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
+static int nvme_ns_init_blk(NvmeNamespace *ns, AioContext *ctx, Error **errp)
{
bool read_only;
+ AioContext *old_context;
+ int ret;
if (!blkconf_blocksizes(&ns->blkconf, errp)) {
return -1;
@@ -170,6 +172,17 @@ static int nvme_ns_init_blk(NvmeNamespace *ns, Error **errp)
return -1;
}
+ old_context = blk_get_aio_context(ns->blkconf.blk);
+ aio_context_acquire(old_context);
+ ret = blk_set_aio_context(ns->blkconf.blk, ctx, errp);
+ aio_context_release(old_context);
+
+ if (ret) {
+ error_setg(errp, "Set AioContext on BlockBackend failed");
+ return ret;
+ }
+
+
return 0;
}
@@ -482,13 +495,13 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
return 0;
}
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp)
{
if (nvme_ns_check_constraints(ns, errp)) {
return -1;
}
- if (nvme_ns_init_blk(ns, errp)) {
+ if (nvme_ns_init_blk(ns, ctx, errp)) {
return -1;
}
@@ -563,7 +576,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
}
}
- if (nvme_ns_setup(ns, errp)) {
+ if (nvme_ns_setup(ns, n->ctx, errp)) {
return;
}
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index b0b986b024..224b73e6c4 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -22,6 +22,7 @@
#include "hw/pci/pci.h"
#include "hw/pci/msi.h"
#include "hw/block/block.h"
+#include "sysemu/iothread.h"
#include "block/nvme.h"
@@ -276,7 +277,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
}
void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeNamespace *ns, AioContext *ctx, Error **errp);
void nvme_ns_drain(NvmeNamespace *ns);
void nvme_ns_shutdown(NvmeNamespace *ns);
void nvme_ns_cleanup(NvmeNamespace *ns);
@@ -433,6 +434,7 @@ typedef struct NvmeParams {
uint16_t sriov_vi_flexible;
uint8_t sriov_max_vq_per_vf;
uint8_t sriov_max_vi_per_vf;
+ IOThread *iothread;
} NvmeParams;
typedef struct NvmeCtrl {
@@ -464,6 +466,8 @@ typedef struct NvmeCtrl {
uint64_t dbbuf_dbs;
uint64_t dbbuf_eis;
bool dbbuf_enabled;
+ IOThread *iothread;
+ AioContext *ctx;
struct {
MemoryRegion mem;
--
2.25.1
* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
2022-08-26 15:12 ` [PATCH v2 2/3] hw/nvme: use KVM irqfd when available Jinhao Fan
@ 2022-08-26 15:34 ` Keith Busch
2022-08-26 15:45 ` Klaus Jensen
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Keith Busch @ 2022-08-26 15:34 UTC (permalink / raw)
To: Jinhao Fan; +Cc: qemu-devel, its, stefanha, Klaus Jensen, open list:nvme
On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> Use KVM's irqfd to send interrupts when possible. This approach is
> thread safe. Moreover, it does not have the inter-thread communication
> overhead of plain event notifiers since handler callback are called
> in the same system call as irqfd write.
>
> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
No idea what's going on here... This one is causing the following assert
failure with --enable-kvm:
qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
and linux kernel returns EINVAL in that case. It's never set that way without
this patch. Am I the only one seeing this?
* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
2022-08-26 15:34 ` Keith Busch
@ 2022-08-26 15:45 ` Klaus Jensen
2022-08-26 15:54 ` Keith Busch
2022-08-26 16:04 ` Jinhao Fan
2022-08-27 8:28 ` Jinhao Fan
2 siblings, 1 reply; 10+ messages in thread
From: Klaus Jensen @ 2022-08-26 15:45 UTC (permalink / raw)
To: Keith Busch
Cc: Jinhao Fan, qemu-devel, stefanha, Klaus Jensen, open list:nvme
On Aug 26 09:34, Keith Busch wrote:
> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > Use KVM's irqfd to send interrupts when possible. This approach is
> > thread safe. Moreover, it does not have the inter-thread communication
> > overhead of plain event notifiers since handler callback are called
> > in the same system call as irqfd write.
> >
> > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>
> No idea what's going on here... This one is causing the following assert
> failure with --enable-kvm:
>
> qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
>
> I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> and linux kernel returns EINVAL in that case. It's never set that way without
> this patch. Am I the only one seeing this?
Argh, sorry, I threw that patch together a bit too quickly. I was just
so pumped because I believed I had solved the issue hehe.
Are you missing the ioeventfd=on and irq-eventfd=on parameters by any
chance? Without those I'm also getting an assertion, but a different one
qemu-system-x86_64: ../hw/pci/msix.c:119: msix_fire_vector_notifier: Assertion `ret >= 0' failed.
* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
2022-08-26 15:45 ` Klaus Jensen
@ 2022-08-26 15:54 ` Keith Busch
2022-08-26 15:58 ` Klaus Jensen
0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2022-08-26 15:54 UTC (permalink / raw)
To: Klaus Jensen
Cc: Jinhao Fan, qemu-devel, stefanha, Klaus Jensen, open list:nvme
On Fri, Aug 26, 2022 at 05:45:21PM +0200, Klaus Jensen wrote:
> On Aug 26 09:34, Keith Busch wrote:
> > On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > > Use KVM's irqfd to send interrupts when possible. This approach is
> > > thread safe. Moreover, it does not have the inter-thread communication
> > > overhead of plain event notifiers since handler callback are called
> > > in the same system call as irqfd write.
> > >
> > > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> >
> > No idea what's going on here... This one is causing the following assert
> > failure with --enable-kvm:
> >
> > qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> >
> > I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> > and linux kernel returns EINVAL in that case. It's never set that way without
> > this patch. Am I the only one seeing this?
>
> Argh, sorry, I threw that patch together a bit too quickly. I was just
> so pumped because I believed I had solved the issue hehe.
>
> Are you missing the ioeventfd=on and irq-eventfd=on parameters by any
> chance? Without those I'm also getting an assertion, but a different one
I had not enabled those yet. This was purely a regression test with my
previously working parameters for a sanity check.
If I enable those new nvme parameters, then it is successful.
* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
2022-08-26 15:54 ` Keith Busch
@ 2022-08-26 15:58 ` Klaus Jensen
0 siblings, 0 replies; 10+ messages in thread
From: Klaus Jensen @ 2022-08-26 15:58 UTC (permalink / raw)
To: Keith Busch
Cc: Jinhao Fan, qemu-devel, stefanha, Klaus Jensen, open list:nvme
On Aug 26 09:54, Keith Busch wrote:
> On Fri, Aug 26, 2022 at 05:45:21PM +0200, Klaus Jensen wrote:
> > On Aug 26 09:34, Keith Busch wrote:
> > > On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
> > > > Use KVM's irqfd to send interrupts when possible. This approach is
> > > > thread safe. Moreover, it does not have the inter-thread communication
> > > > overhead of plain event notifiers since handler callback are called
> > > > in the same system call as irqfd write.
> > > >
> > > > Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
> > > > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > >
> > > No idea what's going on here... This one is causing the following assert
> > > failure with --enable-kvm:
> > >
> > > qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> > >
> > > I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> > > and linux kernel returns EINVAL in that case. It's never set that way without
> > > this patch. Am I the only one seeing this?
> >
> > Argh, sorry, I threw that patch together a bit too quickly. I was just
> > so pumped because I believed I had solved the issue hehe.
> >
> > Are you missing the ioeventfd=on and irq-eventfd=on parameters by any
> > chance? Without those I'm also getting an assertion, but a different one
>
> I had not enabled those yet. This was purely a regrsession test with my
> previously working paramaters for a sanity check.
>
Yeah, sorry, I just threw it out there with tunnel vision on the kvm
irq part, not doing my due diligence ;)
I'll fix it up!
> If I enable those new nvme parameters, then it is successful.
Great :)
* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
2022-08-26 15:34 ` Keith Busch
2022-08-26 15:45 ` Klaus Jensen
@ 2022-08-26 16:04 ` Jinhao Fan
2022-08-27 8:28 ` Jinhao Fan
2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-26 16:04 UTC (permalink / raw)
To: Keith Busch; +Cc: qemu-devel, its, stefanha, Klaus Jensen, open list:nvme
at 11:34 PM, Keith Busch <kbusch@kernel.org> wrote:
> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
>> Use KVM's irqfd to send interrupts when possible. This approach is
>> thread safe. Moreover, it does not have the inter-thread communication
>> overhead of plain event notifiers since handler callback are called
>> in the same system call as irqfd write.
>>
>> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
>> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>
> No idea what's going on here... This one is causing the following assert
> failure with --enable-kvm:
>
> qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
My intuition is that if irq-eventfd is off we shouldn’t call
kvm_irqchip_commit_routes(). Probably we missed some check here.
> I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> and linux kernel returns EINVAL in that case. It's never set that way without
> this patch. Am I the only one seeing this?
* Re: [PATCH v2 2/3] hw/nvme: use KVM irqfd when available
2022-08-26 15:34 ` Keith Busch
2022-08-26 15:45 ` Klaus Jensen
2022-08-26 16:04 ` Jinhao Fan
@ 2022-08-27 8:28 ` Jinhao Fan
2 siblings, 0 replies; 10+ messages in thread
From: Jinhao Fan @ 2022-08-27 8:28 UTC (permalink / raw)
To: Keith Busch; +Cc: qemu-devel, its, stefanha, Klaus Jensen, open list:nvme
at 11:34 PM, Keith Busch <kbusch@kernel.org> wrote:
> On Fri, Aug 26, 2022 at 11:12:04PM +0800, Jinhao Fan wrote:
>> Use KVM's irqfd to send interrupts when possible. This approach is
>> thread safe. Moreover, it does not have the inter-thread communication
>> overhead of plain event notifiers since handler callback are called
>> in the same system call as irqfd write.
>>
>> Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn>
>> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
>
> No idea what's going on here... This one is causing the following assert
> failure with --enable-kvm:
>
> qemu-system-x86_64: ../accel/kvm/kvm-all.c:1781: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
>
> I find it calls KVM_SET_GSI_ROUTING ioctl with gsi set to KVM_IRQ_ROUTING_MSI,
> and linux kernel returns EINVAL in that case. It's never set that way without
> this patch. Am I the only one seeing this?
nvme_start_ctrl() registers the MSI-X masking handlers without checking
irq-eventfd. This causes nvme_kvm_vector_unmask() to be called when it
should not be.