* [Qemu-devel] [PATCH] virtio: Use ioeventfd for virtqueue notify
@ 2010-09-30 14:01 Stefan Hajnoczi
2010-10-03 11:01 ` [Qemu-devel] " Avi Kivity
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2010-09-30 14:01 UTC (permalink / raw)
To: qemu-devel
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Michael S. Tsirkin, Khoa Huynh, Sridhar Samudrala
Virtqueue notify is currently handled synchronously in userspace virtio.
This prevents the vcpu from executing guest code while hardware
emulation code handles the notify.
On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to
the iothread and allowing the VM to continue execution. This model is
similar to how vhost receives virtqueue notifies.
The result of this change is improved performance for userspace virtio
devices. Virtio-blk throughput increases especially for multithreaded
scenarios and virtio-net transmit throughput increases substantially.
Full numbers are below.
This patch employs ioeventfd virtqueue notify for all virtio devices.
Linux kernels pre-2.6.34 only allow for 6 ioeventfds per VM and care
must be taken so that vhost-net, the other ioeventfd user in QEMU, is
able to function. On such kernels ioeventfd virtqueue notify will not
be used.
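For readers unfamiliar with the mechanism, the ioeventfd registration done by
kvm_set_ioeventfd_pio_word() boils down to the KVM_IOEVENTFD ioctl: an eventfd
is attached to the virtqueue notify pio port so that the guest's 16-bit write
of the queue index signals the eventfd instead of exiting to userspace. A rough
standalone sketch (placeholder names vm_fd, notify_addr and
assign_queue_ioeventfd; error handling trimmed; not code from this patch):

    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Sketch only: register an eventfd so that a 16-bit guest write of
     * queue_index to the virtqueue notify pio port kicks the eventfd
     * instead of trapping to userspace. */
    static int assign_queue_ioeventfd(int vm_fd, uint16_t notify_addr,
                                      uint16_t queue_index)
    {
        struct kvm_ioeventfd kick = {
            .datamatch = queue_index,  /* match only this queue's kick */
            .addr      = notify_addr,  /* e.g. bar0 + VIRTIO_PCI_QUEUE_NOTIFY */
            .len       = 2,            /* virtio-pci notify is a 16-bit write */
            .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
        };

        kick.fd = eventfd(0, 0);
        if (kick.fd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &kick) < 0) {
            return -1;
        }
        /* The iothread then polls kick.fd and runs the virtqueue handler. */
        return kick.fd;
    }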
Khoa Huynh <khoa@us.ibm.com> collected the following data for
virtio-blk with cache=none,aio=native:
FFSB Test          Threads  Unmodified  Patched
                              (MB/s)     (MB/s)
Large file create     1        21.7       21.8
                      8       101.0      118.0
                     16       119.0      157.0

Sequential reads      1        21.9       23.2
                      8       114.0      139.0
                     16       143.0      178.0

Random reads          1         3.3        3.6
                      8        23.0       25.4
                     16        43.3       47.8

Random writes         1        22.2       23.0
                      8        93.1      111.6
                     16       110.5      132.0
Sridhar Samudrala <sri@us.ibm.com> collected the following data for
virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.
Guest to Host TCP_STREAM throughput(Mb/sec)
-------------------------------------------
Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
   65536      12755        6430                  7590
   16384       8499        3084                  5764
    4096       4723        1578                  3659
    1024       1827         981                  2060

Host to Guest TCP_STREAM throughput(Mb/sec)
-------------------------------------------
Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
   65536      11156        5790                  5853
   16384      10787        5575                  5691
    4096      10452        5556                  4277
    1024       4437        3671                  5277

Guest to Host TCP_RR latency(transactions/sec)
----------------------------------------------
Msg Size  vhost-net  virtio-net  virtio-net/ioeventfd
       1       9903        3459                  3425
    4096       7185        1931                  1899
   16384       6108        2102                  1923
   65536       3161        1610                  1744
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
Small changes are required for qemu-kvm.git. I will send them once qemu.git
has virtio-ioeventfd support.
hw/vhost.c | 6 ++--
hw/virtio.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
hw/virtio.h | 9 +----
kvm-all.c | 39 +++++++++++++++++++++
kvm-stub.c | 5 +++
kvm.h | 1 +
6 files changed, 156 insertions(+), 10 deletions(-)
diff --git a/hw/vhost.c b/hw/vhost.c
index 1b8624d..f127a07 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -517,7 +517,7 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
goto fail_guest_notifier;
}
- r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, true);
+ r = virtio_set_host_notifier(vdev, idx, true);
if (r < 0) {
fprintf(stderr, "Error binding host notifier: %d\n", -r);
goto fail_host_notifier;
@@ -539,7 +539,7 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
fail_call:
fail_kick:
- vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
+ virtio_set_host_notifier(vdev, idx, false);
fail_host_notifier:
vdev->binding->set_guest_notifier(vdev->binding_opaque, idx, false);
fail_guest_notifier:
@@ -575,7 +575,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
}
assert (r >= 0);
- r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
+ r = virtio_set_host_notifier(vdev, idx, false);
if (r < 0) {
fprintf(stderr, "vhost VQ %d host cleanup failed: %d\n", idx, r);
fflush(stderr);
diff --git a/hw/virtio.c b/hw/virtio.c
index fbef788..f075b3a 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -16,6 +16,7 @@
#include "trace.h"
#include "virtio.h"
#include "sysemu.h"
+#include "kvm.h"
/* The alignment to use between consumer and producer parts of vring.
* x86 pagesize again. */
@@ -77,6 +78,11 @@ struct VirtQueue
VirtIODevice *vdev;
EventNotifier guest_notifier;
EventNotifier host_notifier;
+ enum {
+ HOST_NOTIFIER_DEASSIGNED, /* inactive */
+ HOST_NOTIFIER_ASSIGNED, /* active */
+ HOST_NOTIFIER_OFFLIMITS, /* active but outside our control */
+ } host_notifier_state;
};
/* virt queue functions */
@@ -453,6 +459,93 @@ void virtio_update_irq(VirtIODevice *vdev)
virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
}
+/* Service virtqueue notify from a host notifier */
+static void virtio_read_host_notifier(void *opaque)
+{
+ VirtQueue *vq = opaque;
+ EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+ if (event_notifier_test_and_clear(notifier)) {
+ if (vq->vring.desc) {
+ vq->handle_output(vq->vdev, vq);
+ }
+ }
+}
+
+/* Transition between host notifier states */
+static int virtio_set_host_notifier_state(VirtIODevice *vdev, int n, int state)
+{
+ VirtQueue *vq = &vdev->vq[n];
+ EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+ int rc;
+
+ if (!kvm_enabled()) {
+ return -ENOSYS;
+ }
+
+ /* If the number of ioeventfds is limited, use them for vhost only */
+ if (state == HOST_NOTIFIER_ASSIGNED && !kvm_has_many_iobus_devs()) {
+ state = HOST_NOTIFIER_DEASSIGNED;
+ }
+
+ /* Ignore if no state change */
+ if (vq->host_notifier_state == state) {
+ return 0;
+ }
+
+ /* Disable read handler if transitioning away from assigned */
+ if (vq->host_notifier_state == HOST_NOTIFIER_ASSIGNED) {
+ qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL);
+ }
+
+ /* Toggle host notifier if transitioning to or from deassigned */
+ if (state == HOST_NOTIFIER_DEASSIGNED ||
+ vq->host_notifier_state == HOST_NOTIFIER_DEASSIGNED) {
+ rc = vdev->binding->set_host_notifier(vdev->binding_opaque, n,
+ state != HOST_NOTIFIER_DEASSIGNED);
+ if (rc < 0) {
+ return rc;
+ }
+ }
+
+ /* Enable read handler if transitioning to assigned */
+ if (state == HOST_NOTIFIER_ASSIGNED) {
+ qemu_set_fd_handler(event_notifier_get_fd(notifier),
+ virtio_read_host_notifier, NULL, vq);
+ }
+
+ vq->host_notifier_state = state;
+ return 0;
+}
+
+/* Try to assign/deassign host notifiers for all virtqueues */
+static void virtio_set_host_notifiers(VirtIODevice *vdev, bool assigned)
+{
+ int state = assigned ? HOST_NOTIFIER_ASSIGNED : HOST_NOTIFIER_DEASSIGNED;
+ int i;
+
+ if (!vdev->binding->set_host_notifier) {
+ return;
+ }
+
+ for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
+ if (vdev->vq[i].host_notifier_state == HOST_NOTIFIER_OFFLIMITS) {
+ continue;
+ }
+
+ if (!vdev->vq[i].vring.desc) {
+ continue;
+ }
+
+ virtio_set_host_notifier_state(vdev, i, state);
+ }
+}
+
+int virtio_set_host_notifier(VirtIODevice *vdev, int n, bool assigned)
+{
+ int state = assigned ? HOST_NOTIFIER_OFFLIMITS : HOST_NOTIFIER_ASSIGNED;
+ return virtio_set_host_notifier_state(vdev, n, state);
+}
+
void virtio_reset(void *opaque)
{
VirtIODevice *vdev = opaque;
@@ -467,6 +560,7 @@ void virtio_reset(void *opaque)
vdev->isr = 0;
vdev->config_vector = VIRTIO_NO_VECTOR;
virtio_notify_vector(vdev, vdev->config_vector);
+ virtio_set_host_notifiers(vdev, false);
for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
vdev->vq[i].vring.desc = 0;
@@ -592,6 +686,16 @@ void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector)
vdev->vq[n].vector = vector;
}
+void virtio_set_status(VirtIODevice *vdev, uint8_t val)
+{
+ virtio_set_host_notifiers(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
+
+ if (vdev->set_status) {
+ vdev->set_status(vdev, val);
+ }
+ vdev->status = val;
+}
+
VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
void (*handle_output)(VirtIODevice *, VirtQueue *))
{
@@ -719,6 +823,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f)
}
virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
+ virtio_set_host_notifiers(vdev, vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
return 0;
}
@@ -746,6 +851,7 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
vdev->vq[i].vector = VIRTIO_NO_VECTOR;
vdev->vq[i].vdev = vdev;
+ vdev->vq[i].host_notifier_state = HOST_NOTIFIER_DEASSIGNED;
}
vdev->name = name;
diff --git a/hw/virtio.h b/hw/virtio.h
index 96514e6..d76157e 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -125,13 +125,7 @@ struct VirtIODevice
uint16_t device_id;
};
-static inline void virtio_set_status(VirtIODevice *vdev, uint8_t val)
-{
- if (vdev->set_status) {
- vdev->set_status(vdev, val);
- }
- vdev->status = val;
-}
+void virtio_set_status(VirtIODevice *vdev, uint8_t val);
VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
void (*handle_output)(VirtIODevice *,
@@ -217,6 +211,7 @@ target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n);
uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n);
+int virtio_set_host_notifier(VirtIODevice *vdev, int n, bool assigned);
EventNotifier *virtio_queue_get_guest_notifier(VirtQueue *vq);
EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
void virtio_irq(VirtQueue *vq);
diff --git a/kvm-all.c b/kvm-all.c
index 1cc696f..2f09e34 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -24,6 +24,7 @@
#include "qemu-barrier.h"
#include "sysemu.h"
#include "hw/hw.h"
+#include "hw/event_notifier.h"
#include "gdbstub.h"
#include "kvm.h"
#include "bswap.h"
@@ -72,6 +73,7 @@ struct KVMState
int irqchip_in_kernel;
int pit_in_kernel;
int xsave, xcrs;
+ int many_iobus_devs;
};
static KVMState *kvm_state;
@@ -423,6 +425,36 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
return ret;
}
+static int kvm_check_many_iobus_devs(void)
+{
+ /* Older kernels have a 6 device limit on the KVM io bus. In that case
+ * creating many ioeventfds must be avoided. This tests checks for the
+ * limitation.
+ */
+ EventNotifier notifiers[7];
+ int i, ret = 0;
+ for (i = 0; i < ARRAY_SIZE(notifiers); i++) {
+ ret = event_notifier_init(&notifiers[i], 0);
+ if (ret < 0) {
+ break;
+ }
+ ret = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, true);
+ if (ret < 0) {
+ event_notifier_cleanup(&notifiers[i]);
+ break;
+ }
+ }
+
+ /* Decide whether many devices are supported or not */
+ ret = i == ARRAY_SIZE(notifiers);
+
+ while (i-- > 0) {
+ kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, false);
+ event_notifier_cleanup(&notifiers[i]);
+ }
+ return ret;
+}
+
static void kvm_set_phys_mem(target_phys_addr_t start_addr,
ram_addr_t size,
ram_addr_t phys_offset)
@@ -699,6 +731,8 @@ int kvm_init(int smp_cpus)
kvm_state = s;
cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
+ s->many_iobus_devs = kvm_check_many_iobus_devs();
+
return 0;
err:
@@ -1028,6 +1062,11 @@ int kvm_has_xcrs(void)
return kvm_state->xcrs;
}
+int kvm_has_many_iobus_devs(void)
+{
+ return kvm_state->many_iobus_devs;
+}
+
void kvm_setup_guest_memory(void *start, size_t size)
{
if (!kvm_has_sync_mmu()) {
diff --git a/kvm-stub.c b/kvm-stub.c
index d45f9fa..b0887fb 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -99,6 +99,11 @@ int kvm_has_robust_singlestep(void)
return 0;
}
+int kvm_has_many_iobus_devs(void)
+{
+ return 0;
+}
+
void kvm_setup_guest_memory(void *start, size_t size)
{
}
diff --git a/kvm.h b/kvm.h
index 50b6c01..f405906 100644
--- a/kvm.h
+++ b/kvm.h
@@ -42,6 +42,7 @@ int kvm_has_robust_singlestep(void);
int kvm_has_debugregs(void);
int kvm_has_xsave(void);
int kvm_has_xcrs(void);
+int kvm_has_many_iobus_devs(void);
#ifdef NEED_CPU_H
int kvm_init_vcpu(CPUState *env);
--
1.7.1
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-09-30 14:01 [Qemu-devel] [PATCH] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
@ 2010-10-03 11:01 ` Avi Kivity
2010-10-03 13:51 ` Michael S. Tsirkin
` (2 more replies)
2010-10-19 13:07 ` Stefan Hajnoczi
2010-10-19 13:33 ` Michael S. Tsirkin
2 siblings, 3 replies; 23+ messages in thread
From: Avi Kivity @ 2010-10-03 11:01 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Steve Dobbelstein, Anthony Liguori, kvm, Michael S. Tsirkin,
qemu-devel, Khoa Huynh, Sridhar Samudrala
On 09/30/2010 04:01 PM, Stefan Hajnoczi wrote:
> Virtqueue notify is currently handled synchronously in userspace virtio.
> This prevents the vcpu from executing guest code while hardware
> emulation code handles the notify.
>
> On systems that support KVM, the ioeventfd mechanism can be used to make
> virtqueue notify a lightweight exit by deferring hardware emulation to
> the iothread and allowing the VM to continue execution. This model is
> similar to how vhost receives virtqueue notifies.
Note that this is a tradeoff. If an idle core is available and the
scheduler places the iothread on that core, then the heavyweight exit is
replaced by a lightweight exit + IPI. If the iothread is co-located
with the vcpu, then we'll take a heavyweight exit in any case.
The first case is very likely if the host cpu is undercommitted and
there is heavy I/O activity. This is a typical subsystem benchmark
scenario (as opposed to a system benchmark like specvirt). My feeling
is that total system throughput will be decreased unless the scheduler
is clever enough to place the iothread and vcpu on the same host cpu
when the system is overcommitted.
We can't balance "feeling" against numbers, especially when we have a
precedent in vhost-net, so I think this should go in. But I think we
should also try to understand the effects of the extra IPIs and
cacheline bouncing that this creates. While virtio was designed to
minimize this, we know it has severe problems in this area.
> The result of this change is improved performance for userspace virtio
> devices. Virtio-blk throughput increases especially for multithreaded
> scenarios and virtio-net transmit throughput increases substantially.
> Full numbers are below.
>
> This patch employs ioeventfd virtqueue notify for all virtio devices.
> Linux kernels pre-2.6.34 only allow for 6 ioeventfds per VM and care
> must be taken so that vhost-net, the other ioeventfd user in QEMU, is
> able to function. On such kernels ioeventfd virtqueue notify will not
> be used.
>
> Khoa Huynh <khoa@us.ibm.com> collected the following data for
> virtio-blk with cache=none,aio=native:
>
> FFSB Test Threads Unmodified Patched
> (MB/s) (MB/s)
> Large file create 1 21.7 21.8
> 8 101.0 118.0
> 16 119.0 157.0
>
> Sequential reads 1 21.9 23.2
> 8 114.0 139.0
> 16 143.0 178.0
>
> Random reads 1 3.3 3.6
> 8 23.0 25.4
> 16 43.3 47.8
>
> Random writes 1 22.2 23.0
> 8 93.1 111.6
> 16 110.5 132.0
Impressive numbers. Can you also provide efficiency (bytes per host cpu
seconds)?
How many guest vcpus were used with this? With enough vcpus, there is
also a reduction in cacheline bouncing, since the virtio state in the
host gets to stay on one cpu (especially with aio=native).
> Sridhar Samudrala <sri@us.ibm.com> collected the following data for
> virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.
>
> Guest to Host TCP_STREAM throughput(Mb/sec)
> -------------------------------------------
> Msg Size vhost-net virtio-net virtio-net/ioeventfd
> 65536 12755 6430 7590
> 16384 8499 3084 5764
> 4096 4723 1578 3659
> 1024 1827 981 2060
Even more impressive (expected since the copying, which isn't present
for block, is now shunted off into an iothread).
On the last test you even exceeded vhost-net. Any theories how/why?
Again, efficiency numbers would be interesting.
> Host to Guest TCP_STREAM throughput(Mb/sec)
> -------------------------------------------
> Msg Size vhost-net virtio-net virtio-net/ioeventfd
> 65536 11156 5790 5853
> 16384 10787 5575 5691
> 4096 10452 5556 4277
> 1024 4437 3671 5277
Here you exceed vhost-net, too.
> +static int kvm_check_many_iobus_devs(void)
> +{
> + /* Older kernels have a 6 device limit on the KVM io bus. In that case
> + * creating many ioeventfds must be avoided. This tests checks for the
> + * limitation.
> + */
> + EventNotifier notifiers[7];
> + int i, ret = 0;
> + for (i = 0; i < ARRAY_SIZE(notifiers); i++) {
> + ret = event_notifier_init(&notifiers[i], 0);
> + if (ret < 0) {
> + break;
> + }
> + ret = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, true);
> + if (ret < 0) {
> + event_notifier_cleanup(&notifiers[i]);
> + break;
> + }
> + }
> +
> + /* Decide whether many devices are supported or not */
> + ret = i == ARRAY_SIZE(notifiers);
> +
> + while (i-- > 0) {
> + kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, false);
> + event_notifier_cleanup(&notifiers[i]);
> + }
> + return ret;
> +}
Sorry about that.
IIRC there was a problem (shared by vhost-net) with interrupts remaining
enabled in the window between the guest kicking the queue and the host
waking up and disabling interrupts. An even more vague IIRC mst had an
idea to fix this?
--
error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-03 11:01 ` [Qemu-devel] " Avi Kivity
@ 2010-10-03 13:51 ` Michael S. Tsirkin
2010-10-03 14:21 ` Avi Kivity
2010-10-04 14:30 ` Stefan Hajnoczi
2010-10-05 11:00 ` rukhsana ansari
2 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2010-10-03 13:51 UTC (permalink / raw)
To: Avi Kivity
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
qemu-devel, Khoa Huynh, Sridhar Samudrala
On Sun, Oct 03, 2010 at 01:01:59PM +0200, Avi Kivity wrote:
> >
> >Guest to Host TCP_STREAM throughput(Mb/sec)
> >-------------------------------------------
> >Msg Size vhost-net virtio-net virtio-net/ioeventfd
> >65536 12755 6430 7590
> >16384 8499 3084 5764
> > 4096 4723 1578 3659
> > 1024 1827 981 2060
>
> Even more impressive (expected since the copying, which isn't
> present for block, is now shunted off into an iothread).
>
> On the last test you even exceeded vhost-net. Any theories how/why?
>
> Again, efficiency numbers would be interesting.
>
> >Host to Guest TCP_STREAM throughput(Mb/sec)
> >-------------------------------------------
> >Msg Size vhost-net virtio-net virtio-net/ioeventfd
> >65536 11156 5790 5853
> >16384 10787 5575 5691
> > 4096 10452 5556 4277
> > 1024 4437 3671 5277
>
> Here you exceed vhost-net, too.
This is with small packets- I suspect this is the extra
per interrupt overhead that eventfd has.
> >+static int kvm_check_many_iobus_devs(void)
> >+{
> >+ /* Older kernels have a 6 device limit on the KVM io bus. In that case
> >+ * creating many ioeventfds must be avoided. This tests checks for the
> >+ * limitation.
> >+ */
> >+ EventNotifier notifiers[7];
> >+ int i, ret = 0;
> >+ for (i = 0; i < ARRAY_SIZE(notifiers); i++) {
> >+ ret = event_notifier_init(&notifiers[i], 0);
> >+ if (ret < 0) {
> >+ break;
> >+ }
> >+ ret = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, true);
> >+ if (ret < 0) {
> >+ event_notifier_cleanup(&notifiers[i]);
> >+ break;
> >+ }
> >+ }
> >+
> >+ /* Decide whether many devices are supported or not */
> >+ ret = i == ARRAY_SIZE(notifiers);
> >+
> >+ while (i-- > 0) {
> >+ kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, false);
> >+ event_notifier_cleanup(&notifiers[i]);
> >+ }
> >+ return ret;
> >+}
>
> Sorry about that.
>
> IIRC there was a problem (shared by vhost-net) with interrupts
> remaining enabled in the window between the guest kicking the queue
> and the host waking up and disabling interrupts. An even more vague
> IIRC mst had an idea to fix this?
This is one of the things that vring2 is supposed to fix.
> --
> error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-03 13:51 ` Michael S. Tsirkin
@ 2010-10-03 14:21 ` Avi Kivity
2010-10-03 14:28 ` Michael S. Tsirkin
0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2010-10-03 14:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
qemu-devel, Khoa Huynh, Sridhar Samudrala
On 10/03/2010 03:51 PM, Michael S. Tsirkin wrote:
> On Sun, Oct 03, 2010 at 01:01:59PM +0200, Avi Kivity wrote:
> > >
> > >Guest to Host TCP_STREAM throughput(Mb/sec)
> > >-------------------------------------------
> > >Msg Size vhost-net virtio-net virtio-net/ioeventfd
> > >65536 12755 6430 7590
> > >16384 8499 3084 5764
> > > 4096 4723 1578 3659
> > > 1024 1827 981 2060
> >
> > Even more impressive (expected since the copying, which isn't
> > present for block, is now shunted off into an iothread).
> >
> > On the last test you even exceeded vhost-net. Any theories how/why?
> >
> > Again, efficiency numbers would be interesting.
> >
> > >Host to Guest TCP_STREAM throughput(Mb/sec)
> > >-------------------------------------------
> > >Msg Size vhost-net virtio-net virtio-net/ioeventfd
> > >65536 11156 5790 5853
> > >16384 10787 5575 5691
> > > 4096 10452 5556 4277
> > > 1024 4437 3671 5277
> >
> > Here you exceed vhost-net, too.
>
> This is with small packets- I suspect this is the extra
> per interrupt overhead that eventfd has.
This is using eventfd as well.
--
error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-03 14:21 ` Avi Kivity
@ 2010-10-03 14:28 ` Michael S. Tsirkin
2010-10-04 1:18 ` Anthony Liguori
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2010-10-03 14:28 UTC (permalink / raw)
To: Avi Kivity
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
qemu-devel, Khoa Huynh, Sridhar Samudrala
On Sun, Oct 03, 2010 at 04:21:57PM +0200, Avi Kivity wrote:
> On 10/03/2010 03:51 PM, Michael S. Tsirkin wrote:
> >On Sun, Oct 03, 2010 at 01:01:59PM +0200, Avi Kivity wrote:
> >> >
> >> >Guest to Host TCP_STREAM throughput(Mb/sec)
> >> >-------------------------------------------
> >> >Msg Size vhost-net virtio-net virtio-net/ioeventfd
> >> >65536 12755 6430 7590
> >> >16384 8499 3084 5764
> >> > 4096 4723 1578 3659
> >> > 1024 1827 981 2060
> >>
> >> Even more impressive (expected since the copying, which isn't
> >> present for block, is now shunted off into an iothread).
> >>
> >> On the last test you even exceeded vhost-net. Any theories how/why?
> >>
> >> Again, efficiency numbers would be interesting.
> >>
> >> >Host to Guest TCP_STREAM throughput(Mb/sec)
> >> >-------------------------------------------
> >> >Msg Size vhost-net virtio-net virtio-net/ioeventfd
> >> >65536 11156 5790 5853
> >> >16384 10787 5575 5691
> >> > 4096 10452 5556 4277
> >> > 1024 4437 3671 5277
> >>
> >> Here you exceed vhost-net, too.
> >
> >This is with small packets- I suspect this is the extra
> >per interrupt overhead that eventfd has.
>
> This is using eventfd as well.
Sorry, I meant irqfd.
> --
> error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-03 14:28 ` Michael S. Tsirkin
@ 2010-10-04 1:18 ` Anthony Liguori
2010-10-04 8:04 ` Avi Kivity
0 siblings, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2010-10-04 1:18 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
qemu-devel, Khoa Huynh, Avi Kivity, sri
On 10/03/2010 09:28 AM, Michael S. Tsirkin wrote:
>
>> This is using eventfd as well.
>>
> Sorry, I meant irqfd.
>
I've tried using irqfd in userspace. It hurts performance quite a bit
compared to doing an ioctl so I would suspect this too.
A last_used_idx or similar mechanism should help performance quite a bit
on top of ioeventfd too.
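To make the idea concrete, here is a hypothetical sketch of such a check on
the device side (illustrative only, not code from this patch or the virtio
spec of the time): the driver publishes the used-ring index it has consumed,
and the device injects an interrupt only when newly completed buffers cross
that mark.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical device-side check: old/new are the device's used index
     * before and after adding completed buffers, driver_seen_idx is the
     * index the driver says it has processed up to.  Wrap-safe 16-bit
     * arithmetic, in the spirit of a last_used_idx/event-index scheme. */
    static bool need_interrupt(uint16_t old_used_idx, uint16_t new_used_idx,
                               uint16_t driver_seen_idx)
    {
        return (uint16_t)(new_used_idx - driver_seen_idx - 1) <
               (uint16_t)(new_used_idx - old_used_idx);
    }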
Regards,
Anthony Liguori
>> --
>> error compiling committee.c: too many arguments to function
>>
* Re: [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-04 1:18 ` Anthony Liguori
@ 2010-10-04 8:04 ` Avi Kivity
2010-10-04 14:01 ` Anthony Liguori
0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2010-10-04 8:04 UTC (permalink / raw)
To: Anthony Liguori
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Michael S. Tsirkin, qemu-devel, Khoa Huynh, sri
On 10/04/2010 03:18 AM, Anthony Liguori wrote:
> On 10/03/2010 09:28 AM, Michael S. Tsirkin wrote:
>>
>>> This is using eventfd as well.
>> Sorry, I meant irqfd.
>
> I've tried using irqfd in userspace. It hurts performance quite a bit
> compared to doing an ioctl so I would suspect this too.
>
> A last_used_idx or similar mechanism should help performance quite a
> bit on top of ioeventfd too.
>
Any idea why? While irqfd does quite a bit of extra locking, it
shouldn't be that bad.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-04 8:04 ` Avi Kivity
@ 2010-10-04 14:01 ` Anthony Liguori
2010-10-04 16:12 ` Michael S. Tsirkin
0 siblings, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2010-10-04 14:01 UTC (permalink / raw)
To: Avi Kivity
Cc: Steve Dobbelstein, Stefan Hajnoczi, kvm, Michael S. Tsirkin,
qemu-devel, Khoa Huynh, sri
On 10/04/2010 03:04 AM, Avi Kivity wrote:
> On 10/04/2010 03:18 AM, Anthony Liguori wrote:
>> On 10/03/2010 09:28 AM, Michael S. Tsirkin wrote:
>>>
>>>> This is using eventfd as well.
>>> Sorry, I meant irqfd.
>>
>> I've tried using irqfd in userspace. It hurts performance quite a
>> bit compared to doing an ioctl so I would suspect this too.
>>
>> A last_used_idx or similar mechanism should help performance quite a
>> bit on top of ioeventfd too.
>>
>
> Any idea why? While irqfd does quite a bit of extra locking, it
> shouldn't be that bad.
Not really. It was somewhat counter intuitive.
A worthwhile experiment might be to do some layering violations and have
vhost do an irq injection via an ioctl and see what the performance
delta is. I suspect it could give vhost a nice boost.
Regards,
Anthony Liguori
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-03 11:01 ` [Qemu-devel] " Avi Kivity
2010-10-03 13:51 ` Michael S. Tsirkin
@ 2010-10-04 14:30 ` Stefan Hajnoczi
2010-10-05 11:00 ` rukhsana ansari
2 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2010-10-04 14:30 UTC (permalink / raw)
To: Avi Kivity
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Michael S. Tsirkin, qemu-devel, Khoa Huynh, Sridhar Samudrala
On Sun, Oct 3, 2010 at 12:01 PM, Avi Kivity <avi@redhat.com> wrote:
> On 09/30/2010 04:01 PM, Stefan Hajnoczi wrote:
>>
>> Virtqueue notify is currently handled synchronously in userspace virtio.
>> This prevents the vcpu from executing guest code while hardware
>> emulation code handles the notify.
>>
>> On systems that support KVM, the ioeventfd mechanism can be used to make
>> virtqueue notify a lightweight exit by deferring hardware emulation to
>> the iothread and allowing the VM to continue execution. This model is
>> similar to how vhost receives virtqueue notifies.
>
> Note that this is a tradeoff. If an idle core is available and the
> scheduler places the iothread on that core, then the heavyweight exit is
> replaced by a lightweight exit + IPI. If the iothread is co-located with
> the vcpu, then we'll take a heavyweight exit in any case.
>
> The first case is very likely if the host cpu is undercommitted and there is
> heavy I/O activity. This is a typical subsystem benchmark scenario (as
> opposed to a system benchmark like specvirt). My feeling is that total
> system throughput will be decreased unless the scheduler is clever enough to
> place the iothread and vcpu on the same host cpu when the system is
> overcommitted.
>
> We can't balance "feeling" against numbers, especially when we have a
> precedent in vhost-net, so I think this should go in. But I think we should
> also try to understand the effects of the extra IPIs and cacheline bouncing
> that this creates. While virtio was designed to minimize this, we know it
> has severe problems in this area.
Right, there is a danger of optimizing for subsystem benchmark cases
rather than real world usage. I have posted some results that we've
gathered but more scrutiny is welcome.
>> Khoa Huynh <khoa@us.ibm.com> collected the following data for
>> virtio-blk with cache=none,aio=native:
>>
>> FFSB Test Threads Unmodified Patched
>> (MB/s) (MB/s)
>> Large file create 1 21.7 21.8
>> 8 101.0 118.0
>> 16 119.0 157.0
>>
>> Sequential reads 1 21.9 23.2
>> 8 114.0 139.0
>> 16 143.0 178.0
>>
>> Random reads 1 3.3 3.6
>> 8 23.0 25.4
>> 16 43.3 47.8
>>
>> Random writes 1 22.2 23.0
>> 8 93.1 111.6
>> 16 110.5 132.0
>
> Impressive numbers. Can you also provide efficiency (bytes per host cpu
> seconds)?
Khoa, do you have the host CPU numbers for these benchmark runs?
> How many guest vcpus were used with this? With enough vcpus, there is also
> a reduction in cacheline bouncing, since the virtio state in the host gets
> to stay on one cpu (especially with aio=native).
Guest: 2 vcpu, 4 GB RAM
Host: 16 cpus, 12 GB RAM
Khoa, is this correct?
Stefan
* Re: [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-04 14:01 ` Anthony Liguori
@ 2010-10-04 16:12 ` Michael S. Tsirkin
2010-10-04 16:20 ` Anthony Liguori
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2010-10-04 16:12 UTC (permalink / raw)
To: Anthony Liguori
Cc: Steve Dobbelstein, Stefan Hajnoczi, kvm, qemu-devel, Khoa Huynh,
Avi Kivity, sri
On Mon, Oct 04, 2010 at 09:01:14AM -0500, Anthony Liguori wrote:
> On 10/04/2010 03:04 AM, Avi Kivity wrote:
> > On 10/04/2010 03:18 AM, Anthony Liguori wrote:
> >>On 10/03/2010 09:28 AM, Michael S. Tsirkin wrote:
> >>>
> >>>>This is using eventfd as well.
> >>>Sorry, I meant irqfd.
> >>
> >>I've tried using irqfd in userspace. It hurts performance quite
> >>a bit compared to doing an ioctl so I would suspect this too.
> >>
> >>A last_used_idx or similar mechanism should help performance
> >>quite a bit on top of ioeventfd too.
> >>
> >
> >Any idea why? While irqfd does quite a bit of extra locking, it
> >shouldn't be that bad.
>
> Not really. It was somewhat counter intuitive.
>
> A worthwhile experiment might be to do some layering violations and
> have vhost do an irq injection via an ioctl and see what the
> performance delta is.
I think you don't even need to try that hard.
Just comment this line:
// proxy->pci_dev.msix_mask_notifier = virtio_pci_mask_notifier;
this is what switches to irqfd when msi vector is unmasked.
> I suspect it could give vhost a nice boost.
>
> Regards,
>
> Anthony Liguori
* Re: [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-04 16:12 ` Michael S. Tsirkin
@ 2010-10-04 16:20 ` Anthony Liguori
2010-10-04 16:25 ` Michael S. Tsirkin
0 siblings, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2010-10-04 16:20 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Steve Dobbelstein, Stefan Hajnoczi, kvm, qemu-devel, Khoa Huynh,
Avi Kivity, sri
On 10/04/2010 11:12 AM, Michael S. Tsirkin wrote:
> On Mon, Oct 04, 2010 at 09:01:14AM -0500, Anthony Liguori wrote:
>
>> On 10/04/2010 03:04 AM, Avi Kivity wrote:
>>
>>> On 10/04/2010 03:18 AM, Anthony Liguori wrote:
>>>
>>>> On 10/03/2010 09:28 AM, Michael S. Tsirkin wrote:
>>>>
>>>>>
>>>>>> This is using eventfd as well.
>>>>>>
>>>>> Sorry, I meant irqfd.
>>>>>
>>>> I've tried using irqfd in userspace. It hurts performance quite
>>>> a bit compared to doing an ioctl so I would suspect this too.
>>>>
>>>> A last_used_idx or similar mechanism should help performance
>>>> quite a bit on top of ioeventfd too.
>>>>
>>>>
>>> Any idea why? While irqfd does quite a bit of extra locking, it
>>> shouldn't be that bad.
>>>
>> Not really. It was somewhat counter intuitive.
>>
>> A worthwhile experiment might be to do some layering violations and
>> have vhost do an irq injection via an ioctl and see what the
>> performance delta is.
>>
> I think you don't even need to try that hard.
> Just comment this line:
> // proxy->pci_dev.msix_mask_notifier = virtio_pci_mask_notifier;
> this is what switches to irqfd when msi vector is unmasked.
>
That drops to userspace though for all irqs, no?
Or did you mean that commenting that line out improves performance
demonstrating the overhead of irqfd?
Regards,
Anthony Liguori
>
>> I suspect it could give vhost a nice boost.
>>
>> Regards,
>>
>> Anthony Liguori
>>
* Re: [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-04 16:20 ` Anthony Liguori
@ 2010-10-04 16:25 ` Michael S. Tsirkin
0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2010-10-04 16:25 UTC (permalink / raw)
To: Anthony Liguori
Cc: Steve Dobbelstein, Stefan Hajnoczi, kvm, qemu-devel, Khoa Huynh,
Avi Kivity, sri
On Mon, Oct 04, 2010 at 11:20:19AM -0500, Anthony Liguori wrote:
> On 10/04/2010 11:12 AM, Michael S. Tsirkin wrote:
> >On Mon, Oct 04, 2010 at 09:01:14AM -0500, Anthony Liguori wrote:
> >>On 10/04/2010 03:04 AM, Avi Kivity wrote:
> >>>On 10/04/2010 03:18 AM, Anthony Liguori wrote:
> >>>>On 10/03/2010 09:28 AM, Michael S. Tsirkin wrote:
> >>>>>>This is using eventfd as well.
> >>>>>Sorry, I meant irqfd.
> >>>>I've tried using irqfd in userspace. It hurts performance quite
> >>>>a bit compared to doing an ioctl so I would suspect this too.
> >>>>
> >>>>A last_used_idx or similar mechanism should help performance
> >>>>quite a bit on top of ioeventfd too.
> >>>>
> >>>Any idea why? While irqfd does quite a bit of extra locking, it
> >>>shouldn't be that bad.
> >>Not really. It was somewhat counter intuitive.
> >>
> >>A worthwhile experiment might be to do some layering violations and
> >>have vhost do an irq injection via an ioctl and see what the
> >>performance delta is.
> >I think you don't even need to try that hard.
> >Just comment this line:
> >// proxy->pci_dev.msix_mask_notifier = virtio_pci_mask_notifier;
> >this is what switches to irqfd when msi vector is unmasked.
>
> That drops to userspace though for all irqs, no?
Exactly.
> Or did you mean that commenting that line out improves performance
> demonstrating the overhead of irqfd?
>
> Regards,
>
> Anthony Liguori
Haven't tried this, but possibly.
> >> I suspect it could give vhost a nice boost.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-03 11:01 ` [Qemu-devel] " Avi Kivity
2010-10-03 13:51 ` Michael S. Tsirkin
2010-10-04 14:30 ` Stefan Hajnoczi
@ 2010-10-05 11:00 ` rukhsana ansari
2010-10-05 11:58 ` Avi Kivity
2 siblings, 1 reply; 23+ messages in thread
From: rukhsana ansari @ 2010-10-05 11:00 UTC (permalink / raw)
To: Avi Kivity
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Michael S. Tsirkin, qemu-devel, Khoa Huynh, Sridhar Samudrala
Hi,
W.r.t:
> Note that this is a tradeoff. If an idle core is available and the
> scheduler places the iothread on that core, then the heavyweight exit is
> replaced by a lightweight exit + IPI. If the iothread is co-located with
> the vcpu, then we'll take a heavyweight exit in any case.
>
Q: Does the kvm kernel code check for such a condition and take a
heavyweight exit?
> The first case is very likely if the host cpu is undercommitted and there is
> heavy I/O activity. This is a typical subsystem benchmark scenario (as
> opposed to a system benchmark like specvirt). My feeling is that total
> system throughput will be decreased unless the scheduler is clever enough to
> place the iothread and vcpu on the same host cpu when the system is
> overcommitted.
>
>
Q: Sorry if the answer is obvious here.
If the heavyweight exit is taken when both threads are assigned to the
same core, how will the system throughput increase?
Thanks
Rukhsana
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-05 11:00 ` rukhsana ansari
@ 2010-10-05 11:58 ` Avi Kivity
0 siblings, 0 replies; 23+ messages in thread
From: Avi Kivity @ 2010-10-05 11:58 UTC (permalink / raw)
To: rukhsana ansari
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Michael S. Tsirkin, qemu-devel, Khoa Huynh, Sridhar Samudrala
On 10/05/2010 01:00 PM, rukhsana ansari wrote:
> Hi,
>
> W.r.t:
> > Note that this is a tradeoff. If an idle core is available and the
> > scheduler places the iothread on that core, then the heavyweight exit is
> > replaced by a lightweight exit + IPI. If the iothread is co-located with
> > the vcpu, then we'll take a heavyweight exit in any case.
> >
> Q: Does the kvm kernel code check for such a condition and take a
> heavyweight exit?
No. The heavyweight exit is caused by a context switch (partial) or
return to userspace (full).
> > The first case is very likely if the host cpu is undercommitted and there is
> > heavy I/O activity. This is a typical subsystem benchmark scenario (as
> > opposed to a system benchmark like specvirt). My feeling is that total
> > system throughput will be decreased unless the scheduler is clever enough to
> > place the iothread and vcpu on the same host cpu when the system is
> > overcommitted.
> >
> >
> Q: Sorry if the answer is obvious here.
> If the heavyweight exit is taken when both threads are assigned to the
> same core, how will the system throughput increase?
>
Co-locating threads on the same core reduces cross-core traffic.
--
error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-09-30 14:01 [Qemu-devel] [PATCH] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
2010-10-03 11:01 ` [Qemu-devel] " Avi Kivity
@ 2010-10-19 13:07 ` Stefan Hajnoczi
2010-10-19 13:12 ` Anthony Liguori
2010-10-19 13:33 ` Michael S. Tsirkin
2 siblings, 1 reply; 23+ messages in thread
From: Stefan Hajnoczi @ 2010-10-19 13:07 UTC (permalink / raw)
To: qemu-devel
Cc: Steve Dobbelstein, Anthony Liguori, kvm, Michael S. Tsirkin,
Khoa Huynh, Sridhar Samudrala
On Thu, Sep 30, 2010 at 03:01:52PM +0100, Stefan Hajnoczi wrote:
> Virtqueue notify is currently handled synchronously in userspace virtio.
> This prevents the vcpu from executing guest code while hardware
> emulation code handles the notify.
>
> On systems that support KVM, the ioeventfd mechanism can be used to make
> virtqueue notify a lightweight exit by deferring hardware emulation to
> the iothread and allowing the VM to continue execution. This model is
> similar to how vhost receives virtqueue notifies.
>
> The result of this change is improved performance for userspace virtio
> devices. Virtio-blk throughput increases especially for multithreaded
> scenarios and virtio-net transmit throughput increases substantially.
> Full numbers are below.
>
> This patch employs ioeventfd virtqueue notify for all virtio devices.
> Linux kernels pre-2.6.34 only allow for 6 ioeventfds per VM and care
> must be taken so that vhost-net, the other ioeventfd user in QEMU, is
> able to function. On such kernels ioeventfd virtqueue notify will not
> be used.
>
> Khoa Huynh <khoa@us.ibm.com> collected the following data for
> virtio-blk with cache=none,aio=native:
>
> FFSB Test Threads Unmodified Patched
> (MB/s) (MB/s)
> Large file create 1 21.7 21.8
> 8 101.0 118.0
> 16 119.0 157.0
>
> Sequential reads 1 21.9 23.2
> 8 114.0 139.0
> 16 143.0 178.0
>
> Random reads 1 3.3 3.6
> 8 23.0 25.4
> 16 43.3 47.8
>
> Random writes 1 22.2 23.0
> 8 93.1 111.6
> 16 110.5 132.0
>
> Sridhar Samudrala <sri@us.ibm.com> collected the following data for
> virtio-net with 2.6.36-rc1 on the host and 2.6.34 on the guest.
>
> Guest to Host TCP_STREAM throughput(Mb/sec)
> -------------------------------------------
> Msg Size vhost-net virtio-net virtio-net/ioeventfd
> 65536 12755 6430 7590
> 16384 8499 3084 5764
> 4096 4723 1578 3659
> 1024 1827 981 2060
>
> Host to Guest TCP_STREAM throughput(Mb/sec)
> -------------------------------------------
> Msg Size vhost-net virtio-net virtio-net/ioeventfd
> 65536 11156 5790 5853
> 16384 10787 5575 5691
> 4096 10452 5556 4277
> 1024 4437 3671 5277
>
> Guest to Host TCP_RR latency(transactions/sec)
> ----------------------------------------------
>
> Msg Size vhost-net virtio-net virtio-net/ioeventfd
> 1 9903 3459 3425
> 4096 7185 1931 1899
> 16384 6108 2102 1923
> 65536 3161 1610 1744
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
> Small changes are required for qemu-kvm.git. I will send them once qemu.git
> has virtio-ioeventfd support.
>
> hw/vhost.c | 6 ++--
> hw/virtio.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hw/virtio.h | 9 +----
> kvm-all.c | 39 +++++++++++++++++++++
> kvm-stub.c | 5 +++
> kvm.h | 1 +
> 6 files changed, 156 insertions(+), 10 deletions(-)
Is there anything stopping this patch from being merged?
Thanks,
Stefan
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-19 13:07 ` Stefan Hajnoczi
@ 2010-10-19 13:12 ` Anthony Liguori
2010-10-19 13:35 ` Michael S. Tsirkin
0 siblings, 1 reply; 23+ messages in thread
From: Anthony Liguori @ 2010-10-19 13:12 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Steve Dobbelstein, Anthony Liguori, kvm, Michael S. Tsirkin,
qemu-devel, Khoa Huynh, sri
On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote:
> Is there anything stopping this patch from being merged?
>
Michael, any objections? If not, I'll merge it.
Regards,
Anthony Liguori
> Thanks,
> Stefan
>
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-09-30 14:01 [Qemu-devel] [PATCH] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
2010-10-03 11:01 ` [Qemu-devel] " Avi Kivity
2010-10-19 13:07 ` Stefan Hajnoczi
@ 2010-10-19 13:33 ` Michael S. Tsirkin
2010-10-25 13:25 ` Stefan Hajnoczi
2010-10-25 15:01 ` Stefan Hajnoczi
2 siblings, 2 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2010-10-19 13:33 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Steve Dobbelstein, Anthony Liguori, kvm, qemu-devel, Khoa Huynh,
Sridhar Samudrala
As a general comment, could you please try to split this patch
up to make it easier to review? I did a pass over it but I still
don't understand it completely.
My main concern is with the fact that we add more state
in notifiers that can easily get out of sync with users.
If we absolutely need this state, let's try to at least
document the state machine, and make the API
for state transitions more transparent.
On Thu, Sep 30, 2010 at 03:01:52PM +0100, Stefan Hajnoczi wrote:
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 1b8624d..f127a07 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -517,7 +517,7 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
> goto fail_guest_notifier;
> }
>
> - r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, true);
> + r = virtio_set_host_notifier(vdev, idx, true);
> if (r < 0) {
> fprintf(stderr, "Error binding host notifier: %d\n", -r);
> goto fail_host_notifier;
> @@ -539,7 +539,7 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
>
> fail_call:
> fail_kick:
> - vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
> + virtio_set_host_notifier(vdev, idx, false);
> fail_host_notifier:
> vdev->binding->set_guest_notifier(vdev->binding_opaque, idx, false);
> fail_guest_notifier:
> @@ -575,7 +575,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
> }
> assert (r >= 0);
>
> - r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
> + r = virtio_set_host_notifier(vdev, idx, false);
> if (r < 0) {
> fprintf(stderr, "vhost VQ %d host cleanup failed: %d\n", idx, r);
> fflush(stderr);
> diff --git a/hw/virtio.c b/hw/virtio.c
> index fbef788..f075b3a 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -16,6 +16,7 @@
> #include "trace.h"
> #include "virtio.h"
> #include "sysemu.h"
> +#include "kvm.h"
>
> /* The alignment to use between consumer and producer parts of vring.
> * x86 pagesize again. */
> @@ -77,6 +78,11 @@ struct VirtQueue
> VirtIODevice *vdev;
> EventNotifier guest_notifier;
> EventNotifier host_notifier;
> + enum {
> + HOST_NOTIFIER_DEASSIGNED, /* inactive */
> + HOST_NOTIFIER_ASSIGNED, /* active */
> + HOST_NOTIFIER_OFFLIMITS, /* active but outside our control */
> + } host_notifier_state;
This state machine confuses me. Please note that users already
track notifier state and call set with assign/deassign correctly.
The comment does not help: what does 'outside our control' mean?
Whose control?
> };
>
> /* virt queue functions */
> @@ -453,6 +459,93 @@ void virtio_update_irq(VirtIODevice *vdev)
> virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
> }
>
> +/* Service virtqueue notify from a host notifier */
> +static void virtio_read_host_notifier(void *opaque)
> +{
> + VirtQueue *vq = opaque;
> + EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
> + if (event_notifier_test_and_clear(notifier)) {
> + if (vq->vring.desc) {
> + vq->handle_output(vq->vdev, vq);
> + }
> + }
> +}
> +
> +/* Transition between host notifier states */
> +static int virtio_set_host_notifier_state(VirtIODevice *vdev, int n, int state)
really unfortunate naming for functions: we seem to have
about 4 of them starting with virtio_set_host_notifier*
> +{
> + VirtQueue *vq = &vdev->vq[n];
> + EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
> + int rc;
> +
> + if (!kvm_enabled()) {
> + return -ENOSYS;
> + }
If this means that there's no need to do anything for non kvm,
return 0 here.
> +
> + /* If the number of ioeventfds is limited, use them for vhost only */
> + if (state == HOST_NOTIFIER_ASSIGNED && !kvm_has_many_iobus_devs()) {
> + state = HOST_NOTIFIER_DEASSIGNED;
> + }
> +
> + /* Ignore if no state change */
> + if (vq->host_notifier_state == state) {
> + return 0;
> + }
> +
> + /* Disable read handler if transitioning away from assigned */
> + if (vq->host_notifier_state == HOST_NOTIFIER_ASSIGNED) {
> + qemu_set_fd_handler(event_notifier_get_fd(notifier), NULL, NULL, NULL);
> + }
> +
> + /* Toggle host notifier if transitioning to or from deassigned */
> + if (state == HOST_NOTIFIER_DEASSIGNED ||
> + vq->host_notifier_state == HOST_NOTIFIER_DEASSIGNED) {
> + rc = vdev->binding->set_host_notifier(vdev->binding_opaque, n,
> + state != HOST_NOTIFIER_DEASSIGNED);
> + if (rc < 0) {
> + return rc;
> + }
> + }
> +
> + /* Enable read handler if transitioning to assigned */
> + if (state == HOST_NOTIFIER_ASSIGNED) {
> + qemu_set_fd_handler(event_notifier_get_fd(notifier),
> + virtio_read_host_notifier, NULL, vq);
> + }
> +
> + vq->host_notifier_state = state;
> + return 0;
> +}
> +
> +/* Try to assign/deassign host notifiers for all virtqueues */
> +static void virtio_set_host_notifiers(VirtIODevice *vdev, bool assigned)
void? don't we care whether this fails?
> +{
This really confuses me. Why is this not a simple loop
over all vqs?
> + int state = assigned ? HOST_NOTIFIER_ASSIGNED : HOST_NOTIFIER_DEASSIGNED;
> + int i;
> +
> + if (!vdev->binding->set_host_notifier) {
> + return;
> + }
> +
> + for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> + if (vdev->vq[i].host_notifier_state == HOST_NOTIFIER_OFFLIMITS) {
> + continue;
> + }
> +
> + if (!vdev->vq[i].vring.desc) {
> + continue;
> + }
> +
> + virtio_set_host_notifier_state(vdev, i, state);
ignores return value?
> + }
> +}
> +
> +int virtio_set_host_notifier(VirtIODevice *vdev, int n, bool assigned)
> +{
> + int state = assigned ? HOST_NOTIFIER_OFFLIMITS : HOST_NOTIFIER_ASSIGNED;
So here assigned == false sets state to ASSIGNED?
> + return virtio_set_host_notifier_state(vdev, n, state);
> +}
> +
> void virtio_reset(void *opaque)
> {
> VirtIODevice *vdev = opaque;
> @@ -467,6 +560,7 @@ void virtio_reset(void *opaque)
> vdev->isr = 0;
> vdev->config_vector = VIRTIO_NO_VECTOR;
> virtio_notify_vector(vdev, vdev->config_vector);
> + virtio_set_host_notifiers(vdev, false);
>
> for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> vdev->vq[i].vring.desc = 0;
> @@ -592,6 +686,16 @@ void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector)
> vdev->vq[n].vector = vector;
> }
>
> +void virtio_set_status(VirtIODevice *vdev, uint8_t val)
> +{
> + virtio_set_host_notifiers(vdev, val & VIRTIO_CONFIG_S_DRIVER_OK);
> +
> + if (vdev->set_status) {
> + vdev->set_status(vdev, val);
> + }
> + vdev->status = val;
How does this interact with vhost.c which uses both notifiers and status
callback?
> +}
> +
> VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> void (*handle_output)(VirtIODevice *, VirtQueue *))
> {
> @@ -719,6 +823,7 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f)
> }
>
> virtio_notify_vector(vdev, VIRTIO_NO_VECTOR);
> + virtio_set_host_notifiers(vdev, vdev->status & VIRTIO_CONFIG_S_DRIVER_OK);
> return 0;
> }
>
> @@ -746,6 +851,7 @@ VirtIODevice *virtio_common_init(const char *name, uint16_t device_id,
> for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> vdev->vq[i].vector = VIRTIO_NO_VECTOR;
> vdev->vq[i].vdev = vdev;
> + vdev->vq[i].host_notifier_state = HOST_NOTIFIER_DEASSIGNED;
> }
>
> vdev->name = name;
> diff --git a/hw/virtio.h b/hw/virtio.h
> index 96514e6..d76157e 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -125,13 +125,7 @@ struct VirtIODevice
> uint16_t device_id;
> };
>
> -static inline void virtio_set_status(VirtIODevice *vdev, uint8_t val)
> -{
> - if (vdev->set_status) {
> - vdev->set_status(vdev, val);
> - }
> - vdev->status = val;
> -}
> +void virtio_set_status(VirtIODevice *vdev, uint8_t val);
>
> VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> void (*handle_output)(VirtIODevice *,
> @@ -217,6 +211,7 @@ target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n);
> uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
> void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
> VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n);
> +int virtio_set_host_notifier(VirtIODevice *vdev, int n, bool assigned);
I think it's not the best API. Two functions e.g.
_assign
_deassign
would make it easier to avoid confusion.
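Purely as an illustration of the suggestion (hypothetical names, not code in
the patch), the split could look like:

    int virtio_queue_host_notifier_assign(VirtIODevice *vdev, int n);
    int virtio_queue_host_notifier_deassign(VirtIODevice *vdev, int n);

    /* vhost_virtqueue_init() would then read
     *     r = virtio_queue_host_notifier_assign(vdev, idx);
     * and the cleanup path
     *     virtio_queue_host_notifier_deassign(vdev, idx);
     * with no bool whose sense the caller has to remember. */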
> EventNotifier *virtio_queue_get_guest_notifier(VirtQueue *vq);
> EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
> void virtio_irq(VirtQueue *vq);
> diff --git a/kvm-all.c b/kvm-all.c
> index 1cc696f..2f09e34 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -24,6 +24,7 @@
> #include "qemu-barrier.h"
> #include "sysemu.h"
> #include "hw/hw.h"
> +#include "hw/event_notifier.h"
> #include "gdbstub.h"
> #include "kvm.h"
> #include "bswap.h"
> @@ -72,6 +73,7 @@ struct KVMState
> int irqchip_in_kernel;
> int pit_in_kernel;
> int xsave, xcrs;
> + int many_iobus_devs;
> };
>
> static KVMState *kvm_state;
> @@ -423,6 +425,36 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
> return ret;
> }
>
> +static int kvm_check_many_iobus_devs(void)
> +{
> + /* Older kernels have a 6 device limit on the KVM io bus. In that case
> + * creating many ioeventfds must be avoided. This tests
This tests -> this test
>+ checks for the limitation.
> + */
> + EventNotifier notifiers[7];
This is low-level stuff; we really don't need to use notifiers here,
a simple eventfd will be enough.
> + int i, ret = 0;
> + for (i = 0; i < ARRAY_SIZE(notifiers); i++) {
> + ret = event_notifier_init(&notifiers[i], 0);
> + if (ret < 0) {
> + break;
> + }
> + ret = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, true);
> + if (ret < 0) {
> + event_notifier_cleanup(&notifiers[i]);
> + break;
> + }
> + }
> +
> + /* Decide whether many devices are supported or not */
> + ret = i == ARRAY_SIZE(notifiers);
> +
> + while (i-- > 0) {
> + kvm_set_ioeventfd_pio_word(event_notifier_get_fd(&notifiers[i]), 0, i, false);
> + event_notifier_cleanup(&notifiers[i]);
> + }
> + return ret;
> +}
> +
> static void kvm_set_phys_mem(target_phys_addr_t start_addr,
> ram_addr_t size,
> ram_addr_t phys_offset)
> @@ -699,6 +731,8 @@ int kvm_init(int smp_cpus)
> kvm_state = s;
> cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
>
> + s->many_iobus_devs = kvm_check_many_iobus_devs();
> +
> return 0;
>
> err:
> @@ -1028,6 +1062,11 @@ int kvm_has_xcrs(void)
> return kvm_state->xcrs;
> }
>
> +int kvm_has_many_iobus_devs(void)
> +{
> + return kvm_state->many_iobus_devs;
> +}
> +
> void kvm_setup_guest_memory(void *start, size_t size)
> {
> if (!kvm_has_sync_mmu()) {
> diff --git a/kvm-stub.c b/kvm-stub.c
> index d45f9fa..b0887fb 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -99,6 +99,11 @@ int kvm_has_robust_singlestep(void)
> return 0;
> }
>
> +int kvm_has_many_iobus_devs(void)
> +{
> + return 0;
> +}
> +
> void kvm_setup_guest_memory(void *start, size_t size)
> {
> }
> diff --git a/kvm.h b/kvm.h
> index 50b6c01..f405906 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -42,6 +42,7 @@ int kvm_has_robust_singlestep(void);
> int kvm_has_debugregs(void);
> int kvm_has_xsave(void);
> int kvm_has_xcrs(void);
> +int kvm_has_many_iobus_devs(void);
Looks like a pretty low level API.
Name it 'kvm_has_ioeventfd' or something like this?
>
> #ifdef NEED_CPU_H
> int kvm_init_vcpu(CPUState *env);
> --
> 1.7.1
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-19 13:12 ` Anthony Liguori
@ 2010-10-19 13:35 ` Michael S. Tsirkin
2010-10-19 13:44 ` Stefan Hajnoczi
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2010-10-19 13:35 UTC (permalink / raw)
To: Anthony Liguori
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
qemu-devel, Khoa Huynh, sri
On Tue, Oct 19, 2010 at 08:12:42AM -0500, Anthony Liguori wrote:
> On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote:
> >Is there anything stopping this patch from being merged?
>
> Michael, any objections? If not, I'll merge it.
I don't really understand what's going on there. The extra state in
notifiers especially scares me. If you do and are comfortable with the
code, go ahead :)
--
MST
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-19 13:44 ` Stefan Hajnoczi
@ 2010-10-19 13:43 ` Michael S. Tsirkin
2010-11-10 14:52 ` Stefan Hajnoczi
0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2010-10-19 13:43 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Anthony Liguori, qemu-devel, Khoa Huynh, sri
On Tue, Oct 19, 2010 at 02:44:35PM +0100, Stefan Hajnoczi wrote:
> On Tue, Oct 19, 2010 at 2:35 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Oct 19, 2010 at 08:12:42AM -0500, Anthony Liguori wrote:
> >> On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote:
> >> >Is there anything stopping this patch from being merged?
> >>
> >> Michael, any objections? If not, I'll merge it.
> >
> > I don't really understand what's going on there. The extra state in
> > notifiers especially scares me. If you do and are comfortable with the
> > code, go ahead :)
>
> I'm happy to address your comments. The state machine was a bit icky
> but I don't see a way around it.
I think the situation is similar to irqfd in qemu-kvm - take a look
there, specifically msix mask notifiers.
--
MST
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-19 13:35 ` Michael S. Tsirkin
@ 2010-10-19 13:44 ` Stefan Hajnoczi
2010-10-19 13:43 ` Michael S. Tsirkin
0 siblings, 1 reply; 23+ messages in thread
From: Stefan Hajnoczi @ 2010-10-19 13:44 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Anthony Liguori, qemu-devel, Khoa Huynh, sri
On Tue, Oct 19, 2010 at 2:35 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Oct 19, 2010 at 08:12:42AM -0500, Anthony Liguori wrote:
>> On 10/19/2010 08:07 AM, Stefan Hajnoczi wrote:
>> >Is there anything stopping this patch from being merged?
>>
>> Michael, any objections? If not, I'll merge it.
>
> I don't really understand what's going on there. The extra state in
> notifiers especially scares me. If you do and are comfortable with the
> code, go ahead :)
I'm happy to address your comments. The state machine was a bit icky
but I don't see a way around it. Will follow up to your review email.
Stefan
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-19 13:33 ` Michael S. Tsirkin
@ 2010-10-25 13:25 ` Stefan Hajnoczi
2010-10-25 15:01 ` Stefan Hajnoczi
1 sibling, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2010-10-25 13:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Steve Dobbelstein, Anthony Liguori, kvm, qemu-devel, Khoa Huynh,
Sridhar Samudrala
On Tue, Oct 19, 2010 at 03:33:41PM +0200, Michael S. Tsirkin wrote:
> My main concern is with the fact that we add more state
> in notifiers that can easily get out of sync with users.
> If we absolutely need this state, let's try to at least
> document the state machine, and make the API
> for state transitions more transparent.
I'll try to describe how it works. If you're happy with the design in
principle then I can rework the code. Otherwise we can think about a
different design.
The goal is to use ioeventfd instead of the synchronous pio emulation
path that userspace virtqueues use today. Both virtio-blk and
virtio-net increase performance with this approach because it does not
block the vcpu from executing guest code while the I/O operation is
initiated.
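For reference, here is a bare-bones sketch of the mechanism itself (not the
patch code): an eventfd is registered with the raw KVM_IOEVENTFD ioctl so
that a guest pio write of the queue index to the device's
VIRTIO_PCI_QUEUE_NOTIFY offset signals the fd instead of trapping into pio
emulation. vm_fd and notify_addr are placeholders for values the caller
already has.

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholder helper, not taken from the patch: bind an eventfd to the
 * pio address where the guest writes the 16-bit queue index to notify. */
static int assign_queue_ioeventfd(int vm_fd, uint64_t notify_addr,
                                  uint16_t queue_index)
{
    struct kvm_ioeventfd kick;
    int efd = eventfd(0, 0);

    if (efd < 0) {
        return -1;
    }

    memset(&kick, 0, sizeof(kick));
    kick.addr      = notify_addr;   /* VIRTIO_PCI_QUEUE_NOTIFY in the io bar */
    kick.len       = 2;             /* the guest writes a 16-bit queue index */
    kick.fd        = efd;
    kick.datamatch = queue_index;   /* only match this queue's notifies */
    kick.flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH;

    if (ioctl(vm_fd, KVM_IOEVENTFD, &kick) < 0) {
        close(efd);
        return -1;
    }
    return efd;   /* the iothread polls this fd and runs the virtqueue handler */
}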
We want to automatically create an event notifier and set up an ioeventfd
for each initialized virtqueue.
Vhost already uses ioeventfd so it is important not to interfere with
devices that have enabled vhost. If vhost is enabled, then the device's
virtqueues are off-limits and should not be tampered with.
Furthermore, older kernels limit you to 6 ioeventfds per guest. On such
systems it is risky to automatically use ioeventfd for userspace
virtqueues, since that could take a precious ioeventfd away from another
virtio device using vhost. Existing guest configurations would break so
it is simplest to avoid using ioeventfd for userspace virtqueues on such
hosts.
The design adds logic into hw/virtio.c to automatically use ioeventfd
for userspace virtqueues. Specific virtio devices like blk and net
require no modification. The logic sits below the set_host_notifier()
function that vhost uses.
This design stays in sync because it hooks into two interfaces that allow it
to accurately track whether or not to use ioeventfd:
1. virtio_set_host_notifier() is used by vhost. When vhost enables the
host notifier we stay out of the way.
2. virtio_reset()/virtio_set_status()/virtio_load() define the device
life-cycle and transition the state machine appropriately. Migration
is supported.
Here is the state machine that tracks a virtqueue:
                 assigned
             ^  /        \  ^
         e. /  / c.    g. \  \ b.
           /  /            \  \
          /  v      f.      v  \     a.
  offlimits ---------------> deassigned <-- start
            <---------------
                   d.
a. The virtqueue starts deassigned with no ioeventfd.
b. When the device status becomes VIRTIO_CONFIG_S_DRIVER_OK we try to
assign an ioeventfd to each virtqueue, except if the 6 ioeventfd
limitation is present.
c, d. The virtqueue becomes offlimits if vhost enables the host notifier.
e. The ioeventfd becomes assigned again when the host notifier is disabled by vhost.
f. Except when the 6 ioeventfd limitation is present, in which case the
ioeventfd becomes deassigned because we want to avoid using ioeventfd.
g. When the device is reset its virtqueues become deassigned again.
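To make the transitions above concrete, here is an illustrative sketch in C;
the enum and function names are mine for the purpose of this mail, not
identifiers from the patch:

#include <stdbool.h>

typedef enum {
    VQ_NOTIFIER_DEASSIGNED,   /* no ioeventfd bound: a, f, g              */
    VQ_NOTIFIER_ASSIGNED,     /* ioeventfd serviced by the iothread: b, e */
    VQ_NOTIFIER_OFFLIMITS,    /* vhost owns the host notifier: c, d       */
} VQNotifierState;

/* b: the device status reaches VIRTIO_CONFIG_S_DRIVER_OK */
static VQNotifierState on_driver_ok(VQNotifierState s, bool limited_ioeventfds)
{
    if (s == VQ_NOTIFIER_DEASSIGNED && !limited_ioeventfds) {
        return VQ_NOTIFIER_ASSIGNED;
    }
    return s;
}

/* c, d: vhost enables the host notifier and takes the virtqueue over */
static VQNotifierState on_vhost_set_host_notifier(void)
{
    return VQ_NOTIFIER_OFFLIMITS;
}

/* e, f: vhost disables the host notifier again */
static VQNotifierState on_vhost_clear_host_notifier(bool limited_ioeventfds)
{
    return limited_ioeventfds ? VQ_NOTIFIER_DEASSIGNED : VQ_NOTIFIER_ASSIGNED;
}

/* g: virtio_reset()/status change tears the ioeventfd down */
static VQNotifierState on_reset(void)
{
    return VQ_NOTIFIER_DEASSIGNED;
}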
Does this make sense?
Stefan
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-19 13:33 ` Michael S. Tsirkin
2010-10-25 13:25 ` Stefan Hajnoczi
@ 2010-10-25 15:01 ` Stefan Hajnoczi
1 sibling, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2010-10-25 15:01 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
qemu-devel, Khoa Huynh, Sridhar Samudrala
On Tue, Oct 19, 2010 at 03:33:41PM +0200, Michael S. Tsirkin wrote:
Apologies if you receive this twice, the original message either
disappeared or was delayed somehow.
> My main concern is with the fact that we add more state
> in notifiers that can easily get out of sync with users.
> If we absolutely need this state, let's try to at least
> document the state machine, and make the API
> for state transitions more transparent.
I'll try to describe how it works. If you're happy with the design in
principle then I can rework the code. Otherwise we can think about a
different design.
The goal is to use ioeventfd instead of the synchronous pio emulation
path that userspace virtqueues use today. Both virtio-blk and
virtio-net increase performance with this approach because it does not
block the vcpu from executing guest code while the I/O operation is
initiated.
We want to automatically create an event notifier and set up an ioeventfd
for each initialized virtqueue.
Vhost already uses ioeventfd so it is important not to interfere with
devices that have enabled vhost. If vhost is enabled, then the device's
virtqueues are off-limits and should not be tampered with.
Furthermore, older kernels limit you to 6 ioeventfds per guest. On such
systems it is risky to automatically use ioeventfd for userspace
virtqueues, since that could take a precious ioeventfd away from another
virtio device using vhost. Existing guest configurations would break so
it is simplest to avoid using ioeventfd for userspace virtqueues on such
hosts.
The design adds logic into hw/virtio.c to automatically use ioeventfd
for userspace virtqueues. Specific virtio devices like blk and net
require no modification. The logic sits below the set_host_notifier()
function that vhost uses.
This design stays in sync because it hooks into two interfaces that allow it
to accurately track whether or not to use ioeventfd:
1. virtio_set_host_notifier() is used by vhost. When vhost enables the
host notifier we stay out of the way.
2. virtio_reset()/virtio_set_status()/virtio_load() define the device
life-cycle and transition the state machine appropriately. Migration
is supported.
Here is the state machine that tracks a virtqueue:
                 assigned
             ^  /        \  ^
         e. /  / c.    g. \  \ b.
           /  /            \  \
          /  v      f.      v  \     a.
  offlimits ---------------> deassigned <-- start
            <---------------
                   d.
a. The virtqueue starts deassigned with no ioeventfd.
b. When the device status becomes VIRTIO_CONFIG_S_DRIVER_OK we try to
assign an ioeventfd to each virtqueue, except if the 6 ioeventfd
limitation is present.
c, d. The virtqueue becomes offlimits if vhost enables the host notifier.
e. The ioeventfd becomes assigned again when the host notifier is
disabled by vhost.
f. Except when the 6 ioeventfd limitation is present, in which case the
ioeventfd becomes deassigned because we want to avoid using ioeventfd.
g. When the device is reset its virtqueues become deassigned again.
Does this make sense?
Stefan
* [Qemu-devel] Re: [PATCH] virtio: Use ioeventfd for virtqueue notify
2010-10-19 13:43 ` Michael S. Tsirkin
@ 2010-11-10 14:52 ` Stefan Hajnoczi
0 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2010-11-10 14:52 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Steve Dobbelstein, Anthony Liguori, Stefan Hajnoczi, kvm,
Anthony Liguori, qemu-devel, Khoa Huynh, sri
Hi Michael,
I have looked into the way irqfd with msix mask notifiers works. From
what I can tell, the guest notifiers are enabled by vhost net in order
to hook up irqfds for the virtqueues. MSIX allows vectors to be
masked, so there is an mmio write notifier in qemu-kvm to toggle the
irqfd and its QEMU fd handler when the guest toggles the MSIX mask.
The irqfd is disabled but stays open as an eventfd while masked. That
means masking/unmasking the vector does not close/open the eventfd
file descriptor itself.
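Roughly, and with invented helper names, the behaviour I am describing looks
like this (qemu_set_fd_handler() is the only real API assumed; the kvm_*
helpers and the callback are hypothetical stand-ins for the qemu-kvm code):

#include <stdbool.h>

static void handle_masked_vector(void *opaque);   /* hypothetical callback */

static void msix_vector_mask_changed(int eventfd, int virq, bool masked)
{
    if (masked) {
        /* Stop KVM injecting the interrupt directly; read the eventfd in
         * userspace instead so the notification can be recorded as pending
         * while the vector is masked. */
        kvm_deassign_irqfd(eventfd, virq);                  /* hypothetical */
        qemu_set_fd_handler(eventfd, handle_masked_vector, NULL, NULL);
    } else {
        qemu_set_fd_handler(eventfd, NULL, NULL, NULL);
        kvm_assign_irqfd(eventfd, virq);                    /* hypothetical */
    }
    /* The eventfd itself stays open across mask/unmask. */
}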
I'm having trouble finding a direct parallel to virtio-ioeventfd here.
We always want to have an ioeventfd per virtqueue unless the host
kernel does not support >6 ioeventfds per VM.
When vhost sets the host notifier we want to remove the QEMU fd
handler and allow vhost to use the event notifier's fd as it wants.
When vhost clears the host notifier we want to add the QEMU fd handler
again (unless the kernel does not support >6 ioeventfds per VM).
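In code terms the hand-over I want is roughly this (a sketch: EventNotifier,
event_notifier_get_fd() and qemu_set_fd_handler() are existing QEMU
interfaces, while the read callback name here is just a placeholder):

#include <stdbool.h>
#include "hw/event_notifier.h"   /* EventNotifier, event_notifier_get_fd() */

static void virtio_queue_host_notifier_read(void *opaque);   /* placeholder */

/* Sketch of what should happen when vhost grabs or releases a host notifier. */
static void hand_host_notifier_to_vhost(EventNotifier *notifier, bool vhost_owns)
{
    int fd = event_notifier_get_fd(notifier);

    if (vhost_owns) {
        /* vhost polls the eventfd in the kernel, so QEMU must stop reading
         * it or the two would race for guest notifications. */
        qemu_set_fd_handler(fd, NULL, NULL, NULL);
    } else {
        /* Back to userspace virtio: service kicks from the iothread. */
        qemu_set_fd_handler(fd, virtio_queue_host_notifier_read, NULL, notifier);
    }
}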
I think hooking in at the virtio-pci.c level instead of virtio.c is
possible but we're still going to have the same state transitions. I
hope it can be done without adding per-virtqueue variables that track
state.
Before I go down this route, is there something I've missed and do you
think this approach will be better?
Stefan
end of thread, newest: 2010-11-10 14:52 UTC
Thread overview: 23+ messages
2010-09-30 14:01 [Qemu-devel] [PATCH] virtio: Use ioeventfd for virtqueue notify Stefan Hajnoczi
2010-10-03 11:01 ` [Qemu-devel] " Avi Kivity
2010-10-03 13:51 ` Michael S. Tsirkin
2010-10-03 14:21 ` Avi Kivity
2010-10-03 14:28 ` Michael S. Tsirkin
2010-10-04 1:18 ` Anthony Liguori
2010-10-04 8:04 ` Avi Kivity
2010-10-04 14:01 ` Anthony Liguori
2010-10-04 16:12 ` Michael S. Tsirkin
2010-10-04 16:20 ` Anthony Liguori
2010-10-04 16:25 ` Michael S. Tsirkin
2010-10-04 14:30 ` Stefan Hajnoczi
2010-10-05 11:00 ` rukhsana ansari
2010-10-05 11:58 ` Avi Kivity
2010-10-19 13:07 ` Stefan Hajnoczi
2010-10-19 13:12 ` Anthony Liguori
2010-10-19 13:35 ` Michael S. Tsirkin
2010-10-19 13:44 ` Stefan Hajnoczi
2010-10-19 13:43 ` Michael S. Tsirkin
2010-11-10 14:52 ` Stefan Hajnoczi
2010-10-19 13:33 ` Michael S. Tsirkin
2010-10-25 13:25 ` Stefan Hajnoczi
2010-10-25 15:01 ` Stefan Hajnoczi