* [PATCH v2 1/6] KVM: Fix coalesced_mmio_has_room()
2024-07-18 19:35 [PATCH v2 0/6] KVM: Improve MMIO Coalescing API Ilias Stamatis
@ 2024-07-18 19:35 ` Ilias Stamatis
2024-07-18 19:35 ` [PATCH v2 2/6] KVM: Add KVM_CREATE_COALESCED_MMIO_BUFFER ioctl Ilias Stamatis
` (6 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Ilias Stamatis @ 2024-07-18 19:35 UTC (permalink / raw)
To: kvm, pbonzini
Cc: pdurrant, dwmw, nh-open-source, Ilias Stamatis, Paul Durrant
The following calculation used in coalesced_mmio_has_room() to check
whether the ring buffer is full is wrong and only allows half the buffer
to be used.
avail = (ring->first - last - 1) % KVM_COALESCED_MMIO_MAX;
if (avail == 0)
/* full */
The % operator in C is not the modulo operator but the remainder
operator. Modulo and remainder operators differ with respect to negative
values. In this case all values are unsigned anyway, so the subtraction
wraps around instead of producing a negative value.
For example, the above calculation might have worked as expected in Python:
>>> (-86) % 170
84
However, it does not work the same way in C:
printf("avail: %d\n", (-86) % 170);
printf("avail: %u\n", (-86) % 170);
printf("avail: %u\n", (-86u) % 170u);
Using gcc-11 these print:
avail: -86
avail: 4294967210
avail: 0
Fix the calculation and allow all but one entry in the buffer to be
used, as originally intended.
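The difference is easy to reproduce in a standalone program. The sketch below contrasts the old and the fixed check; 170 stands in for KVM_COALESCED_MMIO_MAX as in the numbers above (that is what the constant works out to with 4KiB pages), and the helper names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* 170 matches the commit message; the real constant is derived from
 * PAGE_SIZE and the entry size in include/uapi/linux/kvm.h. */
#define RING_MAX 170u

/* Old check: the unsigned subtraction wraps around, and the wrapped
 * value can be an exact multiple of the ring size, so "avail == 0"
 * fires while the ring still has free entries. */
static int old_has_room(uint32_t first, uint32_t last)
{
	uint32_t avail = (first - last - 1) % RING_MAX;

	return avail != 0;
}

/* Fixed check: the ring is full exactly when advancing 'last' would
 * collide with 'first'; one entry is always left unused so that a
 * full ring can be distinguished from an empty one. */
static int new_has_room(uint32_t first, uint32_t last)
{
	return (last + 1) % RING_MAX != first;
}
```

With first == 0 and last == 85, the old check computes (-86u) % 170u == 0 and wrongly reports a full ring at roughly half capacity, while the fixed check still reports room.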
Fixes: 105f8d40a737 ("KVM: Calculate available entries in coalesced mmio ring")
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
virt/kvm/coalesced_mmio.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 1b90acb6e3fe..184c5c40c9c1 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -43,7 +43,6 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_dev *dev, u32 last)
{
struct kvm_coalesced_mmio_ring *ring;
- unsigned avail;
/* Are we able to batch it ? */
@@ -52,8 +51,7 @@ static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_dev *dev, u32 last)
* there is always one unused entry in the buffer
*/
ring = dev->kvm->coalesced_mmio_ring;
- avail = (ring->first - last - 1) % KVM_COALESCED_MMIO_MAX;
- if (avail == 0) {
+ if ((last + 1) % KVM_COALESCED_MMIO_MAX == READ_ONCE(ring->first)) {
/* full */
return 0;
}
--
2.34.1
* [PATCH v2 2/6] KVM: Add KVM_CREATE_COALESCED_MMIO_BUFFER ioctl
2024-07-18 19:35 [PATCH v2 0/6] KVM: Improve MMIO Coalescing API Ilias Stamatis
2024-07-18 19:35 ` [PATCH v2 1/6] KVM: Fix coalesced_mmio_has_room() Ilias Stamatis
@ 2024-07-18 19:35 ` Ilias Stamatis
2024-08-17 0:32 ` Sean Christopherson
2024-07-18 19:35 ` [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds Ilias Stamatis
` (5 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Ilias Stamatis @ 2024-07-18 19:35 UTC (permalink / raw)
To: kvm, pbonzini
Cc: pdurrant, dwmw, nh-open-source, Ilias Stamatis, Paul Durrant
The current MMIO coalescing design has a few drawbacks that limit its
usefulness. All coalesced MMIO zones currently use the same ring buffer.
That means that upon a userspace exit we have to handle potentially
unrelated MMIO writes synchronously. And a VM-wide lock needs to be
taken in the kernel when an MMIO exit occurs.
Additionally, there is no direct way for userspace to be notified about
coalesced MMIO writes. If the next MMIO exit to userspace only happens
once the ring buffer has filled up, then a substantial (and unbounded)
amount of time may have passed since the first coalesced MMIO write.
Add a KVM_CREATE_COALESCED_MMIO_BUFFER ioctl to KVM. This ioctl simply
returns a file descriptor to the caller but does not allocate a ring
buffer. Userspace can then pass this fd to mmap() to actually allocate a
buffer and map it to its address space.
Subsequent patches will allow userspace to:
- Associate the fd with a coalescing zone when registering it so that
writes to that zone are accumulated in that specific ring buffer
rather than the VM-wide one.
- Poll for MMIO writes using this fd.
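For context, the page that userspace maps has the same layout as the existing VM-wide ring: two indices followed by as many entries as fit in the rest of the page. The sketch below mirrors the UAPI struct definitions (copied here for illustration; the authoritative ones live in include/uapi/linux/kvm.h) and derives the entry count, assuming 4KiB pages:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrored from include/uapi/linux/kvm.h for illustration. */
struct kvm_coalesced_mmio {
	uint64_t phys_addr;
	uint32_t len;
	union {
		uint32_t pad;
		uint32_t pio;
	};
	uint8_t data[8];
};

struct kvm_coalesced_mmio_ring {
	uint32_t first;	/* consumer index, advanced by userspace */
	uint32_t last;	/* producer index, advanced by KVM */
	struct kvm_coalesced_mmio coalesced_mmio[];
};

/* KVM_COALESCED_MMIO_MAX: how many entries fit in one page after the
 * two indices. With 4KiB pages this works out to 170, the value seen
 * in the arithmetic of patch 1. */
#define PAGE_SZ 4096u
#define RING_MAX \
	((PAGE_SZ - sizeof(struct kvm_coalesced_mmio_ring)) / \
	 sizeof(struct kvm_coalesced_mmio))
```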
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v1->v2:
- Rebased on top of kvm/queue resolving the conflict in kvm.h
include/linux/kvm_host.h | 1 +
include/uapi/linux/kvm.h | 2 +
virt/kvm/coalesced_mmio.c | 142 +++++++++++++++++++++++++++++++++++---
virt/kvm/coalesced_mmio.h | 9 +++
virt/kvm/kvm_main.c | 4 ++
5 files changed, 150 insertions(+), 8 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 689e8be873a7..ea381de7812e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -801,6 +801,7 @@ struct kvm {
struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
spinlock_t ring_lock;
struct list_head coalesced_zones;
+ struct list_head coalesced_buffers;
#endif
struct mutex irq_lock;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 637efc055145..87f79a820fc0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1573,4 +1573,6 @@ struct kvm_pre_fault_memory {
__u64 padding[5];
};
+#define KVM_CREATE_COALESCED_MMIO_BUFFER _IO(KVMIO, 0xd6)
+
#endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 184c5c40c9c1..6443d4b62548 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -4,6 +4,7 @@
*
* Copyright (c) 2008 Bull S.A.S.
* Copyright 2009 Red Hat, Inc. and/or its affiliates.
+ * Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Author: Laurent Vivier <Laurent.Vivier@bull.net>
*
@@ -14,6 +15,7 @@
#include <linux/kvm_host.h>
#include <linux/slab.h>
#include <linux/kvm.h>
+#include <linux/anon_inodes.h>
#include "coalesced_mmio.h"
@@ -40,17 +42,14 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
return 1;
}
-static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_dev *dev, u32 last)
+static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_ring *ring, u32 last)
{
- struct kvm_coalesced_mmio_ring *ring;
-
/* Are we able to batch it ? */
/* last is the first free entry
* check if we don't meet the first used entry
* there is always one unused entry in the buffer
*/
- ring = dev->kvm->coalesced_mmio_ring;
if ((last + 1) % KVM_COALESCED_MMIO_MAX == READ_ONCE(ring->first)) {
/* full */
return 0;
@@ -65,17 +64,28 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
{
struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
+ spinlock_t *lock = dev->buffer_dev ?
+ &dev->buffer_dev->ring_lock :
+ &dev->kvm->ring_lock;
__u32 insert;
if (!coalesced_mmio_in_range(dev, addr, len))
return -EOPNOTSUPP;
- spin_lock(&dev->kvm->ring_lock);
+ spin_lock(lock);
+
+ if (dev->buffer_dev) {
+ ring = dev->buffer_dev->ring;
+ if (!ring) {
+ spin_unlock(lock);
+ return -EOPNOTSUPP;
+ }
+ }
insert = READ_ONCE(ring->last);
- if (!coalesced_mmio_has_room(dev, insert) ||
+ if (!coalesced_mmio_has_room(ring, insert) ||
insert >= KVM_COALESCED_MMIO_MAX) {
- spin_unlock(&dev->kvm->ring_lock);
+ spin_unlock(lock);
return -EOPNOTSUPP;
}
@@ -87,7 +97,7 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
ring->coalesced_mmio[insert].pio = dev->zone.pio;
smp_wmb();
ring->last = (insert + 1) % KVM_COALESCED_MMIO_MAX;
- spin_unlock(&dev->kvm->ring_lock);
+ spin_unlock(lock);
return 0;
}
@@ -122,6 +132,7 @@ int kvm_coalesced_mmio_init(struct kvm *kvm)
*/
spin_lock_init(&kvm->ring_lock);
INIT_LIST_HEAD(&kvm->coalesced_zones);
+ INIT_LIST_HEAD(&kvm->coalesced_buffers);
return 0;
}
@@ -132,11 +143,125 @@ void kvm_coalesced_mmio_free(struct kvm *kvm)
free_page((unsigned long)kvm->coalesced_mmio_ring);
}
+static void coalesced_mmio_buffer_vma_close(struct vm_area_struct *vma)
+{
+ struct kvm_coalesced_mmio_buffer_dev *dev = vma->vm_private_data;
+
+ spin_lock(&dev->ring_lock);
+
+ vfree(dev->ring);
+ dev->ring = NULL;
+
+ spin_unlock(&dev->ring_lock);
+}
+
+static const struct vm_operations_struct coalesced_mmio_buffer_vm_ops = {
+ .close = coalesced_mmio_buffer_vma_close,
+};
+
+static int coalesced_mmio_buffer_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
+ unsigned long pfn;
+ int ret = 0;
+
+ spin_lock(&dev->ring_lock);
+
+ if (dev->ring) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ dev->ring = vmalloc_user(PAGE_SIZE);
+ if (!dev->ring) {
+ ret = -ENOMEM;
+ goto out_unlock;
+ }
+
+ pfn = vmalloc_to_pfn(dev->ring);
+
+ if (remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE,
+ vma->vm_page_prot)) {
+ vfree(dev->ring);
+ dev->ring = NULL;
+ ret = -EAGAIN;
+ goto out_unlock;
+ }
+
+ vma->vm_ops = &coalesced_mmio_buffer_vm_ops;
+ vma->vm_private_data = dev;
+
+out_unlock:
+ spin_unlock(&dev->ring_lock);
+
+ return ret;
+}
+
+static int coalesced_mmio_buffer_release(struct inode *inode, struct file *file)
+{
+
+ struct kvm_coalesced_mmio_buffer_dev *buffer_dev = file->private_data;
+ struct kvm_coalesced_mmio_dev *mmio_dev, *tmp;
+ struct kvm *kvm = buffer_dev->kvm;
+
+ /* Deregister all zones associated with this ring buffer */
+ mutex_lock(&kvm->slots_lock);
+
+ list_for_each_entry_safe(mmio_dev, tmp, &kvm->coalesced_zones, list) {
+ if (mmio_dev->buffer_dev == buffer_dev) {
+ if (kvm_io_bus_unregister_dev(kvm,
+ mmio_dev->zone.pio ? KVM_PIO_BUS : KVM_MMIO_BUS,
+ &mmio_dev->dev))
+ break;
+ }
+ }
+
+ list_del(&buffer_dev->list);
+ kfree(buffer_dev);
+
+ mutex_unlock(&kvm->slots_lock);
+
+ return 0;
+}
+
+static const struct file_operations coalesced_mmio_buffer_ops = {
+ .mmap = coalesced_mmio_buffer_mmap,
+ .release = coalesced_mmio_buffer_release,
+};
+
+int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm)
+{
+ int ret;
+ struct kvm_coalesced_mmio_buffer_dev *dev;
+
+ dev = kzalloc(sizeof(struct kvm_coalesced_mmio_buffer_dev),
+ GFP_KERNEL_ACCOUNT);
+ if (!dev)
+ return -ENOMEM;
+
+ dev->kvm = kvm;
+ spin_lock_init(&dev->ring_lock);
+
+ ret = anon_inode_getfd("coalesced_mmio_buf", &coalesced_mmio_buffer_ops,
+ dev, O_RDWR | O_CLOEXEC);
+ if (ret < 0) {
+ kfree(dev);
+ return ret;
+ }
+
+ mutex_lock(&kvm->slots_lock);
+ list_add_tail(&dev->list, &kvm->coalesced_buffers);
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+
int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *zone)
{
int ret;
struct kvm_coalesced_mmio_dev *dev;
+ struct kvm_coalesced_mmio_buffer_dev *buffer_dev = NULL;
if (zone->pio != 1 && zone->pio != 0)
return -EINVAL;
@@ -149,6 +274,7 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
dev->kvm = kvm;
dev->zone = *zone;
+ dev->buffer_dev = buffer_dev;
mutex_lock(&kvm->slots_lock);
ret = kvm_io_bus_register_dev(kvm,
diff --git a/virt/kvm/coalesced_mmio.h b/virt/kvm/coalesced_mmio.h
index 36f84264ed25..37d9d8f325bb 100644
--- a/virt/kvm/coalesced_mmio.h
+++ b/virt/kvm/coalesced_mmio.h
@@ -20,6 +20,14 @@ struct kvm_coalesced_mmio_dev {
struct kvm_io_device dev;
struct kvm *kvm;
struct kvm_coalesced_mmio_zone zone;
+ struct kvm_coalesced_mmio_buffer_dev *buffer_dev;
+};
+
+struct kvm_coalesced_mmio_buffer_dev {
+ struct list_head list;
+ struct kvm *kvm;
+ spinlock_t ring_lock;
+ struct kvm_coalesced_mmio_ring *ring;
};
int kvm_coalesced_mmio_init(struct kvm *kvm);
@@ -28,6 +36,7 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *zone);
int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
struct kvm_coalesced_mmio_zone *zone);
+int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm);
#else
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d0788d0a72cc..9eb22287384f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5246,6 +5246,10 @@ static long kvm_vm_ioctl(struct file *filp,
r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone);
break;
}
+ case KVM_CREATE_COALESCED_MMIO_BUFFER: {
+ r = kvm_vm_ioctl_create_coalesced_mmio_buffer(kvm);
+ break;
+ }
#endif
case KVM_IRQFD: {
struct kvm_irqfd data;
--
2.34.1
* Re: [PATCH v2 2/6] KVM: Add KVM_CREATE_COALESCED_MMIO_BUFFER ioctl
2024-07-18 19:35 ` [PATCH v2 2/6] KVM: Add KVM_CREATE_COALESCED_MMIO_BUFFER ioctl Ilias Stamatis
@ 2024-08-17 0:32 ` Sean Christopherson
0 siblings, 0 replies; 16+ messages in thread
From: Sean Christopherson @ 2024-08-17 0:32 UTC (permalink / raw)
To: Ilias Stamatis
Cc: kvm, pbonzini, pdurrant, dwmw, nh-open-source, Paul Durrant
On Thu, Jul 18, 2024, Ilias Stamatis wrote:
> @@ -40,17 +42,14 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev,
> return 1;
> }
>
> -static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_dev *dev, u32 last)
> +static int coalesced_mmio_has_room(struct kvm_coalesced_mmio_ring *ring, u32 last)
> {
> - struct kvm_coalesced_mmio_ring *ring;
> -
> /* Are we able to batch it ? */
>
> /* last is the first free entry
> * check if we don't meet the first used entry
> * there is always one unused entry in the buffer
> */
> - ring = dev->kvm->coalesced_mmio_ring;
> if ((last + 1) % KVM_COALESCED_MMIO_MAX == READ_ONCE(ring->first)) {
> /* full */
> return 0;
> @@ -65,17 +64,28 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
> {
> struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
> struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
> + spinlock_t *lock = dev->buffer_dev ?
> + &dev->buffer_dev->ring_lock :
> + &dev->kvm->ring_lock;
I'd prefer
spinlock_t *lock = dev->buffer_dev ? &dev->buffer_dev->ring_lock :
&dev->kvm->ring_lock;
to make it more obvious which parts are the "guts" of the effective if-statement.
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d0788d0a72cc..9eb22287384f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5246,6 +5246,10 @@ static long kvm_vm_ioctl(struct file *filp,
> r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone);
> break;
> }
> + case KVM_CREATE_COALESCED_MMIO_BUFFER: {
Curly braces aren't needed; the above case has 'em because it declares local
variables.
> + r = kvm_vm_ioctl_create_coalesced_mmio_buffer(kvm);
> + break;
> + }
> #endif
> case KVM_IRQFD: {
> struct kvm_irqfd data;
> --
> 2.34.1
>
* [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds
2024-07-18 19:35 [PATCH v2 0/6] KVM: Improve MMIO Coalescing API Ilias Stamatis
2024-07-18 19:35 ` [PATCH v2 1/6] KVM: Fix coalesced_mmio_has_room() Ilias Stamatis
2024-07-18 19:35 ` [PATCH v2 2/6] KVM: Add KVM_CREATE_COALESCED_MMIO_BUFFER ioctl Ilias Stamatis
@ 2024-07-18 19:35 ` Ilias Stamatis
2024-07-20 1:35 ` kernel test robot
2024-07-20 21:10 ` kernel test robot
2024-07-18 19:35 ` [PATCH v2 4/6] KVM: Add KVM_(UN)REGISTER_COALESCED_MMIO2 ioctls Ilias Stamatis
` (4 subsequent siblings)
7 siblings, 2 replies; 16+ messages in thread
From: Ilias Stamatis @ 2024-07-18 19:35 UTC (permalink / raw)
To: kvm, pbonzini; +Cc: pdurrant, dwmw, nh-open-source, Ilias Stamatis
There is no direct way for userspace to be notified about coalesced MMIO
writes when using KVM_REGISTER_COALESCED_MMIO. If the next MMIO exit only
happens once the ring buffer has filled up, then a substantial (and
unbounded) amount of time may have passed since the first coalesced MMIO
write.
To improve this, make it possible for userspace to use poll() and
select() on the fd returned by the KVM_CREATE_COALESCED_MMIO_BUFFER
ioctl. This way a userspace VMM could have dedicated threads that deal
with writes to specific MMIO zones.
For example, a common use of MMIO, particularly in the realm of network
devices, is as a doorbell. A write to a doorbell register will trigger
the device to initiate a DMA transfer.
When a network device is emulated by userspace a write to a doorbell
register would typically result in an MMIO exit so that userspace can
emulate the DMA transfer in a timely manner. No further processing can
be done until userspace performs the necessary emulation and re-invokes
KVM_RUN. Even if userspace makes use of another thread to emulate the
DMA transfer such MMIO exits are disruptive to the vCPU and they may
also be quite frequent if, for example, the vCPU is sending a sequence
of short packets to the network device.
By supporting poll() on coalesced buffer fds, userspace can have
dedicated threads wait for new doorbell writes and avoid the performance
hit of userspace exits on the main vCPU threads.
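The consumer loop such a dedicated thread would run is straightforward. The sketch below is hypothetical: the struct definitions are mirrored from the UAPI header, poll() on the buffer fd is replaced by a pre-filled in-memory ring, and the read barrier a real consumer needs against concurrent KVM writes is omitted:

```c
#include <assert.h>
#include <stdint.h>

#define RING_MAX 170u	/* entries per 4KiB ring page */

struct kvm_coalesced_mmio {	/* mirrored from include/uapi/linux/kvm.h */
	uint64_t phys_addr;
	uint32_t len;
	union {
		uint32_t pad;
		uint32_t pio;
	};
	uint8_t data[8];
};

struct kvm_coalesced_mmio_ring {
	uint32_t first;	/* advanced by the userspace consumer */
	uint32_t last;	/* advanced by KVM */
	struct kvm_coalesced_mmio coalesced_mmio[RING_MAX];
};

/* Drain every pending entry, as a doorbell thread would after poll()
 * reports POLLIN on the buffer fd. Returns the number handled. */
static unsigned int drain(struct kvm_coalesced_mmio_ring *ring,
			  void (*handle)(const struct kvm_coalesced_mmio *))
{
	unsigned int n = 0;

	while (ring->first != ring->last) {
		handle(&ring->coalesced_mmio[ring->first]);
		ring->first = (ring->first + 1) % RING_MAX;
		n++;
	}
	return n;
}

static unsigned int handled;
static void count_write(const struct kvm_coalesced_mmio *e)
{
	(void)e;
	handled++;
}

/* Simulate three KVM-side doorbell writes that wrap around the end of
 * the array, then drain them. */
static unsigned int demo(void)
{
	static struct kvm_coalesced_mmio_ring ring;
	int i;

	ring.first = ring.last = 168;
	for (i = 0; i < 3; i++) {
		ring.coalesced_mmio[ring.last].phys_addr = 0xfe000000u + 4 * i;
		ring.coalesced_mmio[ring.last].len = 4;
		ring.last = (ring.last + 1) % RING_MAX;
	}
	return drain(&ring, count_write);
}
```

A real consumer would first wait for POLLIN on the buffer fd and insert a read barrier between loading ring->last and reading the corresponding entries.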
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
v1->v2:
- Added a NULL-check for dev->ring in coalesced_mmio_buffer_poll()
virt/kvm/coalesced_mmio.c | 22 ++++++++++++++++++++++
virt/kvm/coalesced_mmio.h | 1 +
2 files changed, 23 insertions(+)
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 6443d4b62548..64428b0a4aad 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/kvm.h>
#include <linux/anon_inodes.h>
+#include <linux/poll.h>
#include "coalesced_mmio.h"
@@ -98,6 +99,10 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
smp_wmb();
ring->last = (insert + 1) % KVM_COALESCED_MMIO_MAX;
spin_unlock(lock);
+
+ if (dev->buffer_dev)
+ wake_up_interruptible(&dev->buffer_dev->wait_queue);
+
return 0;
}
@@ -224,9 +229,25 @@ static int coalesced_mmio_buffer_release(struct inode *inode, struct file *file)
return 0;
}
+static __poll_t coalesced_mmio_buffer_poll(struct file *file, struct poll_table_struct *wait)
+{
+ struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
+ __poll_t mask = 0;
+
+ poll_wait(file, &dev->wait_queue, wait);
+
+ spin_lock(&dev->ring_lock);
+ if (dev->ring && (READ_ONCE(dev->ring->first) != READ_ONCE(dev->ring->last)))
+ mask = POLLIN | POLLRDNORM;
+ spin_unlock(&dev->ring_lock);
+
+ return mask;
+}
+
static const struct file_operations coalesced_mmio_buffer_ops = {
.mmap = coalesced_mmio_buffer_mmap,
.release = coalesced_mmio_buffer_release,
+ .poll = coalesced_mmio_buffer_poll,
};
int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm)
@@ -240,6 +261,7 @@ int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm)
return -ENOMEM;
dev->kvm = kvm;
+ init_waitqueue_head(&dev->wait_queue);
spin_lock_init(&dev->ring_lock);
ret = anon_inode_getfd("coalesced_mmio_buf", &coalesced_mmio_buffer_ops,
diff --git a/virt/kvm/coalesced_mmio.h b/virt/kvm/coalesced_mmio.h
index 37d9d8f325bb..d1807ce26464 100644
--- a/virt/kvm/coalesced_mmio.h
+++ b/virt/kvm/coalesced_mmio.h
@@ -26,6 +26,7 @@ struct kvm_coalesced_mmio_dev {
struct kvm_coalesced_mmio_buffer_dev {
struct list_head list;
struct kvm *kvm;
+ wait_queue_head_t wait_queue;
spinlock_t ring_lock;
struct kvm_coalesced_mmio_ring *ring;
};
--
2.34.1
* Re: [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds
2024-07-18 19:35 ` [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds Ilias Stamatis
@ 2024-07-20 1:35 ` kernel test robot
2024-07-22 14:26 ` Stamatis, Ilias
2024-07-20 21:10 ` kernel test robot
1 sibling, 1 reply; 16+ messages in thread
From: kernel test robot @ 2024-07-20 1:35 UTC (permalink / raw)
To: Ilias Stamatis, kvm, pbonzini
Cc: oe-kbuild-all, pdurrant, dwmw, nh-open-source, Ilias Stamatis
Hi Ilias,
kernel test robot noticed the following build warnings:
[auto build test WARNING on kvm/queue]
[also build test WARNING on next-20240719]
[cannot apply to mst-vhost/linux-next linus/master kvm/linux-next v6.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ilias-Stamatis/KVM-Fix-coalesced_mmio_has_room/20240719-034316
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link: https://lore.kernel.org/r/20240718193543.624039-4-ilstam%40amazon.com
patch subject: [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds
config: x86_64-randconfig-122-20240719 (https://download.01.org/0day-ci/archive/20240720/202407200922.pJAJMVRk-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240720/202407200922.pJAJMVRk-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407200922.pJAJMVRk-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __poll_t [usertype] mask @@ got int @@
arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: expected restricted __poll_t [usertype] mask
arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: got int
arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/mm.h, ...):
include/linux/page-flags.h:240:46: sparse: sparse: self-comparison always evaluates to false
include/linux/page-flags.h:240:46: sparse: sparse: self-comparison always evaluates to false
vim +241 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c
231
232 static __poll_t coalesced_mmio_buffer_poll(struct file *file, struct poll_table_struct *wait)
233 {
234 struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
235 __poll_t mask = 0;
236
237 poll_wait(file, &dev->wait_queue, wait);
238
239 spin_lock(&dev->ring_lock);
240 if (dev->ring && (READ_ONCE(dev->ring->first) != READ_ONCE(dev->ring->last)))
> 241 mask = POLLIN | POLLRDNORM;
242 spin_unlock(&dev->ring_lock);
243
244 return mask;
245 }
246
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds
2024-07-20 1:35 ` kernel test robot
@ 2024-07-22 14:26 ` Stamatis, Ilias
0 siblings, 0 replies; 16+ messages in thread
From: Stamatis, Ilias @ 2024-07-22 14:26 UTC (permalink / raw)
To: kvm@vger.kernel.org, Stamatis, Ilias, pbonzini@redhat.com
Cc: nh-open-source@amazon.com, Durrant, Paul, Woodhouse, David,
oe-kbuild-all@lists.linux.dev
On Sat, 2024-07-20 at 09:35 +0800, kernel test robot wrote:
> Hi Ilias,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on kvm/queue]
> [also build test WARNING on next-20240719]
> [cannot apply to mst-vhost/linux-next linus/master kvm/linux-next v6.10]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Ilias-Stamatis/KVM-Fix-coalesced_mmio_has_room/20240719-034316
> base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
> patch link: https://lore.kernel.org/r/20240718193543.624039-4-ilstam%40amazon.com
> patch subject: [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds
> config: x86_64-randconfig-122-20240719 (https://download.01.org/0day-ci/archive/20240720/202407200922.pJAJMVRk-lkp@intel.com/config)
> compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240720/202407200922.pJAJMVRk-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes: https://lore.kernel.org/oe-kbuild-all/202407200922.pJAJMVRk-lkp@intel.com/
>
> sparse warnings: (new ones prefixed by >>)
> > > arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __poll_t [usertype] mask @@ got int @@
> arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: expected restricted __poll_t [usertype] mask
> arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: got int
> arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/mm.h, ...):
> include/linux/page-flags.h:240:46: sparse: sparse: self-comparison always evaluates to false
> include/linux/page-flags.h:240:46: sparse: sparse: self-comparison always evaluates to false
>
> vim +241 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c
>
> 231
> 232 static __poll_t coalesced_mmio_buffer_poll(struct file *file, struct poll_table_struct *wait)
> 233 {
> 234 struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
> 235 __poll_t mask = 0;
> 236
> 237 poll_wait(file, &dev->wait_queue, wait);
> 238
> 239 spin_lock(&dev->ring_lock);
> 240 if (dev->ring && (READ_ONCE(dev->ring->first) != READ_ONCE(dev->ring->last)))
> > 241 mask = POLLIN | POLLRDNORM;
This must be EPOLLIN | EPOLLRDNORM, but I will wait for more feedback /
reviews before sending another version.
Ilias
* Re: [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds
2024-07-18 19:35 ` [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds Ilias Stamatis
2024-07-20 1:35 ` kernel test robot
@ 2024-07-20 21:10 ` kernel test robot
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2024-07-20 21:10 UTC (permalink / raw)
To: Ilias Stamatis, kvm, pbonzini
Cc: oe-kbuild-all, pdurrant, dwmw, nh-open-source, Ilias Stamatis
Hi Ilias,
kernel test robot noticed the following build warnings:
[auto build test WARNING on kvm/queue]
[also build test WARNING on next-20240719]
[cannot apply to mst-vhost/linux-next linus/master kvm/linux-next v6.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ilias-Stamatis/KVM-Fix-coalesced_mmio_has_room/20240719-034316
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link: https://lore.kernel.org/r/20240718193543.624039-4-ilstam%40amazon.com
patch subject: [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds
config: riscv-randconfig-r121-20240719 (https://download.01.org/0day-ci/archive/20240721/202407210447.D2HLgqLd-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce: (https://download.01.org/0day-ci/archive/20240721/202407210447.D2HLgqLd-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407210447.D2HLgqLd-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> arch/riscv/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __poll_t [usertype] mask @@ got int @@
arch/riscv/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: expected restricted __poll_t [usertype] mask
arch/riscv/kvm/../../../virt/kvm/coalesced_mmio.c:241:22: sparse: got int
vim +241 arch/riscv/kvm/../../../virt/kvm/coalesced_mmio.c
231
232 static __poll_t coalesced_mmio_buffer_poll(struct file *file, struct poll_table_struct *wait)
233 {
234 struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
235 __poll_t mask = 0;
236
237 poll_wait(file, &dev->wait_queue, wait);
238
239 spin_lock(&dev->ring_lock);
240 if (dev->ring && (READ_ONCE(dev->ring->first) != READ_ONCE(dev->ring->last)))
> 241 mask = POLLIN | POLLRDNORM;
242 spin_unlock(&dev->ring_lock);
243
244 return mask;
245 }
246
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* [PATCH v2 4/6] KVM: Add KVM_(UN)REGISTER_COALESCED_MMIO2 ioctls
2024-07-18 19:35 [PATCH v2 0/6] KVM: Improve MMIO Coalescing API Ilias Stamatis
` (2 preceding siblings ...)
2024-07-18 19:35 ` [PATCH v2 3/6] KVM: Support poll() on coalesced mmio buffer fds Ilias Stamatis
@ 2024-07-18 19:35 ` Ilias Stamatis
2024-08-17 0:36 ` Sean Christopherson
2024-07-18 19:35 ` [PATCH v2 5/6] KVM: Documentation: Document v2 of coalesced MMIO API Ilias Stamatis
` (3 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Ilias Stamatis @ 2024-07-18 19:35 UTC (permalink / raw)
To: kvm, pbonzini
Cc: pdurrant, dwmw, nh-open-source, Ilias Stamatis, Paul Durrant
Add two new ioctls, KVM_REGISTER_COALESCED_MMIO2 and
KVM_UNREGISTER_COALESCED_MMIO2. These do the same thing as their v1
equivalents, except that an fd returned by KVM_CREATE_COALESCED_MMIO_BUFFER
must be passed to them as an argument.
The fd representing a ring buffer is associated with an MMIO region
registered for coalescing and all writes to that region are accumulated
there. This is in contrast to the v1 API where all regions have to share
the same buffer. Nevertheless, userspace code can still use the same
ring buffer for multiple zones if it wishes to do so.
Userspace can check for the availability of the new API by checking if
the KVM_CAP_COALESCED_MMIO2 capability is supported.
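One consequence of widening the zone struct is visible in the ioctl numbers: _IOW() encodes the argument size, so kvm_coalesced_mmio_zone2 necessarily gets fresh numbers instead of reusing the v1 ones. A standalone layout check of the new struct, mirrored from the diff below and assuming a 64-bit ABI:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrored from the kvm_coalesced_mmio_zone2 definition in this patch. */
struct kvm_coalesced_mmio_zone2 {
	uint64_t addr;
	uint32_t size;
	union {
		uint32_t pad;
		uint32_t pio;
	};
	int buffer_fd;	/* fd from KVM_CREATE_COALESCED_MMIO_BUFFER */
};
```

On a 64-bit ABI the struct is 24 bytes, with buffer_fd at offset 16 followed by four bytes of implicit tail padding; implicit padding in a UAPI struct is the sort of detail review tends to flag, so treat this as an observation about the posted patch rather than about the final API.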
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v1->v2:
- Rebased on top of kvm/queue resolving the conflict in kvm.h
include/uapi/linux/kvm.h | 16 ++++++++++++++++
virt/kvm/coalesced_mmio.c | 37 +++++++++++++++++++++++++++++++------
virt/kvm/coalesced_mmio.h | 7 ++++---
virt/kvm/kvm_main.c | 36 +++++++++++++++++++++++++++++++++++-
4 files changed, 86 insertions(+), 10 deletions(-)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 87f79a820fc0..7e8d5c879986 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -480,6 +480,16 @@ struct kvm_coalesced_mmio_zone {
};
};
+struct kvm_coalesced_mmio_zone2 {
+ __u64 addr;
+ __u32 size;
+ union {
+ __u32 pad;
+ __u32 pio;
+ };
+ int buffer_fd;
+};
+
struct kvm_coalesced_mmio {
__u64 phys_addr;
__u32 len;
@@ -933,6 +943,7 @@ struct kvm_enable_cap {
#define KVM_CAP_PRE_FAULT_MEMORY 236
#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
#define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_COALESCED_MMIO2 239
struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1573,6 +1584,11 @@ struct kvm_pre_fault_memory {
__u64 padding[5];
};
+/* Available with KVM_CAP_COALESCED_MMIO2 */
#define KVM_CREATE_COALESCED_MMIO_BUFFER _IO(KVMIO, 0xd6)
+#define KVM_REGISTER_COALESCED_MMIO2 \
+ _IOW(KVMIO, 0xd7, struct kvm_coalesced_mmio_zone2)
+#define KVM_UNREGISTER_COALESCED_MMIO2 \
+ _IOW(KVMIO, 0xd8, struct kvm_coalesced_mmio_zone2)
#endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 64428b0a4aad..729f3293f768 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -17,6 +17,7 @@
#include <linux/kvm.h>
#include <linux/anon_inodes.h>
#include <linux/poll.h>
+#include <linux/file.h>
#include "coalesced_mmio.h"
@@ -279,19 +280,40 @@ int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm)
}
int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
- struct kvm_coalesced_mmio_zone *zone)
+ struct kvm_coalesced_mmio_zone2 *zone,
+ bool use_buffer_fd)
{
- int ret;
+ int ret = 0;
+ struct file *file;
struct kvm_coalesced_mmio_dev *dev;
struct kvm_coalesced_mmio_buffer_dev *buffer_dev = NULL;
if (zone->pio != 1 && zone->pio != 0)
return -EINVAL;
+ if (use_buffer_fd) {
+ file = fget(zone->buffer_fd);
+ if (!file)
+ return -EBADF;
+
+ if (file->f_op != &coalesced_mmio_buffer_ops) {
+ fput(file);
+ return -EINVAL;
+ }
+
+ buffer_dev = file->private_data;
+ if (!buffer_dev->ring) {
+ fput(file);
+ return -ENOBUFS;
+ }
+ }
+
dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev),
GFP_KERNEL_ACCOUNT);
- if (!dev)
- return -ENOMEM;
+ if (!dev) {
+ ret = -ENOMEM;
+ goto out_free_file;
+ }
kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
dev->kvm = kvm;
@@ -307,17 +329,20 @@ int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
list_add_tail(&dev->list, &kvm->coalesced_zones);
mutex_unlock(&kvm->slots_lock);
- return 0;
+ goto out_free_file;
out_free_dev:
mutex_unlock(&kvm->slots_lock);
kfree(dev);
+out_free_file:
+ if (use_buffer_fd)
+ fput(file);
return ret;
}
int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
- struct kvm_coalesced_mmio_zone *zone)
+ struct kvm_coalesced_mmio_zone2 *zone)
{
struct kvm_coalesced_mmio_dev *dev, *tmp;
int r;
diff --git a/virt/kvm/coalesced_mmio.h b/virt/kvm/coalesced_mmio.h
index d1807ce26464..32792adb7cb4 100644
--- a/virt/kvm/coalesced_mmio.h
+++ b/virt/kvm/coalesced_mmio.h
@@ -19,7 +19,7 @@ struct kvm_coalesced_mmio_dev {
struct list_head list;
struct kvm_io_device dev;
struct kvm *kvm;
- struct kvm_coalesced_mmio_zone zone;
+ struct kvm_coalesced_mmio_zone2 zone;
struct kvm_coalesced_mmio_buffer_dev *buffer_dev;
};
@@ -34,9 +34,10 @@ struct kvm_coalesced_mmio_buffer_dev {
int kvm_coalesced_mmio_init(struct kvm *kvm);
void kvm_coalesced_mmio_free(struct kvm *kvm);
int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
- struct kvm_coalesced_mmio_zone *zone);
+ struct kvm_coalesced_mmio_zone2 *zone,
+ bool use_buffer_fd);
int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
- struct kvm_coalesced_mmio_zone *zone);
+ struct kvm_coalesced_mmio_zone2 *zone);
int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm);
#else
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9eb22287384f..ef7dcce80136 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4892,6 +4892,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#ifdef CONFIG_KVM_MMIO
case KVM_CAP_COALESCED_MMIO:
return KVM_COALESCED_MMIO_PAGE_OFFSET;
+ case KVM_CAP_COALESCED_MMIO2:
case KVM_CAP_COALESCED_PIO:
return 1;
#endif
@@ -5230,15 +5231,48 @@ static long kvm_vm_ioctl(struct file *filp,
#ifdef CONFIG_KVM_MMIO
case KVM_REGISTER_COALESCED_MMIO: {
struct kvm_coalesced_mmio_zone zone;
+ struct kvm_coalesced_mmio_zone2 zone2;
r = -EFAULT;
if (copy_from_user(&zone, argp, sizeof(zone)))
goto out;
- r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone);
+
+ zone2.addr = zone.addr;
+ zone2.size = zone.size;
+ zone2.pio = zone.pio;
+ zone2.buffer_fd = -1;
+
+ r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone2, false);
+ break;
+ }
+ case KVM_REGISTER_COALESCED_MMIO2: {
+ struct kvm_coalesced_mmio_zone2 zone;
+
+ r = -EFAULT;
+ if (copy_from_user(&zone, argp, sizeof(zone)))
+ goto out;
+
+ r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone, true);
break;
}
case KVM_UNREGISTER_COALESCED_MMIO: {
struct kvm_coalesced_mmio_zone zone;
+ struct kvm_coalesced_mmio_zone2 zone2;
+
+ r = -EFAULT;
+ if (copy_from_user(&zone, argp, sizeof(zone)))
+ goto out;
+
+ zone2.addr = zone.addr;
+ zone2.size = zone.size;
+ zone2.pio = zone.pio;
+ zone2.buffer_fd = -1;
+
+ r = kvm_vm_ioctl_unregister_coalesced_mmio(kvm, &zone2);
+ break;
+ }
+ case KVM_UNREGISTER_COALESCED_MMIO2: {
+ struct kvm_coalesced_mmio_zone2 zone;
r = -EFAULT;
if (copy_from_user(&zone, argp, sizeof(zone)))
--
2.34.1
* Re: [PATCH v2 4/6] KVM: Add KVM_(UN)REGISTER_COALESCED_MMIO2 ioctls
From: Sean Christopherson @ 2024-08-17 0:36 UTC (permalink / raw)
To: Ilias Stamatis
Cc: kvm, pbonzini, pdurrant, dwmw, nh-open-source, Paul Durrant
On Thu, Jul 18, 2024, Ilias Stamatis wrote:
> Add 2 new ioctls, KVM_REGISTER_COALESCED_MMIO2 and
> KVM_UNREGISTER_COALESCED_MMIO2. These do the same thing as their v1
> equivalents except an fd returned by KVM_CREATE_COALESCED_MMIO_BUFFER
> needs to be passed as an argument to them.
>
> The fd representing a ring buffer is associated with an MMIO region
> registered for coalescing and all writes to that region are accumulated
> there. This is in contrast to the v1 API where all regions have to share
> the same buffer. Nevertheless, userspace code can still use the same
> ring buffer for multiple zones if it wishes to do so.
>
> Userspace can check for the availability of the new API by checking if
> the KVM_CAP_COALESCED_MMIO2 capability is supported.
>
> Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
>
> v1->v2:
> - Rebased on top of kvm/queue resolving the conflict in kvm.h
>
> include/uapi/linux/kvm.h | 16 ++++++++++++++++
> virt/kvm/coalesced_mmio.c | 37 +++++++++++++++++++++++++++++++------
> virt/kvm/coalesced_mmio.h | 7 ++++---
> virt/kvm/kvm_main.c | 36 +++++++++++++++++++++++++++++++++++-
> 4 files changed, 86 insertions(+), 10 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 87f79a820fc0..7e8d5c879986 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -480,6 +480,16 @@ struct kvm_coalesced_mmio_zone {
> };
> };
>
> +struct kvm_coalesced_mmio_zone2 {
> + __u64 addr;
> + __u32 size;
> + union {
> + __u32 pad;
> + __u32 pio;
> + };
> + int buffer_fd;
Dunno if it matters, but KVM typically uses __u32 for file descriptors and then
does the verification on the backend (which needs to be there regardless).
> diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> index 64428b0a4aad..729f3293f768 100644
> --- a/virt/kvm/coalesced_mmio.c
> +++ b/virt/kvm/coalesced_mmio.c
> @@ -17,6 +17,7 @@
> #include <linux/kvm.h>
> #include <linux/anon_inodes.h>
> #include <linux/poll.h>
> +#include <linux/file.h>
>
> #include "coalesced_mmio.h"
>
> @@ -279,19 +280,40 @@ int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm)
> }
>
> int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
> - struct kvm_coalesced_mmio_zone *zone)
> + struct kvm_coalesced_mmio_zone2 *zone,
> + bool use_buffer_fd)
> {
> - int ret;
> + int ret = 0;
Heh, init ret to zero, and then return a pile of errors before ret is ever touched.
I assume ret can be set to '0' after mutex_unlock(&kvm->slots_lock); in the happy
path, which would also show that it is indeed the happy path.
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9eb22287384f..ef7dcce80136 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4892,6 +4892,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> #ifdef CONFIG_KVM_MMIO
> case KVM_CAP_COALESCED_MMIO:
> return KVM_COALESCED_MMIO_PAGE_OFFSET;
> + case KVM_CAP_COALESCED_MMIO2:
> case KVM_CAP_COALESCED_PIO:
> return 1;
> #endif
> @@ -5230,15 +5231,48 @@ static long kvm_vm_ioctl(struct file *filp,
> #ifdef CONFIG_KVM_MMIO
> case KVM_REGISTER_COALESCED_MMIO: {
> struct kvm_coalesced_mmio_zone zone;
> + struct kvm_coalesced_mmio_zone2 zone2;
>
> r = -EFAULT;
> if (copy_from_user(&zone, argp, sizeof(zone)))
> goto out;
> - r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone);
> +
> + zone2.addr = zone.addr;
> + zone2.size = zone.size;
> + zone2.pio = zone.pio;
> + zone2.buffer_fd = -1;
> +
> + r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone2, false);
> + break;
> + }
> + case KVM_REGISTER_COALESCED_MMIO2: {
Ah, the curly braces were just early :-)
> + struct kvm_coalesced_mmio_zone2 zone;
> +
> + r = -EFAULT;
> + if (copy_from_user(&zone, argp, sizeof(zone)))
> + goto out;
> +
> + r = kvm_vm_ioctl_register_coalesced_mmio(kvm, &zone, true);
* [PATCH v2 5/6] KVM: Documentation: Document v2 of coalesced MMIO API
From: Ilias Stamatis @ 2024-07-18 19:35 UTC (permalink / raw)
To: kvm, pbonzini
Cc: pdurrant, dwmw, nh-open-source, Ilias Stamatis, Paul Durrant
Document the KVM_CREATE_COALESCED_MMIO_BUFFER and
KVM_REGISTER_COALESCED_MMIO2 ioctls.
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
v1->v2:
- Added a sentence explaining that closing the fd results in
deregistering the associated regions.
- Rebased on top of kvm/queue and updated the section numbers.
Documentation/virt/kvm/api.rst | 91 ++++++++++++++++++++++++++++++++++
1 file changed, 91 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 8e5dad80b337..bd47cf64595a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4919,6 +4919,8 @@ For the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE.
:Parameters: struct kvm_coalesced_mmio_zone
:Returns: 0 on success, < 0 on error
+KVM_(UN)REGISTER_COALESCED_MMIO2 can be used instead if available.
+
Coalesced I/O is a performance optimization that defers hardware
register write emulation so that userspace exits are avoided. It is
typically used to reduce the overhead of emulating frequently accessed
@@ -6418,6 +6420,95 @@ the capability to be present.
`flags` must currently be zero.
+4.144 KVM_CREATE_COALESCED_MMIO_BUFFER
+-------------------------------------
+
+:Capability: KVM_CAP_COALESCED_MMIO2
+:Architectures: all
+:Type: vm ioctl
+:Parameters: none
+:Returns: An fd on success, < 0 on error
+
+Returns an fd, but does not allocate a buffer. Also see
+KVM_REGISTER_COALESCED_MMIO2.
+
+The fd must first be passed to mmap() to allocate a page to be used as a ring
+buffer that is shared between kernel and userspace. The page must be
+interpreted as a struct kvm_coalesced_mmio_ring.
+
+::
+
+ struct kvm_coalesced_mmio_ring {
+ __u32 first, last;
+ struct kvm_coalesced_mmio coalesced_mmio[];
+ };
+
+The kernel will increment the last index and userspace is expected to do the
+same with the first index after consuming entries. The upper bound of the
+coalesced_mmio array is defined as KVM_COALESCED_MMIO_MAX.
+
+::
+
+ struct kvm_coalesced_mmio {
+ __u64 phys_addr;
+ __u32 len;
+ union {
+ __u32 pad;
+ __u32 pio;
+ };
+ __u8 data[8];
+ };
+
+After allocating a buffer with mmap(), the fd must be passed as an argument to
+KVM_REGISTER_COALESCED_MMIO2 to associate an I/O region to which writes are
+coalesced with the ring buffer. Multiple I/O regions can be associated with the
+same ring buffer. Closing the fd after unmapping it automatically deregisters
+all I/O regions associated with it.
+
+poll() is also supported on the fd so that userspace can be notified of I/O
+writes without having to wait until the next exit to userspace.
+
+4.145 KVM_(UN)REGISTER_COALESCED_MMIO2
+-------------------------------------
+
+:Capability: KVM_CAP_COALESCED_MMIO2
+:Architectures: all
+:Type: vm ioctl
+:Parameters: struct kvm_coalesced_mmio_zone2
+:Returns: 0 on success, < 0 on error
+
+Coalesced I/O is a performance optimization that defers hardware register write
+emulation so that userspace exits are avoided. It is typically used to reduce
+the overhead of emulating frequently accessed hardware registers.
+
+When a hardware register is configured for coalesced I/O, write accesses do not
+exit to userspace and their value is recorded in a ring buffer that is shared
+between kernel and userspace.
+
+::
+
+ struct kvm_coalesced_mmio_zone2 {
+ __u64 addr;
+ __u32 size;
+ union {
+ __u32 pad;
+ __u32 pio;
+ };
+ int buffer_fd;
+ };
+
+KVM_CREATE_COALESCED_MMIO_BUFFER must be used to allocate a buffer fd, which
+must first be mmap()ed before being passed to KVM_REGISTER_COALESCED_MMIO2,
+otherwise the ioctl will fail.
+
+Coalesced I/O is used when one or more write accesses to a hardware register
+can be deferred until a read or a write to another hardware register on the
+same device. That last access causes a vmexit, and userspace processes the
+accesses accumulated in the ring buffer before emulating it, avoiding an exit
+to userspace for every coalesced write.
+
+Alternatively, userspace can call poll() on the buffer fd if it wishes to be
+notified of new I/O writes that way.
5. The kvm_run structure
========================
--
2.34.1
* [PATCH v2 6/6] KVM: selftests: Add coalesced_mmio_test
From: Ilias Stamatis @ 2024-07-18 19:35 UTC (permalink / raw)
To: kvm, pbonzini; +Cc: pdurrant, dwmw, nh-open-source, Ilias Stamatis
Test the KVM_CREATE_COALESCED_MMIO_BUFFER, KVM_REGISTER_COALESCED_MMIO2
and KVM_UNREGISTER_COALESCED_MMIO2 ioctls.
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/coalesced_mmio_test.c | 313 ++++++++++++++++++
2 files changed, 314 insertions(+)
create mode 100644 tools/testing/selftests/kvm/coalesced_mmio_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index b084ba2262a0..c22b50396006 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -147,6 +147,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
TEST_GEN_PROGS_x86_64 += system_counter_offset_test
TEST_GEN_PROGS_x86_64 += pre_fault_memory_test
+TEST_GEN_PROGS_x86_64 += coalesced_mmio_test
# Compiled outputs used by test targets
TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
diff --git a/tools/testing/selftests/kvm/coalesced_mmio_test.c b/tools/testing/selftests/kvm/coalesced_mmio_test.c
new file mode 100644
index 000000000000..103cfcb92219
--- /dev/null
+++ b/tools/testing/selftests/kvm/coalesced_mmio_test.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Test the KVM_CREATE_COALESCED_MMIO_BUFFER, KVM_REGISTER_COALESCED_MMIO2 and
+ * KVM_UNREGISTER_COALESCED_MMIO2 ioctls by making sure that MMIO writes to
+ * associated zones end up in the correct ring buffer. Also test that we don't
+ * exit to userspace when there is space in the corresponding buffer.
+ */
+
+#include <kvm_util.h>
+#include <ucall_common.h>
+
+#define PAGE_SIZE 4096
+
+/*
+ * Somewhat arbitrary location and slot, intended to not overlap anything.
+ */
+#define MEM_REGION_SLOT 10
+#define MEM_REGION_GPA 0xc0000000UL
+#define MEM_REGION_SIZE (PAGE_SIZE * 2)
+#define MEM_REGION_PAGES DIV_ROUND_UP(MEM_REGION_SIZE, PAGE_SIZE)
+
+#define COALESCING_ZONE1_GPA MEM_REGION_GPA
+#define COALESCING_ZONE1_SIZE PAGE_SIZE
+#define COALESCING_ZONE2_GPA (COALESCING_ZONE1_GPA + COALESCING_ZONE1_SIZE)
+#define COALESCING_ZONE2_SIZE PAGE_SIZE
+
+#define MMIO_WRITE_DATA 0xbadbeef
+#define MMIO_WRITE_DATA2 0xbadc0de
+
+#define BATCH_SIZE 4
+
+static void guest_code(void)
+{
+ uint64_t *gpa;
+
+ /*
+ * The first write should result in an exit
+ */
+ gpa = (uint64_t *)(MEM_REGION_GPA);
+ WRITE_ONCE(*gpa, MMIO_WRITE_DATA);
+
+ /*
+ * These writes should be stored in a coalescing ring buffer and only
+ * the last one should result in an exit.
+ */
+ for (int i = 0; i < KVM_COALESCED_MMIO_MAX; i++) {
+ gpa = (uint64_t *)(COALESCING_ZONE1_GPA + i * sizeof(*gpa));
+ WRITE_ONCE(*gpa, MMIO_WRITE_DATA + i);
+
+ /* Let's throw a PIO into the mix */
+ if (i == KVM_COALESCED_MMIO_MAX / 2)
+ GUEST_SYNC(0);
+ }
+
+ /*
+ * These writes should be stored in two separate ring buffers and they
+ * shouldn't result in an exit.
+ */
+ for (int i = 0; i < BATCH_SIZE; i++) {
+ gpa = (uint64_t *)(COALESCING_ZONE1_GPA + i * sizeof(*gpa));
+ WRITE_ONCE(*gpa, MMIO_WRITE_DATA + i);
+
+ gpa = (uint64_t *)(COALESCING_ZONE2_GPA + i * sizeof(*gpa));
+ WRITE_ONCE(*gpa, MMIO_WRITE_DATA2 + i);
+ }
+
+ GUEST_SYNC(0);
+
+ /*
+ * These writes should be stored in the same ring buffer and they
+ * shouldn't result in an exit.
+ */
+ for (int i = 0; i < BATCH_SIZE; i++) {
+ if (i < BATCH_SIZE / 2)
+ gpa = (uint64_t *)(COALESCING_ZONE1_GPA + i * sizeof(*gpa));
+ else
+ gpa = (uint64_t *)(COALESCING_ZONE2_GPA + i * sizeof(*gpa));
+
+ WRITE_ONCE(*gpa, MMIO_WRITE_DATA2 + i);
+ }
+
+ GUEST_SYNC(0);
+
+ /*
+ * This last write should result in an exit because the host should
+ * have disabled I/O coalescing by now.
+ */
+ gpa = (uint64_t *)(COALESCING_ZONE1_GPA);
+ WRITE_ONCE(*gpa, MMIO_WRITE_DATA);
+}
+
+static void assert_mmio_write(struct kvm_vcpu *vcpu, uint64_t addr, uint64_t value)
+{
+ uint64_t data;
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_MMIO);
+ TEST_ASSERT(vcpu->run->mmio.is_write, "Got MMIO read, not MMIO write");
+
+ memcpy(&data, vcpu->run->mmio.data, vcpu->run->mmio.len);
+ TEST_ASSERT_EQ(vcpu->run->mmio.phys_addr, addr);
+ TEST_ASSERT_EQ(value, data);
+}
+
+static void assert_ucall_exit(struct kvm_vcpu *vcpu, uint64_t command)
+{
+ uint64_t cmd;
+ struct ucall uc;
+
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+ cmd = get_ucall(vcpu, &uc);
+ TEST_ASSERT_EQ(cmd, command);
+}
+
+static void assert_ring_entries(struct kvm_coalesced_mmio_ring *ring,
+ uint32_t nentries,
+ uint64_t addr,
+ uint64_t value)
+{
+ uint64_t data;
+
+ for (int i = READ_ONCE(ring->first); i < nentries; i++) {
+ TEST_ASSERT_EQ(READ_ONCE(ring->coalesced_mmio[i].len),
+ sizeof(data));
+ memcpy(&data, ring->coalesced_mmio[i].data,
+ READ_ONCE(ring->coalesced_mmio[i].len));
+
+ TEST_ASSERT_EQ(READ_ONCE(ring->coalesced_mmio[i].phys_addr),
+ addr + i * sizeof(data));
+ TEST_ASSERT_EQ(data, value + i);
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ uint64_t gpa;
+ struct kvm_coalesced_mmio_ring *ring, *ring2;
+ struct kvm_coalesced_mmio_zone2 zone, zone2;
+ int ring_fd, ring_fd2;
+ int r;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_COALESCED_MMIO2));
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_READONLY_MEM));
+ TEST_ASSERT(BATCH_SIZE * 2 <= KVM_COALESCED_MMIO_MAX,
+ "KVM_COALESCED_MMIO_MAX too small");
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, MEM_REGION_GPA,
+ MEM_REGION_SLOT, MEM_REGION_PAGES,
+ KVM_MEM_READONLY);
+
+ gpa = vm_phy_pages_alloc(vm, MEM_REGION_PAGES, MEM_REGION_GPA,
+ MEM_REGION_SLOT);
+ TEST_ASSERT(gpa == MEM_REGION_GPA, "Failed vm_phy_pages_alloc");
+
+ virt_map(vm, MEM_REGION_GPA, MEM_REGION_GPA, MEM_REGION_PAGES);
+
+ /*
+ * Test that allocating an fd and memory mapping it works
+ */
+ ring_fd = __vm_ioctl(vm, KVM_CREATE_COALESCED_MMIO_BUFFER, NULL);
+ TEST_ASSERT(ring_fd != -1, "Failed KVM_CREATE_COALESCED_MMIO_BUFFER");
+
+ ring = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
+ ring_fd, 0);
+ TEST_ASSERT(ring != MAP_FAILED, "Failed to allocate ring buffer");
+
+ /*
+ * Test that trying to map the same fd again fails
+ */
+ ring2 = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
+ ring_fd, 0);
+ TEST_ASSERT(ring2 == MAP_FAILED && errno == EBUSY,
+ "Mapping the same fd again should fail with EBUSY");
+
+ /*
+ * Test that the first and last ring indices are zero
+ */
+ TEST_ASSERT_EQ(READ_ONCE(ring->first), 0);
+ TEST_ASSERT_EQ(READ_ONCE(ring->last), 0);
+
+ /*
+ * Run the vCPU and make sure the first MMIO write results in a
+ * userspace exit since we have not setup MMIO coalescing yet.
+ */
+ vcpu_run(vcpu);
+ assert_mmio_write(vcpu, MEM_REGION_GPA, MMIO_WRITE_DATA);
+
+ /*
+ * Let's actually setup MMIO coalescing now...
+ */
+ zone.addr = COALESCING_ZONE1_GPA;
+ zone.size = COALESCING_ZONE1_SIZE;
+ zone.buffer_fd = ring_fd;
+ r = __vm_ioctl(vm, KVM_REGISTER_COALESCED_MMIO2, &zone);
+ TEST_ASSERT(r != -1, "Failed KVM_REGISTER_COALESCED_MMIO2");
+
+ /*
+ * The guest will start doing MMIO writes in the coalesced regions but
+ * will also do a ucall when the buffer is half full. The first
+ * userspace exit should be due to the ucall and not an MMIO exit.
+ */
+ vcpu_run(vcpu);
+ assert_ucall_exit(vcpu, UCALL_SYNC);
+ TEST_ASSERT_EQ(READ_ONCE(ring->first), 0);
+ TEST_ASSERT_EQ(READ_ONCE(ring->last), KVM_COALESCED_MMIO_MAX / 2 + 1);
+
+ /*
+ * Run the guest again. Next exit should be when the buffer is full.
+ * One entry always remains unused.
+ */
+ vcpu_run(vcpu);
+ assert_mmio_write(vcpu,
+ COALESCING_ZONE1_GPA + (KVM_COALESCED_MMIO_MAX - 1) * sizeof(uint64_t),
+ MMIO_WRITE_DATA + KVM_COALESCED_MMIO_MAX - 1);
+ TEST_ASSERT_EQ(READ_ONCE(ring->first), 0);
+ TEST_ASSERT_EQ(READ_ONCE(ring->last), KVM_COALESCED_MMIO_MAX - 1);
+
+ assert_ring_entries(ring, KVM_COALESCED_MMIO_MAX - 1,
+ COALESCING_ZONE1_GPA, MMIO_WRITE_DATA);
+
+ /*
+ * Let's setup another region with a separate buffer
+ */
+ ring_fd2 = __vm_ioctl(vm, KVM_CREATE_COALESCED_MMIO_BUFFER, NULL);
+ TEST_ASSERT(ring_fd2 != -1, "Failed KVM_CREATE_COALESCED_MMIO_BUFFER");
+
+ ring2 = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
+ ring_fd2, 0);
+ TEST_ASSERT(ring2 != MAP_FAILED, "Failed to allocate ring buffer");
+
+ zone2.addr = COALESCING_ZONE2_GPA;
+ zone2.size = COALESCING_ZONE2_SIZE;
+ zone2.buffer_fd = ring_fd2;
+ r = __vm_ioctl(vm, KVM_REGISTER_COALESCED_MMIO2, &zone2);
+ TEST_ASSERT(r != -1, "Failed KVM_REGISTER_COALESCED_MMIO2");
+
+ /*
+ * Move the consumer pointer of the first ring forward.
+ *
+ * When re-entering the vCPU the guest will write BATCH_SIZE
+ * times to each MMIO zone.
+ */
+ WRITE_ONCE(ring->first,
+ (READ_ONCE(ring->first) + BATCH_SIZE) % KVM_COALESCED_MMIO_MAX);
+
+ vcpu_run(vcpu);
+ assert_ucall_exit(vcpu, UCALL_SYNC);
+
+ TEST_ASSERT_EQ(READ_ONCE(ring->first), BATCH_SIZE);
+ TEST_ASSERT_EQ(READ_ONCE(ring->last),
+ (KVM_COALESCED_MMIO_MAX - 1 + BATCH_SIZE) % KVM_COALESCED_MMIO_MAX);
+ TEST_ASSERT_EQ(READ_ONCE(ring2->first), 0);
+ TEST_ASSERT_EQ(READ_ONCE(ring2->last), BATCH_SIZE);
+
+ assert_ring_entries(ring, BATCH_SIZE, COALESCING_ZONE1_GPA, MMIO_WRITE_DATA);
+ assert_ring_entries(ring2, BATCH_SIZE, COALESCING_ZONE2_GPA, MMIO_WRITE_DATA2);
+
+ /*
+ * Unregister zone 2 and register it again but this time use the same
+ * ring buffer used for zone 1.
+ */
+ r = __vm_ioctl(vm, KVM_UNREGISTER_COALESCED_MMIO2, &zone2);
+ TEST_ASSERT(r != -1, "Failed KVM_UNREGISTER_COALESCED_MMIO2");
+
+ zone2.buffer_fd = ring_fd;
+ r = __vm_ioctl(vm, KVM_REGISTER_COALESCED_MMIO2, &zone2);
+ TEST_ASSERT(r != -1, "Failed KVM_REGISTER_COALESCED_MMIO2");
+
+ /*
+ * Enter the vCPU again. This time writes to both regions should go
+ * to the same ring buffer.
+ */
+ WRITE_ONCE(ring->first,
+ (READ_ONCE(ring->first) + BATCH_SIZE) % KVM_COALESCED_MMIO_MAX);
+
+ vcpu_run(vcpu);
+ assert_ucall_exit(vcpu, UCALL_SYNC);
+
+ TEST_ASSERT_EQ(READ_ONCE(ring->first), BATCH_SIZE * 2);
+ TEST_ASSERT_EQ(READ_ONCE(ring->last),
+ (KVM_COALESCED_MMIO_MAX - 1 + BATCH_SIZE * 2) % KVM_COALESCED_MMIO_MAX);
+
+ WRITE_ONCE(ring->first,
+ (READ_ONCE(ring->first) + BATCH_SIZE) % KVM_COALESCED_MMIO_MAX);
+
+ /*
+ * Test that munmap and close work.
+ */
+ r = munmap(ring, PAGE_SIZE);
+ TEST_ASSERT(r == 0, "Failed to munmap()");
+ r = close(ring_fd);
+ TEST_ASSERT(r == 0, "Failed to close()");
+
+ r = munmap(ring2, PAGE_SIZE);
+ TEST_ASSERT(r == 0, "Failed to munmap()");
+ r = close(ring_fd2);
+ TEST_ASSERT(r == 0, "Failed to close()");
+
+ /*
+ * close() should have also deregistered all I/O regions associated
+ * with the ring buffer automatically. Make sure that when the guest
+ * writes to the region again this results in an immediate exit.
+ */
+ vcpu_run(vcpu);
+ assert_mmio_write(vcpu, COALESCING_ZONE1_GPA, MMIO_WRITE_DATA);
+
+ return 0;
+}
--
2.34.1
* Re: [PATCH v2 0/6] KVM: Improve MMIO Coalescing API
From: Sean Christopherson @ 2024-08-17 0:40 UTC (permalink / raw)
To: Ilias Stamatis; +Cc: kvm, pbonzini, pdurrant, dwmw, nh-open-source
On Thu, Jul 18, 2024, Ilias Stamatis wrote:
> The current MMIO coalescing design has a few drawbacks which limit its
> usefulness. Currently all coalesced MMIO zones use the same ring buffer.
> That means that upon a userspace exit we have to handle potentially
> unrelated MMIO writes synchronously. And a VM-wide lock needs to be
> taken in the kernel when an MMIO exit occurs.
>
> Additionally, there is no direct way for userspace to be notified about
> coalesced MMIO writes. If the next MMIO exit to userspace is when the
> ring buffer has filled then a substantial (and unbounded) amount of time
> may have passed since the first coalesced MMIO.
>
> This series adds new ioctls to KVM that allow for greater control by
> making it possible to associate different MMIO zones with different ring
> buffers. It also allows userspace to use poll() to check for coalesced
> writes in order to avoid userspace exits in vCPU threads (see patch 3
> for why this can be useful).
>
> The idea of improving the API in this way originally came from Paul
> Durrant (pdurrant@amazon.co.uk) but the implementation is mine.
>
> The first patch in the series is a bug in the existing code that I
> discovered while writing a selftest and can be merged independently.
Ya, I'll grab it, maybe for 6.11? Doesn't seem urgent though.
Anyways, I gave this a *very* cursory review. I'd go ahead and send v3 though,
you're going to need Paolo's eyeballs on this, and he's offline until September.
* Re: [PATCH v2 0/6] KVM: Improve MMIO Coalescing API
From: Sean Christopherson @ 2024-08-23 23:47 UTC (permalink / raw)
To: Sean Christopherson, kvm, pbonzini, Ilias Stamatis
Cc: pdurrant, dwmw, nh-open-source
On Thu, 18 Jul 2024 20:35:37 +0100, Ilias Stamatis wrote:
> The current MMIO coalescing design has a few drawbacks which limit its
> usefulness. Currently all coalesced MMIO zones use the same ring buffer.
> That means that upon a userspace exit we have to handle potentially
> unrelated MMIO writes synchronously. And a VM-wide lock needs to be
> taken in the kernel when an MMIO exit occurs.
>
> Additionally, there is no direct way for userspace to be notified about
> coalesced MMIO writes. If the next MMIO exit to userspace is when the
> ring buffer has filled then a substantial (and unbounded) amount of time
> may have passed since the first coalesced MMIO.
>
> [...]
Applied patch 1 to kvm-x86 generic. I deliberately didn't put this in fixes or
Cc: it for stable, as the bug has been around for sooo long without anyone
noticing that there's basically zero chance that the bug is actively causing
issues.
I also reworked and expanded the changelog significantly to make it more clear
why things break, what the fallout is (KVM can _sometimes_ use the full ring),
and to call out that the lockless scheme that the buggy commit was preparing
for never seems to have landed.
Please take a gander at the changelog and holler if I messed anything up.
[1/6] KVM: Fix coalesced_mmio_has_room() to avoid premature userspace exit
https://github.com/kvm-x86/linux/commit/92f6d4130497
--
https://github.com/kvm-x86/linux/tree/next
* Re: [PATCH v2 0/6] KVM: Improve MMIO Coalescing API
From: Stamatis, Ilias @ 2024-08-27 10:35 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com, Stamatis, Ilias,
seanjc@google.com
Cc: nh-open-source@amazon.com, Durrant, Paul, Woodhouse, David
On Fri, 2024-08-23 at 16:47 -0700, Sean Christopherson wrote:
> On Thu, 18 Jul 2024 20:35:37 +0100, Ilias Stamatis wrote:
> > The current MMIO coalescing design has a few drawbacks which limit its
> > usefulness. Currently all coalesced MMIO zones use the same ring buffer.
> > That means that upon a userspace exit we have to handle potentially
> > unrelated MMIO writes synchronously. And a VM-wide lock needs to be
> > taken in the kernel when an MMIO exit occurs.
> >
> > Additionally, there is no direct way for userspace to be notified about
> > coalesced MMIO writes. If the next MMIO exit to userspace is when the
> > ring buffer has filled then a substantial (and unbounded) amount of time
> > may have passed since the first coalesced MMIO.
> >
> > [...]
>
> Applied patch 1 to kvm-x86 generic. I deliberately didn't put this in fixes or
> Cc: it for stable, as the bug has been around for sooo long without anyone
> noticing that there's basically zero chance that the bug is actively causing
> issues.
>
> I also reworked and expanded the changelog significantly to make it more clear
> why things break, what the fallout is (KVM can _sometimes_ use the full ring),
> and to call out that the lockless scheme that the buggy commit was preparing
> for never seems to have landed.
>
> Please take a gander at the changelog and holler if I messed anything up.
>
> [1/6] KVM: Fix coalesced_mmio_has_room() to avoid premature userspace exit
> https://github.com/kvm-x86/linux/commit/92f6d4130497
>
> --
> https://github.com/kvm-x86/linux/tree/next
It looks good, thanks.
> Doh, I applied v2 instead of v3. Though unless mine eyes deceive me,
> they're the same.
They are the same indeed.
I'm not entirely sure what's the recommended thing to do when updating
only some of the patches of a series, whether to send new versions for
the revised patches only or resend everything. I opted to do the latter
and then I called out the changes between v2 and v3 inside the patches
that were actually modified.
Thanks,
Ilias
* Re: [PATCH v2 0/6] KVM: Improve MMIO Coalescing API
From: Sean Christopherson @ 2024-08-27 13:45 UTC (permalink / raw)
To: Ilias Stamatis
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
nh-open-source@amazon.com, Paul Durrant, David Woodhouse
On Tue, Aug 27, 2024, Ilias Stamatis wrote:
> On Fri, 2024-08-23 at 16:47 -0700, Sean Christopherson wrote:
> > Doh, I applied v2 instead of v3. Though unless mine eyes deceive me,
> > they're the same.
>
> They are the same indeed.
>
> I'm not entirely sure what's the recommended thing to do when updating
> only some of the patches of a series, whether to send new versions for
> the revised patches only or resend everything. I opted to do the latter
Resend everything, as you did. I simply messed up and grabbed the wrong version.
> and then I called out the changes between v2 and v3 inside the patches
> that were actually modified.
FWIW, I strongly prefer the delta be captured in the cover letter, though others
may obviously have different preferences.
https://lore.kernel.org/all/ZQnAN9TC6b8mSJ%2Ft@google.com