linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] kvm: support any-length wildcard ioeventfd
@ 2014-03-31 18:50 Michael S. Tsirkin
  2014-03-31 18:50 ` [PATCH 2/2] vmx: speed up wildcard MMIO EVENTFD Michael S. Tsirkin
  0 siblings, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2014-03-31 18:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Paolo Bonzini, gleb, kvm

It is sometimes beneficial to ignore the I/O size and match only on the address.
In hindsight this would have been a better default than matching the length
when KVM_IOEVENTFD_FLAG_DATAMATCH is not set.  In particular, this kind
of access can be optimized on VMX: there is no need to do page lookups.
This can currently be done with many ioeventfds, but only in a suboptimal way.

However, we can't change the kernel/userspace ABI without risking breaking
some existing applications.
Instead, use len = 0 to mean "ignore length for matching".

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/uapi/linux/kvm.h |  3 ++-
 arch/x86/kvm/x86.c       |  1 +
 virt/kvm/eventfd.c       | 27 ++++++++++++++++++++++-----
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 932d7f2..d73007e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -464,7 +464,7 @@ enum {
 struct kvm_ioeventfd {
 	__u64 datamatch;
 	__u64 addr;        /* legal pio/mmio address */
-	__u32 len;         /* 1, 2, 4, or 8 bytes    */
+	__u32 len;         /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
 	__s32 fd;
 	__u32 flags;
 	__u8  pad[36];
@@ -675,6 +675,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_SPAPR_MULTITCE 94
 #define KVM_CAP_EXT_EMUL_CPUID 95
 #define KVM_CAP_HYPERV_TIME 96
+#define KVM_CAP_IOEVENTFD_NO_LENGTH 97
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c6218b..8ae1ff5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2602,6 +2602,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_IRQ_INJECT_STATUS:
 	case KVM_CAP_IRQFD:
 	case KVM_CAP_IOEVENTFD:
+	case KVM_CAP_IOEVENTFD_NO_LENGTH:
 	case KVM_CAP_PIT2:
 	case KVM_CAP_PIT_STATE2:
 	case KVM_CAP_SET_IDENTITY_MAP_ADDR:
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index abe4d60..9eadb59 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -600,7 +600,15 @@ ioeventfd_in_range(struct _ioeventfd *p, gpa_t addr, int len, const void *val)
 {
 	u64 _val;
 
-	if (!(addr == p->addr && len == p->length))
+	if (addr != p->addr)
+		/* address must be precise for a hit */
+		return false;
+
+	if (!p->length)
+		/* length = 0 means only look at the address, so always a hit */
+		return true;
+
+	if (len != p->length)
 		/* address-range must be precise for a hit */
 		return false;
 
@@ -671,9 +679,11 @@ ioeventfd_check_collision(struct kvm *kvm, struct _ioeventfd *p)
 
 	list_for_each_entry(_p, &kvm->ioeventfds, list)
 		if (_p->bus_idx == p->bus_idx &&
-		    _p->addr == p->addr && _p->length == p->length &&
-		    (_p->wildcard || p->wildcard ||
-		     _p->datamatch == p->datamatch))
+		    _p->addr == p->addr &&
+		    (!_p->length || !p->length ||
+		     (_p->length == p->length &&
+		      (_p->wildcard || p->wildcard ||
+		       _p->datamatch == p->datamatch))))
 			return true;
 
 	return false;
@@ -697,8 +707,9 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	int                       ret;
 
 	bus_idx = ioeventfd_bus_from_flags(args->flags);
-	/* must be natural-word sized */
+	/* must be natural-word sized, or 0 to ignore length */
 	switch (args->len) {
+	case 0:
 	case 1:
 	case 2:
 	case 4:
@@ -716,6 +727,12 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	if (args->flags & ~KVM_IOEVENTFD_VALID_FLAG_MASK)
 		return -EINVAL;
 
+	/* ioeventfd with no length can't be combined with DATAMATCH */
+	if (!args->len &&
+	    args->flags & (KVM_IOEVENTFD_FLAG_PIO |
+			   KVM_IOEVENTFD_FLAG_DATAMATCH))
+		return -EINVAL;
+
 	eventfd = eventfd_ctx_fdget(args->fd);
 	if (IS_ERR(eventfd))
 		return PTR_ERR(eventfd);
-- 
MST


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] vmx: speed up wildcard MMIO EVENTFD
  2014-03-31 18:50 [PATCH 1/2] kvm: support any-length wildcard ioeventfd Michael S. Tsirkin
@ 2014-03-31 18:50 ` Michael S. Tsirkin
  2014-04-17 12:33   ` Michael S. Tsirkin
  0 siblings, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2014-03-31 18:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Paolo Bonzini, gleb, kvm

With KVM, MMIO is much slower than PIO, due to the need to
do a page walk and emulation.  But with EPT it does not have to be: we
know the address from the VMCS, so if the address is unique we can look
up the eventfd directly, bypassing emulation.

Unfortunately, this only works if userspace does not need to match on
access length and data.  The implementation adds a separate FAST_MMIO
bus internally. This serves two purposes:
    - minimize overhead for old userspace that does not use eventfd with length = 0
    - minimize disruption in other code (since we don't know the length,
      devices on the MMIO bus only get a valid address on write; this
      way we don't need to touch all devices to teach them to handle
      an invalid length)

At the moment, this optimization only has effect for EPT on x86.

It will be possible to speed up MMIO for NPT and MMU using the same
idea in the future.

With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
I was unable to detect any measurable slowdown to non-eventfd MMIO.

Making MMIO faster is important for the upcoming virtio 1.0, which
includes an MMIO signalling capability.

The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
pre-review and suggestions.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/kvm_host.h |  1 +
 include/uapi/linux/kvm.h |  1 +
 arch/x86/kvm/vmx.c       |  4 ++++
 virt/kvm/eventfd.c       | 16 ++++++++++++++++
 virt/kvm/kvm_main.c      |  1 +
 5 files changed, 23 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b8e9a43..c2eaa9f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -163,6 +163,7 @@ enum kvm_bus {
 	KVM_MMIO_BUS,
 	KVM_PIO_BUS,
 	KVM_VIRTIO_CCW_NOTIFY_BUS,
+	KVM_FAST_MMIO_BUS,
 	KVM_NR_BUSES
 };
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d73007e..fc75810 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -450,6 +450,7 @@ enum {
 	kvm_ioeventfd_flag_nr_pio,
 	kvm_ioeventfd_flag_nr_deassign,
 	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
+	kvm_ioeventfd_flag_nr_fast_mmio,
 	kvm_ioeventfd_flag_nr_max,
 };
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3927528..78a1932 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5493,6 +5493,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	gpa_t gpa;
 
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+	if (!kvm_io_bus_write(vcpu->kvm, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+		skip_emulated_instruction(vcpu);
+		return 1;
+	}
 
 	ret = handle_mmio_page_fault_common(vcpu, gpa, true);
 	if (likely(ret == RET_MMIO_PF_EMULATE))
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9eadb59..c61f2eb 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -770,6 +770,16 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	if (ret < 0)
 		goto unlock_fail;
 
+	/* When length is ignored, MMIO is also put on a separate bus, for
+	 * faster lookups.
+	 */
+	if (!args->len && !(args->flags & KVM_IOEVENTFD_FLAG_PIO)) {
+		ret = kvm_io_bus_register_dev(kvm, KVM_FAST_MMIO_BUS,
+					      p->addr, 0, &p->dev);
+		if (ret < 0)
+			goto register_fail;
+	}
+
 	kvm->buses[bus_idx]->ioeventfd_count++;
 	list_add_tail(&p->list, &kvm->ioeventfds);
 
@@ -777,6 +787,8 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
 	return 0;
 
+register_fail:
+	kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
 unlock_fail:
 	mutex_unlock(&kvm->slots_lock);
 
@@ -816,6 +828,10 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 			continue;
 
 		kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
+		if (!p->length) {
+			kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
+						  &p->dev);
+		}
 		kvm->buses[bus_idx]->ioeventfd_count--;
 		ioeventfd_release(p);
 		ret = 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 03a0381..22a4e1b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2920,6 +2920,7 @@ static int __kvm_io_bus_read(struct kvm_io_bus *bus, struct kvm_io_range *range,
 
 	return -EOPNOTSUPP;
 }
+EXPORT_SYMBOL_GPL(kvm_io_bus_write);
 
 /* kvm_io_bus_read - called under kvm->slots_lock */
 int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
-- 
MST


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] vmx: speed up wildcard MMIO EVENTFD
  2014-03-31 18:50 ` [PATCH 2/2] vmx: speed up wildcard MMIO EVENTFD Michael S. Tsirkin
@ 2014-04-17 12:33   ` Michael S. Tsirkin
  2014-04-17 17:04     ` Marcelo Tosatti
  0 siblings, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2014-04-17 12:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Paolo Bonzini, gleb, kvm, mtosatti

On Mon, Mar 31, 2014 at 09:50:44PM +0300, Michael S. Tsirkin wrote:
> With KVM, MMIO is much slower than PIO, due to the need to
> do page walk and emulation. But with EPT, it does not have to be: we
> know the address from the VMCS so if the address is unique, we can look
> up the eventfd directly, bypassing emulation.
> 
> Unfortunately, this only works if userspace does not need to match on
> access length and data.  The implementation adds a separate FAST_MMIO
> bus internally. This serves two purposes:
>     - minimize overhead for old userspace that does not use eventfd with length = 0
>     - minimize disruption in other code (since we don't know the length,
>       devices on the MMIO bus only get a valid address in write, this
>       way we don't need to touch all devices to teach them to handle
>       an invalid length)
> 
> At the moment, this optimization only has effect for EPT on x86.
> 
> It will be possible to speed up MMIO for NPT and MMU using the same
> idea in the future.
> 
> With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
> I was unable to detect any measurable slowdown to non-eventfd MMIO.
> 
> Making MMIO faster is important for the upcoming virtio 1.0 which
> includes an MMIO signalling capability.
> 
> The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
> pre-review and suggestions.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Ping.
Marcelo, any chance you could merge this for 3.16?

> ---
>  include/linux/kvm_host.h |  1 +
>  include/uapi/linux/kvm.h |  1 +
>  arch/x86/kvm/vmx.c       |  4 ++++
>  virt/kvm/eventfd.c       | 16 ++++++++++++++++
>  virt/kvm/kvm_main.c      |  1 +
>  5 files changed, 23 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index b8e9a43..c2eaa9f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -163,6 +163,7 @@ enum kvm_bus {
>  	KVM_MMIO_BUS,
>  	KVM_PIO_BUS,
>  	KVM_VIRTIO_CCW_NOTIFY_BUS,
> +	KVM_FAST_MMIO_BUS,
>  	KVM_NR_BUSES
>  };
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d73007e..fc75810 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -450,6 +450,7 @@ enum {
>  	kvm_ioeventfd_flag_nr_pio,
>  	kvm_ioeventfd_flag_nr_deassign,
>  	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
> +	kvm_ioeventfd_flag_nr_fast_mmio,
>  	kvm_ioeventfd_flag_nr_max,
>  };
>  
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 3927528..78a1932 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5493,6 +5493,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>  	gpa_t gpa;
>  
>  	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> +	if (!kvm_io_bus_write(vcpu->kvm, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> +		skip_emulated_instruction(vcpu);
> +		return 1;
> +	}
>  
>  	ret = handle_mmio_page_fault_common(vcpu, gpa, true);
>  	if (likely(ret == RET_MMIO_PF_EMULATE))
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9eadb59..c61f2eb 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -770,6 +770,16 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  	if (ret < 0)
>  		goto unlock_fail;
>  
> +	/* When length is ignored, MMIO is also put on a separate bus, for
> +	 * faster lookups.
> +	 */
> +	if (!args->len && !(args->flags & KVM_IOEVENTFD_FLAG_PIO)) {
> +		ret = kvm_io_bus_register_dev(kvm, KVM_FAST_MMIO_BUS,
> +					      p->addr, 0, &p->dev);
> +		if (ret < 0)
> +			goto register_fail;
> +	}
> +
>  	kvm->buses[bus_idx]->ioeventfd_count++;
>  	list_add_tail(&p->list, &kvm->ioeventfds);
>  
> @@ -777,6 +787,8 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  
>  	return 0;
>  
> +register_fail:
> +	kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
>  unlock_fail:
>  	mutex_unlock(&kvm->slots_lock);
>  
> @@ -816,6 +828,10 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  			continue;
>  
>  		kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
> +		if (!p->length) {
> +			kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
> +						  &p->dev);
> +		}
>  		kvm->buses[bus_idx]->ioeventfd_count--;
>  		ioeventfd_release(p);
>  		ret = 0;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 03a0381..22a4e1b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2920,6 +2920,7 @@ static int __kvm_io_bus_read(struct kvm_io_bus *bus, struct kvm_io_range *range,
>  
>  	return -EOPNOTSUPP;
>  }
> +EXPORT_SYMBOL_GPL(kvm_io_bus_write);
>  
>  /* kvm_io_bus_read - called under kvm->slots_lock */
>  int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
> -- 
> MST
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] vmx: speed up wildcard MMIO EVENTFD
  2014-04-17 12:33   ` Michael S. Tsirkin
@ 2014-04-17 17:04     ` Marcelo Tosatti
  0 siblings, 0 replies; 4+ messages in thread
From: Marcelo Tosatti @ 2014-04-17 17:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: linux-kernel, Paolo Bonzini, gleb, kvm

On Thu, Apr 17, 2014 at 03:33:55PM +0300, Michael S. Tsirkin wrote:
> On Mon, Mar 31, 2014 at 09:50:44PM +0300, Michael S. Tsirkin wrote:
> > With KVM, MMIO is much slower than PIO, due to the need to
> > do page walk and emulation. But with EPT, it does not have to be: we
> > know the address from the VMCS so if the address is unique, we can look
> > up the eventfd directly, bypassing emulation.
> > 
> > Unfortunately, this only works if userspace does not need to match on
> > access length and data.  The implementation adds a separate FAST_MMIO
> > bus internally. This serves two purposes:
> >     - minimize overhead for old userspace that does not use eventfd with length = 0
> >     - minimize disruption in other code (since we don't know the length,
> >       devices on the MMIO bus only get a valid address in write, this
> >       way we don't need to touch all devices to teach them to handle
> >       an invalid length)
> > 
> > At the moment, this optimization only has effect for EPT on x86.
> > 
> > It will be possible to speed up MMIO for NPT and MMU using the same
> > idea in the future.
> > 
> > With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
> > I was unable to detect any measurable slowdown to non-eventfd MMIO.
> > 
> > Making MMIO faster is important for the upcoming virtio 1.0 which
> > includes an MMIO signalling capability.
> > 
> > The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
> > pre-review and suggestions.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Ping.
> Marcelo, any chance you could merge this for 3.16?

Applied. Note: the KVM_CAP_ number changed.

FYI: "skip_emulated_instruction" breaks instructions that consist of
more than just the zero-length eventfd register access, e.g. rep mov.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-04-17 17:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-31 18:50 [PATCH 1/2] kvm: support any-length wildcard ioeventfd Michael S. Tsirkin
2014-03-31 18:50 ` [PATCH 2/2] vmx: speed up wildcard MMIO EVENTFD Michael S. Tsirkin
2014-04-17 12:33   ` Michael S. Tsirkin
2014-04-17 17:04     ` Marcelo Tosatti
