[PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
       [not found] <20230417122206.34647-1-metikaya@amazon.co.uk>
@ 2023-04-17 12:22 ` Metin Kaya
  2023-04-17 16:31   ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Metin Kaya @ 2023-04-17 12:22 UTC (permalink / raw)
  To: kvm, pbonzini
  Cc: x86, bp, dwmw, paul, seanjc, tglx, mingo, dave.hansen,
	joao.m.martins, Metin Kaya

HVMOP_flush_tlbs suboperation of hvm_op hypercall allows a guest to
flush all vCPU TLBs. There is no way for the VMM to flush TLBs from
userspace. Hence, this patch adds support for flushing vCPU TLBs to KVM
by making a KVM_REQ_TLB_FLUSH_GUEST request for all guest vCPUs.

Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>

---
 arch/x86/kvm/xen.c                 | 31 ++++++++++++++++++++++++++++++
 include/xen/interface/hvm/hvm_op.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 40edf4d1974c..78fa6d08bebc 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -21,6 +21,7 @@
 #include <xen/interface/vcpu.h>
 #include <xen/interface/version.h>
 #include <xen/interface/event_channel.h>
+#include <xen/interface/hvm/hvm_op.h>
 #include <xen/interface/sched.h>
 
 #include <asm/xen/cpuid.h>
@@ -1330,6 +1331,32 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode,
 	return false;
 }
 
+static void kvm_xen_hvmop_flush_tlbs(struct kvm_vcpu *vcpu, bool longmode,
+				     u64 arg, u64 *r)
+{
+	if (arg) {
+		*r = -EINVAL;
+		return;
+	}
+
+	kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
+	*r = 0;
+}
+
+static bool kvm_xen_hcall_hvm_op(struct kvm_vcpu *vcpu, bool longmode,
+				 int cmd, u64 arg, u64 *r)
+{
+	switch (cmd) {
+	case HVMOP_flush_tlbs:
+		kvm_xen_hvmop_flush_tlbs(vcpu, longmode, arg, r);
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 struct compat_vcpu_set_singleshot_timer {
     uint64_t timeout_abs_ns;
     uint32_t flags;
@@ -1501,6 +1528,10 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
 			timeout |= params[1] << 32;
 		handled = kvm_xen_hcall_set_timer_op(vcpu, timeout, &r);
 		break;
+	case __HYPERVISOR_hvm_op:
+		handled = kvm_xen_hcall_hvm_op(vcpu, longmode, params[0], params[1],
+					       &r);
+		break;
 	}
 	default:
 		break;
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
index 03134bf3cec1..373123226c6f 100644
--- a/include/xen/interface/hvm/hvm_op.h
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -16,6 +16,9 @@ struct xen_hvm_param {
 };
 DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);
 
+/* Flushes all VCPU TLBs: @arg must be NULL. */
+#define HVMOP_flush_tlbs            5
+
 /* Hint from PV drivers for pagetable destruction. */
 #define HVMOP_pagetable_dying       9
 struct xen_hvm_pagetable_dying {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-17 12:22 ` [PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall Metin Kaya
@ 2023-04-17 16:31   ` Sean Christopherson
  2023-04-18  9:14     ` [EXTERNAL][PATCH " David Woodhouse
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2023-04-17 16:31 UTC (permalink / raw)
  To: Metin Kaya
  Cc: kvm, pbonzini, x86, bp, dwmw, paul, tglx, mingo, dave.hansen,
	joao.m.martins

On Mon, Apr 17, 2023, Metin Kaya wrote:
> HVMOP_flush_tlbs suboperation of hvm_op hypercall allows a guest to
> flush all vCPU TLBs. There is no way for the VMM to flush TLBs from
> userspace.

Ah, took me a minute to connect the dots.  Monday morning is definitely partly
to blame, but it would be helpful to expand this sentence to be more explicit as
to why userspace's inability to efficiently flush TLBs.

And strictly speaking, userspace _can_ flush TLBs, just not in a precise, efficient
way.

> Hence, this patch adds support for flushing vCPU TLBs to KVM

Don't use "this patch", just state what the patch does as a command.

> by making a KVM_REQ_TLB_FLUSH_GUEST request for all guest vCPUs.

E.g.

  Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
  allows the guest to flush all vCPU's TLBs.  KVM doesn't provide an ioctl()
  to precisely flush guest TLBs, and punting to userspace would likely
  negate the performance benefits of avoiding a TLB shootdown in the guest.

> Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>
> 
> ---

Regarding the cover letter, first and foremost, make sure the cover letter has
a subject.  `git format-patch --cover-letter` will generate what you want, i.e.
there's no need hand generate it (or however you ended up with a nearly empty
mail with no subjection.

Second, there's no need to provide a cover letter for a standalone patch just to
capture the version information.  This area of the patch, between the three "---"
and the diff, is ignored by `git am`, and can be used for version info.

>  arch/x86/kvm/xen.c                 | 31 ++++++++++++++++++++++++++++++
>  include/xen/interface/hvm/hvm_op.h |  3 +++

Modifications to uapi headers is conspicuously missing.  I.e. there likely needs
to be a capability so that userspace can query support.

>  2 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 40edf4d1974c..78fa6d08bebc 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -21,6 +21,7 @@
>  #include <xen/interface/vcpu.h>
>  #include <xen/interface/version.h>
>  #include <xen/interface/event_channel.h>
> +#include <xen/interface/hvm/hvm_op.h>
>  #include <xen/interface/sched.h>
>  
>  #include <asm/xen/cpuid.h>
> @@ -1330,6 +1331,32 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode,
>  	return false;
>  }
>  
> +static void kvm_xen_hvmop_flush_tlbs(struct kvm_vcpu *vcpu, bool longmode,

Passing @longmode all the way down here just to ignore just screams copy+paste :-)

> +				     u64 arg, u64 *r)
> +{
> +	if (arg) {
> +		*r = -EINVAL;
> +		return;
> +	}
> +
> +	kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);

This doesn't even compile.  I'll give you one guess as to how much confidence I
have that this was properly tested.

Aha!  And QEMU appears to have Xen emulation support.  That means KVM-Unit-Tests
is an option.  Specifically, extend the "access" test to use this hypercall instead
of INVLPG.  That'll verify that the flush is actually being performed as expteced.

> +	*r = 0;

Using a out param in a void function is silly.  But that's a moot point because
this whole function is silly, just open code it in the caller.

> +}
> +
> +static bool kvm_xen_hcall_hvm_op(struct kvm_vcpu *vcpu, bool longmode,

There's no need for @longmode.

> +				 int cmd, u64 arg, u64 *r)
> +{
> +	switch (cmd) {
> +	case HVMOP_flush_tlbs:
> +		kvm_xen_hvmop_flush_tlbs(vcpu, longmode, arg, r);

This can simply be:

		if (arg) {
			*r = -EINVAL;
		} else {
			kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_TLB_FLUSH_GUEST);
			*r = 0;
		}
		return true;

> +		return true;
> +	default:
> +		break;
> +	}
> +
> +	return false;
> +}
> +
>  struct compat_vcpu_set_singleshot_timer {
>      uint64_t timeout_abs_ns;
>      uint32_t flags;
> @@ -1501,6 +1528,10 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>  			timeout |= params[1] << 32;
>  		handled = kvm_xen_hcall_set_timer_op(vcpu, timeout, &r);
>  		break;
> +	case __HYPERVISOR_hvm_op:
> +		handled = kvm_xen_hcall_hvm_op(vcpu, longmode, params[0], params[1],
> +					       &r);

It'll be a moot point, but in the future let code like this poke out past 80 chars.
The 80 char soft limit exists to make code more readable, wrapping a one-char variable
at the tail end of a function call does more harm than good.

> +		break;
>  	}
>  	default:
>  		break;
> diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
> index 03134bf3cec1..373123226c6f 100644
> --- a/include/xen/interface/hvm/hvm_op.h
> +++ b/include/xen/interface/hvm/hvm_op.h
> @@ -16,6 +16,9 @@ struct xen_hvm_param {
>  };
>  DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);
>  
> +/* Flushes all VCPU TLBs:

s/VCPU/vCPU

And "all vCPU TLBs" is ambiguous.  It could mean "all TLBs for the target vCPU".
Adding "guest" in there would also be helpful, e.g. to make it clear that this
doesn't flush TDP mappings.

> @arg must be NULL. */

Is the "NULL" part verbatim from Xen?  Because "0" seems like a better description
than "NULL" for a u64.

E.g.

  "Flush guest TLBs for all vCPUs"

> +#define HVMOP_flush_tlbs            5
> +
>  /* Hint from PV drivers for pagetable destruction. */
>  #define HVMOP_pagetable_dying       9
>  struct xen_hvm_pagetable_dying {
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXTERNAL][PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-17 16:31   ` Sean Christopherson
@ 2023-04-18  9:14     ` David Woodhouse
  2023-04-18 10:13       ` [PATCH v3] " Metin Kaya
  2023-04-18 14:44       ` [EXTERNAL][PATCH v2] " Sean Christopherson
  0 siblings, 2 replies; 17+ messages in thread
From: David Woodhouse @ 2023-04-18  9:14 UTC (permalink / raw)
  To: Sean Christopherson, Metin Kaya
  Cc: kvm, pbonzini, x86, bp, paul, tglx, mingo, dave.hansen,
	joao.m.martins

[-- Attachment #1: Type: text/plain, Size: 3650 bytes --]

On Mon, 2023-04-17 at 09:31 -0700, Sean Christopherson wrote:
> On Mon, Apr 17, 2023, Metin Kaya wrote:
> > HVMOP_flush_tlbs suboperation of hvm_op hypercall allows a guest to
> > flush all vCPU TLBs. There is no way for the VMM to flush TLBs from
> > userspace.
> 
> Ah, took me a minute to connect the dots.  Monday morning is definitely partly
> to blame, but it would be helpful to expand this sentence to be more explicit as
> to why userspace's inability to efficiently flush TLBs.
> 
> And strictly speaking, userspace _can_ flush TLBs, just not in a precise, efficient
> way.

Hm, how? We should probably implement that in userspace as a fallback,
however much it sucks.

> >  arch/x86/kvm/xen.c                 | 31 ++++++++++++++++++++++++++++++
> >  include/xen/interface/hvm/hvm_op.h |  3 +++
> 
> Modifications to uapi headers is conspicuously missing.  I.e. there likely needs
> to be a capability so that userspace can query support.

Nah, nobody cares. If the kernel "accelerates" this hypercall, so be
it. Userspace will just never get the KVM_EXIT_XEN for that hypercall
because it'll be magically handled, like the others.

> 
> > +                                  u64 arg, u64 *r)
> > +{
> > +     if (arg) {
> > +             *r = -EINVAL;
> > +             return;
> > +     }
> > +
> > +     kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
> 
> This doesn't even compile.  I'll give you one guess as to how much confidence I
> have that this was properly tested.
> 
> Aha!  And QEMU appears to have Xen emulation support. 

As of 8.0, yes :)

>  That means KVM-Unit-Tests
> is an option.  Specifically, extend the "access" test to use this hypercall instead
> of INVLPG.  That'll verify that the flush is actually being performed as expteced.
> 


> > +                              int cmd, u64 arg, u64 *r)
> > +{
> > +     switch (cmd) {
> > +     case HVMOP_flush_tlbs:
> > +             kvm_xen_hvmop_flush_tlbs(vcpu, longmode, arg, r);
> 
> This can simply be:
> 
>                 if (arg) {
>                         *r = -EINVAL;

I don't even know that we care about in-kernel acceleration for the
-EINVAL case. We could just return false for that, and let userspace
(report and) handle it.

It should never happen; Xen has never accepted anything but NULL as the
first argument. For reasons unclear.

    case HVMOP_flush_tlbs:
        rc = guest_handle_is_null(arg) ? hvmop_flush_tlb_all() : -EINVAL;


I don't think we'll be adding any more HVMOPs to that switch statement
so we might as well make it:

 if (cmd == HVMOP_flush_tlbs && !arg)


>                 } else {
>                         kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_TLB_FLUSH_GUEST);
>                         *r = 0;
>                 }
>                 return true;

...

> > +/* Flushes all VCPU TLBs:
> 
> s/VCPU/vCPU
> 
> And "all vCPU TLBs" is ambiguous.  It could mean "all TLBs for the target vCPU".
> Adding "guest" in there would also be helpful, e.g. to make it clear that this
> doesn't flush TDP mappings.
> 
> > @arg must be NULL. */
> 
> Is the "NULL" part verbatim from Xen?  Because "0" seems like a better description
> than "NULL" for a u64.

Yes, that's all verbatim from Xen. The first hypercall argument is
typically a pointer. 


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18  9:14     ` [EXTERNAL][PATCH " David Woodhouse
@ 2023-04-18 10:13       ` Metin Kaya
  2023-04-18 10:48         ` Paul Durrant
                           ` (2 more replies)
  2023-04-18 14:44       ` [EXTERNAL][PATCH v2] " Sean Christopherson
  1 sibling, 3 replies; 17+ messages in thread
From: Metin Kaya @ 2023-04-18 10:13 UTC (permalink / raw)
  To: kvm, pbonzini
  Cc: x86, bp, dwmw, paul, seanjc, tglx, mingo, dave.hansen,
	joao.m.martins, Metin Kaya

Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
allows the guest to flush all vCPU's TLBs. KVM doesn't provide an
ioctl() to precisely flush guest TLBs, and punting to userspace would
likely negate the performance benefits of avoiding a TLB shootdown in
the guest.

Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>

---
v3:
  - Addressed comments for v2.
  - Verified with XTF/invlpg test case.

v2:
  - Removed an irrelevant URL from commit message.
---
 arch/x86/kvm/xen.c                 | 15 +++++++++++++++
 include/xen/interface/hvm/hvm_op.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 40edf4d1974c..a63c48e8d8fa 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -21,6 +21,7 @@
 #include <xen/interface/vcpu.h>
 #include <xen/interface/version.h>
 #include <xen/interface/event_channel.h>
+#include <xen/interface/hvm/hvm_op.h>
 #include <xen/interface/sched.h>
 
 #include <asm/xen/cpuid.h>
@@ -1330,6 +1331,17 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode,
 	return false;
 }
 
+static bool kvm_xen_hcall_hvm_op(struct kvm_vcpu *vcpu, int cmd, u64 arg, u64 *r)
+{
+	if (cmd == HVMOP_flush_tlbs && !arg) {
+		kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_TLB_FLUSH_GUEST);
+		*r = 0;
+		return true;
+	}
+
+	return false;
+}
+
 struct compat_vcpu_set_singleshot_timer {
     uint64_t timeout_abs_ns;
     uint32_t flags;
@@ -1501,6 +1513,9 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
 			timeout |= params[1] << 32;
 		handled = kvm_xen_hcall_set_timer_op(vcpu, timeout, &r);
 		break;
+	case __HYPERVISOR_hvm_op:
+		handled = kvm_xen_hcall_hvm_op(vcpu, params[0], params[1], &r);
+		break;
 	}
 	default:
 		break;
diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
index 03134bf3cec1..240d8149bc04 100644
--- a/include/xen/interface/hvm/hvm_op.h
+++ b/include/xen/interface/hvm/hvm_op.h
@@ -16,6 +16,9 @@ struct xen_hvm_param {
 };
 DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);
 
+/* Flushes guest TLBs for all vCPUs: @arg must be 0. */
+#define HVMOP_flush_tlbs            5
+
 /* Hint from PV drivers for pagetable destruction. */
 #define HVMOP_pagetable_dying       9
 struct xen_hvm_pagetable_dying {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 10:13       ` [PATCH v3] " Metin Kaya
@ 2023-04-18 10:48         ` Paul Durrant
  2023-04-18 11:04           ` Kaya, Metin
  2023-04-18 11:05           ` [EXTERNAL][PATCH " David Woodhouse
  2023-05-26 20:32         ` [PATCH " Sean Christopherson
  2023-08-04  0:22         ` Sean Christopherson
  2 siblings, 2 replies; 17+ messages in thread
From: Paul Durrant @ 2023-04-18 10:48 UTC (permalink / raw)
  To: Metin Kaya, kvm, pbonzini
  Cc: x86, bp, dwmw, seanjc, tglx, mingo, dave.hansen, joao.m.martins

On 18/04/2023 11:13, Metin Kaya wrote:
> Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
> allows the guest to flush all vCPU's TLBs. KVM doesn't provide an
> ioctl() to precisely flush guest TLBs, and punting to userspace would
> likely negate the performance benefits of avoiding a TLB shootdown in
> the guest.
> 
> Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>
> 
> ---
> v3:
>    - Addressed comments for v2.
>    - Verified with XTF/invlpg test case.
> 
> v2:
>    - Removed an irrelevant URL from commit message.
> ---
>   arch/x86/kvm/xen.c                 | 15 +++++++++++++++
>   include/xen/interface/hvm/hvm_op.h |  3 +++
>   2 files changed, 18 insertions(+)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 40edf4d1974c..a63c48e8d8fa 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -21,6 +21,7 @@
>   #include <xen/interface/vcpu.h>
>   #include <xen/interface/version.h>
>   #include <xen/interface/event_channel.h>
> +#include <xen/interface/hvm/hvm_op.h>
>   #include <xen/interface/sched.h>
>   
>   #include <asm/xen/cpuid.h>
> @@ -1330,6 +1331,17 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode,
>   	return false;
>   }
>   
> +static bool kvm_xen_hcall_hvm_op(struct kvm_vcpu *vcpu, int cmd, u64 arg, u64 *r)
> +{
> +	if (cmd == HVMOP_flush_tlbs && !arg) {
> +		kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_TLB_FLUSH_GUEST);
> +		*r = 0;
> +		return true;
> +	}
> +
> +	return false;
> +}

This code structure means that arg != NULL will result in the guest 
seeing ENOSYS rather than EINVAL.

   Paul

> +
>   struct compat_vcpu_set_singleshot_timer {
>       uint64_t timeout_abs_ns;
>       uint32_t flags;
> @@ -1501,6 +1513,9 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>   			timeout |= params[1] << 32;
>   		handled = kvm_xen_hcall_set_timer_op(vcpu, timeout, &r);
>   		break;
> +	case __HYPERVISOR_hvm_op:
> +		handled = kvm_xen_hcall_hvm_op(vcpu, params[0], params[1], &r);
> +		break;
>   	}
>   	default:
>   		break;
> diff --git a/include/xen/interface/hvm/hvm_op.h b/include/xen/interface/hvm/hvm_op.h
> index 03134bf3cec1..240d8149bc04 100644
> --- a/include/xen/interface/hvm/hvm_op.h
> +++ b/include/xen/interface/hvm/hvm_op.h
> @@ -16,6 +16,9 @@ struct xen_hvm_param {
>   };
>   DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);
>   
> +/* Flushes guest TLBs for all vCPUs: @arg must be 0. */
> +#define HVMOP_flush_tlbs            5
> +
>   /* Hint from PV drivers for pagetable destruction. */
>   #define HVMOP_pagetable_dying       9
>   struct xen_hvm_pagetable_dying {


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 10:48         ` Paul Durrant
@ 2023-04-18 11:04           ` Kaya, Metin
  2023-04-18 11:13             ` Paul Durrant
  2023-04-18 11:05           ` [EXTERNAL][PATCH " David Woodhouse
  1 sibling, 1 reply; 17+ messages in thread
From: Kaya, Metin @ 2023-04-18 11:04 UTC (permalink / raw)
  To: paul@xen.org
  Cc: x86@kernel.org, bp@alien8.de, Woodhouse, David, seanjc@google.com,
	tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
	joao.m.martins@oracle.com, pbonzini@redhat.com,
	kvm@vger.kernel.org

On 18/04/2023 11:48, Paul Durrant wrote:
>On 18/04/2023 11:13, Metin Kaya wrote:
>> Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
>> allows the guest to flush all vCPU's TLBs. KVM doesn't provide an
>> ioctl() to precisely flush guest TLBs, and punting to userspace would
>> likely negate the performance benefits of avoiding a TLB shootdown in
>> the guest.
>>
>> Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>
>>
>> ---
>> v3:
>>    - Addressed comments for v2.
>>    - Verified with XTF/invlpg test case.
>>
>> v2:
>>    - Removed an irrelevant URL from commit message.
>> ---
>>   arch/x86/kvm/xen.c                 | 15 +++++++++++++++
>>   include/xen/interface/hvm/hvm_op.h |  3 +++
>>   2 files changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index
>> 40edf4d1974c..a63c48e8d8fa 100644
>> --- a/arch/x86/kvm/xen.c
>> +++ b/arch/x86/kvm/xen.c
>> @@ -21,6 +21,7 @@
>>   #include <xen/interface/vcpu.h>
>>   #include <xen/interface/version.h>
>>   #include <xen/interface/event_channel.h>
>> +#include <xen/interface/hvm/hvm_op.h>
>>   #include <xen/interface/sched.h>
>>
>>   #include <asm/xen/cpuid.h>
>> @@ -1330,6 +1331,17 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode,
>>       return false;
>>   }
>>
>> +static bool kvm_xen_hcall_hvm_op(struct kvm_vcpu *vcpu, int cmd, u64
>> +arg, u64 *r) {
>> +     if (cmd == HVMOP_flush_tlbs && !arg) {
>> +             kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_TLB_FLUSH_GUEST);
>> +             *r = 0;
>> +             return true;
>> +     }
>> +
>> +     return false;
>> +}
>
>This code structure means that arg != NULL will result in the guest seeing ENOSYS rather than EINVAL.
>
>   Paul

Yes, because of this comment in David's email:
"I don't even know that we care about in-kernel acceleration for the
-EINVAL case. We could just return false for that, and let userspace
(report and) handle it."

>
>> +
>>   struct compat_vcpu_set_singleshot_timer {
>>       uint64_t timeout_abs_ns;
>>       uint32_t flags;
>> @@ -1501,6 +1513,9 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
>>                       timeout |= params[1] << 32;
>>               handled = kvm_xen_hcall_set_timer_op(vcpu, timeout, &r);
>>               break;
>> +     case __HYPERVISOR_hvm_op:
>> +             handled = kvm_xen_hcall_hvm_op(vcpu, params[0], params[1], &r);
>> +             break;
>>       }
>>       default:
>>               break;
>> diff --git a/include/xen/interface/hvm/hvm_op.h
>> b/include/xen/interface/hvm/hvm_op.h
>> index 03134bf3cec1..240d8149bc04 100644
>> --- a/include/xen/interface/hvm/hvm_op.h
>> +++ b/include/xen/interface/hvm/hvm_op.h
>> @@ -16,6 +16,9 @@ struct xen_hvm_param {
>>   };
>>   DEFINE_GUEST_HANDLE_STRUCT(xen_hvm_param);
>>
>> +/* Flushes guest TLBs for all vCPUs: @arg must be 0. */
>> +#define HVMOP_flush_tlbs            5
>> +
>>   /* Hint from PV drivers for pagetable destruction. */
>>   #define HVMOP_pagetable_dying       9
>>   struct xen_hvm_pagetable_dying {
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 11:04           ` Kaya, Metin
@ 2023-04-18 11:13             ` Paul Durrant
  0 siblings, 0 replies; 17+ messages in thread
From: Paul Durrant @ 2023-04-18 11:13 UTC (permalink / raw)
  To: Kaya, Metin
  Cc: x86@kernel.org, bp@alien8.de, Woodhouse, David, seanjc@google.com,
	tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com,
	joao.m.martins@oracle.com, pbonzini@redhat.com,
	kvm@vger.kernel.org

On 18/04/2023 12:04, Kaya, Metin wrote:
> On 18/04/2023 11:48, Paul Durrant wrote:
>> On 18/04/2023 11:13, Metin Kaya wrote:
>>> Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
>>> allows the guest to flush all vCPU's TLBs. KVM doesn't provide an
>>> ioctl() to precisely flush guest TLBs, and punting to userspace would
>>> likely negate the performance benefits of avoiding a TLB shootdown in
>>> the guest.
>>>
>>> Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>
>>>
>>> ---
>>> v3:
>>>     - Addressed comments for v2.
>>>     - Verified with XTF/invlpg test case.
>>>
>>> v2:
>>>     - Removed an irrelevant URL from commit message.
>>> ---
>>>    arch/x86/kvm/xen.c                 | 15 +++++++++++++++
>>>    include/xen/interface/hvm/hvm_op.h |  3 +++
>>>    2 files changed, 18 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index
>>> 40edf4d1974c..a63c48e8d8fa 100644
>>> --- a/arch/x86/kvm/xen.c
>>> +++ b/arch/x86/kvm/xen.c
>>> @@ -21,6 +21,7 @@
>>>    #include <xen/interface/vcpu.h>
>>>    #include <xen/interface/version.h>
>>>    #include <xen/interface/event_channel.h>
>>> +#include <xen/interface/hvm/hvm_op.h>
>>>    #include <xen/interface/sched.h>
>>>
>>>    #include <asm/xen/cpuid.h>
>>> @@ -1330,6 +1331,17 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode,
>>>        return false;
>>>    }
>>>
>>> +static bool kvm_xen_hcall_hvm_op(struct kvm_vcpu *vcpu, int cmd, u64
>>> +arg, u64 *r) {
>>> +     if (cmd == HVMOP_flush_tlbs && !arg) {
>>> +             kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_TLB_FLUSH_GUEST);
>>> +             *r = 0;
>>> +             return true;
>>> +     }
>>> +
>>> +     return false;
>>> +}
>>
>> This code structure means that arg != NULL will result in the guest seeing ENOSYS rather than EINVAL.
>>
>>    Paul
> 
> Yes, because of this comment in David's email:
> "I don't even know that we care about in-kernel acceleration for the
> -EINVAL case. We could just return false for that, and let userspace
> (report and) handle it."
> 

Ok, fair enough.

Reviewed-by: Paul Durrant <paul@xen.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXTERNAL][PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 10:48         ` Paul Durrant
  2023-04-18 11:04           ` Kaya, Metin
@ 2023-04-18 11:05           ` David Woodhouse
  1 sibling, 0 replies; 17+ messages in thread
From: David Woodhouse @ 2023-04-18 11:05 UTC (permalink / raw)
  To: paul, Metin Kaya, kvm, pbonzini
  Cc: x86, bp, seanjc, tglx, mingo, dave.hansen, joao.m.martins

[-- Attachment #1: Type: text/plain, Size: 2441 bytes --]

On Tue, 2023-04-18 at 11:48 +0100, Paul Durrant wrote:
> On 18/04/2023 11:13, Metin Kaya wrote:
> > Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
> > allows the guest to flush all vCPU's TLBs. KVM doesn't provide an
> > ioctl() to precisely flush guest TLBs, and punting to userspace would
> > likely negate the performance benefits of avoiding a TLB shootdown in
> > the guest.
> > 
> > Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>

Although as noted on the internal review and by Sean, it would be good
to have a test case that verifies that the TLBs are actually flushed.

> > 
> > ---
> > v3:
> >    - Addressed comments for v2.
> >    - Verified with XTF/invlpg test case.
> > 
> > v2:
> >    - Removed an irrelevant URL from commit message.
> > ---
> >   arch/x86/kvm/xen.c                 | 15 +++++++++++++++
> >   include/xen/interface/hvm/hvm_op.h |  3 +++
> >   2 files changed, 18 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index 40edf4d1974c..a63c48e8d8fa 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -21,6 +21,7 @@
> >   #include <xen/interface/vcpu.h>
> >   #include <xen/interface/version.h>
> >   #include <xen/interface/event_channel.h>
> > +#include <xen/interface/hvm/hvm_op.h>
> >   #include <xen/interface/sched.h>
> > 
> >   #include <asm/xen/cpuid.h>
> > @@ -1330,6 +1331,17 @@ static bool kvm_xen_hcall_sched_op(struct kvm_vcpu *vcpu, bool longmode,
> >       return false;
> >   }
> > 
> > +static bool kvm_xen_hcall_hvm_op(struct kvm_vcpu *vcpu, int cmd, u64 arg, u64 *r)
> > +{
> > +     if (cmd == HVMOP_flush_tlbs && !arg) {
> > +             kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_TLB_FLUSH_GUEST);
> > +             *r = 0;
> > +             return true;
> > +     }
> > +
> > +     return false;
> > +}
> 
> This code structure means that arg != NULL will result in the guest
> seeing ENOSYS rather than EINVAL.

In kvm_xen_hypercall() the default for 'r' is -ENOSYS but because
'handled' never gets set to true, we don't hand that back to the guest.
Instead we get to handle_in_userspace: and do the KVM_EXIT_XEN exit.

So arg != NULL will cause the standard hypercall exit to userspace just
as it does today.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 10:13       ` [PATCH v3] " Metin Kaya
  2023-04-18 10:48         ` Paul Durrant
@ 2023-05-26 20:32         ` Sean Christopherson
  2023-07-25 12:56           ` David Woodhouse
  2023-08-04  0:22         ` Sean Christopherson
  2 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2023-05-26 20:32 UTC (permalink / raw)
  To: Metin Kaya
  Cc: kvm, pbonzini, x86, bp, dwmw, paul, tglx, mingo, dave.hansen,
	joao.m.martins

On Tue, Apr 18, 2023, Metin Kaya wrote:
> Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
> allows the guest to flush all vCPU's TLBs. KVM doesn't provide an
> ioctl() to precisely flush guest TLBs, and punting to userspace would
> likely negate the performance benefits of avoiding a TLB shootdown in
> the guest.
> 
> Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>
> 
> ---
> v3:
>   - Addressed comments for v2.
>   - Verified with XTF/invlpg test case.
> 
> v2:
>   - Removed an irrelevant URL from commit message.
> ---
>  arch/x86/kvm/xen.c                 | 15 +++++++++++++++
>  include/xen/interface/hvm/hvm_op.h |  3 +++
>  2 files changed, 18 insertions(+)

I still don't see a testcase.

  : This doesn't even compile.  I'll give you one guess as to how much confidence I
  : have that this was properly tested.
  : 
  : Aha!  And QEMU appears to have Xen emulation support.  That means KVM-Unit-Tests
  : is an option.  Specifically, extend the "access" test to use this hypercall instead
  : of INVLPG.  That'll verify that the flush is actually being performed as expteced.

Let me be more explicit this time: I am not applying this without a test.  I don't
care how trivial a patch may seem, I'm done taking patches without test coverage
unless there's a *really* good reason for me to do so.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-05-26 20:32         ` [PATCH " Sean Christopherson
@ 2023-07-25 12:56           ` David Woodhouse
  2023-07-26 20:07             ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2023-07-25 12:56 UTC (permalink / raw)
  To: Sean Christopherson, Metin Kaya
  Cc: kvm, pbonzini, x86, bp, paul, tglx, mingo, dave.hansen,
	joao.m.martins

[-- Attachment #1: Type: text/plain, Size: 3957 bytes --]

On Fri, 2023-05-26 at 13:32 -0700, Sean Christopherson wrote:
>   : Aha!  And QEMU appears to have Xen emulation support.  That means KVM-Unit-Tests
>   : is an option.  Specifically, extend the "access" test to use this hypercall instead
>   : of INVLPG.  That'll verify that the flush is actually being performed as expteced.

That works. Metin has a better version that actually sets up the
hypercall page properly and uses it, but that one bails out when Xen
support isn't present, and doesn't show the failure mode quite so
clearly. This is the simple version:

From: Metin Kaya <metikaya@amazon.co.uk>
Subject: [PATCH] x86/access: Use hvm_op/flush_tlbs hypercall instead of invlpg instruction

Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>
---
 x86/access.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/x86/access.c b/x86/access.c
index 83c8221..6fa08c5 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -253,12 +253,31 @@ static void clear_user_mask(pt_element_t *ptep, int level, unsigned long virt)
 	*ptep &= ~PT_USER_MASK;
 }
 
+#define KVM_HYPERCALL_INTEL ".byte 0x0f,0x01,0xc1"
+#define __HYPERVISOR_hvm_op 34
+#define HVMOP_flush_tlbs    5
+
+static inline long hvm_op_flush_tlbs(void)
+{
+	long ret;
+	uint64_t nr = __HYPERVISOR_hvm_op;
+	uint64_t op = HVMOP_flush_tlbs;
+	uint64_t arg = 0;
+
+	asm volatile(KVM_HYPERCALL_INTEL
+		     : "=a"(ret)
+		     : "a"(nr), "D" (op), "S" (arg)
+		     : "memory");
+
+	return ret;
+}
+
 static void set_user_mask(pt_element_t *ptep, int level, unsigned long virt)
 {
 	*ptep |= PT_USER_MASK;
 
 	/* Flush to avoid spurious #PF */
-	invlpg((void*)virt);
+	hvm_op_flush_tlbs();
 }
 
 static unsigned set_cr4_smep(ac_test_t *at, int smep)
@@ -577,7 +596,7 @@ fault:
 
 static void ac_set_expected_status(ac_test_t *at)
 {
-	invlpg(at->virt);
+	hvm_op_flush_tlbs();
 
 	if (at->ptep)
 		at->expected_pte = *at->ptep;
-- 
2.34.1


Run it with Xen support enabled and it works...

$ qemu-system-x86_64 -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -M pc -device pci-testdev --accel kvm,xen-version=0x4000a,kernel-irqchip=split  -kernel x86/access_test.flat
enabling apic
starting test

run
CR4.PKE not available, disabling PKE tests
CR4.SMEP not available, disabling SMEP tests
.............................
1916935 tests, 0 failures


To be sure, run it without Xen support, the hypercall silently fails
and the test fails....

$ qemu-system-x86_64 -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -M pc -device pci-testdev --accel kvm  -kernel x86/access_test.flat
enabling apic
starting test

run
CR4.PKE not available, disabling PKE tests
CR4.SMEP not available, disabling SMEP tests

test pte.rw pte.p pde.p: FAIL: pte 2000003 expected 2000023
Dump mapping: address: 0xffff923400000000
------L4 I292: 2100027
------L3 I208: 2101027
------L2 I0: 2102061
------L1 I0: 2000003

test pte.user pte.p pde.p: FAIL: pte 2000005 expected 2000025
Dump mapping: address: 0xffff923400000000
------L4 I292: 2100027
------L3 I208: 2101027
------L2 I0: 2102061
------L1 I0: 2000005

…

> Let me be more explicit this time: I am not applying this without a test.  I don't
> care how trivial a patch may seem, I'm done taking patches without test coverage
> unless there's a *really* good reason for me to do so.

Understood.

So, we know it *works*, but the above is a one-off and not a
*regression* test.

It would definitely be nice to have regression tests that cover
absolutely everything, but adding Xen guest support to the generic KVM-
Unit-Tests might be considered overkill, because this *particular*
thing is fairly unlikely to regress? It really is just calling
kvm_flush_remote_tlbs(), and if that goes wrong we're probably likely
to notice it anyway.

What do you think?


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-07-25 12:56           ` David Woodhouse
@ 2023-07-26 20:07             ` Sean Christopherson
  2023-07-27 12:04               ` David Woodhouse
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2023-07-26 20:07 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Metin Kaya, kvm, pbonzini, x86, bp, paul, tglx, mingo,
	dave.hansen, joao.m.martins

On Tue, Jul 25, 2023, David Woodhouse wrote:
> On Fri, 2023-05-26 at 13:32 -0700, Sean Christopherson wrote:
> >   : Aha!  And QEMU appears to have Xen emulation support.  That means KVM-Unit-Tests
> >   : is an option.  Specifically, extend the "access" test to use this hypercall instead
> >   : of INVLPG.  That'll verify that the flush is actually being performed as expteced.
> 
> That works. Metin has a better version that actually sets up the
> hypercall page properly and uses it, but that one bails out when Xen
> support isn't present, and doesn't show the failure mode quite so
> clearly. This is the simple version:

IIUC, y'all have already written both tests, so why not post both?  I certainly
won't object to more tests if they provide different coverage.

> > Let me be more explicit this time: I am not applying this without a test.  I don't
> > care how trivial a patch may seem, I'm done taking patches without test coverage
> > unless there's a *really* good reason for me to do so.
> 
> Understood.
> 
> So, we know it *works*, but the above is a one-off and not a
> *regression* test.
> 
> It would definitely be nice to have regression tests that cover
> absolutely everything, but adding Xen guest support to the generic KVM-
> Unit-Tests might be considered overkill, because this *particular*
> thing is fairly unlikely to regress? It really is just calling
> kvm_flush_remote_tlbs(), and if that goes wrong we're probably likely
> to notice it anyway.
> 
> What do you think?

I'm a-ok with just having basic test coverage.  We don't have anywhere near 100%
(or even 10%...) coverage of KVM's TLB management, so it would be ridiculous to
require that for Xen PV.

I'd definitely love to change that, and raise the bar for test coverage in general,
but that'll take time (and effort).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-07-26 20:07             ` Sean Christopherson
@ 2023-07-27 12:04               ` David Woodhouse
  2023-07-28  7:31                 ` Kaya, Metin
  0 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2023-07-27 12:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Metin Kaya, kvm, pbonzini, x86, bp, paul, tglx, mingo,
	dave.hansen, joao.m.martins

[-- Attachment #1: Type: text/plain, Size: 5085 bytes --]

On Wed, 2023-07-26 at 13:07 -0700, Sean Christopherson wrote:
> On Tue, Jul 25, 2023, David Woodhouse wrote:
> > On Fri, 2023-05-26 at 13:32 -0700, Sean Christopherson wrote:
> > >   : Aha!  And QEMU appears to have Xen emulation support.  That means KVM-Unit-Tests
> > >   : is an option.  Specifically, extend the "access" test to use this hypercall instead
> > >   : of INVLPG.  That'll verify that the flush is actually being performed as expteced.
> > 
> > That works. Metin has a better version that actually sets up the
> > hypercall page properly and uses it, but that one bails out when Xen
> > support isn't present, and doesn't show the failure mode quite so
> > clearly. This is the simple version:
> 
> IIUC, y'all have already written both tests, so why not post both?  I certainly
> won't object to more tests if they provide different coverage.

Yeah, it just needed cleaning up.

This is what we have; Metin will submit it for real after a little more
polishing. It modifies the existing access test so that *if* it's run
in a Xen environment, and *if* the HVMOP_flush_tlbs call returns
success instead of -ENOSYS, it'll use that instead of invlpg.

In itself, that doesn't give us an automatic regression tests, because
you still need to run it manually — as before,
 qemu-system-x86_64 -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev --accel kvm,xen-version=0x4000a,kernel-irqchip=split  -kernel ~/access_test.flat

If we really want to, we can look at making it run that way when qemu
and the host kernel support Xen guests...?

diff --git a/x86/access.c b/x86/access.c
index 83c8221..8c6e44a 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -4,6 +4,7 @@
 #include "asm/page.h"
 #include "x86/vm.h"
 #include "access.h"
+#include "alloc_page.h"
 
 #define true 1
 #define false 0
@@ -253,12 +254,90 @@ static void clear_user_mask(pt_element_t *ptep, int level, unsigned long virt)
 	*ptep &= ~PT_USER_MASK;
 }
 
+uint8_t *hypercall_page;
+
+#define __HYPERVISOR_hvm_op	34
+#define HVMOP_flush_tlbs	5
+
+static inline int do_hvm_op_flush_tlbs(void)
+{
+	long res = 0, _a1 = (long)(HVMOP_flush_tlbs), _a2 = (long)(NULL);
+
+	asm volatile ("call *%[offset]"
+#if defined(__x86_64__)
+		      : "=a" (res), "+D" (_a1), "+S" (_a2)
+#else
+		      : "=a" (res), "+b" (_a1), "+c" (_a2)
+#endif
+		      : [offset] "r" (hypercall_page + (__HYPERVISOR_hvm_op * 32))
+		      : "memory");
+
+	if (res)
+		printf("hvm_op/HVMOP_flush_tlbs failed: %ld.", res);
+
+	return (int)res;
+}
+
+#define XEN_CPUID_FIRST_LEAF    0x40000000
+#define XEN_CPUID_SIGNATURE_EBX 0x566e6558 /* "XenV" */
+#define XEN_CPUID_SIGNATURE_ECX 0x65584d4d /* "MMXe" */
+#define XEN_CPUID_SIGNATURE_EDX 0x4d4d566e /* "nVMM" */
+
+static void init_hypercalls(void)
+{
+	struct cpuid c;
+	u32 base;
+	bool found = false;
+
+	for (base = XEN_CPUID_FIRST_LEAF; base < XEN_CPUID_FIRST_LEAF + 0x10000;
+			base += 0x100) {
+		c = cpuid(base);
+		if ((c.b == XEN_CPUID_SIGNATURE_EBX) &&
+		    (c.c == XEN_CPUID_SIGNATURE_ECX) &&
+		    (c.d == XEN_CPUID_SIGNATURE_EDX) &&
+		    ((c.a - base) >= 2)) {
+			found = true;
+			break;
+		}
+	}
+	if (!found) {
+		printf("Using native invlpg instruction\n");
+		return;
+	}
+
+	hypercall_page = alloc_pages_flags(0, AREA_ANY | FLAG_DONTZERO);
+	if (!hypercall_page)
+		report_abort("failed to allocate hypercall page");
+
+	memset(hypercall_page, 0xc3, PAGE_SIZE);
+
+	c = cpuid(base + 2);
+	wrmsr(c.b, (u64)hypercall_page);
+	barrier();
+
+	if (hypercall_page[0] == 0xc3)
+		report_abort("Hypercall page not initialised correctly\n");
+
+	/*
+	 * Fall back to invlpg instruction if HVMOP_flush_tlbs hypercall is
+	 * unsuported.
+	 */
+	if (do_hvm_op_flush_tlbs()) {
+		printf("Using native invlpg instruction\n");
+		free_page(hypercall_page);
+		hypercall_page = NULL;
+		return;
+	}
+
+	printf("Using Xen HVMOP_flush_tlbs hypercall\n");
+}
+
 static void set_user_mask(pt_element_t *ptep, int level, unsigned long virt)
 {
 	*ptep |= PT_USER_MASK;
 
 	/* Flush to avoid spurious #PF */
-	invlpg((void*)virt);
+	hypercall_page ? do_hvm_op_flush_tlbs() : invlpg((void*)virt);
 }
 
 static unsigned set_cr4_smep(ac_test_t *at, int smep)
@@ -577,7 +656,7 @@ fault:
 
 static void ac_set_expected_status(ac_test_t *at)
 {
-	invlpg(at->virt);
+	hypercall_page ? do_hvm_op_flush_tlbs() : invlpg((void*)at->virt);
 
 	if (at->ptep)
 		at->expected_pte = *at->ptep;
@@ -1162,6 +1241,10 @@ int ac_test_run(int pt_levels)
 	printf("run\n");
 	tests = successes = 0;
 
+	setup_vm();
+
+	init_hypercalls();
+
 	shadow_cr0 = read_cr0();
 	shadow_cr4 = read_cr4();
 	shadow_cr3 = read_cr3();
@@ -1234,6 +1317,9 @@ int ac_test_run(int pt_levels)
 		successes += ac_test_cases[i](&pt_env);
 	}
 
+	if (hypercall_page)
+		free_page(hypercall_page);
+
 	printf("\n%d tests, %d failures\n", tests, tests - successes);
 
 	return successes == tests;
-- 
2.34.1



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-07-27 12:04               ` David Woodhouse
@ 2023-07-28  7:31                 ` Kaya, Metin
  0 siblings, 0 replies; 17+ messages in thread
From: Kaya, Metin @ 2023-07-28  7:31 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson
  Cc: kvm@vger.kernel.org, pbonzini@redhat.com, x86@kernel.org,
	bp@alien8.de, paul@xen.org, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, joao.m.martins@oracle.com

On Thu, Jul 27, 2023 at 1:04 PM, David Woodhouse wrote:
> On Wed, 2023-07-26 at 13:07 -0700, Sean Christopherson wrote:
> > On Tue, Jul 25, 2023, David Woodhouse wrote:
> > > On Fri, 2023-05-26 at 13:32 -0700, Sean Christopherson wrote:
> > > >   : Aha!  And QEMU appears to have Xen emulation support.  That means KVM-Unit-Tests
> > > >   : is an option.  Specifically, extend the "access" test to use this hypercall instead
> > > >   : of INVLPG.  That'll verify that the flush is actually being performed as expteced.
> > >
> > > That works. Metin has a better version that actually sets up the
> > > hypercall page properly and uses it, but that one bails out when Xen
> > > support isn't present, and doesn't show the failure mode quite so
> > > clearly. This is the simple version:
> >
> > IIUC, y'all have already written both tests, so why not post both?  I certainly
> > won't object to more tests if they provide different coverage.
>
> Yeah, it just needed cleaning up.
>
> This is what we have; Metin will submit it for real after a little more
> polishing. It modifies the existing access test so that *if* it's run
> in a Xen environment, and *if* the HVMOP_flush_tlbs call returns
> success instead of -ENOSYS, it'll use that instead of invlpg.

Patch is posted to kvm-unit-tests: https://marc.info/?l=kvm&m=169052867024191&w=2

> In itself, that doesn't give us an automatic regression tests, because
> you still need to run it manually — as before,
>  qemu-system-x86_64 -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev --accel kvm,xen-version=0x4000a,kernel-irqchip=split  -kernel ~/access_test.flat
>
> If we really want to, we can look at making it run that way when qemu
> and the host kernel support Xen guests...?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 10:13       ` [PATCH v3] " Metin Kaya
  2023-04-18 10:48         ` Paul Durrant
  2023-05-26 20:32         ` [PATCH " Sean Christopherson
@ 2023-08-04  0:22         ` Sean Christopherson
  2 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2023-08-04  0:22 UTC (permalink / raw)
  To: Metin Kaya
  Cc: kvm, pbonzini, x86, bp, dwmw, paul, tglx, mingo, dave.hansen,
	joao.m.martins

On Tue, Apr 18, 2023, Metin Kaya wrote:
> Implement in-KVM support for Xen's HVMOP_flush_tlbs hypercall, which
> allows the guest to flush all vCPU's TLBs. KVM doesn't provide an
> ioctl() to precisely flush guest TLBs, and punting to userspace would
> likely negate the performance benefits of avoiding a TLB shootdown in
> the guest.
> 
> Signed-off-by: Metin Kaya <metikaya@amazon.co.uk>
> 
> ---
> v3:
>   - Addressed comments for v2.
>   - Verified with XTF/invlpg test case.
> 
> v2:
>   - Removed an irrelevant URL from commit message.
> ---

I had applied this and even generated the "thank you", but then I actually ran
the KUT access test[*].  It may very well be a pre-existing bug, but that doesn't
change the fact that this patch breaks a test.

I apologize for being snippy, especially if it turns out this is unique to HSW
(or worse, my system), and your systems don't exhibit issues.  Getting the test
to run took way too long and I'm bit grumpy.

Anyways, whatever is going wrong needs to be sorted out before this can be applied.

[*] https://lore.kernel.org/all/ZMxDRg6gWsVLeYFL@google.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXTERNAL][PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18  9:14     ` [EXTERNAL][PATCH " David Woodhouse
  2023-04-18 10:13       ` [PATCH v3] " Metin Kaya
@ 2023-04-18 14:44       ` Sean Christopherson
  2023-04-18 15:47         ` David Woodhouse
  1 sibling, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2023-04-18 14:44 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Metin Kaya, kvm, pbonzini, x86, bp, paul, tglx, mingo,
	dave.hansen, joao.m.martins

On Tue, Apr 18, 2023, David Woodhouse wrote:
> On Mon, 2023-04-17 at 09:31 -0700, Sean Christopherson wrote:
> > On Mon, Apr 17, 2023, Metin Kaya wrote:
> > > HVMOP_flush_tlbs suboperation of hvm_op hypercall allows a guest to
> > > flush all vCPU TLBs. There is no way for the VMM to flush TLBs from
> > > userspace.
> > 
> > Ah, took me a minute to connect the dots.ï¿½ Monday morning is definitely partly
> > to blame, but it would be helpful to expand this sentence to be more explicit as
> > to why userspace's inability to efficiently flush TLBs.
> > 
> > And strictly speaking, userspace _can_ flush TLBs, just not in a precise, efficient
> > way.
> 
> Hm, how? We should probably implement that in userspace as a fallback,
> however much it sucks.

Oh, the suckage is high :-)  Use KVM_{G,S}ET_SREGS2 to toggle any CR{0,3,4}/EFER
bit and __set_sregs() will reset the MMU context.  Note that without this fix[*]
that I'm going to squeeze into 6.4, the MMU context reset may result in all TDP
MMU roots being freed and reallocated.

[*] https://lore.kernel.org/all/20230413231251.1481410-1-seanjc@google.com

> 
> > > ï¿½arch/x86/kvm/xen.cï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ | 31 ++++++++++++++++++++++++++++++
> > > ï¿½include/xen/interface/hvm/hvm_op.h |ï¿½ 3 +++
> > 
> > Modifications to uapi headers is conspicuously missing.ï¿½ I.e. there likely needs
> > to be a capability so that userspace can query support.
> 
> Nah, nobody cares. If the kernel "accelerates" this hypercall, so be
> it. Userspace will just never get the KVM_EXIT_XEN for that hypercall
> because it'll be magically handled, like the others.

Ah, that makes sense, I was thinking userspace would complain if it got the
"unexpected" exit.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXTERNAL][PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 14:44       ` [EXTERNAL][PATCH v2] " Sean Christopherson
@ 2023-04-18 15:47         ` David Woodhouse
  2023-04-18 16:08           ` David Woodhouse
  0 siblings, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2023-04-18 15:47 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Metin Kaya, kvm, pbonzini, x86, bp, paul, tglx, mingo,
	dave.hansen, joao.m.martins

[-- Attachment #1: Type: text/plain, Size: 3179 bytes --]

On Tue, 2023-04-18 at 07:44 -0700, Sean Christopherson wrote:
> On Tue, Apr 18, 2023, David Woodhouse wrote:
> > On Mon, 2023-04-17 at 09:31 -0700, Sean Christopherson wrote:
> > > On Mon, Apr 17, 2023, Metin Kaya wrote:
> > > > HVMOP_flush_tlbs suboperation of hvm_op hypercall allows a guest to
> > > > flush all vCPU TLBs. There is no way for the VMM to flush TLBs from
> > > > userspace.
> > > 
> > > Ah, took me a minute to connect the dots.ï¿½ Monday morning is definitely partly
> > > to blame, but it would be helpful to expand this sentence to be more explicit as
> > > to why userspace's inability to efficiently flush TLBs.
> > > 
> > > And strictly speaking, userspace _can_ flush TLBs, just not in a precise, efficient
> > > way.
> > 
> > Hm, how? We should probably implement that in userspace as a fallback,
> > however much it sucks.
> 
> Oh, the suckage is high :-)  Use KVM_{G,S}ET_SREGS2 to toggle any CR{0,3,4}/EFER
> bit and __set_sregs() will reset the MMU context.

Oh, that suckage is quite high indeed. I'm not actually sure I'll
bother doing this in QEMU. It's quite esoteric; Linux has never used it
and I think that after launching <mumble> million production Xen guests
this way we've only ever seen it used by some FreeBSD variant.

It doesn't seem to be in mainline FreeBSD; there's a hint of it here
https://people.freebsd.org/~dfr/freebsd-6.x-xen-31032009.diff for
example but it's disabled even then:

 	if (pmap == kernel_pmap || pmap->pm_active == all_cpus) {
-		invltlb();
-		smp_invltlb();
+#if defined(XENHVM) && defined(notdef)
+		/*
+		 * As far as I can tell, this makes things slower, at
+		 * least where there are only two physical cpus and
+		 * the host is not overcommitted.
+		 */
+		if (is_running_on_xen()) {
+			HYPERVISOR_hvm_op(HVMOP_flush_tlbs, NULL);
+		} else
+#endif
+		{
+			invltlb();
+			smp_invltlb();
+		}

>  Note that without this fix[*]
> that I'm going to squeeze into 6.4, the MMU context reset may result in all TDP
> MMU roots being freed and reallocated.
> 
> [*] https://lore.kernel.org/all/20230413231251.1481410-1-seanjc@google.com
> 
> > 
> > > > ï¿½arch/x86/kvm/xen.cï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ | 31 ++++++++++++++++++++++++++++++
> > > > ï¿½include/xen/interface/hvm/hvm_op.h |ï¿½ 3 +++
> > > 
> > > Modifications to uapi headers is conspicuously missing.ï¿½ I.e. there likely needs
> > > to be a capability so that userspace can query support.
> > 
> > Nah, nobody cares. If the kernel "accelerates" this hypercall, so be
> > it. Userspace will just never get the KVM_EXIT_XEN for that hypercall
> > because it'll be magically handled, like the others.
> 
> Ah, that makes sense, I was thinking userspace would complain if it got the
> "unexpected" exit.

I've tried to follow a model where userspace is *always* expected to
implement the hypercall, and if the kernel chooses to intervene to
accelerate it, that's a bonus.

It saves a bunch of complexity in error handling in the kernel when we
can just say "screw this, let userspace cope".

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXTERNAL][PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall
  2023-04-18 15:47         ` David Woodhouse
@ 2023-04-18 16:08           ` David Woodhouse
  0 siblings, 0 replies; 17+ messages in thread
From: David Woodhouse @ 2023-04-18 16:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Metin Kaya, kvm, pbonzini, x86, bp, paul, tglx, mingo,
	dave.hansen, joao.m.martins

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]

On Tue, 2023-04-18 at 16:47 +0100, David Woodhouse wrote:
> 
> Oh, that suckage is quite high indeed. I'm not actually sure I'll
> bother doing this in QEMU. It's quite esoteric; Linux has never used it
> and I think that after launching <mumble> million production Xen guests
> this way we've only ever seen it used by some FreeBSD variant.
> 
> It doesn't seem to be in mainline FreeBSD; there's a hint of it here
> https://people.freebsd.org/~dfr/freebsd-6.x-xen-31032009.diff for
> example but it's disabled even then:

For the record, Metin and Paul tell me I lied; it was actually seen
with a Windows guest using the ancient closed-source Citrix drivers.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-08-04  0:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20230417122206.34647-1-metikaya@amazon.co.uk>
2023-04-17 12:22 ` [PATCH v2] KVM: x86/xen: Implement hvm_op/HVMOP_flush_tlbs hypercall Metin Kaya
2023-04-17 16:31   ` Sean Christopherson
2023-04-18  9:14     ` [EXTERNAL][PATCH " David Woodhouse
2023-04-18 10:13       ` [PATCH v3] " Metin Kaya
2023-04-18 10:48         ` Paul Durrant
2023-04-18 11:04           ` Kaya, Metin
2023-04-18 11:13             ` Paul Durrant
2023-04-18 11:05           ` [EXTERNAL][PATCH " David Woodhouse
2023-05-26 20:32         ` [PATCH " Sean Christopherson
2023-07-25 12:56           ` David Woodhouse
2023-07-26 20:07             ` Sean Christopherson
2023-07-27 12:04               ` David Woodhouse
2023-07-28  7:31                 ` Kaya, Metin
2023-08-04  0:22         ` Sean Christopherson
2023-04-18 14:44       ` [EXTERNAL][PATCH v2] " Sean Christopherson
2023-04-18 15:47         ` David Woodhouse
2023-04-18 16:08           ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).