[PATCH] VMX: wbinvd when vmentry under UC

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] VMX: wbinvd when vmentry under UC
@ 2013-11-25 16:14 Liu, Jinsong
  2013-11-25 16:39 ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Liu, Jinsong @ 2013-11-25 16:14 UTC (permalink / raw)
  To: Jan Beulich, xen-devel@lists.xen.org
  Cc: keir@xen.org, suravee.suthikulpanit@amd.com, tim@xen.org,
	sherry.hurwitz@amd.com, Dong, Eddie, zhenzhong.duan@oracle.com,
	Dugger, Donald D, Auld, Will, Nakajima, Jun,
	andrew.cooper3@citrix.com, Zhang, Xiantao

[-- Attachment #1: Type: text/plain, Size: 3059 bytes --]

>From e2d47e2f75bac6876b7c2eaecfe946966bf27516 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Tue, 26 Nov 2013 04:53:17 +0800
Subject: [PATCH] VMX: wbinvd when vmentry under UC

This patch flush cache when vmentry back to UC guest, to prevent
cache polluted by hypervisor access guest memory during UC mode.

However, wbinvd is a _very_ time consuming operation, so
1. wbinvd ... timer has a good possibility to expire while
   irq disabled, it then would be delayed until
2. ... vmentry back to guest (and irq enalbed), timer interrupt
   then occurs and drops guest at once;
3. drop to hypervisor ... then vmentry and wbinvd again;

This loop will run again and again, until lucky enough wbinvd
happens not to expire timer and then loop break, usually it would
occur 10K~60K times, blocking guest 10s~60s.

reprogram timer to avoid dead_like_loop.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |   32 ++++++++++++++++++++++++++++----
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 75be62e..4768c9b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -642,10 +642,6 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
             __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
     }
 
-    /* For guest cr0.cd setting, do not use potentially polluted cache */
-    if ( unlikely(v->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
-        wbinvd();
-
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
 }
@@ -2967,6 +2963,27 @@ out:
         nvmx_idtv_handling();
 }
 
+/*
+ * wbinvd is a _very_ time consuming operation, so
+ * 1. wbinvd ... timer has a good possibility to expire while
+ *    irq disabled, it then would be delayed until
+ * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
+ *    then occurs and drops guest at once;
+ * 3. drop to hypervisor ... then vmentry and wbinvd again;
+ *
+ * This loop will run again and again, until lucky enough wbinvd
+ * happens not to expire timer and then loop break, usually it would
+ * occur 10K~60K times, blocking guest 10s~60s.
+ *
+ * reprogram timer to avoid dead_like_loop.
+ */
+static inline void uc_wbinvd_and_timer_adjust(void)
+{
+    reprogram_timer(0);
+    wbinvd();
+    reprogram_timer(NOW() + MILLISECS(1));
+}
+
 void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -2974,6 +2991,13 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     struct hvm_vcpu_asid *p_asid;
     bool_t need_flush;
 
+    /*
+     * In case hypervisor may access hvm guest memory, and then
+     * cache line polluted under UC mode.
+     */
+    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
+        uc_wbinvd_and_timer_adjust();
+
     if ( !cpu_has_vmx_vpid )
         goto out;
     if ( nestedhvm_vcpu_in_guestmode(curr) )
-- 
1.7.1

[-- Attachment #2: 0001-VMX-wbinvd-when-vmentry-under-UC.patch --]
[-- Type: application/octet-stream, Size: 2974 bytes --]

From e2d47e2f75bac6876b7c2eaecfe946966bf27516 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Tue, 26 Nov 2013 04:53:17 +0800
Subject: [PATCH] VMX: wbinvd when vmentry under UC

This patch flush cache when vmentry back to UC guest, to prevent
cache polluted by hypervisor access guest memory during UC mode.

However, wbinvd is a _very_ time consuming operation, so
1. wbinvd ... timer has a good possibility to expire while
   irq disabled, it then would be delayed until
2. ... vmentry back to guest (and irq enalbed), timer interrupt
   then occurs and drops guest at once;
3. drop to hypervisor ... then vmentry and wbinvd again;

This loop will run again and again, until lucky enough wbinvd
happens not to expire timer and then loop break, usually it would
occur 10K~60K times, blocking guest 10s~60s.

reprogram timer to avoid dead_like_loop.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |   32 ++++++++++++++++++++++++++++----
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 75be62e..4768c9b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -642,10 +642,6 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
             __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
     }
 
-    /* For guest cr0.cd setting, do not use potentially polluted cache */
-    if ( unlikely(v->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
-        wbinvd();
-
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
 }
@@ -2967,6 +2963,27 @@ out:
         nvmx_idtv_handling();
 }
 
+/*
+ * wbinvd is a _very_ time consuming operation, so
+ * 1. wbinvd ... timer has a good possibility to expire while
+ *    irq disabled, it then would be delayed until
+ * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
+ *    then occurs and drops guest at once;
+ * 3. drop to hypervisor ... then vmentry and wbinvd again;
+ *
+ * This loop will run again and again, until lucky enough wbinvd
+ * happens not to expire timer and then loop break, usually it would
+ * occur 10K~60K times, blocking guest 10s~60s.
+ *
+ * reprogram timer to avoid dead_like_loop.
+ */
+static inline void uc_wbinvd_and_timer_adjust(void)
+{
+    reprogram_timer(0);
+    wbinvd();
+    reprogram_timer(NOW() + MILLISECS(1));
+}
+
 void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -2974,6 +2991,13 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     struct hvm_vcpu_asid *p_asid;
     bool_t need_flush;
 
+    /*
+     * In case hypervisor may access hvm guest memory, and then
+     * cache line polluted under UC mode.
+     */
+    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
+        uc_wbinvd_and_timer_adjust();
+
     if ( !cpu_has_vmx_vpid )
         goto out;
     if ( nestedhvm_vcpu_in_guestmode(curr) )
-- 
1.7.1


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-25 16:14 [PATCH] VMX: wbinvd when vmentry under UC Liu, Jinsong
@ 2013-11-25 16:39 ` Andrew Cooper
  2013-11-25 16:46   ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2013-11-25 16:39 UTC (permalink / raw)
  To: Liu, Jinsong
  Cc: keir@xen.org, Jan Beulich, tim@xen.org, Dong, Eddie,
	zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Auld, Will, Nakajima, Jun,
	sherry.hurwitz@amd.com, suravee.suthikulpanit@amd.com,
	Zhang, Xiantao

On 25/11/13 16:14, Liu, Jinsong wrote:
> From e2d47e2f75bac6876b7c2eaecfe946966bf27516 Mon Sep 17 00:00:00 2001
> From: Liu Jinsong <jinsong.liu@intel.com>
> Date: Tue, 26 Nov 2013 04:53:17 +0800
> Subject: [PATCH] VMX: wbinvd when vmentry under UC
>
> This patch flush cache when vmentry back to UC guest, to prevent
> cache polluted by hypervisor access guest memory during UC mode.
>
> However, wbinvd is a _very_ time consuming operation, so
> 1. wbinvd ... timer has a good possibility to expire while
>    irq disabled, it then would be delayed until
> 2. ... vmentry back to guest (and irq enalbed), timer interrupt
>    then occurs and drops guest at once;
> 3. drop to hypervisor ... then vmentry and wbinvd again;
>
> This loop will run again and again, until lucky enough wbinvd
> happens not to expire timer and then loop break, usually it would
> occur 10K~60K times, blocking guest 10s~60s.
>
> reprogram timer to avoid dead_like_loop.
>
> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c |   32 ++++++++++++++++++++++++++++----
>  1 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 75be62e..4768c9b 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -642,10 +642,6 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>              __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
>      }
>  
> -    /* For guest cr0.cd setting, do not use potentially polluted cache */
> -    if ( unlikely(v->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
> -        wbinvd();
> -
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
>  }
> @@ -2967,6 +2963,27 @@ out:
>          nvmx_idtv_handling();
>  }
>  
> +/*
> + * wbinvd is a _very_ time consuming operation, so
> + * 1. wbinvd ... timer has a good possibility to expire while
> + *    irq disabled, it then would be delayed until
> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
> + *    then occurs and drops guest at once;
> + * 3. drop to hypervisor ... then vmentry and wbinvd again;
> + *
> + * This loop will run again and again, until lucky enough wbinvd
> + * happens not to expire timer and then loop break, usually it would
> + * occur 10K~60K times, blocking guest 10s~60s.
> + *
> + * reprogram timer to avoid dead_like_loop.
> + */
> +static inline void uc_wbinvd_and_timer_adjust(void)
> +{
> +    reprogram_timer(0);
> +    wbinvd();
> +    reprogram_timer(NOW() + MILLISECS(1));

Do you have any number of the time delta across the wbinvd() ?

As it currently stands, I do not think it is reasonable to reprogram the
timer like this.

~Andrew

> +}
> +
>  void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>  {
>      struct vcpu *curr = current;
> @@ -2974,6 +2991,13 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>      struct hvm_vcpu_asid *p_asid;
>      bool_t need_flush;
>  
> +    /*
> +     * In case hypervisor may access hvm guest memory, and then
> +     * cache line polluted under UC mode.
> +     */
> +    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
> +        uc_wbinvd_and_timer_adjust();
> +
>      if ( !cpu_has_vmx_vpid )
>          goto out;
>      if ( nestedhvm_vcpu_in_guestmode(curr) )

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-25 16:39 ` Andrew Cooper
@ 2013-11-25 16:46   ` Jan Beulich
  2013-11-25 16:52     ` Auld, Will
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2013-11-25 16:46 UTC (permalink / raw)
  To: Andrew Cooper, Jinsong Liu
  Cc: keir@xen.org, suravee.suthikulpanit@amd.com, tim@xen.org,
	Eddie Dong, zhenzhong.duan@oracle.com, Donald D Dugger,
	xen-devel@lists.xen.org, Will Auld, Jun Nakajima,
	sherry.hurwitz@amd.com, Xiantao Zhang

>>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 25/11/13 16:14, Liu, Jinsong wrote:
>> +/*
>> + * wbinvd is a _very_ time consuming operation, so
>> + * 1. wbinvd ... timer has a good possibility to expire while
>> + *    irq disabled, it then would be delayed until
>> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
>> + *    then occurs and drops guest at once;
>> + * 3. drop to hypervisor ... then vmentry and wbinvd again;
>> + *
>> + * This loop will run again and again, until lucky enough wbinvd
>> + * happens not to expire timer and then loop break, usually it would
>> + * occur 10K~60K times, blocking guest 10s~60s.
>> + *
>> + * reprogram timer to avoid dead_like_loop.
>> + */
>> +static inline void uc_wbinvd_and_timer_adjust(void)
>> +{
>> +    reprogram_timer(0);
>> +    wbinvd();
>> +    reprogram_timer(NOW() + MILLISECS(1));
> 
> Do you have any number of the time delta across the wbinvd() ?
> 
> As it currently stands, I do not think it is reasonable to reprogram the
> timer like this.

Indeed I was wondering too, but didn't get to look in detail at what
consequences would arise from doing this.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-25 16:46   ` Jan Beulich
@ 2013-11-25 16:52     ` Auld, Will
  2013-11-26  8:56       ` Liu, Jinsong
  0 siblings, 1 reply; 15+ messages in thread
From: Auld, Will @ 2013-11-25 16:52 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Liu, Jinsong
  Cc: keir@xen.org, suravee.suthikulpanit@amd.com, tim@xen.org,
	Dong, Eddie, zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Auld, Will, Nakajima, Jun,
	sherry.hurwitz@amd.com, Zhang, Xiantao



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, November 25, 2013 8:47 AM
> To: Andrew Cooper; Liu, Jinsong
> Cc: sherry.hurwitz@amd.com; suravee.suthikulpanit@amd.com; Dugger,
> Donald D; Dong, Eddie; Nakajima, Jun; Auld, Will; Zhang, Xiantao; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; zhenzhong.duan@oracle.com;
> keir@xen.org; tim@xen.org
> Subject: Re: [PATCH] VMX: wbinvd when vmentry under UC
> 
> >>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com>
> wrote:
> > On 25/11/13 16:14, Liu, Jinsong wrote:
> >> +/*
> >> + * wbinvd is a _very_ time consuming operation, so
> >> + * 1. wbinvd ... timer has a good possibility to expire while
> >> + *    irq disabled, it then would be delayed until
> >> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
> >> + *    then occurs and drops guest at once;
> >> + * 3. drop to hypervisor ... then vmentry and wbinvd again;
> >> + *
> >> + * This loop will run again and again, until lucky enough wbinvd
> >> + * happens not to expire timer and then loop break, usually it
> would
> >> + * occur 10K~60K times, blocking guest 10s~60s.
> >> + *
> >> + * reprogram timer to avoid dead_like_loop.
> >> + */
> >> +static inline void uc_wbinvd_and_timer_adjust(void) {
> >> +    reprogram_timer(0);
> >> +    wbinvd();
> >> +    reprogram_timer(NOW() + MILLISECS(1));
> >
> > Do you have any number of the time delta across the wbinvd() ?
> >
> > As it currently stands, I do not think it is reasonable to reprogram
> > the timer like this.
> 
> Indeed I was wondering too, but didn't get to look in detail at what
> consequences would arise from doing this.
> 
> Jan

Basically, increase the timer setting so that it is unlikely to fire during wbinvd() but still be there as a safeguard. Then reset as you are currently doing after wbinvd(). 

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-25 16:52     ` Auld, Will
@ 2013-11-26  8:56       ` Liu, Jinsong
  2013-11-26  9:33         ` Jan Beulich
  2013-11-28  7:16         ` Liu, Jinsong
  0 siblings, 2 replies; 15+ messages in thread
From: Liu, Jinsong @ 2013-11-26  8:56 UTC (permalink / raw)
  To: Auld, Will, Jan Beulich, Andrew Cooper
  Cc: keir@xen.org, suravee.suthikulpanit@amd.com, tim@xen.org,
	Dong, Eddie, zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Nakajima, Jun, sherry.hurwitz@amd.com,
	Zhang, Xiantao

Auld, Will wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, November 25, 2013 8:47 AM
>> To: Andrew Cooper; Liu, Jinsong
>> Cc: sherry.hurwitz@amd.com; suravee.suthikulpanit@amd.com; Dugger,
>> Donald D; Dong, Eddie; Nakajima, Jun; Auld, Will; Zhang, Xiantao;
>> xen- devel@lists.xen.org; konrad.wilk@oracle.com;
>> zhenzhong.duan@oracle.com; keir@xen.org; tim@xen.org Subject: Re:
>> [PATCH] VMX: wbinvd when vmentry under UC 
>> 
>>>>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com>
>> wrote:
>>> On 25/11/13 16:14, Liu, Jinsong wrote:
>>>> +/*
>>>> + * wbinvd is a _very_ time consuming operation, so
>>>> + * 1. wbinvd ... timer has a good possibility to expire while
>>>> + *    irq disabled, it then would be delayed until
>>>> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
>>>> + *    then occurs and drops guest at once;
>>>> + * 3. drop to hypervisor ... then vmentry and wbinvd again; + *
>>>> + * This loop will run again and again, until lucky enough wbinvd
>>>> + * happens not to expire timer and then loop break, usually it
>>>> would + * occur 10K~60K times, blocking guest 10s~60s.
>>>> + *
>>>> + * reprogram timer to avoid dead_like_loop.
>>>> + */
>>>> +static inline void uc_wbinvd_and_timer_adjust(void) { +   
>>>> reprogram_timer(0); +    wbinvd();
>>>> +    reprogram_timer(NOW() + MILLISECS(1));
>>> 
>>> Do you have any number of the time delta across the wbinvd() ?

Generally it depends on how big and how dirty the cache is.
In my environment (L1/L2/L3 cache: 64/256/20480K, 2.3G processor), it varies from 1~3ms:
(XEN) wbinvd overhead: 0x6be9ad tsc, 3082209 ns
...
(XEN) wbinvd overhead: 0x26bc68 tsc, 1106378 ns

>>> 
>>> As it currently stands, I do not think it is reasonable to reprogram
>>> the timer like this.
>> 
>> Indeed I was wondering too, but didn't get to look in detail at what
>> consequences would arise from doing this.
>> 
>> Jan
> 
> Basically, increase the timer setting so that it is unlikely to fire
> during wbinvd() but still be there as a safeguard. Then reset as you
> are currently doing after wbinvd().  
> 
> Will

Yes. reprogram_timer here just delay timer a little slot, say, 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at hypervisor, or any irq disabled area, timer interrupt in fact also has good chance to be delayed some time -- however at TIMER_SOFTIRQ, all expired thing would be executed, and re-calculated and set next time point via reprogram_timer -- that's OK.

Another option is to trust guest. We can remove wbinvd from vmentry -- per SDM when guest sets cr0.cd it should explicitly flush cache, otherwise itself cannot guarantee cache coherency.

Thoughts?

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-26  8:56       ` Liu, Jinsong
@ 2013-11-26  9:33         ` Jan Beulich
  2013-11-28  7:16         ` Liu, Jinsong
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2013-11-26  9:33 UTC (permalink / raw)
  To: Jinsong Liu, Will Auld
  Cc: tim@xen.org, keir@xen.org, suravee.suthikulpanit@amd.com,
	Andrew Cooper, Eddie Dong, zhenzhong.duan@oracle.com,
	Donald D Dugger, xen-devel@lists.xen.org, Jun Nakajima,
	sherry.hurwitz@amd.com, Xiantao Zhang

>>> On 26.11.13 at 09:56, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Another option is to trust guest. We can remove wbinvd from vmentry -- per SDM 
> when guest sets cr0.cd it should explicitly flush cache, otherwise itself 
> cannot guarantee cache coherency.

Which - once again - wouldn't account for hypervisor writes to guest
memory.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-26  8:56       ` Liu, Jinsong
  2013-11-26  9:33         ` Jan Beulich
@ 2013-11-28  7:16         ` Liu, Jinsong
  2013-11-28 14:24           ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Liu, Jinsong @ 2013-11-28  7:16 UTC (permalink / raw)
  To: Jan Beulich, Auld, Will
  Cc: keir@xen.org, Nakajima, Jun, Dong, Eddie, tim@xen.org,
	zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Andrew Cooper,
	suravee.suthikulpanit@amd.com, sherry.hurwitz@amd.com,
	Zhang, Xiantao

Liu, Jinsong wrote:
> Auld, Will wrote:
>>> -----Original Message-----
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Monday, November 25, 2013 8:47 AM
>>> To: Andrew Cooper; Liu, Jinsong
>>> Cc: sherry.hurwitz@amd.com; suravee.suthikulpanit@amd.com; Dugger,
>>> Donald D; Dong, Eddie; Nakajima, Jun; Auld, Will; Zhang, Xiantao;
>>> xen- devel@lists.xen.org; konrad.wilk@oracle.com;
>>> zhenzhong.duan@oracle.com; keir@xen.org; tim@xen.org Subject: Re:
>>> [PATCH] VMX: wbinvd when vmentry under UC
>>> 
>>>>>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com>
>>> wrote:
>>>> On 25/11/13 16:14, Liu, Jinsong wrote:
>>>>> +/*
>>>>> + * wbinvd is a _very_ time consuming operation, so
>>>>> + * 1. wbinvd ... timer has a good possibility to expire while
>>>>> + *    irq disabled, it then would be delayed until
>>>>> + * 2. ... vmentry back to guest (and irq enalbed), timer
>>>>> interrupt + *    then occurs and drops guest at once;
>>>>> + * 3. drop to hypervisor ... then vmentry and wbinvd again; + *
>>>>> + * This loop will run again and again, until lucky enough wbinvd
>>>>> + * happens not to expire timer and then loop break, usually it
>>>>> would + * occur 10K~60K times, blocking guest 10s~60s. + *
>>>>> + * reprogram timer to avoid dead_like_loop.
>>>>> + */
>>>>> +static inline void uc_wbinvd_and_timer_adjust(void) { +
>>>>> reprogram_timer(0); +    wbinvd();
>>>>> +    reprogram_timer(NOW() + MILLISECS(1));
>>>> 
>>>> Do you have any number of the time delta across the wbinvd() ?
> 
> Generally it depends on how big and how dirty the cache is.
> In my environment (L1/L2/L3 cache: 64/256/20480K, 2.3G processor), it
> varies from 1~3ms: (XEN) wbinvd overhead: 0x6be9ad tsc, 3082209 ns
> ...
> (XEN) wbinvd overhead: 0x26bc68 tsc, 1106378 ns
> 
>>>> 
>>>> As it currently stands, I do not think it is reasonable to
>>>> reprogram the timer like this.
>>> 
>>> Indeed I was wondering too, but didn't get to look in detail at what
>>> consequences would arise from doing this.
>>> 
>>> Jan
>> 
>> Basically, increase the timer setting so that it is unlikely to fire
>> during wbinvd() but still be there as a safeguard. Then reset as you
>> are currently doing after wbinvd().
>> 
>> Will
> 
> Yes. reprogram_timer here just delay timer a little slot, say, 1~2ms.
> I think it's OK, i.e. at any point of wbinvd() operation at
> hypervisor, or any irq disabled area, timer interrupt in fact also
> has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
> all expired thing would be executed, and re-calculated and set next
> time point via reprogram_timer -- that's OK.     

Comments/thoughts about this option?

> 
> Another option is to trust guest. We can remove wbinvd from vmentry
> -- per SDM when guest sets cr0.cd it should explicitly flush cache,
> otherwise itself cannot guarantee cache coherency.  

OK, as you said we don't consider option 2.

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-28  7:16         ` Liu, Jinsong
@ 2013-11-28 14:24           ` Jan Beulich
  2013-11-29 14:15             ` Liu, Jinsong
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2013-11-28 14:24 UTC (permalink / raw)
  To: jinsong.liu, will.auld
  Cc: tim, keir, suravee.suthikulpanit, andrew.cooper3, eddie.dong,
	zhenzhong.duan, donald.d.dugger, xen-devel, jun.nakajima,
	sherry.hurwitz, xiantao.zhang

>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>Liu, Jinsong wrote:
>> Yes. reprogram_timer here just delay timer a little slot, say, 1~2ms.
>> I think it's OK, i.e. at any point of wbinvd() operation at
>> hypervisor, or any irq disabled area, timer interrupt in fact also
>> has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
>> all expired thing would be executed, and re-calculated and set next
>> time point via reprogram_timer -- that's OK.     
>
>Comments/thoughts about this option?

Apart from continuing to be very uncertain that this won't have any
bad side effects, I'm also rather concerned that you deal with one
special case interrupt here, ignoring other potentially high rate ones
(like such coming from NICs).

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-28 14:24           ` Jan Beulich
@ 2013-11-29 14:15             ` Liu, Jinsong
  2013-11-29 14:24               ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Liu, Jinsong @ 2013-11-29 14:15 UTC (permalink / raw)
  To: Jan Beulich, Auld, Will
  Cc: tim@xen.org, keir@xen.org, suravee.suthikulpanit@amd.com,
	andrew.cooper3@citrix.com, Dong, Eddie, zhenzhong.duan@oracle.com,
	Dugger, Donald D, xen-devel@lists.xen.org, Nakajima, Jun,
	sherry.hurwitz@amd.com, Zhang, Xiantao

Jan Beulich wrote:
>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>> Liu, Jinsong wrote:
>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>> has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
>>> all expired thing would be executed, and re-calculated and set next
>>> time point via reprogram_timer -- that's OK.
>> 
>> Comments/thoughts about this option?
> 
> Apart from continuing to be very uncertain that this won't have any
> bad side effects, I'm also rather concerned that you deal with one
> special case interrupt here, ignoring other potentially high rate ones
> (like such coming from NICs).
> 
> Jan

Considering this, seems adding flag is the only work around way since high freq interrupt would result in dead-like-loop. My concern of adding flag is it's not easy to clean every possible path, especially future extension.

Or, do not support vt-d w/o snoop.

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-29 14:15             ` Liu, Jinsong
@ 2013-11-29 14:24               ` Andrew Cooper
  2013-11-29 14:31                 ` Liu, Jinsong
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2013-11-29 14:24 UTC (permalink / raw)
  To: Liu, Jinsong
  Cc: keir@xen.org, Jan Beulich, Nakajima, Jun, tim@xen.org,
	Dong, Eddie, zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Auld, Will,
	suravee.suthikulpanit@amd.com, sherry.hurwitz@amd.com,
	Zhang, Xiantao

On 29/11/13 14:15, Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>> Liu, Jinsong wrote:
>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>> has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
>>>> all expired thing would be executed, and re-calculated and set next
>>>> time point via reprogram_timer -- that's OK.
>>> Comments/thoughts about this option?
>> Apart from continuing to be very uncertain that this won't have any
>> bad side effects, I'm also rather concerned that you deal with one
>> special case interrupt here, ignoring other potentially high rate ones
>> (like such coming from NICs).
>>
>> Jan
> Considering this, seems adding flag is the only work around way since high freq interrupt would result in dead-like-loop. My concern of adding flag is it's not easy to clean every possible path, especially future extension.
>
> Or, do not support vt-d w/o snoop.
>
> Thanks,
> Jinsong

Do you know how many systems have vt-d without snoop ?

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-29 14:24               ` Andrew Cooper
@ 2013-11-29 14:31                 ` Liu, Jinsong
  2013-11-29 14:50                   ` Andrew Cooper
  2013-11-29 14:53                   ` Jan Beulich
  0 siblings, 2 replies; 15+ messages in thread
From: Liu, Jinsong @ 2013-11-29 14:31 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: keir@xen.org, Jan Beulich, Nakajima, Jun, tim@xen.org,
	Dong, Eddie, zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Auld, Will,
	suravee.suthikulpanit@amd.com, sherry.hurwitz@amd.com,
	Zhang, Xiantao

Andrew Cooper wrote:
> On 29/11/13 14:15, Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>> Liu, Jinsong wrote:
>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>>> has good chance to be delayed some time -- however at
>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>> re-calculated and set next time point via reprogram_timer --
>>>>> that's OK. 
>>>> Comments/thoughts about this option?
>>> Apart from continuing to be very uncertain that this won't have any
>>> bad side effects, I'm also rather concerned that you deal with one
>>> special case interrupt here, ignoring other potentially high rate
>>> ones (like such coming from NICs). 
>>> 
>>> Jan
>> Considering this, seems adding flag is the only work around way
>> since high freq interrupt would result in dead-like-loop. My concern
>> of adding flag is it's not easy to clean every possible path,
>> especially future extension.   
>> 
>> Or, do not support vt-d w/o snoop.
>> 
>> Thanks,
>> Jinsong
> 
> Do you know how many systems have vt-d without snoop ?
> 
> ~Andrew

Yes, that's what I need check inside Intel. Maybe not feasible idea I agree.

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-29 14:31                 ` Liu, Jinsong
@ 2013-11-29 14:50                   ` Andrew Cooper
  2013-11-29 15:04                     ` Jan Beulich
  2013-11-29 14:53                   ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2013-11-29 14:50 UTC (permalink / raw)
  To: Liu, Jinsong
  Cc: keir@xen.org, Jan Beulich, Nakajima, Jun, tim@xen.org,
	Dong, Eddie, zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Auld, Will,
	suravee.suthikulpanit@amd.com, sherry.hurwitz@amd.com,
	Zhang, Xiantao

On 29/11/13 14:31, Liu, Jinsong wrote:
> Andrew Cooper wrote:
>> On 29/11/13 14:15, Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>>> Liu, Jinsong wrote:
>>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>>>> has good chance to be delayed some time -- however at
>>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>>> re-calculated and set next time point via reprogram_timer --
>>>>>> that's OK. 
>>>>> Comments/thoughts about this option?
>>>> Apart from continuing to be very uncertain that this won't have any
>>>> bad side effects, I'm also rather concerned that you deal with one
>>>> special case interrupt here, ignoring other potentially high rate
>>>> ones (like such coming from NICs). 
>>>>
>>>> Jan
>>> Considering this, seems adding flag is the only work around way
>>> since high freq interrupt would result in dead-like-loop. My concern
>>> of adding flag is it's not easy to clean every possible path,
>>> especially future extension.   
>>>
>>> Or, do not support vt-d w/o snoop.
>>>
>>> Thanks,
>>> Jinsong
>> Do you know how many systems have vt-d without snoop ?
>>
>> ~Andrew
> Yes, that's what I need check inside Intel. Maybe not feasible idea I agree.
>
> Thanks,
> Jinsong

Given that PCIPassthrough realistically involves requiring trusting the
guest administrator, it might be feasible to have another iommu= option
of "allow passthough even without snoop".

~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-29 14:31                 ` Liu, Jinsong
  2013-11-29 14:50                   ` Andrew Cooper
@ 2013-11-29 14:53                   ` Jan Beulich
  2013-11-29 16:53                     ` Liu, Jinsong
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2013-11-29 14:53 UTC (permalink / raw)
  To: Andrew Cooper, Jinsong Liu
  Cc: keir@xen.org, suravee.suthikulpanit@amd.com, tim@xen.org,
	Eddie Dong, zhenzhong.duan@oracle.com, Donald D Dugger,
	xen-devel@lists.xen.org, Will Auld, Jun Nakajima,
	sherry.hurwitz@amd.com, Xiantao Zhang

>>> On 29.11.13 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Andrew Cooper wrote:
>> On 29/11/13 14:15, Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>>> Liu, Jinsong wrote:
>>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>>>> has good chance to be delayed some time -- however at
>>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>>> re-calculated and set next time point via reprogram_timer --
>>>>>> that's OK. 
>>>>> Comments/thoughts about this option?
>>>> Apart from continuing to be very uncertain that this won't have any
>>>> bad side effects, I'm also rather concerned that you deal with one
>>>> special case interrupt here, ignoring other potentially high rate
>>>> ones (like such coming from NICs). 
>>>> 
>>> Considering this, seems adding flag is the only work around way
>>> since high freq interrupt would result in dead-like-loop. My concern
>>> of adding flag is it's not easy to clean every possible path,
>>> especially future extension.   
>>> 
>>> Or, do not support vt-d w/o snoop.
>> 
>> Do you know how many systems have vt-d without snoop ?
> 
> Yes, that's what I need check inside Intel. Maybe not feasible idea I agree.

But let's take a step back here: Even with the old solution, there
wasn't any cache flushing being done when the guest was running
in UC mode. So with what we have now we're already doing better
than before in this regard. The primary goal here (and the criteria
for backporting the patches) is to address XSA-60. Which, if I'm not
mistaken, we achieved with the first three of the four initial patches.
Patch 4 and anything beyond really are addressing a distinct issue.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-29 14:50                   ` Andrew Cooper
@ 2013-11-29 15:04                     ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2013-11-29 15:04 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Jinsong Liu, keir@xen.org, suravee.suthikulpanit@amd.com,
	tim@xen.org, Eddie Dong, zhenzhong.duan@oracle.com,
	Donald D Dugger, xen-devel@lists.xen.org, Will Auld, Jun Nakajima,
	sherry.hurwitz@amd.com, Xiantao Zhang

>>> On 29.11.13 at 15:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Given that PCIPassthrough realistically involves requiring trusting the
> guest administrator,

Why? As long as the passed through device isn't buggy, there
shouldn't be any risk associated with handing it to the guest
(once we managed to fix all respective software bugs).

Jan

> it might be feasible to have another iommu= option
> of "allow passthough even without snoop".
> 
> ~Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] VMX: wbinvd when vmentry under UC
  2013-11-29 14:53                   ` Jan Beulich
@ 2013-11-29 16:53                     ` Liu, Jinsong
  0 siblings, 0 replies; 15+ messages in thread
From: Liu, Jinsong @ 2013-11-29 16:53 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: keir@xen.org, suravee.suthikulpanit@amd.com, tim@xen.org,
	Dong, Eddie, zhenzhong.duan@oracle.com, Dugger, Donald D,
	xen-devel@lists.xen.org, Auld, Will, Nakajima, Jun,
	sherry.hurwitz@amd.com, Zhang, Xiantao

Jan Beulich wrote:
>>>> On 29.11.13 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>> wrote: Andrew Cooper wrote: On 29/11/13 14:15, Liu, Jinsong wrote:
>>>> Jan Beulich wrote: 
>>>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>>>> Liu, Jinsong wrote:
>>>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation
>>>>>>> at hypervisor, or any irq disabled area, timer interrupt in
>>>>>>> fact also has good chance to be delayed some time -- however at
>>>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>>>> re-calculated and set next time point via reprogram_timer --
>>>>>>> that's OK.
>>>>>> Comments/thoughts about this option?
>>>>> Apart from continuing to be very uncertain that this won't have
>>>>> any bad side effects, I'm also rather concerned that you deal
>>>>> with one special case interrupt here, ignoring other potentially
>>>>> high rate ones (like such coming from NICs).
>>>>> 
>>>> Considering this, seems adding flag is the only work around way
>>>> since high freq interrupt would result in dead-like-loop. My
>>>> concern of adding flag is it's not easy to clean every possible
>>>> path, especially future extension. 
>>>> 
>>>> Or, do not support vt-d w/o snoop.
>>> 
>>> Do you know how many systems have vt-d without snoop ?
>> 
>> Yes, that's what I need check inside Intel. Maybe not feasible idea
>> I agree. 
> 
> But let's take a step back here: Even with the old solution, there
> wasn't any cache flushing being done when the guest was running
> in UC mode. So with what we have now we're already doing better
> than before in this regard. The primary goal here (and the criteria
> for backporting the patches) is to address XSA-60. Which, if I'm not
> mistaken, we achieved with the first three of the four initial
> patches. Patch 4 and anything beyond really are addressing a distinct
> issue. 
> 
> Jan

The reason of 'do not support vt-d w/o snoop' is not only for XSA-60, though it seems got fix. My real concern comes from cacheability confliction coming from vt-d w/o snoop (and IPAT bit) -- same cpu can access same physical memory w/ different cache attributes via different mapping -- which is explicitly warned by Intel SDM, though until now it didn't bring other safety issue at Xen.

Thanks,
Jinsong

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-11-29 16:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-25 16:14 [PATCH] VMX: wbinvd when vmentry under UC Liu, Jinsong
2013-11-25 16:39 ` Andrew Cooper
2013-11-25 16:46   ` Jan Beulich
2013-11-25 16:52     ` Auld, Will
2013-11-26  8:56       ` Liu, Jinsong
2013-11-26  9:33         ` Jan Beulich
2013-11-28  7:16         ` Liu, Jinsong
2013-11-28 14:24           ` Jan Beulich
2013-11-29 14:15             ` Liu, Jinsong
2013-11-29 14:24               ` Andrew Cooper
2013-11-29 14:31                 ` Liu, Jinsong
2013-11-29 14:50                   ` Andrew Cooper
2013-11-29 15:04                     ` Jan Beulich
2013-11-29 14:53                   ` Jan Beulich
2013-11-29 16:53                     ` Liu, Jinsong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.