All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: XSAVE IRC thread
       [not found] ` <C22E3DF12F0BCD4FBDD7247F311C5AB002FA69@FTLPEX01CL02.citrite.net>
@ 2013-05-03 14:58   ` Jan Beulich
  2013-05-04 11:13     ` Ben Guthro
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2013-05-03 14:58 UTC (permalink / raw)
  To: Ben Guthro, Mark Roddy; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

>>> On 30.04.13 at 18:24, Mark Roddy <mark.roddy@citrix.com> wrote:
> The two values are pointers to the before and after regions:
> 
> 1: kd> dd 841c4880
> 841c4880  0000027f 00000000 6e83506e 0000001b
> 841c4890  01d113d8 00000023 00001f80 0000ffff
> 841c48a0  00000000 00000000 00000000 00000000
> 841c48b0  00000000 00000000 00000000 00000000
> 841c48c0  00000000 00000000 00000000 00000000
> 841c48d0  00000000 00000000 00000000 00000000
> 841c48e0  00000000 00000000 00000000 00000000
> 841c48f0  00000000 00000000 00000000 00000000
> 
> 1: kd> dd 841c4600 
> 841c4600  0000027f 00000000 6e83506e 00000000
> 841c4610  01d113d8 00000000 00001f80 0000ffff
> 841c4620  00000000 00000000 00000000 00000000
> 841c4630  00000000 00000000 00000000 00000000
> 841c4640  00000000 00000000 00000000 00000000
> 841c4650  00000000 00000000 00000000 00000000
> 841c4660  00000000 00000000 00000000 00000000
> 841c4670  00000000 00000000 00000000 00000000
> 
> I don't know if that helps, but obviously there are differences, two of 
> them, at offsets 0xc and 0x12.
> 
> " Interrupt Service Routine 88CAC91C has changed extended thread context.
> Context saved before executing ISR: 841C4880. Context saved after executing 
> ISR: 841C4600."
> 
> 841C4880 is before and 841C4600 is after.

Attached a patch that I think should address this problem. It's
against the tip of the staging tree, and doesn't apply without
adjustment to 4.2 (and making it work for 4.1 would be quite a
bit more work) - please let me know whether that's sufficient for
you testing this, or whether you need me to do any backporting.

I didn't verify this with any Windows, but since the same issue
can - if one is looking for it - be observed on PV Linux, I did verify
the patch to help there.

I'd like to note though that while this is expected to help with
32-bit guests, and with a 64-bit guest kernels doing such checking
after using the respective save (and possibly restore) instructions
with a 64-bit operand size, the hypervisor has no way of knowing
whether the context actually belongs to a 32-bit process while the
guest is in kernel (64-bit) mode. That means that from a 32-bit
app's perspective, inconsistencies could still be observed under
certain conditions (but the case where the hypervisor side save
happens after a VM exit from user mode should also work with
that patch). I don't see any way to hide that, other than running
on CPUs that don't save/restore the selector values at all
anymore (Intel at least has a feature bit for this).

Jan


[-- Attachment #2: x86-FPU-preserve-selectors.patch --]
[-- Type: text/plain, Size: 8533 bytes --]

--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -55,28 +55,54 @@ static inline void fpu_fxrstor(struct vc
      * possibility, which may occur if the block was passed to us by control
      * tools, by silently clearing the block.
      */
-    asm volatile (
-        /* See above for why the operands/constraints are this way. */
-        "1: " REX64_PREFIX "fxrstor (%2)\n"
-        ".section .fixup,\"ax\"   \n"
-        "2: push %%"__OP"ax       \n"
-        "   push %%"__OP"cx       \n"
-        "   push %%"__OP"di       \n"
-        "   lea  %0,%%"__OP"di    \n"
-        "   mov  %1,%%ecx         \n"
-        "   xor  %%eax,%%eax      \n"
-        "   rep ; stosl           \n"
-        "   pop  %%"__OP"di       \n"
-        "   pop  %%"__OP"cx       \n"
-        "   pop  %%"__OP"ax       \n"
-        "   jmp  1b               \n"
-        ".previous                \n"
-        _ASM_EXTABLE(1b, 2b)
-        : 
-        : "m" (*fpu_ctxt),
-          "i" (sizeof(v->arch.xsave_area->fpu_sse)/4)
-          ,"cdaSDb" (fpu_ctxt)
-        );
+    switch ( __builtin_expect(fpu_ctxt[FPU_WORD_SIZE_OFFSET], 8) )
+    {
+    default:
+        asm volatile (
+            /* See below for why the operands/constraints are this way. */
+            "1: " REX64_PREFIX "fxrstor (%2)\n"
+            ".section .fixup,\"ax\"   \n"
+            "2: push %%"__OP"ax       \n"
+            "   push %%"__OP"cx       \n"
+            "   push %%"__OP"di       \n"
+            "   mov  %2,%%"__OP"di    \n"
+            "   mov  %1,%%ecx         \n"
+            "   xor  %%eax,%%eax      \n"
+            "   rep ; stosl           \n"
+            "   pop  %%"__OP"di       \n"
+            "   pop  %%"__OP"cx       \n"
+            "   pop  %%"__OP"ax       \n"
+            "   jmp  1b               \n"
+            ".previous                \n"
+            _ASM_EXTABLE(1b, 2b)
+            :
+            : "m" (*fpu_ctxt),
+              "i" (sizeof(v->arch.xsave_area->fpu_sse)/4),
+              "cdaSDb" (fpu_ctxt) );
+        break;
+    case 4: case 2:
+        asm volatile (
+            "1: fxrstor (%2)\n"
+            ".section .fixup,\"ax\"   \n"
+            "2: push %%"__OP"ax       \n"
+            "   push %%"__OP"cx       \n"
+            "   push %%"__OP"di       \n"
+            "   mov  %2,%%"__OP"di    \n"
+            "   mov  %1,%%ecx         \n"
+            "   xor  %%eax,%%eax      \n"
+            "   rep ; stosl           \n"
+            "   pop  %%"__OP"di       \n"
+            "   pop  %%"__OP"cx       \n"
+            "   pop  %%"__OP"ax       \n"
+            "   jmp  1b               \n"
+            ".previous                \n"
+            _ASM_EXTABLE(1b, 2b)
+            :
+            : "m" (*fpu_ctxt),
+              "i" (sizeof(v->arch.xsave_area->fpu_sse)/4),
+              "r" (fpu_ctxt) );
+        break;
+    }
 }
 
 /* Restore x87 extended state */
@@ -105,15 +131,24 @@ static inline void fpu_xsave(struct vcpu
 static inline void fpu_fxsave(struct vcpu *v)
 {
     char *fpu_ctxt = v->arch.fpu_ctxt;
+    int word_size = guest_word_size(v);
 
-    /*
-     * The only way to force fxsaveq on a wide range of gas versions. On 
-     * older versions the rex64 prefix works only if we force an
-     * addressing mode that doesn't require extended registers.
-     */
-    asm volatile (
-        REX64_PREFIX "fxsave (%1)"
-        : "=m" (*fpu_ctxt) : "cdaSDb" (fpu_ctxt) );
+    switch ( __builtin_expect(word_size, 8) )
+    {
+    default:
+        /*
+         * The only way to force fxsaveq on a wide range of gas versions.
+         * On older versions the rex64 prefix works only if we force an
+         * addressing mode that doesn't require extended registers.
+         */
+        asm volatile ( REX64_PREFIX "fxsave (%1)"
+                       : "=m" (*fpu_ctxt) : "cdaSDb" (fpu_ctxt) );
+        break;
+    case 4: case 2:
+        asm volatile ( "fxsave %0" : "=m" (*fpu_ctxt) );
+        break;
+    }
+    fpu_ctxt[FPU_WORD_SIZE_OFFSET] = word_size;
     
     /* Clear exception flags if FSW.ES is set. */
     if ( unlikely(fpu_ctxt[2] & 0x80) )
@@ -253,6 +288,39 @@ void vcpu_destroy_fpu(struct vcpu *v)
         xfree(v->arch.fpu_ctxt);
 }
 
+int guest_word_size(struct vcpu *v)
+{
+    int mode;
+
+    if ( !is_hvm_vcpu(v) )
+    {
+        if ( is_pv_32bit_vcpu(v) )
+            return 4;
+
+        asm ( "1: lar %1,%0          \n"
+              "   jnz 2f             \n"
+              "3:                    \n"
+              ".section .fixup,\"ax\"\n"
+              "2: xor %0,%0          \n"
+              "   jmp 3b             \n"
+              ".previous             \n"
+              _ASM_EXTABLE(1b, 2b)
+              : "=r" (mode)
+              : "m" (guest_cpu_user_regs()->cs) );
+
+        return !(mode & _SEGMENT_S) || (mode & _SEGMENT_L) ? 8 : 4;
+    }
+
+    switch ( mode = hvm_guest_x86_mode(v) )
+    {
+    case 0: /* real mode */
+    case 1: /* virtual mode */
+        return 2;
+    }
+
+    return mode;
+}
+
 /*
  * Local variables:
  * mode: C
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -56,32 +56,53 @@ void xsave(struct vcpu *v, uint64_t mask
     struct xsave_struct *ptr = v->arch.xsave_area;
     uint32_t hmask = mask >> 32;
     uint32_t lmask = mask;
+    int word_size = guest_word_size(v);
 
-    if ( cpu_has_xsaveopt )
-        asm volatile (
-            ".byte " REX_PREFIX "0x0f,0xae,0x37"
-            :
-            : "a" (lmask), "d" (hmask), "D"(ptr)
-            : "memory" );
-    else
-        asm volatile (
-            ".byte " REX_PREFIX "0x0f,0xae,0x27"
-            :
-            : "a" (lmask), "d" (hmask), "D"(ptr)
-            : "memory" );
+    switch ( __builtin_expect(word_size, 8) )
+    {
+    default:
+        if ( cpu_has_xsaveopt )
+            asm volatile ( ".byte 0x48,0x0f,0xae,0x37"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        else
+            asm volatile ( ".byte 0x48,0x0f,0xae,0x27"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        break;
+    case 4: case 2:
+        if ( cpu_has_xsaveopt )
+            asm volatile ( ".byte 0x0f,0xae,0x37"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        else
+            asm volatile ( ".byte 0x0f,0xae,0x27"
+                           : "=m" (*ptr)
+                           : "a" (lmask), "d" (hmask), "D" (ptr) );
+        break;
+    }
+    ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] = word_size;
 }
 
 void xrstor(struct vcpu *v, uint64_t mask)
 {
     uint32_t hmask = mask >> 32;
     uint32_t lmask = mask;
-
     struct xsave_struct *ptr = v->arch.xsave_area;
 
-    asm volatile (
-        ".byte " REX_PREFIX "0x0f,0xae,0x2f"
-        :
-        : "m" (*ptr), "a" (lmask), "d" (hmask), "D"(ptr) );
+    switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
+    {
+    default:
+        asm volatile ( ".byte 0x48,0x0f,0xae,0x2f"
+                       :
+                       : "m" (*ptr), "a" (lmask), "d" (hmask), "D" (ptr) );
+        break;
+    case 4: case 2:
+        asm volatile ( ".byte 0x0f,0xae,0x2f"
+                       :
+                       : "m" (*ptr), "a" (lmask), "d" (hmask), "D" (ptr) );
+        break;
+    }
 }
 
 bool_t xsave_enabled(const struct vcpu *v)
--- a/xen/include/asm-x86/xstate.h
+++ b/xen/include/asm-x86/xstate.h
@@ -34,8 +34,6 @@
 #define XSTATE_NONLAZY (XSTATE_LWP)
 #define XSTATE_LAZY    (XSTATE_ALL & ~XSTATE_NONLAZY)
 
-#define REX_PREFIX     "0x48, "
-
 /* extended state variables */
 DECLARE_PER_CPU(uint64_t, xcr0);
 
@@ -88,4 +86,14 @@ void xstate_free_save_area(struct vcpu *
 int xstate_alloc_save_area(struct vcpu *v);
 void xstate_init(void);
 
+/* Byte offset within the FXSAVE (portion) of the stored word size. */
+#define FPU_WORD_SIZE_OFFSET 511
+
+/*
+ * Used EXCLUSIVELY to determine the needed operand size override on
+ * XSAVE/FXSAVE. Any other use would need to make sure that the context
+ * is suitable for all operations this involves.
+ */
+int guest_word_size(struct vcpu *);
+
 #endif /* __ASM_XSTATE_H */

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: XSAVE IRC thread
  2013-05-03 14:58   ` XSAVE IRC thread Jan Beulich
@ 2013-05-04 11:13     ` Ben Guthro
  2013-05-04 15:51       ` Andrew Cooper
                         ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Ben Guthro @ 2013-05-04 11:13 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Mark Roddy, Ben Guthro


On May 3, 2013, at 10:58 AM, Jan Beulich <JBeulich@suse.com>
 wrote:

>>>> On 30.04.13 at 18:24, Mark Roddy <mark.roddy@citrix.com> wrote:
>> The two values are pointers to the before and after regions:
>> 
>> 1: kd> dd 841c4880
>> 841c4880  0000027f 00000000 6e83506e 0000001b
>> 841c4890  01d113d8 00000023 00001f80 0000ffff
>> 841c48a0  00000000 00000000 00000000 00000000
>> 841c48b0  00000000 00000000 00000000 00000000
>> 841c48c0  00000000 00000000 00000000 00000000
>> 841c48d0  00000000 00000000 00000000 00000000
>> 841c48e0  00000000 00000000 00000000 00000000
>> 841c48f0  00000000 00000000 00000000 00000000
>> 
>> 1: kd> dd 841c4600 
>> 841c4600  0000027f 00000000 6e83506e 00000000
>> 841c4610  01d113d8 00000000 00001f80 0000ffff
>> 841c4620  00000000 00000000 00000000 00000000
>> 841c4630  00000000 00000000 00000000 00000000
>> 841c4640  00000000 00000000 00000000 00000000
>> 841c4650  00000000 00000000 00000000 00000000
>> 841c4660  00000000 00000000 00000000 00000000
>> 841c4670  00000000 00000000 00000000 00000000
>> 
>> I don't know if that helps, but obviously there are differences, two of 
>> them, at offsets 0xc and 0x12.
>> 
>> " Interrupt Service Routine 88CAC91C has changed extended thread context.
>> Context saved before executing ISR: 841C4880. Context saved after executing 
>> ISR: 841C4600."
>> 
>> 841C4880 is before and 841C4600 is after.
> 
> Attached a patch that I think should address this problem. It's
> against the tip of the staging tree, and doesn't apply without
> adjustment to 4.2 (and making it work for 4.1 would be quite a
> bit more work) - please let me know whether that's sufficient for
> you testing this, or whether you need me to do any backporting.
> 
> I didn't verify this with any Windows, but since the same issue
> can - if one is looking for it - be observed on PV Linux, I did verify
> the patch to help there.
> 
> I'd like to note though that while this is expected to help with
> 32-bit guests, and with a 64-bit guest kernels doing such checking
> after using the respective save (and possibly restore) instructions
> with a 64-bit operand size, the hypervisor has no way of knowing
> whether the context actually belongs to a 32-bit process while the
> guest is in kernel (64-bit) mode. That means that from a 32-bit
> app's perspective, inconsistencies could still be observed under
> certain conditions (but the case where the hypervisor side save
> happens after a VM exit from user mode should also work with
> that patch). I don't see any way to hide that, other than running
> on CPUs that don't save/restore the selector values at all
> anymore (Intel at least has a feature bit for this).
> 
> Jan
> 
> <x86-FPU-preserve-selectors.patch>

Jan,

Thank you very much for looking into this so quickly.

Our QA infrastructure is currently set up for testing against our XenClient product built on Xen 4.2.2.
Since this is an intermittent failure, in order to reduce the number of variables in testing this solution,
I'll look into backporting this on Mon to 4.2, and report back after a night, or two of testing.

Thanks again for your help

Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: XSAVE IRC thread
  2013-05-04 11:13     ` Ben Guthro
@ 2013-05-04 15:51       ` Andrew Cooper
  2013-05-05  6:32       ` Jan Beulich
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2013-05-04 15:51 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Mark Roddy, Jan Beulich, xen-devel

On 04/05/2013 12:13, Ben Guthro wrote:
> On May 3, 2013, at 10:58 AM, Jan Beulich <JBeulich@suse.com>
>  wrote:
>
>>>>> On 30.04.13 at 18:24, Mark Roddy <mark.roddy@citrix.com> wrote:
>>> The two values are pointers to the before and after regions:
>>>
>>> 1: kd> dd 841c4880
>>> 841c4880  0000027f 00000000 6e83506e 0000001b
>>> 841c4890  01d113d8 00000023 00001f80 0000ffff
>>> 841c48a0  00000000 00000000 00000000 00000000
>>> 841c48b0  00000000 00000000 00000000 00000000
>>> 841c48c0  00000000 00000000 00000000 00000000
>>> 841c48d0  00000000 00000000 00000000 00000000
>>> 841c48e0  00000000 00000000 00000000 00000000
>>> 841c48f0  00000000 00000000 00000000 00000000
>>>
>>> 1: kd> dd 841c4600 
>>> 841c4600  0000027f 00000000 6e83506e 00000000
>>> 841c4610  01d113d8 00000000 00001f80 0000ffff
>>> 841c4620  00000000 00000000 00000000 00000000
>>> 841c4630  00000000 00000000 00000000 00000000
>>> 841c4640  00000000 00000000 00000000 00000000
>>> 841c4650  00000000 00000000 00000000 00000000
>>> 841c4660  00000000 00000000 00000000 00000000
>>> 841c4670  00000000 00000000 00000000 00000000
>>>
>>> I don't know if that helps, but obviously there are differences, two of 
>>> them, at offsets 0xc and 0x12.
>>>
>>> " Interrupt Service Routine 88CAC91C has changed extended thread context.
>>> Context saved before executing ISR: 841C4880. Context saved after executing 
>>> ISR: 841C4600."
>>>
>>> 841C4880 is before and 841C4600 is after.
>> Attached a patch that I think should address this problem. It's
>> against the tip of the staging tree, and doesn't apply without
>> adjustment to 4.2 (and making it work for 4.1 would be quite a
>> bit more work) - please let me know whether that's sufficient for
>> you testing this, or whether you need me to do any backporting.
>>
>> I didn't verify this with any Windows, but since the same issue
>> can - if one is looking for it - be observed on PV Linux, I did verify
>> the patch to help there.
>>
>> I'd like to note though that while this is expected to help with
>> 32-bit guests, and with a 64-bit guest kernels doing such checking
>> after using the respective save (and possibly restore) instructions
>> with a 64-bit operand size, the hypervisor has no way of knowing
>> whether the context actually belongs to a 32-bit process while the
>> guest is in kernel (64-bit) mode. That means that from a 32-bit
>> app's perspective, inconsistencies could still be observed under
>> certain conditions (but the case where the hypervisor side save
>> happens after a VM exit from user mode should also work with
>> that patch). I don't see any way to hide that, other than running
>> on CPUs that don't save/restore the selector values at all
>> anymore (Intel at least has a feature bit for this).
>>
>> Jan
>>
>> <x86-FPU-preserve-selectors.patch>
> Jan,
>
> Thank you very much for looking into this so quickly.
>
> Our QA infrastructure is currently set up for testing against our XenClient product built on Xen 4.2.2.
> Since this is an intermittent failure, in order to reduce the number of variables in testing this solution,
> I'll look into backporting this on Mon to 4.2, and report back after a night, or two of testing.
>
> Thanks again for your help
>
> Ben

I will also look to getting this into XenServer testing ASAP

~Andrew

>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: XSAVE IRC thread
  2013-05-04 11:13     ` Ben Guthro
  2013-05-04 15:51       ` Andrew Cooper
@ 2013-05-05  6:32       ` Jan Beulich
  2013-05-06 14:51       ` Kirk Allan
  2013-05-07 20:17       ` Ben Guthro
  3 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2013-05-05  6:32 UTC (permalink / raw)
  To: Ben.Guthro; +Cc: Kirk Allan, mark.roddy, xen-devel

>>> Ben Guthro <Ben.Guthro@citrix.com> 05/04/13 1:14 PM >>>
> On May 3, 2013, at 10:58 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> Attached a patch that I think should address this problem. It's
>> against the tip of the staging tree, and doesn't apply without
>> adjustment to 4.2 (and making it work for 4.1 would be quite a
>> bit more work) - please let me know whether that's sufficient for
>> you testing this, or whether you need me to do any backporting.
>> 
>> I didn't verify this with any Windows, but since the same issue
>> can - if one is looking for it - be observed on PV Linux, I did verify
>> the patch to help there.
>> 
>> I'd like to note though that while this is expected to help with
>> 32-bit guests, and with a 64-bit guest kernels doing such checking
>> after using the respective save (and possibly restore) instructions
>> with a 64-bit operand size, the hypervisor has no way of knowing
>> whether the context actually belongs to a 32-bit process while the
>> guest is in kernel (64-bit) mode. That means that from a 32-bit
>> app's perspective, inconsistencies could still be observed under
>> certain conditions (but the case where the hypervisor side save
>> happens after a VM exit from user mode should also work with
>> that patch). I don't see any way to hide that, other than running
>> on CPUs that don't save/restore the selector values at all
>> anymore (Intel at least has a feature bit for this).
> 
>Thank you very much for looking into this so quickly.
>
>Our QA infrastructure is currently set up for testing against our XenClient product built on Xen 4.2.2.
>Since this is an intermittent failure, in order to reduce the number of variables in testing this solution,
>I'll look into backporting this on Mon to 4.2, and report back after a night, or two of testing.

Meanwhile I realized that the patch will need a little further adjustment:
Parts of the xsave modifications need to/can become conditional upon
FPU state being saved/restored (which in particular may not be the case
during the eager restore phase needed for AMD LWP, but which otherwise
would also be a latent bug). This shouldn't affect testing of what I sent
on Friday, though.

Furthermore I meanwhile also mentally restored what I had done on native
Linux years ago, which would permit determining the needed save/restore
layout (32 vs 64 bit) regardless of current guest execution mode. That
would come with a price, though, since some parts of the save operation
would need to be done twice (needing either two fxsave-s/xsave-s, or an
additional fnstenv) - will need to do some measurements to see how bad
this would turn out, and which of the two would be preferable. (This
implicitly also tells me that native Linux ought to have the same issue when
running 32-bit Windows in this verifier mode on 64-bit KVM. Kirk - did you
ever run across FPU state inconsistency bug checks in the WHQL testing on
KVM?)

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: XSAVE IRC thread
  2013-05-04 11:13     ` Ben Guthro
  2013-05-04 15:51       ` Andrew Cooper
  2013-05-05  6:32       ` Jan Beulich
@ 2013-05-06 14:51       ` Kirk Allan
  2013-05-07 20:17       ` Ben Guthro
  3 siblings, 0 replies; 6+ messages in thread
From: Kirk Allan @ 2013-05-06 14:51 UTC (permalink / raw)
  To: Ben.Guthro, Jan Beulich; +Cc: mark.roddy, xen-devel



>>> On 5/5/2013 at 12:32 AM, in message <5185FCE8.59A : 120 : 30442>, Jan Beulich
wrote: 
>>>> Ben Guthro <Ben.Guthro@citrix.com> 05/04/13 1:14 PM >>>
>> On May 3, 2013, at 10:58 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>> Attached a patch that I think should address this problem. It's
>>> against the tip of the staging tree, and doesn't apply without
>>> adjustment to 4.2 (and making it work for 4.1 would be quite a
>>> bit more work) - please let me know whether that's sufficient for
>>> you testing this, or whether you need me to do any backporting.
>>> 
>>> I didn't verify this with any Windows, but since the same issue
>>> can - if one is looking for it - be observed on PV Linux, I did verify
>>> the patch to help there.
>>> 
>>> I'd like to note though that while this is expected to help with
>>> 32-bit guests, and with a 64-bit guest kernels doing such checking
>>> after using the respective save (and possibly restore) instructions
>>> with a 64-bit operand size, the hypervisor has no way of knowing
>>> whether the context actually belongs to a 32-bit process while the
>>> guest is in kernel (64-bit) mode. That means that from a 32-bit
>>> app's perspective, inconsistencies could still be observed under
>>> certain conditions (but the case where the hypervisor side save
>>> happens after a VM exit from user mode should also work with
>>> that patch). I don't see any way to hide that, other than running
>>> on CPUs that don't save/restore the selector values at all
>>> anymore (Intel at least has a feature bit for this).
>> 
>>Thank you very much for looking into this so quickly.
>>
>>Our QA infrastructure is currently set up for testing against our XenClient 
> product built on Xen 4.2.2.
>>Since this is an intermittent failure, in order to reduce the number of 
> variables in testing this solution,
>>I'll look into backporting this on Mon to 4.2, and report back after a night, 
> or two of testing.
> 
> Meanwhile I realized that the patch will need a little further adjustment:
> Parts of the xsave modifications need to/can become conditional upon
> FPU state being saved/restored (which in particular may not be the case
> during the eager restore phase needed for AMD LWP, but which otherwise
> would also be a latent bug). This shouldn't affect testing of what I sent
> on Friday, though.
> 
> Furthermore I meanwhile also mentally restored what I had done on native
> Linux years ago, which would permit determining the needed save/restore
> layout (32 vs 64 bit) regardless of current guest execution mode. That
> would come with a price, though, since some parts of the save operation
> would need to be done twice (needing either two fxsave-s/xsave-s, or an
> additional fnstenv) - will need to do some measurements to see how bad
> this would turn out, and which of the two would be preferable. (This
> implicitly also tells me that native Linux ought to have the same issue when
> running 32-bit Windows in this verifier mode on 64-bit KVM. Kirk - did you
> ever run across FPU state inconsistency bug checks in the WHQL testing on
> KVM?)

No, no bug checks while running WHQL.  Other than bug checks, would there be any other manifestation of the issue that I could look for?

> 
> Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: XSAVE IRC thread
  2013-05-04 11:13     ` Ben Guthro
                         ` (2 preceding siblings ...)
  2013-05-06 14:51       ` Kirk Allan
@ 2013-05-07 20:17       ` Ben Guthro
  3 siblings, 0 replies; 6+ messages in thread
From: Ben Guthro @ 2013-05-07 20:17 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Mark Roddy, Jan Beulich, xen-devel

On 05/04/2013 07:13 AM, Ben Guthro wrote:
>
> On May 3, 2013, at 10:58 AM, Jan Beulich <JBeulich@suse.com>
>   wrote:
>
>>>>> On 30.04.13 at 18:24, Mark Roddy <mark.roddy@citrix.com> wrote:
>>> The two values are pointers to the before and after regions:
>>>
>>> 1: kd> dd 841c4880
>>> 841c4880  0000027f 00000000 6e83506e 0000001b
>>> 841c4890  01d113d8 00000023 00001f80 0000ffff
>>> 841c48a0  00000000 00000000 00000000 00000000
>>> 841c48b0  00000000 00000000 00000000 00000000
>>> 841c48c0  00000000 00000000 00000000 00000000
>>> 841c48d0  00000000 00000000 00000000 00000000
>>> 841c48e0  00000000 00000000 00000000 00000000
>>> 841c48f0  00000000 00000000 00000000 00000000
>>>
>>> 1: kd> dd 841c4600
>>> 841c4600  0000027f 00000000 6e83506e 00000000
>>> 841c4610  01d113d8 00000000 00001f80 0000ffff
>>> 841c4620  00000000 00000000 00000000 00000000
>>> 841c4630  00000000 00000000 00000000 00000000
>>> 841c4640  00000000 00000000 00000000 00000000
>>> 841c4650  00000000 00000000 00000000 00000000
>>> 841c4660  00000000 00000000 00000000 00000000
>>> 841c4670  00000000 00000000 00000000 00000000
>>>
>>> I don't know if that helps, but obviously there are differences, two of
>>> them, at offsets 0xc and 0x12.
>>>
>>> " Interrupt Service Routine 88CAC91C has changed extended thread context.
>>> Context saved before executing ISR: 841C4880. Context saved after executing
>>> ISR: 841C4600."
>>>
>>> 841C4880 is before and 841C4600 is after.
>>
>> Attached a patch that I think should address this problem. It's
>> against the tip of the staging tree, and doesn't apply without
>> adjustment to 4.2 (and making it work for 4.1 would be quite a
>> bit more work) - please let me know whether that's sufficient for
>> you testing this, or whether you need me to do any backporting.
>>
>> I didn't verify this with any Windows, but since the same issue
>> can - if one is looking for it - be observed on PV Linux, I did verify
>> the patch to help there.
>>
>> I'd like to note though that while this is expected to help with
>> 32-bit guests, and with a 64-bit guest kernels doing such checking
>> after using the respective save (and possibly restore) instructions
>> with a 64-bit operand size, the hypervisor has no way of knowing
>> whether the context actually belongs to a 32-bit process while the
>> guest is in kernel (64-bit) mode. That means that from a 32-bit
>> app's perspective, inconsistencies could still be observed under
>> certain conditions (but the case where the hypervisor side save
>> happens after a VM exit from user mode should also work with
>> that patch). I don't see any way to hide that, other than running
>> on CPUs that don't save/restore the selector values at all
>> anymore (Intel at least has a feature bit for this).
>>
>> Jan
>>
>> <x86-FPU-preserve-selectors.patch>
>
> Jan,
>
> Thank you very much for looking into this so quickly.
>
> Our QA infrastructure is currently set up for testing against our XenClient product built on Xen 4.2.2.
> Since this is an intermittent failure, in order to reduce the number of variables in testing this solution,
> I'll look into backporting this on Mon to 4.2, and report back after a night, or two of testing.

Preliminary testing results look good - no xsave crashes with this patch.

Modulo your concerns from your other email, I'd say this solution looks 
good.

Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-05-07 20:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <517FCF4F.70209@citrix.com>
     [not found] ` <C22E3DF12F0BCD4FBDD7247F311C5AB002FA69@FTLPEX01CL02.citrite.net>
2013-05-03 14:58   ` XSAVE IRC thread Jan Beulich
2013-05-04 11:13     ` Ben Guthro
2013-05-04 15:51       ` Andrew Cooper
2013-05-05  6:32       ` Jan Beulich
2013-05-06 14:51       ` Kirk Allan
2013-05-07 20:17       ` Ben Guthro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.