* [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
From: Wei Huang @ 2011-05-03 20:17 UTC (permalink / raw)
To: 'xen-devel@lists.xensource.com', Keir Fraser, Jan Beulich
[-- Attachment #1: Type: text/plain, Size: 504 bytes --]
FPU: create lazy and non-lazy FPU restore functions
Currently Xen relies on #NM (via CR0.TS) to trigger FPU context restore.
But not all FPU state is tracked by the TS bit; AMD's LWP state, for
example, is updated by hardware without any FPU instruction executing,
so no #NM is raised for it. This patch creates two FPU restore
functions: vcpu_restore_fpu_lazy() and vcpu_restore_fpu_eager().
vcpu_restore_fpu_lazy() is still used when #NM is triggered.
vcpu_restore_fpu_eager(), by comparison, is called for the vcpu being
scheduled in on every context switch.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
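In outline, the split the patch introduces looks like the sketch below (a
simplified reading aid, not the literal code; the legacy fxrstor()/frstor()
fallbacks and the fpu_dirtied bookkeeping are elided, see the attached
patch for the real thing):

/* Eager path: called from __context_switch() for every vcpu being
 * scheduled in, because CR0.TS cannot guard non-lazy state such as LWP. */
void vcpu_restore_fpu_eager(struct vcpu *v)
{
    clts();                             /* avoid raising #NM ourselves */
    if ( xsave_enabled(v) )
        fpu_xrstor(v, XSTATE_NONLAZY);
    stts();                             /* re-arm #NM for the lazy state */
}

/* Lazy path: called from the #NM handler on first FPU use, now loading
 * only the TS-guarded components. */
void vcpu_restore_fpu_lazy(struct vcpu *v)
{
    if ( xsave_enabled(v) )
        fpu_xrstor(v, XSTATE_LAZY);
}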
[-- Attachment #2: lwp6.txt --]
[-- Type: text/plain, Size: 3909 bytes --]
# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304449177 18000
# Node ID 40c9bab757e1cc3964ae6869022eebba7141d6eb
# Parent 83db82b67f65bee91f35e9caaad700a78ac0a3fc
FPU: create lazy and non-lazy FPU restore functions
Currently Xen relies on #NM (via CR0.TS) to trigger FPU context restore. But not all FPU state is tracked by the TS bit. This patch creates two FPU restore functions: vcpu_restore_fpu_lazy() and vcpu_restore_fpu_eager(). vcpu_restore_fpu_lazy() is still used when #NM is triggered. vcpu_restore_fpu_eager(), by comparison, is called for the vcpu being scheduled in on every context switch.
Signed-off-by: Wei Huang <wei.huang2@amd.com>
diff -r 83db82b67f65 -r 40c9bab757e1 xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c Tue May 03 13:49:27 2011 -0500
+++ b/xen/arch/x86/domain.c Tue May 03 13:59:37 2011 -0500
@@ -1578,6 +1578,7 @@
memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() )
set_xcr0(n->arch.xcr0);
+ vcpu_restore_fpu_eager(n);
n->arch.ctxt_switch_to(n);
}
diff -r 83db82b67f65 -r 40c9bab757e1 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c Tue May 03 13:49:27 2011 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c Tue May 03 13:59:37 2011 -0500
@@ -348,7 +348,7 @@
{
struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
- vcpu_restore_fpu(v);
+ vcpu_restore_fpu_lazy(v);
vmcb_set_exception_intercepts(
vmcb, vmcb_get_exception_intercepts(vmcb) & ~(1U << TRAP_no_device));
}
diff -r 83db82b67f65 -r 40c9bab757e1 xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c Tue May 03 13:49:27 2011 -0500
+++ b/xen/arch/x86/hvm/vmx/vmx.c Tue May 03 13:59:37 2011 -0500
@@ -612,7 +612,7 @@
static void vmx_fpu_enter(struct vcpu *v)
{
- vcpu_restore_fpu(v);
+ vcpu_restore_fpu_lazy(v);
v->arch.hvm_vmx.exception_bitmap &= ~(1u << TRAP_no_device);
vmx_update_exception_bitmap(v);
v->arch.hvm_vmx.host_cr0 &= ~X86_CR0_TS;
diff -r 83db82b67f65 -r 40c9bab757e1 xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c Tue May 03 13:49:27 2011 -0500
+++ b/xen/arch/x86/i387.c Tue May 03 13:59:37 2011 -0500
@@ -160,10 +160,25 @@
/*******************************/
/* VCPU FPU Functions */
/*******************************/
+/* Restore FPU state whenever the VCPU is scheduled in. */
+void vcpu_restore_fpu_eager(struct vcpu *v)
+{
+ ASSERT(!is_idle_vcpu(v));
+
+ /* Avoid recursion */
+ clts();
+
+ /* restore the nonlazy extended state, which is not tracked by the CR0.TS bit */
+ if ( xsave_enabled(v) )
+ fpu_xrstor(v, XSTATE_NONLAZY);
+
+ stts();
+}
+
/*
* Restore FPU state when #NM is triggered.
*/
-void vcpu_restore_fpu(struct vcpu *v)
+void vcpu_restore_fpu_lazy(struct vcpu *v)
{
ASSERT(!is_idle_vcpu(v));
@@ -174,7 +189,7 @@
return;
if ( xsave_enabled(v) )
- fpu_xrstor(v, XSTATE_ALL);
+ fpu_xrstor(v, XSTATE_LAZY);
else if ( v->fpu_initialised )
{
if ( cpu_has_fxsr )
diff -r 83db82b67f65 -r 40c9bab757e1 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c Tue May 03 13:49:27 2011 -0500
+++ b/xen/arch/x86/traps.c Tue May 03 13:59:37 2011 -0500
@@ -3198,7 +3198,7 @@
BUG_ON(!guest_mode(regs));
- vcpu_restore_fpu(curr);
+ vcpu_restore_fpu_lazy(curr);
if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
{
diff -r 83db82b67f65 -r 40c9bab757e1 xen/include/asm-x86/i387.h
--- a/xen/include/asm-x86/i387.h Tue May 03 13:49:27 2011 -0500
+++ b/xen/include/asm-x86/i387.h Tue May 03 13:59:37 2011 -0500
@@ -14,7 +14,8 @@
#include <xen/types.h>
#include <xen/percpu.h>
-void vcpu_restore_fpu(struct vcpu *v);
+void vcpu_restore_fpu_eager(struct vcpu *v);
+void vcpu_restore_fpu_lazy(struct vcpu *v);
void vcpu_save_fpu(struct vcpu *v);
int vcpu_init_fpu(struct vcpu *v);
* Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
From: Jan Beulich @ 2011-05-04 7:09 UTC (permalink / raw)
To: Wei Huang; +Cc: xen-devel@lists.xensource.com, Keir Fraser
>>> On 03.05.11 at 22:17, Wei Huang <wei.huang2@amd.com> wrote:
Again as pointed out earlier, ...
>--- a/xen/arch/x86/domain.c Tue May 03 13:49:27 2011 -0500
>+++ b/xen/arch/x86/domain.c Tue May 03 13:59:37 2011 -0500
>@@ -1578,6 +1578,7 @@
> memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
> if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() )
> set_xcr0(n->arch.xcr0);
>+ vcpu_restore_fpu_eager(n);
... this call is unconditional, ...
> n->arch.ctxt_switch_to(n);
> }
>
>--- a/xen/arch/x86/i387.c Tue May 03 13:49:27 2011 -0500
>+++ b/xen/arch/x86/i387.c Tue May 03 13:59:37 2011 -0500
>@@ -160,10 +160,25 @@
> /*******************************/
> /* VCPU FPU Functions */
> /*******************************/
>+/* Restore FPU state whenever the VCPU is scheduled in. */
>+void vcpu_restore_fpu_eager(struct vcpu *v)
>+{
>+ ASSERT(!is_idle_vcpu(v));
>+
>+ /* Avoid recursion */
>+ clts();
>+
>+ /* restore the nonlazy extended state, which is not tracked by the CR0.TS bit */
>+ if ( xsave_enabled(v) )
>+ fpu_xrstor(v, XSTATE_NONLAZY);
>+
>+ stts();
... while here you do an unconditional clts followed by an xrstor only
checking whether xsave is enabled (but not checking whether there's
any non-lazy state to be restored) and, possibly the most expensive
of all, an unconditional write of CR0.
Jan
>+}
>+
> /*
> * Restore FPU state when #NM is triggered.
> */
>-void vcpu_restore_fpu(struct vcpu *v)
>+void vcpu_restore_fpu_lazy(struct vcpu *v)
> {
> ASSERT(!is_idle_vcpu(v));
>
* Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
From: Wei Huang @ 2011-05-04 16:33 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xensource.com, Keir Fraser
Checking whether there is non-lazy state to save is architecture-specific
and very messy. For instance, we would need to read LWP_CBADDR to
confirm LWP's dirty state. That MSR is AMD-specific and we don't want to
add it here. Plus, reading the LWP_CBADDR MSR might be as expensive
as clts/stts.
My previous email showed that the overhead with LWP is around 1%-2% of
__context_switch(). For non-LWP-capable CPUs, this overhead should be
much smaller (only the clts and stts) because xfeature_mask[LWP] is 0.
Yes, clts() and stts() don't have to be called every time. How about this one?
/* Restore FPU state whenever the VCPU is scheduled in. */
void vcpu_restore_fpu_eager(struct vcpu *v)
{
    ASSERT(!is_idle_vcpu(v));

    /* restore the nonlazy extended state, which is not tracked by the CR0.TS bit */
    if ( xsave_enabled(v) )
    {
        /* Avoid recursion */
        clts();
        fpu_xrstor(v, XSTATE_NONLAZY);
        stts();
    }
}
On 05/04/2011 02:09 AM, Jan Beulich wrote:
>>>> On 03.05.11 at 22:17, Wei Huang <wei.huang2@amd.com> wrote:
> Again as pointed out earlier, ...
>
>> --- a/xen/arch/x86/domain.c Tue May 03 13:49:27 2011 -0500
>> +++ b/xen/arch/x86/domain.c Tue May 03 13:59:37 2011 -0500
>> @@ -1578,6 +1578,7 @@
>> memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
>> if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() )
>> set_xcr0(n->arch.xcr0);
>> + vcpu_restore_fpu_eager(n);
> ... this call is unconditional, ...
>
>> n->arch.ctxt_switch_to(n);
>> }
>>
>> --- a/xen/arch/x86/i387.c Tue May 03 13:49:27 2011 -0500
>> +++ b/xen/arch/x86/i387.c Tue May 03 13:59:37 2011 -0500
>> @@ -160,10 +160,25 @@
>> /*******************************/
>> /* VCPU FPU Functions */
>> /*******************************/
>> +/* Restore FPU state whenever the VCPU is scheduled in. */
>> +void vcpu_restore_fpu_eager(struct vcpu *v)
>> +{
>> + ASSERT(!is_idle_vcpu(v));
>> +
>> + /* Avoid recursion */
>> + clts();
>> +
>> + /* restore the nonlazy extended state, which is not tracked by the CR0.TS bit */
>> + if ( xsave_enabled(v) )
>> + fpu_xrstor(v, XSTATE_NONLAZY);
>> +
>> + stts();
> ... while here you do an unconditional clts followed by an xrstor only
> checking whether xsave is enabled (but not checking whether there's
> any non-lazy state to be restored) and, possibly the most expensive
> of all, an unconditional write of CR0.
>
> Jan
>
>> +}
>> +
>> /*
>> * Restore FPU state when #NM is triggered.
>> */
>> -void vcpu_restore_fpu(struct vcpu *v)
>> +void vcpu_restore_fpu_lazy(struct vcpu *v)
>> {
>> ASSERT(!is_idle_vcpu(v));
>>
>
>
* Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
From: Jan Beulich @ 2011-05-05 7:13 UTC (permalink / raw)
To: Wei Huang; +Cc: xen-devel@lists.xensource.com, Keir Fraser
>>> On 04.05.11 at 18:33, Wei Huang <wei.huang2@amd.com> wrote:
> Checking whether there is non-lazy state to save is architecture-specific
> and very messy. For instance, we would need to read LWP_CBADDR to
> confirm LWP's dirty state. That MSR is AMD-specific and we don't want to
> add it here. Plus, reading the LWP_CBADDR MSR might be as expensive
> as clts/stts.
>
> My previous email showed that the overhead with LWP is around 1%-2% of
> __context_switch(). For non-LWP-capable CPUs, this overhead should be
> much smaller (only the clts and stts) because xfeature_mask[LWP] is 0.
I wasn't talking about determining whether LWP state is dirty, but
rather about LWP not being in use at all.
> Yes, clts() and stts() don't have to be called every time. How about this one?
>
> /* Restore FPU state whenever the VCPU is scheduled in. */
> void vcpu_restore_fpu_eager(struct vcpu *v)
> {
> ASSERT(!is_idle_vcpu(v));
>
>
> /* restore the nonlazy extended state, which is not tracked by the CR0.TS bit */
> if ( xsave_enabled(v) )
> {
> /* Avoid recursion */
> clts();
> fpu_xrstor(v, XSTATE_NONLAZY);
> stts();
> }
That's certainly better, but I'd still like to see the xsave_enabled()
check to be replaced by some form of lwp_enabled() or
lazy_xsave_needed() or some such (which will at once exclude all
pv guests until you care to add support for them).
Jan
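A predicate of the kind Jan asks for might look roughly like the
following; this is a hypothetical helper, and neither the name nor the
XSTATE_NONLAZY test appears in the posted patches:

/* Hypothetical helper in the spirit of Jan's lazy_xsave_needed():
 * true only if the vcpu can have non-lazy (e.g. LWP) state at all,
 * confining the eager path to vcpus that ever enabled such a feature. */
static inline bool_t nonlazy_xstate_in_use(struct vcpu *v)
{
    return xsave_enabled(v) &&
           (v->arch.xcr0_accum & XSTATE_NONLAZY);
}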
* Re: Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
From: Wei Huang @ 2011-05-05 21:41 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xensource.com, Keir Fraser
[-- Attachment #1: Type: text/plain, Size: 2363 bytes --]
Hi Jan,
If we want to make LWP restore optional in vcpu_restore_fpu_eager(), we
have to change vcpu_save_fpu() as well. Otherwise, the extended state
will become inconsistent for non-LWP VCPUs (because save and restore are
asymmetric). There are two approaches:
1. In vcpu_save_fpu(), clean the physical CPU's extended state on behalf
of the VCPU being scheduled in. This prevents stale state from causing
problems. The disadvantage is the cleaning cost, which would outweigh
the benefits.
2. Add a new variable to the VCPU to track whether nonlazy state is
dirty. I think this is better; see the attached file.
Let me know if this is what you want. After that, I will re-spin the patches.
Thanks,
-Wei
On 05/05/2011 02:13 AM, Jan Beulich wrote:
>>>> On 04.05.11 at 18:33, Wei Huang <wei.huang2@amd.com> wrote:
>> Checking whether there is non-lazy state to save is architecture-specific
>> and very messy. For instance, we would need to read LWP_CBADDR to
>> confirm LWP's dirty state. That MSR is AMD-specific and we don't want to
>> add it here. Plus, reading the LWP_CBADDR MSR might be as expensive
>> as clts/stts.
>>
>> My previous email showed that the overhead with LWP is around 1%-2% of
>> __context_switch(). For non-LWP-capable CPUs, this overhead should be
>> much smaller (only the clts and stts) because xfeature_mask[LWP] is 0.
> I wasn't talking about determining whether LWP state is dirty, but
> rather about LWP not being in use at all.
>
>> Yes, clts() and stts() don't have to be called every time. How about this one?
>>
>> /* Restore FPU state whenever the VCPU is scheduled in. */
>> void vcpu_restore_fpu_eager(struct vcpu *v)
>> {
>> ASSERT(!is_idle_vcpu(v));
>>
>>
>> /* restore the nonlazy extended state, which is not tracked by the CR0.TS bit */
>> if ( xsave_enabled(v) )
>> {
>> /* Avoid recursion */
>> clts();
>> fpu_xrstor(v, XSTATE_NONLAZY);
>> stts();
>> }
> That's certainly better, but I'd still like to see the xsave_enabled()
> check to be replaced by some form of lwp_enabled() or
> lazy_xsave_needed() or some such (which will at once exclude all
> pv guests until you care to add support for them).
>
> Jan
[-- Attachment #2: nonlazy_dirty.txt --]
[-- Type: text/plain, Size: 2594 bytes --]
diff -r 343470b5ad6b xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c Tue May 03 14:05:28 2011 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c Thu May 05 16:40:41 2011 -0500
@@ -716,7 +716,7 @@
unsigned int eax, ebx, ecx, edx;
uint32_t msr_low;
- if ( cpu_has_lwp )
+ if ( xsave_enabled(v) && cpu_has_lwp )
{
hvm_cpuid(0x8000001c, &eax, &ebx, &ecx, &edx);
msr_low = (uint32_t)msr_content;
@@ -729,6 +729,9 @@
/* CPU might automatically correct reserved bits. So read it back. */
rdmsrl(MSR_AMD64_LWP_CFG, msr_content);
v->arch.hvm_svm.guest_lwp_cfg = msr_content;
+
+ /* track nonlazy state if LWP_CFG is non-zero. */
+ v->arch.nonlazy_xstate_dirty = !!(msr_content);
}
return 0;
diff -r 343470b5ad6b xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c Tue May 03 14:05:28 2011 -0500
+++ b/xen/arch/x86/i387.c Thu May 05 16:40:41 2011 -0500
@@ -98,13 +98,13 @@
/* FPU Save Functions */
/*******************************/
/* Save x87 extended state */
-static inline void fpu_xsave(struct vcpu *v, uint64_t mask)
+static inline void fpu_xsave(struct vcpu *v)
{
/* XCR0 normally represents what the guest OS set. In the case of Xen itself,
 * we set the full accumulated feature mask before doing save/restore.
 */
set_xcr0(v->arch.xcr0_accum);
- xsave(v, mask);
+ xsave(v, v->arch.nonlazy_xstate_dirty ? XSTATE_ALL : XSTATE_LAZY);
set_xcr0(v->arch.xcr0);
}
@@ -164,15 +164,15 @@
void vcpu_restore_fpu_eager(struct vcpu *v)
{
ASSERT(!is_idle_vcpu(v));
-
- /* Avoid recursion */
- clts();
 /* restore the nonlazy extended state, which is not tracked by the CR0.TS bit */
- if ( xsave_enabled(v) )
+ if ( xsave_enabled(v) && v->arch.nonlazy_xstate_dirty )
+ {
+ /* Avoid recursion */
+ clts();
fpu_xrstor(v, XSTATE_NONLAZY);
-
- stts();
+ stts();
+ }
}
/*
@@ -219,7 +219,7 @@
clts();
if ( xsave_enabled(v) )
- fpu_xsave(v, XSTATE_ALL);
+ fpu_xsave(v);
else if ( cpu_has_fxsr )
fpu_fxsave(v);
else
diff -r 343470b5ad6b xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h Tue May 03 14:05:28 2011 -0500
+++ b/xen/include/asm-x86/domain.h Thu May 05 16:40:41 2011 -0500
@@ -492,6 +492,9 @@
* it explicitly enables it via xcr0.
*/
uint64_t xcr0_accum;
+ /* This variable determines whether nonlazy extended state is dirty and
+ * needs to be tracked. */
+ bool_t nonlazy_xstate_dirty;
struct paging_vcpu paging;
* Re: Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
From: Jan Beulich @ 2011-05-06 7:49 UTC (permalink / raw)
To: Wei Huang; +Cc: xen-devel@lists.xensource.com, Keir Fraser
>>> On 05.05.11 at 23:41, Wei Huang <wei.huang2@amd.com> wrote:
> Hi Jan,
>
> If we want to make LWP restore optional in vcpu_restore_fpu_eager(), we
> have to change vcpu_save_fpu() as well. Otherwise, the extended state
> will become inconsistent for non-LWP VCPUs (because save and restore are
> asymmetric). There are two approaches:
>
> 1. In vcpu_save_fpu(), clean the physical CPU's extended state on behalf
> of the VCPU being scheduled in. This prevents stale state from causing
> problems. The disadvantage is the cleaning cost, which would outweigh
> the benefits.
Cleaning cost? Wasn't it that one can have xrstor default-initialize
fields (which, if indeed expensive, you'd want to trigger only if you
know the physical CPU's state is dirty, i.e. in this case requiring a
per-CPU variable that gets evaluated and updated on context restore)?
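What Jan alludes to is the XRSTOR init optimization: a component whose
bit is clear in the XSTATE_BV field of the save area header is put into
its initial state rather than loaded from memory. Approach 1 might then
be sketched as below; the field spelling (xsave_area->xsave_hdr.xstate_bv)
is an assumption about the Xen tree of this era:

/* Sketch of approach 1: have the next xrstor re-initialize the
 * non-lazy components by clearing their XSTATE_BV bits in the save
 * area header.  Field names are assumed, not taken from the patches. */
static inline void forget_nonlazy_state(struct vcpu *v)
{
    v->arch.xsave_area->xsave_hdr.xstate_bv &= ~XSTATE_NONLAZY;
}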
> 2. Add a new variable to the VCPU to track whether nonlazy state is
> dirty. I think this is better; see the attached file.
>
> Let me know if this is what you want. After that, I will re-spin the patches.
Yes, this looks like what I meant. Two suggestions: The new field's
name (nonlazy_xstate_dirty) would perhaps better be something
like nonlazy_xstate_used, so that name and use are in sync. And
the check in vcpu_restore_fpu_eager() probably doesn't need to
re-evaluate xsave_enabled(v), since the flag can't get set without
this (if you absolutely want to, put in an ASSERT() to this effect).
Jan
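Folding both suggestions in, vcpu_restore_fpu_eager() would presumably
end up along these lines (a sketch assembled from the code already in
this thread, not the committed patch):

/* Flag renamed to nonlazy_xstate_used; the xsave_enabled() re-check
 * is demoted to an assertion, since the flag cannot be set without it. */
void vcpu_restore_fpu_eager(struct vcpu *v)
{
    ASSERT(!is_idle_vcpu(v));

    /* Restore nonlazy extended state, which CR0.TS does not track. */
    if ( v->arch.nonlazy_xstate_used )
    {
        ASSERT(xsave_enabled(v));
        clts();                        /* avoid recursion via #NM */
        fpu_xrstor(v, XSTATE_NONLAZY);
        stts();
    }
}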