From: Wei Huang
Subject: Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
Date: Thu, 5 May 2011 16:41:07 -0500
To: Jan Beulich
Cc: "xen-devel@lists.xensource.com", Keir Fraser

Hi Jan,

If we want to make LWP restore optional in vcpu_restore_fpu_eager(), we
have to change vcpu_save_fpu() as well; otherwise the extended state
becomes inconsistent for non-LWP VCPUs, because save and restore would
be asymmetric. There are two approaches:

1. In vcpu_save_fpu(), clean the physical CPU's extended state for the
VCPU that is being scheduled in. This prevents stale state from causing
problems. The disadvantage is the cost of the cleaning, which would
outweigh the benefit.

2. Add a new variable to the VCPU to track whether nonlazy state is
dirty. I think this is the better option; see the attached file.

Let me know if this is what you want. After that, I will re-spin the
patches.

Thanks,
-Wei

On 05/05/2011 02:13 AM, Jan Beulich wrote:
>>>> On 04.05.11 at 18:33, Wei Huang wrote:
>> Checking whether there is non-lazy state to save is architecture-specific
>> and very messy. For instance, we need to read LWP_CBADDR to confirm
>> LWP's dirty state. This MSR is AMD-specific and we don't want to add
>> it here. Plus, reading the LWP_CBADDR MSR might be as expensive as
>> clts/stts.
>>
>> My previous email showed that the overhead of LWP is around 1%-2% of
>> __context_switch(). For a non-LWP-capable CPU, this overhead should be
>> much smaller (only clts and stts) because xfeature_mask[LWP] is 0.
> I wasn't talking about determining whether LWP state is dirty, but
> much rather about LWP not being in use at all.
>
>> Yes, clts() and stts() don't have to be called every time. How about
>> this one?
>>
>> /* Restore FPU state whenever a VCPU is scheduled in. */
>> void vcpu_restore_fpu_eager(struct vcpu *v)
>> {
>>     ASSERT(!is_idle_vcpu(v));
>>
>>     /* Save the nonlazy extended state which is not tracked by the
>>      * CR0.TS bit. */
>>     if ( xsave_enabled(v) )
>>     {
>>         /* Avoid recursion */
>>         clts();
>>         fpu_xrstor(v, XSTATE_NONLAZY);
>>         stts();
>>     }
> That's certainly better, but I'd still like to see the xsave_enabled()
> check replaced by some form of lwp_enabled() or lazy_xsave_needed() or
> some such (which will at once exclude all pv guests, until you care to
> add support for them).
>
> Jan
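
A predicate of the sort suggested above could look roughly like the
sketch below. This is only an illustration: the name lwp_enabled(), and
using guest_lwp_cfg as the "LWP in use" signal, are my assumptions
rather than code from the attached patch.

    /* Hypothetical helper -- not part of the attached patch. Reports
     * whether this vcpu has LWP state worth handling eagerly. PV guests
     * never enable LWP, so they are excluded automatically; cpu_has_lwp
     * implies an AMD host, hence the hvm_svm state is valid. */
    static inline bool_t lwp_enabled(const struct vcpu *v)
    {
        return is_hvm_vcpu(v) && cpu_has_lwp &&
               (v->arch.hvm_svm.guest_lwp_cfg != 0);
    }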

[Attachment: nonlazy_dirty.txt]

diff -r 343470b5ad6b xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Tue May 03 14:05:28 2011 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Thu May 05 16:40:41 2011 -0500
@@ -716,7 +716,7 @@
     unsigned int eax, ebx, ecx, edx;
     uint32_t msr_low;
 
-    if ( cpu_has_lwp )
+    if ( xsave_enabled(v) && cpu_has_lwp )
     {
         hvm_cpuid(0x8000001c, &eax, &ebx, &ecx, &edx);
         msr_low = (uint32_t)msr_content;
@@ -729,6 +729,9 @@
         /* CPU might automatically correct reserved bits. So read it back. */
         rdmsrl(MSR_AMD64_LWP_CFG, msr_content);
         v->arch.hvm_svm.guest_lwp_cfg = msr_content;
+
+        /* Track nonlazy state if LWP_CFG is non-zero. */
+        v->arch.nonlazy_xstate_dirty = !!(msr_content);
     }
 
     return 0;

diff -r 343470b5ad6b xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c	Tue May 03 14:05:28 2011 -0500
+++ b/xen/arch/x86/i387.c	Thu May 05 16:40:41 2011 -0500
@@ -98,13 +98,13 @@
 /*      FPU Save Functions     */
 /*******************************/
 /* Save x87 extended state */
-static inline void fpu_xsave(struct vcpu *v, uint64_t mask)
+static inline void fpu_xsave(struct vcpu *v)
 {
     /* XCR0 normally represents what guest OS set. In case of Xen itself,
      * we set all accumulated feature mask before doing save/restore. */
     set_xcr0(v->arch.xcr0_accum);
-    xsave(v, mask);
+    xsave(v, v->arch.nonlazy_xstate_dirty ? XSTATE_ALL : XSTATE_LAZY);
     set_xcr0(v->arch.xcr0);
 }
@@ -164,15 +164,15 @@
 void vcpu_restore_fpu_eager(struct vcpu *v)
 {
     ASSERT(!is_idle_vcpu(v));
-
-    /* Avoid recursion */
-    clts();
 
     /* save the nonlazy extended state which is not tracked by CR0.TS bit */
-    if ( xsave_enabled(v) )
+    if ( xsave_enabled(v) && v->arch.nonlazy_xstate_dirty )
+    {
+        /* Avoid recursion */
+        clts();
         fpu_xrstor(v, XSTATE_NONLAZY);
-
-    stts();
+        stts();
+    }
 }
@@ -219,7 +219,7 @@
     clts();
 
     if ( xsave_enabled(v) )
-        fpu_xsave(v, XSTATE_ALL);
+        fpu_xsave(v);
     else if ( cpu_has_fxsr )
         fpu_fxsave(v);
     else

diff -r 343470b5ad6b xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Tue May 03 14:05:28 2011 -0500
+++ b/xen/include/asm-x86/domain.h	Thu May 05 16:40:41 2011 -0500
@@ -492,6 +492,9 @@
      * it explicitly enables it via xcr0. */
     uint64_t xcr0_accum;
+    /* This variable determines whether nonlazy extended state is dirty
+     * and needs to be tracked. */
+    bool_t nonlazy_xstate_dirty;
 
     struct paging_vcpu paging;
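
For clarity, the way the patch pairs the save and restore sides can be
summarized as below. This is a distilled sketch of the flow, not literal
__context_switch() code.

    /* Distilled sketch of the context-switch flow after this patch --
     * the function name here is illustrative only. */
    static void context_switch_fpu_sketch(struct vcpu *prev, struct vcpu *next)
    {
        /* Scheduled out: fpu_xsave() now picks XSTATE_ALL only when
         * prev->arch.nonlazy_xstate_dirty is set, XSTATE_LAZY otherwise. */
        vcpu_save_fpu(prev);

        /* Scheduled in: the eager XSTATE_NONLAZY restore (with its
         * clts()/stts() pair) runs only when
         * next->arch.nonlazy_xstate_dirty is set. */
        vcpu_restore_fpu_eager(next);
    }

This keeps the two paths symmetric: a VCPU whose LWP_CFG was never
written pays neither the wider xsave mask on save nor the clts/stts
cost on restore.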