From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wei Huang
Subject: Re: [PATCH] FPU LWP 6/8: create lazy and non-lazy FPU restore functions
Date: Wed, 4 May 2011 11:33:55 -0500
Message-ID: <4DC17FF3.5080706@amd.com>
References: <4DC062ED.3070802@amd.com> <4DC117D6020000780003F9CB@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4DC117D6020000780003F9CB@vpn.id2.novell.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jan Beulich
Cc: "xen-devel@lists.xensource.com", Keir Fraser
List-Id: xen-devel@lists.xenproject.org

Checking whether there is non-lazy state to save is architecture-specific
and very messy. For instance, we would need to read LWP_CBADDR to confirm
LWP's dirty state. This MSR is AMD-specific and we don't want to add it
here. Besides, reading the LWP_CBADDR MSR might be as expensive as
clts/stts (a rough sketch of such a check is appended at the end of this
mail for illustration). My previous email showed that the overhead with
LWP is around 1%-2% of __context_switch(). For non-LWP-capable CPUs, this
overhead should be much smaller (only clts and stts) because
xfeature_mask[LWP] is 0.

Yes, clts() and stts() don't have to be called every time. How about this
one?

/* Restore FPU state whenever VCPU is scheduled in. */
void vcpu_restore_fpu_eager(struct vcpu *v)
{
    ASSERT(!is_idle_vcpu(v));

    /* Restore the non-lazy extended state which is not tracked by the CR0.TS bit */
    if ( xsave_enabled(v) )
    {
        /* Avoid recursion */
        clts();
        fpu_xrstor(v, XSTATE_NONLAZY);
        stts();
    }
}

On 05/04/2011 02:09 AM, Jan Beulich wrote:
>>>> On 03.05.11 at 22:17, Wei Huang wrote:
> Again as pointed out earlier, ...
>
>> --- a/xen/arch/x86/domain.c	Tue May 03 13:49:27 2011 -0500
>> +++ b/xen/arch/x86/domain.c	Tue May 03 13:59:37 2011 -0500
>> @@ -1578,6 +1578,7 @@
>>          memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
>>          if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() )
>>              set_xcr0(n->arch.xcr0);
>> +        vcpu_restore_fpu_eager(n);
> ... this call is unconditional, ...
>
>>          n->arch.ctxt_switch_to(n);
>>      }
>>
>> --- a/xen/arch/x86/i387.c	Tue May 03 13:49:27 2011 -0500
>> +++ b/xen/arch/x86/i387.c	Tue May 03 13:59:37 2011 -0500
>> @@ -160,10 +160,25 @@
>>  /*******************************/
>>  /*      VCPU FPU Functions     */
>>  /*******************************/
>> +/* Restore FPU state whenever VCPU is schduled in. */
>> +void vcpu_restore_fpu_eager(struct vcpu *v)
>> +{
>> +    ASSERT(!is_idle_vcpu(v));
>> +
>> +    /* Avoid recursion */
>> +    clts();
>> +
>> +    /* save the nonlazy extended state which is not tracked by CR0.TS bit */
>> +    if ( xsave_enabled(v) )
>> +        fpu_xrstor(v, XSTATE_NONLAZY);
>> +
>> +    stts();
> ... while here you do an unconditional clts followed by an xrstor only
> checking whether xsave is enabled (but not checking whether there's
> any non-lazy state to be restored) and, possibly the most expensive
> of all, an unconditional write of CR0.
>
> Jan
>
>> +}
>> +
>>  /*
>>   * Restore FPU state when #NM is triggered.
>>   */
>> -void vcpu_restore_fpu(struct vcpu *v)
>> +void vcpu_restore_fpu_lazy(struct vcpu *v)
>>  {
>>      ASSERT(!is_idle_vcpu(v));
>>
>
>
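
To illustrate the LWP_CBADDR point above: the sketch below shows roughly
what such a dirty-state check could look like. This is a hypothetical
illustration only; the MSR number, the XSTATE_LWP bit and the helper name
are my own placeholders, not code from this patch series. Its purpose is
just to show that the check costs an AMD-specific MSR read on every
context switch.

/*
 * Hypothetical sketch: decide whether LWP actually has live (non-lazy)
 * state before taking the eager restore path.  Names marked "assumed"
 * are placeholders for illustration.
 */
#define MSR_AMD64_LWP_CBADDR  0xc0000106   /* assumed MSR number */
#define XSTATE_LWP            (1ULL << 62) /* assumed LWP bit in xcr0 */

static bool_t vcpu_has_nonlazy_state(const struct vcpu *v)
{
    uint64_t cbaddr;

    /* LWP component not enabled for this vcpu: nothing non-lazy to handle. */
    if ( !(v->arch.xcr0 & XSTATE_LWP) )
        return 0;

    /*
     * A non-zero continuation buffer address means LWP is in use, but
     * this rdmsrl() is an extra MSR access on every context switch,
     * possibly as expensive as the clts/stts pair it would avoid.
     */
    rdmsrl(MSR_AMD64_LWP_CBADDR, cbaddr);
    return cbaddr != 0;
}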