* [PATCH][RFC] FPU LWP 0/5: patch description
From: Wei Huang @ 2011-04-14 20:37 UTC
To: 'xen-devel@lists.xensource.com'
The following patches add support for AMD lightweight profiling (LWP).
Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU code
to handle lazy and unlazy FPU states differently. Lazy FPU state (such as
SSE and YMM) is handled when #NM is triggered. Unlazy state, such as LWP,
is saved and restored on each vcpu context switch. To simplify the code,
we also add a mask option to the xsave/xrstor functions.
Thanks,
-Wei
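
As a rough illustration, the split could look like the sketch below. Only
the __fpu_unlazy_* names and the xsave/xrstor mask parameter follow the
description above; the XSTATE_* component masks, clts(), and the call sites
are illustrative, not the actual patch:

    /* Component masks: which xsave state is managed lazily vs. eagerly. */
    #define XSTATE_LAZY     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
    #define XSTATE_NONLAZY  XSTATE_LWP      /* not gated by CR0.TS */

    /* Runs unconditionally on every vcpu context switch. */
    static void __fpu_unlazy_save(struct vcpu *v)
    {
        xsave(v, XSTATE_NONLAZY);           /* save only the LWP component */
    }

    static void __fpu_unlazy_restore(struct vcpu *v)
    {
        xrstor(v, XSTATE_NONLAZY);
    }

    /* Runs only from the #NM (device-not-available) handler. */
    static void fpu_lazy_restore(struct vcpu *v)
    {
        clts();                             /* clear CR0.TS */
        xrstor(v, XSTATE_LAZY);             /* SSE/YMM restored on first use */
    }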
* Re: [PATCH][RFC] FPU LWP 0/5: patch description
From: Keir Fraser @ 2011-04-14 21:09 UTC
To: Wei Huang, 'xen-devel@lists.xensource.com'
On 14/04/2011 21:37, "Wei Huang" <wei.huang2@amd.com> wrote:
> The following patches add support for AMD lightweight profiling (LWP).
>
> Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU code
> to handle lazy and unlazy FPU states differently. Lazy FPU state (such as
> SSE and YMM) is handled when #NM is triggered. Unlazy state, such as LWP,
> is saved and restored on each vcpu context switch. To simplify the code,
> we also add a mask option to the xsave/xrstor functions.
How much cost is added to the context-switch path in the (overwhelmingly
likely) case that LWP is not being used by the guest? Is this adding a whole
lot of unconditional overhead for a feature that no one uses?
-- Keir
> Thanks,
> -Wei
* Re: [PATCH][RFC] FPU LWP 0/5: patch description
From: Wei Huang @ 2011-04-14 22:57 UTC
To: Keir Fraser; +Cc: 'xen-devel@lists.xensource.com'
Hi Keir,
I ran a quick test to measure the overhead of __fpu_unlazy_save() and
__fpu_unlazy_restore(), which are used to save/restore LWP state. Here
are the results:
(1) tsc_total: total time spent in context_switch() in x86/domain.c
(2) tsc_unlazy: total time spent in __fpu_unlazy_save() +
__fpu_unlazy_restore()
One example:
(XEN) tsc_unlazy=0x00000000008ae174
(XEN) tsc_total=0x00000001028b4907
So the overhead is about 0.2% of the total time spent in context_switch().
Of course, this is just one sample; I would expect the overhead ratio to
stay below 1% in most cases.
Thanks,
-Wei
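
A sketch of how such counters can be collected with rdtsc-style
instrumentation follows; the placement and variable names here are
hypothetical, not the actual test code:

    static uint64_t tsc_total, tsc_unlazy;  /* accumulated over many switches */

    void context_switch(struct vcpu *prev, struct vcpu *next)
    {
        uint64_t t0 = rdtsc(), t1;

        __fpu_unlazy_save(prev);            /* the code being measured */
        __fpu_unlazy_restore(next);
        t1 = rdtsc();
        tsc_unlazy += t1 - t0;

        /* ... remainder of the switch ... */

        tsc_total += rdtsc() - t0;
    }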
On 04/14/2011 04:09 PM, Keir Fraser wrote:
> On 14/04/2011 21:37, "Wei Huang"<wei.huang2@amd.com> wrote:
>
>> The following patches add support for AMD lightweight profiling (LWP).
>>
>> Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU code
>> to handle lazy and unlazy FPU states differently. Lazy FPU state (such as
>> SSE and YMM) is handled when #NM is triggered. Unlazy state, such as LWP,
>> is saved and restored on each vcpu context switch. To simplify the code,
>> we also add a mask option to the xsave/xrstor functions.
> How much cost is added to the context-switch path in the (overwhelmingly
> likely) case that LWP is not being used by the guest? Is this adding a whole
> lot of unconditional overhead for a feature that no one uses?
>
> -- Keir
>
>> Thanks,
>> -Wei
* RE: [PATCH][RFC] FPU LWP 0/5: patch description
From: Dan Magenheimer @ 2011-04-15 20:16 UTC
To: Wei Huang, Keir Fraser; +Cc: xen-devel
Wait... a context switch takes over 4 billion cycles?
Not likely!
And please check your division. I get the same answer from "dc" only when
I use lowercase hex digits (and dc complains about unimplemented chars);
otherwise I get 0.033%... also unlikely.
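
(For what it's worth, dc parses only uppercase hex digits; lowercase a-f
are read as commands, hence the "unimplemented" complaints. A standalone
check of the ratio, in C for clarity:)

    #include <stdio.h>

    int main(void)
    {
        /* The two counters as printed in the previous mail. */
        unsigned long long unlazy = 0x8ae174ULL;    /*  9,101,684 cycles */
        unsigned long long total  = 0x1028b4907ULL; /* ~4.34e9 cycles    */

        /* Prints 0.2098%, i.e. roughly the 0.2% claimed above. */
        printf("%.4f%%\n", 100.0 * (double)unlazy / (double)total);
        return 0;
    }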
> -----Original Message-----
> From: Wei Huang [mailto:wei.huang2@amd.com]
> Sent: Thursday, April 14, 2011 4:57 PM
> To: Keir Fraser
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
>
> Hi Keir,
>
> I ran a quick test to measure the overhead of __fpu_unlazy_save() and
> __fpu_unlazy_restore(), which are used to save/restore LWP state. Here
> are the results:
>
> (1) tsc_total: total time spent in context_switch() in x86/domain.c
> (2) tsc_unlazy: total time spent in __fpu_unlazy_save() +
> __fpu_unlazy_restore()
>
> One example:
> (XEN) tsc_unlazy=0x00000000008ae174
> (XEN) tsc_total=0x00000001028b4907
>
> So the overhead is about 0.2% of the total time spent in context_switch().
> Of course, this is just one sample; I would expect the overhead ratio to
> stay below 1% in most cases.
>
> Thanks,
> -Wei
>
>
>
> On 04/14/2011 04:09 PM, Keir Fraser wrote:
> > On 14/04/2011 21:37, "Wei Huang"<wei.huang2@amd.com> wrote:
> >
> >> The following patches add support for AMD lightweight profiling (LWP).
> >>
> >> Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU
> >> code to handle lazy and unlazy FPU states differently. Lazy FPU state
> >> (such as SSE and YMM) is handled when #NM is triggered. Unlazy state,
> >> such as LWP, is saved and restored on each vcpu context switch. To
> >> simplify the code, we also add a mask option to the xsave/xrstor
> >> functions.
> > How much cost is added to the context-switch path in the (overwhelmingly
> > likely) case that LWP is not being used by the guest? Is this adding a
> > whole lot of unconditional overhead for a feature that no one uses?
> >
> > -- Keir
> >
> >> Thanks,
> >> -Wei
* RE: [PATCH][RFC] FPU LWP 0/5: patch description
From: Huang2, Wei @ 2011-04-15 20:23 UTC
To: Dan Magenheimer, Keir Fraser; +Cc: xen-devel@lists.xensource.com
Hi Dan,
These aren't the cycles of a single switch; they are cycle counts accumulated over a period. I dumped the numbers at random points while a guest was running.
Thanks,
-Wei
-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Dan Magenheimer
Sent: Friday, April 15, 2011 3:16 PM
To: Huang2, Wei; Keir Fraser
Cc: xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
Wait... a context switch takes over 4 billion cycles?
Not likely!
And please check your division. I get the same answer from "dc" only when
I use lowercase hex digits (and dc complains about unimplemented chars);
otherwise I get 0.033%... also unlikely.
> -----Original Message-----
> From: Wei Huang [mailto:wei.huang2@amd.com]
> Sent: Thursday, April 14, 2011 4:57 PM
> To: Keir Fraser
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
>
> Hi Keir,
>
> I ran a quick test to measure the overhead of __fpu_unlazy_save() and
> __fpu_unlazy_restore(), which are used to save/restore LWP state. Here
> are the results:
>
> (1) tsc_total: total time spent in context_switch() in x86/domain.c
> (2) tsc_unlazy: total time spent in __fpu_unlazy_save() +
> __fpu_unlazy_restore()
>
> One example:
> (XEN) tsc_unlazy=0x00000000008ae174
> (XEN) tsc_total=0x00000001028b4907
>
> So the overhead is about 0.2% of the total time spent in context_switch().
> Of course, this is just one sample; I would expect the overhead ratio to
> stay below 1% in most cases.
>
> Thanks,
> -Wei
>
>
>
> On 04/14/2011 04:09 PM, Keir Fraser wrote:
> > On 14/04/2011 21:37, "Wei Huang"<wei.huang2@amd.com> wrote:
> >
> >> The following patches add support for AMD lightweight profiling (LWP).
> >>
> >> Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU
> >> code to handle lazy and unlazy FPU states differently. Lazy FPU state
> >> (such as SSE and YMM) is handled when #NM is triggered. Unlazy state,
> >> such as LWP, is saved and restored on each vcpu context switch. To
> >> simplify the code, we also add a mask option to the xsave/xrstor
> >> functions.
> > How much cost is added to the context-switch path in the (overwhelmingly
> > likely) case that LWP is not being used by the guest? Is this adding a
> > whole lot of unconditional overhead for a feature that no one uses?
> >
> > -- Keir
> >
> >> Thanks,
> >> -Wei