* [PATCH][RFC] FPU LWP 0/5: patch description
From: Wei Huang @ 2011-04-14 20:37 UTC
To: 'xen-devel@lists.xensource.com'

The following patches support AMD lightweight profiling (LWP).

Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU code
to handle lazy and unlazy FPU states differently. Lazy FPU state (such as
SSE and YMM) is saved and restored when #NM is triggered. Unlazy state, such
as LWP, is saved and restored on each vcpu context switch. To simplify the
code, we also add a mask option to the xsave/xrstor functions.

Thanks,
-Wei
* Re: [PATCH][RFC] FPU LWP 0/5: patch description
From: Keir Fraser @ 2011-04-14 21:09 UTC
To: Wei Huang, 'xen-devel@lists.xensource.com'

On 14/04/2011 21:37, "Wei Huang" <wei.huang2@amd.com> wrote:

> The following patches support AMD lightweight profiling.
>
> Because LWP isn't tracked by CR0.TS bit, we clean up the FPU code to
> handle lazy and unlazy FPU states differently. Lazy FPU state (such as
> SSE, YMM) is handled when #NM is triggered. Unlazy state, such as LWP,
> is saved and restored on each vcpu context switch. To simplify the code,
> we also add a mask option to xsave/xrstor function.

How much cost is added to the context-switch paths in the (overwhelmingly
likely) case that LWP is not being used by the guest? Is this adding a
whole lot of unconditional overhead for a feature that no one uses?

 -- Keir

> Thanks,
> -Wei
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
* Re: [PATCH][RFC] FPU LWP 0/5: patch description
From: Wei Huang @ 2011-04-14 22:57 UTC
To: Keir Fraser; +Cc: 'xen-devel@lists.xensource.com'

Hi Keir,

I ran a quick test to measure the overhead of __fpu_unlazy_save() and
__fpu_unlazy_restore(), which save/restore the LWP state. Here are the
results:

(1) tsc_total: total time spent in context_switch() in x86/domain.c
(2) tsc_unlazy: total time spent in __fpu_unlazy_save() +
    __fpu_unlazy_restore()

One example:
(XEN) tsc_unlazy=0x00000000008ae174
(XEN) tsc_total=0x00000001028b4907

So the overhead is about 0.2% of the total time spent in context_switch().
Of course, this is just one sample; I would expect the overhead ratio to
stay below 1% in most cases.

Thanks,
-Wei

On 04/14/2011 04:09 PM, Keir Fraser wrote:
> How much cost is added to context switch paths in the (overwhelmingly
> likely) case that LWP is not being used by the guest? Is this adding a
> whole lot of unconditional overhead for a feature that no one uses?
>
> -- Keir
* RE: [PATCH][RFC] FPU LWP 0/5: patch description
From: Dan Magenheimer @ 2011-04-15 20:16 UTC
To: Wei Huang, Keir Fraser; +Cc: xen-devel

Wait... a context switch takes over 4 billion cycles? Not likely!

And please check your division. I get the same answer from "dc" only when I
use lowercase hex numbers and dc complains about unimplemented chars; else
I get 0.033%... also unlikely.

> -----Original Message-----
> From: Wei Huang [mailto:wei.huang2@amd.com]
> Sent: Thursday, April 14, 2011 4:57 PM
> To: Keir Fraser
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH][RFC] FPU LWP 0/5: patch description
>
> I ran a quick test to calculate the overhead of __fpu_unlazy_save() and
> __fpu_unlazy_restore(), which are used to save/restore LWP state. Here
> are the results:
>
> (1) tsc_total: total time used for context_switch() in x86/domain.c
> (2) tsc_unlazy: total time used for __fpu_unlazy_save() +
>     __fpu_unlazy_restore()
>
> One example:
> (XEN) tsc_unlazy=0x00000000008ae174
> (XEN) tsc_total=0x00000001028b4907
>
> So the overhead is about 0.2% of total time used by context_switch().
> Of course, this is just one example. I would say the overhead ratio
> would be <1% for most cases.
* RE: [PATCH][RFC] FPU LWP 0/5: patch description
From: Huang2, Wei @ 2011-04-15 20:23 UTC
To: Dan Magenheimer, Keir Fraser; +Cc: xen-devel@lists.xensource.com

Hi Dan,

These aren't the cycles of a single switch. They are the total cycle counts
accumulated over a period; I randomly dumped the numbers while a guest was
running.

Thanks,
-Wei

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Friday, April 15, 2011 3:16 PM
>
> Wait... a context switch takes over 4 billion cycles? Not likely!
>
> And please check your division. I get the same answer from "dc" only
> when I use lowercase hex numbers and dc complains about unimplemented
> chars, else I get 0.033%... also unlikely.
Thread overview: 5+ messages
2011-04-14 20:37 [PATCH][RFC] FPU LWP 0/5: patch description Wei Huang
2011-04-14 21:09 ` Keir Fraser
2011-04-14 22:57   ` Wei Huang
2011-04-15 20:16     ` Dan Magenheimer
2011-04-15 20:23       ` Huang2, Wei