* [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions @ 2014-04-15 13:01 Feng Wu 2014-04-15 8:40 ` Jan Beulich 0 siblings, 1 reply; 13+ messages in thread From: Feng Wu @ 2014-04-15 13:01 UTC (permalink / raw) To: JBeulich, Ian.Campbell, xen-devel; +Cc: Feng Wu, eddie.dong, jun.nakajima The STAC/CLAC instructions are only available when SMAP is enabled, but on the other hand they aren't needed if SMAP is not available, or before we start to run userspace, in that case, the functions and macros do nothing. Signed-off-by: Feng Wu <feng.wu@intel.com> --- xen/arch/x86/x86_64/asm-offsets.c | 1 + xen/include/asm-x86/x86_64/asm_defns.h | 70 ++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c index b0098b3..fa4cbb6 100644 --- a/xen/arch/x86/x86_64/asm-offsets.c +++ b/xen/arch/x86/x86_64/asm-offsets.c @@ -160,6 +160,7 @@ void __dummy__(void) BLANK(); OFFSET(CPUINFO86_ext_features, struct cpuinfo_x86, x86_capability[1]); + OFFSET(CPUINFO86_leaf7_features, struct cpuinfo_x86, x86_capability[7]); BLANK(); OFFSET(MB_flags, multiboot_info_t, flags); diff --git a/xen/include/asm-x86/x86_64/asm_defns.h b/xen/include/asm-x86/x86_64/asm_defns.h index bf63ac1..6805629 100644 --- a/xen/include/asm-x86/x86_64/asm_defns.h +++ b/xen/include/asm-x86/x86_64/asm_defns.h @@ -228,4 +228,74 @@ __asm__( \ # define _ASM_EX(p) #p "-." #endif +/* "Raw" instruction opcodes */ +#define __ASM_CLAC .byte 0x0f,0x01,0xca +#define __ASM_STAC .byte 0x0f,0x01,0xcb + +/* Indirect stringification. Doing two levels allows the parameter to be a + * macro itself. For example, compile with -DFOO=bar, __stringify(FOO) + * converts to "bar". + */ +#define __stringify_1(x...) #x +#define __stringify(x...) 
__stringify_1(x) + +#ifdef __ASSEMBLY__ +#define X86_FEATURE_SMAP (7*32+20) +#define ASM_STAC \ + pushq %rax; \ + leaq boot_cpu_data(%rip),%rax; \ + btl $X86_FEATURE_SMAP-7*32, CPUINFO86_leaf7_features(%rax); \ + jnc 881f; \ + movq %cr4,%rax; \ + testl $X86_CR4_SMAP,%eax; \ + jz 881f; \ + __ASM_STAC; \ +881: popq %rax + +#define ASM_CLAC \ + pushq %rax; \ + leaq boot_cpu_data(%rip),%rax; \ + btl $X86_FEATURE_SMAP-7*32, CPUINFO86_leaf7_features(%rax); \ + jnc 881f; \ + movq %cr4,%rax; \ + testl $X86_CR4_SMAP,%eax; \ + jz 881f; \ + __ASM_CLAC; \ +881: popq %rax +#else +#define ASM_STAC \ + "\npushq %%rax\n\t" \ + "leaq boot_cpu_data(%%rip),%%rax\n\t" \ + "btl $" __stringify(X86_FEATURE_SMAP) "-7*32," \ + __stringify(CPUINFO86_leaf7_features) "(%%rax)\n\t" \ + "jnc 881f\n\t" \ + "movq %%cr4,%%rax\n\t" \ + "testl $" __stringify(X86_CR4_SMAP) ",%%eax\n\t" \ + "jz 881f\n\t" \ + __stringify(__ASM_STAC) "\n\t" \ +"881: popq %%rax" + +#define ASM_CLAC \ + "\npushq %%rax\n\t" \ + "leaq boot_cpu_data(%%rip),%%rax\n\t" \ + "btl $" __stringify(X86_FEATURE_SMAP) "-7*32," \ + __stringify(CPUINFO86_leaf7_features) "(%%rax)\n\t" \ + "jnc 881f\n\t" \ + "movq %%cr4,%%rax\n\t" \ + "testl $" __stringify(X86_CR4_SMAP) ",%%eax\n\t" \ + "jz 881f\n\t" \ + __stringify(__ASM_CLAC) "\n\t" \ +"881: popq %%rax" + +static inline void clac(void) +{ + asm volatile (ASM_CLAC : : : "memory"); +} + +static inline void stac(void) +{ + asm volatile (ASM_STAC : : : "memory"); +} +#endif + #endif /* __X86_64_ASM_DEFNS_H__ */ -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-15 13:01 [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions Feng Wu @ 2014-04-15 8:40 ` Jan Beulich 2014-04-22 7:41 ` Wu, Feng 0 siblings, 1 reply; 13+ messages in thread From: Jan Beulich @ 2014-04-15 8:40 UTC (permalink / raw) To: Feng Wu; +Cc: eddie.dong, Ian.Campbell, jun.nakajima, xen-devel >>> On 15.04.14 at 15:01, <feng.wu@intel.com> wrote: > --- a/xen/include/asm-x86/x86_64/asm_defns.h > +++ b/xen/include/asm-x86/x86_64/asm_defns.h These changes should go into xen/include/asm-x86/asm_defns.h; the x86-64 one should get merged into the previously common one sooner or later, so let's not force that merging patch to be bigger than it needs to be. > @@ -228,4 +228,74 @@ __asm__( \ > # define _ASM_EX(p) #p "-." > #endif > > +/* "Raw" instruction opcodes */ > +#define __ASM_CLAC .byte 0x0f,0x01,0xca > +#define __ASM_STAC .byte 0x0f,0x01,0xcb > + > +/* Indirect stringification. Doing two levels allows the parameter to be a > + * macro itself. For example, compile with -DFOO=bar, __stringify(FOO) > + * converts to "bar". > + */ > +#define __stringify_1(x...) #x > +#define __stringify(x...) __stringify_1(x) Please use xen/stringify.h rather than repeating its definitions. > + > +#ifdef __ASSEMBLY__ > +#define X86_FEATURE_SMAP (7*32+20) No - how would we spot this if we were to re-arrange the feature array? Either use (or make usable) the definitions in cpufeature.h, or propagate the necessary values through asm-offsets.h. 
> +#define ASM_STAC \ > + pushq %rax; \ > + leaq boot_cpu_data(%rip),%rax; \ > + btl $X86_FEATURE_SMAP-7*32, CPUINFO86_leaf7_features(%rax); \ > + jnc 881f; \ > + movq %cr4,%rax; \ > + testl $X86_CR4_SMAP,%eax; \ > + jz 881f; \ > + __ASM_STAC; \ > +881: popq %rax > + > +#define ASM_CLAC \ > + pushq %rax; \ > + leaq boot_cpu_data(%rip),%rax; \ > + btl $X86_FEATURE_SMAP-7*32, CPUINFO86_leaf7_features(%rax); \ > + jnc 881f; \ > + movq %cr4,%rax; \ > + testl $X86_CR4_SMAP,%eax; \ > + jz 881f; \ > + __ASM_CLAC; \ > +881: popq %rax The only difference between the two macros appears to be the final instruction - please define one macro with an argument, and then make the two definitions here simple wrappers around that macro. That said, the macro contents itself is horrible too: A control register access and two conditional branches in code intended to be used in fast paths? Definitely not an option. Even the simplest possible solution - adding a global flag to be checked here - would already be questionable. Hence I think you should at least consider porting over proper instruction patching abstraction from Linux. > +#else > +#define ASM_STAC \ > + "\npushq %%rax\n\t" \ > + "leaq boot_cpu_data(%%rip),%%rax\n\t" \ > + "btl $" __stringify(X86_FEATURE_SMAP) "-7*32," \ > + __stringify(CPUINFO86_leaf7_features) "(%%rax)\n\t" \ > + "jnc 881f\n\t" \ > + "movq %%cr4,%%rax\n\t" \ > + "testl $" __stringify(X86_CR4_SMAP) ",%%eax\n\t" \ > + "jz 881f\n\t" \ > + __stringify(__ASM_STAC) "\n\t" \ > +"881: popq %%rax" > + > +#define ASM_CLAC \ > + "\npushq %%rax\n\t" \ > + "leaq boot_cpu_data(%%rip),%%rax\n\t" \ > + "btl $" __stringify(X86_FEATURE_SMAP) "-7*32," \ > + __stringify(CPUINFO86_leaf7_features) "(%%rax)\n\t" \ > + "jnc 881f\n\t" \ > + "movq %%cr4,%%rax\n\t" \ > + "testl $" __stringify(X86_CR4_SMAP) ",%%eax\n\t" \ > + "jz 881f\n\t" \ > + __stringify(__ASM_CLAC) "\n\t" \ > +"881: popq %%rax" All the same applies to these of course. 
Jan ^ permalink raw reply [flat|nested] 13+ messages in thread
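One of the review points above, a single macro taking the final instruction as an argument with STAC and CLAC as thin wrappers, can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the series: ASM_AC, STRINGIFY and count_differences are hypothetical names, the value given for CPUINFO86_leaf7_features is a placeholder (in-tree it would be generated by asm-offsets), and the body is simply the C variant of the posted sequence, which the review objects to on separate grounds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins so the sketch builds outside the Xen tree (all assumptions). */
#define STRINGIFY_1(...) #__VA_ARGS__            /* xen/stringify.h in-tree */
#define STRINGIFY(...)   STRINGIFY_1(__VA_ARGS__)
#define X86_FEATURE_SMAP         (7*32+20)       /* cpufeature.h in-tree */
#define X86_CR4_SMAP             0x00200000      /* CR4 bit 21 */
#define CPUINFO86_leaf7_features 28              /* placeholder offset */

/* "Raw" instruction opcodes, as in the patch. */
#define __ASM_CLAC .byte 0x0f,0x01,0xca
#define __ASM_STAC .byte 0x0f,0x01,0xcb

/* Hypothetical shared body: the posted C variant, with the final
 * instruction turned into a macro parameter. */
#define ASM_AC(op)                                              \
    "\npushq %%rax\n\t"                                         \
    "leaq boot_cpu_data(%%rip),%%rax\n\t"                       \
    "btl $" STRINGIFY(X86_FEATURE_SMAP) "-7*32,"                \
        STRINGIFY(CPUINFO86_leaf7_features) "(%%rax)\n\t"       \
    "jnc 881f\n\t"                                              \
    "movq %%cr4,%%rax\n\t"                                      \
    "testl $" STRINGIFY(X86_CR4_SMAP) ",%%eax\n\t"              \
    "jz 881f\n\t"                                               \
    STRINGIFY(op) "\n\t"                                        \
    "881: popq %%rax"

/* The two user-visible macros become simple wrappers. */
#define ASM_STAC ASM_AC(__ASM_STAC)
#define ASM_CLAC ASM_AC(__ASM_CLAC)

/* Count the positions at which two equal-length strings differ. */
size_t count_differences(const char *a, const char *b)
{
    size_t n = 0;

    assert(strlen(a) == strlen(b));
    for (; *a; a++, b++)
        n += (*a != *b);
    return n;
}
```

Comparing the two expanded strings with count_differences() is a cheap way to confirm that the wrappers differ only in the last opcode byte. The two-level STRINGIFY is what lets the opcode macro be expanded to its .byte sequence before it is turned into a string.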
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-15 8:40 ` Jan Beulich @ 2014-04-22 7:41 ` Wu, Feng 2014-04-22 8:07 ` Jan Beulich 0 siblings, 1 reply; 13+ messages in thread From: Wu, Feng @ 2014-04-22 7:41 UTC (permalink / raw) To: Jan Beulich Cc: Dong, Eddie, Ian.Campbell@citrix.com, Nakajima, Jun, xen-devel@lists.xen.org > -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Tuesday, April 15, 2014 4:40 PM > To: Wu, Feng > Cc: Ian.Campbell@citrix.com; Dong, Eddie; Nakajima, Jun; > xen-devel@lists.xen.org > Subject: Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions > > >>> On 15.04.14 at 15:01, <feng.wu@intel.com> wrote: > > --- a/xen/include/asm-x86/x86_64/asm_defns.h > > +++ b/xen/include/asm-x86/x86_64/asm_defns.h > > These changes should go into xen/include/asm-x86/asm_defns.h; the > x86-64 one should get merged into the previously common one sooner > or later, so let's not force that merging patch to be bigger than it needs > to be. > > > @@ -228,4 +228,74 @@ > __asm__( \ > > # define _ASM_EX(p) #p "-." > > #endif > > > > +/* "Raw" instruction opcodes */ > > +#define __ASM_CLAC .byte 0x0f,0x01,0xca > > +#define __ASM_STAC .byte 0x0f,0x01,0xcb > > + > > +/* Indirect stringification. Doing two levels allows the parameter to be a > > + * macro itself. For example, compile with -DFOO=bar, __stringify(FOO) > > + * converts to "bar". > > + */ > > +#define __stringify_1(x...) #x > > +#define __stringify(x...) __stringify_1(x) > > Please use xen/stringify.h rather than repeating its defintions. > > > + > > +#ifdef __ASSEMBLY__ > > +#define X86_FEATURE_SMAP (7*32+20) > > No - how would be spot this if we were to re-arrange the feature > array? Either use (or make usable) the definitions in cpufeature.h, > or propagate the necessary values through asm-offsets.h. 
> > > +#define ASM_STAC \ > > + pushq %rax; \ > > + leaq boot_cpu_data(%rip),%rax; \ > > + btl $X86_FEATURE_SMAP-7*32, CPUINFO86_leaf7_features(%rax); > \ > > + jnc 881f; \ > > + movq %cr4,%rax; \ > > + testl $X86_CR4_SMAP,%eax; \ > > + jz 881f; \ > > + __ASM_STAC; \ > > +881: popq %rax > > + > > +#define ASM_CLAC \ > > + pushq %rax; \ > > + leaq boot_cpu_data(%rip),%rax; \ > > + btl $X86_FEATURE_SMAP-7*32, CPUINFO86_leaf7_features(%rax); > \ > > + jnc 881f; \ > > + movq %cr4,%rax; \ > > + testl $X86_CR4_SMAP,%eax; \ > > + jz 881f; \ > > + __ASM_CLAC; \ > > +881: popq %rax > > The only difference between the two macros appears to be the final > instruction - please define one macro with an argument, and then > make the two definitions here simple wrappers around that macro. > > That said, the macro contents itself is horrible too: A control register > access and two conditional branches in code intended to be used in > fast paths? Definitely not an option. Even the simplest possible > solution - adding a global flag to be checked here - would already be > questionable. Hence I think you should at least consider porting over > proper instruction patching abstraction from Linux. >

Jan, I looked into how Linux handles these two instructions. Basically, it uses the alternatives mechanism for this kind of case. Take the following definition of ASM_CLAC in Linux as an example:

#define ASM_CLAC \
    661: ASM_NOP3 ; \
    .pushsection .altinstr_replacement, "ax" ; \
    662: __ASM_CLAC ; \
    .popsection ; \
    .pushsection .altinstructions, "a" ; \
    altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3 ; \
    .popsection

By default ASM_CLAC is a 3-byte NOP. The real CLAC instruction goes into the ".altinstr_replacement" section, and the information needed to replace the default definition with the alternative one goes into the ".altinstructions" section. Here is the call path: start_kernel() --> check_bugs() --> alternative_instructions().
In alternative_instructions(), the relevant CPU features are checked; if a feature is present, the alternative definition overwrites the default one. So there are no conditional branches left after this replacement when the macro is used.

Do you think we need to port this whole mechanism to Xen to support CLAC/STAC? I am not sure whether that would be overkill.

BTW, judging from the Linux implementation, I think we don't need to check CR4 in the macros; we just need to check whether the feature exists in the CPU. So is it acceptable to use the original code with the CR4 check removed? Thanks a lot in advance!

> > +#else > > +#define ASM_STAC \ > > + "\npushq %%rax\n\t" \ > > + "leaq boot_cpu_data(%%rip),%%rax\n\t" \ > > + "btl $" __stringify(X86_FEATURE_SMAP) "-7*32," \ > > + __stringify(CPUINFO86_leaf7_features) "(%%rax)\n\t" \ > > + "jnc 881f\n\t" \ > > + "movq %%cr4,%%rax\n\t" \ > > + "testl $" __stringify(X86_CR4_SMAP) ",%%eax\n\t" \ > > + "jz 881f\n\t" \ > > + __stringify(__ASM_STAC) "\n\t" \ > > +"881: popq %%rax" > > + > > +#define ASM_CLAC \ > > + "\npushq %%rax\n\t" \ > > + "leaq boot_cpu_data(%%rip),%%rax\n\t" \ > > + "btl $" __stringify(X86_FEATURE_SMAP) "-7*32," \ > > + __stringify(CPUINFO86_leaf7_features) "(%%rax)\n\t" \ > > + "jnc 881f\n\t" \ > > + "movq %%cr4,%%rax\n\t" \ > > + "testl $" __stringify(X86_CR4_SMAP) ",%%eax\n\t" \ > > + "jz 881f\n\t" \ > > + __stringify(__ASM_CLAC) "\n\t" \ > > +"881: popq %%rax" > > All the same applies to these of course. > > Jan Thanks, Feng ^ permalink raw reply [flat|nested] 13+ messages in thread
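The boot-time replacement described in this message can be modelled in plain C to make the idea concrete. What follows is a simplified sketch under stated assumptions, not the Linux implementation: struct alt_entry, cpu_has_feature and apply_alternatives_sketch are hypothetical names, the record layout is invented for illustration, and the "text" being patched is an ordinary byte buffer. Patching live code needs considerably more care than a memcpy().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One record per patch site; loosely mirrors the arguments given to
 * altinstruction_entry in the quoted macro. Hypothetical layout. */
struct alt_entry {
    uint8_t       *site;      /* 661: start of the default bytes */
    const uint8_t *repl;      /* 662: start of the replacement bytes */
    unsigned int   feature;   /* feature index, word * 32 + bit */
    uint8_t        site_len;  /* length of the default sequence */
    uint8_t        repl_len;  /* length of the replacement sequence */
};

/* Test one bit in an array of 32-bit capability words. */
int cpu_has_feature(const uint32_t *caps, unsigned int feature)
{
    return (caps[feature / 32] >> (feature % 32)) & 1;
}

/* Walk the table once and overwrite every site whose feature is present.
 * Afterwards the fast path contains either the NOP or the real
 * instruction, with no run-time test left. */
void apply_alternatives_sketch(const struct alt_entry *start,
                               const struct alt_entry *end,
                               const uint32_t *caps)
{
    const struct alt_entry *e;

    for (e = start; e < end; e++) {
        assert(e->repl_len <= e->site_len);
        if (!cpu_has_feature(caps, e->feature))
            continue;                       /* keep the default bytes */
        memcpy(e->site, e->repl, e->repl_len);
        /* Pad any leftover space with one-byte NOPs (0x90). */
        memset(e->site + e->repl_len, 0x90, e->site_len - e->repl_len);
    }
}
```

With a 3-byte NOP at the site and the CLAC bytes (0x0f,0x01,0xca) as the replacement, the site is left untouched while the SMAP bit (word 7, bit 20) is clear and is rewritten once it is set. The cost is paid once at boot rather than on every pass through the fast path.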
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 7:41 ` Wu, Feng @ 2014-04-22 8:07 ` Jan Beulich 2014-04-22 8:46 ` Wu, Feng 2014-04-22 9:43 ` [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions Andrew Cooper 0 siblings, 2 replies; 13+ messages in thread From: Jan Beulich @ 2014-04-22 8:07 UTC (permalink / raw) To: Feng Wu Cc: Eddie Dong, Ian.Campbell@citrix.com, Jun Nakajima, xen-devel@lists.xen.org >>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> That said, the macro contents itself is horrible too: A control register >> access and two conditional branches in code intended to be used in >> fast paths? Definitely not an option. Even the simplest possible >> solution - adding a global flag to be checked here - would already be >> questionable. Hence I think you should at least consider porting over >> proper instruction patching abstraction from Linux. >> > > Jan, I did some investigation about how to handle this two instructions > in Linux, basically, it uses the alternatives mechanism to handle these > kind of cases. Let's take the following definition of ASM_STAC in Linux for > example: > > #define ASM_CLAC \ > 661: ASM_NOP3 ; \ > .pushsection .altinstr_replacement, "ax" ; \ > 662: __ASM_CLAC ; \ > .popsection ; \ > .pushsection .altinstructions, "a" ; \ > altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3 ; \ > .popsection > > ASM_CLAC is defined as NOP by default, it puts the real CLAC instruction in > section "altinstr_replacement" and > the needed information to " altinstructions " section, which is useful to > replace the default > definition by the alternative one. Here is the routine call path: > start_kernel () --> check_bugs() --> alternative_instructions(). > > In function alternative_instructions(), it will check the related features > in CPU, if it exists, the alternative definition will > overwrite the default one. 
So there is no conditional branches after this > replacement when the Macro is being used. > > Do you think we need to port this whole mechanism to Xen to support > CLAC/STAC? I am not sure if it is a little overkilled. Obviously we could use this machinery for other things. But whether it's needed here depends on the alternatives. > BTW, from the Linux implementation, I think we don't need to check the 'cr4' > for the macros, we just need > to check whether the feature exists in the CPU. So is it acceptable to use > the original code by eliminating the cr4 check? That _might_ be acceptable if you bring it down to just the three really necessary instructions: BT, JNC, CLAC/STAC. But the "might" has to stand - this, after all, remains an addition of a conditional branch (and for the performance of STAC/CLAC I haven't seen any documentation so far either) to several fast paths, and hence the patching alternative can't be discarded as the potentially better one. Jan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 8:07 ` Jan Beulich @ 2014-04-22 8:46 ` Wu, Feng 2014-04-22 9:17 ` Jan Beulich 2014-04-22 9:43 ` [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions Andrew Cooper 1 sibling, 1 reply; 13+ messages in thread From: Wu, Feng @ 2014-04-22 8:46 UTC (permalink / raw) To: Jan Beulich Cc: Dong, Eddie, Ian.Campbell@citrix.com, Nakajima, Jun, xen-devel@lists.xen.org > -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Tuesday, April 22, 2014 4:07 PM > To: Wu, Feng > Cc: Ian.Campbell@citrix.com; Dong, Eddie; Nakajima, Jun; > xen-devel@lists.xen.org > Subject: RE: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions > > >>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> That said, the macro contents itself is horrible too: A control register > >> access and two conditional branches in code intended to be used in > >> fast paths? Definitely not an option. Even the simplest possible > >> solution - adding a global flag to be checked here - would already be > >> questionable. Hence I think you should at least consider porting over > >> proper instruction patching abstraction from Linux. > >> > > > > Jan, I did some investigation about how to handle this two instructions > > in Linux, basically, it uses the alternatives mechanism to handle these > > kind of cases. 
Let's take the following definition of ASM_STAC in Linux for > > example: > > > > #define ASM_CLAC > \ > > 661: ASM_NOP3 ; > \ > > .pushsection .altinstr_replacement, "ax" ; > \ > > 662: __ASM_CLAC ; > \ > > .popsection ; > \ > > .pushsection .altinstructions, "a" ; > \ > > altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3 ; > \ > > .popsection > > > > ASM_CLAC is defined as NOP by default, it puts the real CLAC instruction in > > section "altinstr_replacement" and > > the needed information to " altinstructions " section, which is useful to > > replace the default > > definition by the alternative one. Here is the routine call path: > > start_kernel () --> check_bugs() --> alternative_instructions(). > > > > In function alternative_instructions(), it will check the related features > > in CPU, if it exists, the alternative definition will > > overwrite the default one. So there is no conditional branches after this > > replacement when the Macro is being used. > > > > Do you think we need to port this whole mechanism to Xen to support > > CLAC/STAC? I am not sure if it is a little overkilled. > > Obviously we could use this machinery for other things. But whether it's > needed here depends on the alternatives. > > > BTW, from the Linux implementation, I think we don't need to check the 'cr4' > > for the macros, we just need > > to check whether the feature exists in the CPU. So is it acceptable to use > > the original code by eliminating the cr4 check? > > That _might_ be acceptable if you bring it down to just the three > really necessary instructions: BT, JNC, CLAC/STAC. But the "might" > has to stand - this, after all, remains an addition of a conditional > branch (and for the performance of STAC/CLAC I haven't seen any > documentation so far either) to several fast paths, and hence the > patching alternative can't be discarded as the potentially better one. 
> Since the alternatives mechanism in Linux is generic and self-contained, and porting it to Xen needs somewhat more effort, can we use the method I mentioned above at the current stage? After that I will think through how to port the alternatives mechanism to Xen. What do you think about this? > Jan Thanks, Feng ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 8:46 ` Wu, Feng @ 2014-04-22 9:17 ` Jan Beulich 2014-04-22 12:19 ` Wu, Feng 0 siblings, 1 reply; 13+ messages in thread From: Jan Beulich @ 2014-04-22 9:17 UTC (permalink / raw) To: Feng Wu Cc: Eddie Dong, Ian.Campbell@citrix.com, Jun Nakajima, xen-devel@lists.xen.org >>> On 22.04.14 at 10:46, <feng.wu@intel.com> wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> >>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: >> > BTW, from the Linux implementation, I think we don't need to check the 'cr4' >> > for the macros, we just need >> > to check whether the feature exists in the CPU. So is it acceptable to use >> > the original code by eliminating the cr4 check? >> >> That _might_ be acceptable if you bring it down to just the three >> really necessary instructions: BT, JNC, CLAC/STAC. But the "might" >> has to stand - this, after all, remains an addition of a conditional >> branch (and for the performance of STAC/CLAC I haven't seen any >> documentation so far either) to several fast paths, and hence the >> patching alternative can't be discarded as the potentially better one. >> > > Since the alternatives mechanism in Linux is something common and > independent and needs > a bit more efforts to be ported to Xen, can we use the method I mentioned > above > at the current stage. After that I will have a fully think about how to port > the > alternatives mechanism Xen. > > What do you think about this? Generally this would seem acceptable (as long as you give at least a rough estimate on when to expect that second step), but then we have this sad experience with promises by Intel engineers to work on certain things... Jan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 9:17 ` Jan Beulich @ 2014-04-22 12:19 ` Wu, Feng 2014-04-22 13:09 ` Konrad Rzeszutek Wilk 0 siblings, 1 reply; 13+ messages in thread From: Wu, Feng @ 2014-04-22 12:19 UTC (permalink / raw) To: Jan Beulich Cc: Dong, Eddie, Ian.Campbell@citrix.com, Nakajima, Jun, xen-devel@lists.xen.org > -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Tuesday, April 22, 2014 5:17 PM > To: Wu, Feng > Cc: Ian.Campbell@citrix.com; Dong, Eddie; Nakajima, Jun; > xen-devel@lists.xen.org > Subject: RE: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions > > >>> On 22.04.14 at 10:46, <feng.wu@intel.com> wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> >>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: > >> > BTW, from the Linux implementation, I think we don't need to check the > 'cr4' > >> > for the macros, we just need > >> > to check whether the feature exists in the CPU. So is it acceptable to use > >> > the original code by eliminating the cr4 check? > >> > >> That _might_ be acceptable if you bring it down to just the three > >> really necessary instructions: BT, JNC, CLAC/STAC. But the "might" > >> has to stand - this, after all, remains an addition of a conditional > >> branch (and for the performance of STAC/CLAC I haven't seen any > >> documentation so far either) to several fast paths, and hence the > >> patching alternative can't be discarded as the potentially better one. > >> > > > > Since the alternatives mechanism in Linux is something common and > > independent and needs > > a bit more efforts to be ported to Xen, can we use the method I mentioned > > above > > at the current stage. After that I will have a fully think about how to port > > the > > alternatives mechanism Xen. > > > > What do you think about this? 
> > Generally this would seem acceptable (as long as you give at least a > rough estimate on when to expect that second step), but then we > have this sad experience with promises by Intel engineers to work > on certain things... > Thanks a lot! I think I can work on the alternative mechanism after this SMAP patch is finished. > Jan Thanks, Feng ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 12:19 ` Wu, Feng @ 2014-04-22 13:09 ` Konrad Rzeszutek Wilk 2014-04-23 13:43 ` Wu, Feng 0 siblings, 1 reply; 13+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-04-22 13:09 UTC (permalink / raw) To: Wu, Feng Cc: Nakajima, Jun, Dong, Eddie, Ian.Campbell@citrix.com, Jan Beulich, xen-devel@lists.xen.org On Tue, Apr 22, 2014 at 12:19:48PM +0000, Wu, Feng wrote: > > > > -----Original Message----- > > From: Jan Beulich [mailto:JBeulich@suse.com] > > Sent: Tuesday, April 22, 2014 5:17 PM > > To: Wu, Feng > > Cc: Ian.Campbell@citrix.com; Dong, Eddie; Nakajima, Jun; > > xen-devel@lists.xen.org > > Subject: RE: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions > > > > >>> On 22.04.14 at 10:46, <feng.wu@intel.com> wrote: > > >> From: Jan Beulich [mailto:JBeulich@suse.com] > > >> >>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: > > >> > BTW, from the Linux implementation, I think we don't need to check the > > 'cr4' > > >> > for the macros, we just need > > >> > to check whether the feature exists in the CPU. So is it acceptable to use > > >> > the original code by eliminating the cr4 check? > > >> > > >> That _might_ be acceptable if you bring it down to just the three > > >> really necessary instructions: BT, JNC, CLAC/STAC. But the "might" > > >> has to stand - this, after all, remains an addition of a conditional > > >> branch (and for the performance of STAC/CLAC I haven't seen any > > >> documentation so far either) to several fast paths, and hence the > > >> patching alternative can't be discarded as the potentially better one. > > >> > > > > > > Since the alternatives mechanism in Linux is something common and > > > independent and needs > > > a bit more efforts to be ported to Xen, can we use the method I mentioned > > > above > > > at the current stage. After that I will have a fully think about how to port > > > the > > > alternatives mechanism Xen. 
> > > > > > What do you think about this? > > > > Generally this would seem acceptable (as long as you give at least a > > rough estimate on when to expect that second step), but then we > > have this sad experience with promises by Intel engineers to work > > on certain things... > > > > Thanks a lot! > I think I can work on the alternative mechanism after this SMAP patch is finished. Any time estimates when the alternative patching mechanism would be done? Asking because it seems to me that we would want that in Xen 4.5 - so need to figure out your timeline to fold it in the release time-frame. > > > Jan > > Thanks, > Feng > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 13:09 ` Konrad Rzeszutek Wilk @ 2014-04-23 13:43 ` Wu, Feng 2014-04-23 14:52 ` Is: alternative_asm as dependency for STAC/CLAC/new features? Was:Re: " Konrad Rzeszutek Wilk 0 siblings, 1 reply; 13+ messages in thread From: Wu, Feng @ 2014-04-23 13:43 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Nakajima, Jun, Dong, Eddie, Ian.Campbell@citrix.com, Jan Beulich, xen-devel@lists.xen.org > -----Original Message----- > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] > Sent: Tuesday, April 22, 2014 9:10 PM > To: Wu, Feng > Cc: Jan Beulich; Dong, Eddie; Ian.Campbell@citrix.com; Nakajima, Jun; > xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v1 1/6] x86: Add support for STAC/CLAC > instructions > > On Tue, Apr 22, 2014 at 12:19:48PM +0000, Wu, Feng wrote: > > > > > > > -----Original Message----- > > > From: Jan Beulich [mailto:JBeulich@suse.com] > > > Sent: Tuesday, April 22, 2014 5:17 PM > > > To: Wu, Feng > > > Cc: Ian.Campbell@citrix.com; Dong, Eddie; Nakajima, Jun; > > > xen-devel@lists.xen.org > > > Subject: RE: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions > > > > > > >>> On 22.04.14 at 10:46, <feng.wu@intel.com> wrote: > > > >> From: Jan Beulich [mailto:JBeulich@suse.com] > > > >> >>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: > > > >> > BTW, from the Linux implementation, I think we don't need to check > the > > > 'cr4' > > > >> > for the macros, we just need > > > >> > to check whether the feature exists in the CPU. So is it acceptable to > use > > > >> > the original code by eliminating the cr4 check? > > > >> > > > >> That _might_ be acceptable if you bring it down to just the three > > > >> really necessary instructions: BT, JNC, CLAC/STAC. 
But the "might" > > > >> has to stand - this, after all, remains an addition of a conditional > > > >> branch (and for the performance of STAC/CLAC I haven't seen any > > > >> documentation so far either) to several fast paths, and hence the > > > >> patching alternative can't be discarded as the potentially better one. > > > >> > > > > > > > > Since the alternatives mechanism in Linux is something common and > > > > independent and needs > > > > a bit more efforts to be ported to Xen, can we use the method I > mentioned > > > > above > > > > at the current stage. After that I will have a fully think about how to port > > > > the > > > > alternatives mechanism Xen. > > > > > > > > What do you think about this? > > > > > > Generally this would seem acceptable (as long as you give at least a > > > rough estimate on when to expect that second step), but then we > > > have this sad experience with promises by Intel engineers to work > > > on certain things... > > > > > > > Thanks a lot! > > I think I can work on the alternative mechanism after this SMAP patch is > finished. > > Any time estimates when the alternative patching mechanism would be done? > Asking > because it seems to me that we would want that in Xen 4.5 - so need to figure > out > your timeline to fold it in the release time-frame. I am sorry I feel it is a little hard for me to say when the patch will be done, since I am not quite clear about how big the effort would be right now. But I can start porting it ASAP after SMAP is done. BTW, do you know when will Xen 4.5 be released? About 4~5 months later? > > > > > > Jan > > > > Thanks, > > Feng > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel Thanks, Feng ^ permalink raw reply [flat|nested] 13+ messages in thread
* Is: alternative_asm as dependency for STAC/CLAC/new features? Was:Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-23 13:43 ` Wu, Feng @ 2014-04-23 14:52 ` Konrad Rzeszutek Wilk 2014-04-23 15:59 ` Is: alternative_asm as dependency for STAC/CLAC/new features? Jan Beulich 0 siblings, 1 reply; 13+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-04-23 14:52 UTC (permalink / raw) To: Wu, Feng Cc: Nakajima, Jun, Dong, Eddie, Ian.Campbell@citrix.com, Jan Beulich, xen-devel@lists.xen.org On Wed, Apr 23, 2014 at 01:43:35PM +0000, Wu, Feng wrote: > > > > -----Original Message----- > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] > > Sent: Tuesday, April 22, 2014 9:10 PM > > To: Wu, Feng > > Cc: Jan Beulich; Dong, Eddie; Ian.Campbell@citrix.com; Nakajima, Jun; > > xen-devel@lists.xen.org > > Subject: Re: [Xen-devel] [PATCH v1 1/6] x86: Add support for STAC/CLAC > > instructions > > > > On Tue, Apr 22, 2014 at 12:19:48PM +0000, Wu, Feng wrote: > > > > > > > > > > -----Original Message----- > > > > From: Jan Beulich [mailto:JBeulich@suse.com] > > > > Sent: Tuesday, April 22, 2014 5:17 PM > > > > To: Wu, Feng > > > > Cc: Ian.Campbell@citrix.com; Dong, Eddie; Nakajima, Jun; > > > > xen-devel@lists.xen.org > > > > Subject: RE: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions > > > > > > > > >>> On 22.04.14 at 10:46, <feng.wu@intel.com> wrote: > > > > >> From: Jan Beulich [mailto:JBeulich@suse.com] > > > > >> >>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: > > > > >> > BTW, from the Linux implementation, I think we don't need to check > > the > > > > 'cr4' > > > > >> > for the macros, we just need > > > > >> > to check whether the feature exists in the CPU. So is it acceptable to > > use > > > > >> > the original code by eliminating the cr4 check? > > > > >> > > > > >> That _might_ be acceptable if you bring it down to just the three > > > > >> really necessary instructions: BT, JNC, CLAC/STAC. 
But the "might" > > > > >> has to stand - this, after all, remains an addition of a conditional > > > > >> branch (and for the performance of STAC/CLAC I haven't seen any > > > > >> documentation so far either) to several fast paths, and hence the > > > > >> patching alternative can't be discarded as the potentially better one. > > > > >> > > > > > > > > > > Since the alternatives mechanism in Linux is something common and > > > > > independent and needs > > > > > a bit more efforts to be ported to Xen, can we use the method I > > mentioned > > > > > above > > > > > at the current stage. After that I will have a fully think about how to port > > > > > the > > > > > alternatives mechanism Xen. > > > > > > > > > > What do you think about this? > > > > > > > > Generally this would seem acceptable (as long as you give at least a > > > > rough estimate on when to expect that second step), but then we > > > > have this sad experience with promises by Intel engineers to work > > > > on certain things... > > > > > > > > > > Thanks a lot! > > > I think I can work on the alternative mechanism after this SMAP patch is > > finished. > > > > Any time estimates when the alternative patching mechanism would be done? > > Asking > > because it seems to me that we would want that in Xen 4.5 - so need to figure > > out > > your timeline to fold it in the release time-frame. > > I am sorry I feel it is a little hard for me to say when the patch will be done, since I am not > quite clear about how big the effort would be right now. But I can start porting it ASAP > after SMAP is done. BTW, do you know when will Xen 4.5 be released? About 4~5 months later? The 4.5 roadmap is not yet clear. I hope that at the Xen Hackathon it will be discussed by the release manager. I think that without this runtime patching the hypervisor will suffer a performance penalty on hosts that don't support this. I say *think* because I don't have any hard numbers. 
And as such this argument might be completely wrong if testing shows otherwise (say, running this on AMD hardware with and without these patches).

Bearing that in mind, if this patching is not done by the time Xen 4.5 hits the feature freeze window it might be necessary to:

- #ifdef out the code so that it is only compiled in for folks who really want this and are OK with the potential performance setback.
- But this #ifdef is a maintenance nightmare - code often bitrots, another QA matrix row, etc. In which case reverting the code is a better option and then it can be targeted for Xen 4.6.

Perhaps reorganizing your deliverables would be better. As in, focus on getting alternative_asm in first, and _then_ on this feature. Aka, alternative_asm is a dependency and this work requires that in addition.

Or all of this is pointless and testing on AMD/Intel hardware with or without these patches (and with / without the CPU feature) shows no performance degradation.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Is: alternative_asm as dependency for STAC/CLAC/new features? 2014-04-23 14:52 ` Is: alternative_asm as dependency for STAC/CLAC/new features? Was:Re: " Konrad Rzeszutek Wilk @ 2014-04-23 15:59 ` Jan Beulich 0 siblings, 0 replies; 13+ messages in thread From: Jan Beulich @ 2014-04-23 15:59 UTC (permalink / raw) To: Feng Wu, Konrad Rzeszutek Wilk Cc: Eddie Dong, Ian.Campbell@citrix.com, Jun Nakajima, xen-devel@lists.xen.org >>> On 23.04.14 at 16:52, <konrad.wilk@oracle.com> wrote: > I think that without this runtime patching the hypervisor will > suffer a performance penalty on hosts that don't support this. Hosts that do support this would suffer too - a conditional branch doesn't come for free even if not taken. Also I don't think there's a question of whether performance would be affected, only of by how much (or little). Jan ^ permalink raw reply [flat|nested] 13+ messages in thread
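For reference, the per-call check being weighed in this sub-thread (the BT on the leaf 7 feature word followed by a conditional branch around STAC/CLAC) corresponds roughly to the C below. This is only a sketch: `struct cpuinfo_sketch`, `NCAPINTS_SKETCH`, `cpu_has_feature()` and `would_execute_clac()` are hypothetical names made up for illustration, not Xen code. Only `X86_FEATURE_SMAP` and the use of capability word 7 are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuinfo_x86: only the capability words. */
#define NCAPINTS_SKETCH 8
struct cpuinfo_sketch {
    uint32_t x86_capability[NCAPINTS_SKETCH];
};

/* From the patch: capability word 7, bit 20. */
#define X86_FEATURE_SMAP (7 * 32 + 20)

/* C equivalent of "btl $bit, word(%rax)": test one bit of one feature word. */
static int cpu_has_feature(const struct cpuinfo_sketch *c, unsigned int feature)
{
    return (c->x86_capability[feature / 32] >> (feature % 32)) & 1u;
}

/*
 * Returns 1 where the macro would reach STAC/CLAC and 0 where it would
 * branch past it ("jnc 881f"). The branch is evaluated on every call,
 * whether or not the feature is present.
 */
static int would_execute_clac(const struct cpuinfo_sketch *c)
{
    if (!cpu_has_feature(c, X86_FEATURE_SMAP))
        return 0;
    return 1;
}
```

The point of the sketch is that the test runs at every use site on every host, which is the cost being argued about here. Boot-time patching would remove the test from the use sites altogether.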
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 8:07 ` Jan Beulich 2014-04-22 8:46 ` Wu, Feng @ 2014-04-22 9:43 ` Andrew Cooper 2014-04-22 9:48 ` Jan Beulich 1 sibling, 1 reply; 13+ messages in thread From: Andrew Cooper @ 2014-04-22 9:43 UTC (permalink / raw) To: Jan Beulich Cc: Eddie Dong, Feng Wu, Ian.Campbell@citrix.com, Jun Nakajima, xen-devel@lists.xen.org On 22/04/14 09:07, Jan Beulich wrote: >>>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: >>> From: Jan Beulich [mailto:JBeulich@suse.com] >>> That said, the macro contents itself is horrible too: A control register >>> access and two conditional branches in code intended to be used in >>> fast paths? Definitely not an option. Even the simplest possible >>> solution - adding a global flag to be checked here - would already be >>> questionable. Hence I think you should at least consider porting over >>> proper instruction patching abstraction from Linux. >>> >> Jan, I did some investigation about how to handle this two instructions >> in Linux, basically, it uses the alternatives mechanism to handle these >> kind of cases. Let's take the following definition of ASM_STAC in Linux for >> example: >> >> #define ASM_CLAC \ >> 661: ASM_NOP3 ; \ >> .pushsection .altinstr_replacement, "ax" ; \ >> 662: __ASM_CLAC ; \ >> .popsection ; \ >> .pushsection .altinstructions, "a" ; \ >> altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3 ; \ >> .popsection >> >> ASM_CLAC is defined as NOP by default, it puts the real CLAC instruction in >> section "altinstr_replacement" and >> the needed information to " altinstructions " section, which is useful to >> replace the default >> definition by the alternative one. Here is the routine call path: >> start_kernel () --> check_bugs() --> alternative_instructions(). >> >> In function alternative_instructions(), it will check the related features >> in CPU, if it exists, the alternative definition will >> overwrite the default one. 
So there is no conditional branches after this >> replacement when the Macro is being used. >> >> Do you think we need to port this whole mechanism to Xen to support >> CLAC/STAC? I am not sure if it is a little overkilled. > Obviously we could use this machinery for other things. But whether it's > needed here depends on the alternatives. > >> BTW, from the Linux implementation, I think we don't need to check the 'cr4' >> for the macros, we just need >> to check whether the feature exists in the CPU. So is it acceptable to use >> the original code by eliminating the cr4 check? > That _might_ be acceptable if you bring it down to just the three > really necessary instructions: BT, JNC, CLAC/STAC. But the "might" > has to stand - this, after all, remains an addition of a conditional > branch (and for the performance of STAC/CLAC I haven't seen any > documentation so far either) to several fast paths, and hence the > patching alternative can't be discarded as the potentially better one. > > Jan copy_{to,from}_guest() are already long paths (particularly for HVM) so a single extra conditional is not going to be too bad (and as after boot it will remain constant, the branch predictor will have a reliable time with it). It would certainly be fine for a v1 to get SMAP support working. ~Andrew ^ permalink raw reply [flat|nested] 13+ messages in thread
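The boot-time replacement described in the quoted text (a NOP placeholder at the use site, a table entry naming the replacement bytes and the required feature, and one pass that overwrites the placeholder when the feature is present) can be modelled in plain C as below. This is a user-space simulation under stated assumptions, not the Linux or Xen implementation: `struct alt_entry` and `apply_alternatives_sketch()` are hypothetical, the "text" is an ordinary byte array, and `0x0f,0x1f,0x00` is just one valid 3-byte NOP encoding chosen for the example. The CLAC opcode bytes are the ones from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define X86_FEATURE_SMAP (7 * 32 + 20)

/* Hypothetical record, loosely modelled on one .altinstructions entry. */
struct alt_entry {
    size_t         orig_off;  /* offset of the placeholder in the text buffer */
    const uint8_t *repl;      /* replacement instruction bytes */
    unsigned int   feature;   /* feature bit that enables the replacement */
    uint8_t        orig_len;  /* length of the placeholder */
    uint8_t        repl_len;  /* length of the replacement */
};

static const uint8_t nop3[] = { 0x0f, 0x1f, 0x00 };  /* nopl (%rax) */
static const uint8_t clac[] = { 0x0f, 0x01, 0xca };  /* CLAC, as in the patch */

/*
 * One pass over the table: where the feature bit is set in caps[], copy the
 * replacement over the placeholder and pad any remainder with 1-byte NOPs.
 */
static void apply_alternatives_sketch(uint8_t *text, const struct alt_entry *e,
                                      size_t n, const uint32_t *caps)
{
    for (size_t i = 0; i < n; i++) {
        if (!((caps[e[i].feature / 32] >> (e[i].feature % 32)) & 1u))
            continue;               /* feature absent: keep the NOP */
        if (e[i].repl_len > e[i].orig_len)
            continue;               /* never write past the placeholder */
        memcpy(text + e[i].orig_off, e[i].repl, e[i].repl_len);
        memset(text + e[i].orig_off + e[i].repl_len, 0x90,
               (size_t)(e[i].orig_len - e[i].repl_len));
    }
}
```

After the pass, the use site holds either the NOP or CLAC and no feature test remains at that site, which is why this approach leaves no conditional branch on the fast paths.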
* Re: [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions 2014-04-22 9:43 ` [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions Andrew Cooper @ 2014-04-22 9:48 ` Jan Beulich 0 siblings, 0 replies; 13+ messages in thread From: Jan Beulich @ 2014-04-22 9:48 UTC (permalink / raw) To: Andrew Cooper Cc: Feng Wu, Eddie Dong, Ian.Campbell@citrix.com, Jun Nakajima, xen-devel@lists.xen.org >>> On 22.04.14 at 11:43, <andrew.cooper3@citrix.com> wrote: > On 22/04/14 09:07, Jan Beulich wrote: >>>>> On 22.04.14 at 09:41, <feng.wu@intel.com> wrote: >>>> From: Jan Beulich [mailto:JBeulich@suse.com] >>>> That said, the macro contents itself is horrible too: A control register >>>> access and two conditional branches in code intended to be used in >>>> fast paths? Definitely not an option. Even the simplest possible >>>> solution - adding a global flag to be checked here - would already be >>>> questionable. Hence I think you should at least consider porting over >>>> proper instruction patching abstraction from Linux. >>>> >>> Jan, I did some investigation about how to handle this two instructions >>> in Linux, basically, it uses the alternatives mechanism to handle these >>> kind of cases. Let's take the following definition of ASM_STAC in Linux for >>> example: >>> >>> #define ASM_CLAC \ >>> 661: ASM_NOP3 ; \ >>> .pushsection .altinstr_replacement, "ax" ; \ >>> 662: __ASM_CLAC ; \ >>> .popsection ; \ >>> .pushsection .altinstructions, "a" ; \ >>> altinstruction_entry 661b, 662b, X86_FEATURE_SMAP, 3, 3 ; \ >>> .popsection >>> >>> ASM_CLAC is defined as NOP by default, it puts the real CLAC instruction in >>> section "altinstr_replacement" and >>> the needed information to " altinstructions " section, which is useful to >>> replace the default >>> definition by the alternative one. Here is the routine call path: >>> start_kernel () --> check_bugs() --> alternative_instructions(). 
>>> >>> In function alternative_instructions(), it will check the related features >>> in CPU, if it exists, the alternative definition will >>> overwrite the default one. So there is no conditional branches after this >>> replacement when the Macro is being used. >>> >>> Do you think we need to port this whole mechanism to Xen to support >>> CLAC/STAC? I am not sure if it is a little overkilled. >> Obviously we could use this machinery for other things. But whether it's >> needed here depends on the alternatives. >> >>> BTW, from the Linux implementation, I think we don't need to check the 'cr4' > >>> for the macros, we just need >>> to check whether the feature exists in the CPU. So is it acceptable to use >>> the original code by eliminating the cr4 check? >> That _might_ be acceptable if you bring it down to just the three >> really necessary instructions: BT, JNC, CLAC/STAC. But the "might" >> has to stand - this, after all, remains an addition of a conditional >> branch (and for the performance of STAC/CLAC I haven't seen any >> documentation so far either) to several fast paths, and hence the >> patching alternative can't be discarded as the potentially better one. > > copy_{to,from}_guest() are already long paths (particularly for HVM) so > a single extra conditional is not going to be too bad (and as after boot > it will remain constant, the branch predictor will have a reliable time > with it). It would certainly be fine for a v1 to get SMAP support working. But that's not the paths I'm concerned about. There's going to be a CLAC at exception, interrupt, and hypercall entry points. Those are the ones I'm concerned about. Jan ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2014-04-23 15:59 UTC | newest]

Thread overview: 13+ messages
2014-04-15 13:01 [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions Feng Wu
2014-04-15  8:40 ` Jan Beulich
2014-04-22  7:41 ` Wu, Feng
2014-04-22  8:07 ` Jan Beulich
2014-04-22  8:46 ` Wu, Feng
2014-04-22  9:17 ` Jan Beulich
2014-04-22 12:19 ` Wu, Feng
2014-04-22 13:09 ` Konrad Rzeszutek Wilk
2014-04-23 13:43 ` Wu, Feng
2014-04-23 14:52 ` Is: alternative_asm as dependency for STAC/CLAC/new features? Was:Re: " Konrad Rzeszutek Wilk
2014-04-23 15:59 ` Is: alternative_asm as dependency for STAC/CLAC/new features? Jan Beulich
2014-04-22  9:43 ` [PATCH v1 1/6] x86: Add support for STAC/CLAC instructions Andrew Cooper
2014-04-22  9:48 ` Jan Beulich