From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH 2/3] x86: use CLFLUSHOPT when available
Date: Wed, 10 Feb 2016 15:03:52 +0000
Message-ID: <56BB5158.5050507@citrix.com>
References: <56BB3DE902000078000D087A@prv-mh.provo.novell.com> <56BB41BC02000078000D08AD@prv-mh.provo.novell.com>
In-Reply-To: <56BB41BC02000078000D08AD@prv-mh.provo.novell.com>
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) id 1aTWLH-0005xc-L0 for xen-devel@lists.xenproject.org; Wed, 10 Feb 2016 15:06:07 +0000
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich, xen-devel
Cc: Keir Fraser
List-Id: xen-devel@lists.xenproject.org

On 10/02/16 12:57, Jan Beulich wrote:
> Also drop an unnecessary va adjustment in the code being touched.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -139,10 +139,12 @@ unsigned int flush_area_local(const void
>               c->x86_clflush_size && c->x86_cache_size && sz &&
>               ((sz >> 10) < c->x86_cache_size) )
>          {
> -            va = (const void *)((unsigned long)va & ~(sz - 1));
> +            alternative(ASM_NOP3, "sfence", X86_FEATURE_CLFLUSHOPT);

Why separate? This would be better in the lower alternative(), with one single nop making up the difference in length. That way, processors without CLFLUSHOPT don't suffer the 1-cycle instruction decode stall from the redundant rex prefix.

~Andrew
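
For reference, the length argument above can be checked with a little arithmetic. This is only a sketch and not part of the patch: it assumes a plain (%rax) memory operand (ModRM byte 0x38), and the names pad_bytes(), posted_loop_pad() and merged_pad() are made up for illustration. An alternative's original sequence has to be at least as long as its replacement, so the helper computes how many NOP bytes the original would need.

```c
#include <assert.h>
#include <stddef.h>

/* Encodings for a (%rax) operand; ModRM 0x38 = mod 00, reg /7, rm rax. */
static const unsigned char enc_clflush[]     = { 0x0f, 0xae, 0x38 };
static const unsigned char enc_rex_clflush[] = { 0x40, 0x0f, 0xae, 0x38 };
/* "data16 clflush" adds a 0x66 prefix, which is the CLFLUSHOPT encoding. */
static const unsigned char enc_clflushopt[]  = { 0x66, 0x0f, 0xae, 0x38 };
static const unsigned char enc_sfence[]      = { 0x0f, 0xae, 0xf8 };

/* Hypothetical helper: NOP bytes the original needs to match the replacement. */
size_t pad_bytes(size_t orig_len, size_t repl_len)
{
    return repl_len > orig_len ? repl_len - orig_len : 0;
}

/* Loop body as posted: "rex clflush" vs "data16 clflush". */
size_t posted_loop_pad(void)
{
    assert(enc_rex_clflush[0] == 0x40 && enc_clflushopt[0] == 0x66);
    return pad_bytes(sizeof(enc_rex_clflush), sizeof(enc_clflushopt));
}

/* Merged variant: plain "clflush" vs "sfence" followed by CLFLUSHOPT. */
size_t merged_pad(void)
{
    return pad_bytes(sizeof(enc_clflush),
                     sizeof(enc_sfence) + sizeof(enc_clflushopt));
}
```

Under these assumptions the posted loop body needs no padding because the rex prefix already supplies the extra byte, while the merged form needs 4 bytes: 3 for sfence plus 1 for the 0x66 prefix. The operand bytes are common to both sides, so a different addressing mode should not change that difference.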

>              for ( i = 0; i < sz; i += c->x86_clflush_size )
> -                 asm volatile ( "clflush %0"
> -                                : : "m" (((const char *)va)[i]) );
> +                 alternative_input("rex clflush %0",
> +                                   "data16 clflush %0",
> +                                   X86_FEATURE_CLFLUSHOPT,
> +                                   "m" (((const char *)va)[i]));
>          }
>          else
>          {
> --- a/xen/include/asm-x86/cpufeature.h
> +++ b/xen/include/asm-x86/cpufeature.h
> @@ -163,6 +163,7 @@
>  #define X86_FEATURE_ADX		(7*32+19) /* ADCX, ADOX instructions */
>  #define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
>  #define X86_FEATURE_PCOMMIT	(7*32+22) /* PCOMMIT instruction */
> +#define X86_FEATURE_CLFLUSHOPT	(7*32+23) /* CLFLUSHOPT instruction */
>
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 8 */
>  #define X86_FEATURE_PKU	(8*32+ 3) /* Protection Keys for Userspace */
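
As a side note on the cpufeature.h hunk, the constants pack a capability word and a bit position as word*32+bit. The sketch below just unpacks them again; feature_word() and feature_bit() are illustrative names, not existing Xen helpers. It also assumes word 7 holds the EBX output of CPUID leaf 7 subleaf 0, which would match the architectural CLFLUSHOPT bit (EBX bit 23).

```c
#include <assert.h>

/* Values copied from the quoted hunk. */
#define X86_FEATURE_PCOMMIT    (7*32+22)
#define X86_FEATURE_CLFLUSHOPT (7*32+23)
#define X86_FEATURE_PKU        (8*32+ 3)

/* Hypothetical helper: index of the 32-bit capability word. */
unsigned int feature_word(unsigned int feature)
{
    return feature / 32;
}

/* Hypothetical helper: bit position inside that word. */
unsigned int feature_bit(unsigned int feature)
{
    return feature % 32;
}

/* Sanity check that the new constant sits right after PCOMMIT. */
void feature_selfcheck(void)
{
    assert(X86_FEATURE_CLFLUSHOPT == X86_FEATURE_PCOMMIT + 1);
}
```

So X86_FEATURE_CLFLUSHOPT decodes to word 7, bit 23, and X86_FEATURE_PKU to word 8, bit 3.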

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
