* [PATCH -tip 1/2] x86/hweight: Fix false output register dependency of POPCNT insn
@ 2025-03-25 16:48 Uros Bizjak
  2025-03-25 16:48 ` [PATCH -tip 2/2] x86/hweight: Use POPCNT when available with X86_NATIVE_CPU option Uros Bizjak
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Uros Bizjak @ 2025-03-25 16:48 UTC
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Intel processors from Sandy Bridge/Ivy Bridge onward, the POPCNT
instruction appears to have a false dependency on its destination
register: even though the instruction only writes to the register, it
waits until the destination is ready before executing. This false
dependency was fixed in Cannon Lake and later processors.

Fix the false dependency by clearing the destination register before
POPCNT. The clearing XOR is emitted only for CONFIG_64BIT; on 32-bit
builds the input and the output both use %eax (REG_IN is "a"), so the
destination there is also the source.

The x86_64 defconfig object size increases by 779 bytes:

	    text           data     bss      dec            hex filename
	27341418        4643015  814852 32799285        1f47a35 vmlinux-old.o
	27342197        4643015  814852 32800064        1f47d40 vmlinux-new.o

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/arch_hweight.h | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/arch_hweight.h b/arch/x86/include/asm/arch_hweight.h
index cbc6157f0b4b..aa0b3bd309fc 100644
--- a/arch/x86/include/asm/arch_hweight.h
+++ b/arch/x86/include/asm/arch_hweight.h
@@ -4,12 +4,21 @@
 
 #include <asm/cpufeatures.h>
 
+/*
+ * On Sandy/Ivy Bridge and later Intel processors, the POPCNT instruction
+ * appears to have a false dependency on the destination register. Even
+ * though the instruction only writes to it, the instruction will wait
+ * until destination is ready before executing. This false dependency
+ * was fixed for Cannon Lake (and later) processors.
+ */
+#define ASM_FORCE_CLR "xorl %k[cnt], %k[cnt]\n\t"
+
 #ifdef CONFIG_64BIT
 #define REG_IN "D"
-#define REG_OUT "a"
+#define ASM_CLR ASM_FORCE_CLR
 #else
 #define REG_IN "a"
-#define REG_OUT "a"
+#define ASM_CLR
 #endif
 
 static __always_inline unsigned int __arch_hweight32(unsigned int w)
@@ -18,8 +27,9 @@ static __always_inline unsigned int __arch_hweight32(unsigned int w)
 
 	asm_inline (ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE
 				"call __sw_hweight32",
-				"popcntl %[val], %[cnt]", X86_FEATURE_POPCNT)
-			 : [cnt] "=" REG_OUT (res), ASM_CALL_CONSTRAINT
+				ASM_CLR "popcntl %[val], %[cnt]",
+				X86_FEATURE_POPCNT)
+			 : [cnt] "=a" (res), ASM_CALL_CONSTRAINT
 			 : [val] REG_IN (w));
 
 	return res;
@@ -48,8 +58,9 @@ static __always_inline unsigned long __arch_hweight64(__u64 w)
 
 	asm_inline (ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE
 				"call __sw_hweight64",
-				"popcntq %[val], %[cnt]", X86_FEATURE_POPCNT)
-			 : [cnt] "=" REG_OUT (res), ASM_CALL_CONSTRAINT
+				ASM_CLR "popcntq %[val], %[cnt]",
+				X86_FEATURE_POPCNT)
+			 : [cnt] "=a" (res), ASM_CALL_CONSTRAINT
 			 : [val] REG_IN (w));
 
 	return res;
-- 
2.42.0
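
For readers who want to try the idea outside the kernel, the following is
a minimal user-space sketch of the same technique: zero the destination
with XOR, then execute POPCNT. It is not the kernel code. The names
sw_hweight32(), popcnt32_clr() and hweight32() are made up for this
illustration, the runtime check via __builtin_cpu_supports() stands in
for the kernel's ALTERNATIVE() patching, and it assumes GCC or Clang on
x86. The early-clobber output constraint ("=&r") is needed here because,
unlike the patch (which pins the 64-bit input to %rdi and the output to
%rax), this sketch lets the compiler pick the registers, and the XOR must
not land on the input register.

```c
#include <stdint.h>

/* Portable bit-counting fallback (hypothetical stand-in for __sw_hweight32). */
static unsigned int sw_hweight32(uint32_t w)
{
	w -= (w >> 1) & 0x55555555u;                      /* 2-bit sums */
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
	w = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
	return (w * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* POPCNT with the destination cleared first to break the false dependency. */
static inline unsigned int popcnt32_clr(uint32_t w)
{
	unsigned int res;

	/*
	 * "=&r" (early clobber) keeps the output in a different register
	 * than the input, so the XOR cannot destroy the value to count.
	 */
	__asm__ ("xorl %k[cnt], %k[cnt]\n\t"
		 "popcntl %[val], %[cnt]"
		 : [cnt] "=&r" (res)
		 : [val] "r" (w)
		 : "cc");

	return res;
}

/* Runtime dispatch; an assumption of this sketch, not what the kernel does. */
static inline unsigned int hweight32(uint32_t w)
{
	if (__builtin_cpu_supports("popcnt"))
		return popcnt32_clr(w);

	return sw_hweight32(w);
}
```

Calling hweight32(0xf0f0) returns 8 on either path. Whether the extra XOR
pays off depends on the microarchitecture and on the surrounding code, so
any benefit should be measured rather than assumed.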



Thread overview: 25+ messages (newest: 2025-03-30 22:44 UTC)
2025-03-25 16:48 [PATCH -tip 1/2] x86/hweight: Fix false output register dependency of POPCNT insn Uros Bizjak
2025-03-25 16:48 ` [PATCH -tip 2/2] x86/hweight: Use POPCNT when available with X86_NATIVE_CPU option Uros Bizjak
2025-03-25 17:11   ` Borislav Petkov
2025-03-30 15:15     ` Uros Bizjak
2025-03-30 17:31       ` Borislav Petkov
2025-03-30 18:47         ` Ingo Molnar
2025-03-30 19:06           ` Borislav Petkov
2025-03-30 19:20             ` Ingo Molnar
2025-03-30 19:28               ` Borislav Petkov
2025-03-25 21:56   ` Ingo Molnar
2025-03-29  9:19     ` Uros Bizjak
2025-03-29 11:00       ` David Laight
2025-03-30  7:49         ` Uros Bizjak
2025-03-30 18:02           ` David Laight
2025-03-29 23:10       ` H. Peter Anvin
2025-03-30  6:54         ` Uros Bizjak
2025-03-30  9:56       ` Ingo Molnar
2025-03-30 16:07         ` Uros Bizjak
2025-03-30 18:15           ` David Laight
2025-03-30 22:44             ` H. Peter Anvin
2025-03-30 18:54           ` Ingo Molnar
2025-03-25 17:09 ` [PATCH -tip 1/2] x86/hweight: Fix false output register dependency of POPCNT insn Borislav Petkov
2025-03-25 17:17   ` Uros Bizjak
2025-03-25 17:44     ` Borislav Petkov
2025-03-25 21:50 ` Ingo Molnar
