Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation
@ 2026-06-30  4:55 K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86,
	Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, H. Peter Anvin,
	Thomas Huth, Sean Christopherson, Jisheng Zhang, Alexandre Ghiti,
	Christian Borntraeger, Sven Schnelle

tl;dr

This series introduces runtime_const_mask_32() and uses runtime
constants for __ro_after_init data in futex_hash() hot path. More
information can be found on v2 at
https://lore.kernel.org/lkml/20260316052401.18910-1-kprateek.nayak@amd.com/

Major changes in v5
===================

There was enough interest in using a better instruction sequence for the
current use case on ARM64 and RISC-V, so the two implementations have
pivoted to using UBFX and SRLI + SLLI instructions respectively.

This saves two instructions on ARM64 and one instruction on RISC-V per
mask operation.

Future use cases that require generic mask patching on these
architectures will trip a BUG_ON() in the arch-specific patching
functions, and enough bread crumbs have been left in comments and commit
logs to allow an easy switch to the more generic implementation from v4.
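
As a rough illustration of which masks the v5 fixups accept, here is a
minimal userspace sketch of the BUG_ON() condition. The helper names
(fls_index, low_mask, mask_is_patchable) are hypothetical stand-ins for
__fls() and GENMASK(); max_width would be 32 for the arm64 check and 31
for the RISC-V check, going by the conditions in the diffs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __fls(): index of the highest set bit. */
static unsigned int fls_index(uint32_t val)
{
	unsigned int idx = 0;

	assert(val != 0);	/* like __fls(), undefined for 0 */
	while ((val >>= 1) != 0)
		idx++;
	return idx;
}

/* Stand-in for GENMASK(width - 1, 0), restricted to 32 bits. */
static uint32_t low_mask(unsigned int width)
{
	assert(width >= 1 && width <= 32);
	return width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;
}

/* True when val is a contiguous run of low bits no wider than max_width. */
static bool mask_is_patchable(uint32_t val, unsigned int max_width)
{
	unsigned int width;

	if (!val)
		return false;
	width = fls_index(val) + 1;
	return width <= max_width && low_mask(width) == val;
}
```

A mask such as 0xff passes this check, while 0xf0 or 0 would trip the
BUG_ON() and call for the generic scheme from v4.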

Addressing sashiko reviews
==========================

o The operator precedence issue noted on Patch 3 and Patch 5 no longer
  exists as the mask operations are now done indirectly within the ASM
  block.

o The issue regarding usage of runtime constants before their init is
  moot since they are set up before their first usage. As for the
  comments on weakly ordered architectures, the platform init is done on
  the BSP before userspace is active.

Testing
=======

Apart from x86, which was built and boot tested on bare metal, all the
other architectures have been built and boot tested with cross-compile +
QEMU, with some light sanity testing on each.

Patches are based on:

  git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master

at commit ea9c52e91213d ("Merge branch into tip/master: 'irq/msi'")
(29-06-2026)

A few comments from checkpatch.pl have been ignored to adhere to the
style of the particular file. If something needs addressing, please let
me know and I'll address it with v5.X fixups unless there is a larger
change that will require a re-spin.

Everyone has been Cc'd on the cover-letter and the futex bits for the
context. Respective arch maintainers, reviewers, and whoever got lucky
with get_maintainer.pl have been Cc'd on their respective arch specific
changes. Futex maintainers and the lists will be receiving the whole
series (sorry in advance!)
---
changelog v4..v5:

o Collected tags from Catalin and Charlie on patches that remain
  unchanged in v5. (Thanks a ton!)

o Switched mask operations on ARM64 and RISC-V to use UBFX and SRLI +
  SLLI instructions respectively. (Charlie, Samuel on v2)

o Rebased changes on latest tip:master.

v4: https://lore.kernel.org/lkml/20260430094730.31624-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (5):
  arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
  arm64/runtime-const: Introduce runtime_const_mask_32()
  riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC
  riscv/runtime-const: Introduce runtime_const_mask_32()
  s390/runtime-const: Introduce runtime_const_mask_32()

Peter Zijlstra (3):
  x86/runtime-const: Introduce runtime_const_mask_32()
  asm-generic/runtime-const: Add dummy runtime_const_mask_32()
  futex: Use runtime constants for __futex_hash() hot path

 arch/arm64/include/asm/runtime-const.h | 63 ++++++++++++++++----
 arch/riscv/include/asm/asm.h           |  1 +
 arch/riscv/include/asm/runtime-const.h | 82 ++++++++++++++++++++------
 arch/s390/include/asm/runtime-const.h  | 22 ++++++-
 arch/x86/include/asm/runtime-const.h   | 14 +++++
 include/asm-generic/runtime-const.h    |  1 +
 include/asm-generic/vmlinux.lds.h      |  5 +-
 kernel/futex/core.c                    | 42 +++++++------
 8 files changed, 179 insertions(+), 51 deletions(-)


base-commit: ea9c52e91213d5427c6a2e90cd41bf912fd1ea36
-- 
2.34.1




* [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32()
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, H. Peter Anvin,
	Thomas Huth, Sean Christopherson

From: Peter Zijlstra <peterz@infradead.org>

Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.

Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.
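
To illustrate the patching idea, here is a userspace sketch that models
the fixup on a byte buffer rather than on live kernel text. It assumes
the placeholder assembles to the "81 /4 id" form with %ecx as the
register; the helper names (fixup_mask, read_imm32, patched_immediate)
are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed encoding of "and $0x12345678, %ecx" (81 /4 id): opcode, ModRM,
 * then the 32-bit immediate in little-endian byte order.
 */
static const uint8_t and_placeholder[6] = {
	0x81, 0xe1, 0x78, 0x56, 0x34, 0x12,
};

/* Model of the mask fixup: "where" points at the imm32 itself. */
static void fixup_mask(uint8_t *where, uint32_t mask)
{
	where[0] = mask & 0xff;
	where[1] = (mask >> 8) & 0xff;
	where[2] = (mask >> 16) & 0xff;
	where[3] = (mask >> 24) & 0xff;
}

static uint32_t read_imm32(const uint8_t *where)
{
	return (uint32_t)where[0] | ((uint32_t)where[1] << 8) |
	       ((uint32_t)where[2] << 16) | ((uint32_t)where[3] << 24);
}

/*
 * Patch a private copy of the placeholder and return the immediate that
 * the AND would then use. "end" stands for the "1:" label, so the
 * recorded location "1b - 4" corresponds to end - 4.
 */
static uint32_t patched_immediate(uint32_t mask)
{
	uint8_t insn[sizeof(and_placeholder)];
	uint8_t *end = insn + sizeof(insn);

	memcpy(insn, and_placeholder, sizeof(insn));
	assert(read_imm32(end - 4) == UINT32_C(0x12345678));
	fixup_mask(end - 4, mask);
	assert(insn[0] == 0x81 && insn[1] == 0xe1);	/* opcode untouched */
	return read_imm32(end - 4);
}
```

The large placeholder value presumably keeps the assembler from picking
a shorter sign-extended imm8 encoding, so that four immediate bytes are
always available to overwrite.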

  [ prateek: Broke off the x86 chunk, commit message. ]

Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o No changes.
---
 arch/x86/include/asm/runtime-const.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/include/asm/runtime-const.h b/arch/x86/include/asm/runtime-const.h
index 4cd94fdcb45e2..b13f7036c1c9b 100644
--- a/arch/x86/include/asm/runtime-const.h
+++ b/arch/x86/include/asm/runtime-const.h
@@ -41,6 +41,15 @@
 		:"+r" (__ret));					\
 	__ret; })
 
+#define runtime_const_mask_32(val, sym) ({			\
+	typeof(0u+(val)) __ret = (val);				\
+	asm_inline("and $0x12345678, %k0\n1:\n"				\
+		   ".pushsection runtime_mask_" #sym ",\"a\"\n\t"\
+		   ".long 1b - 4 - .\n"				\
+		   ".popsection"				\
+		   : "+r" (__ret));				\
+	__ret; })
+
 #define runtime_const_init(type, sym) do {		\
 	extern s32 __start_runtime_##type##_##sym[];	\
 	extern s32 __stop_runtime_##type##_##sym[];	\
@@ -65,6 +74,11 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
 	*(unsigned char *)where = val;
 }
 
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+	*(unsigned int *)where = val;
+}
+
 static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
 	unsigned long val, s32 *start, s32 *end)
 {
-- 
2.34.1




* [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, Jisheng Zhang

The current scheme to directly patch the kernel text for runtime
constants runs into the following issue with futex adapted to using
runtime constants on arm64:

  Unable to handle kernel write to read-only memory at virtual address ...

The pc points to the *p assignment in the following call chain:

  futex_init()
    runtime_const_init(shift, __futex_shift)
      __runtime_fixup_shift()
        *p = cpu_to_le32(insn);

which suggests that core_initcall() is too late to patch the kernel text
directly unlike the "d_hash_shift" which is initialized during
vfs_caches_init_early() before the protections are in place.

Use aarch64_insn_patch_text_nosync() to patch the runtime constants
instead of writing to the kernel text directly, which allows
runtime_const_init() to be called slightly later in boot.

Since aarch64_insn_patch_text_nosync() calls caches_clean_inval_pou()
internally, __runtime_fixup_caches() ends up being redundant.
runtime_const_init() calls are rare and the overhead of multiple calls
to caches_clean_inval_pou() instead of batching them together should be
negligible in practice.

The cpu_to_le32() conversion of instruction isn't necessary since it is
handled later in the aarch64_insn_patch_text_nosync() call-chain:

  aarch64_insn_patch_text_nosync(addr, insn)
    aarch64_insn_write(addr, insn)
      __aarch64_insn_write(addr, cpu_to_le32(insn))

Sashiko noted that aarch64_insn_patch_text_nosync() does not expect a
lm_alias() address and Catalin suggested it is safe to drop the
lm_alias() for runtime patching since the kernel text is readable. The
address passed to fixup function is interpreted as a __le32 and
dereferenced as is to read the opcode at the patch site.

No functional changes are intended.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o Collected tag from Catalin (Thanks a ton!)
---
 arch/arm64/include/asm/runtime-const.h | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index c3dbd3ae68f69..838145bc289d2 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -7,6 +7,7 @@
 #endif
 
 #include <asm/cacheflush.h>
+#include <asm/text-patching.h>
 
 /* Sigh. You can still run arm64 in BE mode */
 #include <asm/byteorder.h>
@@ -50,34 +51,26 @@ static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
 	u32 insn = le32_to_cpu(*p);
 	insn &= 0xffe0001f;
 	insn |= (val & 0xffff) << 5;
-	*p = cpu_to_le32(insn);
-}
-
-static inline void __runtime_fixup_caches(void *where, unsigned int insns)
-{
-	unsigned long va = (unsigned long)where;
-	caches_clean_inval_pou(va, va + 4*insns);
+	aarch64_insn_patch_text_nosync(p, insn);
 }
 
 static inline void __runtime_fixup_ptr(void *where, unsigned long val)
 {
-	__le32 *p = lm_alias(where);
+	__le32 *p = where;
 	__runtime_fixup_16(p, val);
 	__runtime_fixup_16(p+1, val >> 16);
 	__runtime_fixup_16(p+2, val >> 32);
 	__runtime_fixup_16(p+3, val >> 48);
-	__runtime_fixup_caches(where, 4);
 }
 
 /* Immediate value is 6 bits starting at bit #16 */
 static inline void __runtime_fixup_shift(void *where, unsigned long val)
 {
-	__le32 *p = lm_alias(where);
+	__le32 *p = where;
 	u32 insn = le32_to_cpu(*p);
 	insn &= 0xffc0ffff;
 	insn |= (val & 63) << 16;
-	*p = cpu_to_le32(insn);
-	__runtime_fixup_caches(where, 1);
+	aarch64_insn_patch_text_nosync(p, insn);
 }
 
 static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
-- 
2.34.1




* [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32()
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, Jisheng Zhang

Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.

Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. Since all the current use-cases
are of the form GENMASK(n, 0), with n > 0, a single:

  ubfx  w0, w0, #0, #widthm1     // w0 = w0 [widthm1:0]

instruction is used for arm64 to improve instruction density and
performance.

"Arm A-profile A64 Instruction Set Architecture" manual, Sec.
"A64 -- Base Instructions" [1] for UBFX instruction highlights the
immediate "width" is encoded as width minus 1 in imms (Bits [15:10])
which is patched by __runtime_fixup_mask() once the mask is known.
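
A minimal userspace sketch of the imms patching follows. It assumes
0x53007c20 is the A64 encoding of the placeholder
"ubfx w0, w1, #0, #32" (a UBFM alias with immr = 0, imms = 31); the
register choice and the helper names are hypothetical, and ubfx_model()
is only a software model of the instruction's effect.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed A64 encoding of the placeholder "ubfx w0, w1, #0, #32". */
#define UBFX_W0_W1_PLACEHOLDER	UINT32_C(0x53007c20)

/* Width of a mask of the form 2^n - 1; anything else is rejected. */
static unsigned int mask_width(uint32_t mask)
{
	unsigned int width = 0;

	assert(mask != 0 && (mask & (mask + 1)) == 0);
	while (mask) {
		width++;
		mask >>= 1;
	}
	return width;
}

/* Rewrite imms (bits [15:10]) to hold width - 1, as the fixup does. */
static uint32_t ubfx_patch_width(uint32_t insn, uint32_t mask)
{
	unsigned int width = mask_width(mask);

	insn &= UINT32_C(0xffff03ff);
	insn |= (uint32_t)(width - 1) << 10;
	return insn;
}

/* Software model of "ubfx Wd, Wn, #0, #width" for a patched insn. */
static uint32_t ubfx_model(uint32_t insn, uint32_t val)
{
	unsigned int width = ((insn >> 10) & 0x3f) + 1;

	return width >= 32 ? val : val & ((UINT32_C(1) << width) - 1);
}
```

For a mask of 0xff the width is 8, so imms becomes 7 and the modelled
instruction keeps only the low byte of its input.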

If a future use case arises that needs to handle an arbitrary mask,
consider using:

  movz  w1, #lo16, lsl #0
  movk  w1, #hi16, lsl #16

to patch the 32-bit mask in the asm block and return "__ret & (val)"
from runtime_const_mask_32(), which allows the compiler to further
optimize the logical AND operation. __runtime_fixup_ptr() already
patches a "movz + movk lsl #16" sequence which can be reused when the
need arises.

A possible implementation for this alternate scheme can be found at [2].
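
For reference, a userspace sketch of how the two 16-bit halves could be
patched, reusing the bit manipulation of the existing
__runtime_fixup_16(). The base encodings for "movz w1, #0" and
"movk w1, #0, lsl #16" are assumptions, as are the helper names; this
is not the implementation at [2].

```c
#include <assert.h>
#include <stdint.h>

/* Assumed A64 encodings with a zero immediate and w1 as destination. */
#define MOVZ_W1		UINT32_C(0x52800001)	/* movz w1, #0 */
#define MOVK_W1_LSL16	UINT32_C(0x72a00001)	/* movk w1, #0, lsl #16 */

struct mov_pair {
	uint32_t movz;
	uint32_t movk;
};

/* imm16 lives in bits [20:5] of both instructions. */
static uint32_t imm16_of(uint32_t insn)
{
	return (insn >> 5) & 0xffff;
}

/* Same masking and shifting as __runtime_fixup_16(), minus the text write. */
static uint32_t fixup_16(uint32_t insn, uint32_t val)
{
	insn &= UINT32_C(0xffe0001f);
	insn |= (val & 0xffff) << 5;
	assert(imm16_of(insn) == (val & 0xffff));
	return insn;
}

/* Patch the low half into movz and the high half into movk. */
static struct mov_pair patch_mask(uint32_t mask)
{
	struct mov_pair p = {
		.movz = fixup_16(MOVZ_W1, mask),
		.movk = fixup_16(MOVK_W1_LSL16, mask >> 16),
	};
	return p;
}

/* Value the patched pair would leave in w1. */
static uint32_t load_model(struct mov_pair p)
{
	return imm16_of(p.movz) | (imm16_of(p.movk) << 16);
}
```

With the constant loaded this way, the AND itself stays in C, so the
compiler remains free to fold it into surrounding code.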

Assisted-by: Claude:claude-sonnet-4-6
Suggested-by: Samuel Holland <samuel.holland@sifive.com>
Suggested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Link: https://developer.arm.com/documentation/ddi0602/2026-03/Base-Instructions/ [1]
Link: https://lore.kernel.org/lkml/20260430094730.31624-4-kprateek.nayak@amd.com/ [2]
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o Pivoted to using the UBFX instruction for masking: the futex
  use-case uses masks of the form 2^n - 1 (n > 1), and there was enough
  interest in improving instruction density on ARM64 and RISC-V.
  (Charlie, Samuel on v2)

o Dropped Catalin's tag as a result of changed approach.
---
 arch/arm64/include/asm/runtime-const.h | 46 ++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index 838145bc289d2..371c9a4bc2d4b 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -36,6 +36,17 @@
 		:"r" (0u+(val)));				\
 	__ret; })
 
+#define runtime_const_mask_32(val, sym) ({			\
+	unsigned long __ret;					\
+	asm_inline("1:\t"					\
+		"ubfx %w0, %w1, #0, #32\n\t"			\
+		".pushsection runtime_mask_" #sym ",\"a\"\n\t"	\
+		".long 1b - .\n\t"				\
+		".popsection"					\
+		:"=r" (__ret)					\
+		:"r" (0u+(val)));				\
+	__ret; })
+
 #define runtime_const_init(type, sym) do {		\
 	extern s32 __start_runtime_##type##_##sym[];	\
 	extern s32 __stop_runtime_##type##_##sym[];	\
@@ -73,6 +84,41 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
 	aarch64_insn_patch_text_nosync(p, insn);
 }
 
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+	unsigned int width = __fls(val) + 1;
+	__le32 *p = where;
+	u32 insn;
+
+	/*
+	 * XXX: Current implementation only supports patching masks of
+	 * form GENMASK(n, 0) (n >= 0) using a single UBFX instruction
+	 * to improve performance, density, and covers all the current
+	 * use-cases.
+	 *
+	 * When the need arises to support any generic mask, and this
+	 * BUG_ON() is tripped, consider using a:
+	 *
+	 *   movz %w0, #imm16
+	 *   movk %w0, #imm16, lsl #16
+	 *
+	 * sequence to load the 32bit const mask, and perform a logical
+	 * and outside the asm block before returning the result. Fixup
+	 * can simply reuse the existing __runtime_fixup_16() to patch
+	 * the individual mov instructions.
+	 */
+	BUG_ON(!val || width > 32 || (GENMASK(width - 1, 0) != val));
+
+	/*
+	 * The width of the mask is encoded as (width - 1) in imms
+	 * which is 6 bits starting at bit #10.
+	 */
+	insn = le32_to_cpu(*p);
+	insn &= 0xffff03ff;
+	insn |= ((width - 1) & 0x1f) << 10;
+	aarch64_insn_patch_text_nosync(p, insn);
+}
+
 static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
 	unsigned long val, s32 *start, s32 *end)
 {
-- 
2.34.1




* [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
                   ` (2 preceding siblings ...)
  2026-06-30  4:55 ` [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-06-30  6:47   ` Guo Ren
  2026-06-30  4:55 ` [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
	Albert Ou
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, Alexandre Ghiti,
	Jisheng Zhang, Guo Ren

Define the placeholder used for lui + addi[w] patching sequence as
RUNTIME_MAGIC and use that instead of open coding the constants in the
inline assembly.

No functional changes intended.

Suggested-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Tested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o Collected tags from Charlie (Thanks a ton!)
---
 arch/riscv/include/asm/runtime-const.h | 38 ++++++++++++++------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
index 900db0a103d05..1ce02605d2e43 100644
--- a/arch/riscv/include/asm/runtime-const.h
+++ b/arch/riscv/include/asm/runtime-const.h
@@ -15,21 +15,23 @@
 
 #include <linux/uaccess.h>
 
+#define RUNTIME_MAGIC __ASM_STR(0x89ABCDEF)
+
 #ifdef CONFIG_32BIT
-#define runtime_const_ptr(sym)					\
-({								\
-	typeof(sym) __ret;					\
-	asm_inline(".option push\n\t"				\
-		".option norvc\n\t"				\
-		"1:\t"						\
-		"lui	%[__ret],0x89abd\n\t"			\
-		"addi	%[__ret],%[__ret],-0x211\n\t"		\
-		".option pop\n\t"				\
-		".pushsection runtime_ptr_" #sym ",\"a\"\n\t"	\
-		".long 1b - .\n\t"				\
-		".popsection"					\
-		: [__ret] "=r" (__ret));			\
-	__ret;							\
+#define runtime_const_ptr(sym)						\
+({									\
+	typeof(sym) __ret;						\
+	asm_inline(".option push\n\t"					\
+		".option norvc\n\t"					\
+		"1:\t"							\
+		"lui	%[__ret], %%hi(" RUNTIME_MAGIC ")\n\t"		\
+		"addi	%[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t"	\
+		".option pop\n\t"					\
+		".pushsection runtime_ptr_" #sym ",\"a\"\n\t"		\
+		".long 1b - .\n\t"					\
+		".popsection"						\
+		: [__ret] "=r" (__ret));				\
+	__ret;								\
 })
 #else
 /*
@@ -46,10 +48,10 @@
 	".option push\n\t"					\
 	".option norvc\n\t"					\
 	"1:\t"							\
-	"lui	%[__ret],0x89abd\n\t"				\
-	"lui	%[__tmp],0x1234\n\t"				\
-	"addiw	%[__ret],%[__ret],-0x211\n\t"			\
-	"addiw	%[__tmp],%[__tmp],0x567\n\t"			\
+	"lui	%[__ret], %%hi(" RUNTIME_MAGIC ")\n\t"		\
+	"lui	%[__tmp], %%hi(" RUNTIME_MAGIC ")\n\t"		\
+	"addiw	%[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t"	\
+	"addiw	%[__tmp],%[__tmp], %%lo(" RUNTIME_MAGIC ")\n\t"	\
 
 #define RISCV_RUNTIME_CONST_64_BASE				\
 	"slli	%[__tmp],%[__tmp],32\n\t"			\
-- 
2.34.1




* [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32()
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
                   ` (3 preceding siblings ...)
  2026-06-30  4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 6/8] s390/runtime-const: " K Prateek Nayak
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
	Albert Ou
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, Alexandre Ghiti,
	Jisheng Zhang

Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.

Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. Since all the current use-cases
are of the form GENMASK(n, 0), with n > 0, the following sequence:

  slli a0, a1, imm
  srli a0, a0, imm

is used for RISC-V, where imm = (32 - width), to improve instruction
density and performance.

"The RISC-V Instruction Set Manual, Volume I - Unprivileged
Architecture" [1] Sec. 2.4.1 "Integer Register-Immediate Instructions"
notes that the immediate shift amounts for SLLI and SRLI are 5 bits wide
starting at bit #20. __runtime_fixup_shift() is reused to patch the
immediate shifts for the two instructions.
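
A small userspace sketch of the shift pair is given below. It follows
the shift-left-then-shift-right order and the "32 - width" shift amount
used by the diff in this patch, and assumes the shamt field occupies
bits [24:20] as in the base ISA shift-immediate encoding. The helper
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the 5-bit shamt field, assumed to sit at bits [24:20]. */
static uint32_t patch_shamt(uint32_t insn, unsigned int shamt)
{
	assert(shamt < 32);
	insn &= ~(UINT32_C(0x1f) << 20);
	insn |= (uint32_t)shamt << 20;
	return insn;
}

static unsigned int shamt_of(uint32_t insn)
{
	return (insn >> 20) & 0x1f;
}

/*
 * Model of the patched pair on a 32-bit value: shift left to drop the
 * upper bits, then shift right logically to bring the low bits back.
 */
static uint32_t shift_pair_mask(uint32_t val, unsigned int width)
{
	unsigned int shamt;

	assert(width >= 1 && width <= 31);
	shamt = 32 - width;
	return (val << shamt) >> shamt;
}
```

For width = 8 both shifts use an immediate of 24, and the result equals
val & 0xff.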

If a future use case arises that needs to handle an arbitrary mask,
consider using:

  lui   a0, 0x12345       # upper 20 bits: (val + 0x800) >> 12
  addi  a0, a0, 0x678     # lower 12 bits, sign-extended

to patch the 32-bit mask in the asm block and return "__ret & (val)"
from runtime_const_mask_32(), which allows the compiler to further
optimize the logical AND operation. __runtime_fixup_ptr() already
patches a lui + addi sequence which can be reused when the need arises.

A possible implementation for this alternate scheme can be found at [2].
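
The rounding involved in splitting a 32-bit constant across lui + addi
can be sketched as follows (userspace, 32-bit arithmetic only; the
helper names are hypothetical). Since addi sign-extends its 12-bit
immediate, the upper part is rounded up by 0x800 whenever the lower part
is negative. For RUNTIME_MAGIC (0x89ABCDEF) this yields 0x89abd and
-0x211, matching the open-coded constants replaced in the previous
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 20 bits for lui, rounded so that a negative lo12 is compensated. */
static uint32_t hi20(uint32_t val)
{
	return ((val + 0x800) >> 12) & 0xfffff;
}

/* Lower 12 bits for addi, interpreted as a signed immediate. */
static int32_t lo12(uint32_t val)
{
	int32_t lo = (int32_t)(val & 0xfff);

	return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Value a 32-bit register would hold after lui + addi. */
static uint32_t lui_addi_model(uint32_t val)
{
	int32_t lo = lo12(val);

	assert(lo >= -0x800 && lo < 0x800);
	return (hi20(val) << 12) + (uint32_t)lo;
}
```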

Assisted-by: Claude:claude-sonnet-4-5
Suggested-by: Samuel Holland <samuel.holland@sifive.com>
Suggested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Link: https://docs.riscv.org/reference/isa/_attachments/riscv-unprivileged.pdf [1]
Link: https://lore.kernel.org/lkml/20260430094730.31624-6-kprateek.nayak@amd.com/ [2]
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o Pivoted to SRLI + SLLI sequence for mask operation to extract the
  lower bits for improved instruction density (Charlie, Samuel on v2).
---
 arch/riscv/include/asm/asm.h           |  1 +
 arch/riscv/include/asm/runtime-const.h | 44 ++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index e9e8ba83e632f..b8bf842d4c136 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -34,6 +34,7 @@
 #define SZREG		__REG_SEL(8, 4)
 #define LGREG		__REG_SEL(3, 2)
 #define SRLI		__REG_SEL(srliw, srli)
+#define SLLI		__REG_SEL(slliw, slli)
 
 #if __SIZEOF_POINTER__ == 8
 #ifdef __ASSEMBLER__
diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
index 1ce02605d2e43..dbf96c937dbb9 100644
--- a/arch/riscv/include/asm/runtime-const.h
+++ b/arch/riscv/include/asm/runtime-const.h
@@ -159,6 +159,23 @@
 	__ret;							\
 })
 
+#define runtime_const_mask_32(val, sym)				\
+({								\
+	u32 __ret;						\
+	asm_inline(".option push\n\t"				\
+		".option norvc\n\t"				\
+		"1:\t"						\
+		SLLI " %[__ret],%[__val],12\n\t"		\
+		SRLI " %[__ret],%[__ret],12\n\t"		\
+		".option pop\n\t"				\
+		".pushsection runtime_mask_" #sym ",\"a\"\n\t"	\
+		".long 1b - .\n\t"				\
+		".popsection"					\
+		: [__ret] "=r" (__ret)				\
+		: [__val] "r" (val));				\
+	__ret;							\
+})
+
 #define runtime_const_init(type, sym) do {			\
 	extern s32 __start_runtime_##type##_##sym[];		\
 	extern s32 __stop_runtime_##type##_##sym[];		\
@@ -262,6 +279,33 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
 	mutex_unlock(&text_mutex);
 }
 
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+	unsigned int width = __fls(val) + 1;
+
+	/*
+	 * XXX: Current implementation only supports patching masks of
+	 * form GENMASK(width, 0) (width >= 0) using a SRLI + SLLI
+	 * sequence instead of LUI + ADDI + AND sequence to improve
+	 * performance, density, and covers all the current use-cases.
+	 *
+	 * When the need arises to support any generic mask, and this
+	 * BUG_ON() is tripped, consider using a:
+	 *
+	 *   lui  %[__ret], #imm16
+	 *   addi %[__ret], #imm16
+	 *
+	 * sequence to load the 32bit const mask, and perform a logical
+	 * and outside the asm block before returning the result. Fixup
+	 * can simply reuse the existing __runtime_fixup_32() to patch
+	 * the LUI + ADDI sequence.
+	 */
+	BUG_ON(!val || width > 31 || (GENMASK(width - 1, 0) != val));
+
+	__runtime_fixup_shift(where, 32 - width);
+	__runtime_fixup_shift(where + 4, 32 - width);
+}
+
 static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
 				       unsigned long val, s32 *start, s32 *end)
 {
-- 
2.34.1




* [PATCH v5 6/8] s390/runtime-const: Introduce runtime_const_mask_32()
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
                   ` (4 preceding siblings ...)
  2026-06-30  4:55 ` [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
  7 siblings, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, Christian Borntraeger,
	Sven Schnelle

Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.

Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.

GCC generates a:

  nilf %r1,<imm32>

to tackle arbitrary 32-bit masks and the same is implemented here.
Immediate patching pattern for __runtime_fixup_mask() has been adopted
from __runtime_fixup_ptr().
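
As an illustration, the fixup can be modelled in userspace on a byte
buffer. The sketch assumes the RIL-format layout for "nilf %r1,12"
(opcode 0xc0, then r1 and the 0xb opcode extension in the second byte,
then a big-endian 32-bit immediate at offset 2); the helper names are
hypothetical and a plain store stands in for s390_kernel_write().

```c
#include <assert.h>
#include <stdint.h>

/* Model of the mask fixup: store the big-endian imm32 at where + 2. */
static void fixup_mask(uint8_t *where, uint32_t mask)
{
	where[2] = (mask >> 24) & 0xff;
	where[3] = (mask >> 16) & 0xff;
	where[4] = (mask >> 8) & 0xff;
	where[5] = mask & 0xff;
}

static uint32_t read_imm32(const uint8_t *where)
{
	return ((uint32_t)where[2] << 24) | ((uint32_t)where[3] << 16) |
	       ((uint32_t)where[4] << 8) | (uint32_t)where[5];
}

/* Patch a private copy of the placeholder and return the new immediate. */
static uint32_t patched_immediate(uint32_t mask)
{
	/* Assumed encoding of the placeholder "nilf %r1,12". */
	uint8_t insn[6] = { 0xc0, 0x1b, 0x00, 0x00, 0x00, 0x0c };

	assert(read_imm32(insn) == 12);
	fixup_mask(insn, mask);
	assert(insn[0] == 0xc0 && insn[1] == 0x1b);	/* opcode untouched */
	return read_imm32(insn);
}
```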

Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o No changes.
---
 arch/s390/include/asm/runtime-const.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/runtime-const.h b/arch/s390/include/asm/runtime-const.h
index 17878b1d048cf..7b71156031ecb 100644
--- a/arch/s390/include/asm/runtime-const.h
+++ b/arch/s390/include/asm/runtime-const.h
@@ -33,6 +33,20 @@
 	__ret;							\
 })
 
+#define runtime_const_mask_32(val, sym)				\
+({								\
+	unsigned int __ret = (val);				\
+								\
+	asm_inline(						\
+		"0:	nilf	%[__ret],12\n"			\
+		".pushsection runtime_mask_" #sym ",\"a\"\n"	\
+		".long 0b - .\n"				\
+		".popsection"					\
+		: [__ret] "+d" (__ret)				\
+		: : "cc");					\
+	__ret;							\
+})
+
 #define runtime_const_init(type, sym) do {			\
 	extern s32 __start_runtime_##type##_##sym[];		\
 	extern s32 __stop_runtime_##type##_##sym[];		\
@@ -43,12 +57,12 @@
 			    __stop_runtime_##type##_##sym);	\
 } while (0)
 
-/* 32-bit immediate for iihf and iilf in bits in I2 field */
 static inline void __runtime_fixup_32(u32 *p, unsigned int val)
 {
 	s390_kernel_write(p, &val, sizeof(val));
 }
 
+/* 32-bit immediate for iihf and iilf in bits in I2 field */
 static inline void __runtime_fixup_ptr(void *where, unsigned long val)
 {
 	__runtime_fixup_32(where + 2, val >> 32);
@@ -65,6 +79,12 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
 	s390_kernel_write(where, &insn, sizeof(insn));
 }
 
+/* 32-bit immediate for nilf in bits in I2 field */
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+	__runtime_fixup_32(where + 2, val);
+}
+
 static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
 				       unsigned long val, s32 *start, s32 *end)
 {
-- 
2.34.1




* [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32()
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
                   ` (5 preceding siblings ...)
  2026-06-30  4:55 ` [PATCH v5 6/8] s390/runtime-const: " K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-06-30  4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
  7 siblings, 0 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390

From: Peter Zijlstra <peterz@infradead.org>

Add a dummy runtime_const_mask_32() for all the architectures that do
not support runtime-const.
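
On such architectures the macro reduces to a plain AND against the
variable, as in this userspace sketch. The macro body is taken from the
diff in this patch; example_mask and bucket_index() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Generic fallback: no patching, just read the variable and AND. */
#define runtime_const_mask_32(val, sym) ((u32)(val)&(sym))

/* Hypothetical stand-in for a __ro_after_init mask variable. */
static u32 example_mask = 0xff;

/* Reduce a hash to an index using the fallback macro. */
static u32 bucket_index(u32 hash)
{
	assert(example_mask != 0);
	return runtime_const_mask_32(hash, example_mask);
}
```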

Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o No changes.
---
 include/asm-generic/runtime-const.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/asm-generic/runtime-const.h b/include/asm-generic/runtime-const.h
index 6704994595145..03e6e3e02401e 100644
--- a/include/asm-generic/runtime-const.h
+++ b/include/asm-generic/runtime-const.h
@@ -10,6 +10,7 @@
  */
 #define runtime_const_ptr(sym) (sym)
 #define runtime_const_shift_right_32(val, sym) ((u32)(val)>>(sym))
+#define runtime_const_mask_32(val, sym) ((u32)(val)&(sym))
 #define runtime_const_init(type,sym) do { } while (0)
 
 #endif
-- 
2.34.1




* [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
  2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
                   ` (6 preceding siblings ...)
  2026-06-30  4:55 ` [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30  4:55 ` K Prateek Nayak
  2026-07-01  7:57   ` Peter Zijlstra
  2026-07-01 19:58   ` Sebastian Andrzej Siewior
  7 siblings, 2 replies; 16+ messages in thread
From: K Prateek Nayak @ 2026-06-30  4:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86,
	Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev
  Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
	linux-arm-kernel, linux-riscv, linux-s390, H. Peter Anvin,
	Thomas Huth, Sean Christopherson, Jisheng Zhang, Alexandre Ghiti,
	Christian Borntraeger, Sven Schnelle

From: Peter Zijlstra <peterz@infradead.org>

Runtime constify the read-only-after-init data __futex_shift (shift_32),
__futex_mask (mask_32), and __futex_queues (ptr) used in the __futex_hash()
hot path to avoid referencing global variables.

This also allows __futex_queues to be allocated dynamically to
"nr_node_ids" slots instead of reserving config dependent MAX_NUMNODES
(1 << CONFIG_NODES_SHIFT) worth of slots upfront.

Runtime constants are initialized before their first access and
runtime_const_init() provides necessary barrier to ensure subsequent
accesses are not reordered against their initialization.

No functional changes intended.

  [ prateek: Dynamically allocate __futex_queues, mark the global data
    __ro_after_init since they are constified after futex_init(). ]

Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> # MAX_NUMNODES bloat
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:

o Rebased on latest tip:master.
---
 include/asm-generic/vmlinux.lds.h |  5 +++-
 kernel/futex/core.c               | 42 +++++++++++++++++--------------
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 5659f4b5a1252..53207901d4c15 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -970,7 +970,10 @@
 		RUNTIME_CONST(ptr, __dentry_cache)			\
 		RUNTIME_CONST(ptr, __names_cache)			\
 		RUNTIME_CONST(ptr, __filp_cache)			\
-		RUNTIME_CONST(ptr, __bfilp_cache)
+		RUNTIME_CONST(ptr, __bfilp_cache)			\
+		RUNTIME_CONST(shift, __futex_shift)			\
+		RUNTIME_CONST(mask,  __futex_mask)			\
+		RUNTIME_CONST(ptr,   __futex_queues)
 
 /* Alignment must be consistent with (kunit_suite *) in include/kunit/test.h */
 #define KUNIT_TABLE()							\
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 179b26e9c9341..b2a63ceb6ce98 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -48,23 +48,19 @@
 
 #include <vdso/futex.h>
 
+#include <asm/runtime-const.h>
+
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
 
-/*
- * The base of the bucket array and its size are always used together
- * (after initialization only in futex_hash()), so ensure that they
- * reside in the same cacheline.
- */
-static struct {
-	unsigned long            hashmask;
-	unsigned int		 hashshift;
-	struct futex_hash_bucket *queues[MAX_NUMNODES];
-} __futex_data __read_mostly __aligned(2*sizeof(long));
+static u32 __futex_mask __ro_after_init;
+static u32 __futex_shift __ro_after_init;
+static struct futex_hash_bucket **__futex_queues __ro_after_init;
 
-#define futex_hashmask	(__futex_data.hashmask)
-#define futex_hashshift	(__futex_data.hashshift)
-#define futex_queues	(__futex_data.queues)
+static __always_inline struct futex_hash_bucket **futex_queues(void)
+{
+	return runtime_const_ptr(__futex_queues);
+}
 
 struct futex_private_hash {
 	int		state;
@@ -395,13 +391,13 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_
 		 * NOTE: this isn't perfectly uniform, but it is fast and
 		 * handles sparse node masks.
 		 */
-		node = (hash >> futex_hashshift) % nr_node_ids;
+		node = runtime_const_shift_right_32(hash, __futex_shift) % nr_node_ids;
 		if (!node_possible(node)) {
 			node = find_next_bit_wrap(node_possible_map.bits, nr_node_ids, node);
 		}
 	}
 
-	return &futex_queues[node][hash & futex_hashmask];
+	return &futex_queues()[node][runtime_const_mask_32(hash, __futex_mask)];
 }
 
 /**
@@ -1922,7 +1918,7 @@ int futex_hash_allocate_default(void)
 	 *   16 <= threads * 4 <= global hash size
 	 */
 	buckets = roundup_pow_of_two(4 * threads);
-	buckets = clamp(buckets, 16, futex_hashmask + 1);
+	buckets = clamp(buckets, 16, __futex_mask + 1);
 
 	if (current_buckets >= buckets)
 		return 0;
@@ -2020,10 +2016,19 @@ static int __init futex_init(void)
 	hashsize = max(4, hashsize);
 	hashsize = roundup_pow_of_two(hashsize);
 #endif
-	futex_hashshift = ilog2(hashsize);
+	__futex_mask = hashsize - 1;
+	__futex_shift = ilog2(hashsize);
 	size = sizeof(struct futex_hash_bucket) * hashsize;
 	order = get_order(size);
 
+	__futex_queues = kcalloc(nr_node_ids, sizeof(*__futex_queues), GFP_KERNEL);
+
+	runtime_const_init(shift, __futex_shift);
+	runtime_const_init(mask,  __futex_mask);
+	runtime_const_init(ptr,   __futex_queues);
+
+	BUG_ON(!futex_queues());
+
 	for_each_node(n) {
 		struct futex_hash_bucket *table;
 
@@ -2037,10 +2042,9 @@ static int __init futex_init(void)
 		for (i = 0; i < hashsize; i++)
 			futex_hash_bucket_init(&table[i]);
 
-		futex_queues[n] = table;
+		futex_queues()[n] = table;
 	}
 
-	futex_hashmask = hashsize - 1;
 	pr_info("futex hash table entries: %lu (%lu bytes on %d NUMA nodes, total %lu KiB, %s).\n",
 		hashsize, size, num_possible_nodes(), size * num_possible_nodes() / 1024,
 		order > MAX_PAGE_ORDER ? "vmalloc" : "linear");
-- 
2.34.1
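
The lookup that this patch runtime-constifies can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: the two macros mirror the generic fallbacks from asm-generic/runtime-const.h, while `struct bucket`, `struct hash_table`, `ilog2_u32()`, `table_init()` and `table_lookup()` are hypothetical names that do not exist in the kernel. Node possibility checks and the private hash are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same shape as the generic fallbacks in include/asm-generic/runtime-const.h. */
#define runtime_const_shift_right_32(val, sym) ((uint32_t)(val) >> (sym))
#define runtime_const_mask_32(val, sym)        ((uint32_t)(val) & (sym))

/* Hypothetical stand-in for struct futex_hash_bucket. */
struct bucket { int placeholder; };

/* Hypothetical container for what the patch keeps in three globals. */
struct hash_table {
	uint32_t mask;           /* hashsize - 1 */
	uint32_t shift;          /* ilog2(hashsize) */
	uint32_t nr_nodes;       /* plays the role of nr_node_ids */
	struct bucket **queues;  /* one bucket array per node */
};

static uint32_t ilog2_u32(uint32_t v)
{
	uint32_t r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Size the per-node pointer array by nr_nodes instead of a compile-time maximum. */
static int table_init(struct hash_table *t, uint32_t hashsize, uint32_t nr_nodes)
{
	uint32_t n;

	assert(hashsize && !(hashsize & (hashsize - 1)));  /* power of two */
	assert(nr_nodes > 0);

	t->mask = hashsize - 1;
	t->shift = ilog2_u32(hashsize);
	t->nr_nodes = nr_nodes;
	t->queues = calloc(nr_nodes, sizeof(*t->queues));
	if (!t->queues)
		return -1;

	for (n = 0; n < nr_nodes; n++) {
		t->queues[n] = calloc(hashsize, sizeof(**t->queues));
		if (!t->queues[n]) {
			while (n--)
				free(t->queues[n]);
			free(t->queues);
			t->queues = NULL;
			return -1;
		}
	}
	return 0;
}

/* Low bits select the bucket, the bits above them select the node. */
static struct bucket *table_lookup(const struct hash_table *t, uint32_t hash)
{
	uint32_t node = runtime_const_shift_right_32(hash, t->shift) % t->nr_nodes;

	return &t->queues[node][runtime_const_mask_32(hash, t->mask)];
}
```

With a hashsize of 256 and 4 nodes, a hash of 0x12345 lands in bucket 0x45 on node 3 (0x123 % 4). On architectures with runtime-const support the shift, mask and pointer would be patched into the instruction stream rather than loaded from memory as they are here.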




* Re: [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC
  2026-06-30  4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
@ 2026-06-30  6:47   ` Guo Ren
  0 siblings, 0 replies; 16+ messages in thread
From: Guo Ren @ 2026-06-30  6:47 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Darren Hart, Davidlohr Bueso, André Almeida,
	linux-arch, linux-kernel, Samuel Holland, Charlie Jenkins,
	linux-arm-kernel, linux-riscv, linux-s390, Alexandre Ghiti,
	Jisheng Zhang

On Tue, Jun 30, 2026 at 12:57 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Define the placeholder used for lui + addi[w] patching sequence as
> RUNTIME_MAGIC and use that instead of open coding the constants in the
> inline assembly.
>
> No functional changes intended.
>
> Suggested-by: Guo Ren <guoren@kernel.org>
> Reviewed-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
> Tested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> changelog v4..v5:
>
> o Collected tags from Charlie (Thanks a ton!)
> ---
>  arch/riscv/include/asm/runtime-const.h | 38 ++++++++++++++------------
>  1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
> index 900db0a103d05..1ce02605d2e43 100644
> --- a/arch/riscv/include/asm/runtime-const.h
> +++ b/arch/riscv/include/asm/runtime-const.h
> @@ -15,21 +15,23 @@
>
>  #include <linux/uaccess.h>
>
> +#define RUNTIME_MAGIC __ASM_STR(0x89ABCDEF)
> +
>  #ifdef CONFIG_32BIT
> -#define runtime_const_ptr(sym)                                 \
> -({                                                             \
> -       typeof(sym) __ret;                                      \
> -       asm_inline(".option push\n\t"                           \
> -               ".option norvc\n\t"                             \
> -               "1:\t"                                          \
> -               "lui    %[__ret],0x89abd\n\t"                   \
> -               "addi   %[__ret],%[__ret],-0x211\n\t"           \
> -               ".option pop\n\t"                               \
> -               ".pushsection runtime_ptr_" #sym ",\"a\"\n\t"   \
> -               ".long 1b - .\n\t"                              \
> -               ".popsection"                                   \
> -               : [__ret] "=r" (__ret));                        \
> -       __ret;                                                  \
> +#define runtime_const_ptr(sym)                                         \
> +({                                                                     \
> +       typeof(sym) __ret;                                              \
> +       asm_inline(".option push\n\t"                                   \
> +               ".option norvc\n\t"                                     \
> +               "1:\t"                                                  \
> +               "lui    %[__ret], %%hi(" RUNTIME_MAGIC ")\n\t"          \
> +               "addi   %[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t" \
> +               ".option pop\n\t"                                       \
> +               ".pushsection runtime_ptr_" #sym ",\"a\"\n\t"           \
> +               ".long 1b - .\n\t"                                      \
> +               ".popsection"                                           \
> +               : [__ret] "=r" (__ret));                                \
> +       __ret;                                                          \
>  })
>  #else
>  /*
> @@ -46,10 +48,10 @@
>         ".option push\n\t"                                      \
>         ".option norvc\n\t"                                     \
>         "1:\t"                                                  \
> -       "lui    %[__ret],0x89abd\n\t"                           \
> -       "lui    %[__tmp],0x1234\n\t"                            \
> -       "addiw  %[__ret],%[__ret],-0x211\n\t"                   \
> -       "addiw  %[__tmp],%[__tmp],0x567\n\t"                    \
> +       "lui    %[__ret], %%hi(" RUNTIME_MAGIC ")\n\t"          \
> +       "lui    %[__tmp], %%hi(" RUNTIME_MAGIC ")\n\t"          \
> +       "addiw  %[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t" \
> +       "addiw  %[__tmp],%[__tmp], %%lo(" RUNTIME_MAGIC ")\n\t" \
LGTM!

Reviewed-by: Guo Ren <guoren@kernel.org>

-- 
Best Regards
 Guo Ren
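
The relationship between RUNTIME_MAGIC and the constants it replaces in the quoted patch (0x89abd for lui, -0x211 for addi) can be checked with a small userspace sketch. The helper names `riscv_hi20()`, `riscv_lo12()` and `riscv_lui_addi()` are hypothetical; they only model the arithmetic of a %hi/%lo split on RV32 and are not taken from the kernel or the assembler.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper 20 bits for lui. Adding 0x800 first rounds up whenever the low
 * 12 bits will be sign-extended to a negative value by addi.
 */
static uint32_t riscv_hi20(uint32_t val)
{
	return (val + 0x800u) >> 12;
}

/* Low 12 bits for addi, interpreted as a signed 12-bit immediate. */
static int32_t riscv_lo12(uint32_t val)
{
	int32_t lo = (int32_t)(val & 0xfffu);

	return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Recombine the two immediates the way lui + addi would on RV32. */
static uint32_t riscv_lui_addi(uint32_t hi20, int32_t lo12)
{
	assert(lo12 >= -2048 && lo12 <= 2047);
	return (hi20 << 12) + (uint32_t)lo12;
}
```

For 0x89ABCDEF this yields 0x89abd and -0x211, matching the previously open-coded immediates, and recombining them gives back 0x89ABCDEF.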



* Re: [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
  2026-06-30  4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
@ 2026-07-01  7:57   ` Peter Zijlstra
  2026-07-01  8:41     ` Sebastian Andrzej Siewior
  2026-07-01 19:58   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2026-07-01  7:57 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Arnd Bergmann, Thomas Gleixner, Ingo Molnar,
	Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86,
	Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
	linux-kernel, Samuel Holland, Charlie Jenkins, linux-arm-kernel,
	linux-riscv, linux-s390, H. Peter Anvin, Thomas Huth,
	Sean Christopherson, Jisheng Zhang, Alexandre Ghiti,
	Christian Borntraeger, Sven Schnelle

On Tue, Jun 30, 2026 at 04:55:31AM +0000, K Prateek Nayak wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> Runtime constify the read-only-after-init data __futex_shift (shift_32),
> __futex_mask (mask_32), and __futex_queues (ptr) used in the __futex_hash()
> hot path to avoid referencing global variables.
> 
> This also allows __futex_queues to be allocated dynamically to
> "nr_node_ids" slots instead of reserving config dependent MAX_NUMNODES
> (1 << CONFIG_NODES_SHIFT) worth of slots upfront.
> 
> Runtime constants are initialized before their first access and
> runtime_const_init() provides necessary barrier to ensure subsequent
> accesses are not reordered against their initialization.
> 
> No functional changes intended.
> 
>   [ prateek: Dynamically allocate __futex_queues, mark the global data
>     __ro_after_init since they are constified after futex_init(). ]
> 
> Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> # MAX_NUMNODES bloat
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>


The big $1M question: does it actually make it go faster? The whole
point here was performance, right? But I'm not seeing numbers showing
how awesome these patches are.





* Re: [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
  2026-07-01  7:57   ` Peter Zijlstra
@ 2026-07-01  8:41     ` Sebastian Andrzej Siewior
  2026-07-01  9:07       ` K Prateek Nayak
  2026-07-01 11:01       ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path Sebastian Andrzej Siewior
  0 siblings, 2 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-07-01  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: K Prateek Nayak, Arnd Bergmann, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-arch, linux-kernel, Samuel Holland,
	Charlie Jenkins, linux-arm-kernel, linux-riscv, linux-s390,
	H. Peter Anvin, Thomas Huth, Sean Christopherson, Jisheng Zhang,
	Alexandre Ghiti, Christian Borntraeger, Sven Schnelle

On 2026-07-01 09:57:14 [+0200], Peter Zijlstra wrote:
> The big $1M question: does it actually make it go faster? The whole
> point here was performance, right? But I'm not seeing numbers showing
> how awesome these patches are.

I did complain about the size of __futex_data, which is blown up
on distro kernels due to CONFIG_NODES_SHIFT=10, on Debian for instance.
This makes it go away at no extra price but yeah let me boot a big box
and see.
If the performance remains unchanged it is still worth considering due
to size savings on the average box with 1 node. The biggest box I have
access to has four nodes. If I remember correctly, Prateek was saying
that AMD has "normal" boxes which would require =9 for normal operation
and they do run distro kernels so lowering that value is not an option.

Sebastian
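
The size argument can be put into numbers with a small sketch. It only counts the per-node pointer slots (the old queues[MAX_NUMNODES] array versus a dynamically sized one) and ignores the two scalar fields; the function names `static_queue_bytes()` and `dynamic_queue_bytes()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes reserved by a static array of (1 << nodes_shift) pointers. */
static size_t static_queue_bytes(unsigned int nodes_shift)
{
	assert(nodes_shift < 16);
	return ((size_t)1 << nodes_shift) * sizeof(void *);
}

/* Bytes needed when the array is sized by the number of node ids at boot. */
static size_t dynamic_queue_bytes(unsigned int nr_node_ids)
{
	assert(nr_node_ids > 0);
	return (size_t)nr_node_ids * sizeof(void *);
}
```

On a 64-bit build, a shift of 10 reserves 8192 bytes, whereas a single-node box needs 8 bytes and a four-node box 32 bytes.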



* Re: [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
  2026-07-01  8:41     ` Sebastian Andrzej Siewior
@ 2026-07-01  9:07       ` K Prateek Nayak
  2026-07-01 16:17         ` [PATCH] futex: Optimise the size check get_futex_key() Sebastian Andrzej Siewior
  2026-07-01 11:01       ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path Sebastian Andrzej Siewior
  1 sibling, 1 reply; 16+ messages in thread
From: K Prateek Nayak @ 2026-07-01  9:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Peter Zijlstra
  Cc: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Catalin Marinas, Will Deacon, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-arch, linux-kernel, Samuel Holland,
	Charlie Jenkins, linux-arm-kernel, linux-riscv, linux-s390,
	H. Peter Anvin, Thomas Huth, Sean Christopherson, Jisheng Zhang,
	Alexandre Ghiti, Christian Borntraeger, Sven Schnelle

Hello Peter, Sebastian,

On 7/1/2026 2:11 PM, Sebastian Andrzej Siewior wrote:
> On 2026-07-01 09:57:14 [+0200], Peter Zijlstra wrote:
>> The big $1M question: does it actually make it go faster? The whole
>> point here was performance, right? But I'm not seeing numbers showing
>> how awesome these patches are.
> 
> I did complain about the size of __futex_data, which is blown up
> on distro kernels due to CONFIG_NODES_SHIFT=10, on Debian for instance.
> This makes it go away at no extra price but yeah let me boot a big box
> and see.
> If the performance remains unchanged it is still worth considering due
> to size savings on the average box with 1 node. The biggest box I have
> access to has four nodes. If I remember correctly, Prateek was saying
> that AMD has "normal" boxes which would require =9 for normal operation
> and they do run distro kernels so lowering that value is not an option.

The rationale there was that with CCX as NUMA, we have 32 NUMA nodes on
chip, and with CXL there is a possibility of 2x that, so I suggested that
a NODES_SHIFT of 7 or 8 should probably cover almost all real hardware
without any added NUMA emulation weirdness.

To answer the million dollar question, I see the following on running
perf bench futex on a 3rd Gen EPYC (2 x 64C/128T)

  +----------------+-----------+-----------+-----------+--------------+
  | Benchmark      | Kernel 1  | Kernel 2  |   Unit    | % Improvement|
  |                |  (avg/5)  |  (avg/5)  |           | (K2 vs K1)   |
  +----------------+-----------+-----------+-----------+--------------+
  | Wake-parallel  |  0.01614  |  0.00456  |    ms     |   +71.75%    |
  | Requeue        |  0.26394  |  0.24644  |    ms     |    +6.63%    |
  | Lock-pi        |     34.0  |     57.2  |  ops/sec  |   +68.24%    |
  +----------------+-----------+-----------+-----------+--------------+

Kernel 1 is tip at base commit and Kernel 2 is tip + this series.
perf bench futex hash shows some insane bimodal behavior on my system
with both tip and tip + series, so I've left that variant out for now.

This is only from 5 runs from a single boot. I'll try to grab a
bigger system and check if it makes a difference there.

-- 
Thanks and Regards,
Prateek
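
The percentages in the table above follow from the usual relative-change formulas, with the sign convention depending on whether lower (ms) or higher (ops/sec) is better. A minimal sketch, with hypothetical helper names that are not part of perf:

```c
#include <assert.h>

/* For latencies: how much smaller the patched value is, in percent. */
static double improvement_lower_is_better(double base, double patched)
{
	assert(base > 0.0);
	return (base - patched) / base * 100.0;
}

/* For throughput: how much larger the patched value is, in percent. */
static double improvement_higher_is_better(double base, double patched)
{
	assert(base > 0.0);
	return (patched - base) / base * 100.0;
}
```

Feeding in the table's averages reproduces the reported figures to two decimals: 0.01614 ms to 0.00456 ms gives about 71.75%, 0.26394 ms to 0.24644 ms about 6.63%, and 34.0 to 57.2 ops/sec about 68.24%.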




* Re: [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
  2026-07-01  8:41     ` Sebastian Andrzej Siewior
  2026-07-01  9:07       ` K Prateek Nayak
@ 2026-07-01 11:01       ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-07-01 11:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: K Prateek Nayak, Arnd Bergmann, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-arch, linux-kernel, Samuel Holland,
	Charlie Jenkins, linux-arm-kernel, linux-riscv, linux-s390,
	H. Peter Anvin, Thomas Huth, Sean Christopherson, Jisheng Zhang,
	Alexandre Ghiti, Christian Borntraeger, Sven Schnelle

On 2026-07-01 10:41:55 [+0200], To Peter Zijlstra wrote:
> This makes it go away at no extra price but yeah let me boot a big box
> and see.

as-is:
|$ ./perf bench futex hash -f 1 -t 1 -r 10 -b 0
| # Running 'futex/hash' benchmark:
| Run summary [PID 3588]: 1 threads, each operating on 1 [private] futexes for 10 secs.
| 
| [thread  0] futex: 0x5555e5ad4740 [ 6449632 ops/sec ]
| 
| Averaged 6449632 operations/sec (+- 0,00%), total secs = 10
| Futex hashing: global hash

roughly that area, repeated runs usually change the last three digits.
Patched:

| $ ./perf bench futex hash -f 1 -t 1 -r 10 -b 0
| # Running 'futex/hash' benchmark:
| Run summary [PID 2375]: 1 threads, each operating on 1 [private] futexes for 10 secs.
|
| [thread  0] futex: 0x5585ddebd740 [ 6532004 ops/sec ]
|
| Averaged 6532004 operations/sec (+- 0,00%), total secs = 10
| Futex hashing: global hash

For the private hash, the change is within the noise for -b 8192.

So we have here +1.28% ops/sec. Not ground breaking, not bad either.

With more threads:
| $ ./perf bench futex hash  -r 30 -b 0
| # Running 'futex/hash' benchmark:
| Run summary [PID 2424]: 144 threads, each operating on 1024 [private] futexes for 30 secs.
| 
| [thread  0] futexes: 0x556f3a3387c0 ... 0x556f3a3397bc [ 2104422 ops/sec ]
…
| [thread 143] futexes: 0x556f3a3d9660 ... 0x556f3a3da65c [ 2105480 ops/sec ]
| 
| Averaged 2111486 operations/sec (+- 0,03%), total secs = 30
| Futex hashing: global hash

To:

| $ ./perf bench futex hash  -r 30 -b 0
| # Running 'futex/hash' benchmark:
| Run summary [PID 2723]: 144 threads, each operating on 1024 [private] futexes for 30 secs.
|
| [thread  0] futexes: 0x560a09e487c0 ... 0x560a09e497bc [ 2135688 ops/sec ]
…
| [thread 143] futexes: 0x560a09ee9660 ... 0x560a09eea65c [ 2137668 ops/sec ]
|
| Averaged 2139685 operations/sec (+- 0,03%), total secs = 30
| Futex hashing: global hash

+1.34%. Again, not ground breaking but still visible. And the memory
savings.

That is, btw, 7.2-rc1 on an Intel(R) Xeon(R) CPU E7-8890 v3 (4 NUMA
nodes).

Sebastian



* [PATCH] futex: Optimise the size check get_futex_key()
  2026-07-01  9:07       ` K Prateek Nayak
@ 2026-07-01 16:17         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-07-01 16:17 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Peter Zijlstra, Arnd Bergmann, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-arch, linux-kernel, Samuel Holland,
	Charlie Jenkins, linux-arm-kernel, linux-riscv, linux-s390,
	H. Peter Anvin, Thomas Huth, Sean Christopherson, Jisheng Zhang,
	Alexandre Ghiti, Christian Borntraeger, Sven Schnelle

The futex address must be naturally aligned and this is checked via
"address % size" where `address' is the supplied address and `size' is
the expected size of the futex. It is guaranteed that `size' is a power
of two but the compiler does not see it and emits a `div' operation here
(x86, arm, gcc-15).

We can take advantage of the pow2 property and rewrite it as
"address & (size-1)".

As per testing, the command
|perf bench futex hash -f 1 -b 16384 -t 1 -r 30

improved from
| [thread  0] futex: 0x5619f931f740 [ 7001583 ops/sec ]
to
| [thread  0] futex: 0x55da173e5740 [ 7376137 ops/sec ]

or by 5.3%

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---

Could someone verify this, please? The 5% looks a bit high. This is on
top of the series (but not made worse by the series).

 kernel/futex/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 179b26e9c9341..2b00ab510e7d2 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -520,7 +520,7 @@ int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *key,
 	 * The futex address must be "naturally" aligned.
 	 */
 	key->both.offset = address % PAGE_SIZE;
-	if (unlikely((address % size) != 0))
+	if (unlikely((address & (size-1)) != 0))
 		return -EINVAL;
 	address -= key->both.offset;
 
-- 
2.53.0




* Re: [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
  2026-06-30  4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
  2026-07-01  7:57   ` Peter Zijlstra
@ 2026-07-01 19:58   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-07-01 19:58 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Borislav Petkov, Dave Hansen, x86, Catalin Marinas, Will Deacon,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-arch, linux-kernel, Samuel Holland,
	Charlie Jenkins, linux-arm-kernel, linux-riscv, linux-s390,
	H. Peter Anvin, Thomas Huth, Sean Christopherson, Jisheng Zhang,
	Alexandre Ghiti, Christian Borntraeger, Sven Schnelle

On 2026-06-30 04:55:31 [+0000], K Prateek Nayak wrote:
> --- a/kernel/futex/core.c
> +++ b/kernel/futex/core.c
> @@ -395,13 +391,13 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_
>  		 * NOTE: this isn't perfectly uniform, but it is fast and
>  		 * handles sparse node masks.
>  		 */
> -		node = (hash >> futex_hashshift) % nr_node_ids;
> +		node = runtime_const_shift_right_32(hash, __futex_shift) % nr_node_ids;
>  		if (!node_possible(node)) {
>  			node = find_next_bit_wrap(node_possible_map.bits, nr_node_ids, node);
>  		}

I replaced this with:

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 79e770d4d166..30d8622958d2 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -382,6 +382,7 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_
 		      key->both.offset);
 
 	if (node == FUTEX_NO_NODE) {
+		u32 node_limit = nr_node_ids;
 		/*
 		 * In case of !FLAGS_NUMA, use some unused hash bits to pick a
 		 * node -- this ensures regular futexes are interleaved across
@@ -391,9 +392,9 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_
 		 * NOTE: this isn't perfectly uniform, but it is fast and
 		 * handles sparse node masks.
 		 */
-		node = runtime_const_shift_right_32(hash, __futex_shift) % nr_node_ids;
-		if (!node_possible(node)) {
-			node = find_next_bit_wrap(node_possible_map.bits, nr_node_ids, node);
+		node = reciprocal_scale(hash, node_limit);
+		if (!node_possible(node)) {
+			node = find_next_bit_wrap(node_possible_map.bits, node_limit, node);
 		}
 	}
 
I don't think it is worse, I hardly see a change perf wise. Sometimes
op/s is reported almost unchanged, sometimes it improves a bit.

What it does is read nr_node_ids only once (which has no effect here
because I have no sparse nodes) and replace the shift + divl with
imulq + shift.

perf was pointing me to the divl but now it points to the imulq.
¯\_(ツ)_/¯

But having that div gone, can't be bad, can it?

Sebastian
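
The multiply-and-shift that replaces the division can be sketched in userspace as follows. `reciprocal_scale()` here is a local copy of the arithmetic used by the kernel helper of the same name; `pick_node()` is a hypothetical wrapper added only for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit value into [0, ep_ro) with one multiply and one shift. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Hypothetical wrapper: choose a node id below node_limit from a hash. */
static uint32_t pick_node(uint32_t hash, uint32_t node_limit)
{
	assert(node_limit > 0);
	return reciprocal_scale(hash, node_limit);
}
```

Note that this picks the node from the high bits of the hash rather than from the bits just above the bucket mask, so the distribution across nodes differs from the shift + modulo variant even though the range is the same.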



end of thread, other threads:[~2026-07-01 19:58 UTC | newest]

Thread overview: 16+ messages
2026-06-30  4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-06-30  4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-06-30  4:55 ` [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
2026-06-30  4:55 ` [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-06-30  4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
2026-06-30  6:47   ` Guo Ren
2026-06-30  4:55 ` [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-06-30  4:55 ` [PATCH v5 6/8] s390/runtime-const: " K Prateek Nayak
2026-06-30  4:55 ` [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
2026-06-30  4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
2026-07-01  7:57   ` Peter Zijlstra
2026-07-01  8:41     ` Sebastian Andrzej Siewior
2026-07-01  9:07       ` K Prateek Nayak
2026-07-01 16:17         ` [PATCH] futex: Optimise the size check get_futex_key() Sebastian Andrzej Siewior
2026-07-01 11:01       ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path Sebastian Andrzej Siewior
2026-07-01 19:58   ` Sebastian Andrzej Siewior
