* [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation
@ 2026-06-30 4:55 K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
` (7 more replies)
0 siblings, 8 replies; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86,
Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, H. Peter Anvin,
Thomas Huth, Sean Christopherson, Jisheng Zhang, Alexandre Ghiti,
Christian Borntraeger, Sven Schnelle
tl;dr
This series introduces runtime_const_mask_32() and uses runtime
constants for __ro_after_init data in the futex_hash() hot path. More
information can be found in v2 at
https://lore.kernel.org/lkml/20260316052401.18910-1-kprateek.nayak@amd.com/
Major changes in v5
===================
There was enough interest in using a better instruction sequence for
the current use case on ARM64 and RISC-V, so the two implementations have
pivoted to using UBFX and SRLI + SLLI instructions respectively.
This saves two instructions on ARM64 and one instruction on RISC-V per
mask operation.
Future use cases that require generic mask patching on these
architectures will trip a BUG_ON() in the arch-specific patching
functions. Enough breadcrumbs have been left in the comments and commit
logs to allow an easy switch to the more generic implementation from v4.
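For readers who want to convince themselves that the restricted forms are equivalent to a plain AND for GENMASK(n, 0) style masks, here is a minimal userspace sketch. The helper names (low_mask(), mask_and(), mask_shift_pair()) are hypothetical and not part of the series; the shift pair mirrors the RISC-V approach, and a single UBFX on ARM64 computes the same result as mask_and().

```c
#include <assert.h>
#include <stdint.h>

/* GENMASK(width - 1, 0) for 1 <= width <= 32, written without UB. */
static uint32_t low_mask(unsigned int width)
{
	assert(width >= 1 && width <= 32);
	return 0xffffffffu >> (32 - width);
}

/* Generic form: materialize the mask, then AND (the v4 approach). */
static uint32_t mask_and(uint32_t val, unsigned int width)
{
	return val & low_mask(width);
}

/* Restricted form: shift the unwanted upper bits out and back in. */
static uint32_t mask_shift_pair(uint32_t val, unsigned int width)
{
	unsigned int shamt;

	assert(width >= 1 && width <= 32);
	shamt = 32 - width;
	return (uint32_t)(val << shamt) >> shamt;
}
```

Both helpers agree for every width in 1..32; only masks that are not contiguous from bit 0 need the generic form.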
Addressing sashiko reviews
==========================
o The operator precedence issue noted on Patch 3 and Patch 5 no longer
exists as the mask operations are now done indirectly within the ASM
block.
o The issue regarding usage of runtime constants before their init is
moot since they are set up before their first usage. As for the comments
on weakly ordered architectures, the platform init is done on the BSP
before userspace is active.
Testing
=======
Apart from x86, which was build and boot tested on baremetal, all the
other architectures have been build and boot tested with cross-compile +
QEMU with some light sanity testing on each.
Patches are based on:
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
at commit ea9c52e91213d ("Merge branch into tip/master: 'irq/msi'")
(29-06-2026)
A few comments from checkpatch.pl have been ignored to adhere to the style
of the particular file. If something needs addressing, please let me
know and I'll address it with v5.X fixups unless there is a larger
change that requires a re-spin.
Everyone has been Cc'd on the cover-letter and the futex bits for the
context. Respective arch maintainers, reviewers, and whoever got lucky
with get_maintainer.pl have been Cc'd on their respective arch specific
changes. Futex maintainers and the lists will be receiving the whole
series (sorry in advance!)
---
changelog v4..v5:
o Collected tags from Catalin and Charlie on patches that remain
unchanged in v5. (Thanks a ton!)
o Switched mask operations on ARM64 and RISC-V to use UBFX and SRLI +
SLLI instructions respectively. (Charlie, Samuel on v2)
o Rebased changes on latest tip:master.
v4: https://lore.kernel.org/lkml/20260430094730.31624-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (5):
arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
arm64/runtime-const: Introduce runtime_const_mask_32()
riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC
riscv/runtime-const: Introduce runtime_const_mask_32()
s390/runtime-const: Introduce runtime_const_mask_32()
Peter Zijlstra (3):
x86/runtime-const: Introduce runtime_const_mask_32()
asm-generic/runtime-const: Add dummy runtime_const_mask_32()
futex: Use runtime constants for __futex_hash() hot path
arch/arm64/include/asm/runtime-const.h | 63 ++++++++++++++++----
arch/riscv/include/asm/asm.h | 1 +
arch/riscv/include/asm/runtime-const.h | 82 ++++++++++++++++++++------
arch/s390/include/asm/runtime-const.h | 22 ++++++-
arch/x86/include/asm/runtime-const.h | 14 +++++
include/asm-generic/runtime-const.h | 1 +
include/asm-generic/vmlinux.lds.h | 5 +-
kernel/futex/core.c | 42 +++++++------
8 files changed, 179 insertions(+), 51 deletions(-)
base-commit: ea9c52e91213d5427c6a2e90cd41bf912fd1ea36
--
2.34.1
* [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32()
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
` (6 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, H. Peter Anvin,
Thomas Huth, Sean Christopherson
From: Peter Zijlstra <peterz@infradead.org>
Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.
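The patching idea can be illustrated in userspace: the instruction is emitted with a placeholder immediate, the location of that immediate is recorded, and the fixup later overwrites those four bytes. The sketch below is only an illustration; the byte encoding of "and $0x12345678, %eax" is an assumption, and fixup_mask(), run_and() and patched_and() are hypothetical names rather than kernel functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed encoding of "and $0x12345678, %eax": opcode 81 /4 followed by
 * a little-endian imm32 in the last four bytes.
 */
static const uint8_t and_template[6] = { 0x81, 0xe0, 0x78, 0x56, 0x34, 0x12 };

/* Overwrite the 4-byte immediate that ends at insn + len ("1b - 4"). */
static void fixup_mask(uint8_t *insn, size_t len, uint32_t mask)
{
	uint8_t imm[4] = {
		mask & 0xff, (mask >> 8) & 0xff,
		(mask >> 16) & 0xff, (mask >> 24) & 0xff,
	};

	assert(len >= 4);
	memcpy(insn + len - 4, imm, sizeof(imm));
}

/* Decode the immediate again and apply it, as the CPU would. */
static uint32_t run_and(const uint8_t *insn, size_t len, uint32_t val)
{
	const uint8_t *p = insn + len - 4;
	uint32_t imm = p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);

	return val & imm;
}

/* Copy the template, patch it with the real mask, and execute it. */
static uint32_t patched_and(uint32_t mask, uint32_t val)
{
	uint8_t text[sizeof(and_template)];

	memcpy(text, and_template, sizeof(text));
	fixup_mask(text, sizeof(text), mask);
	return run_and(text, sizeof(text), val);
}
```

Because the immediate is a full 32 bits, this form handles arbitrary masks, not only contiguous ones.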
[ prateek: Broke off the x86 chunk, commit message. ]
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o No changes.
---
arch/x86/include/asm/runtime-const.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/include/asm/runtime-const.h b/arch/x86/include/asm/runtime-const.h
index 4cd94fdcb45e2..b13f7036c1c9b 100644
--- a/arch/x86/include/asm/runtime-const.h
+++ b/arch/x86/include/asm/runtime-const.h
@@ -41,6 +41,15 @@
:"+r" (__ret)); \
__ret; })
+#define runtime_const_mask_32(val, sym) ({ \
+ typeof(0u+(val)) __ret = (val); \
+ asm_inline("and $0x12345678, %k0\n1:\n" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t"\
+ ".long 1b - 4 - .\n" \
+ ".popsection" \
+ : "+r" (__ret)); \
+ __ret; })
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -65,6 +74,11 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
*(unsigned char *)where = val;
}
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ *(unsigned int *)where = val;
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
` (5 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, Jisheng Zhang
The current scheme to directly patch the kernel text for runtime
constants runs into the following issue with futex adapted to using
runtime constants on arm64:
Unable to handle kernel write to read-only memory at virtual address ...
The pc points to the *p assignment in the following call chain:
futex_init()
runtime_const_init(shift, __futex_shift)
__runtime_fixup_shift()
*p = cpu_to_le32(insn);
which suggests that core_initcall() is too late to patch the kernel text
directly unlike the "d_hash_shift" which is initialized during
vfs_caches_init_early() before the protections are in place.
Use aarch64_insn_patch_text_nosync() to patch the runtime constants
instead of doing it directly to allow runtime_const_init() to be called
slightly later during boot.
Since aarch64_insn_patch_text_nosync() calls caches_clean_inval_pou()
internally, __runtime_fixup_caches() ends up being redundant.
runtime_const_init() calls are rare and the overhead of multiple calls to
caches_clean_inval_pou() instead of batching them together should be
negligible in practice.
The cpu_to_le32() conversion of the instruction isn't necessary since it
is handled later in the aarch64_insn_patch_text_nosync() call-chain:
aarch64_insn_patch_text_nosync(addr, insn)
aarch64_insn_write(addr, insn)
__aarch64_insn_write(addr, cpu_to_le32(insn))
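As an illustration of the read-modify-write on a little-endian instruction word, here is a minimal userspace sketch. It is not the kernel code: get_le32(), put_le32(), fixup_16() and patched_movz() are hypothetical helpers, and 0x52800000 is assumed to be the encoding of "movz w0, #0". The byte-wise helpers stand in for le32_to_cpu()/cpu_to_le32() so the result is the same on a big-endian host.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit instruction stored in little-endian byte order. */
static uint32_t get_le32(const uint8_t *p)
{
	return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit instruction in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* Same masking as __runtime_fixup_16(): imm16 lives in bits [20:5]. */
static void fixup_16(uint8_t *text, uint16_t val)
{
	uint32_t insn = get_le32(text);

	insn &= 0xffe0001f;
	insn |= (uint32_t)val << 5;
	put_le32(text, insn);
}

/* Patch a placeholder "movz w0, #0" (assumed encoding) with val. */
static uint32_t patched_movz(uint16_t val)
{
	uint8_t text[4];

	put_le32(text, 0x52800000u);
	fixup_16(text, val);
	assert(((get_le32(text) >> 5) & 0xffff) == val);
	return get_le32(text);
}
```

In the kernel the final store goes through aarch64_insn_patch_text_nosync() instead of a direct write, which is the point of this patch.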
Sashiko noted that aarch64_insn_patch_text_nosync() does not expect a
lm_alias() address and Catalin suggested it is safe to drop the
lm_alias() for runtime patching since the kernel text is readable. The
address passed to fixup function is interpreted as a __le32 and
dereferenced as is to read the opcode at the patch site.
No functional changes are intended.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o Collected tag from Catalin (Thanks a ton!)
---
arch/arm64/include/asm/runtime-const.h | 17 +++++------------
1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index c3dbd3ae68f69..838145bc289d2 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -7,6 +7,7 @@
#endif
#include <asm/cacheflush.h>
+#include <asm/text-patching.h>
/* Sigh. You can still run arm64 in BE mode */
#include <asm/byteorder.h>
@@ -50,34 +51,26 @@ static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
u32 insn = le32_to_cpu(*p);
insn &= 0xffe0001f;
insn |= (val & 0xffff) << 5;
- *p = cpu_to_le32(insn);
-}
-
-static inline void __runtime_fixup_caches(void *where, unsigned int insns)
-{
- unsigned long va = (unsigned long)where;
- caches_clean_inval_pou(va, va + 4*insns);
+ aarch64_insn_patch_text_nosync(p, insn);
}
static inline void __runtime_fixup_ptr(void *where, unsigned long val)
{
- __le32 *p = lm_alias(where);
+ __le32 *p = where;
__runtime_fixup_16(p, val);
__runtime_fixup_16(p+1, val >> 16);
__runtime_fixup_16(p+2, val >> 32);
__runtime_fixup_16(p+3, val >> 48);
- __runtime_fixup_caches(where, 4);
}
/* Immediate value is 6 bits starting at bit #16 */
static inline void __runtime_fixup_shift(void *where, unsigned long val)
{
- __le32 *p = lm_alias(where);
+ __le32 *p = where;
u32 insn = le32_to_cpu(*p);
insn &= 0xffc0ffff;
insn |= (val & 63) << 16;
- *p = cpu_to_le32(insn);
- __runtime_fixup_caches(where, 1);
+ aarch64_insn_patch_text_nosync(p, insn);
}
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
--
2.34.1
* [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32()
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 5:07 ` sashiko-bot
2026-06-30 4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
` (4 subsequent siblings)
7 siblings, 1 reply; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, Jisheng Zhang
Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. Since all the current use-cases
are of the form GENMASK(n, 0), with n > 0, a single:
ubfx w0, w0, #0, #widthm1 // w0 = w0 [widthm1:0]
instruction is used for arm64 to improve instruction density and
performance.
The "Arm A-profile A64 Instruction Set Architecture" manual, Sec.
"A64 -- Base Instructions" [1], notes that for the UBFX instruction the
immediate "width" is encoded as width minus 1 in imms (bits [15:10]),
which is patched by __runtime_fixup_mask() once the mask is known.
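The imms patching described above can be sketched in userspace as follows. This is an illustration under assumptions: 0x53007c20 is assumed to be the encoding of "ubfx w0, w1, #0, #32", and mask_width(), fixup_ubfx() and run_ubfx() are hypothetical helpers. The assert() plays the role of the BUG_ON() for masks that are not of the form GENMASK(n, 0).

```c
#include <assert.h>
#include <stdint.h>

/* Width of a GENMASK(n, 0) style mask; asserts on any other shape. */
static unsigned int mask_width(uint32_t mask)
{
	unsigned int width = 0;

	assert(mask != 0 && (mask & (mask + 1)) == 0);
	for (; mask; mask >>= 1)
		width++;
	return width;
}

/* Patch (width - 1) into imms, the 6-bit field at bits [15:10]. */
static uint32_t fixup_ubfx(uint32_t insn, uint32_t mask)
{
	unsigned int width = mask_width(mask);

	insn &= 0xffff03ffu;
	insn |= ((width - 1) & 0x3f) << 10;
	return insn;
}

/* Emulate UBFX with lsb == 0: keep bits [imms:0] of val. */
static uint32_t run_ubfx(uint32_t insn, uint32_t val)
{
	unsigned int imms = (insn >> 10) & 0x3f;

	return (uint32_t)((uint64_t)val & ((1ull << (imms + 1)) - 1));
}
```

For example, patching the placeholder with the mask 0xff yields imms = 7, and the emulated instruction then keeps only the low byte of its input.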
If a future use case arises that needs to tackle arbitrary masks,
consider using:
movz w1, #lo16, lsl #0
movk w1, #hi16, lsl #16
to patch the 32-bit mask in the asm block and return "__ret & (val)"
from runtime_const_mask_32(), which allows the compiler to further
optimize the logical AND operation. __runtime_fixup_ptr() already patches
a "movz + movk lsl #16" sequence which can be reused when the need
arises.
A possible implementation for this alternate scheme can be found at [2].
Assisted-by: Claude:claude-sonnet-4-6
Suggested-by: Samuel Holland <samuel.holland@sifive.com>
Suggested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Link: https://developer.arm.com/documentation/ddi0602/2026-03/Base-Instructions/ [1]
Link: https://lore.kernel.org/lkml/20260430094730.31624-4-kprateek.nayak@amd.com/ [2]
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o Pivoted to using the UBFX instruction for masking since the futex
use-case uses masks of the form 2^n - 1 (n > 1) and there was enough
interest in improving instruction density for ARM64 and RISC-V.
(Charlie, Samuel on v2)
o Dropped Catalin's tag as a result of changed approach.
---
arch/arm64/include/asm/runtime-const.h | 46 ++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index 838145bc289d2..371c9a4bc2d4b 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -36,6 +36,17 @@
:"r" (0u+(val))); \
__ret; })
+#define runtime_const_mask_32(val, sym) ({ \
+ unsigned long __ret; \
+ asm_inline("1:\t" \
+ "ubfx %w0, %w1, #0, #32\n\t" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
+ ".long 1b - .\n\t" \
+ ".popsection" \
+ :"=r" (__ret) \
+ :"r" (0u+(val))); \
+ __ret; })
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -73,6 +84,41 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
aarch64_insn_patch_text_nosync(p, insn);
}
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ unsigned int width = __fls(val) + 1;
+ __le32 *p = where;
+ u32 insn;
+
+ /*
+ * XXX: Current implementation only supports patching masks of
+ * form GENMASK(n, 0) (n >= 0) using a single UBFX instruction
+ * to improve performance, density, and covers all the current
+ * use-cases.
+ *
+ * When the need arises to support any generic mask, and this
+ * BUG_ON() is tripped, consider using a:
+ *
+ * movz %w0, #imm16
+ * movk %w0, #imm16, lsl #16
+ *
+ * sequence to load the 32bit const mask, and perform a logical
+ * and outside the asm block before returning the result. Fixup
+ * can simply reuse the existing __runtime_fixup_16() to patch
+ * the individual mov instructions.
+ */
+ BUG_ON(!val || width > 32 || (GENMASK(width - 1, 0) != val));
+
+ /*
+ * The width of the mask is encoded as (width - 1) in imms
+ * which is 6 bits starting at bit #10.
+ */
+ insn = le32_to_cpu(*p);
+ insn &= 0xffff03ff;
+ insn |= ((width - 1) & 0x1f) << 10;
+ aarch64_insn_patch_text_nosync(p, insn);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (2 preceding siblings ...)
2026-06-30 4:55 ` [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 6:47 ` Guo Ren
2026-06-30 4:55 ` [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
` (3 subsequent siblings)
7 siblings, 1 reply; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, Alexandre Ghiti,
Jisheng Zhang, Guo Ren
Define the placeholder used for lui + addi[w] patching sequence as
RUNTIME_MAGIC and use that instead of open coding the constants in the
inline assembly.
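The %hi()/%lo() split that replaces the open-coded constants can be checked with a small userspace sketch. The helper names (hi20(), lo12(), lui_addi()) are hypothetical; the arithmetic is the usual lui + addi decomposition where the low 12 bits are sign-extended and the upper part is rounded to compensate.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 20 bits for LUI, rounded so that a negative lo12 is absorbed. */
static uint32_t hi20(uint32_t v)
{
	return (v + 0x800u) >> 12;
}

/* Lower 12 bits for ADDI, sign-extended as the instruction does. */
static int32_t lo12(uint32_t v)
{
	int32_t lo = (int32_t)(v & 0xfff);

	return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Recombine the two halves the way lui + addi does (mod 2^32). */
static uint32_t lui_addi(uint32_t v)
{
	uint32_t out = (hi20(v) << 12) + (uint32_t)lo12(v);

	assert(out == v);
	return out;
}
```

For 0x89ABCDEF this gives 0x89abd and -0x211, which are the constants that were previously open-coded in the lui and addi[w] instructions.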
No functional changes intended.
Suggested-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Tested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o Collected tags from Charlie (Thanks a ton!)
---
arch/riscv/include/asm/runtime-const.h | 38 ++++++++++++++------------
1 file changed, 20 insertions(+), 18 deletions(-)
diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
index 900db0a103d05..1ce02605d2e43 100644
--- a/arch/riscv/include/asm/runtime-const.h
+++ b/arch/riscv/include/asm/runtime-const.h
@@ -15,21 +15,23 @@
#include <linux/uaccess.h>
+#define RUNTIME_MAGIC __ASM_STR(0x89ABCDEF)
+
#ifdef CONFIG_32BIT
-#define runtime_const_ptr(sym) \
-({ \
- typeof(sym) __ret; \
- asm_inline(".option push\n\t" \
- ".option norvc\n\t" \
- "1:\t" \
- "lui %[__ret],0x89abd\n\t" \
- "addi %[__ret],%[__ret],-0x211\n\t" \
- ".option pop\n\t" \
- ".pushsection runtime_ptr_" #sym ",\"a\"\n\t" \
- ".long 1b - .\n\t" \
- ".popsection" \
- : [__ret] "=r" (__ret)); \
- __ret; \
+#define runtime_const_ptr(sym) \
+({ \
+ typeof(sym) __ret; \
+ asm_inline(".option push\n\t" \
+ ".option norvc\n\t" \
+ "1:\t" \
+ "lui %[__ret], %%hi(" RUNTIME_MAGIC ")\n\t" \
+ "addi %[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t" \
+ ".option pop\n\t" \
+ ".pushsection runtime_ptr_" #sym ",\"a\"\n\t" \
+ ".long 1b - .\n\t" \
+ ".popsection" \
+ : [__ret] "=r" (__ret)); \
+ __ret; \
})
#else
/*
@@ -46,10 +48,10 @@
".option push\n\t" \
".option norvc\n\t" \
"1:\t" \
- "lui %[__ret],0x89abd\n\t" \
- "lui %[__tmp],0x1234\n\t" \
- "addiw %[__ret],%[__ret],-0x211\n\t" \
- "addiw %[__tmp],%[__tmp],0x567\n\t" \
+ "lui %[__ret], %%hi(" RUNTIME_MAGIC ")\n\t" \
+ "lui %[__tmp], %%hi(" RUNTIME_MAGIC ")\n\t" \
+ "addiw %[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t" \
+ "addiw %[__tmp],%[__tmp], %%lo(" RUNTIME_MAGIC ")\n\t" \
#define RISCV_RUNTIME_CONST_64_BASE \
"slli %[__tmp],%[__tmp],32\n\t" \
--
2.34.1
* [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32()
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (3 preceding siblings ...)
2026-06-30 4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 6/8] s390/runtime-const: " K Prateek Nayak
` (2 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, Alexandre Ghiti,
Jisheng Zhang
Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. Since all the current use-cases
are of the form GENMASK(n, 0), with n > 0, the following sequence:
slli a0, a1, imm
srli a0, a0, imm
is used for RISC-V, where imm = (32 - width), to improve instruction
density and performance.
"The RISC-V Instruction Set Manual, Volume I - Unprivileged
Architecture" [1], Sec. 2.4.1 "Integer Register-Immediate Instructions",
notes that the shift amount immediates for SLLI and SRLI are 5 bits wide,
starting at bit #20. __runtime_fixup_shift() is reused to patch the shift
amounts of both instructions.
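A userspace sketch of the shift-amount patching may help here. It assumes the shift amount occupies bits [24:20] of the instruction word and uses 0x00c5951b and 0x00c5551b as the assumed encodings of "slliw a0, a1, 12" and "srliw a0, a0, 12"; patch_shamt(), mask_to_shamt() and masked() are hypothetical helpers, not the kernel's __runtime_fixup_shift().

```c
#include <assert.h>
#include <stdint.h>

/* Replace the 5-bit shamt field, bits [24:20], of a shift-immediate. */
static uint32_t patch_shamt(uint32_t insn, unsigned int shamt)
{
	assert(shamt < 32);
	insn &= ~(0x1fu << 20);
	insn |= (uint32_t)shamt << 20;
	return insn;
}

static unsigned int get_shamt(uint32_t insn)
{
	return (insn >> 20) & 0x1f;
}

/* Shift amount for GENMASK(width - 1, 0): 32 - width, as in the diff. */
static unsigned int mask_to_shamt(uint32_t mask)
{
	unsigned int width = 0;

	assert(mask != 0 && (mask & (mask + 1)) == 0);
	for (; mask; mask >>= 1)
		width++;
	return 32 - width;
}

/* Patch both placeholders, then emulate the 32-bit shift pair. */
static uint32_t masked(uint32_t val, uint32_t mask)
{
	unsigned int shamt = mask_to_shamt(mask);
	uint32_t slli = patch_shamt(0x00c5951bu, shamt); /* assumed slliw */
	uint32_t srli = patch_shamt(0x00c5551bu, shamt); /* assumed srliw */

	val <<= get_shamt(slli);
	val >>= get_shamt(srli);
	return val;
}
```

Shifting left first discards the upper bits and the logical right shift then brings the surviving low bits back into place, which is why the same immediate is patched into both instructions.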
If a future use case arises that needs to tackle arbitrary masks,
consider using:
lui a0, 0x12345 # upper 20 bits: (value + 0x800) >> 12 for correct rounding
addi a0, a0, 0x678 # lower 12 bits
to patch the 32-bit mask in the asm block and return "__ret & (val)"
from runtime_const_mask_32(), which allows the compiler to further
optimize the logical AND operation. __runtime_fixup_ptr() already patches
a lui + addi sequence which can be reused when the need arises.
A possible implementation for this alternate scheme can be found at [2].
Assisted-by: Claude:claude-sonnet-4-5
Suggested-by: Samuel Holland <samuel.holland@sifive.com>
Suggested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
Link: https://docs.riscv.org/reference/isa/_attachments/riscv-unprivileged.pdf [1]
Link: https://lore.kernel.org/lkml/20260430094730.31624-6-kprateek.nayak@amd.com/ [2]
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o Pivoted to SRLI + SLLI sequence for mask operation to extract the
lower bits for improved instruction density (Charlie, Samuel on v2).
---
arch/riscv/include/asm/asm.h | 1 +
arch/riscv/include/asm/runtime-const.h | 44 ++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
index e9e8ba83e632f..b8bf842d4c136 100644
--- a/arch/riscv/include/asm/asm.h
+++ b/arch/riscv/include/asm/asm.h
@@ -34,6 +34,7 @@
#define SZREG __REG_SEL(8, 4)
#define LGREG __REG_SEL(3, 2)
#define SRLI __REG_SEL(srliw, srli)
+#define SLLI __REG_SEL(slliw, slli)
#if __SIZEOF_POINTER__ == 8
#ifdef __ASSEMBLER__
diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
index 1ce02605d2e43..dbf96c937dbb9 100644
--- a/arch/riscv/include/asm/runtime-const.h
+++ b/arch/riscv/include/asm/runtime-const.h
@@ -159,6 +159,23 @@
__ret; \
})
+#define runtime_const_mask_32(val, sym) \
+({ \
+ u32 __ret; \
+ asm_inline(".option push\n\t" \
+ ".option norvc\n\t" \
+ "1:\t" \
+ SLLI " %[__ret],%[__val],12\n\t" \
+ SRLI " %[__ret],%[__ret],12\n\t" \
+ ".option pop\n\t" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
+ ".long 1b - .\n\t" \
+ ".popsection" \
+ : [__ret] "=r" (__ret) \
+ : [__val] "r" (val)); \
+ __ret; \
+})
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -262,6 +279,33 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
mutex_unlock(&text_mutex);
}
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ unsigned int width = __fls(val) + 1;
+
+ /*
+ * XXX: Current implementation only supports patching masks of
+ * form GENMASK(width, 0) (width >= 0) using a SRLI + SLLI
+ * sequence instead of LUI + ADDI + AND sequence to improve
+ * performance, density, and covers all the current use-cases.
+ *
+ * When the need arises to support any generic mask, and this
+ * BUG_ON() is tripped, consider using a:
+ *
+ * lui %[__ret], #imm16
+ * addi %[__ret], #imm16
+ *
+ * sequence to load the 32bit const mask, and perform a logical
+ * and outside the asm block before returning the result. Fixup
+ * can simply reuse the existing __runtime_fixup_32() to patch
+ * the LUI + ADDI sequence.
+ */
+ BUG_ON(!val || width > 31 || (GENMASK(width - 1, 0) != val));
+
+ __runtime_fixup_shift(where, 32 - width);
+ __runtime_fixup_shift(where + 4, 32 - width);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v5 6/8] s390/runtime-const: Introduce runtime_const_mask_32()
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (4 preceding siblings ...)
2026-06-30 4:55 ` [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
7 siblings, 0 replies; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, Christian Borntraeger,
Sven Schnelle
Futex hash computation requires a mask operation with read-only after
init data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.
GCC generates a:
nilf %r1,<imm32>
to tackle arbitrary 32-bit masks and the same is implemented here.
Immediate patching pattern for __runtime_fixup_mask() has been adopted
from __runtime_fixup_ptr().
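The "where + 2" offset used by the fixup can be illustrated with a userspace sketch. It assumes that nilf is a 6-byte instruction whose 32-bit immediate follows two opcode/register bytes, and that "nilf %r1,12" encodes as c0 1b 00 00 00 0c; put_be32(), get_be32() and patched_nilf() are hypothetical helpers. The immediate is stored big-endian, matching s390 byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (v >> 24) & 0xff;
	p[1] = (v >> 16) & 0xff;
	p[2] = (v >> 8) & 0xff;
	p[3] = v & 0xff;
}

/* Load a 32-bit value stored in big-endian byte order. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3];
}

/* Patch the immediate at byte offset 2, then apply it to val. */
static uint32_t patched_nilf(uint32_t val, uint32_t mask)
{
	/* Assumed encoding of the placeholder "nilf %r1,12". */
	uint8_t insn[6] = { 0xc0, 0x1b, 0x00, 0x00, 0x00, 0x0c };

	put_be32(insn + 2, mask);
	assert(insn[0] == 0xc0 && insn[1] == 0x1b); /* opcode untouched */
	return val & get_be32(insn + 2);
}
```

As on x86, the full 32-bit immediate means arbitrary masks work without any restriction on their shape.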
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o No changes.
---
arch/s390/include/asm/runtime-const.h | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/arch/s390/include/asm/runtime-const.h b/arch/s390/include/asm/runtime-const.h
index 17878b1d048cf..7b71156031ecb 100644
--- a/arch/s390/include/asm/runtime-const.h
+++ b/arch/s390/include/asm/runtime-const.h
@@ -33,6 +33,20 @@
__ret; \
})
+#define runtime_const_mask_32(val, sym) \
+({ \
+ unsigned int __ret = (val); \
+ \
+ asm_inline( \
+ "0: nilf %[__ret],12\n" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n" \
+ ".long 0b - .\n" \
+ ".popsection" \
+ : [__ret] "+d" (__ret) \
+ : : "cc"); \
+ __ret; \
+})
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -43,12 +57,12 @@
__stop_runtime_##type##_##sym); \
} while (0)
-/* 32-bit immediate for iihf and iilf in bits in I2 field */
static inline void __runtime_fixup_32(u32 *p, unsigned int val)
{
s390_kernel_write(p, &val, sizeof(val));
}
+/* 32-bit immediate for iihf and iilf in bits in I2 field */
static inline void __runtime_fixup_ptr(void *where, unsigned long val)
{
__runtime_fixup_32(where + 2, val >> 32);
@@ -65,6 +79,12 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
s390_kernel_write(where, &insn, sizeof(insn));
}
+/* 32-bit immediate for nilf in bits in I2 field */
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ __runtime_fixup_32(where + 2, val);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32()
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (5 preceding siblings ...)
2026-06-30 4:55 ` [PATCH v5 6/8] s390/runtime-const: " K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
7 siblings, 0 replies; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390
From: Peter Zijlstra <peterz@infradead.org>
Add a dummy runtime_const_mask_32() for all the architectures that do
not support runtime-const.
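On such architectures the helpers collapse to ordinary expressions on a variable. The userspace sketch below copies the shape of the two fallback macros from the diff; demo_shift, demo_mask and demo_bucket() are hypothetical stand-ins and do not reproduce the actual futex hash computation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Same shape as the asm-generic fallbacks: "sym" is a plain variable. */
#define runtime_const_shift_right_32(val, sym)	((u32)(val) >> (sym))
#define runtime_const_mask_32(val, sym)		((u32)(val) & (sym))

/* Hypothetical stand-ins for __ro_after_init data. */
static u32 demo_shift = 4;
static u32 demo_mask = 0xff;

/* Shift, then mask: both reads are regular memory loads here. */
static u32 demo_bucket(u32 hash)
{
	assert(demo_shift < 32);
	return runtime_const_mask_32(
		runtime_const_shift_right_32(hash, demo_shift), demo_mask);
}
```

Callers therefore do not need any #ifdef: the same source compiles to patched immediates where runtime constants are supported and to loads plus ALU operations elsewhere.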
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o No changes.
---
include/asm-generic/runtime-const.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/asm-generic/runtime-const.h b/include/asm-generic/runtime-const.h
index 6704994595145..03e6e3e02401e 100644
--- a/include/asm-generic/runtime-const.h
+++ b/include/asm-generic/runtime-const.h
@@ -10,6 +10,7 @@
*/
#define runtime_const_ptr(sym) (sym)
#define runtime_const_shift_right_32(val, sym) ((u32)(val)>>(sym))
+#define runtime_const_mask_32(val, sym) ((u32)(val)&(sym))
#define runtime_const_init(type,sym) do { } while (0)
#endif
--
2.34.1
* [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (6 preceding siblings ...)
2026-06-30 4:55 ` [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30 4:55 ` K Prateek Nayak
2026-06-30 5:20 ` sashiko-bot
7 siblings, 1 reply; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 4:55 UTC (permalink / raw)
To: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86,
Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, Samuel Holland, Charlie Jenkins, K Prateek Nayak,
linux-arm-kernel, linux-riscv, linux-s390, H. Peter Anvin,
Thomas Huth, Sean Christopherson, Jisheng Zhang, Alexandre Ghiti,
Christian Borntraeger, Sven Schnelle
From: Peter Zijlstra <peterz@infradead.org>
Runtime constify the read-only after init data __futex_shift(shift_32),
__futex_mask(mask_32), and __futex_queues(ptr) used in the __futex_hash()
hot path to avoid referencing global variables.
This also allows __futex_queues to be allocated dynamically to
"nr_node_ids" slots instead of reserving config dependent MAX_NUMNODES
(1 << CONFIG_NODES_SHIFT) worth of slots upfront.
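To put the saving into numbers, here is a small arithmetic sketch. The values plugged in are assumptions chosen for illustration (for example a CONFIG_NODES_SHIFT of 10 and a machine with 2 possible nodes), and queue_table_bytes() and static_bytes() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a table of bucket pointers, one per slot. */
static size_t queue_table_bytes(unsigned int slots)
{
	return (size_t)slots * sizeof(void *);
}

/* Static sizing: MAX_NUMNODES = 1 << CONFIG_NODES_SHIFT slots. */
static size_t static_bytes(unsigned int nodes_shift)
{
	assert(nodes_shift < 31);
	return queue_table_bytes(1u << nodes_shift);
}
```

With 8-byte pointers, static_bytes(10) is 8 KiB of reserved space, whereas queue_table_bytes(2) for a two-node machine is 16 bytes.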
Runtime constants are initialized before their first access and
runtime_const_init() provides the necessary barrier to ensure subsequent
accesses are not reordered against their initialization.
No functional changes intended.
[ prateek: Dynamically allocate __futex_queues, mark the global data
__ro_after_init since they are constified after futex_init(). ]
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> # MAX_NUMNODES bloat
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
changelog v4..v5:
o Rebased on latest tip:master.
---
include/asm-generic/vmlinux.lds.h | 5 +++-
kernel/futex/core.c | 42 +++++++++++++++++--------------
2 files changed, 27 insertions(+), 20 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 5659f4b5a1252..53207901d4c15 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -970,7 +970,10 @@
RUNTIME_CONST(ptr, __dentry_cache) \
RUNTIME_CONST(ptr, __names_cache) \
RUNTIME_CONST(ptr, __filp_cache) \
- RUNTIME_CONST(ptr, __bfilp_cache)
+ RUNTIME_CONST(ptr, __bfilp_cache) \
+ RUNTIME_CONST(shift, __futex_shift) \
+ RUNTIME_CONST(mask, __futex_mask) \
+ RUNTIME_CONST(ptr, __futex_queues)
/* Alignment must be consistent with (kunit_suite *) in include/kunit/test.h */
#define KUNIT_TABLE() \
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 179b26e9c9341..b2a63ceb6ce98 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -48,23 +48,19 @@
#include <vdso/futex.h>
+#include <asm/runtime-const.h>
+
#include "futex.h"
#include "../locking/rtmutex_common.h"
-/*
- * The base of the bucket array and its size are always used together
- * (after initialization only in futex_hash()), so ensure that they
- * reside in the same cacheline.
- */
-static struct {
- unsigned long hashmask;
- unsigned int hashshift;
- struct futex_hash_bucket *queues[MAX_NUMNODES];
-} __futex_data __read_mostly __aligned(2*sizeof(long));
+static u32 __futex_mask __ro_after_init;
+static u32 __futex_shift __ro_after_init;
+static struct futex_hash_bucket **__futex_queues __ro_after_init;
-#define futex_hashmask (__futex_data.hashmask)
-#define futex_hashshift (__futex_data.hashshift)
-#define futex_queues (__futex_data.queues)
+static __always_inline struct futex_hash_bucket **futex_queues(void)
+{
+ return runtime_const_ptr(__futex_queues);
+}
struct futex_private_hash {
int state;
@@ -395,13 +391,13 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_
* NOTE: this isn't perfectly uniform, but it is fast and
* handles sparse node masks.
*/
- node = (hash >> futex_hashshift) % nr_node_ids;
+ node = runtime_const_shift_right_32(hash, __futex_shift) % nr_node_ids;
if (!node_possible(node)) {
node = find_next_bit_wrap(node_possible_map.bits, nr_node_ids, node);
}
}
- return &futex_queues[node][hash & futex_hashmask];
+ return &futex_queues()[node][runtime_const_mask_32(hash, __futex_mask)];
}
/**
@@ -1922,7 +1918,7 @@ int futex_hash_allocate_default(void)
* 16 <= threads * 4 <= global hash size
*/
buckets = roundup_pow_of_two(4 * threads);
- buckets = clamp(buckets, 16, futex_hashmask + 1);
+ buckets = clamp(buckets, 16, __futex_mask + 1);
if (current_buckets >= buckets)
return 0;
@@ -2020,10 +2016,19 @@ static int __init futex_init(void)
hashsize = max(4, hashsize);
hashsize = roundup_pow_of_two(hashsize);
#endif
- futex_hashshift = ilog2(hashsize);
+ __futex_mask = hashsize - 1;
+ __futex_shift = ilog2(hashsize);
size = sizeof(struct futex_hash_bucket) * hashsize;
order = get_order(size);
+ __futex_queues = kcalloc(nr_node_ids, sizeof(*__futex_queues), GFP_KERNEL);
+
+ runtime_const_init(shift, __futex_shift);
+ runtime_const_init(mask, __futex_mask);
+ runtime_const_init(ptr, __futex_queues);
+
+ BUG_ON(!futex_queues());
+
for_each_node(n) {
struct futex_hash_bucket *table;
@@ -2037,10 +2042,9 @@ static int __init futex_init(void)
for (i = 0; i < hashsize; i++)
futex_hash_bucket_init(&table[i]);
- futex_queues[n] = table;
+ futex_queues()[n] = table;
}
- futex_hashmask = hashsize - 1;
pr_info("futex hash table entries: %lu (%lu bytes on %d NUMA nodes, total %lu KiB, %s).\n",
hashsize, size, num_possible_nodes(), size * num_possible_nodes() / 1024,
order > MAX_PAGE_ORDER ? "vmalloc" : "linear");
--
2.34.1
* Re: [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32()
2026-06-30 4:55 ` [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-06-30 5:07 ` sashiko-bot
2026-06-30 8:16 ` K Prateek Nayak
0 siblings, 1 reply; 13+ messages in thread
From: sashiko-bot @ 2026-06-30 5:07 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Calling `__fls(val)` before `BUG_ON(!val)` invokes undefined behavior for `val == 0`, potentially allowing the compiler to optimize away the safety check.
- [Low] The commit message incorrectly references `amd64` instead of `arm64` and misspells `density`.
--
commit e9b79c78d336baf757f7578713831df9b1c4c1b9
Author: K Prateek Nayak <kprateek.nayak@amd.com>
arm64/runtime-const: Introduce runtime_const_mask_32()
This patch introduces runtime_const_mask_32 to optimize the mask operation
in the futex hash computation hot path using the UBFX instruction. It also
adds __runtime_fixup_mask to patch the 32-bit mask in the asm block once
the mask is known.
Link: https://developer.arm.com/documentation/ddi0602/2026-03/Base-Instructions/ [1]
Link: https://lore.kernel.org/lkml/20260430094730.31624-4-kprateek.nayak@amd.com/ [2]
> instruction is used for amd64 to improve instruction dinsity and
> performance.
[Severity: Low]
This isn't a bug, but there are a couple of typos here. Should "amd64" be
"arm64", and "dinsity" be "density"?
> diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
> index 838145bc289d2..371c9a4bc2d4b 100644
> --- a/arch/arm64/include/asm/runtime-const.h
> +++ b/arch/arm64/include/asm/runtime-const.h
[ ... ]
> +static inline void __runtime_fixup_mask(void *where, unsigned long val)
> +{
> + unsigned int width = __fls(val) + 1;
> + __le32 *p = where;
> + u32 insn;
> +
> + /*
[ ... ]
> + */
> + BUG_ON(!val || width > 32 || (GENMASK(width - 1, 0) != val));
[Severity: Medium]
Does evaluating __fls(val) before the BUG_ON(!val) check invoke undefined
behavior if val is 0?
Looking at __fls(), it uses __builtin_clzl(word) which is undefined when
the argument is 0:
include/asm-generic/bitops/builtin-__fls.h:__fls() {
return (sizeof(word) * 8) - 1 - __builtin_clzl(word);
}
Could the compiler use Value Range Propagation to assume val is non-zero
due to the __fls() call, and silently optimize away the !val safety guard?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630045531.3939-1-kprateek.nayak@amd.com?part=3
* Re: [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path
2026-06-30 4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
@ 2026-06-30 5:20 ` sashiko-bot
0 siblings, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-06-30 5:20 UTC (permalink / raw)
To: K Prateek Nayak
Cc: linux-s390, Heiko Carstens, Alexander Gordeev, Vasily Gorbik
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] Accessing the runtime constant accessor `futex_queues()` immediately after `runtime_const_init()` in `futex_init()` causes a deterministic kernel crash during boot due to compiler hoisting and pipeline hazards.
--
commit e05e661cc5e69b283700fa99f0da313e4673aa43
Author: Peter Zijlstra <peterz@infradead.org>
futex: Use runtime constants for __futex_hash() hot path
This commit converts the read-only after init data __futex_shift,
__futex_mask, and __futex_queues used in the __futex_hash() hot path to
runtime constants. It also dynamically allocates __futex_queues based on
nr_node_ids instead of a static maximum. Runtime constants are initialized
in futex_init() using runtime_const_init().
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
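For readers following along, the index computation that the three runtime constants feed into can be sketched in plain userspace C. This is only an illustration under stated assumptions: the struct and function names below (futex_hash_params, futex_params_init, futex_node, futex_bucket) are hypothetical stand-ins, ordinary variables take the place of the patched constants, and the node_possible() fallback from the real __futex_hash() is left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's runtime constants. */
struct futex_hash_params {
	uint32_t mask;        /* hashsize - 1, with hashsize a power of two */
	uint32_t shift;       /* ilog2(hashsize) */
	uint32_t nr_node_ids; /* number of possible NUMA node ids */
};

/* Integer log2 of a power-of-two value (userspace stand-in for ilog2()). */
static uint32_t ilog2_u32(uint32_t v)
{
	uint32_t r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Mirrors the two assignments futex_init() makes before patching. */
static struct futex_hash_params futex_params_init(uint32_t hashsize,
						  uint32_t nr_node_ids)
{
	struct futex_hash_params p = {
		.mask = hashsize - 1,
		.shift = ilog2_u32(hashsize),
		.nr_node_ids = nr_node_ids,
	};

	return p;
}

/* Node index: the hash bits above the bucket index, folded onto node ids. */
static uint32_t futex_node(const struct futex_hash_params *p, uint32_t hash)
{
	return (hash >> p->shift) % p->nr_node_ids;
}

/* Bucket index within the per-node table: the low bits of the hash. */
static uint32_t futex_bucket(const struct futex_hash_params *p, uint32_t hash)
{
	return hash & p->mask;
}
```

With a hash size of 256 and four node ids, a hash of 0x12345678 would select node (0x123456 % 4) and bucket 0x78; the shift and the mask consume disjoint parts of the hash, which is why both values are derived from the same hashsize.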
> diff --git a/kernel/futex/core.c b/kernel/futex/core.c
> --- a/kernel/futex/core.c
> +++ b/kernel/futex/core.c
[ ... ]
> @@ -2020,10 +2016,19 @@ static int __init futex_init(void)
> hashsize = max(4, hashsize);
> hashsize = roundup_pow_of_two(hashsize);
> #endif
> - futex_hashshift = ilog2(hashsize);
> + __futex_mask = hashsize - 1;
> + __futex_shift = ilog2(hashsize);
> size = sizeof(struct futex_hash_bucket) * hashsize;
> order = get_order(size);
>
> + __futex_queues = kcalloc(nr_node_ids, sizeof(*__futex_queues), GFP_KERNEL);
> +
> + runtime_const_init(shift, __futex_shift);
> + runtime_const_init(mask, __futex_mask);
> + runtime_const_init(ptr, __futex_queues);
> +
> + BUG_ON(!futex_queues());
> +
> for_each_node(n) {
> struct futex_hash_bucket *table;
>
> @@ -2037,10 +2042,9 @@ static int __init futex_init(void)
> for (i = 0; i < hashsize; i++)
> futex_hash_bucket_init(&table[i]);
>
> - futex_queues[n] = table;
> + futex_queues()[n] = table;
[Severity: Critical]
Does using the futex_queues() accessor here in futex_init() right after
runtime_const_init() cause a boot crash?
Since runtime_const_ptr() expands to an asm_inline block with no memory
dependencies or volatile qualifiers, the compiler might treat it as a pure
computation. This means the compiler is free to hoist the evaluation of
futex_queues() before the runtime_const_init() patching loop executes, or
it could evaluate it once and reuse the cached dummy register value.
If the compiler hoists this evaluation, won't this BUG_ON() and subsequent
assignment dereference the unpatched dummy pointer and cause a deterministic
kernel panic during early boot?
Would it be safer to directly use the raw global variable __futex_queues
instead of the futex_queues() accessor inside this initialization function?
> }
>
> - futex_hashmask = hashsize - 1;
> pr_info("futex hash table entries: %lu (%lu bytes on %d NUMA nodes, total %lu KiB, %s).\n",
> hashsize, size, num_possible_nodes(), size * num_possible_nodes() / 1024,
> order > MAX_PAGE_ORDER ? "vmalloc" : "linear");
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630045531.3939-1-kprateek.nayak@amd.com?part=8
* Re: [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC
2026-06-30 4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
@ 2026-06-30 6:47 ` Guo Ren
0 siblings, 0 replies; 13+ messages in thread
From: Guo Ren @ 2026-06-30 6:47 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Darren Hart, Davidlohr Bueso, André Almeida,
linux-arch, linux-kernel, Samuel Holland, Charlie Jenkins,
linux-arm-kernel, linux-riscv, linux-s390, Alexandre Ghiti,
Jisheng Zhang
On Tue, Jun 30, 2026 at 12:57 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Define the placeholder used for lui + addi[w] patching sequence as
> RUNTIME_MAGIC and use that instead of open coding the constants in the
> inline assembly.
>
> No functional changes intended.
>
> Suggested-by: Guo Ren <guoren@kernel.org>
> Reviewed-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
> Tested-by: Charlie Jenkins <thecharlesjenkins@gmail.com>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> changelog v4..v5:
>
> o Collected tags from Charlie (Thanks a ton!)
> ---
> arch/riscv/include/asm/runtime-const.h | 38 ++++++++++++++------------
> 1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
> index 900db0a103d05..1ce02605d2e43 100644
> --- a/arch/riscv/include/asm/runtime-const.h
> +++ b/arch/riscv/include/asm/runtime-const.h
> @@ -15,21 +15,23 @@
>
> #include <linux/uaccess.h>
>
> +#define RUNTIME_MAGIC __ASM_STR(0x89ABCDEF)
> +
> #ifdef CONFIG_32BIT
> -#define runtime_const_ptr(sym) \
> -({ \
> - typeof(sym) __ret; \
> - asm_inline(".option push\n\t" \
> - ".option norvc\n\t" \
> - "1:\t" \
> - "lui %[__ret],0x89abd\n\t" \
> - "addi %[__ret],%[__ret],-0x211\n\t" \
> - ".option pop\n\t" \
> - ".pushsection runtime_ptr_" #sym ",\"a\"\n\t" \
> - ".long 1b - .\n\t" \
> - ".popsection" \
> - : [__ret] "=r" (__ret)); \
> - __ret; \
> +#define runtime_const_ptr(sym) \
> +({ \
> + typeof(sym) __ret; \
> + asm_inline(".option push\n\t" \
> + ".option norvc\n\t" \
> + "1:\t" \
> + "lui %[__ret], %%hi(" RUNTIME_MAGIC ")\n\t" \
> + "addi %[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t" \
> + ".option pop\n\t" \
> + ".pushsection runtime_ptr_" #sym ",\"a\"\n\t" \
> + ".long 1b - .\n\t" \
> + ".popsection" \
> + : [__ret] "=r" (__ret)); \
> + __ret; \
> })
> #else
> /*
> @@ -46,10 +48,10 @@
> ".option push\n\t" \
> ".option norvc\n\t" \
> "1:\t" \
> - "lui %[__ret],0x89abd\n\t" \
> - "lui %[__tmp],0x1234\n\t" \
> - "addiw %[__ret],%[__ret],-0x211\n\t" \
> - "addiw %[__tmp],%[__tmp],0x567\n\t" \
> + "lui %[__ret], %%hi(" RUNTIME_MAGIC ")\n\t" \
> + "lui %[__tmp], %%hi(" RUNTIME_MAGIC ")\n\t" \
> + "addiw %[__ret],%[__ret], %%lo(" RUNTIME_MAGIC ")\n\t" \
> + "addiw %[__tmp],%[__tmp], %%lo(" RUNTIME_MAGIC ")\n\t" \
LGTM!
Reviewed-by: Guo Ren <guoren@kernel.org>
--
Best Regards
Guo Ren
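As a side note on why the open-coded "lui ...,0x89abd" / "addi ...,-0x211" pair and the %hi()/%lo() form of 0x89ABCDEF describe the same placeholder, the split can be reproduced with a little arithmetic. The helper names below (riscv_hi20, riscv_lo12, riscv_lui_addi) are hypothetical and exist only for this sketch; they model the usual RISC-V convention that the low 12 bits are sign-extended, so the upper part is rounded up when bit 11 is set.

```c
#include <stdint.h>

/* Upper 20 bits for lui, rounded so the sign-extended low part adds back. */
static uint32_t riscv_hi20(uint32_t val)
{
	return ((val + 0x800u) >> 12) & 0xfffffu;
}

/* Low 12 bits, sign-extended the way addi/addiw treats its immediate. */
static int32_t riscv_lo12(uint32_t val)
{
	int32_t lo = (int32_t)(val & 0xfffu);

	return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Recombine the two parts as lui + addi would in a 32-bit register. */
static uint32_t riscv_lui_addi(uint32_t hi20, int32_t lo12)
{
	return (hi20 << 12) + (uint32_t)lo12;
}
```

For 0x89ABCDEF the low 12 bits are 0xDEF, which has bit 11 set, so the low part becomes 0xDEF - 0x1000 = -0x211 and the high part rounds up from 0x89abc to 0x89abd, matching the constants that were previously written out by hand.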
* Re: [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32()
2026-06-30 5:07 ` sashiko-bot
@ 2026-06-30 8:16 ` K Prateek Nayak
0 siblings, 0 replies; 13+ messages in thread
From: K Prateek Nayak @ 2026-06-30 8:16 UTC (permalink / raw)
To: sashiko-reviews
Cc: Alexander Gordeev, Heiko Carstens, Vasily Gorbik, linux-s390
Hello,
On 6/30/2026 10:37 AM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [Medium] Calling `__fls(val)` before `BUG_ON(!val)` invokes undefined behavior for `val == 0`, potentially allowing the compiler to optimize away the safety check.
> - [Low] The commit message incorrectly references `amd64` instead of `arm64` and misspells `density`.
> --
>
> commit e9b79c78d336baf757f7578713831df9b1c4c1b9
> Author: K Prateek Nayak <kprateek.nayak@amd.com>
>
> arm64/runtime-const: Introduce runtime_const_mask_32()
>
> This patch introduces runtime_const_mask_32 to optimize the mask operation
> in the futex hash computation hot path using the UBFX instruction. It also
> adds __runtime_fixup_mask to patch the 32-bit mask in the asm block once
> the mask is known.
>
> Link: https://developer.arm.com/documentation/ddi0602/2026-03/Base-Instructions/ [1]
> Link: https://lore.kernel.org/lkml/20260430094730.31624-4-kprateek.nayak@amd.com/ [2]
>
>> instruction is used for amd64 to improve instruction dinsity and
>> performance.
>
> [Severity: Low]
> This isn't a bug, but there are a couple of typos here. Should "amd64" be
> "arm64", and "dinsity" be "density"?
Ah! Yes. Sorry about that. I'll wait for a while for more comments and
spin a v5.1 if necessary to fix those typos.
>
>> diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
>> index 838145bc289d2..371c9a4bc2d4b 100644
>> --- a/arch/arm64/include/asm/runtime-const.h
>> +++ b/arch/arm64/include/asm/runtime-const.h
> [ ... ]
>> +static inline void __runtime_fixup_mask(void *where, unsigned long val)
>> +{
>> + unsigned int width = __fls(val) + 1;
>> + __le32 *p = where;
>> + u32 insn;
>> +
>> + /*
> [ ... ]
>> + */
>> + BUG_ON(!val || width > 32 || (GENMASK(width - 1, 0) != val));
>
> [Severity: Medium]
> Does evaluating __fls(val) before the BUG_ON(!val) check invoke undefined
> behavior if val is 0?
>
> Looking at __fls(), it uses __builtin_clzl(word) which is undefined when
> the argument is 0:
>
> include/asm-generic/bitops/builtin-__fls.h:__fls() {
> return (sizeof(word) * 8) - 1 - __builtin_clzl(word);
> }
>
> Could the compiler use Value Range Propagation to assume val is non-zero
> due to the __fls() call, and silently optimize away the !val safety guard?
How does that work? "width" is calculated from __fls(val), so why would the
compiler make assumptions about the possible values of "val" based on that
call alone?
I suppose you are referring to an issue like
https://gcc.gnu.org/pipermail/gcc/2023-November/242808.html but that is
with -O3, and under normal circumstances the propagation is stopped if the
compiler believes there is a possibility of undefined behavior:
https://github.com/gcc-mirror/gcc/blob/7ebc766bdbc8584699acb3f8fd30b2c393923c06/gcc/gimple-range-op.cc#L937-L944
That said, I can fix it in the next version to always check for 0 before
__fls().
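A rough userspace sketch of that ordering, for illustration only and not the code planned for the next version: the zero test comes first, so the count-leading-zeros builtin is never reached with a zero argument. The function name low_mask_width is hypothetical, it returns false where the quoted code would BUG_ON(), and it assumes a GCC or Clang compiler for __builtin_clzl().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a mask of the form GENMASK(width - 1, 0) with width <= 32 and
 * report its width. Returns false where the quoted code would BUG_ON().
 */
static bool low_mask_width(unsigned long val, unsigned int *width)
{
	unsigned int w;

	/* Reject zero first: __builtin_clzl(0) has an undefined result. */
	if (!val)
		return false;

	/* Equivalent of __fls(val) + 1: position of the top set bit, plus one. */
	w = (unsigned int)(sizeof(val) * 8) - (unsigned int)__builtin_clzl(val);
	if (w > 32)
		return false;

	/* A contiguous low mask of w bits is exactly 2^w - 1. */
	if (val != (unsigned long)((1ULL << w) - 1))
		return false;

	*width = w;
	return true;
}
```

A mask such as 0xff passes with a width of 8, while 0 and a non-contiguous value such as 0xf0 are rejected before any width is reported.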
--
Thanks and Regards,
Prateek
Thread overview: 13+ messages
2026-06-30 4:55 [PATCH v5 0/8] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 1/8] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 2/8] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 3/8] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-06-30 5:07 ` sashiko-bot
2026-06-30 8:16 ` K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 4/8] riscv/runtime-const: Replace open-coded placeholder with RUNTIME_MAGIC K Prateek Nayak
2026-06-30 6:47 ` Guo Ren
2026-06-30 4:55 ` [PATCH v5 5/8] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 6/8] s390/runtime-const: " K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 7/8] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
2026-06-30 4:55 ` [PATCH v5 8/8] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
2026-06-30 5:20 ` sashiko-bot