* [RFC PATCH v2 1/7] x86/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 5:23 [RFC PATCH v2 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
@ 2026-03-16 5:23 ` K Prateek Nayak
2026-03-16 5:23 ` [RFC PATCH v2 2/7] arm64/runtime-const: " K Prateek Nayak
` (5 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 5:23 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
K Prateek Nayak, H. Peter Anvin, Kiryl Shutsemau,
Sean Christopherson, Thomas Huth
From: Peter Zijlstra <peterz@infradead.org>
Futex hash computation requires a mask operation with read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.
[ prateek: Broke off the x86 chunk, commit message. ]
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
arch/x86/include/asm/runtime-const.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/include/asm/runtime-const.h b/arch/x86/include/asm/runtime-const.h
index 4cd94fdcb45e..b13f7036c1c9 100644
--- a/arch/x86/include/asm/runtime-const.h
+++ b/arch/x86/include/asm/runtime-const.h
@@ -41,6 +41,15 @@
:"+r" (__ret)); \
__ret; })
+#define runtime_const_mask_32(val, sym) ({ \
+ typeof(0u+(val)) __ret = (val); \
+ asm_inline("and $0x12345678, %k0\n1:\n" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t"\
+ ".long 1b - 4 - .\n" \
+ ".popsection" \
+ : "+r" (__ret)); \
+ __ret; })
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -65,6 +74,11 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
*(unsigned char *)where = val;
}
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ *(unsigned int *)where = val;
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 18+ messages in thread

* [RFC PATCH v2 2/7] arm64/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 5:23 [RFC PATCH v2 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-03-16 5:23 ` [RFC PATCH v2 1/7] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-03-16 5:23 ` K Prateek Nayak
2026-03-16 11:50 ` David Laight
2026-03-16 5:23 ` [RFC PATCH v2 3/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 5:23 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
K Prateek Nayak, Jisheng Zhang
Futex hash computation requires a mask operation with read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. GCC generates a:
movz w1, #lo16, lsl #0 // w1 = bits [15:0]
movk w1, #hi16, lsl #16 // w1 = full 32-bit value
and w0, w0, w1 // w0 = w0 & w1
pattern to handle arbitrary 32-bit masks; the same sequence was also
suggested by Claude and is implemented here. __runtime_fixup_ptr() already
patches a "movz + movk lsl #16" sequence, which has been reused to patch
the same sequence for __runtime_fixup_mask().
Assisted-by: Claude:claude-sonnet-4-5
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
arch/arm64/include/asm/runtime-const.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index c3dbd3ae68f6..4c3f0b9aad98 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -35,6 +35,19 @@
:"r" (0u+(val))); \
__ret; })
+#define runtime_const_mask_32(val, sym) ({ \
+ unsigned long __ret; \
+ asm_inline("1:\t" \
+ "movz %w0, #0xcdef\n\t" \
+ "movk %w0, #0x89ab, lsl #16\n\t" \
+ "and %w0,%w0,%w1\n\t" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
+ ".long 1b - .\n\t" \
+ ".popsection" \
+ :"=r" (__ret) \
+ :"r" (0u+(val))); \
+ __ret; })
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -80,6 +93,15 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
__runtime_fixup_caches(where, 1);
}
+/* 32-bit immediate split across the movz/movk imm16 fields */
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ __le32 *p = lm_alias(where);
+ __runtime_fixup_16(p, val);
+ __runtime_fixup_16(p+1, val >> 16);
+ __runtime_fixup_caches(where, 2);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.43.0
* Re: [RFC PATCH v2 2/7] arm64/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 5:23 ` [RFC PATCH v2 2/7] arm64/runtime-const: " K Prateek Nayak
@ 2026-03-16 11:50 ` David Laight
2026-03-16 17:09 ` K Prateek Nayak
0 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2026-03-16 11:50 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon,
Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Jisheng Zhang
On Mon, 16 Mar 2026 05:23:56 +0000
K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> Futex hash computation requires a mask operation with read-only after
> init data that will be converted to a runtime constant in the subsequent
> commit.
>
> Introduce runtime_const_mask_32 to further optimize the mask operation
> in the futex hash computation hot path. GCC generates a:
>
> movz w1, #lo16, lsl #0 // w1 = bits [15:0]
> movk w1, #hi16, lsl #16 // w1 = full 32-bit value
> and w0, w0, w1 // w0 = w0 & w1
I don't think the '&' needs to be part of the asm block.
Just generate the 32-bit constant and do the mask in C.
That will let the compiler schedule the instructions.
It also makes the code patching more generally useful.
David
>
> pattern to tackle arbitrary 32-bit masks and the same was also suggested
> by Claude which is implemented here. __runtime_fixup_ptr() already
> patches a "movz, + movk lsl #16" sequence which has been reused to patch
> the same sequence for __runtime_fixup_mask().
>
> Assisted-by: Claude:claude-sonnet-4-5
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> arch/arm64/include/asm/runtime-const.h | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
> index c3dbd3ae68f6..4c3f0b9aad98 100644
> --- a/arch/arm64/include/asm/runtime-const.h
> +++ b/arch/arm64/include/asm/runtime-const.h
> @@ -35,6 +35,19 @@
> :"r" (0u+(val))); \
> __ret; })
>
> +#define runtime_const_mask_32(val, sym) ({ \
> + unsigned long __ret; \
> + asm_inline("1:\t" \
> + "movz %w0, #0xcdef\n\t" \
> + "movk %w0, #0x89ab, lsl #16\n\t" \
> + "and %w0,%w0,%w1\n\t" \
> + ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
> + ".long 1b - .\n\t" \
> + ".popsection" \
> + :"=r" (__ret) \
> + :"r" (0u+(val))); \
> + __ret; })
> +
> #define runtime_const_init(type, sym) do { \
> extern s32 __start_runtime_##type##_##sym[]; \
> extern s32 __stop_runtime_##type##_##sym[]; \
> @@ -80,6 +93,15 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
> __runtime_fixup_caches(where, 1);
> }
>
> +/* Immediate value is 6 bits starting at bit #16 */
> +static inline void __runtime_fixup_mask(void *where, unsigned long val)
> +{
> + __le32 *p = lm_alias(where);
> + __runtime_fixup_16(p, val);
> + __runtime_fixup_16(p+1, val >> 16);
> + __runtime_fixup_caches(where, 2);
> +}
> +
> static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
> unsigned long val, s32 *start, s32 *end)
> {
* Re: [RFC PATCH v2 2/7] arm64/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 11:50 ` David Laight
@ 2026-03-16 17:09 ` K Prateek Nayak
0 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 17:09 UTC (permalink / raw)
To: David Laight
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon,
Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Jisheng Zhang
Hello David,
On 3/16/2026 5:20 PM, David Laight wrote:
>> Introduce runtime_const_mask_32 to further optimize the mask operation
>> in the futex hash computation hot path. GCC generates a:
>>
>> movz w1, #lo16, lsl #0 // w1 = bits [15:0]
>> movk w1, #hi16, lsl #16 // w1 = full 32-bit value
>> and w0, w0, w1 // w0 = w0 & w1
>
> I don't thing the '&' needs to be part of the asm block.
> Just generate the 32bit constant and do the mask in C.
> That will let the compiler schedule the instructions.
> It also make the code patching more generally useful.
Ack! That makes sense. I'll update it in the next version.
Thank you for taking a look at the series.
--
Thanks and Regards,
Prateek
* [RFC PATCH v2 3/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
2026-03-16 5:23 [RFC PATCH v2 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-03-16 5:23 ` [RFC PATCH v2 1/7] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-03-16 5:23 ` [RFC PATCH v2 2/7] arm64/runtime-const: " K Prateek Nayak
@ 2026-03-16 5:23 ` K Prateek Nayak
2026-03-16 11:52 ` David Laight
2026-03-16 5:23 ` [RFC PATCH v2 4/7] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 5:23 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
K Prateek Nayak, Jisheng Zhang
The current scheme to directly patch the kernel text for runtime
constants runs into the following issue with futex adapted to using
runtime constants on arm64:
Unable to handle kernel write to read-only memory at virtual address fff0000000378fc8
Mem abort info:
ESR = 0x000000009600004e
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x0e: level 2 permission fault
Data abort info:
ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
CM = 0, WnR = 1, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000420a7000
[fff0000000378fc8] pgd=18000000bffff403, p4d=18000000bfffe403, pud=18000000bfffd403, pmd=0060000040200481
Internal error: Oops: 000000009600004e [#1] SMP
Modules linked in:
CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-rc6-00004-g7e6457d29e6a-dirty #291 PREEMPT
Hardware name: linux,dummy-virt (DT)
pstate: 81400009 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : futex_init+0x13c/0x348
lr : futex_init+0xc8/0x348
sp : ffff80008002bd40
x29: ffff80008002bd40 x28: ffffa4b73ba0a160 x27: ffffa4b73bd10d74
x26: ffffa4b73cb68b28 x25: ffffa4b73ba0b000 x24: ffffa4b73c66b000
x23: 0000000000003fe0 x22: 0000000000000000 x21: ffffa4b73bd10d74
x20: 0000000000008000 x19: 0000000000000000 x18: 00000000ffffffff
x17: 000000007014db06 x16: ffffa4b73ca3ec08 x15: ffff80010002b937
x14: 0000000000000006 x13: fff0000077200000 x12: 00000000000002b2
x11: 00000000000000e6 x10: fff0000079e00000 x9 : fff0000077200000
x8 : fff00000034df9e0 x7 : 0000000000000200 x6 : ffffa4b73ba0b000
x5 : fff0000003510000 x4 : 0000000052803fe0 x3 : 0000000072a00000
x2 : fff0000000378fc8 x1 : ffffa4b739d78fd0 x0 : ffffa4b739d78fc8
Call trace:
futex_init+0x13c/0x348 (P)
do_one_initcall+0x6c/0x1b0
kernel_init_freeable+0x204/0x2e0
kernel_init+0x20/0x1d8
ret_from_fork+0x10/0x20
Code: 120b3c84 120b3c63 2a170084 2a130063 (29000c44)
---[ end trace 0000000000000000 ]---
The pc at "futex_init+0x13c/0x348" points to:
futex_init()
runtime_const_init(shift, __futex_shift)
__runtime_fixup_shift()
*p = cpu_to_le32(insn); /* <--- Here --- */
... which points to core_initcall() being too late to patch the kernel
text directly, unlike "d_hash_shift" and "__names_cache", which are
initialized during start_kernel() before the protections are in place.
Use aarch64_insn_patch_text_nosync() to patch the runtime constants
instead of doing it directly to allow for running runtime_const_init()
slightly later into the boot.
Since aarch64_insn_patch_text_nosync() calls caches_clean_inval_pou()
internally, __runtime_fixup_caches() ends up being redundant.
Calls to runtime_const_init() are rare and the overhead of multiple calls
to caches_clean_inval_pou() instead of batching them together should be
negligible in practice.
At least one usage in kprobes.c suggests the cpu_to_le32() conversion is
not necessary for aarch64_insn_patch_text_nosync(), unlike in the current
scheme of patching *p directly.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
arch/arm64/include/asm/runtime-const.h | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index 4c3f0b9aad98..764e244f06a4 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -7,6 +7,7 @@
#endif
#include <asm/cacheflush.h>
+#include <asm/text-patching.h>
/* Sigh. You can still run arm64 in BE mode */
#include <asm/byteorder.h>
@@ -63,13 +64,7 @@ static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
u32 insn = le32_to_cpu(*p);
insn &= 0xffe0001f;
insn |= (val & 0xffff) << 5;
- *p = cpu_to_le32(insn);
-}
-
-static inline void __runtime_fixup_caches(void *where, unsigned int insns)
-{
- unsigned long va = (unsigned long)where;
- caches_clean_inval_pou(va, va + 4*insns);
+ aarch64_insn_patch_text_nosync(p, insn);
}
static inline void __runtime_fixup_ptr(void *where, unsigned long val)
@@ -79,7 +74,6 @@ static inline void __runtime_fixup_ptr(void *where, unsigned long val)
__runtime_fixup_16(p+1, val >> 16);
__runtime_fixup_16(p+2, val >> 32);
__runtime_fixup_16(p+3, val >> 48);
- __runtime_fixup_caches(where, 4);
}
/* Immediate value is 6 bits starting at bit #16 */
@@ -89,8 +83,7 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
u32 insn = le32_to_cpu(*p);
insn &= 0xffc0ffff;
insn |= (val & 63) << 16;
- *p = cpu_to_le32(insn);
- __runtime_fixup_caches(where, 1);
+ aarch64_insn_patch_text_nosync(p, insn);
}
/* 32-bit immediate split across the movz/movk imm16 fields */
@@ -99,7 +92,6 @@ static inline void __runtime_fixup_mask(void *where, unsigned long val)
__le32 *p = lm_alias(where);
__runtime_fixup_16(p, val);
__runtime_fixup_16(p+1, val >> 16);
- __runtime_fixup_caches(where, 2);
}
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
--
2.43.0
* Re: [RFC PATCH v2 3/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
2026-03-16 5:23 ` [RFC PATCH v2 3/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
@ 2026-03-16 11:52 ` David Laight
2026-03-16 17:13 ` K Prateek Nayak
0 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2026-03-16 11:52 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon,
Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Jisheng Zhang
On Mon, 16 Mar 2026 05:23:57 +0000
K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> The current scheme to directly patch the kernel text for runtime
> constants runs into the following issue with futex adapted to using
> runtime constants on arm64:
Doesn't this need to come before the previous patch?
David
>
> Unable to handle kernel write to read-only memory at virtual address fff0000000378fc8
> Mem abort info:
> ESR = 0x000000009600004e
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x0e: level 2 permission fault
> Data abort info:
> ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
> CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000420a7000
> [fff0000000378fc8] pgd=18000000bffff403, p4d=18000000bfffe403, pud=18000000bfffd403, pmd=0060000040200481
> Internal error: Oops: 000000009600004e [#1] SMP
> Modules linked in:
> CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-rc6-00004-g7e6457d29e6a-dirty #291 PREEMPT
> Hardware name: linux,dummy-virt (DT)
> pstate: 81400009 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> pc : futex_init+0x13c/0x348
> lr : futex_init+0xc8/0x348
> sp : ffff80008002bd40
> x29: ffff80008002bd40 x28: ffffa4b73ba0a160 x27: ffffa4b73bd10d74
> x26: ffffa4b73cb68b28 x25: ffffa4b73ba0b000 x24: ffffa4b73c66b000
> x23: 0000000000003fe0 x22: 0000000000000000 x21: ffffa4b73bd10d74
> x20: 0000000000008000 x19: 0000000000000000 x18: 00000000ffffffff
> x17: 000000007014db06 x16: ffffa4b73ca3ec08 x15: ffff80010002b937
> x14: 0000000000000006 x13: fff0000077200000 x12: 00000000000002b2
> x11: 00000000000000e6 x10: fff0000079e00000 x9 : fff0000077200000
> x8 : fff00000034df9e0 x7 : 0000000000000200 x6 : ffffa4b73ba0b000
> x5 : fff0000003510000 x4 : 0000000052803fe0 x3 : 0000000072a00000
> x2 : fff0000000378fc8 x1 : ffffa4b739d78fd0 x0 : ffffa4b739d78fc8
> Call trace:
> futex_init+0x13c/0x348 (P)
> do_one_initcall+0x6c/0x1b0
> kernel_init_freeable+0x204/0x2e0
> kernel_init+0x20/0x1d8
> ret_from_fork+0x10/0x20
> Code: 120b3c84 120b3c63 2a170084 2a130063 (29000c44)
> ---[ end trace 0000000000000000 ]---
>
> The pc at "futex_init+0x13c/0x348" points to:
>
> futex_init()
> runtime_const_init(shift, __futex_shift)
> __runtime_fixup_shift()
> *p = cpu_to_le32(insn); /* <--- Here --- */
>
> ... which points to core_initcall() being too late to patch the kernel
> text directly unlike the "d_hash_shift", "__names_cache" which are
> initialized during start_kernel() before the protections are in place.
>
> Use aarch64_insn_patch_text_nosync() to patch the runtime constants
> instead of doing it directly to allow for running runtime_const_init()
> slightly later into the boot.
>
> Since aarch64_insn_patch_text_nosync() calls caches_clean_inval_pou()
> internally, __runtime_fixup_caches() ends up being redundant.
> runtime_const_init() are rare and the overheads of multiple calls to
> caches_clean_inval_pou() instead of batching them together should be
> negligible in practice.
>
> At least one usage in kprobes.c suggests cpu_to_le32() conversion is not
> necessary for aarch64_insn_patch_text_nosync() unlike in the current
> scheme of patching *p directly.
>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> arch/arm64/include/asm/runtime-const.h | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
> index 4c3f0b9aad98..764e244f06a4 100644
> --- a/arch/arm64/include/asm/runtime-const.h
> +++ b/arch/arm64/include/asm/runtime-const.h
> @@ -7,6 +7,7 @@
> #endif
>
> #include <asm/cacheflush.h>
> +#include <asm/text-patching.h>
>
> /* Sigh. You can still run arm64 in BE mode */
> #include <asm/byteorder.h>
> @@ -63,13 +64,7 @@ static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
> u32 insn = le32_to_cpu(*p);
> insn &= 0xffe0001f;
> insn |= (val & 0xffff) << 5;
> - *p = cpu_to_le32(insn);
> -}
> -
> -static inline void __runtime_fixup_caches(void *where, unsigned int insns)
> -{
> - unsigned long va = (unsigned long)where;
> - caches_clean_inval_pou(va, va + 4*insns);
> + aarch64_insn_patch_text_nosync(p, insn);
> }
>
> static inline void __runtime_fixup_ptr(void *where, unsigned long val)
> @@ -79,7 +74,6 @@ static inline void __runtime_fixup_ptr(void *where, unsigned long val)
> __runtime_fixup_16(p+1, val >> 16);
> __runtime_fixup_16(p+2, val >> 32);
> __runtime_fixup_16(p+3, val >> 48);
> - __runtime_fixup_caches(where, 4);
> }
>
> /* Immediate value is 6 bits starting at bit #16 */
> @@ -89,8 +83,7 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
> u32 insn = le32_to_cpu(*p);
> insn &= 0xffc0ffff;
> insn |= (val & 63) << 16;
> - *p = cpu_to_le32(insn);
> - __runtime_fixup_caches(where, 1);
> + aarch64_insn_patch_text_nosync(p, insn);
> }
>
> /* Immediate value is 6 bits starting at bit #16 */
> @@ -99,7 +92,6 @@ static inline void __runtime_fixup_mask(void *where, unsigned long val)
> __le32 *p = lm_alias(where);
> __runtime_fixup_16(p, val);
> __runtime_fixup_16(p+1, val >> 16);
> - __runtime_fixup_caches(where, 2);
> }
>
> static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
* Re: [RFC PATCH v2 3/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
2026-03-16 11:52 ` David Laight
@ 2026-03-16 17:13 ` K Prateek Nayak
0 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 17:13 UTC (permalink / raw)
To: David Laight
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon,
Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Jisheng Zhang
Hello David,
On 3/16/2026 5:22 PM, David Laight wrote:
>> The current scheme to directly patch the kernel text for runtime
>> constants runs into the following issue with futex adapted to using
>> runtime constants on arm64:
>
> Doesn't this need to come before the previous patch?
My rationale was that this didn't make a difference until
the final futex changes so I didn't pay much attention to
how they were ordered.
I will rearrange these the other way around in the next
version to keep this change independent of the introduction of
runtime_const_mask_32().
Thank you again for taking a look at the series.
--
Thanks and Regards,
Prateek
* [RFC PATCH v2 4/7] riscv/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 5:23 [RFC PATCH v2 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (2 preceding siblings ...)
2026-03-16 5:23 ` [RFC PATCH v2 3/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
@ 2026-03-16 5:23 ` K Prateek Nayak
2026-03-16 5:23 ` [RFC PATCH v2 5/7] s390/runtime-const: " K Prateek Nayak
` (2 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 5:23 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Alexandre Ghiti
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
K Prateek Nayak, Charlie Jenkins, Charles Mirabile
Futex hash computation requires a mask operation with read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. GCC generates a:
lui a0, 0x12346 # upper; +0x800 then >>12 for correct rounding
addi a0, a0, 0x678 # lower 12 bits
and a1, a1, a0 # a1 = a1 & a0
pattern to handle arbitrary 32-bit masks; the same sequence was also
suggested by Claude and is implemented here. __runtime_fixup_ptr() already
patches a "lui + addi" sequence, which has been reused to patch the same
sequence for __runtime_fixup_mask().
Assisted-by: Claude:claude-sonnet-4-5
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
arch/riscv/include/asm/runtime-const.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
index d766e2b9e6df..f54289a7ddeb 100644
--- a/arch/riscv/include/asm/runtime-const.h
+++ b/arch/riscv/include/asm/runtime-const.h
@@ -153,6 +153,24 @@
__ret; \
})
+#define runtime_const_mask_32(val, sym) \
+({ \
+ u32 __ret; \
+ asm_inline(".option push\n\t" \
+ ".option norvc\n\t" \
+ "1:\t" \
+ "lui %[__ret],0x89abd\n\t" \
+ "addi %[__ret],%[__ret],-0x211\n\t" \
+ "and %[__ret],%[__ret],%[__val]\n\t" \
+ ".option pop\n\t" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
+ ".long 1b - .\n\t" \
+ ".popsection" \
+ : [__ret] "=&r" (__ret) \
+ : [__val] "r" (val)); \
+ __ret; \
+})
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -256,6 +274,12 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
mutex_unlock(&text_mutex);
}
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ __runtime_fixup_32(where, where + 4, val);
+ __runtime_fixup_caches(where, 2);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.43.0
* [RFC PATCH v2 5/7] s390/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 5:23 [RFC PATCH v2 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (3 preceding siblings ...)
2026-03-16 5:23 ` [RFC PATCH v2 4/7] riscv/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-03-16 5:23 ` K Prateek Nayak
2026-03-16 19:19 ` Heiko Carstens
2026-03-16 5:24 ` [RFC PATCH v2 6/7] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
2026-03-16 5:24 ` [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
6 siblings, 1 reply; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 5:23 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
K Prateek Nayak, Christian Borntraeger, Sven Schnelle
Futex hash computation requires a mask operation with read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.
GCC generates a:
nilf %r1,<imm32>
to handle arbitrary 32-bit masks and the same is implemented here. The
immediate patching pattern for __runtime_fixup_mask() has been adopted
from __runtime_fixup_ptr().
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
arch/s390/include/asm/runtime-const.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/s390/include/asm/runtime-const.h b/arch/s390/include/asm/runtime-const.h
index 17878b1d048c..c0f0d59066e2 100644
--- a/arch/s390/include/asm/runtime-const.h
+++ b/arch/s390/include/asm/runtime-const.h
@@ -33,6 +33,19 @@
__ret; \
})
+#define runtime_const_mask_32(val, sym) \
+({ \
+ unsigned int __ret = (val); \
+ \
+ asm_inline( \
+ "0: nilf %[__ret],12\n" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n" \
+ ".long 0b - .\n" \
+ ".popsection" \
+ : [__ret] "+d" (__ret)); \
+ __ret; \
+})
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -65,6 +78,12 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
s390_kernel_write(where, &insn, sizeof(insn));
}
+/* 32-bit immediate for nilf in bits in I2 field */
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ __runtime_fixup_32(where + 2, val);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.43.0
* Re: [RFC PATCH v2 5/7] s390/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 5:23 ` [RFC PATCH v2 5/7] s390/runtime-const: " K Prateek Nayak
@ 2026-03-16 19:19 ` Heiko Carstens
2026-03-17 1:55 ` K Prateek Nayak
0 siblings, 1 reply; 18+ messages in thread
From: Heiko Carstens @ 2026-03-16 19:19 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Vasily Gorbik, Alexander Gordeev,
Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Christian Borntraeger, Sven Schnelle
On Mon, Mar 16, 2026 at 05:23:59AM +0000, K Prateek Nayak wrote:
> Futex hash computation requires a mask operation with read-only after
> init data that will be converted to a runtime constant in the subsequent
> commit.
>
> Introduce runtime_const_mask_32 to further optimize the mask operation
> in the futex hash computation hot path.
>
> GCC generates a:
>
> nilf %r1,<imm32>
>
> to tackle arbitrary 32-bit masks and the same is implemented here.
> Immediate patching pattern for __runtime_fixup_mask() has been adopted
> from __runtime_fixup_ptr().
>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> arch/s390/include/asm/runtime-const.h | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
...
> +#define runtime_const_mask_32(val, sym) \
> +({ \
> + unsigned int __ret = (val); \
> + \
> + asm_inline( \
> + "0: nilf %[__ret],12\n" \
> + ".pushsection runtime_mask_" #sym ",\"a\"\n" \
> + ".long 0b - .\n" \
> + ".popsection" \
> + : [__ret] "+d" (__ret)); \
> + __ret; \
> +})
The nilf instruction changes the condition code and this must be reflected in
the clobber list. Besides that I would also appreciate if you would move the
existing comment above __runtime_fixup_32().
Or in other words, if you merge the patch below into this one feel free to
add:
Acked-by: Heiko Carstens <hca@linux.ibm.com>
diff --git a/arch/s390/include/asm/runtime-const.h b/arch/s390/include/asm/runtime-const.h
index c0f0d59066e2..7b71156031ec 100644
--- a/arch/s390/include/asm/runtime-const.h
+++ b/arch/s390/include/asm/runtime-const.h
@@ -42,7 +42,8 @@
".pushsection runtime_mask_" #sym ",\"a\"\n" \
".long 0b - .\n" \
".popsection" \
- : [__ret] "+d" (__ret)); \
+ : [__ret] "+d" (__ret) \
+ : : "cc"); \
__ret; \
})
@@ -56,12 +57,12 @@
__stop_runtime_##type##_##sym); \
} while (0)
-/* 32-bit immediate for iihf and iilf in bits in I2 field */
static inline void __runtime_fixup_32(u32 *p, unsigned int val)
{
s390_kernel_write(p, &val, sizeof(val));
}
+/* 32-bit immediate for iihf and iilf in bits in I2 field */
static inline void __runtime_fixup_ptr(void *where, unsigned long val)
{
__runtime_fixup_32(where + 2, val >> 32);
* Re: [RFC PATCH v2 5/7] s390/runtime-const: Introduce runtime_const_mask_32()
2026-03-16 19:19 ` Heiko Carstens
@ 2026-03-17 1:55 ` K Prateek Nayak
0 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-17 1:55 UTC (permalink / raw)
To: Heiko Carstens
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Vasily Gorbik, Alexander Gordeev,
Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Christian Borntraeger, Sven Schnelle
Hello Heiko,
On 3/17/2026 12:49 AM, Heiko Carstens wrote:
>> +#define runtime_const_mask_32(val, sym) \
>> +({ \
>> + unsigned int __ret = (val); \
>> + \
>> + asm_inline( \
>> + "0: nilf %[__ret],12\n" \
>> + ".pushsection runtime_mask_" #sym ",\"a\"\n" \
>> + ".long 0b - .\n" \
>> + ".popsection" \
>> + : [__ret] "+d" (__ret)); \
>> + __ret; \
>> +})
>
> The nilf instruction changes the condition code and this must be reflected in
> the clobber list.
Thanks a ton for catching that!
> Besides that I would also appreciate if you would move the
> existing comment above __runtime_fixup_32().
>
> Or in other words, if you merge the patch below into this one feel free to
> add:
I'll fold in the suggested diff when spinning up the next version.
>
> Acked-by: Heiko Carstens <hca@linux.ibm.com>
Thanks a ton for taking a look at the series!
--
Thanks and Regards,
Prateek
* [RFC PATCH v2 6/7] asm-generic/runtime-const: Add dummy runtime_const_mask_32()
2026-03-16 5:23 [RFC PATCH v2 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (4 preceding siblings ...)
2026-03-16 5:23 ` [RFC PATCH v2 5/7] s390/runtime-const: " K Prateek Nayak
@ 2026-03-16 5:24 ` K Prateek Nayak
2026-03-16 5:24 ` [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
6 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 5:24 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Arnd Bergmann
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
K Prateek Nayak
From: Peter Zijlstra <peterz@infradead.org>
Add a dummy runtime_const_mask_32() for all the architectures that do
not support runtime-const.
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
include/asm-generic/runtime-const.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/asm-generic/runtime-const.h b/include/asm-generic/runtime-const.h
index 670499459514..03e6e3e02401 100644
--- a/include/asm-generic/runtime-const.h
+++ b/include/asm-generic/runtime-const.h
@@ -10,6 +10,7 @@
*/
#define runtime_const_ptr(sym) (sym)
#define runtime_const_shift_right_32(val, sym) ((u32)(val)>>(sym))
+#define runtime_const_mask_32(val, sym) ((u32)(val)&(sym))
#define runtime_const_init(type,sym) do { } while (0)
#endif
--
2.43.0
* [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path
2026-03-16 5:23 [RFC PATCH v2 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (5 preceding siblings ...)
2026-03-16 5:24 ` [RFC PATCH v2 6/7] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
@ 2026-03-16 5:24 ` K Prateek Nayak
2026-03-16 8:14 ` Sebastian Andrzej Siewior
2026-03-17 3:06 ` Samuel Holland
6 siblings, 2 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 5:24 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Borislav Petkov, Dave Hansen, x86, Catalin Marinas,
Will Deacon, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Arnd Bergmann
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
K Prateek Nayak, Alexandre Ghiti, H. Peter Anvin, Kiryl Shutsemau,
Sean Christopherson, Charlie Jenkins, Charles Mirabile,
Christian Borntraeger, Sven Schnelle, Thomas Huth, Jisheng Zhang
From: Peter Zijlstra <peterz@infradead.org>
Runtime constify the read-only after init data __futex_shift (shift_32),
__futex_mask (mask_32), and __futex_queues (ptr) used in the __futex_hash()
hot path to avoid referencing global variables.
This also allows __futex_queues to be allocated dynamically to
"nr_node_ids" slots instead of reserving config dependent MAX_NUMNODES
(1 << CONFIG_NODES_SHIFT) worth of slots upfront.
No functional changes intended.
[ prateek: Dynamically allocate __futex_queues, mark the global data
__ro_after_init since they are constified after futex_init(). ]
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> # MAX_NUMNODES bloat
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
include/asm-generic/vmlinux.lds.h | 5 +++-
kernel/futex/core.c | 42 +++++++++++++++++--------------
2 files changed, 27 insertions(+), 20 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 1e1580febe4b..86f99fa6ae24 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -975,7 +975,10 @@
RUNTIME_CONST(shift, d_hash_shift) \
RUNTIME_CONST(ptr, dentry_hashtable) \
RUNTIME_CONST(ptr, __dentry_cache) \
- RUNTIME_CONST(ptr, __names_cache)
+ RUNTIME_CONST(ptr, __names_cache) \
+ RUNTIME_CONST(shift, __futex_shift) \
+ RUNTIME_CONST(mask, __futex_mask) \
+ RUNTIME_CONST(ptr, __futex_queues)
/* Alignment must be consistent with (kunit_suite *) in include/kunit/test.h */
#define KUNIT_TABLE() \
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index cf7e610eac42..6b5c5a1596a5 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -45,23 +45,19 @@
#include <linux/mempolicy.h>
#include <linux/mmap_lock.h>
+#include <asm/runtime-const.h>
+
#include "futex.h"
#include "../locking/rtmutex_common.h"
-/*
- * The base of the bucket array and its size are always used together
- * (after initialization only in futex_hash()), so ensure that they
- * reside in the same cacheline.
- */
-static struct {
- unsigned long hashmask;
- unsigned int hashshift;
- struct futex_hash_bucket *queues[MAX_NUMNODES];
-} __futex_data __read_mostly __aligned(2*sizeof(long));
+static u32 __futex_mask __ro_after_init;
+static u32 __futex_shift __ro_after_init;
+static struct futex_hash_bucket **__futex_queues __ro_after_init;
-#define futex_hashmask (__futex_data.hashmask)
-#define futex_hashshift (__futex_data.hashshift)
-#define futex_queues (__futex_data.queues)
+static __always_inline struct futex_hash_bucket **futex_queues(void)
+{
+ return runtime_const_ptr(__futex_queues);
+}
struct futex_private_hash {
int state;
@@ -439,14 +435,14 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph)
* NOTE: this isn't perfectly uniform, but it is fast and
* handles sparse node masks.
*/
- node = (hash >> futex_hashshift) % nr_node_ids;
+ node = runtime_const_shift_right_32(hash, __futex_shift) % nr_node_ids;
if (!node_possible(node)) {
node = find_next_bit_wrap(node_possible_map.bits,
nr_node_ids, node);
}
}
- return &futex_queues[node][hash & futex_hashmask];
+ return &futex_queues()[node][runtime_const_mask_32(hash, __futex_mask)];
}
/**
@@ -1913,7 +1909,7 @@ int futex_hash_allocate_default(void)
* 16 <= threads * 4 <= global hash size
*/
buckets = roundup_pow_of_two(4 * threads);
- buckets = clamp(buckets, 16, futex_hashmask + 1);
+ buckets = clamp(buckets, 16, __futex_mask + 1);
if (current_buckets >= buckets)
return 0;
@@ -1983,10 +1979,19 @@ static int __init futex_init(void)
hashsize = max(4, hashsize);
hashsize = roundup_pow_of_two(hashsize);
#endif
- futex_hashshift = ilog2(hashsize);
+ __futex_mask = hashsize - 1;
+ __futex_shift = ilog2(hashsize);
size = sizeof(struct futex_hash_bucket) * hashsize;
order = get_order(size);
+ __futex_queues = kcalloc(nr_node_ids, sizeof(*__futex_queues), GFP_KERNEL);
+
+ runtime_const_init(shift, __futex_shift);
+ runtime_const_init(mask, __futex_mask);
+ runtime_const_init(ptr, __futex_queues);
+
+ BUG_ON(!futex_queues());
+
for_each_node(n) {
struct futex_hash_bucket *table;
@@ -2000,10 +2005,9 @@ static int __init futex_init(void)
for (i = 0; i < hashsize; i++)
futex_hash_bucket_init(&table[i], NULL);
- futex_queues[n] = table;
+ futex_queues()[n] = table;
}
- futex_hashmask = hashsize - 1;
pr_info("futex hash table entries: %lu (%lu bytes on %d NUMA nodes, total %lu KiB, %s).\n",
hashsize, size, num_possible_nodes(), size * num_possible_nodes() / 1024,
order > MAX_PAGE_ORDER ? "vmalloc" : "linear");
--
2.43.0
* Re: [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path
2026-03-16 5:24 ` [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
@ 2026-03-16 8:14 ` Sebastian Andrzej Siewior
2026-03-16 17:15 ` K Prateek Nayak
2026-03-17 3:06 ` Samuel Holland
1 sibling, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-16 8:14 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Borislav Petkov, Dave Hansen, x86,
Catalin Marinas, Will Deacon, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Arnd Bergmann, Darren Hart, Davidlohr Bueso,
André Almeida, linux-arch, linux-kernel, linux-arm-kernel,
linux-riscv, linux-s390, Alexandre Ghiti, H. Peter Anvin,
Kiryl Shutsemau, Sean Christopherson, Charlie Jenkins,
Charles Mirabile, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Jisheng Zhang
On 2026-03-16 05:24:01 [+0000], K Prateek Nayak wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> Runtime constify the read-only after init data __futex_shift (shift_32),
> __futex_mask (mask_32), and __futex_queues (ptr) used in the __futex_hash()
> hot path to avoid referencing global variables.
>
> This also allows __futex_queues to be allocated dynamically to
> "nr_node_ids" slots instead of reserving config dependent MAX_NUMNODES
> (1 << CONFIG_NODES_SHIFT) worth of slots upfront.
>
> No functional changes intended.
>
> [ prateek: Dynamically allocate __futex_queues, mark the global data
> __ro_after_init since they are constified after futex_init(). ]
>
> Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> # MAX_NUMNODES bloat
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
This all looks nice. Let me look later at the resulting code. Thank you
so far ;)
Sebastian
* Re: [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path
2026-03-16 8:14 ` Sebastian Andrzej Siewior
@ 2026-03-16 17:15 ` K Prateek Nayak
0 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-16 17:15 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Borislav Petkov, Dave Hansen, x86,
Catalin Marinas, Will Deacon, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Arnd Bergmann, Darren Hart, Davidlohr Bueso,
André Almeida, linux-arch, linux-kernel, linux-arm-kernel,
linux-riscv, linux-s390, Alexandre Ghiti, H. Peter Anvin,
Kiryl Shutsemau, Sean Christopherson, Charlie Jenkins,
Charles Mirabile, Christian Borntraeger, Sven Schnelle,
Thomas Huth, Jisheng Zhang
Hello Sebastian,
On 3/16/2026 1:44 PM, Sebastian Andrzej Siewior wrote:
> This all looks nice. Let me look later at the resulting code. Thank you
> so far ;)
Let me know if you find anything nasty and we can see how to best
address those bits in the next version :-)
Thank you for taking a look at the series.
--
Thanks and Regards,
Prateek
* Re: [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path
2026-03-16 5:24 ` [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
2026-03-16 8:14 ` Sebastian Andrzej Siewior
@ 2026-03-17 3:06 ` Samuel Holland
2026-03-17 5:11 ` K Prateek Nayak
1 sibling, 1 reply; 18+ messages in thread
From: Samuel Holland @ 2026-03-17 3:06 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Alexandre Ghiti, H. Peter Anvin, Kiryl Shutsemau,
Sean Christopherson, Charlie Jenkins, Charles Mirabile,
Christian Borntraeger, Sven Schnelle, Thomas Huth, Jisheng Zhang,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Borislav Petkov, Dave Hansen, x86, Catalin Marinas,
Will Deacon, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Arnd Bergmann
Hi Prateek,
On 2026-03-16 12:24 AM, K Prateek Nayak wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> Runtime constify the read-only after init data __futex_shift (shift_32),
> __futex_mask (mask_32), and __futex_queues (ptr) used in the __futex_hash()
> hot path to avoid referencing global variables.
>
> This also allows __futex_queues to be allocated dynamically to
> "nr_node_ids" slots instead of reserving config dependent MAX_NUMNODES
> (1 << CONFIG_NODES_SHIFT) worth of slots upfront.
>
> No functional changes intended.
>
> [ prateek: Dynamically allocate __futex_queues, mark the global data
> __ro_after_init since they are constified after futex_init(). ]
>
> Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> # MAX_NUMNODES bloat
> Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> include/asm-generic/vmlinux.lds.h | 5 +++-
> kernel/futex/core.c | 42 +++++++++++++++++--------------
> 2 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index 1e1580febe4b..86f99fa6ae24 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -975,7 +975,10 @@
> RUNTIME_CONST(shift, d_hash_shift) \
> RUNTIME_CONST(ptr, dentry_hashtable) \
> RUNTIME_CONST(ptr, __dentry_cache) \
> - RUNTIME_CONST(ptr, __names_cache)
> + RUNTIME_CONST(ptr, __names_cache) \
> + RUNTIME_CONST(shift, __futex_shift) \
> + RUNTIME_CONST(mask, __futex_mask) \
> + RUNTIME_CONST(ptr, __futex_queues)
>
> /* Alignment must be consistent with (kunit_suite *) in include/kunit/test.h */
> #define KUNIT_TABLE() \
> diff --git a/kernel/futex/core.c b/kernel/futex/core.c
> index cf7e610eac42..6b5c5a1596a5 100644
> --- a/kernel/futex/core.c
> +++ b/kernel/futex/core.c
> @@ -45,23 +45,19 @@
> #include <linux/mempolicy.h>
> #include <linux/mmap_lock.h>
>
> +#include <asm/runtime-const.h>
> +
> #include "futex.h"
> #include "../locking/rtmutex_common.h"
>
> -/*
> - * The base of the bucket array and its size are always used together
> - * (after initialization only in futex_hash()), so ensure that they
> - * reside in the same cacheline.
> - */
> -static struct {
> - unsigned long hashmask;
> - unsigned int hashshift;
> - struct futex_hash_bucket *queues[MAX_NUMNODES];
> -} __futex_data __read_mostly __aligned(2*sizeof(long));
> +static u32 __futex_mask __ro_after_init;
> +static u32 __futex_shift __ro_after_init;
> +static struct futex_hash_bucket **__futex_queues __ro_after_init;
>
> -#define futex_hashmask (__futex_data.hashmask)
> -#define futex_hashshift (__futex_data.hashshift)
> -#define futex_queues (__futex_data.queues)
> +static __always_inline struct futex_hash_bucket **futex_queues(void)
> +{
> + return runtime_const_ptr(__futex_queues);
> +}
>
> struct futex_private_hash {
> int state;
> @@ -439,14 +435,14 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph)
> * NOTE: this isn't perfectly uniform, but it is fast and
> * handles sparse node masks.
> */
> - node = (hash >> futex_hashshift) % nr_node_ids;
> + node = runtime_const_shift_right_32(hash, __futex_shift) % nr_node_ids;
> if (!node_possible(node)) {
> node = find_next_bit_wrap(node_possible_map.bits,
> nr_node_ids, node);
> }
> }
>
> - return &futex_queues[node][hash & futex_hashmask];
> + return &futex_queues()[node][runtime_const_mask_32(hash, __futex_mask)];
> }
>
> /**
> @@ -1913,7 +1909,7 @@ int futex_hash_allocate_default(void)
> * 16 <= threads * 4 <= global hash size
> */
> buckets = roundup_pow_of_two(4 * threads);
> - buckets = clamp(buckets, 16, futex_hashmask + 1);
> + buckets = clamp(buckets, 16, __futex_mask + 1);
>
> if (current_buckets >= buckets)
> return 0;
> @@ -1983,10 +1979,19 @@ static int __init futex_init(void)
> hashsize = max(4, hashsize);
> hashsize = roundup_pow_of_two(hashsize);
> #endif
> - futex_hashshift = ilog2(hashsize);
> + __futex_mask = hashsize - 1;
> + __futex_shift = ilog2(hashsize);
__futex_mask is always a power of two minus 1, in other words all low bits set.
Would it be worth using an n-bit zero extension operation instead of an
arbitrary 32-bit mask? This would use fewer instructions on some architectures:
for example a single ubfx on arm64 and slli+srli on riscv.
Regards,
Samuel
* Re: [RFC PATCH v2 7/7] futex: Use runtime constants for __futex_hash() hot path
2026-03-17 3:06 ` Samuel Holland
@ 2026-03-17 5:11 ` K Prateek Nayak
0 siblings, 0 replies; 18+ messages in thread
From: K Prateek Nayak @ 2026-03-17 5:11 UTC (permalink / raw)
To: Samuel Holland
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-arm-kernel, linux-riscv, linux-s390,
Alexandre Ghiti, H. Peter Anvin, Kiryl Shutsemau,
Sean Christopherson, Charlie Jenkins, Charles Mirabile,
Christian Borntraeger, Sven Schnelle, Thomas Huth, Jisheng Zhang,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Borislav Petkov, Dave Hansen, x86, Catalin Marinas,
Will Deacon, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Arnd Bergmann
Hello Samuel,
On 3/17/2026 8:36 AM, Samuel Holland wrote:
>> @@ -1913,7 +1909,7 @@ int futex_hash_allocate_default(void)
>> * 16 <= threads * 4 <= global hash size
>> */
>> buckets = roundup_pow_of_two(4 * threads);
>> - buckets = clamp(buckets, 16, futex_hashmask + 1);
>> + buckets = clamp(buckets, 16, __futex_mask + 1);
>>
>> if (current_buckets >= buckets)
>> return 0;
>> @@ -1983,10 +1979,19 @@ static int __init futex_init(void)
>> hashsize = max(4, hashsize);
>> hashsize = roundup_pow_of_two(hashsize);
>> #endif
>> - futex_hashshift = ilog2(hashsize);
>> + __futex_mask = hashsize - 1;
>> + __futex_shift = ilog2(hashsize);
>
> __futex_mask is always a power of two minus 1, in other words all low bits set.
> Would it be worth using an n-bit zero extension operation instead of an
> arbitrary 32-bit mask? This would use fewer instructions on some architectures:
> for example a single ubfx on arm64 and slli+srli on riscv.
Sure that works for __futex_mask but runtime_const_mask_32() should be
generic enough to handle any mask, no?
Currently, the __futex_hash() with futex_hashmask ends up being:
# ./include/linux/jhash.h:139: __jhash_final(a, b, c);
xor a4,a4,a3 # tmp350, tmp353, tmp334
...
# kernel/futex/core.c:449: return &futex_queues[node][hash & futex_hashmask];
lla a3,.LANCHOR0 # tmp361,
# kernel/futex/core.c:449: return &futex_queues[node][hash & futex_hashmask];
ld a5,0(a3) # __futex_data.hashmask, __futex_data.hashmask
...
# kernel/futex/core.c:449: return &futex_queues[node][hash & futex_hashmask];
and a5,a5,a4 # tmp358, tmp367, __futex_data.hashmask
which isn't too far from what runtime_const_mask_32() implements,
where the "lla + ld" sequence gets replaced by a "lui + addi"
sequence to load the immediate.
Sure it can be better here since we know the bitmask is of the form
GENMASK(N,0) but IMO runtime_const_mask_32() should generally work
for all masks.
Now, runtime_const_mask_lower_32(val, nbits) may be a better-suited
API name for that purpose.
If there is enough interest, I'll go back to the drawing board and
go that route for v2 for arm64 and riscv.
--
Thanks and Regards,
Prateek