* [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation
@ 2026-04-02 11:22 K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 1/7] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86,
Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Arnd Bergmann, David Laight,
Samuel Holland
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak
tl;dr
This series introduces runtime_const_mask_32() and uses runtime
constants for the __ro_after_init data in the futex_hash() hot path.
More information can be found in v2 [1].
Comments that have *not* been addressed in this version
=======================================================
Samuel observed on v2 that __futex_mask is always of the form

    ((1 << bits) - 1) /* Only lower bits set; bits > 1. */

and that ARM64 and RISC-V could then use a single ubfx (ARM64), or an
slli+srli pair (RISC-V), for the mask operation. The main limitation is
that runtime_const_mask_32() would then only work with masks of this
form, and any other mask would fail runtime_const_init() at boot.
RISC-V does generate an "addi + slli" pattern with CONFIG_BASE_SMALL=y,
where the futex_hash_mask can be computed at compile time.
The old scheme is retained for now since it is equivalent to the asm
generated for !CONFIG_BASE_SMALL and can handle arbitrary masks,
allowing for all future use cases.

If there is enough interest, please let me know and I can look into
further optimizing runtime_const_mask_32() based on the current use
case for __futex_hash.
Testing
=======
Apart from x86, which was build and boot tested on bare metal, all
other architectures have been build and boot tested with cross-compile +
QEMU, with some light sanity testing on each.
Patches are based on:
git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
at commit 1086b33a3f64 ("Merge branch into tip/master: 'x86/vdso'")
(2026-04-02)
Everyone has been Cc'd on the cover letter and the futex bits for
context. Respective arch maintainers, reviewers, and whoever got lucky
with get_maintainer.pl have been Cc'd on their respective arch-specific
changes. Futex maintainers and the lists will receive the whole
series (sorry in advance!).
---
changelog rfc v2..v3:
o Collected Ack from Heiko for s390 bits after folding in their
suggested changes (Thanks a ton!)
o Reordered Patch 2 and Patch 3 to allow for runtime_const_init() at
late_initcall() first before introducing runtime_const_mask_32() on
ARM64. (David)
o Moved the "&" operation outside the inline asm block on ARM64 and
RISC-V which allows the compiler to optimize it further if possible.
(David)
o Dropped the RFC tag.
v2: https://lore.kernel.org/lkml/20260316052401.18910-1-kprateek.nayak@amd.com/ [1]
changelog rfc v1..rfc v2:
o Use runtime constants to avoid the dereference overheads for
dynamically allocated futex_queues.
o arch/ side plumbings for runtime_const_mask_32()
v1: https://lore.kernel.org/all/20260128101358.20954-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (4):
arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
arm64/runtime-const: Introduce runtime_const_mask_32()
riscv/runtime-const: Introduce runtime_const_mask_32()
s390/runtime-const: Introduce runtime_const_mask_32()
Peter Zijlstra (3):
x86/runtime-const: Introduce runtime_const_mask_32()
asm-generic/runtime-const: Add dummy runtime_const_mask_32()
futex: Use runtime constants for __futex_hash() hot path
arch/arm64/include/asm/runtime-const.h | 32 ++++++++++++++------
arch/riscv/include/asm/runtime-const.h | 22 ++++++++++++++
arch/s390/include/asm/runtime-const.h | 22 +++++++++++++-
arch/x86/include/asm/runtime-const.h | 14 +++++++++
include/asm-generic/runtime-const.h | 1 +
include/asm-generic/vmlinux.lds.h | 5 ++-
kernel/futex/core.c | 42 ++++++++++++++------------
7 files changed, 107 insertions(+), 31 deletions(-)
base-commit: 1086b33a3f644c3bc37abefd699defc45accced1
--
2.34.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 1/7] x86/runtime-const: Introduce runtime_const_mask_32()
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
@ 2026-04-02 11:22 ` K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 2/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak
From: Peter Zijlstra <peterz@infradead.org>
Futex hash computation requires a mask operation on read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.
[ prateek: Broke off the x86 chunk, commit message. ]
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog v2..v3:
o No changes.
---
arch/x86/include/asm/runtime-const.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/x86/include/asm/runtime-const.h b/arch/x86/include/asm/runtime-const.h
index 4cd94fdcb45e..b13f7036c1c9 100644
--- a/arch/x86/include/asm/runtime-const.h
+++ b/arch/x86/include/asm/runtime-const.h
@@ -41,6 +41,15 @@
:"+r" (__ret)); \
__ret; })
+#define runtime_const_mask_32(val, sym) ({ \
+ typeof(0u+(val)) __ret = (val); \
+ asm_inline("and $0x12345678, %k0\n1:\n" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t"\
+ ".long 1b - 4 - .\n" \
+ ".popsection" \
+ : "+r" (__ret)); \
+ __ret; })
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -65,6 +74,11 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
*(unsigned char *)where = val;
}
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ *(unsigned int *)where = val;
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v3 2/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 1/7] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-04-02 11:22 ` K Prateek Nayak
2026-04-10 9:37 ` Catalin Marinas
2026-04-02 11:22 ` [PATCH v3 3/7] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
` (4 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon,
David Laight
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak, Jisheng Zhang
The current scheme to directly patch the kernel text for runtime
constants runs into the following issue with futex adapted to using
runtime constants on arm64:
Unable to handle kernel write to read-only memory at virtual address ...
The pc points to the *p assignment in the following call chain:
futex_init()
runtime_const_init(shift, __futex_shift)
__runtime_fixup_shift()
*p = cpu_to_le32(insn);
which suggests that core_initcall() is too late to patch the kernel
text directly, unlike "d_hash_shift" which is initialized during
vfs_caches_init_early() before the protections are in place.
Use aarch64_insn_patch_text_nosync() to patch the runtime constants
instead of writing the text directly, which allows runtime_const_init()
to run slightly later in boot.
Since aarch64_insn_patch_text_nosync() calls caches_clean_inval_pou()
internally, __runtime_fixup_caches() ends up being redundant.
Calls to runtime_const_init() are rare, and the overhead of multiple
calls to caches_clean_inval_pou(), instead of batching them together,
should be negligible in practice.
The cpu_to_le32() conversion of the instruction isn't necessary since
it is handled later in the aarch64_insn_patch_text_nosync() call chain:
aarch64_insn_patch_text_nosync(addr, insn)
aarch64_insn_write(addr, insn)
__aarch64_insn_write(addr, cpu_to_le32(insn))
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog v2..v3:
o Reordered this to come before the introduction of
runtime_const_mask_32(). (David)
o Trimmed down the commit message to be more precise.
---
arch/arm64/include/asm/runtime-const.h | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index c3dbd3ae68f6..a3106f80912b 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -7,6 +7,7 @@
#endif
#include <asm/cacheflush.h>
+#include <asm/text-patching.h>
/* Sigh. You can still run arm64 in BE mode */
#include <asm/byteorder.h>
@@ -50,13 +51,7 @@ static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
u32 insn = le32_to_cpu(*p);
insn &= 0xffe0001f;
insn |= (val & 0xffff) << 5;
- *p = cpu_to_le32(insn);
-}
-
-static inline void __runtime_fixup_caches(void *where, unsigned int insns)
-{
- unsigned long va = (unsigned long)where;
- caches_clean_inval_pou(va, va + 4*insns);
+ aarch64_insn_patch_text_nosync(p, insn);
}
static inline void __runtime_fixup_ptr(void *where, unsigned long val)
@@ -66,7 +61,6 @@ static inline void __runtime_fixup_ptr(void *where, unsigned long val)
__runtime_fixup_16(p+1, val >> 16);
__runtime_fixup_16(p+2, val >> 32);
__runtime_fixup_16(p+3, val >> 48);
- __runtime_fixup_caches(where, 4);
}
/* Immediate value is 6 bits starting at bit #16 */
@@ -76,8 +70,7 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
u32 insn = le32_to_cpu(*p);
insn &= 0xffc0ffff;
insn |= (val & 63) << 16;
- *p = cpu_to_le32(insn);
- __runtime_fixup_caches(where, 1);
+ aarch64_insn_patch_text_nosync(p, insn);
}
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
--
2.34.1
* [PATCH v3 3/7] arm64/runtime-const: Introduce runtime_const_mask_32()
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 1/7] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 2/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
@ 2026-04-02 11:22 ` K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 4/7] riscv/runtime-const: " K Prateek Nayak
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Catalin Marinas, Will Deacon,
David Laight
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak, Jisheng Zhang
Futex hash computation requires a mask operation on read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. GCC generates a:

    movz w1, #lo16, lsl #0  // w1 = bits [15:0]
    movk w1, #hi16, lsl #16 // w1 = full 32-bit value
    and  w0, w0, w1         // w0 = w0 & w1

pattern to handle arbitrary 32-bit masks; the same was also suggested
by Claude and is implemented here. The (__mask & val) operation is
intentionally placed outside the asm block to allow the compiler to
optimize it further where possible.
__runtime_fixup_ptr() already patches a "movz, + movk lsl #16" sequence
which has been reused to patch the same sequence for
__runtime_fixup_mask().
Assisted-by: Claude:claude-sonnet-4-5
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog v2..v3:
o Reordered this to come after the text patching fixes for ARM64.
(David)
o Moved the "&" operation outside the inline asm block to allow for
compilers to further optimize it if possible. (David)
---
arch/arm64/include/asm/runtime-const.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
index a3106f80912b..21f817eb5951 100644
--- a/arch/arm64/include/asm/runtime-const.h
+++ b/arch/arm64/include/asm/runtime-const.h
@@ -36,6 +36,17 @@
:"r" (0u+(val))); \
__ret; })
+#define runtime_const_mask_32(val, sym) ({ \
+ unsigned long __mask; \
+ asm_inline("1:\t" \
+ "movz %w0, #0xcdef\n\t" \
+ "movk %w0, #0x89ab, lsl #16\n\t" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
+ ".long 1b - .\n\t" \
+ ".popsection" \
+ :"=r" (__mask)); \
+ (__mask & val); })
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -73,6 +84,14 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
aarch64_insn_patch_text_nosync(p, insn);
}
+/* Two 16-bit immediates for the movz + movk pair */
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ __le32 *p = lm_alias(where);
+ __runtime_fixup_16(p, val);
+ __runtime_fixup_16(p+1, val >> 16);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v3 4/7] riscv/runtime-const: Introduce runtime_const_mask_32()
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (2 preceding siblings ...)
2026-04-02 11:22 ` [PATCH v3 3/7] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
@ 2026-04-02 11:22 ` K Prateek Nayak
2026-04-03 9:42 ` Guo Ren
2026-04-02 11:22 ` [PATCH v3 5/7] s390/runtime-const: " K Prateek Nayak
` (2 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Samuel Holland, David Laight
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak, Alexandre Ghiti, Charlie Jenkins,
Charles Mirabile
Futex hash computation requires a mask operation on read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path. GCC generates a:

    lui  a0, 0x12346    # upper; +0x800 then >>12 for correct rounding
    addi a0, a0, 0x678  # lower 12 bits
    and  a1, a1, a0     # a1 = a1 & a0

pattern to handle arbitrary 32-bit masks; the same was also suggested
by Claude and is implemented here. The (__mask & val) operation is
intentionally placed outside the asm block to allow the compiler to
optimize it further where possible.
__runtime_fixup_ptr() already patches a "lui + addi" sequence which has
been reused to patch the same sequence for __runtime_fixup_mask().
Assisted-by: Claude:claude-sonnet-4-5
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog v2..v3:
o Moved the "&" operation outside the inline asm block to allow for
compilers to further optimize it if possible. (Based on David's
comment on ARM64 bits).
---
arch/riscv/include/asm/runtime-const.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
index d766e2b9e6df..85efba8ecf12 100644
--- a/arch/riscv/include/asm/runtime-const.h
+++ b/arch/riscv/include/asm/runtime-const.h
@@ -153,6 +153,22 @@
__ret; \
})
+#define runtime_const_mask_32(val, sym) \
+({ \
+ u32 __mask; \
+ asm_inline(".option push\n\t" \
+ ".option norvc\n\t" \
+ "1:\t" \
+ "lui %[__mask],0x89abd\n\t" \
+ "addi %[__mask],%[__mask],-0x211\n\t" \
+ ".option pop\n\t" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
+ ".long 1b - .\n\t" \
+ ".popsection" \
+ : [__mask] "=r" (__mask)); \
+ (__mask & val); \
+})
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -256,6 +272,12 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
mutex_unlock(&text_mutex);
}
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ __runtime_fixup_32(where, where + 4, val);
+ __runtime_fixup_caches(where, 2);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v3 5/7] s390/runtime-const: Introduce runtime_const_mask_32()
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (3 preceding siblings ...)
2026-04-02 11:22 ` [PATCH v3 4/7] riscv/runtime-const: " K Prateek Nayak
@ 2026-04-02 11:22 ` K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 6/7] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 7/7] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
6 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak, Sven Schnelle
Futex hash computation requires a mask operation on read-only-after-init
data that will be converted to a runtime constant in the subsequent
commit.
Introduce runtime_const_mask_32 to further optimize the mask operation
in the futex hash computation hot path.
GCC generates a:

    nilf %r1,<imm32>

to handle arbitrary 32-bit masks, and the same is implemented here.
The immediate-patching pattern for __runtime_fixup_mask() has been
adopted from __runtime_fixup_ptr().
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog v2..v3:
o Collected Ack from Heiko after folding in the suggested diff. (Thanks
a ton!)
---
arch/s390/include/asm/runtime-const.h | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/arch/s390/include/asm/runtime-const.h b/arch/s390/include/asm/runtime-const.h
index 17878b1d048c..7b71156031ec 100644
--- a/arch/s390/include/asm/runtime-const.h
+++ b/arch/s390/include/asm/runtime-const.h
@@ -33,6 +33,20 @@
__ret; \
})
+#define runtime_const_mask_32(val, sym) \
+({ \
+ unsigned int __ret = (val); \
+ \
+ asm_inline( \
+ "0: nilf %[__ret],12\n" \
+ ".pushsection runtime_mask_" #sym ",\"a\"\n" \
+ ".long 0b - .\n" \
+ ".popsection" \
+ : [__ret] "+d" (__ret) \
+ : : "cc"); \
+ __ret; \
+})
+
#define runtime_const_init(type, sym) do { \
extern s32 __start_runtime_##type##_##sym[]; \
extern s32 __stop_runtime_##type##_##sym[]; \
@@ -43,12 +57,12 @@
__stop_runtime_##type##_##sym); \
} while (0)
-/* 32-bit immediate for iihf and iilf in bits in I2 field */
static inline void __runtime_fixup_32(u32 *p, unsigned int val)
{
s390_kernel_write(p, &val, sizeof(val));
}
+/* 32-bit immediate for iihf and iilf in bits in I2 field */
static inline void __runtime_fixup_ptr(void *where, unsigned long val)
{
__runtime_fixup_32(where + 2, val >> 32);
@@ -65,6 +79,12 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
s390_kernel_write(where, &insn, sizeof(insn));
}
+/* 32-bit immediate for nilf in bits in I2 field */
+static inline void __runtime_fixup_mask(void *where, unsigned long val)
+{
+ __runtime_fixup_32(where + 2, val);
+}
+
static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
unsigned long val, s32 *start, s32 *end)
{
--
2.34.1
* [PATCH v3 6/7] asm-generic/runtime-const: Add dummy runtime_const_mask_32()
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (4 preceding siblings ...)
2026-04-02 11:22 ` [PATCH v3 5/7] s390/runtime-const: " K Prateek Nayak
@ 2026-04-02 11:22 ` K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 7/7] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
6 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Arnd Bergmann
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak
From: Peter Zijlstra <peterz@infradead.org>
Add a dummy runtime_const_mask_32() for all the architectures that do
not support runtime-const.
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog v2..v3:
o No changes.
---
include/asm-generic/runtime-const.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/asm-generic/runtime-const.h b/include/asm-generic/runtime-const.h
index 670499459514..03e6e3e02401 100644
--- a/include/asm-generic/runtime-const.h
+++ b/include/asm-generic/runtime-const.h
@@ -10,6 +10,7 @@
*/
#define runtime_const_ptr(sym) (sym)
#define runtime_const_shift_right_32(val, sym) ((u32)(val)>>(sym))
+#define runtime_const_mask_32(val, sym) ((u32)(val)&(sym))
#define runtime_const_init(type,sym) do { } while (0)
#endif
--
2.34.1
* [PATCH v3 7/7] futex: Use runtime constants for __futex_hash() hot path
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
` (5 preceding siblings ...)
2026-04-02 11:22 ` [PATCH v3 6/7] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
@ 2026-04-02 11:22 ` K Prateek Nayak
6 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-02 11:22 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Borislav Petkov, Dave Hansen, x86,
Catalin Marinas, Will Deacon, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Arnd Bergmann, David Laight,
Samuel Holland
Cc: Darren Hart, Davidlohr Bueso, André Almeida, linux-arch,
linux-kernel, linux-s390, linux-riscv, linux-arm-kernel,
K Prateek Nayak
From: Peter Zijlstra <peterz@infradead.org>
Runtime-constify the read-only-after-init data __futex_shift (shift_32),
__futex_mask (mask_32), and __futex_queues (ptr) used in the
__futex_hash() hot path to avoid referencing global variables.

This also allows __futex_queues to be allocated dynamically with
"nr_node_ids" slots instead of reserving a config-dependent MAX_NUMNODES
(1 << CONFIG_NODES_SHIFT) worth of slots upfront.

No functional changes intended.
[ prateek: Dynamically allocate __futex_queues, mark the global data
__ro_after_init since they are constified after futex_init(). ]
Link: https://patch.msgid.link/20260227161841.GH606826@noisy.programming.kicks-ass.net
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> # MAX_NUMNODES bloat
Not-yet-signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
Changelog v2..v3:
o No changes.
---
include/asm-generic/vmlinux.lds.h | 5 +++-
kernel/futex/core.c | 42 +++++++++++++++++--------------
2 files changed, 27 insertions(+), 20 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 1e1580febe4b..86f99fa6ae24 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -975,7 +975,10 @@
RUNTIME_CONST(shift, d_hash_shift) \
RUNTIME_CONST(ptr, dentry_hashtable) \
RUNTIME_CONST(ptr, __dentry_cache) \
- RUNTIME_CONST(ptr, __names_cache)
+ RUNTIME_CONST(ptr, __names_cache) \
+ RUNTIME_CONST(shift, __futex_shift) \
+ RUNTIME_CONST(mask, __futex_mask) \
+ RUNTIME_CONST(ptr, __futex_queues)
/* Alignment must be consistent with (kunit_suite *) in include/kunit/test.h */
#define KUNIT_TABLE() \
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ff2a4fb2993f..73eade7184dc 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -45,23 +45,19 @@
#include <linux/mempolicy.h>
#include <linux/mmap_lock.h>
+#include <asm/runtime-const.h>
+
#include "futex.h"
#include "../locking/rtmutex_common.h"
-/*
- * The base of the bucket array and its size are always used together
- * (after initialization only in futex_hash()), so ensure that they
- * reside in the same cacheline.
- */
-static struct {
- unsigned long hashmask;
- unsigned int hashshift;
- struct futex_hash_bucket *queues[MAX_NUMNODES];
-} __futex_data __read_mostly __aligned(2*sizeof(long));
+static u32 __futex_mask __ro_after_init;
+static u32 __futex_shift __ro_after_init;
+static struct futex_hash_bucket **__futex_queues __ro_after_init;
-#define futex_hashmask (__futex_data.hashmask)
-#define futex_hashshift (__futex_data.hashshift)
-#define futex_queues (__futex_data.queues)
+static __always_inline struct futex_hash_bucket **futex_queues(void)
+{
+ return runtime_const_ptr(__futex_queues);
+}
struct futex_private_hash {
int state;
@@ -439,14 +435,14 @@ __futex_hash(union futex_key *key, struct futex_private_hash *fph)
* NOTE: this isn't perfectly uniform, but it is fast and
* handles sparse node masks.
*/
- node = (hash >> futex_hashshift) % nr_node_ids;
+ node = runtime_const_shift_right_32(hash, __futex_shift) % nr_node_ids;
if (!node_possible(node)) {
node = find_next_bit_wrap(node_possible_map.bits,
nr_node_ids, node);
}
}
- return &futex_queues[node][hash & futex_hashmask];
+ return &futex_queues()[node][runtime_const_mask_32(hash, __futex_mask)];
}
/**
@@ -1916,7 +1912,7 @@ int futex_hash_allocate_default(void)
* 16 <= threads * 4 <= global hash size
*/
buckets = roundup_pow_of_two(4 * threads);
- buckets = clamp(buckets, 16, futex_hashmask + 1);
+ buckets = clamp(buckets, 16, __futex_mask + 1);
if (current_buckets >= buckets)
return 0;
@@ -1986,10 +1982,19 @@ static int __init futex_init(void)
hashsize = max(4, hashsize);
hashsize = roundup_pow_of_two(hashsize);
#endif
- futex_hashshift = ilog2(hashsize);
+ __futex_mask = hashsize - 1;
+ __futex_shift = ilog2(hashsize);
size = sizeof(struct futex_hash_bucket) * hashsize;
order = get_order(size);
+ __futex_queues = kcalloc(nr_node_ids, sizeof(*__futex_queues), GFP_KERNEL);
+
+ runtime_const_init(shift, __futex_shift);
+ runtime_const_init(mask, __futex_mask);
+ runtime_const_init(ptr, __futex_queues);
+
+ BUG_ON(!futex_queues());
+
for_each_node(n) {
struct futex_hash_bucket *table;
@@ -2003,10 +2008,9 @@ static int __init futex_init(void)
for (i = 0; i < hashsize; i++)
futex_hash_bucket_init(&table[i], NULL);
- futex_queues[n] = table;
+ futex_queues()[n] = table;
}
- futex_hashmask = hashsize - 1;
pr_info("futex hash table entries: %lu (%lu bytes on %d NUMA nodes, total %lu KiB, %s).\n",
hashsize, size, num_possible_nodes(), size * num_possible_nodes() / 1024,
order > MAX_PAGE_ORDER ? "vmalloc" : "linear");
--
2.34.1
* Re: [PATCH v3 4/7] riscv/runtime-const: Introduce runtime_const_mask_32()
2026-04-02 11:22 ` [PATCH v3 4/7] riscv/runtime-const: " K Prateek Nayak
@ 2026-04-03 9:42 ` Guo Ren
2026-04-03 10:35 ` K Prateek Nayak
0 siblings, 1 reply; 11+ messages in thread
From: Guo Ren @ 2026-04-03 9:42 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Samuel Holland, David Laight, Darren Hart,
Davidlohr Bueso, André Almeida, linux-arch, linux-kernel,
linux-s390, linux-riscv, linux-arm-kernel, Alexandre Ghiti,
Charlie Jenkins, Charles Mirabile
On Thu, Apr 2, 2026 at 7:39 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Futex hash computation requires a mask operation with read-only after
> init data that will be converted to a runtime constant in the subsequent
> commit.
>
> Introduce runtime_const_mask_32 to further optimize the mask operation
> in the futex hash computation hot path. GCC generates a:
>
> lui a0, 0x12346 # upper; +0x800 then >>12 for correct rounding
> addi a0, a0, 0x678 # lower 12 bits
> and a1, a1, a0 # a1 = a1 & a0
>
> pattern to tackle arbitrary 32-bit masks and the same was also suggested
> by Claude which is implemented here. The (__mask & val) operation is
> intentionally placed outside of asm block to allow compilers to further
> optimize it if possible.
>
> __runtime_fixup_ptr() already patches a "lui + addi" sequence which has
> been reused to patch the same sequence for __runtime_fixup_mask().
>
> Assisted-by: Claude:claude-sonnet-4-5
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> Changelog v2..v3:
>
> o Moved the "&" operation outside the inline asm block to allow for
> compilers to further optimize it if possible. (Based on David's
> comment on ARM64 bits).
> ---
> arch/riscv/include/asm/runtime-const.h | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
> index d766e2b9e6df..85efba8ecf12 100644
> --- a/arch/riscv/include/asm/runtime-const.h
> +++ b/arch/riscv/include/asm/runtime-const.h
> @@ -153,6 +153,22 @@
> __ret; \
> })
>
> +#define runtime_const_mask_32(val, sym) \
> +({ \
> + u32 __mask; \
> + asm_inline(".option push\n\t" \
> + ".option norvc\n\t" \
> + "1:\t" \
> + "lui %[__mask],0x89abd\n\t" \
> + "addi %[__mask],%[__mask],-0x211\n\t" \
Ref include/uapi/linux/reboot.h:
#define LINUX_REBOOT_CMD_CAD_ON 0x89ABCDEF
#define RUNTIME_MAGIC 0x89ABCDEF
"lui %[__mask], %%hi(RUNTIME_MAGIC)\n\t"
"addi %[__mask], %[__mask], %%lo(RUNTIME_MAGIC)\n\t"
> + ".option pop\n\t" \
> + ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
> + ".long 1b - .\n\t" \
> + ".popsection" \
> + : [__mask] "=r" (__mask)); \
> + (__mask & val); \
> +})
> +
> #define runtime_const_init(type, sym) do { \
> extern s32 __start_runtime_##type##_##sym[]; \
> extern s32 __stop_runtime_##type##_##sym[]; \
> @@ -256,6 +272,12 @@ static inline void __runtime_fixup_shift(void *where, unsigned long val)
> mutex_unlock(&text_mutex);
> }
>
> +static inline void __runtime_fixup_mask(void *where, unsigned long val)
> +{
> + __runtime_fixup_32(where, where + 4, val);
> + __runtime_fixup_caches(where, 2);
> +}
> +
> static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
> unsigned long val, s32 *start, s32 *end)
> {
> --
> 2.34.1
>
>
--
Best Regards
Guo Ren
* Re: [PATCH v3 4/7] riscv/runtime-const: Introduce runtime_const_mask_32()
2026-04-03 9:42 ` Guo Ren
@ 2026-04-03 10:35 ` K Prateek Nayak
0 siblings, 0 replies; 11+ messages in thread
From: K Prateek Nayak @ 2026-04-03 10:35 UTC (permalink / raw)
To: Guo Ren
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Samuel Holland, David Laight, Darren Hart,
Davidlohr Bueso, André Almeida, linux-arch, linux-kernel,
linux-s390, linux-riscv, linux-arm-kernel, Alexandre Ghiti,
Charlie Jenkins, Charles Mirabile
Hello Guo,
On 4/3/2026 3:12 PM, Guo Ren wrote:
>> diff --git a/arch/riscv/include/asm/runtime-const.h b/arch/riscv/include/asm/runtime-const.h
>> index d766e2b9e6df..85efba8ecf12 100644
>> --- a/arch/riscv/include/asm/runtime-const.h
>> +++ b/arch/riscv/include/asm/runtime-const.h
>> @@ -153,6 +153,22 @@
>> __ret; \
>> })
>>
>> +#define runtime_const_mask_32(val, sym) \
>> +({ \
>> + u32 __mask; \
>> + asm_inline(".option push\n\t" \
>> + ".option norvc\n\t" \
>> + "1:\t" \
>> + "lui %[__mask],0x89abd\n\t" \
>> + "addi %[__mask],%[__mask],-0x211\n\t" \
> Ref include/uapi/linux/reboot.h:
> #define LINUX_REBOOT_CMD_CAD_ON 0x89ABCDEF
>
> #define RUNTIME_MAGIC 0x89ABCDEF
>
> "lui %[__mask], %%hi(RUNTIME_MAGIC)\n\t"
> "addi %[__mask], %[__mask], %%lo(RUNTIME_MAGIC)\n\t"
Ack! I'll clean it up in the next version while also fixing the
stuff that Sashiko reported.
Thanks a ton for taking a look at the series.
>
>
>> + ".option pop\n\t" \
>> + ".pushsection runtime_mask_" #sym ",\"a\"\n\t" \
>> + ".long 1b - .\n\t" \
>> + ".popsection" \
>> + : [__mask] "=r" (__mask)); \
>> + (__mask & val); \
>> +})
--
Thanks and Regards,
Prateek
* Re: [PATCH v3 2/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching
2026-04-02 11:22 ` [PATCH v3 2/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
@ 2026-04-10 9:37 ` Catalin Marinas
0 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2026-04-10 9:37 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
Sebastian Andrzej Siewior, Will Deacon, David Laight, Darren Hart,
Davidlohr Bueso, André Almeida, linux-arch, linux-kernel,
linux-s390, linux-riscv, linux-arm-kernel, Jisheng Zhang
On Thu, Apr 02, 2026 at 11:22:45AM +0000, K Prateek Nayak wrote:
> diff --git a/arch/arm64/include/asm/runtime-const.h b/arch/arm64/include/asm/runtime-const.h
> index c3dbd3ae68f6..a3106f80912b 100644
> --- a/arch/arm64/include/asm/runtime-const.h
> +++ b/arch/arm64/include/asm/runtime-const.h
> @@ -7,6 +7,7 @@
> #endif
>
> #include <asm/cacheflush.h>
> +#include <asm/text-patching.h>
>
> /* Sigh. You can still run arm64 in BE mode */
> #include <asm/byteorder.h>
> @@ -50,13 +51,7 @@ static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
> u32 insn = le32_to_cpu(*p);
> insn &= 0xffe0001f;
> insn |= (val & 0xffff) << 5;
> - *p = cpu_to_le32(insn);
> -}
> -
> -static inline void __runtime_fixup_caches(void *where, unsigned int insns)
> -{
> - unsigned long va = (unsigned long)where;
> - caches_clean_inval_pou(va, va + 4*insns);
> + aarch64_insn_patch_text_nosync(p, insn);
> }
Sashiko has some good points here:
https://sashiko.dev/#/patchset/20260402112250.2138-1-kprateek.nayak@amd.com
In short, aarch64_insn_patch_text_nosync() does not expect a linear map
address but rather a kernel text one (or vmalloc/modules). The other
valid point is on aliasing I-caches.
I think dropping the lm_alias() and just using 'where' directly would
do, but I haven't tried it.
--
Catalin
end of thread, other threads:[~2026-04-10 9:38 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-02 11:22 [PATCH v3 0/7] futex: Use runtime constants for futex_hash computation K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 1/7] x86/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 2/7] arm64/runtime-const: Use aarch64_insn_patch_text_nosync() for patching K Prateek Nayak
2026-04-10 9:37 ` Catalin Marinas
2026-04-02 11:22 ` [PATCH v3 3/7] arm64/runtime-const: Introduce runtime_const_mask_32() K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 4/7] riscv/runtime-const: " K Prateek Nayak
2026-04-03 9:42 ` Guo Ren
2026-04-03 10:35 ` K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 5/7] s390/runtime-const: " K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 6/7] asm-generic/runtime-const: Add dummy runtime_const_mask_32() K Prateek Nayak
2026-04-02 11:22 ` [PATCH v3 7/7] futex: Use runtime constants for __futex_hash() hot path K Prateek Nayak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox