* [PATCH v17 01/16] preempt: Introduce HARDIRQ_DISABLE_BITS
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter Lyude Paul
` (16 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
From: Boqun Feng <boqun.feng@gmail.com>
In order to support preempt_disable()-like interrupt disabling, that is,
using part of preempt_count() to track the interrupt-disable nesting
level, change the preempt_count() layout to contain an 8-bit
HARDIRQ_DISABLE count.
Note that HARDIRQ_BITS and NMI_BITS are each reduced by 1 as a result,
which halves the maximum hardirq and NMI nesting depth.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
V14:
* Fix HARDIRQ_DISABLE_MASK definition
include/linux/preempt.h | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index d964f965c8ffc..f07e7f37f3ca5 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -17,6 +17,7 @@
*
* - bits 0-7 are the preemption count (max preemption depth: 256)
* - bits 8-15 are the softirq count (max # of softirqs: 256)
+ * - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
*
* The hardirq count could in theory be the same as the number of
* interrupts in the system, but we run all interrupt handlers with
@@ -26,29 +27,34 @@
*
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
- * HARDIRQ_MASK: 0x000f0000
- * NMI_MASK: 0x00f00000
+ * HARDIRQ_DISABLE_MASK: 0x00ff0000
+ * HARDIRQ_MASK: 0x07000000
+ * NMI_MASK: 0x38000000
* PREEMPT_NEED_RESCHED: 0x80000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
-#define HARDIRQ_BITS 4
-#define NMI_BITS 4
+#define HARDIRQ_DISABLE_BITS 8
+#define HARDIRQ_BITS 3
+#define NMI_BITS 3
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
-#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_DISABLE_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_SHIFT (HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
#define NMI_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
#define __IRQ_MASK(x) ((1UL << (x))-1)
#define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_MASK (__IRQ_MASK(HARDIRQ_DISABLE_BITS) << HARDIRQ_DISABLE_SHIFT)
#define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
#define NMI_MASK (__IRQ_MASK(NMI_BITS) << NMI_SHIFT)
#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_OFFSET (1UL << HARDIRQ_DISABLE_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET (1UL << NMI_SHIFT)
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
2026-01-21 22:39 ` [PATCH v17 01/16] preempt: Introduce HARDIRQ_DISABLE_BITS Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-02-03 11:44 ` Peter Zijlstra
2026-02-03 12:15 ` Peter Zijlstra
2026-01-21 22:39 ` [PATCH v17 03/16] preempt: Introduce __preempt_count_{sub, add}_return() Lyude Paul
` (15 subsequent siblings)
17 siblings, 2 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long, Joel Fernandes
From: Joel Fernandes <joelagnelf@nvidia.com>
Move NMI nesting tracking from the preempt_count bits to a separate per-CPU
counter (nmi_nesting). This frees up the NMI bits in preempt_count, allowing
those bits to be repurposed for other uses. It also has the benefit of
supporting deeper nesting than a small bit-field can, if there is ever a
need. Reduce NMI_BITS from 3 to 1, using the remaining bit only as a flag
that tells whether we are in an NMI.
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
include/linux/hardirq.h | 16 ++++++++++++----
include/linux/preempt.h | 13 +++++++++----
kernel/softirq.c | 2 ++
3 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d57cab4d4c06f..cc06bda52c3e5 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -10,6 +10,8 @@
#include <linux/vtime.h>
#include <asm/hardirq.h>
+DECLARE_PER_CPU(unsigned int, nmi_nesting);
+
extern void synchronize_irq(unsigned int irq);
extern bool synchronize_hardirq(unsigned int irq);
@@ -102,14 +104,16 @@ void irq_exit_rcu(void);
*/
/*
- * nmi_enter() can nest up to 15 times; see NMI_BITS.
+ * nmi_enter() can nest - nesting is tracked in a per-CPU counter.
*/
#define __nmi_enter() \
do { \
lockdep_off(); \
arch_nmi_enter(); \
- BUG_ON(in_nmi() == NMI_MASK); \
- __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
+ BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX); \
+ __this_cpu_inc(nmi_nesting); \
+ __preempt_count_add(HARDIRQ_OFFSET); \
+ preempt_count_set(preempt_count() | NMI_MASK); \
} while (0)
#define nmi_enter() \
@@ -124,8 +128,12 @@ void irq_exit_rcu(void);
#define __nmi_exit() \
do { \
+ unsigned int nesting; \
BUG_ON(!in_nmi()); \
- __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
+ __preempt_count_sub(HARDIRQ_OFFSET); \
+ nesting = __this_cpu_dec_return(nmi_nesting); \
+ if (!nesting) \
+ __preempt_count_sub(NMI_OFFSET); \
arch_nmi_exit(); \
lockdep_on(); \
} while (0)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index f07e7f37f3ca5..e2d3079d3f5f1 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -18,6 +18,8 @@
* - bits 0-7 are the preemption count (max preemption depth: 256)
* - bits 8-15 are the softirq count (max # of softirqs: 256)
* - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
+ * - bits 24-27 are the hardirq count (max # of hardirqs: 16)
+ * - bit 28 is the NMI flag (no nesting count, tracked separately)
*
* The hardirq count could in theory be the same as the number of
* interrupts in the system, but we run all interrupt handlers with
@@ -25,18 +27,21 @@
* there are a few palaeontologic drivers which reenable interrupts in
* the handler, so we need more than one bit here.
*
+ * NMI nesting depth is tracked in a separate per-CPU variable
+ * (nmi_nesting) to save bits in preempt_count.
+ *
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
* HARDIRQ_DISABLE_MASK: 0x00ff0000
- * HARDIRQ_MASK: 0x07000000
- * NMI_MASK: 0x38000000
+ * HARDIRQ_MASK: 0x0f000000
+ * NMI_MASK: 0x10000000
* PREEMPT_NEED_RESCHED: 0x80000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define HARDIRQ_DISABLE_BITS 8
-#define HARDIRQ_BITS 3
-#define NMI_BITS 3
+#define HARDIRQ_BITS 4
+#define NMI_BITS 1
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 77198911b8dd4..af47ea23aba3b 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
#endif
+DEFINE_PER_CPU(unsigned int, nmi_nesting);
+
/*
* SOFTIRQ_OFFSET usage:
*
--
2.52.0
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-01-21 22:39 ` [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter Lyude Paul
@ 2026-02-03 11:44 ` Peter Zijlstra
2026-02-06 1:22 ` Joel Fernandes
2026-02-03 12:15 ` Peter Zijlstra
1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2026-02-03 11:44 UTC (permalink / raw)
To: Lyude Paul
Cc: rust-for-linux, linux-kernel, Thomas Gleixner, Boqun Feng,
Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Wed, Jan 21, 2026 at 05:39:05PM -0500, Lyude Paul wrote:
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 77198911b8dd4..af47ea23aba3b 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -88,6 +88,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
> EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
> #endif
>
> +DEFINE_PER_CPU(unsigned int, nmi_nesting);
What happened with putting this in the same line as preempt_count?
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-03 11:44 ` Peter Zijlstra
@ 2026-02-06 1:22 ` Joel Fernandes
0 siblings, 0 replies; 47+ messages in thread
From: Joel Fernandes @ 2026-02-06 1:22 UTC (permalink / raw)
To: Peter Zijlstra, Lyude Paul
Cc: rust-for-linux, linux-kernel, Thomas Gleixner, Boqun Feng,
Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long
On 2/3/2026 6:44 AM, Peter Zijlstra wrote:
> On Wed, Jan 21, 2026 at 05:39:05PM -0500, Lyude Paul wrote:
>
>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>> index 77198911b8dd4..af47ea23aba3b 100644
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -88,6 +88,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
>> EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
>> #endif
>>
>> +DEFINE_PER_CPU(unsigned int, nmi_nesting);
>
> What happened with putting this in the same line as preempt_count?
I can try to do that again if we still want to go with this patch. When I tried
that last, I ran into issues I can't remember now (the space being limited?).
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-01-21 22:39 ` [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter Lyude Paul
2026-02-03 11:44 ` Peter Zijlstra
@ 2026-02-03 12:15 ` Peter Zijlstra
2026-02-04 11:12 ` Peter Zijlstra
1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2026-02-03 12:15 UTC (permalink / raw)
To: Lyude Paul
Cc: rust-for-linux, linux-kernel, Thomas Gleixner, Boqun Feng,
Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Wed, Jan 21, 2026 at 05:39:05PM -0500, Lyude Paul wrote:
> #define __nmi_enter() \
> do { \
> lockdep_off(); \
> arch_nmi_enter(); \
> - BUG_ON(in_nmi() == NMI_MASK); \
> - __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
> + BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX); \
> + __this_cpu_inc(nmi_nesting); \
> + __preempt_count_add(HARDIRQ_OFFSET); \
> + preempt_count_set(preempt_count() | NMI_MASK); \
> } while (0)
>
> #define nmi_enter() \
> @@ -124,8 +128,12 @@ void irq_exit_rcu(void);
>
> #define __nmi_exit() \
> do { \
> + unsigned int nesting; \
> BUG_ON(!in_nmi()); \
> - __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
> + __preempt_count_sub(HARDIRQ_OFFSET); \
> + nesting = __this_cpu_dec_return(nmi_nesting); \
> + if (!nesting) \
> + __preempt_count_sub(NMI_OFFSET); \
> arch_nmi_exit(); \
> lockdep_on(); \
> } while (0)
While not wrong like last time, it is pretty awful.
preempt_count_set() is a cmpxchg() loop.
Would not something like so be better?
#define __nmi_enter() \
do { \
+ unsigned int _o = NMI_MASK + HARDIRQ_OFFSET; \
lockdep_off(); \
arch_nmi_enter(); \
- BUG_ON(in_nmi() == NMI_MASK); \
- __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
+ BUG_ON(__this_cpu_read(nmi_nesting) == ~0U); \
+ __this_cpu_inc(nmi_nesting); \
+ _o -= (preempt_count() & NMI_MASK); \
+ __preempt_count_add(_o); \
} while (0)
#define __nmi_exit() \
do { \
+ unsigned int _o = HARDIRQ_OFFSET; \
BUG_ON(!in_nmi()); \
- __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
+ if (!__this_cpu_dec_return(nmi_nesting)) \
+ _o += NMI_MASK; \
+ __preempt_count_sub(_o); \
arch_nmi_exit(); \
lockdep_on(); \
} while (0)
But I'm really somewhat sad that 64bit can't do better than this.
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-03 12:15 ` Peter Zijlstra
@ 2026-02-04 11:12 ` Peter Zijlstra
2026-02-04 12:32 ` Gary Guo
` (2 more replies)
0 siblings, 3 replies; 47+ messages in thread
From: Peter Zijlstra @ 2026-02-04 11:12 UTC (permalink / raw)
To: Lyude Paul
Cc: rust-for-linux, linux-kernel, Thomas Gleixner, Boqun Feng,
Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
> But I'm really somewhat sad that 64bit can't do better than this.
Here, the below builds and boots (albeit with warnings because printf
format crap sucks).
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/preempt.h | 53 ++++++++++++++++++++++++++++++------------
arch/x86/kernel/cpu/common.c | 2 +-
include/linux/hardirq.h | 7 +++---
include/linux/preempt.h | 52 ++++++++++++++++++++++++++++++++++-------
init/main.c | 2 +-
kernel/Kconfig.preempt | 4 ++++
kernel/sched/core.c | 8 +++----
kernel/softirq.c | 10 +++++++-
kernel/time/timer.c | 2 +-
lib/locking-selftest.c | 2 +-
11 files changed, 106 insertions(+), 37 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f859..2bd1972fd4c7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -326,6 +326,7 @@ config X86
select USER_STACKTRACE_SUPPORT
select HAVE_ARCH_KCSAN if X86_64
select PROC_PID_ARCH_STATUS if PROC_FS
+ select PREEMPT_LONG if X86_64
select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
select FUNCTION_ALIGNMENT_16B if X86_64 || X86_ALIGNMENT_16
select FUNCTION_ALIGNMENT_4B
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 578441db09f0..1b54d5555138 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -7,10 +7,19 @@
#include <linux/static_call_types.h>
-DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
+DECLARE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count);
-/* We use the MSB mostly because its available */
-#define PREEMPT_NEED_RESCHED 0x80000000
+/*
+ * We use the MSB for PREEMPT_NEED_RESCHED mostly because it is available.
+ */
+
+#ifdef CONFIG_64BIT
+#define PREEMPT_NEED_RESCHED (~((-1L) >> 1))
+#define __pc_op(op, ...) raw_cpu_##op##_8(__VA_ARGS__)
+#else
+#define PREEMPT_NEED_RESCHED (~((-1) >> 1))
+#define __pc_op(op, ...) raw_cpu_##op##_4(__VA_ARGS__)
+#endif
/*
* We use the PREEMPT_NEED_RESCHED bit as an inverted NEED_RESCHED such
@@ -24,18 +33,18 @@ DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
*/
static __always_inline int preempt_count(void)
{
- return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+ return __pc_op(read, __preempt_count) & ~PREEMPT_NEED_RESCHED;
}
-static __always_inline void preempt_count_set(int pc)
+static __always_inline void preempt_count_set(long pc)
{
int old, new;
- old = raw_cpu_read_4(__preempt_count);
+ old = __pc_op(read, __preempt_count);
do {
new = (old & PREEMPT_NEED_RESCHED) |
(pc & ~PREEMPT_NEED_RESCHED);
- } while (!raw_cpu_try_cmpxchg_4(__preempt_count, &old, new));
+ } while (!__pc_op(try_cmpxchg, __preempt_count, &old, new));
}
/*
@@ -58,33 +67,45 @@ static __always_inline void preempt_count_set(int pc)
static __always_inline void set_preempt_need_resched(void)
{
- raw_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
+ __pc_op(and, __preempt_count, ~PREEMPT_NEED_RESCHED);
}
static __always_inline void clear_preempt_need_resched(void)
{
- raw_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
+ __pc_op(or, __preempt_count, PREEMPT_NEED_RESCHED);
}
static __always_inline bool test_preempt_need_resched(void)
{
- return !(raw_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
+ return !(__pc_op(read, __preempt_count) & PREEMPT_NEED_RESCHED);
}
/*
* The various preempt_count add/sub methods
*/
-static __always_inline void __preempt_count_add(int val)
+static __always_inline void __preempt_count_add(long val)
{
- raw_cpu_add_4(__preempt_count, val);
+ __pc_op(add, __preempt_count, val);
}
-static __always_inline void __preempt_count_sub(int val)
+static __always_inline void __preempt_count_sub(long val)
{
- raw_cpu_add_4(__preempt_count, -val);
+ __pc_op(add, __preempt_count, -val);
}
+#ifdef CONFIG_64BIT
+static __always_inline void __preempt_count_nmi_enter(void)
+{
+ __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);
+}
+
+static __always_inline void __preempt_count_nmi_exit(void)
+{
+ __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);
+}
+#endif
+
/*
* Because we keep PREEMPT_NEED_RESCHED set when we do _not_ need to reschedule
* a decrement which hits zero means we have no preempt_count and should
@@ -101,7 +122,7 @@ static __always_inline bool __preempt_count_dec_and_test(void)
*/
static __always_inline bool should_resched(int preempt_offset)
{
- return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
+ return unlikely(__pc_op(read, __preempt_count) == preempt_offset);
}
#ifdef CONFIG_PREEMPTION
@@ -148,4 +169,6 @@ do { \
#endif /* PREEMPTION */
+#undef __pc_op
+
#endif /* __ASM_PREEMPT_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e7ab22fce3b5..9d3602f085c9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2219,7 +2219,7 @@ DEFINE_PER_CPU_CACHE_HOT(struct task_struct *, current_task) = &init_task;
EXPORT_PER_CPU_SYMBOL(current_task);
EXPORT_PER_CPU_SYMBOL(const_current_task);
-DEFINE_PER_CPU_CACHE_HOT(int, __preempt_count) = INIT_PREEMPT_COUNT;
+DEFINE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count) = INIT_PREEMPT_COUNT;
EXPORT_PER_CPU_SYMBOL(__preempt_count);
DEFINE_PER_CPU_CACHE_HOT(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d57cab4d4c06..77defd9624bf 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -108,15 +108,14 @@ void irq_exit_rcu(void);
do { \
lockdep_off(); \
arch_nmi_enter(); \
- BUG_ON(in_nmi() == NMI_MASK); \
- __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
+ __preempt_count_nmi_enter(); \
} while (0)
#define nmi_enter() \
do { \
__nmi_enter(); \
lockdep_hardirq_enter(); \
- ct_nmi_enter(); \
+ ct_nmi_enter(); \
instrumentation_begin(); \
ftrace_nmi_enter(); \
instrumentation_end(); \
@@ -125,7 +124,7 @@ void irq_exit_rcu(void);
#define __nmi_exit() \
do { \
BUG_ON(!in_nmi()); \
- __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
+ __preempt_count_nmi_exit(); \
arch_nmi_exit(); \
lockdep_on(); \
} while (0)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index d964f965c8ff..7617ca97f442 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -17,6 +17,9 @@
*
* - bits 0-7 are the preemption count (max preemption depth: 256)
* - bits 8-15 are the softirq count (max # of softirqs: 256)
+ * - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
+ * - bits 24-27 are the hardirq count (max # of hardirqs: 16)
+ * - bit 28 is the NMI flag (no nesting count, tracked separately)
*
* The hardirq count could in theory be the same as the number of
* interrupts in the system, but we run all interrupt handlers with
@@ -24,31 +27,41 @@
* there are a few palaeontologic drivers which reenable interrupts in
* the handler, so we need more than one bit here.
*
- * PREEMPT_MASK: 0x000000ff
- * SOFTIRQ_MASK: 0x0000ff00
- * HARDIRQ_MASK: 0x000f0000
- * NMI_MASK: 0x00f00000
- * PREEMPT_NEED_RESCHED: 0x80000000
+ * NMI nesting depth is tracked in a separate per-CPU variable
+ * (nmi_nesting) to save bits in preempt_count.
+ *
+ * 32bit 64bit + PREEMPT_LONG
+ *
+ * PREEMPT_MASK: 0x000000ff 0x00000000000000ff
+ * SOFTIRQ_MASK: 0x0000ff00 0x000000000000ff00
+ * HARDIRQ_DISABLE_MASK: 0x00ff0000 0x0000000000ff0000
+ * HARDIRQ_MASK: 0x0f000000 0x000000000f000000
+ * NMI_MASK: 0x10000000 0x00000000f0000000
+ * PREEMPT_NEED_RESCHED: 0x80000000 0x8000000000000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
+#define HARDIRQ_DISABLE_BITS 8
#define HARDIRQ_BITS 4
-#define NMI_BITS 4
+#define NMI_BITS (1 + 3*IS_ENABLED(CONFIG_PREEMPT_LONG))
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
-#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_DISABLE_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_SHIFT (HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
#define NMI_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
#define __IRQ_MASK(x) ((1UL << (x))-1)
#define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_MASK (__IRQ_MASK(HARDIRQ_DISABLE_BITS) << HARDIRQ_DISABLE_SHIFT)
#define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
#define NMI_MASK (__IRQ_MASK(NMI_BITS) << NMI_SHIFT)
#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_OFFSET (1UL << HARDIRQ_DISABLE_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET (1UL << NMI_SHIFT)
@@ -105,8 +118,8 @@ static __always_inline unsigned char interrupt_context_level(void)
* preempt_count() is commonly implemented with READ_ONCE().
*/
-#define nmi_count() (preempt_count() & NMI_MASK)
-#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
+#define nmi_count() (preempt_count() & NMI_MASK)
+#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#ifdef CONFIG_PREEMPT_RT
# define softirq_count() (current->softirq_disable_cnt & SOFTIRQ_MASK)
# define irq_count() ((preempt_count() & (NMI_MASK | HARDIRQ_MASK)) | softirq_count())
@@ -132,6 +145,27 @@ static __always_inline unsigned char interrupt_context_level(void)
# define in_task() (!(preempt_count() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
#endif
+#ifndef CONFIG_PREEMPT_LONG
+DECLARE_PER_CPU(unsigned int, nmi_nesting);
+
+#define __preempt_count_nmi_enter() \
+ do { \
+ unsigned int _o = NMI_MASK + HARDIRQ_OFFSET; \
+ __this_cpu_inc(nmi_nesting); \
+ _o -= (preempt_count() & NMI_MASK); \
+ __preempt_count_add(_o); \
+ } while (0)
+
+#define __preempt_count_nmi_exit() \
+ do { \
+ unsigned int _o = HARDIRQ_OFFSET; \
+ if (!__this_cpu_dec_return(nmi_nesting)) \
+ _o += NMI_MASK; \
+ __preempt_count_sub(_o); \
+ } while (0)
+
+#endif
+
/*
* The following macros are deprecated and should not be used in new code:
* in_softirq() - We have BH disabled, or are processing softirqs
diff --git a/init/main.c b/init/main.c
index b84818ad9685..f8f4b78b7a06 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1367,7 +1367,7 @@ static inline void do_trace_initcall_level(const char *level)
int __init_or_module do_one_initcall(initcall_t fn)
{
- int count = preempt_count();
+ long count = preempt_count();
char msgbuf[64];
int ret;
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 88c594c6d7fc..2ad9365915eb 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -122,6 +122,10 @@ config PREEMPT_RT_NEEDS_BH_LOCK
config PREEMPT_COUNT
bool
+config PREEMPT_LONG
+ bool
+ depends on PREEMPT_COUNT && 64BIT
+
config PREEMPTION
bool
select PREEMPT_COUNT
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b411e4feff7f..f54dd3cb66f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5709,7 +5709,7 @@ static inline void sched_tick_stop(int cpu) { }
* If the value passed in is equal to the current preempt count
* then we just disabled preemption. Start timing the latency.
*/
-static inline void preempt_latency_start(int val)
+static inline void preempt_latency_start(long val)
{
if (preempt_count() == val) {
unsigned long ip = get_lock_parent_ip();
@@ -5746,7 +5746,7 @@ NOKPROBE_SYMBOL(preempt_count_add);
* If the value passed in equals to the current preempt count
* then we just enabled preemption. Stop timing the latency.
*/
-static inline void preempt_latency_stop(int val)
+static inline void preempt_latency_stop(long val)
{
if (preempt_count() == val)
trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
@@ -8774,7 +8774,7 @@ void __might_sleep(const char *file, int line)
}
EXPORT_SYMBOL(__might_sleep);
-static void print_preempt_disable_ip(int preempt_offset, unsigned long ip)
+static void print_preempt_disable_ip(long preempt_offset, unsigned long ip)
{
if (!IS_ENABLED(CONFIG_DEBUG_PREEMPT))
return;
@@ -8846,7 +8846,7 @@ void __might_resched(const char *file, int line, unsigned int offsets)
}
EXPORT_SYMBOL(__might_resched);
-void __cant_sleep(const char *file, int line, int preempt_offset)
+void __cant_sleep(const char *file, int line, long preempt_offset)
{
static unsigned long prev_jiffy;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 77198911b8dd..51a7f391edab 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,14 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
#endif
+#ifndef CONFIG_PREEMPT_LONG
+/*
+ * Any 32bit architecture that still cares about performance should
+ * probably ensure this is near preempt_count.
+ */
+DEFINE_PER_CPU(unsigned int, nmi_nesting);
+#endif
+
/*
* SOFTIRQ_OFFSET usage:
*
@@ -609,7 +617,7 @@ static void handle_softirqs(bool ksirqd)
while ((softirq_bit = ffs(pending))) {
unsigned int vec_nr;
- int prev_count;
+ long prev_count;
h += softirq_bit - 1;
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 1f2364126894..89c348139218 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1723,7 +1723,7 @@ static void call_timer_fn(struct timer_list *timer,
void (*fn)(struct timer_list *),
unsigned long baseclk)
{
- int count = preempt_count();
+ long count = preempt_count();
#ifdef CONFIG_LOCKDEP
/*
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index d939403331b5..8fd216bd0be6 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1429,7 +1429,7 @@ static int unexpected_testcase_failures;
static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
{
- int saved_preempt_count = preempt_count();
+ long saved_preempt_count = preempt_count();
#ifdef CONFIG_PREEMPT_RT
#ifdef CONFIG_SMP
int saved_mgd_count = current->migration_disabled;
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-04 11:12 ` Peter Zijlstra
@ 2026-02-04 12:32 ` Gary Guo
2026-02-04 13:00 ` Peter Zijlstra
2026-02-05 21:40 ` Boqun Feng
2026-02-05 22:07 ` Boqun Feng
2 siblings, 1 reply; 47+ messages in thread
From: Gary Guo @ 2026-02-04 12:32 UTC (permalink / raw)
To: Peter Zijlstra, Lyude Paul
Cc: rust-for-linux, linux-kernel, Thomas Gleixner, Boqun Feng,
Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Wed Feb 4, 2026 at 11:12 AM GMT, Peter Zijlstra wrote:
> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
>> But I'm really somewhat sad that 64bit can't do better than this.
>
> Here, the below builds and boots (albeit with warnings because printf
> format crap sucks).
Hi Peter,
I am not sure it's worth the complexity to do this for the NMI code path.
I don't think the NMI code path is hot enough for this to be necessary?
Best,
Gary
>
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/preempt.h | 53 ++++++++++++++++++++++++++++++------------
> arch/x86/kernel/cpu/common.c | 2 +-
> include/linux/hardirq.h | 7 +++---
> include/linux/preempt.h | 52 ++++++++++++++++++++++++++++++++++-------
> init/main.c | 2 +-
> kernel/Kconfig.preempt | 4 ++++
> kernel/sched/core.c | 8 +++----
> kernel/softirq.c | 10 +++++++-
> kernel/time/timer.c | 2 +-
> lib/locking-selftest.c | 2 +-
> 11 files changed, 106 insertions(+), 37 deletions(-)
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-04 12:32 ` Gary Guo
@ 2026-02-04 13:00 ` Peter Zijlstra
0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2026-02-04 13:00 UTC (permalink / raw)
To: Gary Guo
Cc: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner,
Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Wed, Feb 04, 2026 at 12:32:45PM +0000, Gary Guo wrote:
> On Wed Feb 4, 2026 at 11:12 AM GMT, Peter Zijlstra wrote:
> > On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
> >> But I'm really somewhat sad that 64bit can't do better than this.
> >
> > Here, the below builds and boots (albeit with warnings because printf
> > format crap sucks).
>
> Hi Peter,
>
> I am not sure it's worth the complexity to do this for the NMI code path.
> I don't think the NMI code path is hot enough for this to be necessary?
Perf uses NMI. Also, the 64bit code is actually simpler.
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-04 11:12 ` Peter Zijlstra
2026-02-04 12:32 ` Gary Guo
@ 2026-02-05 21:40 ` Boqun Feng
2026-02-05 22:17 ` Joel Fernandes
2026-02-05 22:07 ` Boqun Feng
2 siblings, 1 reply; 47+ messages in thread
From: Boqun Feng @ 2026-02-05 21:40 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner,
Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
> > But I'm really somewhat sad that 64bit can't do better than this.
>
> Here, the below builds and boots (albeit with warnings because printf
> format crap sucks).
>
Thanks! I will drop patches #1 and #2 and use this one (with a commit log
and some more tests). Given it's based on the work of Joel, Lyude and me,
would the following tags make sense to all of you?
Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Co-developed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Regards,
Boqun
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/preempt.h | 53 ++++++++++++++++++++++++++++++------------
> arch/x86/kernel/cpu/common.c | 2 +-
> include/linux/hardirq.h | 7 +++---
> include/linux/preempt.h | 52 ++++++++++++++++++++++++++++++++++-------
> init/main.c | 2 +-
> kernel/Kconfig.preempt | 4 ++++
> kernel/sched/core.c | 8 +++----
> kernel/softirq.c | 10 +++++++-
> kernel/time/timer.c | 2 +-
> lib/locking-selftest.c | 2 +-
> 11 files changed, 106 insertions(+), 37 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 80527299f859..2bd1972fd4c7 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -326,6 +326,7 @@ config X86
> select USER_STACKTRACE_SUPPORT
> select HAVE_ARCH_KCSAN if X86_64
> select PROC_PID_ARCH_STATUS if PROC_FS
> + select PREEMPT_LONG if X86_64
> select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
> select FUNCTION_ALIGNMENT_16B if X86_64 || X86_ALIGNMENT_16
> select FUNCTION_ALIGNMENT_4B
> diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
> index 578441db09f0..1b54d5555138 100644
> --- a/arch/x86/include/asm/preempt.h
> +++ b/arch/x86/include/asm/preempt.h
> @@ -7,10 +7,19 @@
>
> #include <linux/static_call_types.h>
>
> -DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
> +DECLARE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count);
>
> -/* We use the MSB mostly because its available */
> -#define PREEMPT_NEED_RESCHED 0x80000000
> +/*
> + * We use the MSB for PREEMPT_NEED_RESCHED mostly because it is available.
> + */
> +
> +#ifdef CONFIG_64BIT
> +#define PREEMPT_NEED_RESCHED (~((-1L) >> 1))
> +#define __pc_op(op, ...) raw_cpu_##op##_8(__VA_ARGS__)
> +#else
> +#define PREEMPT_NEED_RESCHED (~((-1) >> 1))
> +#define __pc_op(op, ...) raw_cpu_##op##_4(__VA_ARGS__)
> +#endif
>
> /*
> * We use the PREEMPT_NEED_RESCHED bit as an inverted NEED_RESCHED such
> @@ -24,18 +33,18 @@ DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
> */
> static __always_inline int preempt_count(void)
> {
> - return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
> + return __pc_op(read, __preempt_count) & ~PREEMPT_NEED_RESCHED;
> }
>
> -static __always_inline void preempt_count_set(int pc)
> +static __always_inline void preempt_count_set(long pc)
> {
> int old, new;
>
> - old = raw_cpu_read_4(__preempt_count);
> + old = __pc_op(read, __preempt_count);
> do {
> new = (old & PREEMPT_NEED_RESCHED) |
> (pc & ~PREEMPT_NEED_RESCHED);
> - } while (!raw_cpu_try_cmpxchg_4(__preempt_count, &old, new));
> + } while (!__pc_op(try_cmpxchg, __preempt_count, &old, new));
> }
>
> /*
> @@ -58,33 +67,45 @@ static __always_inline void preempt_count_set(int pc)
>
> static __always_inline void set_preempt_need_resched(void)
> {
> - raw_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
> + __pc_op(and, __preempt_count, ~PREEMPT_NEED_RESCHED);
> }
>
> static __always_inline void clear_preempt_need_resched(void)
> {
> - raw_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
> + __pc_op(or, __preempt_count, PREEMPT_NEED_RESCHED);
> }
>
> static __always_inline bool test_preempt_need_resched(void)
> {
> - return !(raw_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
> + return !(__pc_op(read, __preempt_count) & PREEMPT_NEED_RESCHED);
> }
>
> /*
> * The various preempt_count add/sub methods
> */
>
> -static __always_inline void __preempt_count_add(int val)
> +static __always_inline void __preempt_count_add(long val)
> {
> - raw_cpu_add_4(__preempt_count, val);
> + __pc_op(add, __preempt_count, val);
> }
>
> -static __always_inline void __preempt_count_sub(int val)
> +static __always_inline void __preempt_count_sub(long val)
> {
> - raw_cpu_add_4(__preempt_count, -val);
> + __pc_op(add, __preempt_count, -val);
> }
>
> +#ifdef CONFIG_64BIT
> +static __always_inline void __preempt_count_nmi_enter(void)
> +{
> + __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);
> +}
> +
> +static __always_inline void __preempt_count_nmi_exit(void)
> +{
> + __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);
> +}
> +#endif
> +
> /*
> * Because we keep PREEMPT_NEED_RESCHED set when we do _not_ need to reschedule
> * a decrement which hits zero means we have no preempt_count and should
> @@ -101,7 +122,7 @@ static __always_inline bool __preempt_count_dec_and_test(void)
> */
> static __always_inline bool should_resched(int preempt_offset)
> {
> - return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
> + return unlikely(__pc_op(read, __preempt_count) == preempt_offset);
> }
>
> #ifdef CONFIG_PREEMPTION
> @@ -148,4 +169,6 @@ do { \
>
> #endif /* PREEMPTION */
>
> +#undef __pc_op
> +
> #endif /* __ASM_PREEMPT_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e7ab22fce3b5..9d3602f085c9 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2219,7 +2219,7 @@ DEFINE_PER_CPU_CACHE_HOT(struct task_struct *, current_task) = &init_task;
> EXPORT_PER_CPU_SYMBOL(current_task);
> EXPORT_PER_CPU_SYMBOL(const_current_task);
>
> -DEFINE_PER_CPU_CACHE_HOT(int, __preempt_count) = INIT_PREEMPT_COUNT;
> +DEFINE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count) = INIT_PREEMPT_COUNT;
> EXPORT_PER_CPU_SYMBOL(__preempt_count);
>
> DEFINE_PER_CPU_CACHE_HOT(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index d57cab4d4c06..77defd9624bf 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -108,15 +108,14 @@ void irq_exit_rcu(void);
> do { \
> lockdep_off(); \
> arch_nmi_enter(); \
> - BUG_ON(in_nmi() == NMI_MASK); \
> - __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
> + __preempt_count_nmi_enter(); \
> } while (0)
>
> #define nmi_enter() \
> do { \
> __nmi_enter(); \
> lockdep_hardirq_enter(); \
> - ct_nmi_enter(); \
> + ct_nmi_enter(); \
> instrumentation_begin(); \
> ftrace_nmi_enter(); \
> instrumentation_end(); \
> @@ -125,7 +124,7 @@ void irq_exit_rcu(void);
> #define __nmi_exit() \
> do { \
> BUG_ON(!in_nmi()); \
> - __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
> + __preempt_count_nmi_exit(); \
> arch_nmi_exit(); \
> lockdep_on(); \
> } while (0)
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index d964f965c8ff..7617ca97f442 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -17,6 +17,9 @@
> *
> * - bits 0-7 are the preemption count (max preemption depth: 256)
> * - bits 8-15 are the softirq count (max # of softirqs: 256)
> + * - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
> + * - bits 24-27 are the hardirq count (max # of hardirqs: 16)
> + * - bit 28 is the NMI flag (no nesting count, tracked separately)
> *
> * The hardirq count could in theory be the same as the number of
> * interrupts in the system, but we run all interrupt handlers with
> @@ -24,31 +27,41 @@
> * there are a few palaeontologic drivers which reenable interrupts in
> * the handler, so we need more than one bit here.
> *
> - * PREEMPT_MASK: 0x000000ff
> - * SOFTIRQ_MASK: 0x0000ff00
> - * HARDIRQ_MASK: 0x000f0000
> - * NMI_MASK: 0x00f00000
> - * PREEMPT_NEED_RESCHED: 0x80000000
> + * NMI nesting depth is tracked in a separate per-CPU variable
> + * (nmi_nesting) to save bits in preempt_count.
> + *
> + * 32bit 64bit + PREEMPT_LONG
> + *
> + * PREEMPT_MASK: 0x000000ff 0x00000000000000ff
> + * SOFTIRQ_MASK: 0x0000ff00 0x000000000000ff00
> + * HARDIRQ_DISABLE_MASK: 0x00ff0000 0x0000000000ff0000
> + * HARDIRQ_MASK: 0x0f000000 0x000000000f000000
> + * NMI_MASK: 0x10000000 0x00000000f0000000
> + * PREEMPT_NEED_RESCHED: 0x80000000 0x8000000000000000
> */
> #define PREEMPT_BITS 8
> #define SOFTIRQ_BITS 8
> +#define HARDIRQ_DISABLE_BITS 8
> #define HARDIRQ_BITS 4
> -#define NMI_BITS 4
> +#define NMI_BITS (1 + 3*IS_ENABLED(CONFIG_PREEMPT_LONG))
>
> #define PREEMPT_SHIFT 0
> #define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
> -#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
> +#define HARDIRQ_DISABLE_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
> +#define HARDIRQ_SHIFT (HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
> #define NMI_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
>
> #define __IRQ_MASK(x) ((1UL << (x))-1)
>
> #define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
> #define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
> +#define HARDIRQ_DISABLE_MASK (__IRQ_MASK(HARDIRQ_DISABLE_BITS) << HARDIRQ_DISABLE_SHIFT)
> #define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
> #define NMI_MASK (__IRQ_MASK(NMI_BITS) << NMI_SHIFT)
>
> #define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
> #define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
> +#define HARDIRQ_DISABLE_OFFSET (1UL << HARDIRQ_DISABLE_SHIFT)
> #define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
> #define NMI_OFFSET (1UL << NMI_SHIFT)
>
> @@ -105,8 +118,8 @@ static __always_inline unsigned char interrupt_context_level(void)
> * preempt_count() is commonly implemented with READ_ONCE().
> */
>
> -#define nmi_count() (preempt_count() & NMI_MASK)
> -#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
> +#define nmi_count() (preempt_count() & NMI_MASK)
> +#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
> #ifdef CONFIG_PREEMPT_RT
> # define softirq_count() (current->softirq_disable_cnt & SOFTIRQ_MASK)
> # define irq_count() ((preempt_count() & (NMI_MASK | HARDIRQ_MASK)) | softirq_count())
> @@ -132,6 +145,27 @@ static __always_inline unsigned char interrupt_context_level(void)
> # define in_task() (!(preempt_count() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> #endif
>
> +#ifndef CONFIG_PREEMPT_LONG
> +DECLARE_PER_CPU(unsigned int, nmi_nesting);
> +
> +#define __preempt_count_nmi_enter() \
> + do { \
> + unsigned int _o = NMI_MASK + HARDIRQ_OFFSET; \
> + __this_cpu_inc(nmi_nesting); \
> + _o -= (preempt_count() & NMI_MASK); \
> + __preempt_count_add(_o); \
> + } while (0)
> +
> +#define __preempt_count_nmi_exit() \
> + do { \
> + unsigned int _o = HARDIRQ_OFFSET; \
> + if (!__this_cpu_dec_return(nmi_nesting)) \
> + _o += NMI_MASK; \
> + __preempt_count_sub(_o); \
> + } while (0)
> +
> +#endif
> +
> /*
> * The following macros are deprecated and should not be used in new code:
> * in_softirq() - We have BH disabled, or are processing softirqs
> diff --git a/init/main.c b/init/main.c
> index b84818ad9685..f8f4b78b7a06 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1367,7 +1367,7 @@ static inline void do_trace_initcall_level(const char *level)
>
> int __init_or_module do_one_initcall(initcall_t fn)
> {
> - int count = preempt_count();
> + long count = preempt_count();
> char msgbuf[64];
> int ret;
>
> diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
> index 88c594c6d7fc..2ad9365915eb 100644
> --- a/kernel/Kconfig.preempt
> +++ b/kernel/Kconfig.preempt
> @@ -122,6 +122,10 @@ config PREEMPT_RT_NEEDS_BH_LOCK
> config PREEMPT_COUNT
> bool
>
> +config PREEMPT_LONG
> + bool
> + depends on PREEMPT_COUNT && 64BIT
> +
> config PREEMPTION
> bool
> select PREEMPT_COUNT
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b411e4feff7f..f54dd3cb66f2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5709,7 +5709,7 @@ static inline void sched_tick_stop(int cpu) { }
> * If the value passed in is equal to the current preempt count
> * then we just disabled preemption. Start timing the latency.
> */
> -static inline void preempt_latency_start(int val)
> +static inline void preempt_latency_start(long val)
> {
> if (preempt_count() == val) {
> unsigned long ip = get_lock_parent_ip();
> @@ -5746,7 +5746,7 @@ NOKPROBE_SYMBOL(preempt_count_add);
> * If the value passed in equals to the current preempt count
> * then we just enabled preemption. Stop timing the latency.
> */
> -static inline void preempt_latency_stop(int val)
> +static inline void preempt_latency_stop(long val)
> {
> if (preempt_count() == val)
> trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
> @@ -8774,7 +8774,7 @@ void __might_sleep(const char *file, int line)
> }
> EXPORT_SYMBOL(__might_sleep);
>
> -static void print_preempt_disable_ip(int preempt_offset, unsigned long ip)
> +static void print_preempt_disable_ip(long preempt_offset, unsigned long ip)
> {
> if (!IS_ENABLED(CONFIG_DEBUG_PREEMPT))
> return;
> @@ -8846,7 +8846,7 @@ void __might_resched(const char *file, int line, unsigned int offsets)
> }
> EXPORT_SYMBOL(__might_resched);
>
> -void __cant_sleep(const char *file, int line, int preempt_offset)
> +void __cant_sleep(const char *file, int line, long preempt_offset)
> {
> static unsigned long prev_jiffy;
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 77198911b8dd..51a7f391edab 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -88,6 +88,14 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
> EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
> #endif
>
> +#ifndef CONFIG_PREEMPT_LONG
> +/*
> + * Any 32bit architecture that still cares about performance should
> + * probably ensure this is near preempt_count.
> + */
> +DEFINE_PER_CPU(unsigned int, nmi_nesting);
> +#endif
> +
> /*
> * SOFTIRQ_OFFSET usage:
> *
> @@ -609,7 +617,7 @@ static void handle_softirqs(bool ksirqd)
>
> while ((softirq_bit = ffs(pending))) {
> unsigned int vec_nr;
> - int prev_count;
> + long prev_count;
>
> h += softirq_bit - 1;
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 1f2364126894..89c348139218 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1723,7 +1723,7 @@ static void call_timer_fn(struct timer_list *timer,
> void (*fn)(struct timer_list *),
> unsigned long baseclk)
> {
> - int count = preempt_count();
> + long count = preempt_count();
>
> #ifdef CONFIG_LOCKDEP
> /*
> diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
> index d939403331b5..8fd216bd0be6 100644
> --- a/lib/locking-selftest.c
> +++ b/lib/locking-selftest.c
> @@ -1429,7 +1429,7 @@ static int unexpected_testcase_failures;
>
> static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
> {
> - int saved_preempt_count = preempt_count();
> + long saved_preempt_count = preempt_count();
> #ifdef CONFIG_PREEMPT_RT
> #ifdef CONFIG_SMP
> int saved_mgd_count = current->migration_disabled;
^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-05 21:40 ` Boqun Feng
@ 2026-02-05 22:17 ` Joel Fernandes
2026-02-06 0:50 ` Joel Fernandes
0 siblings, 1 reply; 47+ messages in thread
From: Joel Fernandes @ 2026-02-05 22:17 UTC (permalink / raw)
To: Boqun Feng, Peter Zijlstra
Cc: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner,
Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long
On 2/5/2026 4:40 PM, Boqun Feng wrote:
> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
>> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
>>> But I'm really somewhat sad that 64bit can't do better than this.
>>
>> Here, the below builds and boots (albeit with warnings because printf
>> format crap sucks).
>>
>
> Thanks! I will drop patch #1 and #2 and use this one (with a commit log
> and some more tests), given it's based on the work of Joel, Lyude and
> me, would the following tags make sense to all of you?
> > Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
I don't know, I am not a big fan of the alternative patch because it adds a
per-cpu counter anyway if !CONFIG_PREEMPT_LONG [1]. And it is also a much bigger
patch than the one I wrote. Purely from an objective perspective, I would still
want to keep my original patch because it is simple. What is really the
objection to it?
[1]
+#ifndef CONFIG_PREEMPT_LONG
+/*
+ * Any 32bit architecture that still cares about performance should
+ * probably ensure this is near preempt_count.
+ */
+DEFINE_PER_CPU(unsigned int, nmi_nesting);
+#endif
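[Editor's note: the scheme being discussed, where preempt_count keeps only a
single NMI flag bit while the real nesting depth lives in the separate
nmi_nesting counter, can be sketched as a small user-space model. All names
here (the `_model` functions, `MODEL_*` macros) are hypothetical; the kernel
versions use raw per-CPU operations.]

```c
#include <assert.h>

#define MODEL_HARDIRQ_OFFSET	0x01000000u	/* 1 << HARDIRQ_SHIFT (24) */
#define MODEL_NMI_MASK		0x10000000u	/* 1 << NMI_SHIFT (28) */

static unsigned int preempt_count;	/* stands in for the per-CPU count */
static unsigned int nmi_nesting;	/* stands in for per-CPU nmi_nesting */

static void nmi_enter_model(void)
{
	unsigned int o = MODEL_NMI_MASK + MODEL_HARDIRQ_OFFSET;

	nmi_nesting++;
	/* If the flag is already set (nested NMI), don't add it again. */
	o -= preempt_count & MODEL_NMI_MASK;
	preempt_count += o;
}

static void nmi_exit_model(void)
{
	unsigned int o = MODEL_HARDIRQ_OFFSET;

	/* Only the outermost exit clears the flag bit. */
	if (!--nmi_nesting)
		o += MODEL_NMI_MASK;
	preempt_count -= o;
}
```

The flag bit is added once on the outermost entry and removed once on the
outermost exit, so arbitrarily nested NMIs never overflow the single
NMI_MASK bit.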
Thanks,
--
Joel Fernandes
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-05 22:17 ` Joel Fernandes
@ 2026-02-06 0:50 ` Joel Fernandes
2026-02-06 1:14 ` Boqun Feng
0 siblings, 1 reply; 47+ messages in thread
From: Joel Fernandes @ 2026-02-06 0:50 UTC (permalink / raw)
To: Boqun Feng, Peter Zijlstra
Cc: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner,
Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long
On 2/5/2026 5:17 PM, Joel Fernandes wrote:
>
>
> On 2/5/2026 4:40 PM, Boqun Feng wrote:
>> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
>>> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
>>>> But I'm really somewhat sad that 64bit can't do better than this.
>>>
>>> Here, the below builds and boots (albeit with warnings because printf
>>> format crap sucks).
>>>
>>
>> Thanks! I will drop patch #1 and #2 and use this one (with a commit log
>> and some more tests), given it's based on the work of Joel, Lyude and
>> me, would the following tags make sense to all of you?
>>> Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> I don't know, I am not a big fan of the alternative patch because it adds a
> per-cpu counter anyway if !CONFIG_PREEMPT_LONG [1]. And it is also a much bigger
> patch than the one I wrote. Purely from an objective perspective, I would still
> want to keep my original patch because it is simple. What is really the
> objection to it?
>
> [1]
> +#ifndef CONFIG_PREEMPT_LONG
> +/*
> + * Any 32bit architecture that still cares about performance should
> + * probably ensure this is near preempt_count.
> + */
> +DEFINE_PER_CPU(unsigned int, nmi_nesting);
> +#endif
>
If the objection to my patch is modifying a per-cpu counter, isn't NMI a slow
path? If we agree, then keeping things simple is better IMO unless we have data
showing that it is an issue. This code is already quite convoluted; let us
not convolute it further with 32-bit-specific things.
I had tried moving it to DEFINE_PER_CPU_CACHE_HOT, but ISTR that did not work
out (I think something about a limit to how many things could be moved to cache
hot).
Happy to revise patch again with any other suggestions,
--
Joel Fernandes
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 0:50 ` Joel Fernandes
@ 2026-02-06 1:14 ` Boqun Feng
2026-02-06 1:24 ` Joel Fernandes
0 siblings, 1 reply; 47+ messages in thread
From: Boqun Feng @ 2026-02-06 1:14 UTC (permalink / raw)
To: Joel Fernandes
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On Thu, Feb 05, 2026 at 07:50:03PM -0500, Joel Fernandes wrote:
>
>
> On 2/5/2026 5:17 PM, Joel Fernandes wrote:
> >
> >
> > On 2/5/2026 4:40 PM, Boqun Feng wrote:
> >> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
> >>> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
> >>>> But I'm really somewhat sad that 64bit can't do better than this.
> >>>
> >>> Here, the below builds and boots (albeit with warnings because printf
> >>> format crap sucks).
> >>>
> >>
> >> Thanks! I will drop patch #1 and #2 and use this one (with a commit log
> >> and some more tests), given it's based on the work of Joel, Lyude and
> >> me, would the following tags make sense to all of you?
> >>> Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
> >
> > I don't know, I am not a big fan of the alternative patch because it adds a
> > per-cpu counter anyway if !CONFIG_PREEMPT_LONG [1]. And it is also a much bigger
> > patch than the one I wrote. Purely from an objective perspective, I would still
> > want to keep my original patch because it is simple. What is really the
> > objection to it?
> >
PREEMPT_LONG is an architecture-specific way to improve the performance
IMO. Just to be clear, do you object it at all, or do you object
combining it with your original patch? If it's the latter, I could make
another patch as a follow to enable PREEMPT_LONG.
> > [1]
> > +#ifndef CONFIG_PREEMPT_LONG
> > +/*
> > + * Any 32bit architecture that still cares about performance should
> > + * probably ensure this is near preempt_count.
> > + */
> > +DEFINE_PER_CPU(unsigned int, nmi_nesting);
> > +#endif
> >
> If the objection to my patch is modifying a per-cpu counter, isn't NMI a slow
> path? If we agree, then keeping things simple is better IMO unless we have data
I guess Peter was trying to say it's not a slow path if you consider
perf event interrupts on x86? [1]
> showing that it is an issue. This is code is already quite convoluted, let us
> not convolute it more with 32-bit specific things.
>
> I had tried moving it to DEFINE_PER_CPU_CACHE_HOT, but ISTR that did not work
> out (I think something about a limit to how many things could be moved to cache
> hot).
>
> Happy to revise patch again with any other suggestions,
>
[1]: https://lore.kernel.org/rust-for-linux/20260204130027.GE3016024@noisy.programming.kicks-ass.net/
Regards,
Boqun
> --
> Joel Fernandes
>
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 1:14 ` Boqun Feng
@ 2026-02-06 1:24 ` Joel Fernandes
2026-02-06 2:51 ` Boqun Feng
2026-02-06 8:42 ` Peter Zijlstra
0 siblings, 2 replies; 47+ messages in thread
From: Joel Fernandes @ 2026-02-06 1:24 UTC (permalink / raw)
To: Boqun Feng
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On 2/5/2026 8:14 PM, Boqun Feng wrote:
> On Thu, Feb 05, 2026 at 07:50:03PM -0500, Joel Fernandes wrote:
>>
>>
>> On 2/5/2026 5:17 PM, Joel Fernandes wrote:
>>>
>>>
>>> On 2/5/2026 4:40 PM, Boqun Feng wrote:
>>>> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
>>>>> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
>>>>>> But I'm really somewhat sad that 64bit can't do better than this.
>>>>>
>>>>> Here, the below builds and boots (albeit with warnings because printf
>>>>> format crap sucks).
>>>>>
>>>>
>>>> Thanks! I will drop patch #1 and #2 and use this one (with a commit log
>>>> and some more tests), given it's based on the work of Joel, Lyude and
>>>> me, would the following tags make sense to all of you?
>>>>> Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
>>>
>>> I don't know, I am not a big fan of the alternative patch because it adds a
>>> per-cpu counter anyway if !CONFIG_PREEMPT_LONG [1]. And it is also a much bigger
>>> patch than the one I wrote. Purely from an objective perspective, I would still
>>> want to keep my original patch because it is simple. What is really the
>>> objection to it?
>>>
>
> PREEMPT_LONG is an architecture-specific way to improve the performance
> IMO. Just to be clear, do you object it at all, or do you object
> combining it with your original patch? If it's the latter, I could make
> another patch as a follow to enable PREEMPT_LONG.
When I looked at the alternative patch, I did consider that it was
overcomplicated and should be justified. Otherwise, I don't object to it. It
seems to be a matter of preference, I think. I would prefer a simpler fix over
an overcomplicated fix for a hypothetical issue (unless we have data showing an
issue). If it were a few lines of change, that'd be a different story.
>
>>> [1]
>>> +#ifndef CONFIG_PREEMPT_LONG
>>> +/*
>>> + * Any 32bit architecture that still cares about performance should
>>> + * probably ensure this is near preempt_count.
>>> + */
>>> +DEFINE_PER_CPU(unsigned int, nmi_nesting);
>>> +#endif
>>>
>> If the objection to my patch is modifying a per-cpu counter, isn't NMI a slow
>> path? If we agree, then keeping things simple is better IMO unless we have data
>
> I guess Peter was trying to say it's not a slow path if you consider
> perf event interrupts on x86? [1]
How are we handling this performance issue then on 32-bit x86 architecture with
perf? Or are we saying we don't care about performance on 32-bit?
--
Joel Fernandes
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 1:24 ` Joel Fernandes
@ 2026-02-06 2:51 ` Boqun Feng
2026-02-06 8:13 ` Joel Fernandes
2026-02-06 8:42 ` Peter Zijlstra
1 sibling, 1 reply; 47+ messages in thread
From: Boqun Feng @ 2026-02-06 2:51 UTC (permalink / raw)
To: Joel Fernandes
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On Thu, Feb 05, 2026 at 08:24:40PM -0500, Joel Fernandes wrote:
>
>
> On 2/5/2026 8:14 PM, Boqun Feng wrote:
> > On Thu, Feb 05, 2026 at 07:50:03PM -0500, Joel Fernandes wrote:
> >>
> >>
> >> On 2/5/2026 5:17 PM, Joel Fernandes wrote:
> >>>
> >>>
> >>> On 2/5/2026 4:40 PM, Boqun Feng wrote:
> >>>> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
> >>>>> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
> >>>>>> But I'm really somewhat sad that 64bit can't do better than this.
> >>>>>
> >>>>> Here, the below builds and boots (albeit with warnings because printf
> >>>>> format crap sucks).
> >>>>>
> >>>>
> >>>> Thanks! I will drop patch #1 and #2 and use this one (with a commit log
> >>>> and some more tests), given it's based on the work of Joel, Lyude and
> >>>> me, would the following tags make sense to all of you?
> >>>>> Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
> >>>
> >>> I don't know, I am not a big fan of the alternative patch because it adds a
> >>> per-cpu counter anyway if !CONFIG_PREEMPT_LONG [1]. And it is also a much bigger
> >>> patch than the one I wrote. Purely from an objective perspective, I would still
> >>> want to keep my original patch because it is simple. What is really the
> >>> objection to it?
> >>>
> >
> > PREEMPT_LONG is an architecture-specific way to improve the performance
> > IMO. Just to be clear, do you object it at all, or do you object
> > combining it with your original patch? If it's the latter, I could make
> > another patch as a follow to enable PREEMPT_LONG.
>
> When I looked at the alternative patch, I did consider that it was
> overcomplicated and it should be justified. Otherwise, I don't object to it. It
I don't think it's overcomplicated. Note that people have different
goals: for us (you, Lyude and me), we want a safer
interrupt-disabling lock API, hence this patchset. Peter, on the
other hand, while agreeing with us on the necessity, wants to avoid
potential performance loss (and maybe in general also likes the idea of
preempt_count being 64bit on 64bit machines ;-)). That patch looks
"overcomplicated" because it serves both goals (it actually contains
patch #1 and #2 along with the improvement). If you look at them
separately, it is not that complicated (Peter's diff against patches
#1 + #2 would be relatively small).
> seems to be a matter of preference I think. I would prefer a simpler fix than an
> overcomplicated fix for a hypothetical issue (unless we have data showing
> issue). If it was a few lines of change, that'd be different story.
>
> >
> >>> [1]
> >>> +#ifndef CONFIG_PREEMPT_LONG
> >>> +/*
> >>> + * Any 32bit architecture that still cares about performance should
> >>> + * probably ensure this is near preempt_count.
> >>> + */
> >>> +DEFINE_PER_CPU(unsigned int, nmi_nesting);
> >>> +#endif
> >>>
> >> If the objection to my patch is modifying a per-cpu counter, isn't NMI a slow
> >> path? If we agree, then keeping things simple is better IMO unless we have data
> >
> > I guess Peter was trying to say it's not a slow path if you consider
> > perf event interrupts on x86? [1]
>
> How are we handling this performance issue then on 32-bit x86 architecture with
> perf? Or are we saying we don't care about performance on 32-bit?
>
I'm not in a position to answer this (mostly for the second question).
Either we have data proving that the performance gap caused by your
original patch is small enough (if there is any), or it's up to the x86
maintainers.
Regards,
Boqun
> --
> Joel Fernandes
>
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 2:51 ` Boqun Feng
@ 2026-02-06 8:13 ` Joel Fernandes
2026-02-06 15:28 ` Boqun Feng
0 siblings, 1 reply; 47+ messages in thread
From: Joel Fernandes @ 2026-02-06 8:13 UTC (permalink / raw)
To: Boqun Feng
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
> On Feb 5, 2026, at 9:51 PM, Boqun Feng <boqun@kernel.org> wrote:
>
> On Thu, Feb 05, 2026 at 08:24:40PM -0500, Joel Fernandes wrote:
>>
>>
>>> On 2/5/2026 8:14 PM, Boqun Feng wrote:
>>> On Thu, Feb 05, 2026 at 07:50:03PM -0500, Joel Fernandes wrote:
>>>>
>>>>
>>>> On 2/5/2026 5:17 PM, Joel Fernandes wrote:
>>>>>
>>>>>
>>>>> On 2/5/2026 4:40 PM, Boqun Feng wrote:
>>>>>> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
>>>>>>> On Tue, Feb 03, 2026 at 01:15:21PM +0100, Peter Zijlstra wrote:
>>>>>>>> But I'm really somewhat sad that 64bit can't do better than this.
>>>>>>>
>>>>>>> Here, the below builds and boots (albeit with warnings because printf
>>>>>>> format crap sucks).
>>>>>>>
>>>>>>
>>>>>> Thanks! I will drop patch #1 and #2 and use this one (with a commit log
>>>>>> and some more tests), given it's based on the work of Joel, Lyude and
>>>>>> me, would the following tags make sense to all of you?
>>>>>>> Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
>>>>>
>>>>> I don't know, I am not a big fan of the alternative patch because it adds a
>>>>> per-cpu counter anyway if !CONFIG_PREEMPT_LONG [1]. And it is also a much bigger
>>>>> patch than the one I wrote. Purely from an objective perspective, I would still
>>>>> want to keep my original patch because it is simple. What is really the
>>>>> objection to it?
>>>>>
>>>
>>> PREEMPT_LONG is an architecture-specific way to improve the performance
>>> IMO. Just to be clear, do you object it at all, or do you object
>>> combining it with your original patch? If it's the latter, I could make
>>> another patch as a follow to enable PREEMPT_LONG.
>>
>> When I looked at the alternative patch, I did consider that it was
>> overcomplicated and it should be justified. Otherwise, I don't object to it. It
>
> I don't think that's overcomplicated. Note that people have different
> goals, for us (you, Lyude and me), we want to have a safer
> interrupt-disabling lock API, hence this patchset.
I was also coming from the goal of long-term kernel code maintainability. If
we decide to add more preempt count flags in the future, does special-casing
32-bit add even more complexity? (not rhetorical, really asking)
> I think Peter on the
> other hand while agreeing with us on the necessity, but wants to avoid
> potential performance lost (maybe in general also likes the idea of
> preempt_count being 64bit on 64bit machines ;-)) That patch looks
> "overcomplicated" because it contains both goals (it actually contains
> patch #1 and #2 along with the improvement). If you look them
> separately, it would be not that complicated (Peter's diff against patch
> 1 + 2 will be relatively small).
Looking at it further, I think my hesitation is mostly around the extra
config option and the special-casing of 32-bit as mentioned above. But to
answer your other question: if it is decided to go with Peter's
patch, you can use my Co-developed-by tag.
--
Joel Fernandes
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 8:13 ` Joel Fernandes
@ 2026-02-06 15:28 ` Boqun Feng
2026-02-06 16:00 ` Joel Fernandes
0 siblings, 1 reply; 47+ messages in thread
From: Boqun Feng @ 2026-02-06 15:28 UTC (permalink / raw)
To: Joel Fernandes
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On Fri, Feb 06, 2026 at 03:13:40AM -0500, Joel Fernandes wrote:
[..]
> >>>
> >>> PREEMPT_LONG is an architecture-specific way to improve the performance
> >>> IMO. Just to be clear, do you object it at all, or do you object
> >>> combining it with your original patch? If it's the latter, I could make
> >>> another patch as a follow to enable PREEMPT_LONG.
> >>
> >> When I looked at the alternative patch, I did consider that it was
> >> overcomplicated and it should be justified. Otherwise, I don't object to it. It
> >
> > I don't think that's overcomplicated. Note that people have different
> > goals, for us (you, Lyude and me), we want to have a safer
> > interrupt-disabling lock API, hence this patchset.
>
> I was also coming from the goal of long term kernel code maintainability. If
> we decide to have additional preempt count flags in the future, does special
> casing 32 bit add even more complexity? (not rhetorical, really asking)
>
First, given what preempt count is, I don't think that'll happen
frequently. Also I think the reality is that we care about 64bit
performance more than 32bit, in that sense, if this "conditional 32 bit
preempt count case" becomes an issue, the reasonable action to me is
just making all preempt count 64bit (using an irq disabling critical
section for 32bit or plus a special locking), and this would make things
simpler. That's the long term view from me. (Now think about this, the
NMI tracking we proposed in this patch is actually a special case of
that ;-))
> > I think Peter on the
> > other hand while agreeing with us on the necessity, but wants to avoid
> > potential performance lost (maybe in general also likes the idea of
> > preempt_count being 64bit on 64bit machines ;-)) That patch looks
> > "overcomplicated" because it contains both goals (it actually contains
> > patch #1 and #2 along with the improvement). If you look them
> > separately, it would be not that complicated (Peter's diff against patch
> > 1 + 2 will be relatively small).
>
> On further looking, I think my hesitation is mostly around the extra
> config option and special casing of 32 bit as mentioned above. But
> answering your other question, if it is decided to go with Peter's
> patch, you can use my codevelop tag.
>
Thank you! But I realized more things are needed, so we should probably
add PREEMPT_LONG as a follow-up (for example, should_resched() should
take a long instead of an int, and there are the print format issues that
Peter mentioned).
Regards,
Boqun
> --
> Joel Fernandes
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 15:28 ` Boqun Feng
@ 2026-02-06 16:00 ` Joel Fernandes
2026-02-06 16:16 ` Boqun Feng
0 siblings, 1 reply; 47+ messages in thread
From: Joel Fernandes @ 2026-02-06 16:00 UTC (permalink / raw)
To: Boqun Feng
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On 2/6/2026 10:28 AM, Boqun Feng wrote:
>> I was also coming from the goal of long term kernel code maintainability. If
>> we decide to have additional preempt count flags in the future, does special
>> casing 32 bit add even more complexity? (not rhetorical, really asking)
>>
> First, given what preempt count is, I don't think that'll happen
> frequently.
Not sure I buy the argument of not happening frequently. I don't think any of us
have a crystal ball. There are cases in the future that can come up IMO.
> Also I think the reality is that we care about 64bit
> performance more than 32bit, in that sense, if this "conditional 32 bit
> preempt count case" becomes an issue, the reasonable action to me is
> just making all preempt count 64bit
You might be missing something here. You can't make all of preempt count 64 bit,
that's the point, it doesn't work. That's why Peter did what he did to
special-case 32 bit. See:
https://lore.kernel.org/all/20251020204421.GA197647@joelbox2/
That said, I am ok with the approach now that Peter mentions 32-bit x86 is
"deprecated". :-)
--
Joel Fernandes
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 16:00 ` Joel Fernandes
@ 2026-02-06 16:16 ` Boqun Feng
2026-02-07 22:11 ` Joel Fernandes
0 siblings, 1 reply; 47+ messages in thread
From: Boqun Feng @ 2026-02-06 16:16 UTC (permalink / raw)
To: Joel Fernandes
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On Fri, Feb 06, 2026 at 11:00:16AM -0500, Joel Fernandes wrote:
>
>
> On 2/6/2026 10:28 AM, Boqun Feng wrote:
> >> I was also coming from the goal of long term kernel code maintainability. If
> >> we decide to have additional preempt count flags in the future, does special
> >> casing 32 bit add even more complexity? (not rhetorical, really asking)
> >>
> > First, given what preempt count is, I don't think that'll happen
> > frequently.
>
> Not sure I buy the argument of not happening frequently. I don't think any of us
> have a crystal ball. There are cases in the future that can come up IMO.
>
It's just being realistic, and we pretty much use all the bits there.
> > Also I think the reality is that we care about 64bit
> > performance more than 32bit, in that sense, if this "conditional 32 bit
> > preempt count case" becomes an issue, the reasonable action to me is
> > just making all preempt count 64bit
>
> You might be missing something here. You can't make all of preempt count 64 bit,
> that's the point, it doesn't work. That's why Peter did what he did to
> special-case 32 bit. See:
> https://lore.kernel.org/all/20251020204421.GA197647@joelbox2/
>
> That said, I am ok with the approach now that Peter mentions 32-bit x86 is
> "deprecated". :-)
>
Yeah, "can't" is a strong word? ;-) I did say if we care more about
performance on 64bit than 32bit and can afford slowing down 32bit
preemption disabling in the case where "we decide to have additional
preempt count flags", THEN we can make all preempt count 64bit.
Regards,
Boqun
> --
> Joel Fernandes
>
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 16:16 ` Boqun Feng
@ 2026-02-07 22:11 ` Joel Fernandes
0 siblings, 0 replies; 47+ messages in thread
From: Joel Fernandes @ 2026-02-07 22:11 UTC (permalink / raw)
To: Boqun Feng
Cc: Peter Zijlstra, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On Fri, Feb 06, 2026 at 08:16:39AM -0800, Boqun Feng wrote:
[...]
> Yeah, "can't" is a strong word? ;-) I did say if we care more about
> performance on 64bit than 32bit and can afford slowing down 32bit
> preemption disabling in the case where "we decide to have additional
> preempt count flags", THEN we can make all preempt count 64bit.
Ok, hopefully x86 32-bit goes away before we have to maintain/improve
these workarounds in it. :-)
--
Joel Fernandes
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-06 1:24 ` Joel Fernandes
2026-02-06 2:51 ` Boqun Feng
@ 2026-02-06 8:42 ` Peter Zijlstra
1 sibling, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2026-02-06 8:42 UTC (permalink / raw)
To: Joel Fernandes
Cc: Boqun Feng, Lyude Paul, rust-for-linux, linux-kernel,
Thomas Gleixner, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Ingo Molnar, Will Deacon, Waiman Long
On Thu, Feb 05, 2026 at 08:24:40PM -0500, Joel Fernandes wrote:
> > I guess Peter was trying to say it's not a slow path if you consider
> > perf event interrupts on x86? [1]
>
> How are we handling this performance issue then on 32-bit x86 architecture with
> perf? Or are we saying we don't care about performance on 32-bit?
Yeah, in general I don't consider any 32bit architecture performance
critical at this point. It's pure legacy code, to be removed at some
point.
To x86_32 in particular, we make it limp along. It sorta builds and
sorta boots but meh. It doesn't even have most of the speculation fixes.
You really, as in *REALLY* should not be running a x86_32 kernel.
I mean, if you still want to run Linux on your museum grade Pentium-II
processor, don't let me stop you. Just don't expect miracles.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-04 11:12 ` Peter Zijlstra
2026-02-04 12:32 ` Gary Guo
2026-02-05 21:40 ` Boqun Feng
@ 2026-02-05 22:07 ` Boqun Feng
2026-02-06 8:45 ` Peter Zijlstra
2 siblings, 1 reply; 47+ messages in thread
From: Boqun Feng @ 2026-02-05 22:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner,
Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
[...]
> DEFINE_PER_CPU_CACHE_HOT(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index d57cab4d4c06..77defd9624bf 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -108,15 +108,14 @@ void irq_exit_rcu(void);
> do { \
> lockdep_off(); \
> arch_nmi_enter(); \
> - BUG_ON(in_nmi() == NMI_MASK); \
> - __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
> + __preempt_count_nmi_enter(); \
> } while (0)
>
> #define nmi_enter() \
> do { \
> __nmi_enter(); \
> lockdep_hardirq_enter(); \
> - ct_nmi_enter(); \
> + ct_nmi_enter(); \
> instrumentation_begin(); \
> ftrace_nmi_enter(); \
> instrumentation_end(); \
> @@ -125,7 +124,7 @@ void irq_exit_rcu(void);
> #define __nmi_exit() \
> do { \
> BUG_ON(!in_nmi()); \
> - __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
> + __preempt_count_nmi_exit(); \
> arch_nmi_exit(); \
> lockdep_on(); \
> } while (0)
> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index d964f965c8ff..7617ca97f442 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
[...]
> @@ -132,6 +145,27 @@ static __always_inline unsigned char interrupt_context_level(void)
> # define in_task() (!(preempt_count() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> #endif
>
> +#ifndef CONFIG_PREEMPT_LONG
> +DECLARE_PER_CPU(unsigned int, nmi_nesting);
> +
> +#define __preempt_count_nmi_enter() \
> + do { \
> + unsigned int _o = NMI_MASK + HARDIRQ_OFFSET; \
> + __this_cpu_inc(nmi_nesting); \
> + _o -= (preempt_count() & NMI_MASK); \
> + __preempt_count_add(_o); \
> + } while (0)
> +
> +#define __preempt_count_nmi_exit() \
> + do { \
> + unsigned int _o = HARDIRQ_OFFSET; \
> + if (!__this_cpu_dec_return(nmi_nesting)) \
> + _o += NMI_MASK; \
> + __preempt_count_sub(_o); \
> + } while (0)
> +
> +#endif
> +
We need to move it into include/linux/hardirq.h because percpu is not
included in <linux/preempt.h>.
Regards,
Boqun
> /*
> * The following macros are deprecated and should not be used in new code:
> * in_softirq() - We have BH disabled, or are processing softirqs
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter
2026-02-05 22:07 ` Boqun Feng
@ 2026-02-06 8:45 ` Peter Zijlstra
0 siblings, 0 replies; 47+ messages in thread
From: Peter Zijlstra @ 2026-02-06 8:45 UTC (permalink / raw)
To: Boqun Feng
Cc: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner,
Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Ingo Molnar,
Will Deacon, Waiman Long, Joel Fernandes
On Thu, Feb 05, 2026 at 02:07:40PM -0800, Boqun Feng wrote:
> On Wed, Feb 04, 2026 at 12:12:34PM +0100, Peter Zijlstra wrote:
> [...]
> > DEFINE_PER_CPU_CACHE_HOT(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
> > diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> > index d57cab4d4c06..77defd9624bf 100644
> > --- a/include/linux/hardirq.h
> > +++ b/include/linux/hardirq.h
> > @@ -108,15 +108,14 @@ void irq_exit_rcu(void);
> > do { \
> > lockdep_off(); \
> > arch_nmi_enter(); \
> > - BUG_ON(in_nmi() == NMI_MASK); \
> > - __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
> > + __preempt_count_nmi_enter(); \
> > } while (0)
> >
> > #define nmi_enter() \
> > do { \
> > __nmi_enter(); \
> > lockdep_hardirq_enter(); \
> > - ct_nmi_enter(); \
> > + ct_nmi_enter(); \
> > instrumentation_begin(); \
> > ftrace_nmi_enter(); \
> > instrumentation_end(); \
> > @@ -125,7 +124,7 @@ void irq_exit_rcu(void);
> > #define __nmi_exit() \
> > do { \
> > BUG_ON(!in_nmi()); \
> > - __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
> > + __preempt_count_nmi_exit(); \
> > arch_nmi_exit(); \
> > lockdep_on(); \
> > } while (0)
> > diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> > index d964f965c8ff..7617ca97f442 100644
> > --- a/include/linux/preempt.h
> > +++ b/include/linux/preempt.h
> [...]
> > @@ -132,6 +145,27 @@ static __always_inline unsigned char interrupt_context_level(void)
> > # define in_task() (!(preempt_count() & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> > #endif
> >
> > +#ifndef CONFIG_PREEMPT_LONG
> > +DECLARE_PER_CPU(unsigned int, nmi_nesting);
> > +
> > +#define __preempt_count_nmi_enter() \
> > + do { \
> > + unsigned int _o = NMI_MASK + HARDIRQ_OFFSET; \
> > + __this_cpu_inc(nmi_nesting); \
> > + _o -= (preempt_count() & NMI_MASK); \
> > + __preempt_count_add(_o); \
> > + } while (0)
> > +
> > +#define __preempt_count_nmi_exit() \
> > + do { \
> > + unsigned int _o = HARDIRQ_OFFSET; \
> > + if (!__this_cpu_dec_return(nmi_nesting)) \
> > + _o += NMI_MASK; \
> > + __preempt_count_sub(_o); \
> > + } while (0)
> > +
> > +#endif
> > +
>
> We need to move it into include/linux/hardirq.h because percpu is not
> included in <linux/preempt.h>.
That is fine. I also realized you can move the variants from
arch/x86/asm/preempt.h right next to it; they only depend on
PREEMPT_LONG, not anything else, so there is nothing arch-specific to
them.
That avoids them getting duplicated on arm64, s390, etc.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH v17 03/16] preempt: Introduce __preempt_count_{sub, add}_return()
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
2026-01-21 22:39 ` [PATCH v17 01/16] preempt: Introduce HARDIRQ_DISABLE_BITS Lyude Paul
2026-01-21 22:39 ` [PATCH v17 02/16] preempt: Track NMI nesting to separate per-CPU counter Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 04/16] openrisc: Include <linux/cpumask.h> in smp.h Lyude Paul
` (14 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
From: Boqun Feng <boqun.feng@gmail.com>
In order to use preempt_count() to track the interrupt disable
nesting level, __preempt_count_{add,sub}_return() are introduced. As
their names suggest, these primitives return the new value of the
preempt_count() after changing it. The following example shows their
usage in local_interrupt_disable():
// increase the HARDIRQ_DISABLE bit
new_count = __preempt_count_add_return(HARDIRQ_DISABLE_OFFSET);
// if it's the first-time increment, then disable the interrupt
// at hardware level.
if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
local_irq_save(flags);
raw_cpu_write(local_interrupt_disable_state.flags, flags);
}
Having these primitives will avoid a read of preempt_count() after
changing preempt_count() on certain architectures.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
V10:
* Add commit message I forgot
* Rebase against latest pcpu_hot changes
V11:
* Remove CONFIG_PROFILE_ALL_BRANCHES workaround from
__preempt_count_add_return()
arch/arm64/include/asm/preempt.h | 18 ++++++++++++++++++
arch/s390/include/asm/preempt.h | 10 ++++++++++
arch/x86/include/asm/preempt.h | 10 ++++++++++
include/asm-generic/preempt.h | 14 ++++++++++++++
4 files changed, 52 insertions(+)
diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index 932ea4b620428..0dd8221d1bef7 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -55,6 +55,24 @@ static inline void __preempt_count_sub(int val)
WRITE_ONCE(current_thread_info()->preempt.count, pc);
}
+static inline int __preempt_count_add_return(int val)
+{
+ u32 pc = READ_ONCE(current_thread_info()->preempt.count);
+ pc += val;
+ WRITE_ONCE(current_thread_info()->preempt.count, pc);
+
+ return pc;
+}
+
+static inline int __preempt_count_sub_return(int val)
+{
+ u32 pc = READ_ONCE(current_thread_info()->preempt.count);
+ pc -= val;
+ WRITE_ONCE(current_thread_info()->preempt.count, pc);
+
+ return pc;
+}
+
static inline bool __preempt_count_dec_and_test(void)
{
struct thread_info *ti = current_thread_info();
diff --git a/arch/s390/include/asm/preempt.h b/arch/s390/include/asm/preempt.h
index 6ccd033acfe52..5ae366e26c57d 100644
--- a/arch/s390/include/asm/preempt.h
+++ b/arch/s390/include/asm/preempt.h
@@ -98,6 +98,16 @@ static __always_inline bool should_resched(int preempt_offset)
return unlikely(READ_ONCE(get_lowcore()->preempt_count) == preempt_offset);
}
+static __always_inline int __preempt_count_add_return(int val)
+{
+ return val + __atomic_add(val, &get_lowcore()->preempt_count);
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+ return __preempt_count_add_return(-val);
+}
+
#define init_task_preempt_count(p) do { } while (0)
/* Deferred to CPU bringup time */
#define init_idle_preempt_count(p, cpu) do { } while (0)
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 578441db09f0b..1220656f3370b 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -85,6 +85,16 @@ static __always_inline void __preempt_count_sub(int val)
raw_cpu_add_4(__preempt_count, -val);
}
+static __always_inline int __preempt_count_add_return(int val)
+{
+ return raw_cpu_add_return_4(__preempt_count, val);
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+ return raw_cpu_add_return_4(__preempt_count, -val);
+}
+
/*
* Because we keep PREEMPT_NEED_RESCHED set when we do _not_ need to reschedule
* a decrement which hits zero means we have no preempt_count and should
diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
index 51f8f3881523a..c8683c046615d 100644
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -59,6 +59,20 @@ static __always_inline void __preempt_count_sub(int val)
*preempt_count_ptr() -= val;
}
+static __always_inline int __preempt_count_add_return(int val)
+{
+ *preempt_count_ptr() += val;
+
+ return *preempt_count_ptr();
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+ *preempt_count_ptr() -= val;
+
+ return *preempt_count_ptr();
+}
+
static __always_inline bool __preempt_count_dec_and_test(void)
{
/*
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH v17 04/16] openrisc: Include <linux/cpumask.h> in smp.h
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (2 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 03/16] preempt: Introduce __preempt_count_{sub, add}_return() Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 05/16] irq & spin_lock: Add counted interrupt disabling/enabling Lyude Paul
` (13 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long, Stafford Horne
While OpenRISC currently doesn't fail to build upstream, it appears that
including <asm/smp.h> from the right headers is enough to break that -
primarily because OpenRISC's asm/smp.h header doesn't actually provide any
definition for struct cpumask. This means the only reason we aren't
failing to build the kernel is that we've been lucky enough that every spot
including asm/smp.h already has definitions for struct cpumask pulled in.
This became evident when trying to work on a patch series for adding
ref-counted interrupt enable/disables to the kernel, where introducing a
new interrupt_rc.h header suddenly introduced a build error on OpenRISC:
In file included from include/linux/interrupt_rc.h:17,
from include/linux/spinlock.h:60,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:7,
from include/linux/mm.h:7,
from arch/openrisc/include/asm/pgalloc.h:20,
from arch/openrisc/include/asm/io.h:18,
from include/linux/io.h:12,
from drivers/irqchip/irq-ompic.c:61:
arch/openrisc/include/asm/smp.h:21:59: warning: 'struct cpumask'
declared inside parameter list will not be visible outside of this
definition or declaration
21 | extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
| ^~~~~~~
arch/openrisc/include/asm/smp.h:23:54: warning: 'struct cpumask'
declared inside parameter list will not be visible outside of this
definition or declaration
23 | extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));
| ^~~~~~~
drivers/irqchip/irq-ompic.c: In function 'ompic_of_init':
>> drivers/irqchip/irq-ompic.c:191:28: error: passing argument 1 of
'set_smp_cross_call' from incompatible pointer type
[-Werror=incompatible-pointer-types]
191 | set_smp_cross_call(ompic_raise_softirq);
| ^~~~~~~~~~~~~~~~~~~
| |
| void (*)(const struct cpumask *, unsigned int)
arch/openrisc/include/asm/smp.h:23:32: note: expected 'void (*)(const
struct cpumask *, unsigned int)' but argument is of type 'void
(*)(const struct cpumask *, unsigned int)'
23 | extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));
To fix this, let's take an example from the smp.h headers of other
architectures (x86, hexagon, arm64, probably more): just include
linux/cpumask.h at the top.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Stafford Horne <shorne@gmail.com>
---
arch/openrisc/include/asm/smp.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/openrisc/include/asm/smp.h b/arch/openrisc/include/asm/smp.h
index e21d2f12b5b67..0327d8cdae2d0 100644
--- a/arch/openrisc/include/asm/smp.h
+++ b/arch/openrisc/include/asm/smp.h
@@ -9,6 +9,8 @@
#ifndef __ASM_OPENRISC_SMP_H
#define __ASM_OPENRISC_SMP_H
+#include <linux/cpumask.h>
+
#include <asm/spr.h>
#include <asm/spr_defs.h>
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH v17 05/16] irq & spin_lock: Add counted interrupt disabling/enabling
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (3 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 04/16] openrisc: Include <linux/cpumask.h> in smp.h Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 06/16] irq: Add KUnit test for refcounted interrupt enable/disable Lyude Paul
` (12 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
From: Boqun Feng <boqun.feng@gmail.com>
Currently, nested interrupt disabling and enabling is provided by the
_irqsave() and _irqrestore() APIs, which are relatively unsafe, for
example:
<interrupts are enabled as beginning>
spin_lock_irqsave(l1, flag1);
spin_lock_irqsave(l2, flag2);
spin_unlock_irqrestore(l1, flags1);
<l2 is still held but interrupts are enabled>
// accesses to interrupt-disable protected data will cause races.
This is even easier to trigger with guard facilities:
unsigned long flag2;
scoped_guard(spin_lock_irqsave, l1) {
spin_lock_irqsave(l2, flag2);
}
// l2 locked but interrupts are enabled.
spin_unlock_irqrestore(l2, flag2);
(Hand-to-hand locking critical sections are not uncommon for a
fine-grained lock design)
And because of this unsafety, Rust cannot easily wrap the
interrupt-disabling locks in a safe API, which complicates the design.
To resolve this, introduce a new set of interrupt disabling APIs:
* local_interrupt_disable();
* local_interrupt_enable();
They work like local_irq_save() and local_irq_restore() except that 1)
the outermost local_interrupt_disable() call saves the interrupt state
into a percpu variable, so that the outermost local_interrupt_enable()
can restore the state, and 2) a percpu counter is added to record the
nesting level of these calls, so that interrupts are not accidentally
enabled inside the outermost critical section.
Also add the corresponding spin_lock primitives: spin_lock_irq_disable()
and spin_unlock_irq_enable(). As a result, code as follows:
spin_lock_irq_disable(l1);
spin_lock_irq_disable(l2);
spin_unlock_irq_enable(l1);
// Interrupts are still disabled.
spin_unlock_irq_enable(l2);
doesn't have the issue that interrupts are accidentally enabled.
This also makes the wrapper of interrupt-disabling locks on Rust easier
to design.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
V10:
* Add missing __raw_spin_lock_irq_disable() definition in spinlock.c
V11:
* Move definition of spin_trylock_irq_disable() into this commit
* Get rid of leftover space
* Remove unneeded preempt_disable()/preempt_enable()
V12:
* Move local_interrupt_enable()/local_interrupt_disable() out of
include/linux/spinlock.h, into include/linux/irqflags.h
V14:
* Move local_interrupt_enable()/disable() again, this time into its own
header, interrupt_rc.h, in order to fix a hexagon-specific build issue
caught by the CKI bot.
The reason this is needed is because on most architectures, irqflags.h
ends up including <arch/smp.h>. This provides a definition for the
raw_smp_processor_id() function which we depend on like so:
<linux/percpu-defs.h> <arch/smp.h>
local_interrupt_disable() → raw_cpu_write() → raw_smp_processor_id()
Unfortunately, hexagon appears to be one such architecture which does
not pull in <arch/smp.h> by default here - causing kernel builds to
fail and claim that raw_smp_processor_id() is undefined:
In file included from kernel/sched/rq-offsets.c:5:
In file included from kernel/sched/sched.h:8:
In file included from include/linux/sched/affinity.h:1:
In file included from include/linux/sched.h:37:
In file included from include/linux/spinlock.h:59:
>> include/linux/irqflags.h:277:3: error: call to undeclared function
'raw_smp_processor_id'; ISO C99 and later do not support implicit
function declarations [-Wimplicit-function-declaration]
277 | raw_cpu_write(local_interrupt_disable_state.flags, flags);
| ^
include/linux/percpu-defs.h:413:34: note: expanded from macro
'raw_cpu_write'
While including <arch/smp.h> in <linux/irqflags.h> does fix the build
on hexagon, it ends up breaking the build on x86_64:
In file included from kernel/sched/rq-offsets.c:5:
In file included from kernel/sched/sched.h:8:
In file included from ./include/linux/sched/affinity.h:1:
In file included from ./include/linux/sched.h:13:
In file included from ./arch/x86/include/asm/processor.h:25:
In file included from ./arch/x86/include/asm/special_insns.h:10:
In file included from ./include/linux/irqflags.h:22:
In file included from ./arch/x86/include/asm/smp.h:6:
In file included from ./include/linux/thread_info.h:60:
In file included from ./arch/x86/include/asm/thread_info.h:59:
./arch/x86/include/asm/cpufeature.h:110:40: error: use of undeclared
identifier 'boot_cpu_data'
[cap_byte] "i" (&((const char *)boot_cpu_data.x86_capability)[bit >> 3])
^
While boot_cpu_data is defined in <asm/processor.h>, it's not possible
for us to include that header in irqflags.h because we're already
inside of <asm/processor.h>.
As a result, I just concluded there's no reasonable way of having these
functions in <linux/irqflags.h> because of how many low level ASM
headers depend on it. So, we go with the solution of simply giving
ourselves our own header file.
V15:
* Fix build error on CONFIG_SMP=n reported by Kernel CI
include/linux/interrupt_rc.h | 63 ++++++++++++++++++++++++++++++++
include/linux/preempt.h | 4 ++
include/linux/spinlock.h | 25 +++++++++++++
include/linux/spinlock_api_smp.h | 27 ++++++++++++++
include/linux/spinlock_api_up.h | 8 ++++
include/linux/spinlock_rt.h | 15 ++++++++
kernel/locking/spinlock.c | 29 +++++++++++++++
kernel/softirq.c | 3 ++
8 files changed, 174 insertions(+)
create mode 100644 include/linux/interrupt_rc.h
diff --git a/include/linux/interrupt_rc.h b/include/linux/interrupt_rc.h
new file mode 100644
index 0000000000000..d6d05498731b2
--- /dev/null
+++ b/include/linux/interrupt_rc.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/linux/interrupt_rc.h - refcounted local processor interrupt
+ * management.
+ *
+ * Since the implementation of this API currently depends on
+ * local_irq_save()/local_irq_restore(), we split this into its own header to
+ * make it easier to include without hitting circular header dependencies.
+ */
+
+#ifndef __LINUX_INTERRUPT_RC_H
+#define __LINUX_INTERRUPT_RC_H
+
+#include <linux/irqflags.h>
+#include <asm/processor.h>
+#ifdef CONFIG_SMP
+#include <asm/smp.h>
+#endif
+
+/* Per-cpu interrupt disabling state for local_interrupt_{disable,enable}() */
+struct interrupt_disable_state {
+ unsigned long flags;
+};
+
+DECLARE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
+
+static inline void local_interrupt_disable(void)
+{
+ unsigned long flags;
+ int new_count;
+
+ new_count = hardirq_disable_enter();
+
+ if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
+ local_irq_save(flags);
+ raw_cpu_write(local_interrupt_disable_state.flags, flags);
+ }
+}
+
+static inline void local_interrupt_enable(void)
+{
+ int new_count;
+
+ new_count = hardirq_disable_exit();
+
+ if ((new_count & HARDIRQ_DISABLE_MASK) == 0) {
+ unsigned long flags;
+
+ flags = raw_cpu_read(local_interrupt_disable_state.flags);
+ local_irq_restore(flags);
+ /*
+ * TODO: re-reading the preempt count could be avoided, but that needs
+ * should_resched() to take the current preempt count as a parameter
+ */
+#ifdef CONFIG_PREEMPTION
+ if (should_resched(0))
+ __preempt_schedule();
+#endif
+ }
+}
+
+#endif /* !__LINUX_INTERRUPT_RC_H */
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index e2d3079d3f5f1..33fc4c814a9f0 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -151,6 +151,10 @@ static __always_inline unsigned char interrupt_context_level(void)
#define in_softirq() (softirq_count())
#define in_interrupt() (irq_count())
+#define hardirq_disable_count() ((preempt_count() & HARDIRQ_DISABLE_MASK) >> HARDIRQ_DISABLE_SHIFT)
+#define hardirq_disable_enter() __preempt_count_add_return(HARDIRQ_DISABLE_OFFSET)
+#define hardirq_disable_exit() __preempt_count_sub_return(HARDIRQ_DISABLE_OFFSET)
+
/*
* The preempt_count offset after preempt_disable();
*/
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d3561c4a080e2..bbbee61c6f5df 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -57,6 +57,7 @@
#include <linux/linkage.h>
#include <linux/compiler.h>
#include <linux/irqflags.h>
+#include <linux/interrupt_rc.h>
#include <linux/thread_info.h>
#include <linux/stringify.h>
#include <linux/bottom_half.h>
@@ -272,9 +273,11 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
#endif
#define raw_spin_lock_irq(lock) _raw_spin_lock_irq(lock)
+#define raw_spin_lock_irq_disable(lock) _raw_spin_lock_irq_disable(lock)
#define raw_spin_lock_bh(lock) _raw_spin_lock_bh(lock)
#define raw_spin_unlock(lock) _raw_spin_unlock(lock)
#define raw_spin_unlock_irq(lock) _raw_spin_unlock_irq(lock)
+#define raw_spin_unlock_irq_enable(lock) _raw_spin_unlock_irq_enable(lock)
#define raw_spin_unlock_irqrestore(lock, flags) \
do { \
@@ -300,6 +303,13 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
1 : ({ local_irq_restore(flags); 0; }); \
})
+#define raw_spin_trylock_irq_disable(lock) \
+({ \
+ local_interrupt_disable(); \
+ raw_spin_trylock(lock) ? \
+ 1 : ({ local_interrupt_enable(); 0; }); \
+})
+
#ifndef CONFIG_PREEMPT_RT
/* Include rwlock functions for !RT */
#include <linux/rwlock.h>
@@ -376,6 +386,11 @@ static __always_inline void spin_lock_irq(spinlock_t *lock)
raw_spin_lock_irq(&lock->rlock);
}
+static __always_inline void spin_lock_irq_disable(spinlock_t *lock)
+{
+ raw_spin_lock_irq_disable(&lock->rlock);
+}
+
#define spin_lock_irqsave(lock, flags) \
do { \
raw_spin_lock_irqsave(spinlock_check(lock), flags); \
@@ -401,6 +416,11 @@ static __always_inline void spin_unlock_irq(spinlock_t *lock)
raw_spin_unlock_irq(&lock->rlock);
}
+static __always_inline void spin_unlock_irq_enable(spinlock_t *lock)
+{
+ raw_spin_unlock_irq_enable(&lock->rlock);
+}
+
static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
{
raw_spin_unlock_irqrestore(&lock->rlock, flags);
@@ -421,6 +441,11 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
})
+static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
+{
+ return raw_spin_trylock_irq_disable(&lock->rlock);
+}
+
/**
* spin_is_locked() - Check whether a spinlock is locked.
* @lock: Pointer to the spinlock.
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index 9ecb0ab504e32..92532103b9eaa 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -28,6 +28,8 @@ _raw_spin_lock_nest_lock(raw_spinlock_t *lock, struct lockdep_map *map)
void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock) __acquires(lock);
void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
__acquires(lock);
+void __lockfunc _raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+ __acquires(lock);
unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
__acquires(lock);
@@ -39,6 +41,7 @@ int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock);
void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock) __releases(lock);
void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock) __releases(lock);
void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock) __releases(lock);
+void __lockfunc _raw_spin_unlock_irq_enable(raw_spinlock_t *lock) __releases(lock);
void __lockfunc
_raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
__releases(lock);
@@ -55,6 +58,11 @@ _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
#define _raw_spin_lock_irq(lock) __raw_spin_lock_irq(lock)
#endif
+/* Use the same config as spin_lock_irq() temporarily. */
+#ifdef CONFIG_INLINE_SPIN_LOCK_IRQ
+#define _raw_spin_lock_irq_disable(lock) __raw_spin_lock_irq_disable(lock)
+#endif
+
#ifdef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
#define _raw_spin_lock_irqsave(lock) __raw_spin_lock_irqsave(lock)
#endif
@@ -79,6 +87,11 @@ _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
#define _raw_spin_unlock_irq(lock) __raw_spin_unlock_irq(lock)
#endif
+/* Use the same config as spin_unlock_irq() temporarily. */
+#ifdef CONFIG_INLINE_SPIN_UNLOCK_IRQ
+#define _raw_spin_unlock_irq_enable(lock) __raw_spin_unlock_irq_enable(lock)
+#endif
+
#ifdef CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE
#define _raw_spin_unlock_irqrestore(lock, flags) __raw_spin_unlock_irqrestore(lock, flags)
#endif
@@ -120,6 +133,13 @@ static inline void __raw_spin_lock_irq(raw_spinlock_t *lock)
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
+static inline void __raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+ local_interrupt_disable();
+ spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
+}
+
static inline void __raw_spin_lock_bh(raw_spinlock_t *lock)
{
__local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
@@ -160,6 +180,13 @@ static inline void __raw_spin_unlock_irq(raw_spinlock_t *lock)
preempt_enable();
}
+static inline void __raw_spin_unlock_irq_enable(raw_spinlock_t *lock)
+{
+ spin_release(&lock->dep_map, _RET_IP_);
+ do_raw_spin_unlock(lock);
+ local_interrupt_enable();
+}
+
static inline void __raw_spin_unlock_bh(raw_spinlock_t *lock)
{
spin_release(&lock->dep_map, _RET_IP_);
diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h
index 819aeba1c87e6..d02a73671713b 100644
--- a/include/linux/spinlock_api_up.h
+++ b/include/linux/spinlock_api_up.h
@@ -36,6 +36,9 @@
#define __LOCK_IRQ(lock) \
do { local_irq_disable(); __LOCK(lock); } while (0)
+#define __LOCK_IRQ_DISABLE(lock) \
+ do { local_interrupt_disable(); __LOCK(lock); } while (0)
+
#define __LOCK_IRQSAVE(lock, flags) \
do { local_irq_save(flags); __LOCK(lock); } while (0)
@@ -52,6 +55,9 @@
#define __UNLOCK_IRQ(lock) \
do { local_irq_enable(); __UNLOCK(lock); } while (0)
+#define __UNLOCK_IRQ_ENABLE(lock) \
+ do { __UNLOCK(lock); local_interrupt_enable(); } while (0)
+
#define __UNLOCK_IRQRESTORE(lock, flags) \
do { local_irq_restore(flags); __UNLOCK(lock); } while (0)
@@ -64,6 +70,7 @@
#define _raw_read_lock_bh(lock) __LOCK_BH(lock)
#define _raw_write_lock_bh(lock) __LOCK_BH(lock)
#define _raw_spin_lock_irq(lock) __LOCK_IRQ(lock)
+#define _raw_spin_lock_irq_disable(lock) __LOCK_IRQ_DISABLE(lock)
#define _raw_read_lock_irq(lock) __LOCK_IRQ(lock)
#define _raw_write_lock_irq(lock) __LOCK_IRQ(lock)
#define _raw_spin_lock_irqsave(lock, flags) __LOCK_IRQSAVE(lock, flags)
@@ -80,6 +87,7 @@
#define _raw_write_unlock_bh(lock) __UNLOCK_BH(lock)
#define _raw_read_unlock_bh(lock) __UNLOCK_BH(lock)
#define _raw_spin_unlock_irq(lock) __UNLOCK_IRQ(lock)
+#define _raw_spin_unlock_irq_enable(lock) __UNLOCK_IRQ_ENABLE(lock)
#define _raw_read_unlock_irq(lock) __UNLOCK_IRQ(lock)
#define _raw_write_unlock_irq(lock) __UNLOCK_IRQ(lock)
#define _raw_spin_unlock_irqrestore(lock, flags) \
diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h
index f6499c37157df..074182f7cfeea 100644
--- a/include/linux/spinlock_rt.h
+++ b/include/linux/spinlock_rt.h
@@ -93,6 +93,11 @@ static __always_inline void spin_lock_irq(spinlock_t *lock)
rt_spin_lock(lock);
}
+static __always_inline void spin_lock_irq_disable(spinlock_t *lock)
+{
+ rt_spin_lock(lock);
+}
+
#define spin_lock_irqsave(lock, flags) \
do { \
typecheck(unsigned long, flags); \
@@ -116,12 +121,22 @@ static __always_inline void spin_unlock_irq(spinlock_t *lock)
rt_spin_unlock(lock);
}
+static __always_inline void spin_unlock_irq_enable(spinlock_t *lock)
+{
+ rt_spin_unlock(lock);
+}
+
static __always_inline void spin_unlock_irqrestore(spinlock_t *lock,
unsigned long flags)
{
rt_spin_unlock(lock);
}
+static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
+{
+ return rt_spin_trylock(lock);
+}
+
#define spin_trylock(lock) \
__cond_lock(lock, rt_spin_trylock(lock))
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index 7685defd7c526..da54b220b5a45 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -125,6 +125,19 @@ static void __lockfunc __raw_##op##_lock_bh(locktype##_t *lock) \
*/
BUILD_LOCK_OPS(spin, raw_spinlock);
+/* No rwlock_t variants for now, so just build this function by hand */
+static void __lockfunc __raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+ for (;;) {
+ local_interrupt_disable();
+ if (likely(do_raw_spin_trylock(lock)))
+ break;
+ local_interrupt_enable();
+
+ arch_spin_relax(&lock->raw_lock);
+ }
+}
+
#ifndef CONFIG_PREEMPT_RT
BUILD_LOCK_OPS(read, rwlock);
BUILD_LOCK_OPS(write, rwlock);
@@ -172,6 +185,14 @@ noinline void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
EXPORT_SYMBOL(_raw_spin_lock_irq);
#endif
+#ifndef CONFIG_INLINE_SPIN_LOCK_IRQ
+noinline void __lockfunc _raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+ __raw_spin_lock_irq_disable(lock);
+}
+EXPORT_SYMBOL_GPL(_raw_spin_lock_irq_disable);
+#endif
+
#ifndef CONFIG_INLINE_SPIN_LOCK_BH
noinline void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
{
@@ -204,6 +225,14 @@ noinline void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
EXPORT_SYMBOL(_raw_spin_unlock_irq);
#endif
+#ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQ
+noinline void __lockfunc _raw_spin_unlock_irq_enable(raw_spinlock_t *lock)
+{
+ __raw_spin_unlock_irq_enable(lock);
+}
+EXPORT_SYMBOL_GPL(_raw_spin_unlock_irq_enable);
+#endif
+
#ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
noinline void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
{
diff --git a/kernel/softirq.c b/kernel/softirq.c
index af47ea23aba3b..b681545eabbbe 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,9 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
#endif
+DEFINE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
+EXPORT_PER_CPU_SYMBOL_GPL(local_interrupt_disable_state);
+
DEFINE_PER_CPU(unsigned int, nmi_nesting);
/*
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread

* [PATCH v17 06/16] irq: Add KUnit test for refcounted interrupt enable/disable
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (4 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 05/16] irq & spin_lock: Add counted interrupt disabling/enabling Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-30 7:43 ` David Gow
2026-01-21 22:39 ` [PATCH v17 07/16] rust: Introduce interrupt module Lyude Paul
` (11 subsequent siblings)
17 siblings, 1 reply; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
While making changes to the refcounted interrupt patch series, at some
point I broke something on my local branch and ended up writing some
KUnit tests for refcounted interrupts as a result. So, let's include
these tests now that we have refcounted interrupts.
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
V13:
* Add missing MODULE_DESCRIPTION/MODULE_LICENSE lines
* Switch from kunit_test_suites(…) to kunit_test_suite(…)
kernel/irq/Makefile | 1 +
kernel/irq/refcount_interrupt_test.c | 109 +++++++++++++++++++++++++++
2 files changed, 110 insertions(+)
create mode 100644 kernel/irq/refcount_interrupt_test.c
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index 6ab3a40556670..7b5bb5510b110 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_SMP) += affinity.o
obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
obj-$(CONFIG_IRQ_KUNIT_TEST) += irq_test.o
+obj-$(CONFIG_KUNIT) += refcount_interrupt_test.o
diff --git a/kernel/irq/refcount_interrupt_test.c b/kernel/irq/refcount_interrupt_test.c
new file mode 100644
index 0000000000000..b4f224595f261
--- /dev/null
+++ b/kernel/irq/refcount_interrupt_test.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test for refcounted interrupt enable/disables.
+ */
+
+#include <kunit/test.h>
+#include <linux/interrupt_rc.h>
+
+#define TEST_IRQ_ON() KUNIT_EXPECT_FALSE(test, irqs_disabled())
+#define TEST_IRQ_OFF() KUNIT_EXPECT_TRUE(test, irqs_disabled())
+
+/* ===== Test cases ===== */
+static void test_single_irq_change(struct kunit *test)
+{
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+}
+
+static void test_nested_irq_change(struct kunit *test)
+{
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+}
+
+static void test_multiple_irq_change(struct kunit *test)
+{
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+}
+
+static void test_irq_save(struct kunit *test)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_irq_restore(flags);
+ TEST_IRQ_ON();
+
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_irq_save(flags);
+ TEST_IRQ_OFF();
+ local_irq_restore(flags);
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+}
+
+static struct kunit_case test_cases[] = {
+ KUNIT_CASE(test_single_irq_change),
+ KUNIT_CASE(test_nested_irq_change),
+ KUNIT_CASE(test_multiple_irq_change),
+ KUNIT_CASE(test_irq_save),
+ {},
+};
+
+/* init and exit are the same */
+static int test_init(struct kunit *test)
+{
+ TEST_IRQ_ON();
+
+ return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+ TEST_IRQ_ON();
+}
+
+static struct kunit_suite refcount_interrupt_test_suite = {
+ .name = "refcount_interrupt",
+ .test_cases = test_cases,
+ .init = test_init,
+ .exit = test_exit,
+};
+
+kunit_test_suite(refcount_interrupt_test_suite);
+MODULE_AUTHOR("Lyude Paul <lyude@redhat.com>");
+MODULE_DESCRIPTION("Refcounted interrupt unit test suite");
+MODULE_LICENSE("GPL");
--
2.52.0
* Re: [PATCH v17 06/16] irq: Add KUnit test for refcounted interrupt enable/disable
2026-01-21 22:39 ` [PATCH v17 06/16] irq: Add KUnit test for refcounted interrupt enable/disable Lyude Paul
@ 2026-01-30 7:43 ` David Gow
0 siblings, 0 replies; 47+ messages in thread
From: David Gow @ 2026-01-30 7:43 UTC (permalink / raw)
To: Lyude Paul
Cc: rust-for-linux, linux-kernel, Thomas Gleixner, Boqun Feng,
Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
On Thu, 22 Jan 2026 at 06:43, Lyude Paul <lyude@redhat.com> wrote:
>
> While making changes to the refcounted interrupt patch series, at some
> point on my local branch I broke something and ended up writing some kunit
> tests for testing refcounted interrupts as a result. So, let's include
> these tests now that we have refcounted interrupts.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
>
> ---
Looks good to me. Not sure if it'd make sense to move this into
kernel/irq/irq_test.c alongside the other IRQ tests, or if you like
having it separate. I'm happy either way.
Reviewed-by: David Gow <davidgow@google.com>
Cheers,
-- David
> V13:
> * Add missing MODULE_DESCRIPTION/MODULE_LICENSE lines
> * Switch from kunit_test_suites(…) to kunit_test_suite(…)
>
> kernel/irq/Makefile | 1 +
> kernel/irq/refcount_interrupt_test.c | 109 +++++++++++++++++++++++++++
> 2 files changed, 110 insertions(+)
> create mode 100644 kernel/irq/refcount_interrupt_test.c
>
> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> index 6ab3a40556670..7b5bb5510b110 100644
> --- a/kernel/irq/Makefile
> +++ b/kernel/irq/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_SMP) += affinity.o
> obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
> obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
> obj-$(CONFIG_IRQ_KUNIT_TEST) += irq_test.o
> +obj-$(CONFIG_KUNIT) += refcount_interrupt_test.o
> diff --git a/kernel/irq/refcount_interrupt_test.c b/kernel/irq/refcount_interrupt_test.c
> new file mode 100644
> index 0000000000000..b4f224595f261
> --- /dev/null
> +++ b/kernel/irq/refcount_interrupt_test.c
> @@ -0,0 +1,109 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KUnit test for refcounted interrupt enable/disables.
> + */
> +
> +#include <kunit/test.h>
> +#include <linux/interrupt_rc.h>
> +
> +#define TEST_IRQ_ON() KUNIT_EXPECT_FALSE(test, irqs_disabled())
> +#define TEST_IRQ_OFF() KUNIT_EXPECT_TRUE(test, irqs_disabled())
> +
> +/* ===== Test cases ===== */
> +static void test_single_irq_change(struct kunit *test)
> +{
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> + local_interrupt_enable();
> +}
> +
> +static void test_nested_irq_change(struct kunit *test)
> +{
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> +
> + local_interrupt_enable();
> + TEST_IRQ_OFF();
> + local_interrupt_enable();
> + TEST_IRQ_OFF();
> + local_interrupt_enable();
> + TEST_IRQ_ON();
> +}
> +
> +static void test_multiple_irq_change(struct kunit *test)
> +{
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> +
> + local_interrupt_enable();
> + TEST_IRQ_OFF();
> + local_interrupt_enable();
> + TEST_IRQ_ON();
> +
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> + local_interrupt_enable();
> + TEST_IRQ_ON();
> +}
> +
> +static void test_irq_save(struct kunit *test)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + TEST_IRQ_OFF();
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> + local_interrupt_enable();
> + TEST_IRQ_OFF();
> + local_irq_restore(flags);
> + TEST_IRQ_ON();
> +
> + local_interrupt_disable();
> + TEST_IRQ_OFF();
> + local_irq_save(flags);
> + TEST_IRQ_OFF();
> + local_irq_restore(flags);
> + TEST_IRQ_OFF();
> + local_interrupt_enable();
> + TEST_IRQ_ON();
> +}
> +
> +static struct kunit_case test_cases[] = {
> + KUNIT_CASE(test_single_irq_change),
> + KUNIT_CASE(test_nested_irq_change),
> + KUNIT_CASE(test_multiple_irq_change),
> + KUNIT_CASE(test_irq_save),
> + {},
> +};
> +
> +/* (init and exit are the same */
> +static int test_init(struct kunit *test)
> +{
> + TEST_IRQ_ON();
> +
> + return 0;
> +}
> +
> +static void test_exit(struct kunit *test)
> +{
> + TEST_IRQ_ON();
> +}
> +
> +static struct kunit_suite refcount_interrupt_test_suite = {
> + .name = "refcount_interrupt",
> + .test_cases = test_cases,
> + .init = test_init,
> + .exit = test_exit,
> +};
> +
> +kunit_test_suite(refcount_interrupt_test_suite);
> +MODULE_AUTHOR("Lyude Paul <lyude@redhat.com>");
> +MODULE_DESCRIPTION("Refcounted interrupt unit test suite");
> +MODULE_LICENSE("GPL");
> --
> 2.52.0
>
>
* [PATCH v17 07/16] rust: Introduce interrupt module
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (5 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 06/16] irq: Add KUnit test for refcounted interrupt enable/disable Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 08/16] rust: helper: Add spin_{un,}lock_irq_{enable,disable}() helpers Lyude Paul
` (10 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
This introduces a module for dealing with interrupt-disabled contexts:
it provides the ability to disable and enable local processor
interrupts, along with the ability to annotate functions as expecting
that IRQs are already disabled on the local CPU.
[Boqun: This is based on Lyude's work on interrupt disable abstraction.
I ported it to the new local_interrupt_disable() mechanism to make it
work as a guard type. I cannot even take credit for this design, since
Lyude also brought up the same idea on Zulip. Anyway, this is only for
POC purposes, and of course all bugs are mine]
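The guard-based design can be sketched with a user-space mock (hypothetical:
a thread-local counter stands in for the per-CPU interrupt-disable refcount;
the names mirror the kernel API but nothing here is kernel code):

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the per-CPU interrupt-disable refcount.
    static DISABLE_COUNT: Cell<u32> = Cell::new(0);
}

// Mock of the `LocalInterruptDisabled` marker: constructing it bumps the
// refcount, dropping it releases it (RAII, like the kernel guard type).
struct LocalInterruptDisabled(());

fn local_interrupt_disable() -> LocalInterruptDisabled {
    DISABLE_COUNT.with(|c| c.set(c.get() + 1));
    LocalInterruptDisabled(())
}

impl Drop for LocalInterruptDisabled {
    fn drop(&mut self) {
        DISABLE_COUNT.with(|c| c.set(c.get() - 1));
    }
}

fn irqs_disabled() -> bool {
    DISABLE_COUNT.with(|c| c.get() > 0)
}

fn main() {
    assert!(!irqs_disabled());
    {
        let _outer = local_interrupt_disable();
        let _inner = local_interrupt_disable(); // nesting is fine
        assert!(irqs_disabled());
    } // both guards dropped here
    assert!(!irqs_disabled());
}
```

Because the "enable" lives in `Drop`, callers cannot forget to re-enable, and
functions that require a disabled context can simply take
`&LocalInterruptDisabled` as a parameter.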
Signed-off-by: Lyude Paul <lyude@redhat.com>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
---
V10:
* Fix documentation typos
V11:
* Get rid of unneeded `use bindings;`
* Move ASSUME_DISABLED into assume_disabled()
* Confirm using lockdep_assert_irqs_disabled() that local interrupts are in
fact disabled when LocalInterruptDisabled::assume_disabled() is called.
rust/helpers/helpers.c | 1 +
rust/helpers/interrupt.c | 18 +++++++++
rust/helpers/sync.c | 5 +++
rust/kernel/interrupt.rs | 86 ++++++++++++++++++++++++++++++++++++++++
rust/kernel/lib.rs | 1 +
5 files changed, 111 insertions(+)
create mode 100644 rust/helpers/interrupt.c
create mode 100644 rust/kernel/interrupt.rs
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 79c72762ad9c4..f97d1cb63e0cd 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -29,6 +29,7 @@
#include "err.c"
#include "irq.c"
#include "fs.c"
+#include "interrupt.c"
#include "io.c"
#include "jump_label.c"
#include "kunit.c"
diff --git a/rust/helpers/interrupt.c b/rust/helpers/interrupt.c
new file mode 100644
index 0000000000000..f2380dd461ca5
--- /dev/null
+++ b/rust/helpers/interrupt.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/spinlock.h>
+
+void rust_helper_local_interrupt_disable(void)
+{
+ local_interrupt_disable();
+}
+
+void rust_helper_local_interrupt_enable(void)
+{
+ local_interrupt_enable();
+}
+
+bool rust_helper_irqs_disabled(void)
+{
+ return irqs_disabled();
+}
diff --git a/rust/helpers/sync.c b/rust/helpers/sync.c
index ff7e68b488101..45b2f519f4e2e 100644
--- a/rust/helpers/sync.c
+++ b/rust/helpers/sync.c
@@ -11,3 +11,8 @@ void rust_helper_lockdep_unregister_key(struct lock_class_key *k)
{
lockdep_unregister_key(k);
}
+
+void rust_helper_lockdep_assert_irqs_disabled(void)
+{
+ lockdep_assert_irqs_disabled();
+}
diff --git a/rust/kernel/interrupt.rs b/rust/kernel/interrupt.rs
new file mode 100644
index 0000000000000..6c8d2f58bca70
--- /dev/null
+++ b/rust/kernel/interrupt.rs
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Interrupt controls
+//!
+//! This module allows Rust code to annotate areas of code where local processor interrupts should
+//! be disabled, along with actually disabling local processor interrupts.
+//!
+//! # ⚠️ Warning! ⚠️
+//!
+//! The usage of this module can be more complicated than meets the eye, especially surrounding
+//! [preemptible kernels]. Take care when using the functions and types defined here, and
+//! familiarize yourself with the relevant documentation, including the documents linked
+//! below, before using them.
+//!
+//! # Reading material
+//!
+//! - [Software interrupts and realtime (LWN)](https://lwn.net/Articles/520076)
+//!
+//! [preemptible kernels]: https://www.kernel.org/doc/html/latest/locking/preempt-locking.html
+
+use kernel::types::NotThreadSafe;
+
+/// A guard that represents local processor interrupt disablement on preemptible kernels.
+///
+/// [`LocalInterruptDisabled`] is a guard type that represents that local processor interrupts have
+/// been disabled on a preemptible kernel.
+///
+/// Certain functions take an immutable reference of [`LocalInterruptDisabled`] in order to require
+/// that they may only be run in local-interrupt-disabled contexts on preemptible kernels.
+///
+/// This is a marker type; it has no size, and is simply used as a compile-time guarantee that local
+/// processor interrupts are disabled on preemptible kernels. Note that no guarantees about the
+/// state of interrupts are made by this type on non-preemptible kernels.
+///
+/// # Invariants
+///
+/// Local processor interrupts are disabled on preemptible kernels for as long as an object of this
+/// type exists.
+pub struct LocalInterruptDisabled(NotThreadSafe);
+
+/// Disable local processor interrupts on a preemptible kernel.
+///
+/// This function disables local processor interrupts on a preemptible kernel, and returns a
+/// [`LocalInterruptDisabled`] token as proof of this. On non-preemptible kernels, this function is
+/// a no-op.
+///
+/// **Usage of this function is discouraged** unless you are absolutely sure you know what you are
+/// doing, as Rust kernel interfaces that deal with interrupt state will typically handle local
+/// processor interrupt state management on their own, and managing it by hand is quite
+/// error prone.
+pub fn local_interrupt_disable() -> LocalInterruptDisabled {
+ // SAFETY: It's always safe to call `local_interrupt_disable()`.
+ unsafe { bindings::local_interrupt_disable() };
+
+ LocalInterruptDisabled(NotThreadSafe)
+}
+
+impl Drop for LocalInterruptDisabled {
+ fn drop(&mut self) {
+ // SAFETY: Per type invariants, a `local_interrupt_disable()` must be called to create this
+ // object, hence calling the corresponding `local_interrupt_enable()` is safe.
+ unsafe { bindings::local_interrupt_enable() };
+ }
+}
+
+impl LocalInterruptDisabled {
+ /// Assume that local processor interrupts are disabled on preemptible kernels.
+ ///
+ /// This can be used for annotating code that is known to be run in contexts where local
+ /// processor interrupts are disabled on preemptible kernels. It makes no changes to the local
+ /// interrupt state on its own.
+ ///
+ /// # Safety
+ ///
+ /// For the whole lifetime `'a`, local interrupts must be disabled on preemptible kernels. This
+ /// could be, for example, a context such as an interrupt handler.
+ pub unsafe fn assume_disabled<'a>() -> &'a LocalInterruptDisabled {
+ const ASSUME_DISABLED: &LocalInterruptDisabled = &LocalInterruptDisabled(NotThreadSafe);
+
+ // Confirm they're actually disabled if lockdep is available
+ // SAFETY: It's always safe to call `lockdep_assert_irqs_disabled()`
+ unsafe { bindings::lockdep_assert_irqs_disabled() };
+
+ ASSUME_DISABLED
+ }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index f812cf1200428..fbe830c4c54e4 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -101,6 +101,7 @@
pub mod i2c;
pub mod id_pool;
pub mod init;
+pub mod interrupt;
pub mod io;
pub mod ioctl;
pub mod iov;
--
2.52.0
* [PATCH v17 08/16] rust: helper: Add spin_{un,}lock_irq_{enable,disable}() helpers
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (6 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 07/16] rust: Introduce interrupt module Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-26 13:25 ` Gary Guo
2026-01-21 22:39 ` [PATCH v17 09/16] rust: sync: Add SpinLockIrq Lyude Paul
` (9 subsequent siblings)
17 siblings, 1 reply; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
From: Boqun Feng <boqun.feng@gmail.com>
spin_lock_irq_disable() and spin_unlock_irq_enable() are inline
functions; to use them from Rust, helpers are introduced. This is needed
for the interrupt-disabling lock abstraction in Rust.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
rust/helpers/spinlock.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/rust/helpers/spinlock.c b/rust/helpers/spinlock.c
index 42c4bf01a23e4..d4e61057c2a7a 100644
--- a/rust/helpers/spinlock.c
+++ b/rust/helpers/spinlock.c
@@ -35,3 +35,18 @@ void rust_helper_spin_assert_is_held(spinlock_t *lock)
{
lockdep_assert_held(lock);
}
+
+void rust_helper_spin_lock_irq_disable(spinlock_t *lock)
+{
+ spin_lock_irq_disable(lock);
+}
+
+void rust_helper_spin_unlock_irq_enable(spinlock_t *lock)
+{
+ spin_unlock_irq_enable(lock);
+}
+
+int rust_helper_spin_trylock_irq_disable(spinlock_t *lock)
+{
+ return spin_trylock_irq_disable(lock);
+}
--
2.52.0
* Re: [PATCH v17 08/16] rust: helper: Add spin_{un,}lock_irq_{enable,disable}() helpers
2026-01-21 22:39 ` [PATCH v17 08/16] rust: helper: Add spin_{un,}lock_irq_{enable,disable}() helpers Lyude Paul
@ 2026-01-26 13:25 ` Gary Guo
0 siblings, 0 replies; 47+ messages in thread
From: Gary Guo @ 2026-01-26 13:25 UTC (permalink / raw)
To: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
On Wed Jan 21, 2026 at 10:39 PM GMT, Lyude Paul wrote:
> From: Boqun Feng <boqun.feng@gmail.com>
>
> spin_lock_irq_disable() and spin_unlock_irq_enable() are inline
> functions, to use them in Rust, helpers are introduced. This is for
> interrupt disabling lock abstraction in Rust.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Reviewed-by: Andreas Hindborg <a.hindborg@kernel.org>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> ---
> rust/helpers/spinlock.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/rust/helpers/spinlock.c b/rust/helpers/spinlock.c
> index 42c4bf01a23e4..d4e61057c2a7a 100644
> --- a/rust/helpers/spinlock.c
> +++ b/rust/helpers/spinlock.c
> @@ -35,3 +35,18 @@ void rust_helper_spin_assert_is_held(spinlock_t *lock)
> {
> lockdep_assert_held(lock);
> }
> +
> +void rust_helper_spin_lock_irq_disable(spinlock_t *lock)
> +{
> + spin_lock_irq_disable(lock);
> +}
> +
> +void rust_helper_spin_unlock_irq_enable(spinlock_t *lock)
> +{
> + spin_unlock_irq_enable(lock);
> +}
> +
> +int rust_helper_spin_trylock_irq_disable(spinlock_t *lock)
> +{
> + return spin_trylock_irq_disable(lock);
> +}
My comments from v16 about adding __rust_helper are not addressed.
Best,
Gary
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH v17 09/16] rust: sync: Add SpinLockIrq
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (7 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 08/16] rust: helper: Add spin_{un,}lock_irq_{enable,disable}() helpers Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-23 22:26 ` Benno Lossin
2026-01-21 22:39 ` [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends Lyude Paul
` (8 subsequent siblings)
17 siblings, 1 reply; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
A variant of `SpinLock` that ensures interrupts are disabled in the
critical section. `lock()` ensures interrupts are disabled, disabling them
itself if necessary, and `unlock()` reverses the respective operation.
[Boqun: Port to use spin_lock_irq_disable() and
spin_unlock_irq_enable()]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
V10:
* Also add support to GlobalLock
* Documentation fixes from Dirk
V11:
* Add unit test requested by Daniel Almeida
V14:
* Improve rustdoc for SpinLockIrqBackend
V17:
* Update Git summary according to Benno's review
rust/kernel/sync.rs | 4 +-
rust/kernel/sync/lock/global.rs | 3 +
rust/kernel/sync/lock/spinlock.rs | 229 ++++++++++++++++++++++++++++++
3 files changed, 235 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index 5df87e2bd212e..48a7cae86c50c 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -27,7 +27,9 @@
pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
pub use lock::global::{global_lock, GlobalGuard, GlobalLock, GlobalLockBackend, GlobalLockedBy};
pub use lock::mutex::{new_mutex, Mutex, MutexGuard};
-pub use lock::spinlock::{new_spinlock, SpinLock, SpinLockGuard};
+pub use lock::spinlock::{
+ new_spinlock, new_spinlock_irq, SpinLock, SpinLockGuard, SpinLockIrq, SpinLockIrqGuard,
+};
pub use locked_by::LockedBy;
pub use refcount::Refcount;
pub use set_once::SetOnce;
diff --git a/rust/kernel/sync/lock/global.rs b/rust/kernel/sync/lock/global.rs
index eab48108a4aeb..7030a47bc0ad1 100644
--- a/rust/kernel/sync/lock/global.rs
+++ b/rust/kernel/sync/lock/global.rs
@@ -302,4 +302,7 @@ macro_rules! global_lock_inner {
(backend SpinLock) => {
$crate::sync::lock::spinlock::SpinLockBackend
};
+ (backend SpinLockIrq) => {
+ $crate::sync::lock::spinlock::SpinLockIrqBackend
+ };
}
diff --git a/rust/kernel/sync/lock/spinlock.rs b/rust/kernel/sync/lock/spinlock.rs
index d7be38ccbdc7d..3fdfb0a8a0ab1 100644
--- a/rust/kernel/sync/lock/spinlock.rs
+++ b/rust/kernel/sync/lock/spinlock.rs
@@ -3,6 +3,7 @@
//! A kernel spinlock.
//!
//! This module allows Rust code to use the kernel's `spinlock_t`.
+use crate::prelude::*;
/// Creates a [`SpinLock`] initialiser with the given name and a newly-created lock class.
///
@@ -139,3 +140,231 @@ unsafe fn assert_is_held(ptr: *mut Self::State) {
unsafe { bindings::spin_assert_is_held(ptr) }
}
}
+
+/// Creates a [`SpinLockIrq`] initialiser with the given name and a newly-created lock class.
+///
+/// It uses the name if one is given, otherwise it generates one based on the file name and line
+/// number.
+#[macro_export]
+macro_rules! new_spinlock_irq {
+ ($inner:expr $(, $name:literal)? $(,)?) => {
+ $crate::sync::SpinLockIrq::new(
+ $inner, $crate::optional_name!($($name)?), $crate::static_lock_class!())
+ };
+}
+pub use new_spinlock_irq;
+
+/// A spinlock that may be acquired when local processor interrupts are disabled.
+///
+/// This is a version of [`SpinLock`] that can only be used in contexts where interrupts for the
+/// local CPU are disabled. It can be acquired in two ways:
+///
+/// - Using [`lock()`] like any other type of lock, in which case the bindings will modify the
+/// interrupt state to ensure that local processor interrupts remain disabled for at least as long
+/// as the [`SpinLockIrqGuard`] exists.
+/// - Using [`lock_with()`] in contexts where a [`LocalInterruptDisabled`] token is present and
+/// local processor interrupts are already known to be disabled, in which case the local interrupt
+/// state will not be touched. This method should be preferred if a [`LocalInterruptDisabled`]
+/// token is present in the scope.
+///
+/// For more info on spinlocks, see [`SpinLock`]. For more information on interrupts,
+/// [see the interrupt module](kernel::interrupt).
+///
+/// # Examples
+///
+/// The following example shows how to declare, allocate, initialise, and access a struct (`Example`)
+/// that contains an inner struct (`Inner`) that is protected by a spinlock that requires local
+/// processor interrupts to be disabled.
+///
+/// ```
+/// use kernel::sync::{new_spinlock_irq, SpinLockIrq};
+///
+/// struct Inner {
+/// a: u32,
+/// b: u32,
+/// }
+///
+/// #[pin_data]
+/// struct Example {
+/// #[pin]
+/// c: SpinLockIrq<Inner>,
+/// #[pin]
+/// d: SpinLockIrq<Inner>,
+/// }
+///
+/// impl Example {
+/// fn new() -> impl PinInit<Self> {
+/// pin_init!(Self {
+/// c <- new_spinlock_irq!(Inner { a: 0, b: 10 }),
+/// d <- new_spinlock_irq!(Inner { a: 20, b: 30 }),
+/// })
+/// }
+/// }
+///
+/// // Allocate a boxed `Example`
+/// let e = KBox::pin_init(Example::new(), GFP_KERNEL)?;
+///
+/// // Accessing an `Example` from a context where interrupts may not be disabled already.
+/// let c_guard = e.c.lock(); // interrupts are disabled now, +1 interrupt disable refcount
+/// let d_guard = e.d.lock(); // no interrupt state change, +1 interrupt disable refcount
+///
+/// assert_eq!(c_guard.a, 0);
+/// assert_eq!(c_guard.b, 10);
+/// assert_eq!(d_guard.a, 20);
+/// assert_eq!(d_guard.b, 30);
+///
+/// drop(c_guard); // Dropping c_guard will not re-enable interrupts just yet, since d_guard is
+/// // still in scope.
+/// drop(d_guard); // Last interrupt disable reference dropped here, so interrupts are re-enabled
+/// // now
+/// # Ok::<(), Error>(())
+/// ```
+///
+/// [`lock()`]: SpinLockIrq::lock
+/// [`lock_with()`]: SpinLockIrq::lock_with
+pub type SpinLockIrq<T> = super::Lock<T, SpinLockIrqBackend>;
+
+/// A kernel `spinlock_t` lock backend that can only be acquired in interrupt disabled contexts.
+pub struct SpinLockIrqBackend;
+
+/// A [`Guard`] acquired from locking a [`SpinLockIrq`] using [`lock()`].
+///
+/// This is simply a type alias for a [`Guard`] returned from locking a [`SpinLockIrq`] using
+/// [`lock()`]. It will unlock the [`SpinLockIrq`] and decrement the local processor's
+/// interrupt disablement refcount upon being dropped.
+///
+/// [`Guard`]: super::Guard
+/// [`lock()`]: SpinLockIrq::lock
+/// [`lock_with()`]: SpinLockIrq::lock_with
+pub type SpinLockIrqGuard<'a, T> = super::Guard<'a, T, SpinLockIrqBackend>;
+
+// SAFETY: The underlying kernel `spinlock_t` object ensures mutual exclusion. `relock` uses the
+// default implementation that always calls the same locking method.
+unsafe impl super::Backend for SpinLockIrqBackend {
+ type State = bindings::spinlock_t;
+ type GuardState = ();
+
+ unsafe fn init(
+ ptr: *mut Self::State,
+ name: *const crate::ffi::c_char,
+ key: *mut bindings::lock_class_key,
+ ) {
+ // SAFETY: The safety requirements ensure that `ptr` is valid for writes, and `name` and
+ // `key` are valid for read indefinitely.
+ unsafe { bindings::__spin_lock_init(ptr, name, key) }
+ }
+
+ unsafe fn lock(ptr: *mut Self::State) -> Self::GuardState {
+ // SAFETY: The safety requirements of this function ensure that `ptr` points to valid
+ // memory, and that it has been initialised before.
+ unsafe { bindings::spin_lock_irq_disable(ptr) }
+ }
+
+ unsafe fn unlock(ptr: *mut Self::State, _guard_state: &Self::GuardState) {
+ // SAFETY: The safety requirements of this function ensure that `ptr` is valid and that the
+ // caller is the owner of the spinlock.
+ unsafe { bindings::spin_unlock_irq_enable(ptr) }
+ }
+
+ unsafe fn try_lock(ptr: *mut Self::State) -> Option<Self::GuardState> {
+ // SAFETY: The `ptr` pointer is guaranteed to be valid and initialized before use.
+ let result = unsafe { bindings::spin_trylock_irq_disable(ptr) };
+
+ if result != 0 {
+ Some(())
+ } else {
+ None
+ }
+ }
+
+ unsafe fn assert_is_held(ptr: *mut Self::State) {
+ // SAFETY: The `ptr` pointer is guaranteed to be valid and initialized before use.
+ unsafe { bindings::spin_assert_is_held(ptr) }
+ }
+}
+
+#[kunit_tests(rust_spinlock_irq_condvar)]
+mod tests {
+ use super::*;
+ use crate::{
+ sync::*,
+ workqueue::{self, impl_has_work, new_work, Work, WorkItem},
+ };
+
+ struct TestState {
+ value: u32,
+ waiter_ready: bool,
+ }
+
+ #[pin_data]
+ struct Test {
+ #[pin]
+ state: SpinLockIrq<TestState>,
+
+ #[pin]
+ state_changed: CondVar,
+
+ #[pin]
+ waiter_state_changed: CondVar,
+
+ #[pin]
+ wait_work: Work<Self>,
+ }
+
+ impl_has_work! {
+ impl HasWork<Self> for Test { self.wait_work }
+ }
+
+ impl Test {
+ pub(crate) fn new() -> Result<Arc<Self>> {
+ Arc::try_pin_init(
+ try_pin_init!(
+ Self {
+ state <- new_spinlock_irq!(TestState {
+ value: 1,
+ waiter_ready: false
+ }),
+ state_changed <- new_condvar!(),
+ waiter_state_changed <- new_condvar!(),
+ wait_work <- new_work!("IrqCondvarTest::wait_work")
+ }
+ ),
+ GFP_KERNEL,
+ )
+ }
+ }
+
+ impl WorkItem for Test {
+ type Pointer = Arc<Self>;
+
+ fn run(this: Arc<Self>) {
+ // Wait for the test to be ready to wait for us
+ let mut state = this.state.lock();
+
+ while !state.waiter_ready {
+ this.waiter_state_changed.wait(&mut state);
+ }
+
+ // Deliver the exciting value update our test has been waiting for
+ state.value += 1;
+ this.state_changed.notify_sync();
+ }
+ }
+
+ #[test]
+ fn spinlock_irq_condvar() -> Result {
+ let testdata = Test::new()?;
+
+ let _ = workqueue::system().enqueue(testdata.clone());
+
+ // Let the updater know when we're ready to wait
+ let mut state = testdata.state.lock();
+ state.waiter_ready = true;
+ testdata.waiter_state_changed.notify_sync();
+
+ // Wait for the exciting value update
+ testdata.state_changed.wait(&mut state);
+ assert_eq!(state.value, 2);
+ Ok(())
+ }
+}
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH v17 09/16] rust: sync: Add SpinLockIrq
2026-01-21 22:39 ` [PATCH v17 09/16] rust: sync: Add SpinLockIrq Lyude Paul
@ 2026-01-23 22:26 ` Benno Lossin
0 siblings, 0 replies; 47+ messages in thread
From: Benno Lossin @ 2026-01-23 22:26 UTC (permalink / raw)
To: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Andrew Morton, Peter Zijlstra, Ingo Molnar,
Will Deacon, Waiman Long
On Wed Jan 21, 2026 at 11:39 PM CET, Lyude Paul wrote:
> diff --git a/rust/kernel/sync/lock/spinlock.rs b/rust/kernel/sync/lock/spinlock.rs
> index d7be38ccbdc7d..3fdfb0a8a0ab1 100644
> --- a/rust/kernel/sync/lock/spinlock.rs
> +++ b/rust/kernel/sync/lock/spinlock.rs
> @@ -3,6 +3,7 @@
> //! A kernel spinlock.
> //!
> //! This module allows Rust code to use the kernel's `spinlock_t`.
> +use crate::prelude::*;
>
> /// Creates a [`SpinLock`] initialiser with the given name and a newly-created lock class.
> ///
> @@ -139,3 +140,231 @@ unsafe fn assert_is_held(ptr: *mut Self::State) {
> unsafe { bindings::spin_assert_is_held(ptr) }
> }
> }
> +
> +/// Creates a [`SpinLockIrq`] initialiser with the given name and a newly-created lock class.
> +///
> +/// It uses the name if one is given, otherwise it generates one based on the file name and line
> +/// number.
> +#[macro_export]
> +macro_rules! new_spinlock_irq {
> + ($inner:expr $(, $name:literal)? $(,)?) => {
> + $crate::sync::SpinLockIrq::new(
> + $inner, $crate::optional_name!($($name)?), $crate::static_lock_class!())
> + };
> +}
> +pub use new_spinlock_irq;
> +
> +/// A spinlock that may be acquired when local processor interrupts are disabled.
This hasn't been updated with my previous suggestion and I don't see a
reply on the mailing list. Am I missing something?
With that fixed:
Reviewed-by: Benno Lossin <lossin@kernel.org>
Cheers,
Benno
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (8 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 09/16] rust: sync: Add SpinLockIrq Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-22 11:56 ` kernel test robot
` (2 more replies)
2026-01-21 22:39 ` [PATCH v17 11/16] rust: sync: Expose lock::Backend Lyude Paul
` (7 subsequent siblings)
17 siblings, 3 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
`SpinLockIrq` and `SpinLock` use the exact same underlying C structure,
with the only real difference being that the former uses the irq_disable()
and irq_enable() variants for locking/unlocking. These variants can
introduce some minor overhead in contexts where we already know that
local processor interrupts are disabled, and as such we want a way to be
able to skip modifying processor interrupt state in said contexts in order
to avoid some overhead - just like the current C API allows us to do. So,
`ContextualBackend` allows us to cast a lock into its contextless version
for situations where we already have whatever guarantees would be provided
by `BackendWithContext::ContextualBackend` in place.
In some hacked-together benchmarks we ran, most of the time this did
actually seem to lead to a noticeable difference in overhead:
From an aarch64 VM running on a MacBook M4:
lock() when irq is disabled, 100 times cost Delta { nanos: 500 }
lock_with() when irq is disabled, 100 times cost Delta { nanos: 292 }
lock() when irq is enabled, 100 times cost Delta { nanos: 834 }
lock() when irq is disabled, 100 times cost Delta { nanos: 459 }
lock_with() when irq is disabled, 100 times cost Delta { nanos: 291 }
lock() when irq is enabled, 100 times cost Delta { nanos: 709 }
From an x86_64 VM (qemu/kvm) running on a i7-13700H
lock() when irq is disabled, 100 times cost Delta { nanos: 1002 }
lock_with() when irq is disabled, 100 times cost Delta { nanos: 729 }
lock() when irq is enabled, 100 times cost Delta { nanos: 1516 }
lock() when irq is disabled, 100 times cost Delta { nanos: 754 }
lock_with() when irq is disabled, 100 times cost Delta { nanos: 966 }
lock() when irq is enabled, 100 times cost Delta { nanos: 1227 }
(note that there were some runs on x86_64 where lock() on irq disabled
vs. lock_with() on irq disabled had equivalent benchmarks, but it very
much appeared to be a minority of test runs.)
While it's not clear how this affects real-world workloads yet, let's add
this for the time being so we can find out. Implement
lock::Lock::lock_with() and lock::BackendWithContext::ContextualBackend.
This makes it so that a `SpinLockIrq` will work like a `SpinLock` if
interrupts are disabled. So a function:
(&'a SpinLockIrq, &'a InterruptDisabled) -> Guard<'a, .., SpinLockBackend>
makes sense. Note that due to `Guard` and `InterruptDisabled` having the
same lifetime, interrupts cannot be enabled while the Guard exists.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
This was originally two patches, but keeping them split didn't make sense
after going from BackendInContext to BackendWithContext.
V10:
* Fix typos - Dirk/Lyude
* Since we're adding support for context locks to GlobalLock as well, let's
also make sure to cover try_lock while we're at it and add try_lock_with
* Add a private function as_lock_in_context() for handling casting from a
Lock<T, B> to Lock<T, B::ContextualBackend> so we don't have to duplicate
safety comments
V11:
* Fix clippy::ref_as_ptr error in Lock::as_lock_in_context()
V14:
* Add benchmark results, rewrite commit message
V17:
* Introduce `BackendWithContext`, move context-related bits into there and
out of `Backend`.
* Add missing #[must_use = …] for try_lock_with()
* Remove all unsafe code from lock_with() and try_lock_with():
Somehow I never noticed that literally none of the unsafe code in these
two functions is needed with as_lock_in_context()...
rust/kernel/sync/lock.rs | 71 ++++++++++++++++++++++++++++++-
rust/kernel/sync/lock/spinlock.rs | 48 ++++++++++++++++++++-
2 files changed, 117 insertions(+), 2 deletions(-)
diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
index 46a57d1fc309d..9f6d7b381bd15 100644
--- a/rust/kernel/sync/lock.rs
+++ b/rust/kernel/sync/lock.rs
@@ -30,10 +30,15 @@
/// is owned, that is, between calls to [`lock`] and [`unlock`].
/// - Implementers must also ensure that [`relock`] uses the same locking method as the original
/// lock operation.
+/// - Implementers must ensure if [`BackendInContext`] is a [`Backend`], it's safe to acquire the
+/// lock under the [`Context`], the [`State`] of two backends must be the same.
///
/// [`lock`]: Backend::lock
/// [`unlock`]: Backend::unlock
/// [`relock`]: Backend::relock
+/// [`BackendInContext`]: Backend::BackendInContext
+/// [`Context`]: Backend::Context
+/// [`State`]: Backend::State
pub unsafe trait Backend {
/// The state required by the lock.
type State;
@@ -97,6 +102,34 @@ unsafe fn relock(ptr: *mut Self::State, guard_state: &mut Self::GuardState) {
unsafe fn assert_is_held(ptr: *mut Self::State);
}
+/// A lock [`Backend`] with a [`ContextualBackend`] that can make lock acquisition cheaper.
+///
+/// Some locks, such as [`SpinLockIrq`](super::SpinLockIrq), can only be acquired in specific
+/// hardware contexts (e.g. local processor interrupts disabled). Entering and exiting these
+/// contexts incurs additional overhead. But this overhead may be avoided if we know ahead of time
+/// that we are already within the correct context for a given lock as we can then skip any costly
+/// operations required for entering/exiting said context.
+///
+/// Any lock implementing this trait requires such an interrupt context, and can provide cheaper
+/// lock-acquisition functions through [`Lock::lock_with`] and [`Lock::try_lock_with`] as long as a
+/// context token of type [`Context`] is available.
+///
+/// # Safety
+///
+/// - Implementors must ensure that it is safe to acquire the lock under [`Context`].
+///
+/// [`ContextualBackend`]: BackendWithContext::ContextualBackend
+/// [`Context`]: BackendWithContext::Context
+pub unsafe trait BackendWithContext: Backend {
+ /// The context which must be provided in order to acquire the lock with the
+ /// [`ContextualBackend`](BackendWithContext::ContextualBackend).
+ type Context<'a>;
+
+ /// The alternative cheaper backend we can use if a [`Context`](BackendWithContext::Context) is
+ /// provided.
+ type ContextualBackend: Backend<State = Self::State>;
+}
+
/// A mutual exclusion primitive.
///
/// Exposes one of the kernel locking primitives. Which one is exposed depends on the lock
@@ -169,7 +202,8 @@ pub unsafe fn from_raw<'a>(ptr: *mut B::State) -> &'a Self {
impl<T: ?Sized, B: Backend> Lock<T, B> {
/// Acquires the lock and gives the caller access to the data protected by it.
- pub fn lock(&self) -> Guard<'_, T, B> {
+ #[inline]
+ pub fn lock<'a>(&'a self) -> Guard<'a, T, B> {
// SAFETY: The constructor of the type calls `init`, so the existence of the object proves
// that `init` was called.
let state = unsafe { B::lock(self.state.get()) };
@@ -189,6 +223,41 @@ pub fn try_lock(&self) -> Option<Guard<'_, T, B>> {
}
}
+impl<T: ?Sized, B: BackendWithContext> Lock<T, B> {
+ /// Casts the lock as a `Lock<T, B::ContextualBackend>`.
+ fn as_lock_in_context<'a>(
+ &'a self,
+ _context: B::Context<'a>,
+ ) -> &'a Lock<T, B::ContextualBackend>
+ where
+ B::ContextualBackend: Backend,
+ {
+ // SAFETY:
+ // - Per the safety guarantee of `Backend`, `B::ContextualBackend` and `B` have the same
+ // `State`, so the layout of the lock is the same and it is safe to convert one to
+ // another.
+ // - The caller provided `B::Context<'a>`, so it is safe to recast and return this lock.
+ unsafe { &*(core::ptr::from_ref(self) as *const _) }
+ }
+
+ /// Acquires the lock with the given context and gives the caller access to the data protected
+ /// by it.
+ pub fn lock_with<'a>(&'a self, context: B::Context<'a>) -> Guard<'a, T, B::ContextualBackend> {
+ self.as_lock_in_context(context).lock()
+ }
+
+ /// Tries to acquire the lock with the given context.
+ ///
+ /// Returns a guard that can be used to access the data protected by the lock if successful.
+ #[must_use = "if unused, the lock will be immediately unlocked"]
+ pub fn try_lock_with<'a>(
+ &'a self,
+ context: B::Context<'a>,
+ ) -> Option<Guard<'a, T, B::ContextualBackend>> {
+ self.as_lock_in_context(context).try_lock()
+ }
+}
+
/// A lock guard.
///
/// Allows mutual exclusion primitives that implement the [`Backend`] trait to automatically unlock
diff --git a/rust/kernel/sync/lock/spinlock.rs b/rust/kernel/sync/lock/spinlock.rs
index 3fdfb0a8a0ab1..e082791a0d23c 100644
--- a/rust/kernel/sync/lock/spinlock.rs
+++ b/rust/kernel/sync/lock/spinlock.rs
@@ -3,7 +3,7 @@
//! A kernel spinlock.
//!
//! This module allows Rust code to use the kernel's `spinlock_t`.
-use crate::prelude::*;
+use crate::{interrupt::LocalInterruptDisabled, prelude::*};
/// Creates a [`SpinLock`] initialiser with the given name and a newly-created lock class.
///
@@ -220,6 +220,45 @@ macro_rules! new_spinlock_irq {
/// # Ok::<(), Error>(())
/// ```
///
+/// The next example demonstrates locking a [`SpinLockIrq`] using [`lock_with()`] in a function
+/// which can only be called when local processor interrupts are already disabled.
+///
+/// ```
+/// use kernel::sync::{new_spinlock_irq, SpinLockIrq};
+/// use kernel::interrupt::*;
+///
+/// struct Inner {
+/// a: u32,
+/// }
+///
+/// #[pin_data]
+/// struct Example {
+/// #[pin]
+/// inner: SpinLockIrq<Inner>,
+/// }
+///
+/// impl Example {
+/// fn new() -> impl PinInit<Self> {
+/// pin_init!(Self {
+/// inner <- new_spinlock_irq!(Inner { a: 20 }),
+/// })
+/// }
+/// }
+///
+/// // Accessing an `Example` from a function that can only be called in no-interrupt contexts.
+/// fn noirq_work(e: &Example, interrupt_disabled: &LocalInterruptDisabled) {
+/// // Because we know interrupts are disabled from `interrupt_disabled`, we can skip toggling
+/// // the interrupt state by using lock_with() and the provided token.
+/// assert_eq!(e.inner.lock_with(interrupt_disabled).a, 20);
+/// }
+///
+/// # let e = KBox::pin_init(Example::new(), GFP_KERNEL)?;
+/// # let interrupt_guard = local_interrupt_disable();
+/// # noirq_work(&e, &interrupt_guard);
+/// #
+/// # Ok::<(), Error>(())
+/// ```
+///
/// [`lock()`]: SpinLockIrq::lock
/// [`lock_with()`]: SpinLockIrq::lock_with
pub type SpinLockIrq<T> = super::Lock<T, SpinLockIrqBackend>;
@@ -283,6 +322,13 @@ unsafe fn assert_is_held(ptr: *mut Self::State) {
}
}
+// SAFETY: When executing with local processor interrupts disabled, [`SpinLock`] and [`SpinLockIrq`]
+// are identical.
+unsafe impl super::BackendWithContext for SpinLockIrqBackend {
+ type Context<'a> = &'a LocalInterruptDisabled;
+ type ContextualBackend = SpinLockBackend;
+}
+
#[kunit_tests(rust_spinlock_irq_condvar)]
mod tests {
use super::*;
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends
2026-01-21 22:39 ` [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends Lyude Paul
@ 2026-01-22 11:56 ` kernel test robot
2026-01-23 22:55 ` Benno Lossin
2026-01-26 13:31 ` Gary Guo
2 siblings, 0 replies; 47+ messages in thread
From: kernel test robot @ 2026-01-22 11:56 UTC (permalink / raw)
To: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner
Cc: oe-kbuild-all, Boqun Feng, Daniel Almeida, Miguel Ojeda,
Alex Gaynor, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Andrew Morton, Linux Memory Management List, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
Hi Lyude,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 2ad6c5cdc89acfefb01b84afa5e55262c40d6fec]
url: https://github.com/intel-lab-lkp/linux/commits/Lyude-Paul/preempt-Introduce-HARDIRQ_DISABLE_BITS/20260122-064928
base: 2ad6c5cdc89acfefb01b84afa5e55262c40d6fec
patch link: https://lore.kernel.org/r/20260121223933.1568682-11-lyude%40redhat.com
patch subject: [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20260122/202601221246.Qfwh5Atq-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260122/202601221246.Qfwh5Atq-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601221246.Qfwh5Atq-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> warning: unresolved link to `Backend::BackendInContext`
--> rust/kernel/sync/lock.rs:39:27
|
39 | /// [`BackendInContext`]: Backend::BackendInContext
| ^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Backend` has no associated item named `BackendInContext`
|
= note: `#[warn(rustdoc::broken_intra_doc_links)]` on by default
--
>> warning: unresolved link to `Backend::Context`
--> rust/kernel/sync/lock.rs:40:18
|
40 | /// [`Context`]: Backend::Context
| ^^^^^^^^^^^^^^^^ the trait `Backend` has no associated item named `Context`
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends
2026-01-21 22:39 ` [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends Lyude Paul
2026-01-22 11:56 ` kernel test robot
@ 2026-01-23 22:55 ` Benno Lossin
2026-01-26 13:31 ` Gary Guo
2 siblings, 0 replies; 47+ messages in thread
From: Benno Lossin @ 2026-01-23 22:55 UTC (permalink / raw)
To: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Andrew Morton, Peter Zijlstra, Ingo Molnar,
Will Deacon, Waiman Long
On Wed Jan 21, 2026 at 11:39 PM CET, Lyude Paul wrote:
> `SpinLockIrq` and `SpinLock` use the exact same underlying C structure,
> with the only real difference being that the former uses the irq_disable()
> and irq_enable() variants for locking/unlocking. These variants can
> introduce some minor overhead in contexts where we already know that
> local processor interrupts are disabled, and as such we want a way to be
> able to skip modifying processor interrupt state in said contexts in order
> to avoid some overhead - just like the current C API allows us to do. So,
> `ContextualBackend` allows us to cast a lock into its contextless version
> for situations where we already have whatever guarantees would be provided
> by `BackendWithContext::ContextualBackend` in place.
>
> In some hacked-together benchmarks we ran, most of the time this did
> actually seem to lead to a noticeable difference in overhead:
>
> From an aarch64 VM running on a MacBook M4:
> lock() when irq is disabled, 100 times cost Delta { nanos: 500 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 292 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 834 }
>
> lock() when irq is disabled, 100 times cost Delta { nanos: 459 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 291 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 709 }
>
> From an x86_64 VM (qemu/kvm) running on a i7-13700H
> lock() when irq is disabled, 100 times cost Delta { nanos: 1002 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 729 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 1516 }
>
> lock() when irq is disabled, 100 times cost Delta { nanos: 754 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 966 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 1227 }
>
> (note that there were some runs on x86_64 where lock() on irq disabled
> vs. lock_with() on irq disabled had equivalent benchmarks, but it very
> much appeared to be a minority of test runs.)
>
> While it's not clear how this affects real-world workloads yet, let's add
> this for the time being so we can find out. Implement
> lock::Lock::lock_with() and lock::BackendWithContext::ContextualBackend.
> This makes it so that a `SpinLockIrq` will work like a `SpinLock` if
> interrupts are disabled. So a function:
>
> (&'a SpinLockIrq, &'a InterruptDisabled) -> Guard<'a, .., SpinLockBackend>
>
> makes sense. Note that due to `Guard` and `InterruptDisabled` having the
> same lifetime, interrupts cannot be enabled while the Guard exists.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
My overall opinion of the design is that we should no longer use the generic
approach with the `Backend` traits. I think I mentioned this on Zulip
already at multiple points. I'm okay with having this extension, but it
would be ideal if we could move to not having a single `Lock` struct,
but one for each locking primitive.
>
> ---
> This was originally two patches, but keeping them split didn't make sense
> after going from BackendInContext to BackendWithContext.
>
> V10:
> * Fix typos - Dirk/Lyude
> * Since we're adding support for context locks to GlobalLock as well, let's
> also make sure to cover try_lock while we're at it and add try_lock_with
> * Add a private function as_lock_in_context() for handling casting from a
> Lock<T, B> to Lock<T, B::ContextualBackend> so we don't have to duplicate
> safety comments
> V11:
> * Fix clippy::ref_as_ptr error in Lock::as_lock_in_context()
> V14:
> * Add benchmark results, rewrite commit message
> V17:
> * Introduce `BackendWithContext`, move context-related bits into there and
> out of `Backend`.
> * Add missing #[must_use = …] for try_lock_with()
> * Remove all unsafe code from lock_with() and try_lock_with():
> Somehow I never noticed that literally none of the unsafe code in these
> two functions is needed with as_lock_in_context()...
>
> rust/kernel/sync/lock.rs | 71 ++++++++++++++++++++++++++++++-
> rust/kernel/sync/lock/spinlock.rs | 48 ++++++++++++++++++++-
> 2 files changed, 117 insertions(+), 2 deletions(-)
>
> diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
> index 46a57d1fc309d..9f6d7b381bd15 100644
> --- a/rust/kernel/sync/lock.rs
> +++ b/rust/kernel/sync/lock.rs
> @@ -30,10 +30,15 @@
> /// is owned, that is, between calls to [`lock`] and [`unlock`].
> /// - Implementers must also ensure that [`relock`] uses the same locking method as the original
> /// lock operation.
> +/// - Implementers must ensure if [`BackendInContext`] is a [`Backend`], it's safe to acquire the
> +/// lock under the [`Context`], the [`State`] of two backends must be the same.
This isn't needed, since we don't have `Backend::Context` any longer.
> ///
> /// [`lock`]: Backend::lock
> /// [`unlock`]: Backend::unlock
> /// [`relock`]: Backend::relock
> +/// [`BackendInContext`]: Backend::BackendInContext
> +/// [`Context`]: Backend::Context
> +/// [`State`]: Backend::State
Same for these.
> pub unsafe trait Backend {
> /// The state required by the lock.
> type State;
> @@ -97,6 +102,34 @@ unsafe fn relock(ptr: *mut Self::State, guard_state: &mut Self::GuardState) {
> unsafe fn assert_is_held(ptr: *mut Self::State);
> }
>
> +/// A lock [`Backend`] with a [`ContextualBackend`] that can make lock acquisition cheaper.
> +///
> +/// Some locks, such as [`SpinLockIrq`](super::SpinLockIrq), can only be acquired in specific
> +/// hardware contexts (e.g. local processor interrupts disabled). Entering and exiting these
> +/// contexts incurs additional overhead. But this overhead may be avoided if we know ahead of time
> +/// that we are already within the correct context for a given lock as we can then skip any costly
> +/// operations required for entering/exiting said context.
> +///
> +/// Any lock implementing this trait requires such an interrupt context, and can provide cheaper
> +/// lock-acquisition functions through [`Lock::lock_with`] and [`Lock::try_lock_with`] as long as a
> +/// context token of type [`Context`] is available.
> +///
> +/// # Safety
> +///
> +/// - Implementors must ensure that it is safe to acquire the lock under [`Context`].
This safety comment needs some improvements. We probably should just put
the entire cast into this.
> +///
> +/// [`ContextualBackend`]: BackendWithContext::ContextualBackend
> +/// [`Context`]: BackendWithContext::Context
> +pub unsafe trait BackendWithContext: Backend {
> + /// The context which must be provided in order to acquire the lock with the
> + /// [`ContextualBackend`](BackendWithContext::ContextualBackend).
> + type Context<'a>;
> +
> + /// The alternative cheaper backend we can use if a [`Context`](BackendWithContext::Context) is
> + /// provided.
> + type ContextualBackend: Backend<State = Self::State>;
> +}
> +
> /// A mutual exclusion primitive.
> ///
> /// Exposes one of the kernel locking primitives. Which one is exposed depends on the lock
> @@ -169,7 +202,8 @@ pub unsafe fn from_raw<'a>(ptr: *mut B::State) -> &'a Self {
>
> impl<T: ?Sized, B: Backend> Lock<T, B> {
> /// Acquires the lock and gives the caller access to the data protected by it.
> - pub fn lock(&self) -> Guard<'_, T, B> {
> + #[inline]
> + pub fn lock<'a>(&'a self) -> Guard<'a, T, B> {
Why this change?
> // SAFETY: The constructor of the type calls `init`, so the existence of the object proves
> // that `init` was called.
> let state = unsafe { B::lock(self.state.get()) };
> @@ -189,6 +223,41 @@ pub fn try_lock(&self) -> Option<Guard<'_, T, B>> {
> }
> }
>
> +impl<T: ?Sized, B: BackendWithContext> Lock<T, B> {
> + /// Casts the lock as a `Lock<T, B::ContextualBackend>`.
> + fn as_lock_in_context<'a>(
> + &'a self,
> + _context: B::Context<'a>,
> + ) -> &'a Lock<T, B::ContextualBackend>
> + where
> + B::ContextualBackend: Backend,
> + {
> + // SAFETY:
> + // - Per the safety guarantee of `Backend`, if `B::ContextualBackend` and `B` should
> + // have the same state, the layout of the lock is the same so it's safe to convert one to
> + // another.
This also relies on `Lock` being `repr(C)`. `repr(Rust)` types are
allowed to change layout in cases where their generics are substituted
for others (even if those other ones have the same layout!).
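Benno's layout point can be illustrated with a standalone sketch (all names here are simplified stand-ins, not the kernel's actual types): two instantiations of a generic struct only have a guaranteed-identical layout when the struct is `#[repr(C)]`, even if the substituted generics produce the same field types.

```rust
use std::marker::PhantomData;

trait Backend {
    type State;
}

struct IrqBackend;
struct PlainBackend;

impl Backend for IrqBackend {
    type State = u32;
}
impl Backend for PlainBackend {
    type State = u32;
}

// Without #[repr(C)], `Lock<IrqBackend>` and `Lock<PlainBackend>` are two
// distinct repr(Rust) types whose layouts the compiler may choose
// independently, so the pointer cast below would not be guaranteed sound.
#[repr(C)]
struct Lock<B: Backend> {
    state: B::State,
    _backend: PhantomData<B>,
}

fn as_plain(lock: &Lock<IrqBackend>) -> &Lock<PlainBackend> {
    // SAFETY (sketch): both instantiations have the same field types and
    // the struct is #[repr(C)], so their layouts are identical.
    unsafe { &*(lock as *const Lock<IrqBackend> as *const Lock<PlainBackend>) }
}

fn main() {
    let lock = Lock::<IrqBackend> { state: 7, _backend: PhantomData };
    println!("{}", as_plain(&lock).state); // prints 7
}
```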
Cheers,
Benno
^ permalink raw reply [flat|nested] 47+ messages in thread

* Re: [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends
2026-01-21 22:39 ` [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends Lyude Paul
2026-01-22 11:56 ` kernel test robot
2026-01-23 22:55 ` Benno Lossin
@ 2026-01-26 13:31 ` Gary Guo
2 siblings, 0 replies; 47+ messages in thread
From: Gary Guo @ 2026-01-26 13:31 UTC (permalink / raw)
To: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
On Wed Jan 21, 2026 at 10:39 PM GMT, Lyude Paul wrote:
> `SpinLockIrq` and `SpinLock` use the exact same underlying C structure,
> with the only real difference being that the former uses the irq_disable()
> and irq_enable() variants for locking/unlocking. These variants can
> introduce some minor overhead in contexts where we already know that
> local processor interrupts are disabled, and as such we want a way to be
> able to skip modifying processor interrupt state in said contexts in order
> to avoid some overhead - just like the current C API allows us to do. So,
> `ContextualBackend` allows us to cast a lock into its contextless version
> for situations where we already have whatever guarantees would be provided
> by `BackendWithContext::ContextualBackend` in place.
>
> In some hacked-together benchmarks we ran, most of the time this did
> actually seem to lead to a noticeable difference in overhead:
>
> From an aarch64 VM running on a MacBook M4:
> lock() when irq is disabled, 100 times cost Delta { nanos: 500 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 292 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 834 }
>
> lock() when irq is disabled, 100 times cost Delta { nanos: 459 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 291 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 709 }
>
> From an x86_64 VM (qemu/kvm) running on a i7-13700H
> lock() when irq is disabled, 100 times cost Delta { nanos: 1002 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 729 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 1516 }
>
> lock() when irq is disabled, 100 times cost Delta { nanos: 754 }
> lock_with() when irq is disabled, 100 times cost Delta { nanos: 966 }
> lock() when irq is enabled, 100 times cost Delta { nanos: 1227 }
>
> (note that there were some runs on x86_64 where lock() on irq disabled
> vs. lock_with() on irq disabled had equivalent benchmarks, but it very
> much appeared to be a minority of test runs.)
>
> While it's not clear how this affects real-world workloads yet, let's add
> this for the time being so we can find out. Implement
> lock::Lock::lock_with() and lock::BackendWithContext::ContextualBackend.
> This makes it so that a `SpinLockIrq` will work like a `SpinLock` if
> interrupts are disabled. So a function:
>
> (&'a SpinLockIrq, &'a InterruptDisabled) -> Guard<'a, .., SpinLockBackend>
>
> makes sense. Note that due to `Guard` and `InterruptDisabled` having the
> same lifetime, interrupts cannot be enabled while the Guard exists.
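The lifetime coupling described above can be sketched in standalone Rust (the types below are hypothetical stand-ins, not the kernel's): because the token and the lock are borrowed for the same lifetime `'a`, the borrow checker prevents the token from going away (i.e. interrupts being re-enabled) while the guard is still in use.

```rust
use std::cell::UnsafeCell;

// Stand-in for the token proving local interrupts are disabled.
struct InterruptDisabled;

struct SpinLockIrq<T> {
    data: UnsafeCell<T>,
}

struct Guard<'a, T> {
    lock: &'a SpinLockIrq<T>,
}

impl<T> SpinLockIrq<T> {
    // Both borrows share 'a: the token must stay alive (interrupts stay
    // disabled) for as long as the returned guard is used.
    fn lock_with<'a>(&'a self, _irq_off: &'a InterruptDisabled) -> Guard<'a, T> {
        Guard { lock: self }
    }
}

impl<'a, T> Guard<'a, T> {
    fn get(&self) -> &T {
        // SAFETY (sketch): a real lock would guarantee exclusive access here.
        unsafe { &*self.lock.data.get() }
    }
}

fn main() {
    let irq_off = InterruptDisabled;
    let lock = SpinLockIrq { data: UnsafeCell::new(20u32) };
    let guard = lock.lock_with(&irq_off);
    println!("{}", guard.get()); // prints 20
    // `drop(irq_off)` here would fail to compile while `guard` is live.
}
```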
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
>
> ---
> This was originally two patches, but keeping them split didn't make sense
> after going from BackendInContext to BackendWithContext.
>
> V10:
> * Fix typos - Dirk/Lyude
> * Since we're adding support for context locks to GlobalLock as well, let's
> also make sure to cover try_lock while we're at it and add try_lock_with
> * Add a private function as_lock_in_context() for handling casting from a
> Lock<T, B> to Lock<T, B::ContextualBackend> so we don't have to duplicate
> safety comments
> V11:
> * Fix clippy::ref_as_ptr error in Lock::as_lock_in_context()
> V14:
> * Add benchmark results, rewrite commit message
> V17:
> * Introduce `BackendWithContext`, move context-related bits into there and
> out of `Backend`.
> * Add missing #[must_use = …] for try_lock_with()
> * Remove all unsafe code from lock_with() and try_lock_with():
> Somehow I never noticed that literally none of the unsafe code in these
> two functions is needed with as_lock_in_context()...
>
> rust/kernel/sync/lock.rs | 71 ++++++++++++++++++++++++++++++-
> rust/kernel/sync/lock/spinlock.rs | 48 ++++++++++++++++++++-
> 2 files changed, 117 insertions(+), 2 deletions(-)
>
> diff --git a/rust/kernel/sync/lock.rs b/rust/kernel/sync/lock.rs
> index 46a57d1fc309d..9f6d7b381bd15 100644
> --- a/rust/kernel/sync/lock.rs
> +++ b/rust/kernel/sync/lock.rs
> @@ -30,10 +30,15 @@
> /// is owned, that is, between calls to [`lock`] and [`unlock`].
> /// - Implementers must also ensure that [`relock`] uses the same locking method as the original
> /// lock operation.
> +/// - Implementers must ensure if [`BackendInContext`] is a [`Backend`], it's safe to acquire the
> +/// lock under the [`Context`], the [`State`] of two backends must be the same.
> ///
> /// [`lock`]: Backend::lock
> /// [`unlock`]: Backend::unlock
> /// [`relock`]: Backend::relock
> +/// [`BackendInContext`]: Backend::BackendInContext
> +/// [`Context`]: Backend::Context
> +/// [`State`]: Backend::State
> pub unsafe trait Backend {
> /// The state required by the lock.
> type State;
> @@ -97,6 +102,34 @@ unsafe fn relock(ptr: *mut Self::State, guard_state: &mut Self::GuardState) {
> unsafe fn assert_is_held(ptr: *mut Self::State);
> }
>
> +/// A lock [`Backend`] with a [`ContextualBackend`] that can make lock acquisition cheaper.
> +///
> +/// Some locks, such as [`SpinLockIrq`](super::SpinLockIrq), can only be acquired in specific
> +/// hardware contexts (e.g. local processor interrupts disabled). Entering and exiting these
> +/// contexts incurs additional overhead. But this overhead may be avoided if we know ahead of time
> +/// that we are already within the correct context for a given lock as we can then skip any costly
> +/// operations required for entering/exiting said context.
> +///
> +/// Any lock implementing this trait requires such an interrupt context, and can provide cheaper
> +/// lock-acquisition functions through [`Lock::lock_with`] and [`Lock::try_lock_with`] as long as a
> +/// context token of type [`Context`] is available.
> +///
> +/// # Safety
> +///
> +/// - Implementors must ensure that it is safe to acquire the lock under [`Context`].
> +///
> +/// [`ContextualBackend`]: BackendWithContext::ContextualBackend
> +/// [`Context`]: BackendWithContext::Context
> +pub unsafe trait BackendWithContext: Backend {
> + /// The context which must be provided in order to acquire the lock with the
> + /// [`ContextualBackend`](BackendWithContext::ContextualBackend).
> + type Context<'a>;
> +
> + /// The alternative cheaper backend we can use if a [`Context`](BackendWithContext::Context) is
> + /// provided.
> + type ContextualBackend: Backend<State = Self::State>;
> +}
The discussion on Zulip seems to have arrived at a consensus that we want to
avoid the generic approach entirely and do an inherent implementation on
`Lock<T, SpinLockIrqBackend>` instead.
Link: https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/Spinlocks.20with.20IRQs.3F/near/564176443
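The inherent-impl alternative being suggested might look roughly like the following standalone sketch (types are simplified stand-ins for the kernel's, and this is one possible shape, not the agreed design): the context-aware method is defined only on the concrete instantiation that needs it, with no extra trait.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the kernel types.
struct SpinLockBackend;
struct SpinLockIrqBackend;
struct LocalInterruptDisabled;

struct Lock<T, B> {
    data: T,
    _backend: PhantomData<B>,
}

struct Guard<'a, T, B> {
    data: &'a T,
    _backend: PhantomData<B>,
}

// No `BackendWithContext` trait: the method exists only for the one
// lock type that has a cheaper context-aware acquisition path.
impl<T> Lock<T, SpinLockIrqBackend> {
    fn lock_with<'a>(
        &'a self,
        _irq_off: &'a LocalInterruptDisabled,
    ) -> Guard<'a, T, SpinLockBackend> {
        Guard { data: &self.data, _backend: PhantomData }
    }
}

fn main() {
    let irq_off = LocalInterruptDisabled;
    let lock: Lock<u32, SpinLockIrqBackend> = Lock { data: 5, _backend: PhantomData };
    println!("{}", lock.lock_with(&irq_off).data); // prints 5
}
```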
Best,
Gary
> +
> /// A mutual exclusion primitive.
> ///
> /// Exposes one of the kernel locking primitives. Which one is exposed depends on the lock
> @@ -169,7 +202,8 @@ pub unsafe fn from_raw<'a>(ptr: *mut B::State) -> &'a Self {
>
> impl<T: ?Sized, B: Backend> Lock<T, B> {
> /// Acquires the lock and gives the caller access to the data protected by it.
> - pub fn lock(&self) -> Guard<'_, T, B> {
> + #[inline]
> + pub fn lock<'a>(&'a self) -> Guard<'a, T, B> {
> // SAFETY: The constructor of the type calls `init`, so the existence of the object proves
> // that `init` was called.
> let state = unsafe { B::lock(self.state.get()) };
> @@ -189,6 +223,41 @@ pub fn try_lock(&self) -> Option<Guard<'_, T, B>> {
> }
> }
>
> +impl<T: ?Sized, B: BackendWithContext> Lock<T, B> {
> + /// Casts the lock as a `Lock<T, B::ContextualBackend>`.
> + fn as_lock_in_context<'a>(
> + &'a self,
> + _context: B::Context<'a>,
> + ) -> &'a Lock<T, B::ContextualBackend>
> + where
> + B::ContextualBackend: Backend,
> + {
> + // SAFETY:
> + // - Per the safety guarantee of `Backend`, if `B::ContextualBackend` and `B` should
> + // have the same state, the layout of the lock is the same so it's safe to convert one to
> + // another.
> + // - The caller provided `B::Context<'a>`, so it is safe to recast and return this lock.
> + unsafe { &*(core::ptr::from_ref(self) as *const _) }
> + }
> +
> + /// Acquires the lock with the given context and gives the caller access to the data protected
> + /// by it.
> + pub fn lock_with<'a>(&'a self, context: B::Context<'a>) -> Guard<'a, T, B::ContextualBackend> {
> + self.as_lock_in_context(context).lock()
> + }
> +
> + /// Tries to acquire the lock with the given context.
> + ///
> + /// Returns a guard that can be used to access the data protected by the lock if successful.
> + #[must_use = "if unused, the lock will be immediately unlocked"]
> + pub fn try_lock_with<'a>(
> + &'a self,
> + context: B::Context<'a>,
> + ) -> Option<Guard<'a, T, B::ContextualBackend>> {
> + self.as_lock_in_context(context).try_lock()
> + }
> +}
> +
> /// A lock guard.
> ///
> /// Allows mutual exclusion primitives that implement the [`Backend`] trait to automatically unlock
> diff --git a/rust/kernel/sync/lock/spinlock.rs b/rust/kernel/sync/lock/spinlock.rs
> index 3fdfb0a8a0ab1..e082791a0d23c 100644
> --- a/rust/kernel/sync/lock/spinlock.rs
> +++ b/rust/kernel/sync/lock/spinlock.rs
> @@ -3,7 +3,7 @@
> //! A kernel spinlock.
> //!
> //! This module allows Rust code to use the kernel's `spinlock_t`.
> -use crate::prelude::*;
> +use crate::{interrupt::LocalInterruptDisabled, prelude::*};
>
> /// Creates a [`SpinLock`] initialiser with the given name and a newly-created lock class.
> ///
> @@ -220,6 +220,45 @@ macro_rules! new_spinlock_irq {
> /// # Ok::<(), Error>(())
> /// ```
> ///
> +/// The next example demonstrates locking a [`SpinLockIrq`] using [`lock_with()`] in a function
> +/// which can only be called when local processor interrupts are already disabled.
> +///
> +/// ```
> +/// use kernel::sync::{new_spinlock_irq, SpinLockIrq};
> +/// use kernel::interrupt::*;
> +///
> +/// struct Inner {
> +/// a: u32,
> +/// }
> +///
> +/// #[pin_data]
> +/// struct Example {
> +/// #[pin]
> +/// inner: SpinLockIrq<Inner>,
> +/// }
> +///
> +/// impl Example {
> +/// fn new() -> impl PinInit<Self> {
> +/// pin_init!(Self {
> +/// inner <- new_spinlock_irq!(Inner { a: 20 }),
> +/// })
> +/// }
> +/// }
> +///
> +/// // Accessing an `Example` from a function that can only be called in no-interrupt contexts.
> +/// fn noirq_work(e: &Example, interrupt_disabled: &LocalInterruptDisabled) {
> +/// // Because we know interrupts are disabled thanks to `interrupt_disabled`, we can
> +/// // skip toggling interrupt state by using lock_with() and the provided token.
> +/// assert_eq!(e.inner.lock_with(interrupt_disabled).a, 20);
> +/// }
> +///
> +/// # let e = KBox::pin_init(Example::new(), GFP_KERNEL)?;
> +/// # let interrupt_guard = local_interrupt_disable();
> +/// # noirq_work(&e, &interrupt_guard);
> +/// #
> +/// # Ok::<(), Error>(())
> +/// ```
> +///
> /// [`lock()`]: SpinLockIrq::lock
> /// [`lock_with()`]: SpinLockIrq::lock_with
> pub type SpinLockIrq<T> = super::Lock<T, SpinLockIrqBackend>;
> @@ -283,6 +322,13 @@ unsafe fn assert_is_held(ptr: *mut Self::State) {
> }
> }
>
> +// SAFETY: When executing with local processor interrupts disabled, [`SpinLock`] and [`SpinLockIrq`]
> +// are identical.
> +unsafe impl super::BackendWithContext for SpinLockIrqBackend {
> + type Context<'a> = &'a LocalInterruptDisabled;
> + type ContextualBackend = SpinLockBackend;
> +}
> +
> #[kunit_tests(rust_spinlock_irq_condvar)]
> mod tests {
> use super::*;
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH v17 11/16] rust: sync: Expose lock::Backend
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (9 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 10/16] rust: sync: Introduce lock::Lock::lock_with() and friends Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-23 22:56 ` Benno Lossin
2026-01-21 22:39 ` [PATCH v17 12/16] rust: sync: lock/global: Rename B to G in trait bounds Lyude Paul
` (6 subsequent siblings)
17 siblings, 1 reply; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
Due to the addition of sync::lock::Backend::Context, lock guards can be
returned with a different Backend than their respective lock. Since we'll
be adding a trait bound for Backend to GlobalGuard in order to support
this, users will need to be able to directly refer to Backend so that they
can use it in trait bounds.
So, let's make this easier for users and expose Backend in sync.
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
rust/kernel/sync.rs | 1 +
1 file changed, 1 insertion(+)
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index 48a7cae86c50c..ce31154198cea 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -30,6 +30,7 @@
pub use lock::spinlock::{
new_spinlock, new_spinlock_irq, SpinLock, SpinLockGuard, SpinLockIrq, SpinLockIrqGuard,
};
+pub use lock::Backend;
pub use locked_by::LockedBy;
pub use refcount::Refcount;
pub use set_once::SetOnce;
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread

* Re: [PATCH v17 11/16] rust: sync: Expose lock::Backend
2026-01-21 22:39 ` [PATCH v17 11/16] rust: sync: Expose lock::Backend Lyude Paul
@ 2026-01-23 22:56 ` Benno Lossin
0 siblings, 0 replies; 47+ messages in thread
From: Benno Lossin @ 2026-01-23 22:56 UTC (permalink / raw)
To: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Andrew Morton, Peter Zijlstra, Ingo Molnar,
Will Deacon, Waiman Long
On Wed Jan 21, 2026 at 11:39 PM CET, Lyude Paul wrote:
> Due to the addition of sync::lock::Backend::Context, lock guards can be
> returned with a different Backend than their respective lock. Since we'll
> be adding a trait bound for Backend to GlobalGuard in order to support
> this, users will need to be able to directly refer to Backend so that they
> can use it in trait bounds.
>
> So, let's make this easier for users and expose Backend in sync.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Cheers,
Benno
> ---
> rust/kernel/sync.rs | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
> index 48a7cae86c50c..ce31154198cea 100644
> --- a/rust/kernel/sync.rs
> +++ b/rust/kernel/sync.rs
> @@ -30,6 +30,7 @@
> pub use lock::spinlock::{
> new_spinlock, new_spinlock_irq, SpinLock, SpinLockGuard, SpinLockIrq, SpinLockIrqGuard,
> };
> +pub use lock::Backend;
> pub use locked_by::LockedBy;
> pub use refcount::Refcount;
> pub use set_once::SetOnce;
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH v17 12/16] rust: sync: lock/global: Rename B to G in trait bounds
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (10 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 11/16] rust: sync: Expose lock::Backend Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 13/16] rust: sync: Add a lifetime parameter to lock::global::GlobalGuard Lyude Paul
` (5 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
Due to the introduction of BackendWithContext::ContextualBackend, if we
want to be able support Lock types with a Context we need to be able to
handle the fact that the Backend for a returned Guard may not exactly match
the Backend for the lock. Before we add this though, rename B to G in all
of our trait bounds to make sure things don't become more difficult to
understand once we add a Backend bound.
There should be no functional changes in this patch.
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
rust/kernel/sync/lock/global.rs | 58 ++++++++++++++++-----------------
1 file changed, 29 insertions(+), 29 deletions(-)
diff --git a/rust/kernel/sync/lock/global.rs b/rust/kernel/sync/lock/global.rs
index 7030a47bc0ad1..06d62ad02f90d 100644
--- a/rust/kernel/sync/lock/global.rs
+++ b/rust/kernel/sync/lock/global.rs
@@ -33,18 +33,18 @@ pub trait GlobalLockBackend {
/// Type used for global locks.
///
/// See [`global_lock!`] for examples.
-pub struct GlobalLock<B: GlobalLockBackend> {
- inner: Lock<B::Item, B::Backend>,
+pub struct GlobalLock<G: GlobalLockBackend> {
+ inner: Lock<G::Item, G::Backend>,
}
-impl<B: GlobalLockBackend> GlobalLock<B> {
+impl<G: GlobalLockBackend> GlobalLock<G> {
/// Creates a global lock.
///
/// # Safety
///
/// * Before any other method on this lock is called, [`Self::init`] must be called.
- /// * The type `B` must not be used with any other lock.
- pub const unsafe fn new(data: B::Item) -> Self {
+ /// * The type `G` must not be used with any other lock.
+ pub const unsafe fn new(data: G::Item) -> Self {
Self {
inner: Lock {
state: Opaque::uninit(),
@@ -68,23 +68,23 @@ pub unsafe fn init(&'static self) {
// `init` before using any other methods. As `init` can only be called once, all other
// uses of this lock must happen after this call.
unsafe {
- B::Backend::init(
+ G::Backend::init(
self.inner.state.get(),
- B::NAME.as_char_ptr(),
- B::get_lock_class().as_ptr(),
+ G::NAME.as_char_ptr(),
+ G::get_lock_class().as_ptr(),
)
}
}
/// Lock this global lock.
- pub fn lock(&'static self) -> GlobalGuard<B> {
+ pub fn lock(&'static self) -> GlobalGuard<G> {
GlobalGuard {
inner: self.inner.lock(),
}
}
/// Try to lock this global lock.
- pub fn try_lock(&'static self) -> Option<GlobalGuard<B>> {
+ pub fn try_lock(&'static self) -> Option<GlobalGuard<G>> {
Some(GlobalGuard {
inner: self.inner.try_lock()?,
})
@@ -94,21 +94,21 @@ pub fn try_lock(&'static self) -> Option<GlobalGuard<B>> {
/// A guard for a [`GlobalLock`].
///
/// See [`global_lock!`] for examples.
-pub struct GlobalGuard<B: GlobalLockBackend> {
- inner: Guard<'static, B::Item, B::Backend>,
+pub struct GlobalGuard<G: GlobalLockBackend> {
+ inner: Guard<'static, G::Item, G::Backend>,
}
-impl<B: GlobalLockBackend> core::ops::Deref for GlobalGuard<B> {
- type Target = B::Item;
+impl<G: GlobalLockBackend> core::ops::Deref for GlobalGuard<G> {
+ type Target = G::Item;
fn deref(&self) -> &Self::Target {
&self.inner
}
}
-impl<B: GlobalLockBackend> core::ops::DerefMut for GlobalGuard<B>
+impl<G: GlobalLockBackend> core::ops::DerefMut for GlobalGuard<G>
where
- B::Item: Unpin,
+ G::Item: Unpin,
{
fn deref_mut(&mut self) -> &mut Self::Target {
&mut self.inner
@@ -118,33 +118,33 @@ fn deref_mut(&mut self) -> &mut Self::Target {
/// A version of [`LockedBy`] for a [`GlobalLock`].
///
/// See [`global_lock!`] for examples.
-pub struct GlobalLockedBy<T: ?Sized, B: GlobalLockBackend> {
- _backend: PhantomData<B>,
+pub struct GlobalLockedBy<T: ?Sized, G: GlobalLockBackend> {
+ _backend: PhantomData<G>,
value: UnsafeCell<T>,
}
// SAFETY: The same thread-safety rules as `LockedBy` apply to `GlobalLockedBy`.
-unsafe impl<T, B> Send for GlobalLockedBy<T, B>
+unsafe impl<T, G> Send for GlobalLockedBy<T, G>
where
T: ?Sized,
- B: GlobalLockBackend,
- LockedBy<T, B::Item>: Send,
+ G: GlobalLockBackend,
+ LockedBy<T, G::Item>: Send,
{
}
// SAFETY: The same thread-safety rules as `LockedBy` apply to `GlobalLockedBy`.
-unsafe impl<T, B> Sync for GlobalLockedBy<T, B>
+unsafe impl<T, G> Sync for GlobalLockedBy<T, G>
where
T: ?Sized,
- B: GlobalLockBackend,
- LockedBy<T, B::Item>: Sync,
+ G: GlobalLockBackend,
+ LockedBy<T, G::Item>: Sync,
{
}
-impl<T, B: GlobalLockBackend> GlobalLockedBy<T, B> {
+impl<T, G: GlobalLockBackend> GlobalLockedBy<T, G> {
/// Create a new [`GlobalLockedBy`].
///
- /// The provided value will be protected by the global lock indicated by `B`.
+ /// The provided value will be protected by the global lock indicated by `G`.
pub fn new(val: T) -> Self {
Self {
value: UnsafeCell::new(val),
@@ -153,11 +153,11 @@ pub fn new(val: T) -> Self {
}
}
-impl<T: ?Sized, B: GlobalLockBackend> GlobalLockedBy<T, B> {
+impl<T: ?Sized, G: GlobalLockBackend> GlobalLockedBy<T, G> {
/// Access the value immutably.
///
/// The caller must prove shared access to the lock.
- pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<B>) -> &'a T {
+ pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<G>) -> &'a T {
// SAFETY: The lock is globally unique, so there can only be one guard.
unsafe { &*self.value.get() }
}
@@ -165,7 +165,7 @@ pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<B>) -> &'a T {
/// Access the value mutably.
///
/// The caller must prove shared exclusive to the lock.
- pub fn as_mut<'a>(&'a self, _guard: &'a mut GlobalGuard<B>) -> &'a mut T {
+ pub fn as_mut<'a>(&'a self, _guard: &'a mut GlobalGuard<G>) -> &'a mut T {
// SAFETY: The lock is globally unique, so there can only be one guard.
unsafe { &mut *self.value.get() }
}
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread

* [PATCH v17 13/16] rust: sync: Add a lifetime parameter to lock::global::GlobalGuard
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (11 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 12/16] rust: sync: lock/global: Rename B to G in trait bounds Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 14/16] rust: sync: lock/global: Add Backend parameter to GlobalGuard Lyude Paul
` (4 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
While a GlobalLock is always going to be static, in the case of locks with
explicit backend contexts the GlobalGuard will not be 'static and will
instead share the lifetime of the context. So, add a lifetime parameter to
GlobalGuard to allow for this so we can implement GlobalGuard support for
SpinlockIrq.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
---
rust/kernel/sync/lock/global.rs | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/rust/kernel/sync/lock/global.rs b/rust/kernel/sync/lock/global.rs
index 06d62ad02f90d..be17a30c66bf8 100644
--- a/rust/kernel/sync/lock/global.rs
+++ b/rust/kernel/sync/lock/global.rs
@@ -77,14 +77,14 @@ pub unsafe fn init(&'static self) {
}
/// Lock this global lock.
- pub fn lock(&'static self) -> GlobalGuard<G> {
+ pub fn lock(&'static self) -> GlobalGuard<'static, G> {
GlobalGuard {
inner: self.inner.lock(),
}
}
/// Try to lock this global lock.
- pub fn try_lock(&'static self) -> Option<GlobalGuard<G>> {
+ pub fn try_lock(&'static self) -> Option<GlobalGuard<'static, G>> {
Some(GlobalGuard {
inner: self.inner.try_lock()?,
})
@@ -94,11 +94,11 @@ pub fn try_lock(&'static self) -> Option<GlobalGuard<G>> {
/// A guard for a [`GlobalLock`].
///
/// See [`global_lock!`] for examples.
-pub struct GlobalGuard<G: GlobalLockBackend> {
- inner: Guard<'static, G::Item, G::Backend>,
+pub struct GlobalGuard<'a, G: GlobalLockBackend> {
+ inner: Guard<'a, G::Item, G::Backend>,
}
-impl<G: GlobalLockBackend> core::ops::Deref for GlobalGuard<G> {
+impl<'a, G: GlobalLockBackend> core::ops::Deref for GlobalGuard<'a, G> {
type Target = G::Item;
fn deref(&self) -> &Self::Target {
@@ -106,7 +106,7 @@ fn deref(&self) -> &Self::Target {
}
}
-impl<G: GlobalLockBackend> core::ops::DerefMut for GlobalGuard<G>
+impl<'a, G: GlobalLockBackend> core::ops::DerefMut for GlobalGuard<'a, G>
where
G::Item: Unpin,
{
@@ -157,7 +157,7 @@ impl<T: ?Sized, G: GlobalLockBackend> GlobalLockedBy<T, G> {
/// Access the value immutably.
///
/// The caller must prove shared access to the lock.
- pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<G>) -> &'a T {
+ pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<'_, G>) -> &'a T {
// SAFETY: The lock is globally unique, so there can only be one guard.
unsafe { &*self.value.get() }
}
@@ -165,7 +165,7 @@ pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<G>) -> &'a T {
/// Access the value mutably.
///
/// The caller must prove shared exclusive to the lock.
- pub fn as_mut<'a>(&'a self, _guard: &'a mut GlobalGuard<G>) -> &'a mut T {
+ pub fn as_mut<'a>(&'a self, _guard: &'a mut GlobalGuard<'_, G>) -> &'a mut T {
// SAFETY: The lock is globally unique, so there can only be one guard.
unsafe { &mut *self.value.get() }
}
@@ -235,7 +235,7 @@ pub fn get_mut(&mut self) -> &mut T {
/// /// Increment the counter in this instance.
/// ///
/// /// The caller must hold the `MY_MUTEX` mutex.
-/// fn increment(&self, guard: &mut GlobalGuard<MY_MUTEX>) -> u32 {
+/// fn increment(&self, guard: &mut GlobalGuard<'_, MY_MUTEX>) -> u32 {
/// let my_counter = self.my_counter.as_mut(guard);
/// *my_counter += 1;
/// *my_counter
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread

* [PATCH v17 14/16] rust: sync: lock/global: Add Backend parameter to GlobalGuard
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (12 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 13/16] rust: sync: Add a lifetime parameter to lock::global::GlobalGuard Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 15/16] rust: sync: lock/global: Add ContextualBackend support to GlobalLock Lyude Paul
` (3 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
Due to the introduction of sync::lock::Backend::Context, it's now possible
for normal locks to return a Guard with a different Backend than their
respective lock (e.g. Backend::BackendInContext). We want to be able to
support global locks with contexts as well, so add a trait bound to
explicitly specify which Backend is in use for a GlobalGuard.
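The default-parameter trick the patch relies on can be shown in a standalone sketch (names are simplified stand-ins): `B` defaults to the `Backend` associated with `G`, so existing `GlobalGuard<'a, G>` spellings keep naming the same type as before the new parameter was added.

```rust
use std::marker::PhantomData;

trait Backend {}
struct MutexBackend;
impl Backend for MutexBackend {}

trait GlobalLockBackend {
    type Backend: Backend;
}

struct MyLock;
impl GlobalLockBackend for MyLock {
    type Backend = MutexBackend;
}

// `B` defaults to `G`'s associated backend, so callers that never spell
// out `B` are unaffected by the added parameter.
struct GlobalGuard<'a, G: GlobalLockBackend, B: Backend = <G as GlobalLockBackend>::Backend> {
    _marker: PhantomData<(&'a G, B)>,
}

fn main() {
    // Both spellings name the same type thanks to the default:
    let _default: GlobalGuard<'_, MyLock> = GlobalGuard { _marker: PhantomData };
    let _explicit: GlobalGuard<'_, MyLock, MutexBackend> = GlobalGuard { _marker: PhantomData };
    println!("ok");
}
```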
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
---
V17:
* Add default parameter for generic `B` to `GlobalGuard`
rust/kernel/sync/lock/global.rs | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/rust/kernel/sync/lock/global.rs b/rust/kernel/sync/lock/global.rs
index be17a30c66bf8..94f6b3b21324f 100644
--- a/rust/kernel/sync/lock/global.rs
+++ b/rust/kernel/sync/lock/global.rs
@@ -77,14 +77,14 @@ pub unsafe fn init(&'static self) {
}
/// Lock this global lock.
- pub fn lock(&'static self) -> GlobalGuard<'static, G> {
+ pub fn lock(&'static self) -> GlobalGuard<'static, G, G::Backend> {
GlobalGuard {
inner: self.inner.lock(),
}
}
/// Try to lock this global lock.
- pub fn try_lock(&'static self) -> Option<GlobalGuard<'static, G>> {
+ pub fn try_lock(&'static self) -> Option<GlobalGuard<'static, G, G::Backend>> {
Some(GlobalGuard {
inner: self.inner.try_lock()?,
})
@@ -94,11 +94,11 @@ pub fn try_lock(&'static self) -> Option<GlobalGuard<'static, G>> {
/// A guard for a [`GlobalLock`].
///
/// See [`global_lock!`] for examples.
-pub struct GlobalGuard<'a, G: GlobalLockBackend> {
- inner: Guard<'a, G::Item, G::Backend>,
+pub struct GlobalGuard<'a, G: GlobalLockBackend, B: Backend = <G as GlobalLockBackend>::Backend> {
+ inner: Guard<'a, G::Item, B>,
}
-impl<'a, G: GlobalLockBackend> core::ops::Deref for GlobalGuard<'a, G> {
+impl<'a, G: GlobalLockBackend, B: Backend> core::ops::Deref for GlobalGuard<'a, G, B> {
type Target = G::Item;
fn deref(&self) -> &Self::Target {
@@ -106,7 +106,7 @@ fn deref(&self) -> &Self::Target {
}
}
-impl<'a, G: GlobalLockBackend> core::ops::DerefMut for GlobalGuard<'a, G>
+impl<'a, G: GlobalLockBackend, B: Backend> core::ops::DerefMut for GlobalGuard<'a, G, B>
where
G::Item: Unpin,
{
@@ -157,7 +157,7 @@ impl<T: ?Sized, G: GlobalLockBackend> GlobalLockedBy<T, G> {
/// Access the value immutably.
///
/// The caller must prove shared access to the lock.
- pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<'_, G>) -> &'a T {
+ pub fn as_ref<'a, B: Backend>(&'a self, _guard: &'a GlobalGuard<'_, G, B>) -> &'a T {
// SAFETY: The lock is globally unique, so there can only be one guard.
unsafe { &*self.value.get() }
}
@@ -165,7 +165,7 @@ pub fn as_ref<'a>(&'a self, _guard: &'a GlobalGuard<'_, G>) -> &'a T {
/// Access the value mutably.
///
/// The caller must prove shared exclusive to the lock.
- pub fn as_mut<'a>(&'a self, _guard: &'a mut GlobalGuard<'_, G>) -> &'a mut T {
+ pub fn as_mut<'a, B: Backend>(&'a self, _guard: &'a mut GlobalGuard<'_, G, B>) -> &'a mut T {
// SAFETY: The lock is globally unique, so there can only be one guard.
unsafe { &mut *self.value.get() }
}
@@ -219,7 +219,7 @@ pub fn get_mut(&mut self) -> &mut T {
/// ```
/// # mod ex {
/// # use kernel::prelude::*;
-/// use kernel::sync::{GlobalGuard, GlobalLockedBy};
+/// use kernel::sync::{Backend, GlobalGuard, GlobalLockedBy};
///
/// kernel::sync::global_lock! {
/// // SAFETY: Initialized in module initializer before first use.
@@ -235,7 +235,7 @@ pub fn get_mut(&mut self) -> &mut T {
/// /// Increment the counter in this instance.
/// ///
/// /// The caller must hold the `MY_MUTEX` mutex.
-/// fn increment(&self, guard: &mut GlobalGuard<'_, MY_MUTEX>) -> u32 {
+/// fn increment<B: Backend>(&self, guard: &mut GlobalGuard<'_, MY_MUTEX, B>) -> u32 {
/// let my_counter = self.my_counter.as_mut(guard);
/// *my_counter += 1;
/// *my_counter
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread

* [PATCH v17 15/16] rust: sync: lock/global: Add ContextualBackend support to GlobalLock
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (13 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 14/16] rust: sync: lock/global: Add Backend parameter to GlobalGuard Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-21 22:39 ` [PATCH v17 16/16] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Lyude Paul
` (2 subsequent siblings)
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
Now that a GlobalGuard can carry an explicit lifetime and an explicit
Backend, we can finally implement lock_with() and try_lock_with() for
GlobalLock.
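The calling pattern can be modelled out of tree as follows. All names here
(`IrqDisabled`, `SpinLockIrq`, `with_irqs_disabled`) are illustrative
stand-ins, and `RefCell` stands in for the real spinlock; the point is only
that the context token's borrow bounds the guard's lifetime:

```rust
use std::cell::{RefCell, RefMut};

// Hypothetical stand-in for the interrupt-disabled context token; in the
// kernel, holding one proves local interrupts are off.
struct IrqDisabled(());

struct SpinLockIrq<T> {
    value: RefCell<T>,
}

struct Guard<'a, T> {
    value: RefMut<'a, T>,
}

impl<T> SpinLockIrq<T> {
    fn new(value: T) -> Self {
        Self { value: RefCell::new(value) }
    }

    // Lock with an existing context: interrupts stay disabled for as long
    // as `_ctx` is borrowed, so the guard needs no disable/enable of its
    // own and cannot outlive the token borrow.
    fn lock_with<'a>(&'a self, _ctx: &'a IrqDisabled) -> Guard<'a, T> {
        Guard { value: self.value.borrow_mut() }
    }
}

// Run `f` with interrupts "disabled"; a no-op model of the kernel helper.
fn with_irqs_disabled<R>(f: impl FnOnce(&IrqDisabled) -> R) -> R {
    let token = IrqDisabled(());
    f(&token)
}

fn main() {
    let lock = SpinLockIrq::new(41);
    let v = with_irqs_disabled(|ctx| {
        let mut guard = lock.lock_with(ctx);
        *guard.value += 1;
        *guard.value
    });
    println!("{v}"); // prints 42
}
```

Since the returned value `R` is chosen outside the closure, the guard
itself cannot escape the interrupts-disabled section.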
Signed-off-by: Lyude Paul <lyude@redhat.com>
---
rust/kernel/sync/lock/global.rs | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/rust/kernel/sync/lock/global.rs b/rust/kernel/sync/lock/global.rs
index 94f6b3b21324f..cfc71af7482ed 100644
--- a/rust/kernel/sync/lock/global.rs
+++ b/rust/kernel/sync/lock/global.rs
@@ -6,7 +6,7 @@
use crate::{
str::{CStr, CStrExt as _},
- sync::lock::{Backend, Guard, Lock},
+ sync::lock::{Backend, BackendWithContext, Guard, Lock},
sync::{LockClassKey, LockedBy},
types::Opaque,
};
@@ -89,6 +89,34 @@ pub fn try_lock(&'static self) -> Option<GlobalGuard<'static, G, G::Backend>> {
inner: self.inner.try_lock()?,
})
}
+
+ /// Lock this global lock with the provided `context`.
+ pub fn lock_with<'a, B>(
+ &'static self,
+ context: <G::Backend as BackendWithContext>::Context<'a>,
+ ) -> GlobalGuard<'a, G, B>
+ where
+ G::Backend: BackendWithContext<ContextualBackend = B>,
+ B: Backend,
+ {
+ GlobalGuard {
+ inner: self.inner.lock_with(context),
+ }
+ }
+
+ /// Try to lock this global lock with the provided `context`.
+ pub fn try_lock_with<'a, B>(
+ &'static self,
+ context: <G::Backend as BackendWithContext>::Context<'a>,
+ ) -> Option<GlobalGuard<'a, G, B>>
+ where
+ G::Backend: BackendWithContext<ContextualBackend = B>,
+ B: Backend,
+ {
+ Some(GlobalGuard {
+ inner: self.inner.try_lock_with(context)?,
+ })
+ }
}
/// A guard for a [`GlobalLock`].
--
2.52.0
* [PATCH v17 16/16] locking: Switch to _irq_{disable,enable}() variants in cleanup guards
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (14 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 15/16] rust: sync: lock/global: Add ContextualBackend support to GlobalLock Lyude Paul
@ 2026-01-21 22:39 ` Lyude Paul
2026-01-26 13:24 ` [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Gary Guo
2026-01-26 16:17 ` Boqun Feng
17 siblings, 0 replies; 47+ messages in thread
From: Lyude Paul @ 2026-01-21 22:39 UTC (permalink / raw)
To: rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
From: Boqun Feng <boqun.feng@gmail.com>
The semantics of the various irq-disabling guards match what the
*_irq_{disable,enable}() primitives provide, i.e. the interrupt disabling
is properly nested, so it is safe to switch the guards over to the
*_irq_{disable,enable}() variants.
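The nesting property this relies on can be sketched with a simple counter
model (an illustrative model only, not the kernel's preempt_count
implementation): disable increments a count, enable decrements it, and the
hardware interrupt flag is only touched at the outermost transitions, so
arbitrarily nested guards compose safely:

```rust
use std::cell::Cell;

// Illustrative model of counted interrupt disabling; `irqs_enabled`
// stands in for the CPU's interrupt flag.
struct CpuState {
    disable_count: Cell<u32>,
    irqs_enabled: Cell<bool>,
}

impl CpuState {
    fn new() -> Self {
        Self { disable_count: Cell::new(0), irqs_enabled: Cell::new(true) }
    }

    fn local_interrupt_disable(&self) {
        let n = self.disable_count.get();
        if n == 0 {
            // Outermost disable: actually mask interrupts.
            self.irqs_enabled.set(false);
        }
        self.disable_count.set(n + 1);
    }

    fn local_interrupt_enable(&self) {
        let n = self.disable_count.get();
        assert!(n > 0, "unbalanced enable");
        self.disable_count.set(n - 1);
        if n == 1 {
            // Outermost enable: unmask only when the count hits zero.
            self.irqs_enabled.set(true);
        }
    }
}

fn main() {
    let cpu = CpuState::new();
    cpu.local_interrupt_disable(); // e.g. guard for lock A
    cpu.local_interrupt_disable(); // nested guard for lock B
    cpu.local_interrupt_enable();  // dropping B: interrupts stay masked
    assert!(!cpu.irqs_enabled.get());
    cpu.local_interrupt_enable();  // dropping A: interrupts unmasked
    assert!(cpu.irqs_enabled.get());
}
```

This is also why the _irqsave guards no longer need to carry a
`flags` field: the count, not a saved flags word, restores the state.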
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
V10:
* Add PREEMPT_RT build fix from Guangbo Cui
include/linux/spinlock.h | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index bbbee61c6f5df..72bb6ae5319c7 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -568,10 +568,10 @@ DEFINE_LOCK_GUARD_1(raw_spinlock_nested, raw_spinlock_t,
raw_spin_unlock(_T->lock))
DEFINE_LOCK_GUARD_1(raw_spinlock_irq, raw_spinlock_t,
- raw_spin_lock_irq(_T->lock),
- raw_spin_unlock_irq(_T->lock))
+ raw_spin_lock_irq_disable(_T->lock),
+ raw_spin_unlock_irq_enable(_T->lock))
-DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock))
+DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq_disable(_T->lock))
DEFINE_LOCK_GUARD_1(raw_spinlock_bh, raw_spinlock_t,
raw_spin_lock_bh(_T->lock),
@@ -580,12 +580,11 @@ DEFINE_LOCK_GUARD_1(raw_spinlock_bh, raw_spinlock_t,
DEFINE_LOCK_GUARD_1_COND(raw_spinlock_bh, _try, raw_spin_trylock_bh(_T->lock))
DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t,
- raw_spin_lock_irqsave(_T->lock, _T->flags),
- raw_spin_unlock_irqrestore(_T->lock, _T->flags),
- unsigned long flags)
+ raw_spin_lock_irq_disable(_T->lock),
+ raw_spin_unlock_irq_enable(_T->lock))
DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try,
- raw_spin_trylock_irqsave(_T->lock, _T->flags))
+ raw_spin_trylock_irq_disable(_T->lock))
DEFINE_LOCK_GUARD_1(spinlock, spinlock_t,
spin_lock(_T->lock),
@@ -594,11 +593,11 @@ DEFINE_LOCK_GUARD_1(spinlock, spinlock_t,
DEFINE_LOCK_GUARD_1_COND(spinlock, _try, spin_trylock(_T->lock))
DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t,
- spin_lock_irq(_T->lock),
- spin_unlock_irq(_T->lock))
+ spin_lock_irq_disable(_T->lock),
+ spin_unlock_irq_enable(_T->lock))
DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try,
- spin_trylock_irq(_T->lock))
+ spin_trylock_irq_disable(_T->lock))
DEFINE_LOCK_GUARD_1(spinlock_bh, spinlock_t,
spin_lock_bh(_T->lock),
@@ -608,12 +607,11 @@ DEFINE_LOCK_GUARD_1_COND(spinlock_bh, _try,
spin_trylock_bh(_T->lock))
DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t,
- spin_lock_irqsave(_T->lock, _T->flags),
- spin_unlock_irqrestore(_T->lock, _T->flags),
- unsigned long flags)
+ spin_lock_irq_disable(_T->lock),
+ spin_unlock_irq_enable(_T->lock))
DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try,
- spin_trylock_irqsave(_T->lock, _T->flags))
+ spin_trylock_irq_disable(_T->lock))
DEFINE_LOCK_GUARD_1(read_lock, rwlock_t,
read_lock(_T->lock),
--
2.52.0
^ permalink raw reply related [flat|nested] 47+ messages in thread* Re: [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (15 preceding siblings ...)
2026-01-21 22:39 ` [PATCH v17 16/16] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Lyude Paul
@ 2026-01-26 13:24 ` Gary Guo
2026-01-26 16:17 ` Boqun Feng
17 siblings, 0 replies; 47+ messages in thread
From: Gary Guo @ 2026-01-26 13:24 UTC (permalink / raw)
To: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner
Cc: Boqun Feng, Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
On Wed Jan 21, 2026 at 10:39 PM GMT, Lyude Paul wrote:
> This is the latest patch series for adding rust bindings for controlling
> local processor interrupts, adding support for spinlocks in rust that
> are acquired with local processor interrupts disabled, and implementing
> local interrupt controls through refcounting in the kernel.
>
> The previous version of this patch series can be found here:
>
> https://lkml.org/lkml/2025/12/15/1190
Please use lore.kernel.org (or patch.msgid.link) links instead.
Best,
Gary
>
> This patch series applies on top of the rust-next branch.
>
> There's a few big changes from the last time. Mainly that we've
> addressed all(?) of the open questions on this patch series:
>
> * Thanks to Joel Fernandes, we now have a separate per-CPU counter for
> tracking NMI nesting - which ensures that we don't have to sacrifice
> NMI nest level bits in order to store a counter for refcounted IRQs.
> These patches have been included at the start of the series.
> * We've been able to prove that being able to convert the kernel over to
> this new interface is indeed possible, more on this below.
> * Also thanks to Joel, we now have actual benchmarks for how this
> affects performance:
> https://lore.kernel.org/rust-for-linux/20250619175335.2905836-1-joelagnelf@nvidia.com/
> * Also some small changes to the kunit test I added, mainly just making
> sure I don't forget to include a MODULE_DESCRIPTION or MODULE_LICENSE.
>
> Regarding the conversion plan: we've had some success at getting kernels
> to boot after attempting to convert the entire kernel from the
> non-refcounted API to the new refcounted API. It will definitely take
> quite a lot of work to get this right though, at least in the kernel
> core side of things. To give readers an idea of what I mean, here's a
> few of the issues that we ended up running into:
>
> On my end, I tried running a number of coccinelle conversions for this.
> At first I did actually try simply rewiring
> local_irq_disable()/local_irq_enable() to
> local_interrupt_enable()/local_interrupt_disable(). This wasn't really
> workable though, as it causes the kernel to crash very early on in a
> number of ways that I haven't fully untangled. Doing this with
> coccinelle on the other hand allowed me to convert individual files at a
> time, along with specific usage patterns of the old API, and as a result
> this ended up giving me a pretty good idea of where our issues are
> coming from. This coccinelle script, while still leaving most of the
> kernel unconverted, was at least able to be run on almost all of kernel/
> while still allowing us to boot on x86_64.
>
> @depends on patch && !report@
> @@
> - local_irq_disable();
> + local_interrupt_disable();
> ...
> - local_irq_enable();
> + local_interrupt_enable();
>
> There were two files in kernel/ that were exceptions to this:
>
> * kernel/softirq.c
> * kernel/main.c (I figured out at least one fix to an issue here)
>
> The reason this worked is because it seems like the vast majority of the
> issues we're seeing come from "unbalanced"/"misordered" usages of the
> old irq API. And there seems to be a few reasons for this:
>
> * The first simple reason: occasionally the enable/disable was split
> across a function, which this script didn't handle.
> * The second more complicated reason: some portions of the kernel core
> end up calling processor instructions that modify the processor's
> local interrupt flags independently of the kernel. In x86_64's case, I
> believe we came to the conclusion the iret instruction (interrupt
> return) was modifying the interrupt flag state. There's possibly a few
> more instances like this elsewhere.
>
> Boqun also took a stab at this on aarch64, and ended up having similar
> findings. In their case, they discovered one of the culprits being
> raw_spin_rq_unlock_irq(). Here the reason is that on aarch64
> preempt_count is per-thread and not just per-cpu, and when context
> switching you generally disable interrupts in one task and restore them
> in the other task. So in order to fix it, we'll need to make some
> modifications to the aarch64 context-switching code.
>
> So - with this being said, we decided that the best way of converting it
> is likely to just leave us with 3 APIs for the time being - and have new
> drivers and code use the new API while we go through and convert the
> rest of the kernel.
>
> FULL CHANGELOG BELOW
>
> Boqun Feng (5):
> preempt: Introduce HARDIRQ_DISABLE_BITS
> preempt: Introduce __preempt_count_{sub, add}_return()
> irq & spin_lock: Add counted interrupt disabling/enabling
> rust: helper: Add spin_{un,}lock_irq_{enable,disable}() helpers
> locking: Switch to _irq_{disable,enable}() variants in cleanup guards
>
> Joel Fernandes (1):
> preempt: Track NMI nesting to separate per-CPU counter
>
> Lyude Paul (10):
> openrisc: Include <linux/cpumask.h> in smp.h
> irq: Add KUnit test for refcounted interrupt enable/disable
> rust: Introduce interrupt module
> rust: sync: Add SpinLockIrq
> rust: sync: Introduce lock::Lock::lock_with() and friends
> rust: sync: Expose lock::Backend
> rust: sync: lock/global: Rename B to G in trait bounds
> rust: sync: Add a lifetime parameter to lock::global::GlobalGuard
> rust: sync: lock/global: Add Backend parameter to GlobalGuard
> rust: sync: lock/global: Add ContextualBackend support to GlobalLock
* Re: [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust
2026-01-21 22:39 [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Lyude Paul
` (16 preceding siblings ...)
2026-01-26 13:24 ` [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust Gary Guo
@ 2026-01-26 16:17 ` Boqun Feng
2026-02-03 0:36 ` Boqun Feng
17 siblings, 1 reply; 47+ messages in thread
From: Boqun Feng @ 2026-01-26 16:17 UTC (permalink / raw)
To: Lyude Paul
Cc: rust-for-linux, linux-kernel, Thomas Gleixner, Daniel Almeida,
Miguel Ojeda, Alex Gaynor, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Andrew Morton, Peter Zijlstra, Ingo Molnar,
Will Deacon, Waiman Long
On Wed, Jan 21, 2026 at 05:39:03PM -0500, Lyude Paul wrote:
> This is the latest patch series for adding rust bindings for controlling
> local processor interrupts, adding support for spinlocks in rust that
> are acquired with local processor interrupts disabled, and implementing
> local interrupt controls through refcounting in the kernel.
>
If there is no objection, I plan to queue patch #1 to #6 and patch #16
first, hence kindly ping for any review on these.
Regards,
Boqun
* Re: [PATCH v17 00/16] Refcounted interrupts, SpinLockIrq for rust
2026-01-26 16:17 ` Boqun Feng
@ 2026-02-03 0:36 ` Boqun Feng
0 siblings, 0 replies; 47+ messages in thread
From: Boqun Feng @ 2026-02-03 0:36 UTC (permalink / raw)
To: Boqun Feng
Cc: Lyude Paul, rust-for-linux, linux-kernel, Thomas Gleixner,
Daniel Almeida, Miguel Ojeda, Alex Gaynor, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Andrew Morton, Peter Zijlstra,
Ingo Molnar, Will Deacon, Waiman Long
On Mon, Jan 26, 2026 at 08:17:27AM -0800, Boqun Feng wrote:
> On Wed, Jan 21, 2026 at 05:39:03PM -0500, Lyude Paul wrote:
> > This is the latest patch series for adding rust bindings for controlling
> > local processor interrupts, adding support for spinlocks in rust that
> > are acquired with local processor interrupts disabled, and implementing
> > local interrupt controls through refcounting in the kernel.
> >
>
> If there is no objection, I plan to queue patch #1 to #6 and patch #16
Hearing none, I just queued them into rust-sync:
https://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git/ rust-sync
For more tests and reviews. Thanks!
@Lyude, for the rest of them, please resolve comments and rebase on
rust-sync. Thank you for the great work!
Regards,
Boqun
> first, hence kindly ping for any review on these.
>
> Regards,
> Boqun