* [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1)
@ 2026-05-08 4:21 Boqun Feng
2026-05-08 4:21 ` [PATCH 01/11] preempt: Introduce HARDIRQ_DISABLE_BITS Boqun Feng
` (11 more replies)
0 siblings, 12 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
Hi Peter,
This is a follow-up to Lyude's work [1]. Per your feedback at [2], I
did some digging and it turned out that ARM64 already kinda did this. The
basic idea is based on:
1) preempt_count() already masks out the NEED_RESCHED bit, so the
effective width is 31 bits
2) with a 64bit preempt count implementation (as in your PREEMPT_LONG
proposal), the effective bits that record "whether we CAN preempt or
not" still fit in 32 bits (i.e. an int)
As a result, I don't think we need to change the existing
preempt_count() API; we can keep "32bit vs 64bit" as an implementation
detail. This saves us from having to change the printk code for
preempt_count().
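For reference, x86's preempt_count() already does essentially this
masking today (sketch based on current mainline; the exact per-cpu
plumbing may differ by tree):

	static __always_inline int preempt_count(void)
	{
		/* NEED_RESCHED lives in the MSB and is masked out. */
		return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
	}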
For people who have reviewed the previous version: patches 8-11 are new,
please take a look.
The patchset passed build and boot tests, and also a "perf record"
test on x86 exercising the NMI code path.
I would like to target these changes for 7.2 if possible.
[1]: https://lore.kernel.org/all/20260121223933.1568682-1-lyude@redhat.com/
[2]: https://lore.kernel.org/all/20260204111234.GA3031506@noisy.programming.kicks-ass.net/
Regards,
Boqun
Boqun Feng (8):
preempt: Introduce HARDIRQ_DISABLE_BITS
preempt: Introduce __preempt_count_{sub, add}_return()
irq & spin_lock: Add counted interrupt disabling/enabling
locking: Switch to _irq_{disable,enable}() variants in cleanup guards
sched: Remove the unused preempt_offset parameter of __cant_sleep()
sched: Avoid signed comparison of preempt_count() in __cant_migrate()
preempt: Introduce PREEMPT_COUNT_64BIT
arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT
Joel Fernandes (1):
preempt: Track NMI nesting to separate per-CPU counter
Lyude Paul (2):
openrisc: Include <linux/cpumask.h> in smp.h
irq: Add KUnit test for refcounted interrupt enable/disable
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/preempt.h | 18 +++++
arch/openrisc/include/asm/smp.h | 2 +
arch/s390/include/asm/preempt.h | 10 +++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/preempt.h | 61 +++++++++++----
arch/x86/kernel/cpu/common.c | 2 +-
include/asm-generic/preempt.h | 14 ++++
include/linux/hardirq.h | 41 ++++++++--
include/linux/interrupt_rc.h | 63 ++++++++++++++++
include/linux/kernel.h | 4 +-
include/linux/preempt.h | 35 ++++++---
include/linux/spinlock.h | 51 +++++++++----
include/linux/spinlock_api_smp.h | 27 +++++++
include/linux/spinlock_api_up.h | 9 +++
include/linux/spinlock_rt.h | 15 ++++
kernel/Kconfig.preempt | 4 +
kernel/irq/Makefile | 1 +
kernel/irq/refcount_interrupt_test.c | 109 +++++++++++++++++++++++++++
kernel/locking/spinlock.c | 29 +++++++
kernel/sched/core.c | 18 +++--
kernel/softirq.c | 11 +++
lib/locking-selftest.c | 2 +-
23 files changed, 476 insertions(+), 52 deletions(-)
create mode 100644 include/linux/interrupt_rc.h
create mode 100644 kernel/irq/refcount_interrupt_test.c
--
2.50.1 (Apple Git-155)
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 01/11] preempt: Introduce HARDIRQ_DISABLE_BITS
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 02/11] preempt: Track NMI nesting to separate per-CPU counter Boqun Feng
` (10 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux, Boqun Feng
From: Boqun Feng <boqun.feng@gmail.com>
In order to support preempt_disable()-like interrupt disabling, that is,
using part of preempt_count() to track the interrupt-disable nesting
level, change the preempt_count() layout to contain an 8-bit
HARDIRQ_DISABLE count.
Note that HARDIRQ_BITS and NMI_BITS are each reduced by 1 because of
this, which lowers the maximum hardirq and NMI nesting levels.
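With the new layout, the nesting level can be extracted from
preempt_count() like any other field; a one-line sketch (a helper of
exactly this shape is added later in this series):

	#define hardirq_disable_count() \
		((preempt_count() & HARDIRQ_DISABLE_MASK) >> HARDIRQ_DISABLE_SHIFT)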
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-2-lyude@redhat.com
---
include/linux/preempt.h | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index d964f965c8ff..f07e7f37f3ca 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -17,6 +17,7 @@
*
* - bits 0-7 are the preemption count (max preemption depth: 256)
* - bits 8-15 are the softirq count (max # of softirqs: 256)
+ * - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
*
* The hardirq count could in theory be the same as the number of
* interrupts in the system, but we run all interrupt handlers with
@@ -26,29 +27,34 @@
*
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
- * HARDIRQ_MASK: 0x000f0000
- * NMI_MASK: 0x00f00000
+ * HARDIRQ_DISABLE_MASK: 0x00ff0000
+ * HARDIRQ_MASK: 0x07000000
+ * NMI_MASK: 0x38000000
* PREEMPT_NEED_RESCHED: 0x80000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
-#define HARDIRQ_BITS 4
-#define NMI_BITS 4
+#define HARDIRQ_DISABLE_BITS 8
+#define HARDIRQ_BITS 3
+#define NMI_BITS 3
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
-#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_DISABLE_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_SHIFT (HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
#define NMI_SHIFT (HARDIRQ_SHIFT + HARDIRQ_BITS)
#define __IRQ_MASK(x) ((1UL << (x))-1)
#define PREEMPT_MASK (__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK (__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_MASK (__IRQ_MASK(HARDIRQ_DISABLE_BITS) << HARDIRQ_DISABLE_SHIFT)
#define HARDIRQ_MASK (__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
#define NMI_MASK (__IRQ_MASK(NMI_BITS) << NMI_SHIFT)
#define PREEMPT_OFFSET (1UL << PREEMPT_SHIFT)
#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_OFFSET (1UL << HARDIRQ_DISABLE_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET (1UL << NMI_SHIFT)
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 02/11] preempt: Track NMI nesting to separate per-CPU counter
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
2026-05-08 4:21 ` [PATCH 01/11] preempt: Introduce HARDIRQ_DISABLE_BITS Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 03/11] preempt: Introduce __preempt_count_{sub, add}_return() Boqun Feng
` (9 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux, Boqun Feng, Joel Fernandes
From: Joel Fernandes <joelagnelf@nvidia.com>
Move NMI nesting tracking from the preempt_count bits to a separate per-CPU
counter (nmi_nesting). This frees up the NMI bits in the preempt_count,
allowing those bits to be repurposed for other uses. It also has the benefit
of allowing nesting deeper than 16 levels if there is ever a need.
Reduce NMI_BITS from 3 to 1, using the remaining bit only to detect whether
we're in an NMI.
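The resulting behavior for a nested NMI, sketched as an illustrative
trace of the __nmi_enter()/__nmi_exit() changes below:

	nmi_enter();	/* nmi_nesting 0->1, NMI_MASK set, in_nmi() true */
	nmi_enter();	/* nmi_nesting 1->2, NMI_MASK already set */
	nmi_exit();	/* nmi_nesting 2->1, NMI_MASK kept, still in_nmi() */
	nmi_exit();	/* nmi_nesting 1->0, NMI_MASK cleared */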
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-3-lyude@redhat.com
---
include/linux/hardirq.h | 16 ++++++++++++----
include/linux/preempt.h | 13 +++++++++----
kernel/softirq.c | 2 ++
3 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d57cab4d4c06..cc06bda52c3e 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -10,6 +10,8 @@
#include <linux/vtime.h>
#include <asm/hardirq.h>
+DECLARE_PER_CPU(unsigned int, nmi_nesting);
+
extern void synchronize_irq(unsigned int irq);
extern bool synchronize_hardirq(unsigned int irq);
@@ -102,14 +104,16 @@ void irq_exit_rcu(void);
*/
/*
- * nmi_enter() can nest up to 15 times; see NMI_BITS.
+ * nmi_enter() can nest - nesting is tracked in a per-CPU counter.
*/
#define __nmi_enter() \
do { \
lockdep_off(); \
arch_nmi_enter(); \
- BUG_ON(in_nmi() == NMI_MASK); \
- __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
+ BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX); \
+ __this_cpu_inc(nmi_nesting); \
+ __preempt_count_add(HARDIRQ_OFFSET); \
+ preempt_count_set(preempt_count() | NMI_MASK); \
} while (0)
#define nmi_enter() \
@@ -124,8 +128,12 @@ void irq_exit_rcu(void);
#define __nmi_exit() \
do { \
+ unsigned int nesting; \
BUG_ON(!in_nmi()); \
- __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
+ __preempt_count_sub(HARDIRQ_OFFSET); \
+ nesting = __this_cpu_dec_return(nmi_nesting); \
+ if (!nesting) \
+ __preempt_count_sub(NMI_OFFSET); \
arch_nmi_exit(); \
lockdep_on(); \
} while (0)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index f07e7f37f3ca..e2d3079d3f5f 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -18,6 +18,8 @@
* - bits 0-7 are the preemption count (max preemption depth: 256)
* - bits 8-15 are the softirq count (max # of softirqs: 256)
* - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
+ * - bits 24-27 are the hardirq count (max # of hardirqs: 16)
+ * - bit 28 is the NMI flag (no nesting count, tracked separately)
*
* The hardirq count could in theory be the same as the number of
* interrupts in the system, but we run all interrupt handlers with
@@ -25,18 +27,21 @@
* there are a few palaeontologic drivers which reenable interrupts in
* the handler, so we need more than one bit here.
*
+ * NMI nesting depth is tracked in a separate per-CPU variable
+ * (nmi_nesting) to save bits in preempt_count.
+ *
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
* HARDIRQ_DISABLE_MASK: 0x00ff0000
- * HARDIRQ_MASK: 0x07000000
- * NMI_MASK: 0x38000000
+ * HARDIRQ_MASK: 0x0f000000
+ * NMI_MASK: 0x10000000
* PREEMPT_NEED_RESCHED: 0x80000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define HARDIRQ_DISABLE_BITS 8
-#define HARDIRQ_BITS 3
-#define NMI_BITS 3
+#define HARDIRQ_BITS 4
+#define NMI_BITS 1
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4425d8dce44b..10af5ed859e7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
#endif
+DEFINE_PER_CPU(unsigned int, nmi_nesting);
+
/*
* SOFTIRQ_OFFSET usage:
*
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 03/11] preempt: Introduce __preempt_count_{sub, add}_return()
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
2026-05-08 4:21 ` [PATCH 01/11] preempt: Introduce HARDIRQ_DISABLE_BITS Boqun Feng
2026-05-08 4:21 ` [PATCH 02/11] preempt: Track NMI nesting to separate per-CPU counter Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-09 18:09 ` Heiko Carstens
2026-05-08 4:21 ` [PATCH 04/11] openrisc: Include <linux/cpumask.h> in smp.h Boqun Feng
` (8 subsequent siblings)
11 siblings, 1 reply; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux, Boqun Feng
From: Boqun Feng <boqun.feng@gmail.com>
In order to use preempt_count() to track the interrupt-disable nesting
level, __preempt_count_{add,sub}_return() are introduced. As their names
suggest, these primitives return the new value of preempt_count() after
changing it. The following example shows their usage in
local_interrupt_disable():
	// increase the HARDIRQ_DISABLE count
	new_count = __preempt_count_add_return(HARDIRQ_DISABLE_OFFSET);

	// if it's the first-time increment, then disable the interrupt
	// at hardware level.
	if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
		local_irq_save(flags);
		raw_cpu_write(local_interrupt_disable_state.flags, flags);
	}
Having these primitives avoids an extra read of preempt_count() right
after changing it on certain architectures.
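In other words, instead of the two-step pattern (illustrative):

	__preempt_count_add(HARDIRQ_DISABLE_OFFSET);
	new_count = preempt_count();	/* separate load */

callers can use a single read-modify-write:

	new_count = __preempt_count_add_return(HARDIRQ_DISABLE_OFFSET);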
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-4-lyude@redhat.com
---
arch/arm64/include/asm/preempt.h | 18 ++++++++++++++++++
arch/s390/include/asm/preempt.h | 10 ++++++++++
arch/x86/include/asm/preempt.h | 10 ++++++++++
include/asm-generic/preempt.h | 14 ++++++++++++++
4 files changed, 52 insertions(+)
diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index 932ea4b62042..0dd8221d1bef 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -55,6 +55,24 @@ static inline void __preempt_count_sub(int val)
WRITE_ONCE(current_thread_info()->preempt.count, pc);
}
+static inline int __preempt_count_add_return(int val)
+{
+ u32 pc = READ_ONCE(current_thread_info()->preempt.count);
+ pc += val;
+ WRITE_ONCE(current_thread_info()->preempt.count, pc);
+
+ return pc;
+}
+
+static inline int __preempt_count_sub_return(int val)
+{
+ u32 pc = READ_ONCE(current_thread_info()->preempt.count);
+ pc -= val;
+ WRITE_ONCE(current_thread_info()->preempt.count, pc);
+
+ return pc;
+}
+
static inline bool __preempt_count_dec_and_test(void)
{
struct thread_info *ti = current_thread_info();
diff --git a/arch/s390/include/asm/preempt.h b/arch/s390/include/asm/preempt.h
index 6e5821bb047e..0a25d4648b4c 100644
--- a/arch/s390/include/asm/preempt.h
+++ b/arch/s390/include/asm/preempt.h
@@ -139,6 +139,16 @@ static __always_inline bool should_resched(int preempt_offset)
return unlikely(READ_ONCE(get_lowcore()->preempt_count) == preempt_offset);
}
+static __always_inline int __preempt_count_add_return(int val)
+{
+ return val + __atomic_add(val, &get_lowcore()->preempt_count);
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+ return __preempt_count_add_return(-val);
+}
+
#define init_task_preempt_count(p) do { } while (0)
/* Deferred to CPU bringup time */
#define init_idle_preempt_count(p, cpu) do { } while (0)
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 578441db09f0..1220656f3370 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -85,6 +85,16 @@ static __always_inline void __preempt_count_sub(int val)
raw_cpu_add_4(__preempt_count, -val);
}
+static __always_inline int __preempt_count_add_return(int val)
+{
+ return raw_cpu_add_return_4(__preempt_count, val);
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+ return raw_cpu_add_return_4(__preempt_count, -val);
+}
+
/*
* Because we keep PREEMPT_NEED_RESCHED set when we do _not_ need to reschedule
* a decrement which hits zero means we have no preempt_count and should
diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
index 51f8f3881523..c8683c046615 100644
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -59,6 +59,20 @@ static __always_inline void __preempt_count_sub(int val)
*preempt_count_ptr() -= val;
}
+static __always_inline int __preempt_count_add_return(int val)
+{
+ *preempt_count_ptr() += val;
+
+ return *preempt_count_ptr();
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+ *preempt_count_ptr() -= val;
+
+ return *preempt_count_ptr();
+}
+
static __always_inline bool __preempt_count_dec_and_test(void)
{
/*
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 04/11] openrisc: Include <linux/cpumask.h> in smp.h
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (2 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 03/11] preempt: Introduce __preempt_count_{sub, add}_return() Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 05/11] irq & spin_lock: Add counted interrupt disabling/enabling Boqun Feng
` (7 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
From: Lyude Paul <lyude@redhat.com>
While OpenRISC currently doesn't fail to build upstream, it appears that
including <asm/smp.h> from the right headers is enough to break that -
primarily because OpenRISC's asm/smp.h header doesn't actually provide any
definition for struct cpumask. This means the only reason we aren't
failing to build the kernel is that we've been lucky enough that every spot
including asm/smp.h already has a definition of struct cpumask pulled in.
This became evident while working on a patch series adding ref-counted
interrupt enable/disables to the kernel, where introducing a new
interrupt_rc.h header suddenly caused a build error on OpenRISC:
In file included from include/linux/interrupt_rc.h:17,
from include/linux/spinlock.h:60,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:7,
from include/linux/mm.h:7,
from arch/openrisc/include/asm/pgalloc.h:20,
from arch/openrisc/include/asm/io.h:18,
from include/linux/io.h:12,
from drivers/irqchip/irq-ompic.c:61:
arch/openrisc/include/asm/smp.h:21:59: warning: 'struct cpumask'
declared inside parameter list will not be visible outside of this
definition or declaration
21 | extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
| ^~~~~~~
arch/openrisc/include/asm/smp.h:23:54: warning: 'struct cpumask'
declared inside parameter list will not be visible outside of this
definition or declaration
23 | extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));
| ^~~~~~~
drivers/irqchip/irq-ompic.c: In function 'ompic_of_init':
>> drivers/irqchip/irq-ompic.c:191:28: error: passing argument 1 of
'set_smp_cross_call' from incompatible pointer type
[-Werror=incompatible-pointer-types]
191 | set_smp_cross_call(ompic_raise_softirq);
| ^~~~~~~~~~~~~~~~~~~
| |
| void (*)(const struct cpumask *, unsigned int)
arch/openrisc/include/asm/smp.h:23:32: note: expected 'void (*)(const
struct cpumask *, unsigned int)' but argument is of type 'void
(*)(const struct cpumask *, unsigned int)'
23 | extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));
To fix this, let's take an example from the smp.h headers of other
architectures (x86, hexagon, arm64, probably more): just include
linux/cpumask.h at the top.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-5-lyude@redhat.com
---
arch/openrisc/include/asm/smp.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/openrisc/include/asm/smp.h b/arch/openrisc/include/asm/smp.h
index 007296f160ef..84653aaffa96 100644
--- a/arch/openrisc/include/asm/smp.h
+++ b/arch/openrisc/include/asm/smp.h
@@ -9,6 +9,8 @@
#ifndef __ASM_OPENRISC_SMP_H
#define __ASM_OPENRISC_SMP_H
+#include <linux/cpumask.h>
+
#include <asm/spr.h>
#include <asm/spr_defs.h>
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 05/11] irq & spin_lock: Add counted interrupt disabling/enabling
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (3 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 04/11] openrisc: Include <linux/cpumask.h> in smp.h Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 06/11] irq: Add KUnit test for refcounted interrupt enable/disable Boqun Feng
` (6 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux, Boqun Feng
From: Boqun Feng <boqun.feng@gmail.com>
Currently, nested interrupt disabling and enabling is provided by the
_irqsave() and _irqrestore() APIs, which are relatively unsafe, for
example:

	<interrupts are enabled at the beginning>
	spin_lock_irqsave(l1, flags1);
	spin_lock_irqsave(l2, flags2);
	spin_unlock_irqrestore(l1, flags1);
	<l2 is still held but interrupts are enabled>
	// accesses to interrupt-disable protected data will cause races.
This is even easier to trigger with guard facilities:

	unsigned long flags2;

	scoped_guard(spinlock_irqsave, l1) {
		spin_lock_irqsave(l2, flags2);
	}
	// l2 is locked but interrupts are enabled.
	spin_unlock_irqrestore(l2, flags2);
(Hand-over-hand locking critical sections are not uncommon in
fine-grained lock designs.)
Because of this unsafety, Rust cannot easily wrap interrupt-disabling
locks in a safe API, which complicates the design.
To resolve this, introduce a new set of interrupt disabling APIs:
* local_interrupt_disable();
* local_interrupt_enable();
They work like local_irq_save() and local_irq_restore() except that 1)
the outermost local_interrupt_disable() call saves the interrupt state
into a percpu variable, so that the outermost local_interrupt_enable()
can restore the state, and 2) a nesting count (the HARDIRQ_DISABLE bits
of preempt_count()) records the nesting level of these calls, so that
interrupts are not accidentally enabled inside the outermost critical
section.
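For example (this is the behavior the KUnit tests later in this series
exercise):

	local_interrupt_disable();	/* outermost: irqs off, state saved */
	local_interrupt_disable();	/* nested: only bumps the count */
	local_interrupt_enable();	/* irqs remain disabled */
	local_interrupt_enable();	/* outermost: saved state restored */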
Also add the corresponding spin_lock primitives, spin_lock_irq_disable()
and spin_unlock_irq_enable(). As a result, code like the following:

	spin_lock_irq_disable(l1);
	spin_lock_irq_disable(l2);
	spin_unlock_irq_enable(l1);
	// Interrupts are still disabled.
	spin_unlock_irq_enable(l2);

doesn't have the issue that interrupts are accidentally enabled.
This also makes the Rust wrappers for interrupt-disabling locks easier
to design.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-6-lyude@redhat.com
---
include/linux/interrupt_rc.h | 63 ++++++++++++++++++++++++++++++++
include/linux/preempt.h | 4 ++
include/linux/spinlock.h | 25 +++++++++++++
include/linux/spinlock_api_smp.h | 27 ++++++++++++++
include/linux/spinlock_api_up.h | 9 +++++
include/linux/spinlock_rt.h | 15 ++++++++
kernel/locking/spinlock.c | 29 +++++++++++++++
kernel/softirq.c | 3 ++
8 files changed, 175 insertions(+)
create mode 100644 include/linux/interrupt_rc.h
diff --git a/include/linux/interrupt_rc.h b/include/linux/interrupt_rc.h
new file mode 100644
index 000000000000..d6d05498731b
--- /dev/null
+++ b/include/linux/interrupt_rc.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/linux/interrupt_rc.h - refcounted local processor interrupt
+ * management.
+ *
+ * Since the implementation of this API currently depends on
+ * local_irq_save()/local_irq_restore(), we split this into its own header to
+ * make it easier to include without hitting circular header dependencies.
+ */
+
+#ifndef __LINUX_INTERRUPT_RC_H
+#define __LINUX_INTERRUPT_RC_H
+
+#include <linux/irqflags.h>
+#include <asm/processor.h>
+#ifdef CONFIG_SMP
+#include <asm/smp.h>
+#endif
+
+/* Per-cpu interrupt disabling state for local_interrupt_{disable,enable}() */
+struct interrupt_disable_state {
+ unsigned long flags;
+};
+
+DECLARE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
+
+static inline void local_interrupt_disable(void)
+{
+ unsigned long flags;
+ int new_count;
+
+ new_count = hardirq_disable_enter();
+
+ if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
+ local_irq_save(flags);
+ raw_cpu_write(local_interrupt_disable_state.flags, flags);
+ }
+}
+
+static inline void local_interrupt_enable(void)
+{
+ int new_count;
+
+ new_count = hardirq_disable_exit();
+
+ if ((new_count & HARDIRQ_DISABLE_MASK) == 0) {
+ unsigned long flags;
+
+ flags = raw_cpu_read(local_interrupt_disable_state.flags);
+ local_irq_restore(flags);
+ /*
+ * TODO: the re-read of the preempt count can be avoided, but it
+ * requires should_resched() to take the current preempt count as
+ * a parameter
+ */
+#ifdef CONFIG_PREEMPTION
+ if (should_resched(0))
+ __preempt_schedule();
+#endif
+ }
+}
+
+#endif /* !__LINUX_INTERRUPT_RC_H */
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index e2d3079d3f5f..33fc4c814a9f 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -151,6 +151,10 @@ static __always_inline unsigned char interrupt_context_level(void)
#define in_softirq() (softirq_count())
#define in_interrupt() (irq_count())
+#define hardirq_disable_count() ((preempt_count() & HARDIRQ_DISABLE_MASK) >> HARDIRQ_DISABLE_SHIFT)
+#define hardirq_disable_enter() __preempt_count_add_return(HARDIRQ_DISABLE_OFFSET)
+#define hardirq_disable_exit() __preempt_count_sub_return(HARDIRQ_DISABLE_OFFSET)
+
/*
* The preempt_count offset after preempt_disable();
*/
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 241277cd34cf..66fa699fff19 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -57,6 +57,7 @@
#include <linux/linkage.h>
#include <linux/compiler.h>
#include <linux/irqflags.h>
+#include <linux/interrupt_rc.h>
#include <linux/thread_info.h>
#include <linux/stringify.h>
#include <linux/bottom_half.h>
@@ -273,9 +274,11 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
#endif
#define raw_spin_lock_irq(lock) _raw_spin_lock_irq(lock)
+#define raw_spin_lock_irq_disable(lock) _raw_spin_lock_irq_disable(lock)
#define raw_spin_lock_bh(lock) _raw_spin_lock_bh(lock)
#define raw_spin_unlock(lock) _raw_spin_unlock(lock)
#define raw_spin_unlock_irq(lock) _raw_spin_unlock_irq(lock)
+#define raw_spin_unlock_irq_enable(lock) _raw_spin_unlock_irq_enable(lock)
#define raw_spin_unlock_irqrestore(lock, flags) \
do { \
@@ -290,6 +293,13 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
#define raw_spin_trylock_irqsave(lock, flags) _raw_spin_trylock_irqsave(lock, &(flags))
+#define raw_spin_trylock_irq_disable(lock) \
+({ \
+ local_interrupt_disable(); \
+ raw_spin_trylock(lock) ? \
+ 1 : ({ local_interrupt_enable(); 0; }); \
+})
+
#ifndef CONFIG_PREEMPT_RT
/* Include rwlock functions for !RT */
#include <linux/rwlock.h>
@@ -372,6 +382,11 @@ static __always_inline void spin_lock_irq(spinlock_t *lock)
raw_spin_lock_irq(&lock->rlock);
}
+static __always_inline void spin_lock_irq_disable(spinlock_t *lock)
+{
+ raw_spin_lock_irq_disable(&lock->rlock);
+}
+
#define spin_lock_irqsave(lock, flags) \
do { \
raw_spin_lock_irqsave(spinlock_check(lock), flags); \
@@ -402,6 +417,11 @@ static __always_inline void spin_unlock_irq(spinlock_t *lock)
raw_spin_unlock_irq(&lock->rlock);
}
+static __always_inline void spin_unlock_irq_enable(spinlock_t *lock)
+{
+ raw_spin_unlock_irq_enable(&lock->rlock);
+}
+
static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
__releases(lock) __no_context_analysis
{
@@ -427,6 +447,11 @@ static __always_inline bool _spin_trylock_irqsave(spinlock_t *lock, unsigned lon
}
#define spin_trylock_irqsave(lock, flags) _spin_trylock_irqsave(lock, &(flags))
+static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
+{
+ return raw_spin_trylock_irq_disable(&lock->rlock);
+}
+
/**
* spin_is_locked() - Check whether a spinlock is locked.
* @lock: Pointer to the spinlock.
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index bda5e7a390cd..a05f507b6979 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -28,6 +28,8 @@ _raw_spin_lock_nest_lock(raw_spinlock_t *lock, struct lockdep_map *map)
void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock) __acquires(lock);
void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
__acquires(lock);
+void __lockfunc _raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+ __acquires(lock);
unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
__acquires(lock);
@@ -39,6 +41,7 @@ int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock) __cond_acquires(true,
void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock) __releases(lock);
void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock) __releases(lock);
void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock) __releases(lock);
+void __lockfunc _raw_spin_unlock_irq_enable(raw_spinlock_t *lock) __releases(lock);
void __lockfunc
_raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
__releases(lock);
@@ -55,6 +58,11 @@ _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
#define _raw_spin_lock_irq(lock) __raw_spin_lock_irq(lock)
#endif
+/* Use the same config as spin_lock_irq() temporarily. */
+#ifdef CONFIG_INLINE_SPIN_LOCK_IRQ
+#define _raw_spin_lock_irq_disable(lock) __raw_spin_lock_irq_disable(lock)
+#endif
+
#ifdef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
#define _raw_spin_lock_irqsave(lock) __raw_spin_lock_irqsave(lock)
#endif
@@ -79,6 +87,11 @@ _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
#define _raw_spin_unlock_irq(lock) __raw_spin_unlock_irq(lock)
#endif
+/* Use the same config as spin_unlock_irq() temporarily. */
+#ifdef CONFIG_INLINE_SPIN_UNLOCK_IRQ
+#define _raw_spin_unlock_irq_enable(lock) __raw_spin_unlock_irq_enable(lock)
+#endif
+
#ifdef CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE
#define _raw_spin_unlock_irqrestore(lock, flags) __raw_spin_unlock_irqrestore(lock, flags)
#endif
@@ -143,6 +156,13 @@ static inline void __raw_spin_lock_irq(raw_spinlock_t *lock)
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
+static inline void __raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+ local_interrupt_disable();
+ spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+ LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
+}
+
static inline void __raw_spin_lock_bh(raw_spinlock_t *lock)
__acquires(lock) __no_context_analysis
{
@@ -188,6 +208,13 @@ static inline void __raw_spin_unlock_irq(raw_spinlock_t *lock)
preempt_enable();
}
+static inline void __raw_spin_unlock_irq_enable(raw_spinlock_t *lock)
+{
+ spin_release(&lock->dep_map, _RET_IP_);
+ do_raw_spin_unlock(lock);
+ local_interrupt_enable();
+}
+
static inline void __raw_spin_unlock_bh(raw_spinlock_t *lock)
__releases(lock)
{
diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h
index a9d5c7c66e03..e0dea85ac45d 100644
--- a/include/linux/spinlock_api_up.h
+++ b/include/linux/spinlock_api_up.h
@@ -42,6 +42,9 @@
#define __LOCK_IRQSAVE(lock, flags, ...) \
do { local_irq_save(flags); __LOCK(lock, ##__VA_ARGS__); } while (0)
+#define __LOCK_IRQ_DISABLE(lock, ...) \
+ do { local_interrupt_disable(); __LOCK(lock, ##__VA_ARGS__); } while (0)
+
#define ___UNLOCK_(lock) \
do { __release(lock); (void)(lock); } while (0)
@@ -61,6 +64,10 @@
#define __UNLOCK_IRQRESTORE(lock, flags, ...) \
do { local_irq_restore(flags); __UNLOCK(lock, ##__VA_ARGS__); } while (0)
+#define __UNLOCK_IRQ_ENABLE(lock, ...) \
+ do { __UNLOCK(lock, ##__VA_ARGS__); local_interrupt_enable(); } while (0)
+
+
#define _raw_spin_lock(lock) __LOCK(lock)
#define _raw_spin_lock_nested(lock, subclass) __LOCK(lock)
#define _raw_read_lock(lock) __LOCK(lock, shared)
@@ -70,6 +77,7 @@
#define _raw_read_lock_bh(lock) __LOCK_BH(lock, shared)
#define _raw_write_lock_bh(lock) __LOCK_BH(lock)
#define _raw_spin_lock_irq(lock) __LOCK_IRQ(lock)
+#define _raw_spin_lock_irq_disable(lock) __LOCK_IRQ_DISABLE(lock)
#define _raw_read_lock_irq(lock) __LOCK_IRQ(lock, shared)
#define _raw_write_lock_irq(lock) __LOCK_IRQ(lock)
#define _raw_spin_lock_irqsave(lock, flags) __LOCK_IRQSAVE(lock, flags)
@@ -132,6 +140,7 @@ static __always_inline int _raw_write_trylock_irqsave(rwlock_t *lock, unsigned l
#define _raw_write_unlock_bh(lock) __UNLOCK_BH(lock)
#define _raw_read_unlock_bh(lock) __UNLOCK_BH(lock, shared)
#define _raw_spin_unlock_irq(lock) __UNLOCK_IRQ(lock)
+#define _raw_spin_unlock_irq_enable(lock) __UNLOCK_IRQ_ENABLE(lock)
#define _raw_read_unlock_irq(lock) __UNLOCK_IRQ(lock, shared)
#define _raw_write_unlock_irq(lock) __UNLOCK_IRQ(lock)
#define _raw_spin_unlock_irqrestore(lock, flags) \
diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h
index 373618a4243c..c5a8f3f31a2d 100644
--- a/include/linux/spinlock_rt.h
+++ b/include/linux/spinlock_rt.h
@@ -96,6 +96,11 @@ static __always_inline void spin_lock_irq(spinlock_t *lock)
rt_spin_lock(lock);
}
+static __always_inline void spin_lock_irq_disable(spinlock_t *lock)
+{
+ rt_spin_lock(lock);
+}
+
#define spin_lock_irqsave(lock, flags) \
do { \
typecheck(unsigned long, flags); \
@@ -122,6 +127,11 @@ static __always_inline void spin_unlock_irq(spinlock_t *lock)
rt_spin_unlock(lock);
}
+static __always_inline void spin_unlock_irq_enable(spinlock_t *lock)
+{
+ rt_spin_unlock(lock);
+}
+
static __always_inline void spin_unlock_irqrestore(spinlock_t *lock,
unsigned long flags)
__releases(lock)
@@ -131,6 +141,11 @@ static __always_inline void spin_unlock_irqrestore(spinlock_t *lock,
#define spin_trylock(lock) rt_spin_trylock(lock)
+static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
+{
+ return rt_spin_trylock(lock);
+}
+
#define spin_trylock_bh(lock) rt_spin_trylock_bh(lock)
#define spin_trylock_irq(lock) rt_spin_trylock(lock)
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index b42d293da38b..764641f6ec57 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -129,6 +129,19 @@ static void __lockfunc __raw_##op##_lock_bh(locktype##_t *lock) \
*/
BUILD_LOCK_OPS(spin, raw_spinlock, __acquires);
+/* No rwlock_t variants for now, so just build this function by hand */
+static void __lockfunc __raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+ for (;;) {
+ local_interrupt_disable();
+ if (likely(do_raw_spin_trylock(lock)))
+ break;
+ local_interrupt_enable();
+
+ arch_spin_relax(&lock->raw_lock);
+ }
+}
+
#ifndef CONFIG_PREEMPT_RT
BUILD_LOCK_OPS(read, rwlock, __acquires_shared);
BUILD_LOCK_OPS(write, rwlock, __acquires);
@@ -176,6 +189,14 @@ noinline void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
EXPORT_SYMBOL(_raw_spin_lock_irq);
#endif
+#ifndef CONFIG_INLINE_SPIN_LOCK_IRQ
+noinline void __lockfunc _raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+ __raw_spin_lock_irq_disable(lock);
+}
+EXPORT_SYMBOL_GPL(_raw_spin_lock_irq_disable);
+#endif
+
#ifndef CONFIG_INLINE_SPIN_LOCK_BH
noinline void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
{
@@ -208,6 +229,14 @@ noinline void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
EXPORT_SYMBOL(_raw_spin_unlock_irq);
#endif
+#ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQ
+noinline void __lockfunc _raw_spin_unlock_irq_enable(raw_spinlock_t *lock)
+{
+ __raw_spin_unlock_irq_enable(lock);
+}
+EXPORT_SYMBOL_GPL(_raw_spin_unlock_irq_enable);
+#endif
+
#ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
noinline void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
{
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 10af5ed859e7..6fa83aabae47 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,9 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
#endif
+DEFINE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
+EXPORT_PER_CPU_SYMBOL_GPL(local_interrupt_disable_state);
+
DEFINE_PER_CPU(unsigned int, nmi_nesting);
/*
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 06/11] irq: Add KUnit test for refcounted interrupt enable/disable
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (4 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 05/11] irq & spin_lock: Add counted interrupt disabling/enabling Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 07/11] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Boqun Feng
` (5 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
From: Lyude Paul <lyude@redhat.com>
While making changes to the refcounted interrupt patch series, at some
point I broke something on my local branch and ended up writing some KUnit
tests for refcounted interrupts as a result. So, let's include these tests
now that we have refcounted interrupts.
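Assuming CONFIG_KUNIT is enabled, the suite can be run with the usual
kunit.py tooling (invocation illustrative):

	./tools/testing/kunit/kunit.py run 'refcount_interrupt'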
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-7-lyude@redhat.com
---
kernel/irq/Makefile | 1 +
kernel/irq/refcount_interrupt_test.c | 109 +++++++++++++++++++++++++++
2 files changed, 110 insertions(+)
create mode 100644 kernel/irq/refcount_interrupt_test.c
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index 86a2e5ae08f9..44c4d6fc502a 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_SMP) += affinity.o
obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
obj-$(CONFIG_IRQ_KUNIT_TEST) += irq_test.o
+obj-$(CONFIG_KUNIT) += refcount_interrupt_test.o
diff --git a/kernel/irq/refcount_interrupt_test.c b/kernel/irq/refcount_interrupt_test.c
new file mode 100644
index 000000000000..b4f224595f26
--- /dev/null
+++ b/kernel/irq/refcount_interrupt_test.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test for refcounted interrupt enable/disables.
+ */
+
+#include <kunit/test.h>
+#include <linux/interrupt_rc.h>
+
+#define TEST_IRQ_ON() KUNIT_EXPECT_FALSE(test, irqs_disabled())
+#define TEST_IRQ_OFF() KUNIT_EXPECT_TRUE(test, irqs_disabled())
+
+/* ===== Test cases ===== */
+static void test_single_irq_change(struct kunit *test)
+{
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+}
+
+static void test_nested_irq_change(struct kunit *test)
+{
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+}
+
+static void test_multiple_irq_change(struct kunit *test)
+{
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+}
+
+static void test_irq_save(struct kunit *test)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ TEST_IRQ_OFF();
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_OFF();
+ local_irq_restore(flags);
+ TEST_IRQ_ON();
+
+ local_interrupt_disable();
+ TEST_IRQ_OFF();
+ local_irq_save(flags);
+ TEST_IRQ_OFF();
+ local_irq_restore(flags);
+ TEST_IRQ_OFF();
+ local_interrupt_enable();
+ TEST_IRQ_ON();
+}
+
+static struct kunit_case test_cases[] = {
+ KUNIT_CASE(test_single_irq_change),
+ KUNIT_CASE(test_nested_irq_change),
+ KUNIT_CASE(test_multiple_irq_change),
+ KUNIT_CASE(test_irq_save),
+ {},
+};
+
+/* init and exit are the same */
+static int test_init(struct kunit *test)
+{
+ TEST_IRQ_ON();
+
+ return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+ TEST_IRQ_ON();
+}
+
+static struct kunit_suite refcount_interrupt_test_suite = {
+ .name = "refcount_interrupt",
+ .test_cases = test_cases,
+ .init = test_init,
+ .exit = test_exit,
+};
+
+kunit_test_suite(refcount_interrupt_test_suite);
+MODULE_AUTHOR("Lyude Paul <lyude@redhat.com>");
+MODULE_DESCRIPTION("Refcounted interrupt unit test suite");
+MODULE_LICENSE("GPL");
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 07/11] locking: Switch to _irq_{disable,enable}() variants in cleanup guards
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (5 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 06/11] irq: Add KUnit test for refcounted interrupt enable/disable Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 08/11] sched: Remove the unused preempt_offset parameter of __cant_sleep() Boqun Feng
` (4 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux, Boqun Feng
From: Boqun Feng <boqun.feng@gmail.com>
The semantics of the various irq-disabling guards match what the
*_irq_{disable,enable}() primitives provide, i.e. the interrupt
disabling is properly nested, therefore it's OK to switch the guards
over to the *_irq_{disable,enable}() primitives.
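With this change, nested guards keep interrupts disabled until the
outermost guard is released; a minimal sketch (lock names hypothetical):

	scoped_guard(spinlock_irqsave, &l1) {
		scoped_guard(spinlock_irqsave, &l2) {
			/* both locks held, interrupts disabled */
		}
		/* inner guard dropped: interrupts still disabled */
	}
	/* outermost guard dropped: interrupt state restored */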
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-17-lyude@redhat.com
---
include/linux/spinlock.h | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 66fa699fff19..cf5cdb8b272c 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -574,12 +574,12 @@ DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_nested, __acquires(_T), __releases(*(raw
#define class_raw_spinlock_nested_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_nested, _T)
DEFINE_LOCK_GUARD_1(raw_spinlock_irq, raw_spinlock_t,
- raw_spin_lock_irq(_T->lock),
- raw_spin_unlock_irq(_T->lock))
+ raw_spin_lock_irq_disable(_T->lock),
+ raw_spin_unlock_irq_enable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irq, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
#define class_raw_spinlock_irq_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irq, _T)
-DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock))
+DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq_disable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irq_try, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
#define class_raw_spinlock_irq_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irq_try, _T)
@@ -594,14 +594,13 @@ DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_bh_try, __acquires(_T), __releases(*(raw
#define class_raw_spinlock_bh_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_bh_try, _T)
DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t,
- raw_spin_lock_irqsave(_T->lock, _T->flags),
- raw_spin_unlock_irqrestore(_T->lock, _T->flags),
- unsigned long flags)
+ raw_spin_lock_irq_disable(_T->lock),
+ raw_spin_unlock_irq_enable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
#define class_raw_spinlock_irqsave_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave, _T)
DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try,
- raw_spin_trylock_irqsave(_T->lock, _T->flags))
+ raw_spin_trylock_irq_disable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave_try, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
#define class_raw_spinlock_irqsave_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave_try, _T)
@@ -620,13 +619,13 @@ DECLARE_LOCK_GUARD_1_ATTRS(spinlock_try, __acquires(_T), __releases(*(spinlock_t
#define class_spinlock_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_try, _T)
DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t,
- spin_lock_irq(_T->lock),
- spin_unlock_irq(_T->lock))
+ spin_lock_irq_disable(_T->lock),
+ spin_unlock_irq_enable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irq, __acquires(_T), __releases(*(spinlock_t **)_T))
#define class_spinlock_irq_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irq, _T)
DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try,
- spin_trylock_irq(_T->lock))
+ spin_trylock_irq_disable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irq_try, __acquires(_T), __releases(*(spinlock_t **)_T))
#define class_spinlock_irq_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irq_try, _T)
@@ -642,14 +641,13 @@ DECLARE_LOCK_GUARD_1_ATTRS(spinlock_bh_try, __acquires(_T), __releases(*(spinloc
#define class_spinlock_bh_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_bh_try, _T)
DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t,
- spin_lock_irqsave(_T->lock, _T->flags),
- spin_unlock_irqrestore(_T->lock, _T->flags),
- unsigned long flags)
+ spin_lock_irq_disable(_T->lock),
+ spin_unlock_irq_enable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irqsave, __acquires(_T), __releases(*(spinlock_t **)_T))
#define class_spinlock_irqsave_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irqsave, _T)
DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try,
- spin_trylock_irqsave(_T->lock, _T->flags))
+ spin_trylock_irq_disable(_T->lock))
DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irqsave_try, __acquires(_T), __releases(*(spinlock_t **)_T))
#define class_spinlock_irqsave_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irqsave_try, _T)
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 08/11] sched: Remove the unused preempt_offset parameter of __cant_sleep()
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (6 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 07/11] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 09/11] sched: Avoid signed comparison of preempt_count() in __cant_migrate() Boqun Feng
` (3 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
The preempt_offset parameter is always 0 at all the call sites of
__cant_sleep(), hence remove it. This also lets us clean up the code a
bit by no longer using a "preempt_count() > .." comparison.
Signed-off-by: Boqun Feng <boqun@kernel.org>
---
include/linux/kernel.h | 4 ++--
kernel/sched/core.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5570a16cbb1..24414c79e59a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -72,7 +72,7 @@ extern int dynamic_might_resched(void);
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
extern void __might_resched(const char *file, int line, unsigned int offsets);
extern void __might_sleep(const char *file, int line);
-extern void __cant_sleep(const char *file, int line, int preempt_offset);
+extern void __cant_sleep(const char *file, int line);
extern void __cant_migrate(const char *file, int line);
/**
@@ -95,7 +95,7 @@ extern void __cant_migrate(const char *file, int line);
* this macro will print a stack trace if it is executed with preemption enabled
*/
# define cant_sleep() \
- do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
+ do { __cant_sleep(__FILE__, __LINE__); } while (0)
# define sched_annotate_sleep() (current->task_state_change = 0)
/**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..75dba7cc09bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9165,7 +9165,7 @@ void __might_resched(const char *file, int line, unsigned int offsets)
}
EXPORT_SYMBOL(__might_resched);
-void __cant_sleep(const char *file, int line, int preempt_offset)
+void __cant_sleep(const char *file, int line)
{
static unsigned long prev_jiffy;
@@ -9175,7 +9175,7 @@ void __cant_sleep(const char *file, int line, int preempt_offset)
if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
return;
- if (preempt_count() > preempt_offset)
+ if (preempt_count())
return;
if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
--
2.50.1 (Apple Git-155)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 09/11] sched: Avoid signed comparison of preempt_count() in __cant_migrate()
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (7 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 08/11] sched: Remove the unused preempt_offset parameter of __cant_sleep() Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 10/11] preempt: Introduce PREEMPT_COUNT_64BIT Boqun Feng
` (2 subsequent siblings)
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
Currently preempt_count() is always a non-negative int on all archs
(PREEMPT_NEED_RESCHED archs mask out the MSB when returning
preempt_count()), hence the check in __cant_migrate() is in fact just
checking whether preempt_count() is 0 or not. In a future change, we are
going to use all 32 bits of preempt_count(), which would make negative
int values possible from preempt_count(). Therefore, convert the "> 0"
comparison into a zero check to prepare for that change.
No functional changes are intended.
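To spell out the failure mode being prepared for, here is a standalone
userspace sketch (hypothetical value; under PREEMPT_COUNT_64BIT,
introduced in a later patch, the NMI field reaches bit 31):

	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical count: NMI nesting recorded in the top
		 * bits sets bit 31 (NMI_MASK becomes 0xf0000000). */
		int pc = (int)0x90000000;

		printf("pc > 0 : %d\n", pc > 0);   /* 0: negative as a signed int */
		printf("pc != 0: %d\n", pc != 0);  /* 1: the zero check still works */
		return 0;
	}

With "> 0", an in-NMI count like this would fail the check and
__cant_migrate() would emit a false warning; the zero check does not.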
Signed-off-by: Boqun Feng <boqun@kernel.org>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75dba7cc09bd..636e6a15f104 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9207,7 +9207,7 @@ void __cant_migrate(const char *file, int line)
if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
return;
- if (preempt_count() > 0)
+ if (preempt_count())
return;
if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
--
2.50.1 (Apple Git-155)
* [PATCH 10/11] preempt: Introduce PREEMPT_COUNT_64BIT
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (8 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 09/11] sched: Avoid signed comparison of preempt_count() in __cant_migrate() Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 4:21 ` [PATCH 11/11] arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT Boqun Feng
2026-05-09 18:12 ` [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Heiko Carstens
11 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
With the changes that allow the preempt count to track IRQ-disable
nesting, there are not enough bits in the 32bit preempt count
implementation; as a result, the NMI nesting bits are moved out of the
32bit preempt count. However, on architectures that can support a 64bit
preempt count implementation, we can keep the NMI nesting bits in the
preempt count and avoid maintaining them outside the preempt count's
cache line.
Therefore PREEMPT_COUNT_64BIT is introduced to allow such architectures
to select it. Note that under this Kconfig option the preempt count is
maintained in a 64bit word; preempt_count() still returns an int,
however, because all the effective bits still fit in 32 bits
(previously the NEED_RESCHED bit was masked out of preempt_count()).
This should make no functional difference for existing preempt_count()
users.
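As a rough sketch of why narrowing to int stays safe (simplified from
the x86 hunk below; the helper name here is made up):

	/* The raw per-CPU word is 64bit, but once the MSB
	 * (PREEMPT_NEED_RESCHED) is masked out only the low 32 count
	 * bits can be set, so the int conversion loses nothing. */
	static inline int preempt_count_sketch(unsigned long raw)
	{
		unsigned long msb = ~(((unsigned long)-1L) >> 1);

		return (int)(raw & ~msb);
	}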
Enable this for x86_64 along with the introduction of the Kconfig
option.
Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun@kernel.org>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/preempt.h | 55 +++++++++++++++++++++++-----------
arch/x86/kernel/cpu/common.c | 2 +-
include/linux/hardirq.h | 47 +++++++++++++++++++++--------
include/linux/preempt.h | 20 +++++++------
kernel/Kconfig.preempt | 4 +++
kernel/sched/core.c | 12 ++++++--
kernel/softirq.c | 6 ++++
lib/locking-selftest.c | 2 +-
9 files changed, 107 insertions(+), 42 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..89f79608d8e1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -327,6 +327,7 @@ config X86
select USER_STACKTRACE_SUPPORT
select HAVE_ARCH_KCSAN if X86_64
select PROC_PID_ARCH_STATUS if PROC_FS
+ select PREEMPT_COUNT_64BIT if X86_64
select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
select FUNCTION_ALIGNMENT_16B if X86_64 || X86_ALIGNMENT_16
select FUNCTION_ALIGNMENT_4B
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 1220656f3370..b7dd3f764480 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -7,10 +7,20 @@
#include <linux/static_call_types.h>
-DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
+DECLARE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count);
-/* We use the MSB mostly because its available */
-#define PREEMPT_NEED_RESCHED 0x80000000
+/*
+ * We use the MSB for PREEMPT_NEED_RESCHED mostly because it is available.
+ */
+#define PREEMPT_NEED_RESCHED (~(((unsigned long)-1L) >> 1))
+
+#ifdef CONFIG_PREEMPT_COUNT_64BIT
+#define __pc_dec "decq"
+#define __pc_op(op, ...) raw_cpu_##op##_8(__VA_ARGS__)
+#else
+#define __pc_dec "decl"
+#define __pc_op(op, ...) raw_cpu_##op##_4(__VA_ARGS__)
+#endif
/*
* We use the PREEMPT_NEED_RESCHED bit as an inverted NEED_RESCHED such
@@ -24,18 +34,26 @@ DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
*/
static __always_inline int preempt_count(void)
{
- return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+ return __pc_op(read, __preempt_count) & ~PREEMPT_NEED_RESCHED;
}
-static __always_inline void preempt_count_set(int pc)
+/*
+ * unsigned long preempt count parameter works for both 32bit and 64bit cases:
+ *
+ * - For 32bit, "int" (the return of preempt_count()) and "unsigned long" have
+ * the same size.
+ * - For 64bit, the effective bits of a preempt count sits in 32bit, and we
+ * reserve the NEED_RESCHED bit from the old count.
+ */
+static __always_inline void preempt_count_set(unsigned long pc)
{
- int old, new;
+ unsigned long old, new;
- old = raw_cpu_read_4(__preempt_count);
+ old = __pc_op(read, __preempt_count);
do {
new = (old & PREEMPT_NEED_RESCHED) |
(pc & ~PREEMPT_NEED_RESCHED);
- } while (!raw_cpu_try_cmpxchg_4(__preempt_count, &old, new));
+ } while (!__pc_op(try_cmpxchg, __preempt_count, &old, new));
}
/*
@@ -58,17 +76,17 @@ static __always_inline void preempt_count_set(int pc)
static __always_inline void set_preempt_need_resched(void)
{
- raw_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
+ __pc_op(and, __preempt_count, ~PREEMPT_NEED_RESCHED);
}
static __always_inline void clear_preempt_need_resched(void)
{
- raw_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
+ __pc_op(or, __preempt_count, PREEMPT_NEED_RESCHED);
}
static __always_inline bool test_preempt_need_resched(void)
{
- return !(raw_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
+ return !(__pc_op(read, __preempt_count) & PREEMPT_NEED_RESCHED);
}
/*
@@ -77,22 +95,22 @@ static __always_inline bool test_preempt_need_resched(void)
static __always_inline void __preempt_count_add(int val)
{
- raw_cpu_add_4(__preempt_count, val);
+ __pc_op(add, __preempt_count, val);
}
static __always_inline void __preempt_count_sub(int val)
{
- raw_cpu_add_4(__preempt_count, -val);
+ __pc_op(add, __preempt_count, -val);
}
static __always_inline int __preempt_count_add_return(int val)
{
- return raw_cpu_add_return_4(__preempt_count, val);
+ return __pc_op(add_return, __preempt_count, val);
}
static __always_inline int __preempt_count_sub_return(int val)
{
- return raw_cpu_add_return_4(__preempt_count, -val);
+ return __pc_op(add_return, __preempt_count, -val);
}
/*
@@ -102,7 +120,7 @@ static __always_inline int __preempt_count_sub_return(int val)
*/
static __always_inline bool __preempt_count_dec_and_test(void)
{
- return GEN_UNARY_RMWcc("decl", __my_cpu_var(__preempt_count), e,
+ return GEN_UNARY_RMWcc(__pc_dec, __my_cpu_var(__preempt_count), e,
__percpu_arg([var]));
}
@@ -111,7 +129,7 @@ static __always_inline bool __preempt_count_dec_and_test(void)
*/
static __always_inline bool should_resched(int preempt_offset)
{
- return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
+ return unlikely(__pc_op(read, __preempt_count) == preempt_offset);
}
#ifdef CONFIG_PREEMPTION
@@ -158,4 +176,7 @@ do { \
#endif /* PREEMPTION */
+#undef __pc_op
+#undef __pc_dec
+
#endif /* __ASM_PREEMPT_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a4268c47f2bc..182772b6ad6d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2240,7 +2240,7 @@ DEFINE_PER_CPU_CACHE_HOT(struct task_struct *, current_task) = &init_task;
EXPORT_PER_CPU_SYMBOL(current_task);
EXPORT_PER_CPU_SYMBOL(const_current_task);
-DEFINE_PER_CPU_CACHE_HOT(int, __preempt_count) = INIT_PREEMPT_COUNT;
+DEFINE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count) = INIT_PREEMPT_COUNT;
EXPORT_PER_CPU_SYMBOL(__preempt_count);
DEFINE_PER_CPU_CACHE_HOT(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index cc06bda52c3e..7690524a6677 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -10,8 +10,6 @@
#include <linux/vtime.h>
#include <asm/hardirq.h>
-DECLARE_PER_CPU(unsigned int, nmi_nesting);
-
extern void synchronize_irq(unsigned int irq);
extern bool synchronize_hardirq(unsigned int irq);
@@ -94,6 +92,38 @@ void irq_exit_rcu(void);
#define arch_nmi_exit() do { } while (0)
#endif
+#ifdef CONFIG_PREEMPT_COUNT_64BIT
+static __always_inline void __preempt_count_nmi_enter(void)
+{
+ __preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);
+}
+
+static __always_inline void __preempt_count_nmi_exit(void)
+{
+ __preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);
+}
+#else
+DECLARE_PER_CPU(unsigned int, nmi_nesting);
+
+#define __preempt_count_nmi_enter() \
+ do { \
+ unsigned int _o = NMI_MASK + HARDIRQ_OFFSET; \
+ __this_cpu_inc(nmi_nesting); \
+ _o -= (preempt_count() & NMI_MASK); \
+ __preempt_count_add(_o); \
+ } while (0)
+
+#define __preempt_count_nmi_exit() \
+ do { \
+ unsigned int _o = HARDIRQ_OFFSET; \
+ if (!__this_cpu_dec_return(nmi_nesting)) \
+ _o += NMI_MASK; \
+ __preempt_count_sub(_o); \
+ } while (0)
+
+#endif
+
+
/*
* NMI vs Tracing
* --------------
@@ -110,17 +140,14 @@ void irq_exit_rcu(void);
do { \
lockdep_off(); \
arch_nmi_enter(); \
- BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX); \
- __this_cpu_inc(nmi_nesting); \
- __preempt_count_add(HARDIRQ_OFFSET); \
- preempt_count_set(preempt_count() | NMI_MASK); \
+ __preempt_count_nmi_enter(); \
} while (0)
#define nmi_enter() \
do { \
__nmi_enter(); \
lockdep_hardirq_enter(); \
- ct_nmi_enter(); \
+ ct_nmi_enter(); \
instrumentation_begin(); \
ftrace_nmi_enter(); \
instrumentation_end(); \
@@ -128,12 +155,8 @@ void irq_exit_rcu(void);
#define __nmi_exit() \
do { \
- unsigned int nesting; \
BUG_ON(!in_nmi()); \
- __preempt_count_sub(HARDIRQ_OFFSET); \
- nesting = __this_cpu_dec_return(nmi_nesting); \
- if (!nesting) \
- __preempt_count_sub(NMI_OFFSET); \
+ __preempt_count_nmi_exit(); \
arch_nmi_exit(); \
lockdep_on(); \
} while (0)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 33fc4c814a9f..97387d586dfb 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -30,18 +30,20 @@
* NMI nesting depth is tracked in a separate per-CPU variable
* (nmi_nesting) to save bits in preempt_count.
*
- * PREEMPT_MASK: 0x000000ff
- * SOFTIRQ_MASK: 0x0000ff00
- * HARDIRQ_DISABLE_MASK: 0x00ff0000
- * HARDIRQ_MASK: 0x0f000000
- * NMI_MASK: 0x10000000
- * PREEMPT_NEED_RESCHED: 0x80000000
+ * 32bit PREEMPT_COUNT_64BIT
+ *
+ * PREEMPT_MASK: 0x000000ff 0x00000000000000ff
+ * SOFTIRQ_MASK: 0x0000ff00 0x000000000000ff00
+ * HARDIRQ_DISABLE_MASK: 0x00ff0000 0x0000000000ff0000
+ * HARDIRQ_MASK: 0x0f000000 0x000000000f000000
+ * NMI_MASK: 0x10000000 0x00000000f0000000
+ * PREEMPT_NEED_RESCHED: 0x80000000 0x8000000000000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define HARDIRQ_DISABLE_BITS 8
#define HARDIRQ_BITS 4
-#define NMI_BITS 1
+#define NMI_BITS (1 + 3*IS_ENABLED(CONFIG_PREEMPT_COUNT_64BIT))
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
@@ -116,8 +118,8 @@ static __always_inline unsigned char interrupt_context_level(void)
* preempt_count() is commonly implemented with READ_ONCE().
*/
-#define nmi_count() (preempt_count() & NMI_MASK)
-#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
+#define nmi_count() (preempt_count() & NMI_MASK)
+#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#ifdef CONFIG_PREEMPT_RT
# define softirq_count() (current->softirq_disable_cnt & SOFTIRQ_MASK)
# define irq_count() ((preempt_count() & (NMI_MASK | HARDIRQ_MASK)) | softirq_count())
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 88c594c6d7fc..64b90d2211aa 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -122,6 +122,10 @@ config PREEMPT_RT_NEEDS_BH_LOCK
config PREEMPT_COUNT
bool
+config PREEMPT_COUNT_64BIT
+ bool
+ depends on PREEMPT_COUNT && 64BIT
+
config PREEMPTION
bool
select PREEMPT_COUNT
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 636e6a15f104..f7d4345ca50d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5847,8 +5847,13 @@ void preempt_count_add(int val)
#ifdef CONFIG_DEBUG_PREEMPT
/*
* Underflow?
+ *
+ * Cannot detect underflow based on the current preempt_count() value
+ * if using PREEMPT_COUNT_64BIT because preempt count takes all 32
+ * bits.
*/
- if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
+ if (!IS_ENABLED(CONFIG_PREEMPT_COUNT_64BIT) &&
+ DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
return;
#endif
__preempt_count_add(val);
@@ -5880,7 +5885,10 @@ void preempt_count_sub(int val)
/*
* Underflow?
*/
- if (DEBUG_LOCKS_WARN_ON(val > preempt_count()))
+ unsigned int uval = val;
+ unsigned int pc = preempt_count();
+
+ if (DEBUG_LOCKS_WARN_ON(pc - uval > pc))
return;
/*
* Is the spinlock portion underflowing?
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 6fa83aabae47..68e53484b0cc 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -91,7 +91,13 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
DEFINE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
EXPORT_PER_CPU_SYMBOL_GPL(local_interrupt_disable_state);
+#ifndef CONFIG_PREEMPT_COUNT_64BIT
+/*
+ * Any 32bit architecture that still cares about performance should
+ * probably ensure this is near preempt_count.
+ */
DEFINE_PER_CPU(unsigned int, nmi_nesting);
+#endif
/*
* SOFTIRQ_OFFSET usage:
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index d939403331b5..8fd216bd0be6 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1429,7 +1429,7 @@ static int unexpected_testcase_failures;
static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
{
- int saved_preempt_count = preempt_count();
+ long saved_preempt_count = preempt_count();
#ifdef CONFIG_PREEMPT_RT
#ifdef CONFIG_SMP
int saved_mgd_count = current->migration_disabled;
--
2.50.1 (Apple Git-155)
* [PATCH 11/11] arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (9 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 10/11] preempt: Introduce PREEMPT_COUNT_64BIT Boqun Feng
@ 2026-05-08 4:21 ` Boqun Feng
2026-05-08 8:22 ` Mark Rutland
2026-05-09 18:12 ` [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Heiko Carstens
11 siblings, 1 reply; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 4:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
ARM64 already uses a 64bit preempt count, and the need-resched bit is
maintained in a separate 32bit word from the preempt count. Therefore
the preempt count has enough bits to represent 16 levels of NMI
nesting, hence enable PREEMPT_COUNT_64BIT for ARM64. This saves a
per-CPU variable and additional instructions in the NMI path.
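The arithmetic behind "16 levels", restated from the NMI_BITS
definition in the previous patch (a standalone check, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		/* NMI_BITS = 1 + 3*IS_ENABLED(CONFIG_PREEMPT_COUNT_64BIT) */
		int nmi_bits = 1 + 3*1;		/* option enabled: 4 bits */
		int nmi_shift = 8 + 8 + 8 + 4;	/* PREEMPT + SOFTIRQ +
						 * HARDIRQ_DISABLE + HARDIRQ */

		printf("levels: %d\n", 1 << nmi_bits);	/* 16 */
		printf("mask:   0x%x\n",
		       ((1u << nmi_bits) - 1) << nmi_shift); /* 0xf0000000 */
		return 0;
	}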
Signed-off-by: Boqun Feng <boqun@kernel.org>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..1ed5173872fc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -248,6 +248,7 @@ config ARM64
select PCI_SYSCALL if PCI
select POWER_RESET
select POWER_SUPPLY
+ select PREEMPT_COUNT_64BIT
select SPARSE_IRQ
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
--
2.50.1 (Apple Git-155)
* Re: [PATCH 11/11] arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT
2026-05-08 4:21 ` [PATCH 11/11] arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT Boqun Feng
@ 2026-05-08 8:22 ` Mark Rutland
2026-05-08 14:48 ` Boqun Feng
0 siblings, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2026-05-08 8:22 UTC (permalink / raw)
To: Boqun Feng
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Jonas Bonn,
Stefan Kristiansson, Stafford Horne, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Arnd Bergmann, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, K Prateek Nayak, Waiman Long,
Andrew Morton, Miguel Ojeda, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Jinjie Ruan, Ada Couprie Diaz, Lyude Paul,
Sohil Mehta, Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
Hi Boqun,
I have a question at the end, with some context for other reviewers
before that.
On Thu, May 07, 2026 at 09:21:11PM -0700, Boqun Feng wrote:
> ARM64 already uses a 64bit preempt count, and the need-resched bit is
> maintained in a separate 32bit word from the preempt count.
For the benefit of those reading the list, arm64 has a separate 32-bit
count and a 32-bit field for need_resched, which are unioned together as
a composite 64-bit value:
union {
u64 preempt_count; /* 0 => preemptible, <0 => bug */
struct {
u32 count;
u32 need_resched;
} preempt;
};
All of our "count" operations work on the 32-bit count, e.g.
static inline int preempt_count(void)
{
return READ_ONCE(current_thread_info()->preempt.count);
}
static inline void __preempt_count_add(int val)
{
u32 pc = READ_ONCE(current_thread_info()->preempt.count);
pc += val;
WRITE_ONCE(current_thread_info()->preempt.count, pc);
}
static inline void __preempt_count_sub(int val)
{
u32 pc = READ_ONCE(current_thread_info()->preempt.count);
pc -= val;
WRITE_ONCE(current_thread_info()->preempt.count, pc);
}
... but some operations use the 64-bit 'preempt_count' field from the union, e.g.
static inline bool should_resched(int preempt_offset)
{
u64 pc = READ_ONCE(current_thread_info()->preempt_count);
return pc == preempt_offset;
}
> Therefore the preempt count has enough bits to represent 16 levels of
> NMI nesting, hence enable PREEMPT_COUNT_64BIT for ARM64. This saves a
> per-CPU variable and additional instructions in the NMI path.
This might be true, but I think the name "PREEMPT_COUNT_64BIT" is
misleading given the above. What exactly does PREEMPT_COUNT_64BIT tell
core code it can do?
If this is just telling core code that it doesn't need to reserve space
in preempt_count for the resched bits, can this be called something
else, e.g. HAS_SEPARATE_PREEMPT_RESCHED_BITS?
Mark.
>
> Signed-off-by: Boqun Feng <boqun@kernel.org>
> ---
> arch/arm64/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fe60738e5943..1ed5173872fc 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -248,6 +248,7 @@ config ARM64
> select PCI_SYSCALL if PCI
> select POWER_RESET
> select POWER_SUPPLY
> + select PREEMPT_COUNT_64BIT
> select SPARSE_IRQ
> select SWIOTLB
> select SYSCTL_EXCEPTION_TRACE
> --
> 2.50.1 (Apple Git-155)
>
>
* Re: [PATCH 11/11] arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT
2026-05-08 8:22 ` Mark Rutland
@ 2026-05-08 14:48 ` Boqun Feng
0 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-08 14:48 UTC (permalink / raw)
To: Mark Rutland
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Jonas Bonn,
Stefan Kristiansson, Stafford Horne, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
Sven Schnelle, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Arnd Bergmann, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, K Prateek Nayak, Waiman Long,
Andrew Morton, Miguel Ojeda, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Jinjie Ruan, Ada Couprie Diaz, Lyude Paul,
Sohil Mehta, Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
On Fri, May 08, 2026 at 09:22:20AM +0100, Mark Rutland wrote:
> Hi Boqun,
>
> I have a question at the end, with some context for other reviewers
> before that.
>
Thank you for adding the context.
> On Thu, May 07, 2026 at 09:21:11PM -0700, Boqun Feng wrote:
> > ARM64 already uses a 64bit preempt count, and the need-resched bit is
> > maintained in a separate 32bit word from the preempt count.
>
> For the benefit of those reading the list, arm64 has a separate 32-bit
> count and a 32-bit field for need_resched, which are unioned together as
> a composite 64-bit value:
>
> union {
> u64 preempt_count; /* 0 => preemptible, <0 => bug */
> struct {
> u32 count;
> u32 need_resched;
> } preempt;
> };
>
> All of our "count" operations work on the 32-bit count, e.g.
>
> static inline int preempt_count(void)
> {
> return READ_ONCE(current_thread_info()->preempt.count);
> }
>
>
> static inline void __preempt_count_add(int val)
> {
> u32 pc = READ_ONCE(current_thread_info()->preempt.count);
> pc += val;
> WRITE_ONCE(current_thread_info()->preempt.count, pc);
> }
>
> static inline void __preempt_count_sub(int val)
> {
> u32 pc = READ_ONCE(current_thread_info()->preempt.count);
> pc -= val;
> WRITE_ONCE(current_thread_info()->preempt.count, pc);
> }
>
> ... but some operations use the 64-bit 'preempt_count' field from the union, e.g.
>
> static inline bool should_resched(int preempt_offset)
> {
> u64 pc = READ_ONCE(current_thread_info()->preempt_count);
> return pc == preempt_offset;
> }
>
> > Therefore the preempt count has enough bits to represent 16 levels
> > of NMI nesting, hence enable PREEMPT_COUNT_64BIT for ARM64. This
> > saves a per-CPU variable and additional instructions in the NMI path.
>
> This might be true, but I think the name "PREEMPT_COUNT_64BIT" is
> misleading given the above. What exactly does PREEMPT_COUNT_64BIT tell
> core code it can do?
It tells the core code that all 32 bits of preempt_count() can be used.
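Concretely, with the x86 definition from patch 10, PREEMPT_NEED_RESCHED
moves to bit 63 of the unsigned long, freeing bit 31 for count state. A
tiny demonstration (userspace, LP64 assumed):

	#include <stdio.h>

	int main(void)
	{
		/* Same expression as the x86 hunk in patch 10. */
		unsigned long need_resched = ~(((unsigned long)-1L) >> 1);

		/* Prints 0x8000000000000000 on LP64: bit 63, leaving
		 * bits 0-31 entirely available as count bits. */
		printf("PREEMPT_NEED_RESCHED = 0x%lx\n", need_resched);
		return 0;
	}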
>
> If this is just telling core code that it doesn't need ot reserve space
> in preempt_count for the resched bits, can this be called something
> else, e.g. HAS_SEPARATE_PREEMPT_RESCHED_BITS?
You are right that the resched bit detail is a major difference between
PREEMPT_COUNT_64BIT=y and n. I haven't yet gotten around to solving the
#1 hard problem in programming (i.e. naming) ;-)
HAS_SEPARATE_PREEMPT_RESCHED_BITS seems reasonable; I will use it
unless better ideas come up. Thanks!
Regards,
Boqun
>
> Mark.
>
> >
> > Signed-off-by: Boqun Feng <boqun@kernel.org>
> > ---
> > arch/arm64/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index fe60738e5943..1ed5173872fc 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -248,6 +248,7 @@ config ARM64
> > select PCI_SYSCALL if PCI
> > select POWER_RESET
> > select POWER_SUPPLY
> > + select PREEMPT_COUNT_64BIT
> > select SPARSE_IRQ
> > select SWIOTLB
> > select SYSCTL_EXCEPTION_TRACE
> > --
> > 2.50.1 (Apple Git-155)
> >
> >
* Re: [PATCH 03/11] preempt: Introduce __preempt_count_{sub, add}_return()
2026-05-08 4:21 ` [PATCH 03/11] preempt: Introduce __preempt_count_{sub, add}_return() Boqun Feng
@ 2026-05-09 18:09 ` Heiko Carstens
0 siblings, 0 replies; 17+ messages in thread
From: Heiko Carstens @ 2026-05-09 18:09 UTC (permalink / raw)
To: Boqun Feng
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Jonas Bonn,
Stefan Kristiansson, Stafford Horne, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Arnd Bergmann, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, K Prateek Nayak, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux, Boqun Feng
On Thu, May 07, 2026 at 09:21:03PM -0700, Boqun Feng wrote:
> From: Boqun Feng <boqun.feng@gmail.com>
>
> In order to use preempt_count() to track the interrupt disable
> nesting level, __preempt_count_{add,sub}_return() are introduced. As
> their names suggest, these primitives return the new value of
> preempt_count() after changing it. The following example shows their
> use in local_interrupt_disable():
> of it in local_interrupt_disable():
>
> // increase the HARDIRQ_DISABLE bit
> new_count = __preempt_count_add_return(HARDIRQ_DISABLE_OFFSET);
>
> // if it's the first-time increment, then disable the interrupt
> // at hardware level.
> if (new_count & HARDIRQ_DISABLE_MASK == HARDIRQ_DISABLE_OFFSET) {
> local_irq_save(flags);
> raw_cpu_write(local_interrupt_disable_state.flags, flags);
> }
>
> Having these primitives will avoid a read of preempt_count() after
> changing preempt_count() on certain architectures.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Boqun Feng <boqun@kernel.org>
> Link: https://patch.msgid.link/20260121223933.1568682-4-lyude@redhat.com
> ---
> arch/arm64/include/asm/preempt.h | 18 ++++++++++++++++++
> arch/s390/include/asm/preempt.h | 10 ++++++++++
> arch/x86/include/asm/preempt.h | 10 ++++++++++
> include/asm-generic/preempt.h | 14 ++++++++++++++
> 4 files changed, 52 insertions(+)
fwiw:
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
* Re: [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1)
2026-05-08 4:21 [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
` (10 preceding siblings ...)
2026-05-08 4:21 ` [PATCH 11/11] arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT Boqun Feng
@ 2026-05-09 18:12 ` Heiko Carstens
2026-05-09 18:21 ` Boqun Feng
11 siblings, 1 reply; 17+ messages in thread
From: Heiko Carstens @ 2026-05-09 18:12 UTC (permalink / raw)
To: Boqun Feng
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Jonas Bonn,
Stefan Kristiansson, Stafford Horne, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Arnd Bergmann, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, K Prateek Nayak, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
On Thu, May 07, 2026 at 09:21:00PM -0700, Boqun Feng wrote:
> Hi Peter,
>
> This is a follow-up for Lyude's work [1]. Per your feedback at [2], I
> did some digging and turned out that ARM64 already kinda did this. The
> basic idea is based on:
>
> 1) preempt_count() previously mask our NEED_RESCHED bit, so the
> effective bits is 31bits
> 2) with a 64bit preempt count implementation (as in your PREEMPT_LONG
> proposal), the effective bits that record "whether we CAN preempt or
> not" still fit in 32bit (i.e. an int)
>
> as a result, I don't think we need to change the existing
> preempt_count() API, but rather keep "32bit vs 64bit" as an
> implementation detail. This saves us the need to change the printk code
> for preempt_count().
>
> For people who have reviewed the previous version, patch 8-11 are new,
> please take a look.
>
> The patchset passed the build and booting tests and also a "perf record"
> test on x86 for NMI code path.
>
> I would like to target this changes for 7.2 if possible.
>
> [1]: https://lore.kernel.org/all/20260121223933.1568682-1-lyude@redhat.com/
> [2]: https://lore.kernel.org/all/20260204111234.GA3031506@noisy.programming.kicks-ass.net/
>
> Regards,
> Boqun
>
> Boqun Feng (8):
> preempt: Introduce HARDIRQ_DISABLE_BITS
> preempt: Introduce __preempt_count_{sub, add}_return()
> irq & spin_lock: Add counted interrupt disabling/enabling
> locking: Switch to _irq_{disable,enable}() variants in cleanup guards
> sched: Remove the unused preempt_offset parameter of __cant_sleep()
> sched: Avoid signed comparison of preempt_count() in __cant_migrate()
> preempt: Introduce PREEMPT_COUNT_64BIT
> arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT
The below is the s390 conversion to PREEMPT_COUNT_64BIT (or whatever
the future name might be). I'd appreciate it if you would add it to
your series.
From 827629e68ad67919f8c825d118863664badd227a Mon Sep 17 00:00:00 2001
From: Heiko Carstens <hca@linux.ibm.com>
Date: Sat, 9 May 2026 19:23:08 +0200
Subject: [PATCH] s390/preempt: Enable PREEMPT_COUNT_64BIT
Convert s390's preempt_count to 64 bit, and change
the preempt primitives accordingly.
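One subtlety, illustrated with a minimal userspace sketch (the union
name here is made up): s390 is big-endian, so declaring need_resched
ahead of count places the 32bit count in the low-order half of the
64bit preempt_count, matching arm64's union, which declares count first
on little-endian:

	#include <stdio.h>
	#include <stdint.h>

	/* Field order as in the lowcore hunk below. */
	union pc {
		struct {
			uint32_t need_resched;
			uint32_t count;
		} preempt;
		uint64_t preempt_count;
	};

	int main(void)
	{
		union pc p = { .preempt_count = 0 };

		p.preempt.count = 1;
		/* Big-endian (as on s390): prints 0x1, i.e. the count
		 * occupies the low bits; a little-endian host would
		 * print 0x100000000 instead. */
		printf("preempt_count = 0x%llx\n",
		       (unsigned long long)p.preempt_count);
		return 0;
	}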
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
---
arch/s390/Kconfig | 1 +
arch/s390/include/asm/lowcore.h | 13 +++++++----
arch/s390/include/asm/preempt.h | 41 +++++++++++++++------------------
3 files changed, 29 insertions(+), 26 deletions(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index ecbcbb781e40..efa52667b5d4 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -276,6 +276,7 @@ config S390
select PCI_MSI if PCI
select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
select PCI_QUIRKS if PCI
+ select PREEMPT_COUNT_64BIT
select SPARSE_IRQ
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 50ffe75adeb4..0974ab278169 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -160,10 +160,15 @@ struct lowcore {
/* SMP info area */
__u32 cpu_nr; /* 0x03a0 */
__u32 softirq_pending; /* 0x03a4 */
- __s32 preempt_count; /* 0x03a8 */
- __u32 spinlock_lockval; /* 0x03ac */
- __u32 spinlock_index; /* 0x03b0 */
- __u8 pad_0x03b4[0x03b8-0x03b4]; /* 0x03b4 */
+ union {
+ struct {
+ __u32 need_resched; /* 0x03a8 */
+ __u32 count; /* 0x03ac */
+ } preempt;
+ __u64 preempt_count; /* 0x03a8 */
+ };
+ __u32 spinlock_lockval; /* 0x03b0 */
+ __u32 spinlock_index; /* 0x03b4 */
__u64 percpu_offset; /* 0x03b8 */
__u8 pad_0x03c0[0x0400-0x03c0]; /* 0x03c0 */
diff --git a/arch/s390/include/asm/preempt.h b/arch/s390/include/asm/preempt.h
index 0a25d4648b4c..1d5e4d7e9e1b 100644
--- a/arch/s390/include/asm/preempt.h
+++ b/arch/s390/include/asm/preempt.h
@@ -8,11 +8,8 @@
#include <asm/cmpxchg.h>
#include <asm/march.h>
-/*
- * Use MSB so it is possible to read preempt_count with LLGT which
- * reads the least significant 31 bits with a single instruction.
- */
-#define PREEMPT_NEED_RESCHED 0x80000000
+/* Use MSB for PREEMPT_NEED_RESCHED mostly because it is available. */
+#define PREEMPT_NEED_RESCHED 0x8000000000000000UL
/*
* We use the PREEMPT_NEED_RESCHED bit as an inverted NEED_RESCHED such
@@ -26,25 +23,25 @@
*/
static __always_inline int preempt_count(void)
{
- unsigned long lc_preempt, count;
+ unsigned long lc_preempt;
+ int count;
- BUILD_BUG_ON(sizeof_field(struct lowcore, preempt_count) != sizeof(int));
- lc_preempt = offsetof(struct lowcore, preempt_count);
+ lc_preempt = offsetof(struct lowcore, preempt.count);
/* READ_ONCE(get_lowcore()->preempt_count) & ~PREEMPT_NEED_RESCHED */
asm_inline(
- ALTERNATIVE("llgt %[count],%[offzero](%%r0)\n",
- "llgt %[count],%[offalt](%%r0)\n",
+ ALTERNATIVE("ly %[count],%[offzero](%%r0)\n",
+ "ly %[count],%[offalt](%%r0)\n",
ALT_FEATURE(MFEATURE_LOWCORE))
: [count] "=d" (count)
: [offzero] "i" (lc_preempt),
[offalt] "i" (lc_preempt + LOWCORE_ALT_ADDRESS),
- "m" (((struct lowcore *)0)->preempt_count));
+ "m" (((struct lowcore *)0)->preempt.count));
return count;
}
-static __always_inline void preempt_count_set(int pc)
+static __always_inline void preempt_count_set(unsigned long pc)
{
- int old, new;
+ unsigned long old, new;
old = READ_ONCE(get_lowcore()->preempt_count);
do {
@@ -63,12 +60,12 @@ static __always_inline void preempt_count_set(int pc)
static __always_inline void set_preempt_need_resched(void)
{
- __atomic_and(~PREEMPT_NEED_RESCHED, &get_lowcore()->preempt_count);
+ __atomic64_and(~PREEMPT_NEED_RESCHED, (long *)&get_lowcore()->preempt_count);
}
static __always_inline void clear_preempt_need_resched(void)
{
- __atomic_or(PREEMPT_NEED_RESCHED, &get_lowcore()->preempt_count);
+ __atomic64_or(PREEMPT_NEED_RESCHED, (long *)&get_lowcore()->preempt_count);
}
static __always_inline bool test_preempt_need_resched(void)
@@ -88,8 +85,8 @@ static __always_inline void __preempt_count_add(int val)
lc_preempt = offsetof(struct lowcore, preempt_count);
asm_inline(
- ALTERNATIVE("asi %[offzero](%%r0),%[val]\n",
- "asi %[offalt](%%r0),%[val]\n",
+ ALTERNATIVE("agsi %[offzero](%%r0),%[val]\n",
+ "agsi %[offalt](%%r0),%[val]\n",
ALT_FEATURE(MFEATURE_LOWCORE))
: "+m" (((struct lowcore *)0)->preempt_count)
: [offzero] "i" (lc_preempt), [val] "i" (val),
@@ -98,7 +95,7 @@ static __always_inline void __preempt_count_add(int val)
return;
}
}
- __atomic_add(val, &get_lowcore()->preempt_count);
+ __atomic64_add(val, (long *)&get_lowcore()->preempt_count);
}
static __always_inline void __preempt_count_sub(int val)
@@ -119,15 +116,15 @@ static __always_inline bool __preempt_count_dec_and_test(void)
lc_preempt = offsetof(struct lowcore, preempt_count);
asm_inline(
- ALTERNATIVE("alsi %[offzero](%%r0),%[val]\n",
- "alsi %[offalt](%%r0),%[val]\n",
+ ALTERNATIVE("algsi %[offzero](%%r0),%[val]\n",
+ "algsi %[offalt](%%r0),%[val]\n",
ALT_FEATURE(MFEATURE_LOWCORE))
: "=@cc" (cc), "+m" (((struct lowcore *)0)->preempt_count)
: [offzero] "i" (lc_preempt), [val] "i" (-1),
[offalt] "i" (lc_preempt + LOWCORE_ALT_ADDRESS));
return (cc == 0) || (cc == 2);
#else
- return __atomic_add_const_and_test(-1, &get_lowcore()->preempt_count);
+ return __atomic64_add_const_and_test(-1, (long *)&get_lowcore()->preempt_count);
#endif
}
@@ -141,7 +138,7 @@ static __always_inline bool should_resched(int preempt_offset)
static __always_inline int __preempt_count_add_return(int val)
{
- return val + __atomic_add(val, &get_lowcore()->preempt_count);
+ return val + __atomic64_add(val, (long *)&get_lowcore()->preempt_count);
}
static __always_inline int __preempt_count_sub_return(int val)
--
2.51.0
* Re: [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1)
2026-05-09 18:12 ` [PATCH 00/11] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Heiko Carstens
@ 2026-05-09 18:21 ` Boqun Feng
0 siblings, 0 replies; 17+ messages in thread
From: Boqun Feng @ 2026-05-09 18:21 UTC (permalink / raw)
To: Heiko Carstens
Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Jonas Bonn,
Stefan Kristiansson, Stafford Horne, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Arnd Bergmann, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, K Prateek Nayak, Waiman Long, Andrew Morton,
Miguel Ojeda, Gary Guo, Björn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Jinjie Ruan, Ada Couprie Diaz, Lyude Paul, Sohil Mehta,
Pawan Gupta, Xin Li (Intel), Sean Christopherson,
Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
Yury Norov, Sebastian Andrzej Siewior, linux-arm-kernel,
linux-kernel, linux-openrisc, linux-s390, linux-arch,
rust-for-linux
On Sat, May 09, 2026 at 08:12:49PM +0200, Heiko Carstens wrote:
> On Thu, May 07, 2026 at 09:21:00PM -0700, Boqun Feng wrote:
> > Hi Peter,
> >
> > This is a follow-up for Lyude's work [1]. Per your feedback at [2], I
> > did some digging and turned out that ARM64 already kinda did this. The
> > basic idea is based on:
> >
> > 1) preempt_count() previously mask our NEED_RESCHED bit, so the
> > effective bits is 31bits
> > 2) with a 64bit preempt count implementation (as in your PREEMPT_LONG
> > proposal), the effective bits that record "whether we CAN preempt or
> > not" still fit in 32bit (i.e. an int)
> >
> > as a result, I don't think we need to change the existing
> > preempt_count() API, but rather keep "32bit vs 64bit" as an
> > implementation detail. This saves us the need to change the printk code
> > for preempt_count().
> >
> > For people who have reviewed the previous version, patch 8-11 are new,
> > please take a look.
> >
> > The patchset passed the build and booting tests and also a "perf record"
> > test on x86 for NMI code path.
> >
> > I would like to target this changes for 7.2 if possible.
> >
> > [1]: https://lore.kernel.org/all/20260121223933.1568682-1-lyude@redhat.com/
> > [2]: https://lore.kernel.org/all/20260204111234.GA3031506@noisy.programming.kicks-ass.net/
> >
> > Regards,
> > Boqun
> >
> > Boqun Feng (8):
> > preempt: Introduce HARDIRQ_DISABLE_BITS
> > preempt: Introduce __preempt_count_{sub, add}_return()
> > irq & spin_lock: Add counted interrupt disabling/enabling
> > locking: Switch to _irq_{disable,enable}() variants in cleanup guards
> > sched: Remove the unused preempt_offset parameter of __cant_sleep()
> > sched: Avoid signed comparison of preempt_count() in __cant_migrate()
> > preempt: Introduce PREEMPT_COUNT_64BIT
> > arm64: sched/preempt: Enable PREEMPT_COUNT_64BIT
>
> The below is the s390 conversion to PREEMPT_COUNT_64BIT (or whatever the
> future name might be). I'd appreciate if you would add that to your series.
>
Thanks a lot! Yeah, I will include it in the next version.
Regards,
Boqun
> [ s390 patch snipped; quoted in full above ]