Rust for Linux List
* [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1)
@ 2026-05-26 15:21 Boqun Feng
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

Hi Peter,

This is a follow-up to Lyude's work [1]. After studying the current
preempt_count() usage and how ARM64 handles it, I came up with this
series, which should address your feedback [2]. The basic idea rests on
two observations:

1) preempt_count() already masks out the NEED_RESCHED bit, so only 31
   bits are effective
2) with a 64-bit preempt count implementation (as in your PREEMPT_LONG
   proposal), the effective bits that record "whether we CAN preempt or
   not" still fit in 32 bits (i.e. an int)

As a result, I don't think we need to change the existing
preempt_count() API; we can instead keep "32-bit vs 64-bit" as an
implementation detail. This saves us from having to change the printk
code for preempt_count().
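
To illustrate the two observations above, here is a minimal user-space
sketch. The model_* names and the way the 64-bit word is packed are
assumptions made for this example only, not the kernel's definitions:

```c
#include <stdint.h>

/* Hypothetical constant: resched flag as the top bit of a 32-bit word. */
#define MODEL_NEED_RESCHED	0x80000000u

/* 32-bit layout: the flag shares the word, so it is masked out on read. */
static inline int model_preempt_count_32(uint32_t raw)
{
	return (int)(raw & ~MODEL_NEED_RESCHED);
}

/*
 * Assumed 64-bit layout: count in the low half, resched flag in the
 * high half. The caller-visible value is still just the low half.
 */
static inline int model_preempt_count_64(uint64_t raw)
{
	return (int)(uint32_t)(raw & 0xffffffffu);
}
```

In both models the caller sees an int, which is why the width of the
backing storage can stay an implementation detail.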

v1: https://lore.kernel.org/rust-for-linux/20260508042111.24358-1-boqun@kernel.org/

Changes since v1:

* Rename PREEMPT_COUNT_64BIT to HAS_SEPARATE_PREEMPT_RESCHED_BITS per
  Mark Rutland.
* Add s390's support for HAS_SEPARATE_PREEMPT_RESCHED_BITS for Heiko
  Carstens, thank you!
* Reorder patch #1 and #2 per Steven Rostedt.
* Keep the NMI count warning per Steven Rostedt and Joel Fernandes.
* Fix a race between interrupt disabling and softirq reported by
  sashiko (see the changes in __irq_exit_rcu()).
* Add Context Analysis annotations for the newly introduced API.
* Sync the preempt bits changes to BPF tests.

I would like to target these changes for 7.2 if possible.

[1]: https://lore.kernel.org/all/20260121223933.1568682-1-lyude@redhat.com/
[2]: https://lore.kernel.org/all/20260204111234.GA3031506@noisy.programming.kicks-ass.net/

Regards,
Boqun

Boqun Feng (8):
  preempt: Introduce HARDIRQ_DISABLE_BITS
  preempt: Introduce __preempt_count_{sub, add}_return()
  irq & spin_lock: Add counted interrupt disabling/enabling
  locking: Switch to _irq_{disable,enable}() variants in cleanup guards
  sched: Remove the unused preempt_offset parameter of __cant_sleep()
  sched: Avoid signed comparison of preempt_count() in __cant_migrate()
  preempt: Introduce HAS_SEPARATE_PREEMPT_RESCHED_BITS
  arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS

Heiko Carstens (1):
  s390/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS

Joel Fernandes (1):
  preempt: Track NMI nesting to separate per-CPU counter

Lyude Paul (2):
  openrisc: Include <linux/cpumask.h> in smp.h
  irq: Add KUnit test for refcounted interrupt enable/disable

 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/preempt.h              |  18 +++
 arch/openrisc/include/asm/smp.h               |   2 +
 arch/s390/Kconfig                             |   1 +
 arch/s390/include/asm/lowcore.h               |  13 ++-
 arch/s390/include/asm/preempt.h               |  49 ++++----
 arch/x86/Kconfig                              |   1 +
 arch/x86/include/asm/preempt.h                |  61 +++++++---
 arch/x86/kernel/cpu/common.c                  |   2 +-
 include/asm-generic/preempt.h                 |  14 +++
 include/linux/hardirq.h                       |  43 ++++++-
 include/linux/interrupt_rc.h                  |  65 +++++++++++
 include/linux/kernel.h                        |   4 +-
 include/linux/preempt.h                       |  35 ++++--
 include/linux/spinlock.h                      |  48 +++++---
 include/linux/spinlock_api_smp.h              |  41 +++++++
 include/linux/spinlock_api_up.h               |  16 +++
 include/linux/spinlock_rt.h                   |  18 +++
 kernel/Kconfig.preempt                        |   4 +
 kernel/irq/Makefile                           |   1 +
 kernel/irq/refcount_interrupt_test.c          | 109 ++++++++++++++++++
 kernel/locking/spinlock.c                     |  29 +++++
 kernel/sched/core.c                           |  18 ++-
 kernel/softirq.c                              |  22 +++-
 lib/locking-selftest.c                        |   2 +-
 .../testing/selftests/bpf/bpf_experimental.h  |   7 +-
 26 files changed, 544 insertions(+), 80 deletions(-)
 create mode 100644 include/linux/interrupt_rc.h
 create mode 100644 kernel/irq/refcount_interrupt_test.c

-- 
2.50.1 (Apple Git-155)



* [PATCH v2 01/12] preempt: Track NMI nesting to separate per-CPU counter
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra

From: Joel Fernandes <joelagnelf@nvidia.com>

Move NMI nesting tracking from the preempt_count bits to a separate per-CPU
counter (nmi_nesting). This is to free up the NMI bits in the preempt_count,
allowing those bits to be repurposed for other uses.

Reduce NMI_BITS from 4 to 1, using it only to detect if we're in an NMI.
The per-CPU counter currently caps nesting at 15.
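
A minimal user-space model of the scheme described above may help. The
model_* names are made up for this sketch, the HARDIRQ_OFFSET accounting
is left out, and assert() stands in for BUG_ON(); only the flag value
0x00100000 and the cap of 15 come from this patch:

```c
#include <assert.h>

#define MODEL_NMI_MASK		0x00100000u	/* single "in NMI" flag bit */
#define MODEL_NMI_NEST_MAX	15u		/* nesting cap used by this patch */

static unsigned int model_count;	/* stands in for preempt_count */
static unsigned int model_nmi_nesting;	/* stands in for the per-CPU counter */

static void model_nmi_enter(void)
{
	assert(model_nmi_nesting < MODEL_NMI_NEST_MAX);
	model_nmi_nesting++;
	model_count |= MODEL_NMI_MASK;	/* no-op when already in an NMI */
}

static void model_nmi_exit(void)
{
	assert(model_count & MODEL_NMI_MASK);
	/* Only the outermost exit clears the flag. */
	if (--model_nmi_nesting == 0)
		model_count &= ~MODEL_NMI_MASK;
}

static int model_in_nmi(void)
{
	return !!(model_count & MODEL_NMI_MASK);
}
```

The flag bit answers "are we in an NMI?" while the depth lives entirely
in the separate counter.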

[boqun: Solve Steven Rostedt's comment on the BUG_ON() condition]

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-3-lyude@redhat.com
---
 include/linux/hardirq.h                        | 17 +++++++++++++----
 include/linux/preempt.h                        |  9 +++++++--
 kernel/softirq.c                               |  2 ++
 tools/testing/selftests/bpf/bpf_experimental.h |  2 +-
 4 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d57cab4d4c06..1a0360a1000f 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -10,6 +10,8 @@
 #include <linux/vtime.h>
 #include <asm/hardirq.h>
 
+DECLARE_PER_CPU(unsigned int, nmi_nesting);
+
 extern void synchronize_irq(unsigned int irq);
 extern bool synchronize_hardirq(unsigned int irq);
 
@@ -102,14 +104,17 @@ void irq_exit_rcu(void);
  */
 
 /*
- * nmi_enter() can nest up to 15 times; see NMI_BITS.
+ * nmi_enter() can nest - nesting is tracked in a per-CPU counter.
  */
 #define __nmi_enter()						\
 	do {							\
 		lockdep_off();					\
 		arch_nmi_enter();				\
-		BUG_ON(in_nmi() == NMI_MASK);			\
-		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		/* Maximum NMI nesting is 15. */		\
+		BUG_ON(__this_cpu_read(nmi_nesting) >= 15);	\
+		__this_cpu_inc(nmi_nesting);			\
+		__preempt_count_add(HARDIRQ_OFFSET);		\
+		preempt_count_set(preempt_count() | NMI_MASK);	\
 	} while (0)
 
 #define nmi_enter()						\
@@ -124,8 +129,12 @@ void irq_exit_rcu(void);
 
 #define __nmi_exit()						\
 	do {							\
+		unsigned int nesting;				\
 		BUG_ON(!in_nmi());				\
-		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		__preempt_count_sub(HARDIRQ_OFFSET);		\
+		nesting = __this_cpu_dec_return(nmi_nesting);	\
+		if (!nesting)					\
+			__preempt_count_sub(NMI_OFFSET);	\
 		arch_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index d964f965c8ff..586f96688325 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -17,6 +17,8 @@
  *
  * - bits 0-7 are the preemption count (max preemption depth: 256)
  * - bits 8-15 are the softirq count (max # of softirqs: 256)
+ * - bits 16-19 are the hardirq count (max # of hardirqs: 16)
+ * - bit 20 is the NMI flag (no nesting count, tracked separately)
  *
  * The hardirq count could in theory be the same as the number of
  * interrupts in the system, but we run all interrupt handlers with
@@ -24,16 +26,19 @@
  * there are a few palaeontologic drivers which reenable interrupts in
  * the handler, so we need more than one bit here.
  *
+ * NMI nesting depth is tracked in a separate per-CPU variable
+ * (nmi_nesting) to save bits in preempt_count.
+ *
  *         PREEMPT_MASK:	0x000000ff
  *         SOFTIRQ_MASK:	0x0000ff00
  *         HARDIRQ_MASK:	0x000f0000
- *             NMI_MASK:	0x00f00000
+ *             NMI_MASK:	0x00100000
  * PREEMPT_NEED_RESCHED:	0x80000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
-#define NMI_BITS	4
+#define NMI_BITS	1
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4425d8dce44b..10af5ed859e7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
 EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
 #endif
 
+DEFINE_PER_CPU(unsigned int, nmi_nesting);
+
 /*
  * SOFTIRQ_OFFSET usage:
  *
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 2234bd6bc9d3..2d4256ff471f 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -449,7 +449,7 @@ extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
-#define NMI_BITS	4
+#define NMI_BITS	1
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 02/12] preempt: Introduce HARDIRQ_DISABLE_BITS
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra

From: Boqun Feng <boqun.feng@gmail.com>

In order to support preempt_disable()-like interrupt disabling, that is,
using part of preempt_count() to track the interrupt-disable nesting
level, change the preempt_count() layout to contain an 8-bit
HARDIRQ_DISABLE count.
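
As a quick sanity check of the new layout, the sketch below recomputes
the masks from the bit widths and compares them with the values listed
in the updated comment. The *_BITS/*_SHIFT/*_MASK names mirror this
patch; IRQ_MASK_OF() and model_hardirq_disable_depth() are hypothetical
helpers added for the example:

```c
#define PREEMPT_BITS		8
#define SOFTIRQ_BITS		8
#define HARDIRQ_DISABLE_BITS	8
#define HARDIRQ_BITS		4
#define NMI_BITS		1

#define PREEMPT_SHIFT		0
#define SOFTIRQ_SHIFT		(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_DISABLE_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define HARDIRQ_SHIFT		(HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
#define NMI_SHIFT		(HARDIRQ_SHIFT + HARDIRQ_BITS)

/* Hypothetical helper: a mask of x low bits. */
#define IRQ_MASK_OF(x)		((1UL << (x)) - 1)

#define PREEMPT_MASK		(IRQ_MASK_OF(PREEMPT_BITS) << PREEMPT_SHIFT)
#define SOFTIRQ_MASK		(IRQ_MASK_OF(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
#define HARDIRQ_DISABLE_MASK	(IRQ_MASK_OF(HARDIRQ_DISABLE_BITS) << HARDIRQ_DISABLE_SHIFT)
#define HARDIRQ_MASK		(IRQ_MASK_OF(HARDIRQ_BITS) << HARDIRQ_SHIFT)
#define NMI_MASK		(IRQ_MASK_OF(NMI_BITS) << NMI_SHIFT)

#define HARDIRQ_DISABLE_OFFSET	(1UL << HARDIRQ_DISABLE_SHIFT)

/* All fields must stay below bit 31, which PREEMPT_NEED_RESCHED uses. */
_Static_assert(NMI_SHIFT + NMI_BITS <= 31, "layout overlaps NEED_RESCHED");
_Static_assert(HARDIRQ_DISABLE_MASK == 0x00ff0000UL, "unexpected mask");

/* Hypothetical helper: extract the interrupt-disable nesting depth. */
static inline unsigned long model_hardirq_disable_depth(unsigned long count)
{
	return (count & HARDIRQ_DISABLE_MASK) >> HARDIRQ_DISABLE_SHIFT;
}
```

The five fields use 29 bits in total, so the layout still leaves bit 31
free for PREEMPT_NEED_RESCHED.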

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-2-lyude@redhat.com
---
 include/linux/preempt.h                        | 16 +++++++++++-----
 tools/testing/selftests/bpf/bpf_experimental.h |  5 ++++-
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 586f96688325..e2d3079d3f5f 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -17,8 +17,9 @@
  *
  * - bits 0-7 are the preemption count (max preemption depth: 256)
  * - bits 8-15 are the softirq count (max # of softirqs: 256)
- * - bits 16-19 are the hardirq count (max # of hardirqs: 16)
- * - bit 20 is the NMI flag (no nesting count, tracked separately)
+ * - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
+ * - bits 24-27 are the hardirq count (max # of hardirqs: 16)
+ * - bit 28 is the NMI flag (no nesting count, tracked separately)
  *
  * The hardirq count could in theory be the same as the number of
  * interrupts in the system, but we run all interrupt handlers with
@@ -31,29 +32,34 @@
  *
  *         PREEMPT_MASK:	0x000000ff
  *         SOFTIRQ_MASK:	0x0000ff00
- *         HARDIRQ_MASK:	0x000f0000
- *             NMI_MASK:	0x00100000
+ * HARDIRQ_DISABLE_MASK:	0x00ff0000
+ *         HARDIRQ_MASK:	0x0f000000
+ *             NMI_MASK:	0x10000000
  * PREEMPT_NEED_RESCHED:	0x80000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
+#define HARDIRQ_DISABLE_BITS	8
 #define HARDIRQ_BITS	4
 #define NMI_BITS	1
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
-#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_DISABLE_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_SHIFT	(HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
 #define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)
 
 #define __IRQ_MASK(x)	((1UL << (x))-1)
 
 #define PREEMPT_MASK	(__IRQ_MASK(PREEMPT_BITS) << PREEMPT_SHIFT)
 #define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_MASK	(__IRQ_MASK(HARDIRQ_DISABLE_BITS) << HARDIRQ_DISABLE_SHIFT)
 #define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
 #define NMI_MASK	(__IRQ_MASK(NMI_BITS)     << NMI_SHIFT)
 
 #define PREEMPT_OFFSET	(1UL << PREEMPT_SHIFT)
 #define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_OFFSET	(1UL << HARDIRQ_DISABLE_SHIFT)
 #define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
 #define NMI_OFFSET	(1UL << NMI_SHIFT)
 
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 2d4256ff471f..a811b080db02 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -448,17 +448,20 @@ extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
 
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
+#define HARDIRQ_DISABLE_BITS	8
 #define HARDIRQ_BITS	4
 #define NMI_BITS	1
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
-#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_DISABLE_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
+#define HARDIRQ_SHIFT	(HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
 #define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)
 
 #define __IRQ_MASK(x)	((1UL << (x))-1)
 
 #define SOFTIRQ_MASK	(__IRQ_MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT)
+#define HARDIRQ_DISABLE_MASK	(__IRQ_MASK(HARDIRQ_DISABLE_BITS) << HARDIRQ_DISABLE_SHIFT)
 #define HARDIRQ_MASK	(__IRQ_MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT)
 #define NMI_MASK	(__IRQ_MASK(NMI_BITS)     << NMI_SHIFT)
 
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 03/12] preempt: Introduce __preempt_count_{sub, add}_return()
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra

From: Boqun Feng <boqun.feng@gmail.com>

In order to use preempt_count() to track the interrupt-disable nesting
level, introduce __preempt_count_{add,sub}_return(). As their names
suggest, these primitives return the new value of preempt_count() after
changing it. The following example shows how local_interrupt_disable()
uses them:

	// increment the HARDIRQ_DISABLE count
	new_count = __preempt_count_add_return(HARDIRQ_DISABLE_OFFSET);

	// if it's the first-time increment, then disable the interrupt
	// at hardware level.
	if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
		local_irq_save(flags);
		raw_cpu_write(local_interrupt_disable_state.flags, flags);
	}

Having these primitives will avoid a read of preempt_count() after
changing preempt_count() on certain architectures.
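
The "outermost disable" test can be modelled in user space as below.
This is only a sketch: the model_* names are hypothetical, and a plain
static variable stands in for the per-architecture preempt count. Note
that the parentheses around the masking are required, because == binds
more tightly than & in C:

```c
#define MODEL_HARDIRQ_DISABLE_SHIFT	16
#define MODEL_HARDIRQ_DISABLE_OFFSET	(1UL << MODEL_HARDIRQ_DISABLE_SHIFT)
#define MODEL_HARDIRQ_DISABLE_MASK	(0xffUL << MODEL_HARDIRQ_DISABLE_SHIFT)

static unsigned long model_count;	/* stands in for preempt_count */

/* Update the count and hand back the new value in one step. */
static unsigned long model_count_add_return(unsigned long val)
{
	model_count += val;
	return model_count;
}

/* Returns 1 if this call took the disable depth from 0 to 1. */
static int model_disable_is_outermost(void)
{
	unsigned long new_count =
		model_count_add_return(MODEL_HARDIRQ_DISABLE_OFFSET);

	return (new_count & MODEL_HARDIRQ_DISABLE_MASK) ==
	       MODEL_HARDIRQ_DISABLE_OFFSET;
}
```

Other fields of the count (preempt or softirq depth) do not affect the
result, since only the HARDIRQ_DISABLE field is compared.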

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-4-lyude@redhat.com
---
 arch/arm64/include/asm/preempt.h | 18 ++++++++++++++++++
 arch/s390/include/asm/preempt.h  | 10 ++++++++++
 arch/x86/include/asm/preempt.h   | 10 ++++++++++
 include/asm-generic/preempt.h    | 14 ++++++++++++++
 4 files changed, 52 insertions(+)

diff --git a/arch/arm64/include/asm/preempt.h b/arch/arm64/include/asm/preempt.h
index 932ea4b62042..0dd8221d1bef 100644
--- a/arch/arm64/include/asm/preempt.h
+++ b/arch/arm64/include/asm/preempt.h
@@ -55,6 +55,24 @@ static inline void __preempt_count_sub(int val)
 	WRITE_ONCE(current_thread_info()->preempt.count, pc);
 }
 
+static inline int __preempt_count_add_return(int val)
+{
+	u32 pc = READ_ONCE(current_thread_info()->preempt.count);
+	pc += val;
+	WRITE_ONCE(current_thread_info()->preempt.count, pc);
+
+	return pc;
+}
+
+static inline int __preempt_count_sub_return(int val)
+{
+	u32 pc = READ_ONCE(current_thread_info()->preempt.count);
+	pc -= val;
+	WRITE_ONCE(current_thread_info()->preempt.count, pc);
+
+	return pc;
+}
+
 static inline bool __preempt_count_dec_and_test(void)
 {
 	struct thread_info *ti = current_thread_info();
diff --git a/arch/s390/include/asm/preempt.h b/arch/s390/include/asm/preempt.h
index 6e5821bb047e..0a25d4648b4c 100644
--- a/arch/s390/include/asm/preempt.h
+++ b/arch/s390/include/asm/preempt.h
@@ -139,6 +139,16 @@ static __always_inline bool should_resched(int preempt_offset)
 	return unlikely(READ_ONCE(get_lowcore()->preempt_count) == preempt_offset);
 }
 
+static __always_inline int __preempt_count_add_return(int val)
+{
+	return val + __atomic_add(val, &get_lowcore()->preempt_count);
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+	return __preempt_count_add_return(-val);
+}
+
 #define init_task_preempt_count(p)	do { } while (0)
 /* Deferred to CPU bringup time */
 #define init_idle_preempt_count(p, cpu)	do { } while (0)
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 578441db09f0..1220656f3370 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -85,6 +85,16 @@ static __always_inline void __preempt_count_sub(int val)
 	raw_cpu_add_4(__preempt_count, -val);
 }
 
+static __always_inline int __preempt_count_add_return(int val)
+{
+	return raw_cpu_add_return_4(__preempt_count, val);
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+	return raw_cpu_add_return_4(__preempt_count, -val);
+}
+
 /*
  * Because we keep PREEMPT_NEED_RESCHED set when we do _not_ need to reschedule
  * a decrement which hits zero means we have no preempt_count and should
diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
index 51f8f3881523..c8683c046615 100644
--- a/include/asm-generic/preempt.h
+++ b/include/asm-generic/preempt.h
@@ -59,6 +59,20 @@ static __always_inline void __preempt_count_sub(int val)
 	*preempt_count_ptr() -= val;
 }
 
+static __always_inline int __preempt_count_add_return(int val)
+{
+	*preempt_count_ptr() += val;
+
+	return *preempt_count_ptr();
+}
+
+static __always_inline int __preempt_count_sub_return(int val)
+{
+	*preempt_count_ptr() -= val;
+
+	return *preempt_count_ptr();
+}
+
 static __always_inline bool __preempt_count_dec_and_test(void)
 {
 	/*
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 04/12] openrisc: Include <linux/cpumask.h> in smp.h
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra

From: Lyude Paul <lyude@redhat.com>

While OpenRISC currently doesn't fail to build upstream, it appears that
including <asm/smp.h> in the right headers is enough to break that,
primarily because OpenRISC's asm/smp.h header doesn't actually provide any
definition for struct cpumask. This means the only reason we aren't
failing to build the kernel is that we've been lucky enough that every spot
including asm/smp.h already has a definition of struct cpumask pulled in.

This became evident when trying to work on a patch series for adding
ref-counted interrupt enable/disables to the kernel, where introducing a
new interrupt_rc.h header suddenly introduced a build error on OpenRISC:

     In file included from include/linux/interrupt_rc.h:17,
                      from include/linux/spinlock.h:60,
                      from include/linux/mmzone.h:8,
                      from include/linux/gfp.h:7,
                      from include/linux/mm.h:7,
                      from arch/openrisc/include/asm/pgalloc.h:20,
                      from arch/openrisc/include/asm/io.h:18,
                      from include/linux/io.h:12,
                      from drivers/irqchip/irq-ompic.c:61:
     arch/openrisc/include/asm/smp.h:21:59: warning: 'struct cpumask'
     declared inside parameter list will not be visible outside of this
     definition or declaration
        21 | extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
           |                                                           ^~~~~~~
     arch/openrisc/include/asm/smp.h:23:54: warning: 'struct cpumask'
     declared inside parameter list will not be visible outside of this
     definition or declaration
        23 | extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));
           |                                                      ^~~~~~~
     drivers/irqchip/irq-ompic.c: In function 'ompic_of_init':
  >> drivers/irqchip/irq-ompic.c:191:28: error: passing argument 1 of
     'set_smp_cross_call' from incompatible pointer type
     [-Werror=incompatible-pointer-types]
       191 |         set_smp_cross_call(ompic_raise_softirq);
           |                            ^~~~~~~~~~~~~~~~~~~
           |                            |
           |                            void (*)(const struct cpumask *, unsigned int)
     arch/openrisc/include/asm/smp.h:23:32: note: expected 'void (*)(const
     struct cpumask *, unsigned int)' but argument is of type 'void
     (*)(const struct cpumask *, unsigned int)'
        23 | extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));

To fix this, let's follow the example of the smp.h headers of other
architectures (x86, hexagon, arm64, probably more): just include
linux/cpumask.h at the top.
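
The underlying C scoping rule can be reproduced in a few lines. In this
sketch all names are hypothetical, and the file-scope "struct mask;"
declaration plays the role that including linux/cpumask.h plays in the
fix. Without it, a struct tag first mentioned inside a prototype is
scoped to that prototype alone and does not match later uses of the
same tag:

```c
/* File-scope declaration: every later "struct mask" names this type. */
struct mask;

typedef void (*cross_call_fn)(const struct mask *, unsigned int);

static cross_call_fn registered;

static void set_cross_call(cross_call_fn fn)
{
	registered = fn;
}

/* The full definition can come later, as with a header included first. */
struct mask {
	unsigned long bits;
};

static unsigned long last_value;

static void model_raise(const struct mask *m, unsigned int ipi)
{
	last_value = m->bits + ipi;
}
```

With the declaration in place, model_raise() has exactly the type that
set_cross_call() expects, so registering it compiles cleanly.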

Signed-off-by: Lyude Paul <lyude@redhat.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-5-lyude@redhat.com
---
 arch/openrisc/include/asm/smp.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/openrisc/include/asm/smp.h b/arch/openrisc/include/asm/smp.h
index 007296f160ef..84653aaffa96 100644
--- a/arch/openrisc/include/asm/smp.h
+++ b/arch/openrisc/include/asm/smp.h
@@ -9,6 +9,8 @@
 #ifndef __ASM_OPENRISC_SMP_H
 #define __ASM_OPENRISC_SMP_H
 
+#include <linux/cpumask.h>
+
 #include <asm/spr.h>
 #include <asm/spr_defs.h>
 
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra

From: Boqun Feng <boqun.feng@gmail.com>

Currently, nested interrupt disabling and enabling is provided by the
_irqsave() and _irqrestore() APIs, which are relatively unsafe, for
example:

	<interrupts are enabled at the beginning>
	spin_lock_irqsave(l1, flag1);
	spin_lock_irqsave(l2, flag2);
	spin_unlock_irqrestore(l1, flag1);
	<l2 is still held but interrupts are enabled>
	// accesses to data protected by interrupt disabling will cause races.

This is even easier to trigger with the guard facilities:

	unsigned long flag2;

	scoped_guard(spinlock_irqsave, l1) {
		spin_lock_irqsave(l2, flag2);
	}
	// l2 locked but interrupts are enabled.
	spin_unlock_irqrestore(l2, flag2);

(Hand-over-hand locking critical sections are not uncommon in a
fine-grained lock design.)

Because of this unsafety, Rust cannot easily wrap the
interrupt-disabling locks in a safe API, which complicates the design.

To resolve this, introduce a new set of interrupt disabling APIs:

*	local_interrupt_disable();
*	local_interrupt_enable();

They work like local_irq_save() and local_irq_restore() except that 1)
the outermost local_interrupt_disable() call saves the interrupt state
into a percpu variable, so that the outermost local_interrupt_enable()
can restore the state, and 2) a percpu counter is added to record the
nesting level of these calls, so that interrupts are not accidentally
enabled inside the outermost critical section.

Also add the corresponding spin_lock primitives: spin_lock_irq_disable()
and spin_unlock_irq_enable(). As a result, code such as the following:

	spin_lock_irq_disable(l1);
	spin_lock_irq_disable(l2);
	spin_unlock_irq_enable(l1);
	// Interrupts are still disabled.
	spin_unlock_irq_enable(l2);

doesn't have the issue that interrupts are accidentally enabled.

This also makes the Rust wrapper for interrupt-disabling locks easier
to design.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-6-lyude@redhat.com
---
 include/linux/interrupt_rc.h     | 65 ++++++++++++++++++++++++++++++++
 include/linux/preempt.h          |  4 ++
 include/linux/spinlock.h         | 22 +++++++++++
 include/linux/spinlock_api_smp.h | 41 ++++++++++++++++++++
 include/linux/spinlock_api_up.h  | 16 ++++++++
 include/linux/spinlock_rt.h      | 18 +++++++++
 kernel/locking/spinlock.c        | 29 ++++++++++++++
 kernel/softirq.c                 | 14 ++++++-
 8 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/interrupt_rc.h

diff --git a/include/linux/interrupt_rc.h b/include/linux/interrupt_rc.h
new file mode 100644
index 000000000000..868f32524a87
--- /dev/null
+++ b/include/linux/interrupt_rc.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * include/linux/interrupt_rc.h - refcounted local processor interrupt
+ * management.
+ *
+ * Since the implementation of this API currently depends on
+ * local_irq_save()/local_irq_restore(), we split this into its own header to
+ * make it easier to include without hitting circular header dependencies.
+ */
+
+#ifndef __LINUX_INTERRUPT_RC_H
+#define __LINUX_INTERRUPT_RC_H
+
+#include <linux/irqflags.h>
+#include <asm/processor.h>
+#ifdef CONFIG_SMP
+#include <asm/smp.h>
+#endif
+
+/* Per-cpu interrupt disabling state for local_interrupt_{disable,enable}() */
+struct interrupt_disable_state {
+	unsigned long flags;
+};
+
+DECLARE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
+
+static inline void local_interrupt_disable(void)
+{
+	unsigned long flags;
+	int new_count;
+
+	new_count = hardirq_disable_enter();
+
+	/* Interrupts can happen here, but it's OK, see __irq_exit_rcu(). */
+
+	if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
+		local_irq_save(flags);
+		raw_cpu_write(local_interrupt_disable_state.flags, flags);
+	}
+}
+
+static inline void local_interrupt_enable(void)
+{
+	int new_count;
+
+	new_count = hardirq_disable_exit();
+
+	if ((new_count & HARDIRQ_DISABLE_MASK) == 0) {
+		unsigned long flags;
+
+		flags = raw_cpu_read(local_interrupt_disable_state.flags);
+		local_irq_restore(flags);
+		/*
+		 * TODO: re-read preempt count can be avoided, but it needs
+		 * should_resched() taking another parameter as the current
+		 * preempt count
+		 */
+#ifdef CONFIG_PREEMPTION
+		if (should_resched(0))
+			__preempt_schedule();
+#endif
+	}
+}
+
+#endif /* !__LINUX_INTERRUPT_RC_H */
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index e2d3079d3f5f..33fc4c814a9f 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -151,6 +151,10 @@ static __always_inline unsigned char interrupt_context_level(void)
 #define in_softirq()		(softirq_count())
 #define in_interrupt()		(irq_count())
 
+#define hardirq_disable_count()	((preempt_count() & HARDIRQ_DISABLE_MASK) >> HARDIRQ_DISABLE_SHIFT)
+#define hardirq_disable_enter()	__preempt_count_add_return(HARDIRQ_DISABLE_OFFSET)
+#define hardirq_disable_exit()	__preempt_count_sub_return(HARDIRQ_DISABLE_OFFSET)
+
 /*
  * The preempt_count offset after preempt_disable();
  */
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 241277cd34cf..9d6012ac929d 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -57,6 +57,7 @@
 #include <linux/linkage.h>
 #include <linux/compiler.h>
 #include <linux/irqflags.h>
+#include <linux/interrupt_rc.h>
 #include <linux/thread_info.h>
 #include <linux/stringify.h>
 #include <linux/bottom_half.h>
@@ -273,9 +274,11 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 #endif
 
 #define raw_spin_lock_irq(lock)		_raw_spin_lock_irq(lock)
+#define raw_spin_lock_irq_disable(lock)	_raw_spin_lock_irq_disable(lock)
 #define raw_spin_lock_bh(lock)		_raw_spin_lock_bh(lock)
 #define raw_spin_unlock(lock)		_raw_spin_unlock(lock)
 #define raw_spin_unlock_irq(lock)	_raw_spin_unlock_irq(lock)
+#define raw_spin_unlock_irq_enable(lock)	_raw_spin_unlock_irq_enable(lock)
 
 #define raw_spin_unlock_irqrestore(lock, flags)		\
 	do {							\
@@ -290,6 +293,8 @@ static inline void do_raw_spin_unlock(raw_spinlock_t *lock) __releases(lock)
 
 #define raw_spin_trylock_irqsave(lock, flags) _raw_spin_trylock_irqsave(lock, &(flags))
 
+#define raw_spin_trylock_irq_disable(lock)	_raw_spin_trylock_irq_disable(lock)
+
 #ifndef CONFIG_PREEMPT_RT
 /* Include rwlock functions for !RT */
 #include <linux/rwlock.h>
@@ -372,6 +377,12 @@ static __always_inline void spin_lock_irq(spinlock_t *lock)
 	raw_spin_lock_irq(&lock->rlock);
 }
 
+static __always_inline void spin_lock_irq_disable(spinlock_t *lock)
+	__acquires(lock) __no_context_analysis
+{
+	raw_spin_lock_irq_disable(&lock->rlock);
+}
+
 #define spin_lock_irqsave(lock, flags)				\
 do {								\
 	raw_spin_lock_irqsave(spinlock_check(lock), flags);	\
@@ -402,6 +413,12 @@ static __always_inline void spin_unlock_irq(spinlock_t *lock)
 	raw_spin_unlock_irq(&lock->rlock);
 }
 
+static __always_inline void spin_unlock_irq_enable(spinlock_t *lock)
+	__releases(lock) __no_context_analysis
+{
+	raw_spin_unlock_irq_enable(&lock->rlock);
+}
+
 static __always_inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
 	__releases(lock) __no_context_analysis
 {
@@ -427,6 +444,11 @@ static __always_inline bool _spin_trylock_irqsave(spinlock_t *lock, unsigned lon
 }
 #define spin_trylock_irqsave(lock, flags) _spin_trylock_irqsave(lock, &(flags))
 
+static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
+{
+	return raw_spin_trylock_irq_disable(&lock->rlock);
+}
+
 /**
  * spin_is_locked() - Check whether a spinlock is locked.
  * @lock: Pointer to the spinlock.
diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h
index bda5e7a390cd..07a94ba1d760 100644
--- a/include/linux/spinlock_api_smp.h
+++ b/include/linux/spinlock_api_smp.h
@@ -28,6 +28,8 @@ _raw_spin_lock_nest_lock(raw_spinlock_t *lock, struct lockdep_map *map)
 void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)		__acquires(lock);
 void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
 								__acquires(lock);
+void __lockfunc _raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+								__acquires(lock);
 
 unsigned long __lockfunc _raw_spin_lock_irqsave(raw_spinlock_t *lock)
 								__acquires(lock);
@@ -39,6 +41,7 @@ int __lockfunc _raw_spin_trylock_bh(raw_spinlock_t *lock)	__cond_acquires(true,
 void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)		__releases(lock);
 void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)	__releases(lock);
 void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)	__releases(lock);
+void __lockfunc _raw_spin_unlock_irq_enable(raw_spinlock_t *lock)	__releases(lock);
 void __lockfunc
 _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
 								__releases(lock);
@@ -55,6 +58,11 @@ _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
 #define _raw_spin_lock_irq(lock) __raw_spin_lock_irq(lock)
 #endif
 
+/* Use the same config as spin_lock_irq() temporarily. */
+#ifdef CONFIG_INLINE_SPIN_LOCK_IRQ
+#define _raw_spin_lock_irq_disable(lock) __raw_spin_lock_irq_disable(lock)
+#endif
+
 #ifdef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
 #define _raw_spin_lock_irqsave(lock) __raw_spin_lock_irqsave(lock)
 #endif
@@ -79,6 +87,11 @@ _raw_spin_unlock_irqrestore(raw_spinlock_t *lock, unsigned long flags)
 #define _raw_spin_unlock_irq(lock) __raw_spin_unlock_irq(lock)
 #endif
 
+/* Use the same config as spin_unlock_irq() temporarily. */
+#ifdef CONFIG_INLINE_SPIN_UNLOCK_IRQ
+#define _raw_spin_unlock_irq_enable(lock) __raw_spin_unlock_irq_enable(lock)
+#endif
+
 #ifdef CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE
 #define _raw_spin_unlock_irqrestore(lock, flags) __raw_spin_unlock_irqrestore(lock, flags)
 #endif
@@ -105,6 +118,18 @@ static __always_inline bool _raw_spin_trylock_irq(raw_spinlock_t *lock)
 	return false;
 }
 
+static __always_inline bool _raw_spin_trylock_irq_disable(raw_spinlock_t *lock)
+	__cond_acquires(true, lock)
+{
+	local_interrupt_disable();
+	if (do_raw_spin_trylock(lock)) {
+		spin_acquire(&lock->dep_map, 0, 1, _RET_IP_);
+		return true;
+	}
+	local_interrupt_enable();
+	return false;
+}
+
 static __always_inline bool _raw_spin_trylock_irqsave(raw_spinlock_t *lock, unsigned long *flags)
 	__cond_acquires(true, lock)
 {
@@ -143,6 +168,14 @@ static inline void __raw_spin_lock_irq(raw_spinlock_t *lock)
 	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
 }
 
+static inline void __raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+	__acquires(lock) __no_context_analysis
+{
+	local_interrupt_disable();
+	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
+	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
+}
+
 static inline void __raw_spin_lock_bh(raw_spinlock_t *lock)
 	__acquires(lock) __no_context_analysis
 {
@@ -188,6 +221,14 @@ static inline void __raw_spin_unlock_irq(raw_spinlock_t *lock)
 	preempt_enable();
 }
 
+static inline void __raw_spin_unlock_irq_enable(raw_spinlock_t *lock)
+	__releases(lock)
+{
+	spin_release(&lock->dep_map, _RET_IP_);
+	do_raw_spin_unlock(lock);
+	local_interrupt_enable();
+}
+
 static inline void __raw_spin_unlock_bh(raw_spinlock_t *lock)
 	__releases(lock)
 {
diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h
index a9d5c7c66e03..e4de8bb26a15 100644
--- a/include/linux/spinlock_api_up.h
+++ b/include/linux/spinlock_api_up.h
@@ -42,6 +42,9 @@
 #define __LOCK_IRQSAVE(lock, flags, ...) \
   do { local_irq_save(flags); __LOCK(lock, ##__VA_ARGS__); } while (0)
 
+#define __LOCK_IRQ_DISABLE(lock, ...) \
+  do { local_interrupt_disable(); __LOCK(lock, ##__VA_ARGS__); } while (0)
+
 #define ___UNLOCK_(lock) \
   do { __release(lock); (void)(lock); } while (0)
 
@@ -61,6 +64,10 @@
 #define __UNLOCK_IRQRESTORE(lock, flags, ...) \
   do { local_irq_restore(flags); __UNLOCK(lock, ##__VA_ARGS__); } while (0)
 
+#define __UNLOCK_IRQ_ENABLE(lock, ...) \
+  do { __UNLOCK(lock, ##__VA_ARGS__); local_interrupt_enable(); } while (0)
+
+
 #define _raw_spin_lock(lock)			__LOCK(lock)
 #define _raw_spin_lock_nested(lock, subclass)	__LOCK(lock)
 #define _raw_read_lock(lock)			__LOCK(lock, shared)
@@ -70,6 +77,7 @@
 #define _raw_read_lock_bh(lock)			__LOCK_BH(lock, shared)
 #define _raw_write_lock_bh(lock)		__LOCK_BH(lock)
 #define _raw_spin_lock_irq(lock)		__LOCK_IRQ(lock)
+#define _raw_spin_lock_irq_disable(lock)	__LOCK_IRQ_DISABLE(lock)
 #define _raw_read_lock_irq(lock)		__LOCK_IRQ(lock, shared)
 #define _raw_write_lock_irq(lock)		__LOCK_IRQ(lock)
 #define _raw_spin_lock_irqsave(lock, flags)	__LOCK_IRQSAVE(lock, flags)
@@ -97,6 +105,13 @@ static __always_inline int _raw_spin_trylock_irq(raw_spinlock_t *lock)
 	return 1;
 }
 
+static __always_inline int _raw_spin_trylock_irq_disable(raw_spinlock_t *lock)
+	__cond_acquires(true, lock)
+{
+	__LOCK_IRQ_DISABLE(lock);
+	return 1;
+}
+
 static __always_inline int _raw_spin_trylock_irqsave(raw_spinlock_t *lock, unsigned long *flags)
 	__cond_acquires(true, lock)
 {
@@ -132,6 +147,7 @@ static __always_inline int _raw_write_trylock_irqsave(rwlock_t *lock, unsigned l
 #define _raw_write_unlock_bh(lock)		__UNLOCK_BH(lock)
 #define _raw_read_unlock_bh(lock)		__UNLOCK_BH(lock, shared)
 #define _raw_spin_unlock_irq(lock)		__UNLOCK_IRQ(lock)
+#define _raw_spin_unlock_irq_enable(lock)	__UNLOCK_IRQ_ENABLE(lock)
 #define _raw_read_unlock_irq(lock)		__UNLOCK_IRQ(lock, shared)
 #define _raw_write_unlock_irq(lock)		__UNLOCK_IRQ(lock)
 #define _raw_spin_unlock_irqrestore(lock, flags) \
diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h
index 373618a4243c..560d06384e0c 100644
--- a/include/linux/spinlock_rt.h
+++ b/include/linux/spinlock_rt.h
@@ -96,6 +96,12 @@ static __always_inline void spin_lock_irq(spinlock_t *lock)
 	rt_spin_lock(lock);
 }
 
+static __always_inline void spin_lock_irq_disable(spinlock_t *lock)
+	__acquires(lock)
+{
+	rt_spin_lock(lock);
+}
+
 #define spin_lock_irqsave(lock, flags)			 \
 	do {						 \
 		typecheck(unsigned long, flags);	 \
@@ -122,6 +128,12 @@ static __always_inline void spin_unlock_irq(spinlock_t *lock)
 	rt_spin_unlock(lock);
 }
 
+static __always_inline void spin_unlock_irq_enable(spinlock_t *lock)
+	__releases(lock)
+{
+	rt_spin_unlock(lock);
+}
+
 static __always_inline void spin_unlock_irqrestore(spinlock_t *lock,
 						   unsigned long flags)
 	__releases(lock)
@@ -131,6 +143,12 @@ static __always_inline void spin_unlock_irqrestore(spinlock_t *lock,
 
 #define spin_trylock(lock)	rt_spin_trylock(lock)
 
+static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
+	__cond_acquires(true, lock)
+{
+	return rt_spin_trylock(lock);
+}
+
 #define spin_trylock_bh(lock)	rt_spin_trylock_bh(lock)
 
 #define spin_trylock_irq(lock)	rt_spin_trylock(lock)
diff --git a/kernel/locking/spinlock.c b/kernel/locking/spinlock.c
index b42d293da38b..764641f6ec57 100644
--- a/kernel/locking/spinlock.c
+++ b/kernel/locking/spinlock.c
@@ -129,6 +129,19 @@ static void __lockfunc __raw_##op##_lock_bh(locktype##_t *lock)		\
  */
 BUILD_LOCK_OPS(spin, raw_spinlock, __acquires);
 
+/* No rwlock_t variants for now, so just build this function by hand */
+static void __lockfunc __raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+	for (;;) {
+		local_interrupt_disable();
+		if (likely(do_raw_spin_trylock(lock)))
+			break;
+		local_interrupt_enable();
+
+		arch_spin_relax(&lock->raw_lock);
+	}
+}
+
 #ifndef CONFIG_PREEMPT_RT
 BUILD_LOCK_OPS(read, rwlock, __acquires_shared);
 BUILD_LOCK_OPS(write, rwlock, __acquires);
@@ -176,6 +189,14 @@ noinline void __lockfunc _raw_spin_lock_irq(raw_spinlock_t *lock)
 EXPORT_SYMBOL(_raw_spin_lock_irq);
 #endif
 
+#ifndef CONFIG_INLINE_SPIN_LOCK_IRQ
+noinline void __lockfunc _raw_spin_lock_irq_disable(raw_spinlock_t *lock)
+{
+	__raw_spin_lock_irq_disable(lock);
+}
+EXPORT_SYMBOL_GPL(_raw_spin_lock_irq_disable);
+#endif
+
 #ifndef CONFIG_INLINE_SPIN_LOCK_BH
 noinline void __lockfunc _raw_spin_lock_bh(raw_spinlock_t *lock)
 {
@@ -208,6 +229,14 @@ noinline void __lockfunc _raw_spin_unlock_irq(raw_spinlock_t *lock)
 EXPORT_SYMBOL(_raw_spin_unlock_irq);
 #endif
 
+#ifndef CONFIG_INLINE_SPIN_UNLOCK_IRQ
+noinline void __lockfunc _raw_spin_unlock_irq_enable(raw_spinlock_t *lock)
+{
+	__raw_spin_unlock_irq_enable(lock);
+}
+EXPORT_SYMBOL_GPL(_raw_spin_unlock_irq_enable);
+#endif
+
 #ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
 noinline void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
 {
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 10af5ed859e7..d1ab1799794c 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,9 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
 EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
 #endif
 
+DEFINE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
+EXPORT_PER_CPU_SYMBOL_GPL(local_interrupt_disable_state);
+
 DEFINE_PER_CPU(unsigned int, nmi_nesting);
 
 /*
@@ -728,7 +731,16 @@ static inline void __irq_exit_rcu(void)
 #endif
 	account_hardirq_exit(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
-	if (!in_interrupt() && local_softirq_pending()) {
+	/*
+	 * Interrupts may happen between hardirq_disable_enter() and
+	 * local_irq_save() in local_interrupt_disable(), if irq_exit() invokes
+	 * softirq here, we may have a softirq handler calling
+	 * local_interrupt_disable() but it won't disable the irq because
+	 * hardirq disabling count is already 1, hence we need to prevent
+	 * invoking softirq when a local_interrupt_disable() is ongoing.
+	 */
+	if (!in_interrupt() && !hardirq_disable_count() &&
+	    local_softirq_pending()) {
 		/*
 		 * If we left hrtimers unarmed, make sure to arm them now,
 		 * before enabling interrupts to run SoftIRQ.
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 06/12] irq: Add KUnit test for refcounted interrupt enable/disable
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (4 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling Boqun Feng
@ 2026-05-26 15:21 ` Boqun Feng
  2026-05-26 15:21 ` [PATCH v2 07/12] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Boqun Feng
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

From: Lyude Paul <lyude@redhat.com>

While making changes to the refcounted interrupt patch series, at some
point on my local branch I broke something and ended up writing some KUnit
tests for refcounted interrupts as a result. So, let's include these tests
now that we have refcounted interrupts.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-7-lyude@redhat.com
---
 kernel/irq/Makefile                  |   1 +
 kernel/irq/refcount_interrupt_test.c | 109 +++++++++++++++++++++++++++
 2 files changed, 110 insertions(+)
 create mode 100644 kernel/irq/refcount_interrupt_test.c

diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index 86a2e5ae08f9..44c4d6fc502a 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_SMP) += affinity.o
 obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
 obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
 obj-$(CONFIG_IRQ_KUNIT_TEST) += irq_test.o
+obj-$(CONFIG_KUNIT) += refcount_interrupt_test.o
diff --git a/kernel/irq/refcount_interrupt_test.c b/kernel/irq/refcount_interrupt_test.c
new file mode 100644
index 000000000000..b4f224595f26
--- /dev/null
+++ b/kernel/irq/refcount_interrupt_test.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test for refcounted interrupt enable/disables.
+ */
+
+#include <kunit/test.h>
+#include <linux/interrupt_rc.h>
+
+#define TEST_IRQ_ON() KUNIT_EXPECT_FALSE(test, irqs_disabled())
+#define TEST_IRQ_OFF() KUNIT_EXPECT_TRUE(test, irqs_disabled())
+
+/* ===== Test cases ===== */
+static void test_single_irq_change(struct kunit *test)
+{
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+	local_interrupt_enable();
+}
+
+static void test_nested_irq_change(struct kunit *test)
+{
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+
+	local_interrupt_enable();
+	TEST_IRQ_OFF();
+	local_interrupt_enable();
+	TEST_IRQ_OFF();
+	local_interrupt_enable();
+	TEST_IRQ_ON();
+}
+
+static void test_multiple_irq_change(struct kunit *test)
+{
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+
+	local_interrupt_enable();
+	TEST_IRQ_OFF();
+	local_interrupt_enable();
+	TEST_IRQ_ON();
+
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+	local_interrupt_enable();
+	TEST_IRQ_ON();
+}
+
+static void test_irq_save(struct kunit *test)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	TEST_IRQ_OFF();
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+	local_interrupt_enable();
+	TEST_IRQ_OFF();
+	local_irq_restore(flags);
+	TEST_IRQ_ON();
+
+	local_interrupt_disable();
+	TEST_IRQ_OFF();
+	local_irq_save(flags);
+	TEST_IRQ_OFF();
+	local_irq_restore(flags);
+	TEST_IRQ_OFF();
+	local_interrupt_enable();
+	TEST_IRQ_ON();
+}
+
+static struct kunit_case test_cases[] = {
+	KUNIT_CASE(test_single_irq_change),
+	KUNIT_CASE(test_nested_irq_change),
+	KUNIT_CASE(test_multiple_irq_change),
+	KUNIT_CASE(test_irq_save),
+	{},
+};
+
+/* init and exit are the same */
+static int test_init(struct kunit *test)
+{
+	TEST_IRQ_ON();
+
+	return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+	TEST_IRQ_ON();
+}
+
+static struct kunit_suite refcount_interrupt_test_suite = {
+	.name = "refcount_interrupt",
+	.test_cases = test_cases,
+	.init = test_init,
+	.exit = test_exit,
+};
+
+kunit_test_suite(refcount_interrupt_test_suite);
+MODULE_AUTHOR("Lyude Paul <lyude@redhat.com>");
+MODULE_DESCRIPTION("Refcounted interrupt unit test suite");
+MODULE_LICENSE("GPL");
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 07/12] locking: Switch to _irq_{disable,enable}() variants in cleanup guards
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (5 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 06/12] irq: Add KUnit test for refcounted interrupt enable/disable Boqun Feng
@ 2026-05-26 15:21 ` Boqun Feng
  2026-05-28 10:45   ` Peter Zijlstra
  2026-05-26 15:21 ` [PATCH v2 08/12] sched: Remove the unused preempt_offset parameter of __cant_sleep() Boqun Feng
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida, Boqun Feng

From: Boqun Feng <boqun.feng@gmail.com>

The semantics of the various irq-disabling guards match what
*_irq_{disable,enable}() provide, i.e. the interrupt disabling is
properly nested, so it's OK to switch them to the
*_irq_{disable,enable}() primitives.
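
To illustrate why counted disabling fits scope-based guards, here is a
userspace sketch built on the compiler's cleanup attribute (a GCC/Clang
extension, the same mechanism the kernel guard helpers rely on). Every
name in it (model_irq_guard, MODEL_IRQ_GUARD(), the model_*() helpers) is
hypothetical, and locks are not modelled, only the interrupt state.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the real state is per-CPU. */
static bool irqs_enabled = true;
static unsigned int disable_depth;
static bool saved_state;

static void model_interrupt_disable(void)
{
	if (++disable_depth == 1) {
		saved_state = irqs_enabled;
		irqs_enabled = false;
	}
}

static void model_interrupt_enable(void)
{
	assert(disable_depth > 0);
	if (--disable_depth == 0)
		irqs_enabled = saved_state;
}

/* Hypothetical guard object; it carries no state of its own. */
struct model_irq_guard { int unused; };

static struct model_irq_guard model_irq_guard_enter(void)
{
	model_interrupt_disable();
	return (struct model_irq_guard){ 0 };
}

/* Runs automatically when the guard variable goes out of scope. */
static void model_irq_guard_exit(struct model_irq_guard *g)
{
	(void)g;
	model_interrupt_enable();
}

#define MODEL_IRQ_GUARD(name)						\
	struct model_irq_guard name					\
		__attribute__((cleanup(model_irq_guard_exit))) =	\
		model_irq_guard_enter()

/*
 * A guard scope ends while a second disable taken inside it is still
 * outstanding. Returns the interrupt state observed right after the scope.
 */
static bool irqs_on_after_guard_scope(void)
{
	bool state;

	{
		MODEL_IRQ_GUARD(g1);		/* guard for the outer lock */
		model_interrupt_disable();	/* inner lock taken in scope */
	}					/* g1 cleanup runs here */
	state = irqs_enabled;			/* inner lock still held */
	model_interrupt_enable();		/* inner lock released */
	return state;
}
```

In this model irqs_on_after_guard_scope() returns false: the guard's exit
only drops the nesting level, and interrupts come back on with the final
enable.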

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-17-lyude@redhat.com
---
 include/linux/spinlock.h | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 9d6012ac929d..0b4023b67f43 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -571,12 +571,12 @@ DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_nested, __acquires(_T), __releases(*(raw
 #define class_raw_spinlock_nested_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_nested, _T)
 
 DEFINE_LOCK_GUARD_1(raw_spinlock_irq, raw_spinlock_t,
-		    raw_spin_lock_irq(_T->lock),
-		    raw_spin_unlock_irq(_T->lock))
+		    raw_spin_lock_irq_disable(_T->lock),
+		    raw_spin_unlock_irq_enable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irq, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
 #define class_raw_spinlock_irq_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irq, _T)
 
-DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock))
+DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq_disable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irq_try, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
 #define class_raw_spinlock_irq_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irq_try, _T)
 
@@ -591,14 +591,13 @@ DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_bh_try, __acquires(_T), __releases(*(raw
 #define class_raw_spinlock_bh_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_bh_try, _T)
 
 DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t,
-		    raw_spin_lock_irqsave(_T->lock, _T->flags),
-		    raw_spin_unlock_irqrestore(_T->lock, _T->flags),
-		    unsigned long flags)
+		    raw_spin_lock_irq_disable(_T->lock),
+		    raw_spin_unlock_irq_enable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
 #define class_raw_spinlock_irqsave_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave, _T)
 
 DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try,
-			 raw_spin_trylock_irqsave(_T->lock, _T->flags))
+			 raw_spin_trylock_irq_disable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave_try, __acquires(_T), __releases(*(raw_spinlock_t **)_T))
 #define class_raw_spinlock_irqsave_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave_try, _T)
 
@@ -617,13 +616,13 @@ DECLARE_LOCK_GUARD_1_ATTRS(spinlock_try, __acquires(_T), __releases(*(spinlock_t
 #define class_spinlock_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_try, _T)
 
 DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t,
-		    spin_lock_irq(_T->lock),
-		    spin_unlock_irq(_T->lock))
+		    spin_lock_irq_disable(_T->lock),
+		    spin_unlock_irq_enable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irq, __acquires(_T), __releases(*(spinlock_t **)_T))
 #define class_spinlock_irq_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irq, _T)
 
 DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try,
-			 spin_trylock_irq(_T->lock))
+			 spin_trylock_irq_disable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irq_try, __acquires(_T), __releases(*(spinlock_t **)_T))
 #define class_spinlock_irq_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irq_try, _T)
 
@@ -639,14 +638,13 @@ DECLARE_LOCK_GUARD_1_ATTRS(spinlock_bh_try, __acquires(_T), __releases(*(spinloc
 #define class_spinlock_bh_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_bh_try, _T)
 
 DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t,
-		    spin_lock_irqsave(_T->lock, _T->flags),
-		    spin_unlock_irqrestore(_T->lock, _T->flags),
-		    unsigned long flags)
+		    spin_lock_irq_disable(_T->lock),
+		    spin_unlock_irq_enable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irqsave, __acquires(_T), __releases(*(spinlock_t **)_T))
 #define class_spinlock_irqsave_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irqsave, _T)
 
 DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try,
-			 spin_trylock_irqsave(_T->lock, _T->flags))
+			 spin_trylock_irq_disable(_T->lock))
 DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irqsave_try, __acquires(_T), __releases(*(spinlock_t **)_T))
 #define class_spinlock_irqsave_try_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(spinlock_irqsave_try, _T)
 
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 08/12] sched: Remove the unused preempt_offset parameter of __cant_sleep()
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (6 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 07/12] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Boqun Feng
@ 2026-05-26 15:21 ` Boqun Feng
  2026-05-26 15:21 ` [PATCH v2 09/12] sched: Avoid signed comparison of preempt_count() in __cant_migrate() Boqun Feng
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

The preempt_offset is always 0 at all the call sites of __cant_sleep(),
hence remove it. It also allows us to clean up the code a bit by no longer
using a "preempt_count() > .." comparison.
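
As a sanity check on the equivalence, the two forms of the test can be
compared in plain C. The helper names below are made up for illustration.
The INT_MIN case is purely hypothetical: it only shows what a signed
comparison would do if the top bit of the value were ever set.

```c
#include <limits.h>
#include <stdbool.h>

/* Old form of the check, with the offset fixed at 0 as every caller passed. */
static bool nonpreemptible_by_signed_compare(int count)
{
	return count > 0;
}

/* New form of the check: any set bit counts. */
static bool nonpreemptible_by_nonzero(int count)
{
	return count != 0;
}
```

Both helpers agree for every non-negative count (0, 1, INT_MAX, ...), so
the change keeps the behaviour for such values. They disagree only for
negative values such as INT_MIN, where the signed comparison reports false
and the non-zero test reports true.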

Signed-off-by: Boqun Feng <boqun@kernel.org>
---
 include/linux/kernel.h | 4 ++--
 kernel/sched/core.c    | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index e5570a16cbb1..24414c79e59a 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -72,7 +72,7 @@ extern int dynamic_might_resched(void);
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 extern void __might_resched(const char *file, int line, unsigned int offsets);
 extern void __might_sleep(const char *file, int line);
-extern void __cant_sleep(const char *file, int line, int preempt_offset);
+extern void __cant_sleep(const char *file, int line);
 extern void __cant_migrate(const char *file, int line);
 
 /**
@@ -95,7 +95,7 @@ extern void __cant_migrate(const char *file, int line);
  * this macro will print a stack trace if it is executed with preemption enabled
  */
 # define cant_sleep() \
-	do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
+	do { __cant_sleep(__FILE__, __LINE__); } while (0)
 # define sched_annotate_sleep()	(current->task_state_change = 0)
 
 /**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..75dba7cc09bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9165,7 +9165,7 @@ void __might_resched(const char *file, int line, unsigned int offsets)
 }
 EXPORT_SYMBOL(__might_resched);
 
-void __cant_sleep(const char *file, int line, int preempt_offset)
+void __cant_sleep(const char *file, int line)
 {
 	static unsigned long prev_jiffy;
 
@@ -9175,7 +9175,7 @@ void __cant_sleep(const char *file, int line, int preempt_offset)
 	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
 		return;
 
-	if (preempt_count() > preempt_offset)
+	if (preempt_count())
 		return;
 
 	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 09/12] sched: Avoid signed comparison of preempt_count() in __cant_migrate()
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (7 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 08/12] sched: Remove the unused preempt_offset parameter of __cant_sleep() Boqun Feng
@ 2026-05-26 15:21 ` Boqun Feng
  2026-05-26 15:21 ` [PATCH v2 10/12] preempt: Introduce HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

Currently preempt_count() is always a non-negative int on all archs
(PREEMPT_NEED_RESCHED archs mask out the MSB when returning
preempt_count()), hence the check in __cant_migrate() is in fact just
checking whether preempt_count() is 0 or not. In a future change, we are
going to use all 32 bits of preempt_count(), which would make negative
int values possible from preempt_count(). Therefore convert the "> 0"
comparison into a zero check to prepare for the future change.
No functional changes are intended.
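
The two forms of the check only diverge once the count can be negative.
A minimal userspace sketch of that, not kernel code: old_check(),
new_check() and compare_checks() are hypothetical names, and INT_MIN
simply stands in for a count whose MSB is set.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Old form: a count with the MSB set (a negative int) looks like "zero". */
static bool old_check(int pc)
{
	return pc > 0;
}

/* New form: any set bit means the count is non-zero. */
static bool new_check(int pc)
{
	return pc != 0;
}

/* Compare both forms on a few sample values. */
static void compare_checks(void)
{
	/* Identical results while the count stays non-negative. */
	assert(old_check(1) == new_check(1));
	assert(old_check(0) == new_check(0));
	/* They disagree as soon as the MSB is part of the count. */
	assert(old_check(INT_MIN) != new_check(INT_MIN));
}
```

As long as the MSB is masked out, both helpers agree, which is why the
conversion is expected to be behavior-neutral today.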

Signed-off-by: Boqun Feng <boqun@kernel.org>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75dba7cc09bd..636e6a15f104 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9207,7 +9207,7 @@ void __cant_migrate(const char *file, int line)
 	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
 		return;
 
-	if (preempt_count() > 0)
+	if (preempt_count())
 		return;
 
 	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 10/12] preempt: Introduce HAS_SEPARATE_PREEMPT_RESCHED_BITS
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (8 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 09/12] sched: Avoid signed comparison of preempt_count() in __cant_migrate() Boqun Feng
@ 2026-05-26 15:21 ` Boqun Feng
  2026-05-26 15:21 ` [PATCH v2 11/12] arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

With the changes that enable preempt count to track irq disable
nesting, we don't have enough bits in a 32bit preempt count
implementation; as a result we move the NMI nesting bits out of the
32bit preempt count. However, on architectures that can support a 64bit
preempt count implementation, we can keep the NMI nesting bits in the
32bit preempt count and avoid maintaining NMI nesting bits outside of
the same cache line.

Therefore HAS_SEPARATE_PREEMPT_RESCHED_BITS is introduced to allow
architectures to select this. Note that under this kconfig, preempt
count is maintained in a 64bit word, but preempt_count() still returns
an int because all the effective bits still fit in one (previously we
masked out the NEED_RESCHED bit in preempt_count()). This should make
no functional changes for existing preempt_count() users.

Enable this for x86_64 along with the introduction of the Kconfig.
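
The split described above can be sketched in userspace as follows. This
is an illustration under assumptions, not the kernel implementation:
count_word, count_read() and count_set() are hypothetical stand-ins for
the per-CPU variable and its accessors, and there is no cmpxchg loop
because nothing here runs concurrently.

```c
#include <assert.h>
#include <stdint.h>

/* MSB of the 64bit word, same construction as in the patch. */
#define NEED_RESCHED_BIT	(~(UINT64_MAX >> 1))

/* Hypothetical stand-in for the per-CPU 64bit preempt count word. */
static uint64_t count_word;

/* Return the effective count: drop the MSB, keep the low 32 bits. */
static int count_read(void)
{
	return (int)(uint32_t)(count_word & ~NEED_RESCHED_BIT);
}

/* Replace the count while preserving the NEED_RESCHED bit. */
static void count_set(uint64_t pc)
{
	uint64_t old = count_word;

	count_word = (old & NEED_RESCHED_BIT) | (pc & ~NEED_RESCHED_BIT);
}
```

Callers of count_read() never see the MSB of the 64bit word, so the
width of the underlying storage stays an implementation detail.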

Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun@kernel.org>
---
 arch/x86/Kconfig               |  1 +
 arch/x86/include/asm/preempt.h | 55 +++++++++++++++++++++++-----------
 arch/x86/kernel/cpu/common.c   |  2 +-
 include/linux/hardirq.h        | 50 +++++++++++++++++++++++--------
 include/linux/preempt.h        | 20 +++++++------
 kernel/Kconfig.preempt         |  4 +++
 kernel/sched/core.c            | 12 ++++++--
 kernel/softirq.c               |  6 ++++
 lib/locking-selftest.c         |  2 +-
 9 files changed, 109 insertions(+), 43 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..bf8288b3d52b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -327,6 +327,7 @@ config X86
 	select USER_STACKTRACE_SUPPORT
 	select HAVE_ARCH_KCSAN			if X86_64
 	select PROC_PID_ARCH_STATUS		if PROC_FS
+	select HAS_SEPARATE_PREEMPT_RESCHED_BITS		if X86_64
 	select HAVE_ARCH_NODE_DEV_GROUP		if X86_SGX
 	select FUNCTION_ALIGNMENT_16B		if X86_64 || X86_ALIGNMENT_16
 	select FUNCTION_ALIGNMENT_4B
diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 1220656f3370..12353eeebc52 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -7,10 +7,20 @@
 
 #include <linux/static_call_types.h>
 
-DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
+DECLARE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count);
 
-/* We use the MSB mostly because its available */
-#define PREEMPT_NEED_RESCHED	0x80000000
+/*
+ * We use the MSB for PREEMPT_NEED_RESCHED mostly because it is available.
+ */
+#define PREEMPT_NEED_RESCHED	(~(((unsigned long)-1L) >> 1))
+
+#ifdef CONFIG_HAS_SEPARATE_PREEMPT_RESCHED_BITS
+#define __pc_dec		"decq"
+#define __pc_op(op, ...)	raw_cpu_##op##_8(__VA_ARGS__)
+#else
+#define __pc_dec		"decl"
+#define __pc_op(op, ...)	raw_cpu_##op##_4(__VA_ARGS__)
+#endif
 
 /*
  * We use the PREEMPT_NEED_RESCHED bit as an inverted NEED_RESCHED such
@@ -24,18 +34,26 @@ DECLARE_PER_CPU_CACHE_HOT(int, __preempt_count);
  */
 static __always_inline int preempt_count(void)
 {
-	return raw_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
+	return __pc_op(read, __preempt_count) & ~PREEMPT_NEED_RESCHED;
 }
 
-static __always_inline void preempt_count_set(int pc)
+/*
+ * unsigned long preempt count parameter works for both 32bit and 64bit cases:
+ *
+ * - For 32bit, "int" (the return of preempt_count()) and "unsigned long" have
+ *   the same size.
+ * - For 64bit, the effective bits of a preempt count sits in 32bit, and we
+ *   reserve the NEED_RESCHED bit from the old count.
+ */
+static __always_inline void preempt_count_set(unsigned long pc)
 {
-	int old, new;
+	unsigned long old, new;
 
-	old = raw_cpu_read_4(__preempt_count);
+	old = __pc_op(read, __preempt_count);
 	do {
 		new = (old & PREEMPT_NEED_RESCHED) |
 			(pc & ~PREEMPT_NEED_RESCHED);
-	} while (!raw_cpu_try_cmpxchg_4(__preempt_count, &old, new));
+	} while (!__pc_op(try_cmpxchg, __preempt_count, &old, new));
 }
 
 /*
@@ -58,17 +76,17 @@ static __always_inline void preempt_count_set(int pc)
 
 static __always_inline void set_preempt_need_resched(void)
 {
-	raw_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
+	__pc_op(and, __preempt_count, ~PREEMPT_NEED_RESCHED);
 }
 
 static __always_inline void clear_preempt_need_resched(void)
 {
-	raw_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
+	__pc_op(or, __preempt_count, PREEMPT_NEED_RESCHED);
 }
 
 static __always_inline bool test_preempt_need_resched(void)
 {
-	return !(raw_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
+	return !(__pc_op(read, __preempt_count) & PREEMPT_NEED_RESCHED);
 }
 
 /*
@@ -77,22 +95,22 @@ static __always_inline bool test_preempt_need_resched(void)
 
 static __always_inline void __preempt_count_add(int val)
 {
-	raw_cpu_add_4(__preempt_count, val);
+	__pc_op(add, __preempt_count, val);
 }
 
 static __always_inline void __preempt_count_sub(int val)
 {
-	raw_cpu_add_4(__preempt_count, -val);
+	__pc_op(add, __preempt_count, -val);
 }
 
 static __always_inline int __preempt_count_add_return(int val)
 {
-	return raw_cpu_add_return_4(__preempt_count, val);
+	return __pc_op(add_return, __preempt_count, val);
 }
 
 static __always_inline int __preempt_count_sub_return(int val)
 {
-	return raw_cpu_add_return_4(__preempt_count, -val);
+	return __pc_op(add_return, __preempt_count, -val);
 }
 
 /*
@@ -102,7 +120,7 @@ static __always_inline int __preempt_count_sub_return(int val)
  */
 static __always_inline bool __preempt_count_dec_and_test(void)
 {
-	return GEN_UNARY_RMWcc("decl", __my_cpu_var(__preempt_count), e,
+	return GEN_UNARY_RMWcc(__pc_dec, __my_cpu_var(__preempt_count), e,
 			       __percpu_arg([var]));
 }
 
@@ -111,7 +129,7 @@ static __always_inline bool __preempt_count_dec_and_test(void)
  */
 static __always_inline bool should_resched(int preempt_offset)
 {
-	return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
+	return unlikely(__pc_op(read, __preempt_count) == preempt_offset);
 }
 
 #ifdef CONFIG_PREEMPTION
@@ -158,4 +176,7 @@ do { \
 
 #endif /* PREEMPTION */
 
+#undef __pc_op
+#undef __pc_dec
+
 #endif /* __ASM_PREEMPT_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a4268c47f2bc..182772b6ad6d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2240,7 +2240,7 @@ DEFINE_PER_CPU_CACHE_HOT(struct task_struct *, current_task) = &init_task;
 EXPORT_PER_CPU_SYMBOL(current_task);
 EXPORT_PER_CPU_SYMBOL(const_current_task);
 
-DEFINE_PER_CPU_CACHE_HOT(int, __preempt_count) = INIT_PREEMPT_COUNT;
+DEFINE_PER_CPU_CACHE_HOT(unsigned long, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
 DEFINE_PER_CPU_CACHE_HOT(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK;
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 1a0360a1000f..26e106b0dc30 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -10,8 +10,6 @@
 #include <linux/vtime.h>
 #include <asm/hardirq.h>
 
-DECLARE_PER_CPU(unsigned int, nmi_nesting);
-
 extern void synchronize_irq(unsigned int irq);
 extern bool synchronize_hardirq(unsigned int irq);
 
@@ -94,6 +92,40 @@ void irq_exit_rcu(void);
 #define arch_nmi_exit()		do { } while (0)
 #endif
 
+#ifdef CONFIG_HAS_SEPARATE_PREEMPT_RESCHED_BITS
+static __always_inline void __preempt_count_nmi_enter(void)
+{
+	__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);
+}
+
+static __always_inline void __preempt_count_nmi_exit(void)
+{
+	__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);
+}
+#else
+DECLARE_PER_CPU(unsigned int, nmi_nesting);
+
+#define __preempt_count_nmi_enter()				\
+	do {							\
+		unsigned int _o = NMI_MASK + HARDIRQ_OFFSET;	\
+		/* Maximum NMI nesting is 15. */		\
+		BUG_ON(__this_cpu_read(nmi_nesting) >= 15);	\
+		__this_cpu_inc(nmi_nesting);			\
+		_o -= (preempt_count() & NMI_MASK);		\
+		__preempt_count_add(_o);			\
+	} while (0)
+
+#define __preempt_count_nmi_exit()				\
+	do {							\
+		unsigned int _o = HARDIRQ_OFFSET;		\
+		if (!__this_cpu_dec_return(nmi_nesting))	\
+			_o += NMI_MASK;				\
+		__preempt_count_sub(_o);			\
+	} while (0)
+
+#endif
+
+
 /*
  * NMI vs Tracing
  * --------------
@@ -110,18 +142,14 @@ void irq_exit_rcu(void);
 	do {							\
 		lockdep_off();					\
 		arch_nmi_enter();				\
-		/* Maximum NMI nesting is 15. */		\
-		BUG_ON(__this_cpu_read(nmi_nesting) >= 15);	\
-		__this_cpu_inc(nmi_nesting);			\
-		__preempt_count_add(HARDIRQ_OFFSET);		\
-		preempt_count_set(preempt_count() | NMI_MASK);	\
+		__preempt_count_nmi_enter();			\
 	} while (0)
 
 #define nmi_enter()						\
 	do {							\
 		__nmi_enter();					\
 		lockdep_hardirq_enter();			\
-		ct_nmi_enter();				\
+		ct_nmi_enter();					\
 		instrumentation_begin();			\
 		ftrace_nmi_enter();				\
 		instrumentation_end();				\
@@ -129,12 +157,8 @@ void irq_exit_rcu(void);
 
 #define __nmi_exit()						\
 	do {							\
-		unsigned int nesting;				\
 		BUG_ON(!in_nmi());				\
-		__preempt_count_sub(HARDIRQ_OFFSET);		\
-		nesting = __this_cpu_dec_return(nmi_nesting);	\
-		if (!nesting)					\
-			__preempt_count_sub(NMI_OFFSET);	\
+		__preempt_count_nmi_exit();			\
 		arch_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 33fc4c814a9f..87d5367f986c 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -30,18 +30,20 @@
  * NMI nesting depth is tracked in a separate per-CPU variable
  * (nmi_nesting) to save bits in preempt_count.
  *
- *         PREEMPT_MASK:	0x000000ff
- *         SOFTIRQ_MASK:	0x0000ff00
- * HARDIRQ_DISABLE_MASK:	0x00ff0000
- *         HARDIRQ_MASK:	0x0f000000
- *             NMI_MASK:	0x10000000
- * PREEMPT_NEED_RESCHED:	0x80000000
+ *				32bit		HAS_SEPARATE_PREEMPT_RESCHED_BITS
+ *
+ *         PREEMPT_MASK:	0x000000ff	0x00000000000000ff
+ *         SOFTIRQ_MASK:	0x0000ff00	0x000000000000ff00
+ * HARDIRQ_DISABLE_MASK:	0x00ff0000	0x0000000000ff0000
+ *         HARDIRQ_MASK:	0x0f000000	0x000000000f000000
+ *             NMI_MASK:	0x10000000	0x00000000f0000000
+ * PREEMPT_NEED_RESCHED:	0x80000000	0x8000000000000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_DISABLE_BITS	8
 #define HARDIRQ_BITS	4
-#define NMI_BITS	1
+#define NMI_BITS	(1 + 3*IS_ENABLED(CONFIG_HAS_SEPARATE_PREEMPT_RESCHED_BITS))
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
@@ -116,8 +118,8 @@ static __always_inline unsigned char interrupt_context_level(void)
  * preempt_count() is commonly implemented with READ_ONCE().
  */
 
-#define nmi_count()	(preempt_count() & NMI_MASK)
-#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
+#define nmi_count()		(preempt_count() & NMI_MASK)
+#define hardirq_count()		(preempt_count() & HARDIRQ_MASK)
 #ifdef CONFIG_PREEMPT_RT
 # define softirq_count()	(current->softirq_disable_cnt & SOFTIRQ_MASK)
 # define irq_count()		((preempt_count() & (NMI_MASK | HARDIRQ_MASK)) | softirq_count())
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 88c594c6d7fc..35f546a042b1 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -122,6 +122,10 @@ config PREEMPT_RT_NEEDS_BH_LOCK
 config PREEMPT_COUNT
        bool
 
+config HAS_SEPARATE_PREEMPT_RESCHED_BITS
+	bool
+	depends on PREEMPT_COUNT && 64BIT
+
 config PREEMPTION
        bool
        select PREEMPT_COUNT
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 636e6a15f104..f4c944878516 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5847,8 +5847,13 @@ void preempt_count_add(int val)
 #ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Underflow?
+	 *
+	 * Cannot detect underflow based on the current preempt_count() value
+	 * if using HAS_SEPARATE_PREEMPT_RESCHED_BITS because preempt count takes all 32
+	 * bits.
 	 */
-	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
+	if (!IS_ENABLED(CONFIG_HAS_SEPARATE_PREEMPT_RESCHED_BITS) &&
+	    DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
 #endif
 	__preempt_count_add(val);
@@ -5880,7 +5885,10 @@ void preempt_count_sub(int val)
 	/*
 	 * Underflow?
 	 */
-	if (DEBUG_LOCKS_WARN_ON(val > preempt_count()))
+	unsigned int uval = val;
+	unsigned int pc = preempt_count();
+
+	if (DEBUG_LOCKS_WARN_ON(pc - uval > pc))
 		return;
 	/*
 	 * Is the spinlock portion underflowing?
diff --git a/kernel/softirq.c b/kernel/softirq.c
index d1ab1799794c..491136a313db 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -91,7 +91,13 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
 DEFINE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
 EXPORT_PER_CPU_SYMBOL_GPL(local_interrupt_disable_state);
 
+#ifndef CONFIG_HAS_SEPARATE_PREEMPT_RESCHED_BITS
+/*
+ * Any 32bit architecture that still cares about performance should
+ * probably ensure this is near preempt_count.
+ */
 DEFINE_PER_CPU(unsigned int, nmi_nesting);
+#endif
 
 /*
  * SOFTIRQ_OFFSET usage:
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index d939403331b5..8fd216bd0be6 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1429,7 +1429,7 @@ static int unexpected_testcase_failures;
 
 static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask)
 {
-	int saved_preempt_count = preempt_count();
+	long saved_preempt_count = preempt_count();
 #ifdef CONFIG_PREEMPT_RT
 #ifdef CONFIG_SMP
 	int saved_mgd_count = current->migration_disabled;
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 11/12] arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (9 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 10/12] preempt: Introduce HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
@ 2026-05-26 15:21 ` Boqun Feng
  2026-05-28 10:50   ` Peter Zijlstra
  2026-05-26 15:21 ` [PATCH v2 12/12] s390/preempt: " Boqun Feng
  2026-05-27 16:18 ` [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Peter Zijlstra
  12 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

ARM64 already uses a 64bit preempt count, and the need-reschedule bit is
maintained in a separate 32bit half from the preempt count. Therefore
the preempt count has enough bits to represent 16 levels of NMI nesting,
hence enable it for ARM64. This saves a per-CPU variable and additional
instructions in the NMI path.
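
The bit budget can be checked with a small userspace sketch. The field
widths are the ones from patch 10 with the Kconfig enabled;
HARDIRQ_DISABLE_SHIFT, HARDIRQ_SHIFT, NMI_SHIFT, MASK() and nmi_depth()
are assumptions made for this illustration, following the pattern of
the shift definitions visible in include/linux/preempt.h.

```c
#include <assert.h>
#include <stdint.h>

#define PREEMPT_BITS		8
#define SOFTIRQ_BITS		8
#define HARDIRQ_DISABLE_BITS	8
#define HARDIRQ_BITS		4
#define NMI_BITS		4	/* 1 + 3 with the Kconfig enabled */

#define PREEMPT_SHIFT		0
#define SOFTIRQ_SHIFT		(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_DISABLE_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define HARDIRQ_SHIFT		(HARDIRQ_DISABLE_SHIFT + HARDIRQ_DISABLE_BITS)
#define NMI_SHIFT		(HARDIRQ_SHIFT + HARDIRQ_BITS)

/* Hypothetical helper: a field of "bits" bits starting at "shift". */
#define MASK(bits, shift)	(((1ULL << (bits)) - 1) << (shift))
#define NMI_MASK		MASK(NMI_BITS, NMI_SHIFT)
#define NMI_OFFSET		(1ULL << NMI_SHIFT)

/* All five fields together fill exactly the low 32 bits. */
static_assert(NMI_SHIFT + NMI_BITS == 32, "fields must fit in 32 bits");
static_assert(NMI_MASK == 0xf0000000ULL, "matches the layout table");

/* Extract the NMI nesting depth (0..15) from a count value. */
static unsigned int nmi_depth(uint64_t count)
{
	return (unsigned int)((count & NMI_MASK) >> NMI_SHIFT);
}
```

A 4-bit field holds the values 0 to 15, which lines up with the
"Maximum NMI nesting is 15" limit enforced on the 32bit path.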

Signed-off-by: Boqun Feng <boqun@kernel.org>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..8178cb857115 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -248,6 +248,7 @@ config ARM64
 	select PCI_SYSCALL if PCI
 	select POWER_RESET
 	select POWER_SUPPLY
+	select HAS_SEPARATE_PREEMPT_RESCHED_BITS
 	select SPARSE_IRQ
 	select SWIOTLB
 	select SYSCTL_EXCEPTION_TRACE
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 12/12] s390/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (10 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 11/12] arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
@ 2026-05-26 15:21 ` Boqun Feng
  2026-05-28 10:53   ` Peter Zijlstra
  2026-05-27 16:18 ` [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Peter Zijlstra
  12 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2026-05-26 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Boqun Feng, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

From: Heiko Carstens <hca@linux.ibm.com>

Convert s390's preempt_count to 64 bit, and change the preempt
primitives accordingly.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260509181249.16281C67-hca@linux.ibm.com
---
 arch/s390/Kconfig               |  1 +
 arch/s390/include/asm/lowcore.h | 13 +++++++----
 arch/s390/include/asm/preempt.h | 41 +++++++++++++++------------------
 3 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index ecbcbb781e40..cbbca82f8443 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -276,6 +276,7 @@ config S390
 	select PCI_MSI			if PCI
 	select PCI_MSI_ARCH_FALLBACKS	if PCI_MSI
 	select PCI_QUIRKS		if PCI
+	select HAS_SEPARATE_PREEMPT_RESCHED_BITS
 	select SPARSE_IRQ
 	select SWIOTLB
 	select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 50ffe75adeb4..0974ab278169 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -160,10 +160,15 @@ struct lowcore {
 	/* SMP info area */
 	__u32	cpu_nr;				/* 0x03a0 */
 	__u32	softirq_pending;		/* 0x03a4 */
-	__s32	preempt_count;			/* 0x03a8 */
-	__u32	spinlock_lockval;		/* 0x03ac */
-	__u32	spinlock_index;			/* 0x03b0 */
-	__u8	pad_0x03b4[0x03b8-0x03b4];	/* 0x03b4 */
+	union {
+		struct {
+			__u32	need_resched;	/* 0x03a8 */
+			__u32	count;		/* 0x03ac */
+		} preempt;
+		__u64	preempt_count;		/* 0x03a8 */
+	};
+	__u32	spinlock_lockval;		/* 0x03b0 */
+	__u32	spinlock_index;			/* 0x03b4 */
 	__u64	percpu_offset;			/* 0x03b8 */
 	__u8	pad_0x03c0[0x0400-0x03c0];	/* 0x03c0 */
 
diff --git a/arch/s390/include/asm/preempt.h b/arch/s390/include/asm/preempt.h
index 0a25d4648b4c..1d5e4d7e9e1b 100644
--- a/arch/s390/include/asm/preempt.h
+++ b/arch/s390/include/asm/preempt.h
@@ -8,11 +8,8 @@
 #include <asm/cmpxchg.h>
 #include <asm/march.h>
 
-/*
- * Use MSB so it is possible to read preempt_count with LLGT which
- * reads the least significant 31 bits with a single instruction.
- */
-#define PREEMPT_NEED_RESCHED	0x80000000
+/* Use MSB for PREEMPT_NEED_RESCHED mostly because it is available. */
+#define PREEMPT_NEED_RESCHED	0x8000000000000000UL
 
 /*
  * We use the PREEMPT_NEED_RESCHED bit as an inverted NEED_RESCHED such
@@ -26,25 +23,25 @@
  */
 static __always_inline int preempt_count(void)
 {
-	unsigned long lc_preempt, count;
+	unsigned long lc_preempt;
+	int count;
 
-	BUILD_BUG_ON(sizeof_field(struct lowcore, preempt_count) != sizeof(int));
-	lc_preempt = offsetof(struct lowcore, preempt_count);
+	lc_preempt = offsetof(struct lowcore, preempt.count);
 	/* READ_ONCE(get_lowcore()->preempt_count) & ~PREEMPT_NEED_RESCHED */
 	asm_inline(
-		ALTERNATIVE("llgt	%[count],%[offzero](%%r0)\n",
-			    "llgt	%[count],%[offalt](%%r0)\n",
+		ALTERNATIVE("ly		%[count],%[offzero](%%r0)\n",
+			    "ly		%[count],%[offalt](%%r0)\n",
 			    ALT_FEATURE(MFEATURE_LOWCORE))
 		: [count] "=d" (count)
 		: [offzero] "i" (lc_preempt),
 		  [offalt] "i" (lc_preempt + LOWCORE_ALT_ADDRESS),
-		  "m" (((struct lowcore *)0)->preempt_count));
+		  "m" (((struct lowcore *)0)->preempt.count));
 	return count;
 }
 
-static __always_inline void preempt_count_set(int pc)
+static __always_inline void preempt_count_set(unsigned long pc)
 {
-	int old, new;
+	unsigned long old, new;
 
 	old = READ_ONCE(get_lowcore()->preempt_count);
 	do {
@@ -63,12 +60,12 @@ static __always_inline void preempt_count_set(int pc)
 
 static __always_inline void set_preempt_need_resched(void)
 {
-	__atomic_and(~PREEMPT_NEED_RESCHED, &get_lowcore()->preempt_count);
+	__atomic64_and(~PREEMPT_NEED_RESCHED, (long *)&get_lowcore()->preempt_count);
 }
 
 static __always_inline void clear_preempt_need_resched(void)
 {
-	__atomic_or(PREEMPT_NEED_RESCHED, &get_lowcore()->preempt_count);
+	__atomic64_or(PREEMPT_NEED_RESCHED, (long *)&get_lowcore()->preempt_count);
 }
 
 static __always_inline bool test_preempt_need_resched(void)
@@ -88,8 +85,8 @@ static __always_inline void __preempt_count_add(int val)
 
 			lc_preempt = offsetof(struct lowcore, preempt_count);
 			asm_inline(
-				ALTERNATIVE("asi	%[offzero](%%r0),%[val]\n",
-					    "asi	%[offalt](%%r0),%[val]\n",
+				ALTERNATIVE("agsi	%[offzero](%%r0),%[val]\n",
+					    "agsi	%[offalt](%%r0),%[val]\n",
 					    ALT_FEATURE(MFEATURE_LOWCORE))
 				: "+m" (((struct lowcore *)0)->preempt_count)
 				: [offzero] "i" (lc_preempt), [val] "i" (val),
@@ -98,7 +95,7 @@ static __always_inline void __preempt_count_add(int val)
 			return;
 		}
 	}
-	__atomic_add(val, &get_lowcore()->preempt_count);
+	__atomic64_add(val, (long *)&get_lowcore()->preempt_count);
 }
 
 static __always_inline void __preempt_count_sub(int val)
@@ -119,15 +116,15 @@ static __always_inline bool __preempt_count_dec_and_test(void)
 
 	lc_preempt = offsetof(struct lowcore, preempt_count);
 	asm_inline(
-		ALTERNATIVE("alsi	%[offzero](%%r0),%[val]\n",
-			    "alsi	%[offalt](%%r0),%[val]\n",
+		ALTERNATIVE("algsi	%[offzero](%%r0),%[val]\n",
+			    "algsi	%[offalt](%%r0),%[val]\n",
 			    ALT_FEATURE(MFEATURE_LOWCORE))
 		: "=@cc" (cc), "+m" (((struct lowcore *)0)->preempt_count)
 		: [offzero] "i" (lc_preempt), [val] "i" (-1),
 		[offalt] "i" (lc_preempt + LOWCORE_ALT_ADDRESS));
 	return (cc == 0) || (cc == 2);
 #else
-	return __atomic_add_const_and_test(-1, &get_lowcore()->preempt_count);
+	return __atomic64_add_const_and_test(-1, (long *)&get_lowcore()->preempt_count);
 #endif
 }
 
@@ -141,7 +138,7 @@ static __always_inline bool should_resched(int preempt_offset)
 
 static __always_inline int __preempt_count_add_return(int val)
 {
-	return val + __atomic_add(val, &get_lowcore()->preempt_count);
+	return val + __atomic64_add(val, (long *)&get_lowcore()->preempt_count);
 }
 
 static __always_inline int __preempt_count_sub_return(int val)
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling
  2026-05-26 15:21 ` [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling Boqun Feng
@ 2026-05-26 16:19   ` bot+bpf-ci
  2026-05-28 10:43   ` Peter Zijlstra
  1 sibling, 0 replies; 24+ messages in thread
From: bot+bpf-ci @ 2026-05-26 16:19 UTC (permalink / raw)
  To: boqun, peterz
  Cc: catalin.marinas, will, jonas, stefan.kristiansson, shorne, hca,
	gor, agordeev, borntraeger, svens, tglx, mingo, bp, dave.hansen,
	x86, hpa, arnd, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, kprateek.nayak, boqun,
	longman, akpm, andrii, eddyz87, ast, daniel, martin.lau, memxor,
	song, yonghong.song, jolsa, shuah, ojeda, gary, bjorn3_gh, lossin,
	a.hindborg, aliceryhl, tmgross, dakr, ruanjinjie, lyude, thuth,
	sohil.mehta, xin

[-- Attachment #1: Type: text/plain, Size: 2288 bytes --]

> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 241277cd34cf..9d6012ac929d 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h

[ ... ]

> @@ -427,6 +444,11 @@ static __always_inline bool _spin_trylock_irqsave(spinlock_t *lock, unsigned lon
>  }
>  #define spin_trylock_irqsave(lock, flags) _spin_trylock_irqsave(lock, &(flags))
>
> +static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
> +{
> +	return raw_spin_trylock_irq_disable(&lock->rlock);
> +}
> +

Should spin_trylock_irq_disable() include the sparse and clang
context-analysis annotations that all other spin_trylock_* variants
carry?

Looking at the other trylock variants in the same file:

spin_trylock():
    static __always_inline int spin_trylock(spinlock_t *lock)
        __cond_acquires(true, lock) __no_context_analysis

spin_trylock_bh():
    static __always_inline int spin_trylock_bh(spinlock_t *lock)
        __cond_acquires(true, lock) __no_context_analysis

spin_trylock_irq():
    static __always_inline int spin_trylock_irq(spinlock_t *lock)
        __cond_acquires(true, lock) __no_context_analysis

_spin_trylock_irqsave():
    static __always_inline bool _spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags)
        __cond_acquires(true, lock) __no_context_analysis

The RT counterpart in include/linux/spinlock_rt.h has the
__cond_acquires(true, lock) annotation:

    static __always_inline int spin_trylock_irq_disable(spinlock_t *lock)
        __cond_acquires(true, lock)

And the underlying _raw_spin_trylock_irq_disable() in spinlock_api_smp.h
also has __cond_acquires(true, lock):

    static __always_inline bool _raw_spin_trylock_irq_disable(raw_spinlock_t *lock)
        __cond_acquires(true, lock)

The wrapper spin_trylock_irq_disable() in spinlock.h is the only place
where these annotations are missing. This would break sparse static
analysis and clang's thread-safety analysis for this function, causing
inconsistent static-checking behavior compared to all peer trylock APIs.


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26458754996

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1)
  2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
                   ` (11 preceding siblings ...)
  2026-05-26 15:21 ` [PATCH v2 12/12] s390/preempt: " Boqun Feng
@ 2026-05-27 16:18 ` Peter Zijlstra
  2026-05-27 16:33   ` Boqun Feng
  12 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2026-05-27 16:18 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

On Tue, May 26, 2026 at 08:21:36AM -0700, Boqun Feng wrote:
> Hi Peter,
> 
> This is a follow-up for Lyude's work [1]. After learning the current
> preempt_count() usage and how ARM64 handle this, I came up with this
> series that could resolve your feedback [2]. The basic idea is based on:
> 
> 1) preempt_count() previously already masks our NEED_RESCHED bit, so the
>    effective bits is 31bits
> 2) with a 64bit preempt count implementation (as in your PREEMPT_LONG
>    proposal), the effective bits that record "whether we CAN preempt or
>    not" still fit in 32bit (i.e. an int)
> 
> as a result, I don't think we need to change the existing
> preempt_count() API, but rather keep "32bit vs 64bit" as an
> implementation detail. This saves us the need to change the printk code
> for preempt_count().

> 
> v1: https://lore.kernel.org/rust-for-linux/20260508042111.24358-1-boqun@kernel.org/
> 
> Changes since v1:
> 
> * Rename PREEMPT_COUNT_64BIT to HAS_SEPARATE_PREEMPT_RESCHED_BITS per
>   Mark Rutland.

Blergh, so I really don't like that new name. It isn't that
PREEMPT_RESCHED is separate, it really is a 64bit preempt count.

Shashiko has a few fits, but it's mostly being stupid. Although I think
it might be useful to perhaps put a WARN_ON_ONCE(in_nmi()) in
local_interrupt_disable().

Anyway, I'll re-read things again tomorrow, but I suppose this will do.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1)
  2026-05-27 16:18 ` [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Peter Zijlstra
@ 2026-05-27 16:33   ` Boqun Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2026-05-27 16:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

On Wed, May 27, 2026 at 06:18:44PM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2026 at 08:21:36AM -0700, Boqun Feng wrote:
> > Hi Peter,
> > 
> > This is a follow-up for Lyude's work [1]. After learning the current
> > preempt_count() usage and how ARM64 handle this, I came up with this
> > series that could resolve your feedback [2]. The basic idea is based on:
> > 
> > 1) preempt_count() previously already masks our NEED_RESCHED bit, so the
> >    effective bits is 31bits
> > 2) with a 64bit preempt count implementation (as in your PREEMPT_LONG
> >    proposal), the effective bits that record "whether we CAN preempt or
> >    not" still fit in 32bit (i.e. an int)
> > 
> > as a result, I don't think we need to change the existing
> > preempt_count() API, but rather keep "32bit vs 64bit" as an
> > implementation detail. This saves us the need to change the printk code
> > for preempt_count().
> 
> > 
> > v1: https://lore.kernel.org/rust-for-linux/20260508042111.24358-1-boqun@kernel.org/
> > 
> > Changes since v1:
> > 
> > * Rename PREEMPT_COUNT_64BIT to HAS_SEPARATE_PREEMPT_RESCHED_BITS per
> >   Mark Rutland.
> 
> Blergh, so I really don't like that new name. It isn't that
> PREEMPT_RESCHED is separate, it really is a 64bit preempt count.
> 
> Shashiko has a few fits, but it's mostly being stupid. Although I think
> it might be useful to perhaps put a WARN_ON_ONCE(in_nmi()) in
> local_interrupt_disable().

Yes, that's what I was thinking as well. I think we also need to check
local_softirq_pending() in local_interrupt_enable(), because
HARDIRQ_DISABLE bits being non-zero now means softirqs are deferred
(Shashiko reported this too).

> 
> Anyway, I'll re-read things again tomorrow, but I suppose this will do.
> 

Thank you!

Regards,
Boqun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling
  2026-05-26 15:21 ` [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling Boqun Feng
  2026-05-26 16:19   ` bot+bpf-ci
@ 2026-05-28 10:43   ` Peter Zijlstra
  2026-05-28 14:31     ` Boqun Feng
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2026-05-28 10:43 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida, Boqun Feng

On Tue, May 26, 2026 at 08:21:41AM -0700, Boqun Feng wrote:

> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> index e2d3079d3f5f..33fc4c814a9f 100644
> --- a/include/linux/preempt.h
> +++ b/include/linux/preempt.h
> @@ -151,6 +151,10 @@ static __always_inline unsigned char interrupt_context_level(void)
>  #define in_softirq()		(softirq_count())
>  #define in_interrupt()		(irq_count())
>  
> +#define hardirq_disable_count()	((preempt_count() & HARDIRQ_DISABLE_MASK) >> HARDIRQ_DISABLE_SHIFT)
> +#define hardirq_disable_enter()	__preempt_count_add_return(HARDIRQ_DISABLE_OFFSET)
> +#define hardirq_disable_exit()	__preempt_count_sub_return(HARDIRQ_DISABLE_OFFSET)
> +
>  /*
>   * The preempt_count offset after preempt_disable();
>   */

> diff --git a/include/linux/interrupt_rc.h b/include/linux/interrupt_rc.h
> new file mode 100644
> index 000000000000..868f32524a87
> --- /dev/null
> +++ b/include/linux/interrupt_rc.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * include/linux/interrupt_rc.h - refcounted local processor interrupt
> + * management.
> + *
> + * Since the implementation of this API currently depends on
> + * local_irq_save()/local_irq_restore(), we split this into it's own header to
> + * make it easier to include without hitting circular header dependencies.
> + */
> +
> +#ifndef __LINUX_INTERRUPT_RC_H
> +#define __LINUX_INTERRUPT_RC_H
> +
> +#include <linux/irqflags.h>
> +#include <asm/processor.h>
> +#ifdef CONFIG_SMP
> +#include <asm/smp.h>
> +#endif
> +
> +/* Per-cpu interrupt disabling state for local_interrupt_{disable,enable}() */
> +struct interrupt_disable_state {
> +	unsigned long flags;
> +};
> +
> +DECLARE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
> +
> +static inline void local_interrupt_disable(void)
> +{
> +	unsigned long flags;
> +	int new_count;
> +
> +	new_count = hardirq_disable_enter();
> +
> +	/* Interrupts can happen here, but it's OK, see __irq_exit_rcu(). */
> +
> +	if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
> +		local_irq_save(flags);
> +		raw_cpu_write(local_interrupt_disable_state.flags, flags);
> +	}
> +}
> +
> +static inline void local_interrupt_enable(void)
> +{
> +	int new_count;
> +
> +	new_count = hardirq_disable_exit();
> +
> +	if ((new_count & HARDIRQ_DISABLE_MASK) == 0) {
> +		unsigned long flags;
> +
> +		flags = raw_cpu_read(local_interrupt_disable_state.flags);
> +		local_irq_restore(flags);
> +		/*
> +		 * TODO: re-read preempt count can be avoided, but it needs
> +		 * should_resched() taking another parameter as the current
> +		 * preempt count
> +		 */
> +#ifdef CONFIG_PREEMPTION
> +		if (should_resched(0))
> +			__preempt_schedule();

I'm not sure why you bother with should_resched() at this point, can't
you simply write:

		if (!new_count)
			__preempt_schedule();

> +#endif
> +	}
> +}
> +
> +#endif /* !__LINUX_INTERRUPT_RC_H */
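
The nesting behaviour of the quoted local_interrupt_disable() /
local_interrupt_enable() pair can be modelled in user space. This is only a
sketch under stated assumptions: the model_* names, the bit position in
MODEL_SHIFT and the 8-bit field width are made up for illustration and are
not the series' actual HARDIRQ_DISABLE_* values, and a plain bool stands in
for the CPU interrupt flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout; the real HARDIRQ_DISABLE_* values may differ. */
#define MODEL_SHIFT   16
#define MODEL_OFFSET  (1u << MODEL_SHIFT)
#define MODEL_MASK    (0xffu << MODEL_SHIFT)

struct model_cpu {
	uint32_t preempt_count; /* stands in for the per-CPU preempt count */
	bool irqs_enabled;      /* stands in for the CPU interrupt flag */
	bool saved_enabled;     /* stands in for the saved per-CPU flags */
};

/* Number of outstanding model_interrupt_disable() calls. */
static unsigned int model_disable_count(const struct model_cpu *cpu)
{
	return (cpu->preempt_count & MODEL_MASK) >> MODEL_SHIFT;
}

static void model_interrupt_disable(struct model_cpu *cpu)
{
	uint32_t new_count = (cpu->preempt_count += MODEL_OFFSET);

	/* Only the outermost disable saves the state and masks interrupts. */
	if ((new_count & MODEL_MASK) == MODEL_OFFSET) {
		cpu->saved_enabled = cpu->irqs_enabled;
		cpu->irqs_enabled = false;
	}
}

static void model_interrupt_enable(struct model_cpu *cpu)
{
	uint32_t new_count;

	assert(model_disable_count(cpu) > 0); /* catches an unbalanced enable */
	new_count = (cpu->preempt_count -= MODEL_OFFSET);

	/* Only the outermost enable restores the saved state. */
	if ((new_count & MODEL_MASK) == 0)
		cpu->irqs_enabled = cpu->saved_enabled;
}
```

Two nested model_interrupt_disable() calls followed by a single
model_interrupt_enable() leave the modelled interrupts masked; only the
second enable restores the state saved by the outermost disable.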

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 07/12] locking: Switch to _irq_{disable,enable}() variants in cleanup guards
  2026-05-26 15:21 ` [PATCH v2 07/12] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Boqun Feng
@ 2026-05-28 10:45   ` Peter Zijlstra
  2026-05-28 14:31     ` Boqun Feng
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2026-05-28 10:45 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida, Boqun Feng

On Tue, May 26, 2026 at 08:21:43AM -0700, Boqun Feng wrote:
> From: Boqun Feng <boqun.feng@gmail.com>
> 
> The semantics of various irq disabling guards match what
> *_irq_{disable,enable}() provide, i.e. the interrupt disabling is
> properly nested, therefore it's OK to switch to use
> *_irq_{disable,enable}() primitives.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Boqun Feng <boqun@kernel.org>

You really need them both? ;-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 11/12] arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS
  2026-05-26 15:21 ` [PATCH v2 11/12] arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
@ 2026-05-28 10:50   ` Peter Zijlstra
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Zijlstra @ 2026-05-28 10:50 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

On Tue, May 26, 2026 at 08:21:47AM -0700, Boqun Feng wrote:
> ARM64 already uses 64bit preempt count and the need reschedule bit is
> maintained in a separate 32bit than the preempt count. Therefore preempt
> count has enough bits to represent 16 level of NMI nesting, hence enable
> it for ARM64. This saves a per-CPU variable and additional instructions
> in the NMI path.

Egads, so ARM being load-store gets around the preempt bit scribble by
moving it into a separate word. And while that works, that does *not*
make the preempt_count 64bit.

All of this really only works because the actual preempt count bits
still fit inside a u32. The moment that changes, this comes unstuck :-(

And I suppose this is the reason Mark wanted that name change.

I suppose Power could employ the same scheme..


> Signed-off-by: Boqun Feng <boqun@kernel.org>
> ---
>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fe60738e5943..8178cb857115 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -248,6 +248,7 @@ config ARM64
>  	select PCI_SYSCALL if PCI
>  	select POWER_RESET
>  	select POWER_SUPPLY
> +	select HAS_SEPARATE_PREEMPT_RESCHED_BITS
>  	select SPARSE_IRQ
>  	select SWIOTLB
>  	select SYSCTL_EXCEPTION_TRACE
> -- 
> 2.50.1 (Apple Git-155)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 12/12] s390/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS
  2026-05-26 15:21 ` [PATCH v2 12/12] s390/preempt: " Boqun Feng
@ 2026-05-28 10:53   ` Peter Zijlstra
  2026-05-28 14:41     ` Boqun Feng
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2026-05-28 10:53 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

On Tue, May 26, 2026 at 08:21:48AM -0700, Boqun Feng wrote:
> From: Heiko Carstens <hca@linux.ibm.com>
> 
> Convert s390's preempt_count to 64 bit, and change the preempt
> primitives accordingly.
> 
> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
> Signed-off-by: Boqun Feng <boqun@kernel.org>
> Link: https://patch.msgid.link/20260509181249.16281C67-hca@linux.ibm.com
> ---
>  arch/s390/Kconfig               |  1 +
>  arch/s390/include/asm/lowcore.h | 13 +++++++----
>  arch/s390/include/asm/preempt.h | 41 +++++++++++++++------------------
>  3 files changed, 29 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index ecbcbb781e40..cbbca82f8443 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -276,6 +276,7 @@ config S390
>  	select PCI_MSI			if PCI
>  	select PCI_MSI_ARCH_FALLBACKS	if PCI_MSI
>  	select PCI_QUIRKS		if PCI
> +	select HAS_SEPARATE_PREEMPT_RESCHED_BITS
>  	select SPARSE_IRQ
>  	select SWIOTLB
>  	select SYSCTL_EXCEPTION_TRACE
> diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
> index 50ffe75adeb4..0974ab278169 100644
> --- a/arch/s390/include/asm/lowcore.h
> +++ b/arch/s390/include/asm/lowcore.h
> @@ -160,10 +160,15 @@ struct lowcore {
>  	/* SMP info area */
>  	__u32	cpu_nr;				/* 0x03a0 */
>  	__u32	softirq_pending;		/* 0x03a4 */
> -	__s32	preempt_count;			/* 0x03a8 */
> -	__u32	spinlock_lockval;		/* 0x03ac */
> -	__u32	spinlock_index;			/* 0x03b0 */
> -	__u8	pad_0x03b4[0x03b8-0x03b4];	/* 0x03b4 */
> +	union {
> +		struct {
> +			__u32	need_resched;	/* 0x03a8 */
> +			__u32	count;		/* 0x03ac */
> +		} preempt;
> +		__u64	preempt_count;		/* 0x03a8 */
> +	};

I'm a little confused by this union; afaict it isn't actually used.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling
  2026-05-28 10:43   ` Peter Zijlstra
@ 2026-05-28 14:31     ` Boqun Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2026-05-28 14:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida, Boqun Feng

On Thu, May 28, 2026 at 12:43:22PM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2026 at 08:21:41AM -0700, Boqun Feng wrote:
> 
> > diff --git a/include/linux/preempt.h b/include/linux/preempt.h
> > index e2d3079d3f5f..33fc4c814a9f 100644
> > --- a/include/linux/preempt.h
> > +++ b/include/linux/preempt.h
> > @@ -151,6 +151,10 @@ static __always_inline unsigned char interrupt_context_level(void)
> >  #define in_softirq()		(softirq_count())
> >  #define in_interrupt()		(irq_count())
> >  
> > +#define hardirq_disable_count()	((preempt_count() & HARDIRQ_DISABLE_MASK) >> HARDIRQ_DISABLE_SHIFT)
> > +#define hardirq_disable_enter()	__preempt_count_add_return(HARDIRQ_DISABLE_OFFSET)
> > +#define hardirq_disable_exit()	__preempt_count_sub_return(HARDIRQ_DISABLE_OFFSET)
> > +
> >  /*
> >   * The preempt_count offset after preempt_disable();
> >   */
> 
> > diff --git a/include/linux/interrupt_rc.h b/include/linux/interrupt_rc.h
> > new file mode 100644
> > index 000000000000..868f32524a87
> > --- /dev/null
> > +++ b/include/linux/interrupt_rc.h
> > @@ -0,0 +1,65 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * include/linux/interrupt_rc.h - refcounted local processor interrupt
> > + * management.
> > + *
> > + * Since the implementation of this API currently depends on
> > + * local_irq_save()/local_irq_restore(), we split this into it's own header to
> > + * make it easier to include without hitting circular header dependencies.
> > + */
> > +
> > +#ifndef __LINUX_INTERRUPT_RC_H
> > +#define __LINUX_INTERRUPT_RC_H
> > +
> > +#include <linux/irqflags.h>
> > +#include <asm/processor.h>
> > +#ifdef CONFIG_SMP
> > +#include <asm/smp.h>
> > +#endif
> > +
> > +/* Per-cpu interrupt disabling state for local_interrupt_{disable,enable}() */
> > +struct interrupt_disable_state {
> > +	unsigned long flags;
> > +};
> > +
> > +DECLARE_PER_CPU(struct interrupt_disable_state, local_interrupt_disable_state);
> > +
> > +static inline void local_interrupt_disable(void)
> > +{
> > +	unsigned long flags;
> > +	int new_count;
> > +
> > +	new_count = hardirq_disable_enter();
> > +
> > +	/* Interrupts can happen here, but it's OK, see __irq_exit_rcu(). */
> > +
> > +	if ((new_count & HARDIRQ_DISABLE_MASK) == HARDIRQ_DISABLE_OFFSET) {
> > +		local_irq_save(flags);
> > +		raw_cpu_write(local_interrupt_disable_state.flags, flags);
> > +	}
> > +}
> > +
> > +static inline void local_interrupt_enable(void)
> > +{
> > +	int new_count;
> > +
> > +	new_count = hardirq_disable_exit();
> > +
> > +	if ((new_count & HARDIRQ_DISABLE_MASK) == 0) {
> > +		unsigned long flags;
> > +
> > +		flags = raw_cpu_read(local_interrupt_disable_state.flags);
> > +		local_irq_restore(flags);
> > +		/*
> > +		 * TODO: re-read preempt count can be avoided, but it needs
> > +		 * should_resched() taking another parameter as the current
> > +		 * preempt count
> > +		 */
> > +#ifdef CONFIG_PREEMPTION
> > +		if (should_resched(0))
> > +			__preempt_schedule();
> 
> I'm not sure why you bother with should_resched() at this point, can't
> you simply write:
> 
> 		if (!new_count)
> 			__preempt_schedule();
> 

I was trying to not re-invent the wheel for "checking whether we can
preempt" (because the definition of preempt condition might be changed
in the future?), but yes I think directly using new_count should be fine
here. Although I think I need to check tif_need_resched() for other
architectures?

Regards,
Boqun

> > +#endif
> > +	}
> > +}
> > +
> > +#endif /* !__LINUX_INTERRUPT_RC_H */
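
Whether testing new_count alone is sufficient depends on where an
architecture keeps its need-resched state. The sketch below contrasts two
schemes: a folded one, as on x86, where an inverted need-resched bit lives in
the count so that zero already means "preemptible and a reschedule is
pending", and a separate one, as in the generic fallback, where the flag is
tested on its own. All sketch_* names are hypothetical, and the sketch
assumes that the value returned by hardirq_disable_exit() carries the folded
bit on architectures that fold it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inverted flag: the bit is SET while no reschedule is pending. */
#define SKETCH_NEED_RESCHED_INV 0x80000000u

/* Folded scheme: one word holds both the nesting counts and the flag. */
static bool sketch_should_resched_folded(uint32_t raw_count)
{
	/* Zero means: nothing blocks preemption AND a reschedule is pending. */
	return raw_count == 0;
}

/* Separate scheme: the flag lives outside the count and is tested on its own. */
static bool sketch_should_resched_separate(uint32_t count, bool need_resched)
{
	return count == 0 && need_resched;
}

/* Mark or clear a pending reschedule in the folded representation. */
static uint32_t sketch_fold_need_resched(uint32_t raw_count, bool need_resched)
{
	if (need_resched)
		return raw_count & ~SKETCH_NEED_RESCHED_INV;
	return raw_count | SKETCH_NEED_RESCHED_INV;
}
```

Under the separate scheme, a bare zero test on the count would enter the
scheduler path even when no reschedule is pending, which is the case the
tif_need_resched() question above is about.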

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 07/12] locking: Switch to _irq_{disable,enable}() variants in cleanup guards
  2026-05-28 10:45   ` Peter Zijlstra
@ 2026-05-28 14:31     ` Boqun Feng
  0 siblings, 0 replies; 24+ messages in thread
From: Boqun Feng @ 2026-05-28 14:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida, Boqun Feng

On Thu, May 28, 2026 at 12:45:29PM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2026 at 08:21:43AM -0700, Boqun Feng wrote:
> > From: Boqun Feng <boqun.feng@gmail.com>
> > 
> > The semantics of various irq disabling guards match what
> > *_irq_{disable,enable}() provide, i.e. the interrupt disabling is
> > properly nested, therefore it's OK to switch to use
> > *_irq_{disable,enable}() primitives.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > Signed-off-by: Boqun Feng <boqun@kernel.org>
> 
> You really need them both? ;-)

Nah, I will clean this up ;-)

Regards,
Boqun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 12/12] s390/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS
  2026-05-28 10:53   ` Peter Zijlstra
@ 2026-05-28 14:41     ` Boqun Feng
  2026-05-28 15:18       ` Heiko Carstens
  0 siblings, 1 reply; 24+ messages in thread
From: Boqun Feng @ 2026-05-28 14:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Catalin Marinas, Will Deacon, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Arnd Bergmann, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Waiman Long, Andrew Morton, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Song Liu,
	Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Jinjie Ruan, Lyude Paul,
	Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

On Thu, May 28, 2026 at 12:53:25PM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2026 at 08:21:48AM -0700, Boqun Feng wrote:
> > From: Heiko Carstens <hca@linux.ibm.com>
> > 
> > Convert s390's preempt_count to 64 bit, and change the preempt
> > primitives accordingly.
> > 
> > Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
> > Signed-off-by: Boqun Feng <boqun@kernel.org>
> > Link: https://patch.msgid.link/20260509181249.16281C67-hca@linux.ibm.com
> > ---
> >  arch/s390/Kconfig               |  1 +
> >  arch/s390/include/asm/lowcore.h | 13 +++++++----
> >  arch/s390/include/asm/preempt.h | 41 +++++++++++++++------------------
> >  3 files changed, 29 insertions(+), 26 deletions(-)
> > 
> > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> > index ecbcbb781e40..cbbca82f8443 100644
> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -276,6 +276,7 @@ config S390
> >  	select PCI_MSI			if PCI
> >  	select PCI_MSI_ARCH_FALLBACKS	if PCI_MSI
> >  	select PCI_QUIRKS		if PCI
> > +	select HAS_SEPARATE_PREEMPT_RESCHED_BITS
> >  	select SPARSE_IRQ
> >  	select SWIOTLB
> >  	select SYSCTL_EXCEPTION_TRACE
> > diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
> > index 50ffe75adeb4..0974ab278169 100644
> > --- a/arch/s390/include/asm/lowcore.h
> > +++ b/arch/s390/include/asm/lowcore.h
> > @@ -160,10 +160,15 @@ struct lowcore {
> >  	/* SMP info area */
> >  	__u32	cpu_nr;				/* 0x03a0 */
> >  	__u32	softirq_pending;		/* 0x03a4 */
> > -	__s32	preempt_count;			/* 0x03a8 */
> > -	__u32	spinlock_lockval;		/* 0x03ac */
> > -	__u32	spinlock_index;			/* 0x03b0 */
> > -	__u8	pad_0x03b4[0x03b8-0x03b4];	/* 0x03b4 */
> > +	union {
> > +		struct {
> > +			__u32	need_resched;	/* 0x03a8 */
> > +			__u32	count;		/* 0x03ac */
> > +		} preempt;
> > +		__u64	preempt_count;		/* 0x03a8 */
> > +	};
> 
> I'm a little confused by this union; afaict it isn't actually used.

(TIL: s390 is big endian)

In preempt_count() the union is used for reading the lower 32bits in an
asm block.

Regards,
Boqun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 12/12] s390/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS
  2026-05-28 14:41     ` Boqun Feng
@ 2026-05-28 15:18       ` Heiko Carstens
  0 siblings, 0 replies; 24+ messages in thread
From: Heiko Carstens @ 2026-05-28 15:18 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Catalin Marinas, Will Deacon, Jonas Bonn,
	Stefan Kristiansson, Stafford Horne, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Arnd Bergmann, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Waiman Long, Andrew Morton,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Shuah Khan, Miguel Ojeda,
	Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
	Alice Ryhl, Trevor Gross, Danilo Krummrich, Jinjie Ruan,
	Lyude Paul, Thomas Huth, Sohil Mehta, Xin Li (Intel), Pawan Gupta,
	Nikunj A Dadhania, Joel Fernandes, Andy Shevchenko, Randy Dunlap,
	Yury Norov, Sebastian Andrzej Siewior, linux-kernel,
	linux-openrisc, linux-s390, linux-arch, bpf, linux-kselftest,
	rust-for-linux, Onur Özkan, Daniel Almeida

On Thu, May 28, 2026 at 07:41:24AM -0700, Boqun Feng wrote:
> On Thu, May 28, 2026 at 12:53:25PM +0200, Peter Zijlstra wrote:
> > On Tue, May 26, 2026 at 08:21:48AM -0700, Boqun Feng wrote:
> > > From: Heiko Carstens <hca@linux.ibm.com>
> > > diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
> > > index 50ffe75adeb4..0974ab278169 100644
> > > --- a/arch/s390/include/asm/lowcore.h
> > > +++ b/arch/s390/include/asm/lowcore.h
> > > @@ -160,10 +160,15 @@ struct lowcore {
> > >  	/* SMP info area */
> > >  	__u32	cpu_nr;				/* 0x03a0 */
> > >  	__u32	softirq_pending;		/* 0x03a4 */
> > > -	__s32	preempt_count;			/* 0x03a8 */
> > > -	__u32	spinlock_lockval;		/* 0x03ac */
> > > -	__u32	spinlock_index;			/* 0x03b0 */
> > > -	__u8	pad_0x03b4[0x03b8-0x03b4];	/* 0x03b4 */
> > > +	union {
> > > > +		struct {
> > > +			__u32	need_resched;	/* 0x03a8 */
> > > +			__u32	count;		/* 0x03ac */
> > > +		} preempt;
> > > +		__u64	preempt_count;		/* 0x03a8 */
> > > +	};
> > 
> > I'm a little confused by this union; afaict it isn't actually used.
> 
> (TIL: s390 is big endian)
> 
> In preempt_count() the union is used for reading the lower 32bits in an
> asm block.

Yes, it is used there to read only the lower 32 bits (count) in order
to avoid masking out the higher 32 bits, which would be required with
a 64 bit load.
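
A small user-space sketch of this point: on a big-endian machine the low 32
bits of a 64-bit word sit at the higher address, so the member at offset 4
(0x03ac in the quoted lowcore layout) can be read with a plain 32-bit load
instead of a 64-bit load plus mask. The be_* and count_via_* names are
hypothetical, explicit byte arrays are used so the example behaves the same
on a little-endian host, and the value placed in the upper half is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Write v into p[0..7] most-significant byte first (big-endian order). */
static void be_store64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Read 8 bytes in big-endian order. */
static uint64_t be_load64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Read 4 bytes in big-endian order. */
static uint32_t be_load32(const uint8_t *p)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Variant 1: 64-bit load, then mask away the upper half. */
static uint32_t count_via_mask(const uint8_t *word)
{
	return (uint32_t)(be_load64(word) & 0xffffffffu);
}

/* Variant 2: 32-bit load of the second member (offset 4). */
static uint32_t count_via_member(const uint8_t *word)
{
	return be_load32(word + 4);
}
```

Both variants return the same count; the second simply never touches the
upper half, so no mask is needed.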

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2026-05-28 15:19 UTC | newest]

Thread overview: 24+ messages
2026-05-26 15:21 [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
2026-05-26 15:21 ` [PATCH v2 01/12] preempt: Track NMI nesting to separate per-CPU counter Boqun Feng
2026-05-26 15:21 ` [PATCH v2 02/12] preempt: Introduce HARDIRQ_DISABLE_BITS Boqun Feng
2026-05-26 15:21 ` [PATCH v2 03/12] preempt: Introduce __preempt_count_{sub, add}_return() Boqun Feng
2026-05-26 15:21 ` [PATCH v2 04/12] openrisc: Include <linux/cpumask.h> in smp.h Boqun Feng
2026-05-26 15:21 ` [PATCH v2 05/12] irq & spin_lock: Add counted interrupt disabling/enabling Boqun Feng
2026-05-26 16:19   ` bot+bpf-ci
2026-05-28 10:43   ` Peter Zijlstra
2026-05-28 14:31     ` Boqun Feng
2026-05-26 15:21 ` [PATCH v2 06/12] irq: Add KUnit test for refcounted interrupt enable/disable Boqun Feng
2026-05-26 15:21 ` [PATCH v2 07/12] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Boqun Feng
2026-05-28 10:45   ` Peter Zijlstra
2026-05-28 14:31     ` Boqun Feng
2026-05-26 15:21 ` [PATCH v2 08/12] sched: Remove the unused preempt_offset parameter of __cant_sleep() Boqun Feng
2026-05-26 15:21 ` [PATCH v2 09/12] sched: Avoid signed comparison of preempt_count() in __cant_migrate() Boqun Feng
2026-05-26 15:21 ` [PATCH v2 10/12] preempt: Introduce HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
2026-05-26 15:21 ` [PATCH v2 11/12] arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
2026-05-28 10:50   ` Peter Zijlstra
2026-05-26 15:21 ` [PATCH v2 12/12] s390/preempt: " Boqun Feng
2026-05-28 10:53   ` Peter Zijlstra
2026-05-28 14:41     ` Boqun Feng
2026-05-28 15:18       ` Heiko Carstens
2026-05-27 16:18 ` [PATCH v2 00/12] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Peter Zijlstra
2026-05-27 16:33   ` Boqun Feng
