From: Boqun Feng <boqun@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Catalin Marinas" <catalin.marinas@arm.com>,
	"Will Deacon" <will@kernel.org>,
	"Jonas Bonn" <jonas@southpole.se>,
	"Stefan Kristiansson" <stefan.kristiansson@saunalahti.fi>,
	"Stafford Horne" <shorne@gmail.com>,
	"Heiko Carstens" <hca@linux.ibm.com>,
	"Vasily Gorbik" <gor@linux.ibm.com>,
	"Alexander Gordeev" <agordeev@linux.ibm.com>,
	"Christian Borntraeger" <borntraeger@linux.ibm.com>,
	"Sven Schnelle" <svens@linux.ibm.com>,
	"Thomas Gleixner" <tglx@kernel.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	"Valentin Schneider" <vschneid@redhat.com>,
	"K Prateek Nayak" <kprateek.nayak@amd.com>,
	"Boqun Feng" <boqun@kernel.org>,
	"Waiman Long" <longman@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Miguel Ojeda" <ojeda@kernel.org>, "Gary Guo" <gary@garyguo.net>,
	"Björn Roy Baron" <bjorn3_gh@protonmail.com>,
	"Benno Lossin" <lossin@kernel.org>,
	"Andreas Hindborg" <a.hindborg@kernel.org>,
	"Alice Ryhl" <aliceryhl@google.com>,
	"Trevor Gross" <tmgross@umich.edu>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Jinjie Ruan" <ruanjinjie@huawei.com>,
	"Ada Couprie Diaz" <ada.coupriediaz@arm.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Sohil Mehta" <sohil.mehta@intel.com>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Xin Li (Intel)" <xin@zytor.com>,
	"Sean Christopherson" <seanjc@google.com>,
	"Nikunj A Dadhania" <nikunj@amd.com>,
	"Joel Fernandes" <joelagnelf@nvidia.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Yury Norov" <ynorov@nvidia.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-openrisc@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-arch@vger.kernel.org,
	rust-for-linux@vger.kernel.org,
	"Boqun Feng" <boqun.feng@gmail.com>,
	"Joel Fernandes" <joelaf@google.com>
Subject: [PATCH 02/11] preempt: Track NMI nesting to separate per-CPU counter
Date: Thu,  7 May 2026 21:21:02 -0700	[thread overview]
Message-ID: <20260508042111.24358-3-boqun@kernel.org> (raw)
In-Reply-To: <20260508042111.24358-1-boqun@kernel.org>

From: Joel Fernandes <joelagnelf@nvidia.com>

Move NMI nesting tracking from the preempt_count bits to a separate
per-CPU counter (nmi_nesting). This frees up the NMI bits in
preempt_count so they can be repurposed for other uses, and as a side
benefit allows tracking nesting more than 16 levels deep if there is
ever a need.

Reduce NMI_BITS from 3 to 1; the single remaining bit no longer counts
nesting and is used only to detect whether we are in an NMI.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Boqun Feng <boqun@kernel.org>
Link: https://patch.msgid.link/20260121223933.1568682-3-lyude@redhat.com
---
 include/linux/hardirq.h | 16 ++++++++++++----
 include/linux/preempt.h | 13 +++++++++----
 kernel/softirq.c        |  2 ++
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d57cab4d4c06..cc06bda52c3e 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -10,6 +10,8 @@
 #include <linux/vtime.h>
 #include <asm/hardirq.h>
 
+DECLARE_PER_CPU(unsigned int, nmi_nesting);
+
 extern void synchronize_irq(unsigned int irq);
 extern bool synchronize_hardirq(unsigned int irq);
 
@@ -102,14 +104,16 @@ void irq_exit_rcu(void);
  */
 
 /*
- * nmi_enter() can nest up to 15 times; see NMI_BITS.
+ * nmi_enter() can nest - nesting is tracked in a per-CPU counter.
  */
 #define __nmi_enter()						\
 	do {							\
 		lockdep_off();					\
 		arch_nmi_enter();				\
-		BUG_ON(in_nmi() == NMI_MASK);			\
-		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX);	\
+		__this_cpu_inc(nmi_nesting);			\
+		__preempt_count_add(HARDIRQ_OFFSET);		\
+		preempt_count_set(preempt_count() | NMI_MASK);	\
 	} while (0)
 
 #define nmi_enter()						\
@@ -124,8 +128,12 @@ void irq_exit_rcu(void);
 
 #define __nmi_exit()						\
 	do {							\
+		unsigned int nesting;				\
 		BUG_ON(!in_nmi());				\
-		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		__preempt_count_sub(HARDIRQ_OFFSET);		\
+		nesting = __this_cpu_dec_return(nmi_nesting);	\
+		if (!nesting)					\
+			__preempt_count_sub(NMI_OFFSET);	\
 		arch_nmi_exit();				\
 		lockdep_on();					\
 	} while (0)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index f07e7f37f3ca..e2d3079d3f5f 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -18,6 +18,8 @@
  * - bits 0-7 are the preemption count (max preemption depth: 256)
  * - bits 8-15 are the softirq count (max # of softirqs: 256)
  * - bits 16-23 are the hardirq disable count (max # of hardirq disable: 256)
+ * - bits 24-27 are the hardirq count (max # of hardirqs: 16)
+ * - bit 28 is the NMI flag (no nesting count, tracked separately)
  *
  * The hardirq count could in theory be the same as the number of
  * interrupts in the system, but we run all interrupt handlers with
@@ -25,18 +27,21 @@
  * there are a few palaeontologic drivers which reenable interrupts in
  * the handler, so we need more than one bit here.
  *
+ * NMI nesting depth is tracked in a separate per-CPU variable
+ * (nmi_nesting) to save bits in preempt_count.
+ *
  *         PREEMPT_MASK:	0x000000ff
  *         SOFTIRQ_MASK:	0x0000ff00
  * HARDIRQ_DISABLE_MASK:	0x00ff0000
- *         HARDIRQ_MASK:	0x07000000
- *             NMI_MASK:	0x38000000
+ *         HARDIRQ_MASK:	0x0f000000
+ *             NMI_MASK:	0x10000000
  * PREEMPT_NEED_RESCHED:	0x80000000
  */
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_DISABLE_BITS	8
-#define HARDIRQ_BITS	3
-#define NMI_BITS	3
+#define HARDIRQ_BITS	4
+#define NMI_BITS	1
 
 #define PREEMPT_SHIFT	0
 #define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4425d8dce44b..10af5ed859e7 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -88,6 +88,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(hardirqs_enabled);
 EXPORT_PER_CPU_SYMBOL_GPL(hardirq_context);
 #endif
 
+DEFINE_PER_CPU(unsigned int, nmi_nesting);
+
 /*
  * SOFTIRQ_OFFSET usage:
  *
-- 
2.50.1 (Apple Git-155)

