BPF List
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Boqun Feng" <boqun@kernel.org>
Cc: bpf@vger.kernel.org, ojeda@kernel.org
Subject: Re: [PATCH v3 01/13] preempt: Track NMI nesting to separate per-CPU counter
Date: Fri, 05 Jun 2026 05:59:17 +0000	[thread overview]
Message-ID: <20260605055917.755661F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260605054128.5925-2-boqun@kernel.org>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Arithmetic overflow of `HARDIRQ_COUNT` carrying into `NMI_MASK` permanently corrupts `preempt_count` and causes a fatal kernel panic on nested NMI exit.
--

commit fea264f43839ad70d356b7c0bdb7708892c51c22
Author: Joel Fernandes <joelagnelf@nvidia.com>

preempt: Track NMI nesting to separate per-CPU counter

Move NMI nesting tracking from the preempt_count bits to a separate per-CPU
counter (nmi_nesting). This frees up the NMI bits in the preempt_count,
allowing those bits to be repurposed for other uses.

Link: https://patch.msgid.link/20260121223933.1568682-3-lyude@redhat.com

> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -102,14 +104,17 @@ void irq_exit_rcu(void);
>   */
>  
>  /*
> - * nmi_enter() can nest up to 15 times; see NMI_BITS.
> + * nmi_enter() can nest - nesting is tracked in a per-CPU counter.
>   */
>  #define __nmi_enter()						\
>  	do {							\
>  		lockdep_off();					\
>  		arch_nmi_enter();				\
> -		BUG_ON(in_nmi() == NMI_MASK);			\
> -		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
> +		/* Maximum NMI nesting is 15. */		\
> +		BUG_ON(__this_cpu_read(nmi_nesting) >= 15);	\
> +		__this_cpu_inc(nmi_nesting);			\
> +		__preempt_count_add(HARDIRQ_OFFSET);		\
> +		preempt_count_set(preempt_count() | NMI_MASK);	\
>  	} while (0)

[ ... ]

>  #define nmi_enter()						\
> @@ -124,8 +129,12 @@ void irq_exit_rcu(void);
>  
>  #define __nmi_exit()						\
>  	do {							\
> +		unsigned int nesting;				\
>  		BUG_ON(!in_nmi());				\
> -		__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
> +		__preempt_count_sub(HARDIRQ_OFFSET);		\
> +		nesting = __this_cpu_dec_return(nmi_nesting);	\
> +		if (!nesting)					\
> +			preempt_count_set(preempt_count() & ~NMI_MASK);	\
>  		arch_nmi_exit();				\
>  		lockdep_on();					\
>  	} while (0)

[Severity: High]
Does this combination of __preempt_count_add() and bitwise OR in __nmi_enter()
break arithmetic symmetry when NMIs nest deeply?

Because HARDIRQ_BITS is 4, it can hold a maximum value of 15. The patch
explicitly allows NMI nesting up to 15 levels. If an NMI interrupts an active
hardirq (where the hardirq count is already 1) and then nests 15 times, the
total number of __preempt_count_add(HARDIRQ_OFFSET) operations reaches 16.

This overflows the 4-bit HARDIRQ field and carries a 1 into the NMI_MASK bit
(bit 20). The overflow mathematically clears bit 20 and sets bit 21. By then
forcing bit 20 back to 1 with a bitwise OR, the state is corrupted.

When the innermost NMI exits, __preempt_count_sub(HARDIRQ_OFFSET) subtracts
from this corrupted state, borrowing from bit 20 and clearing it. Because the
inner NMI does not explicitly restore NMI_MASK on exit (as nesting is 14),
it returns to the outer NMI with bit 20 cleared.

When the outer NMI eventually exits, won't it fail the BUG_ON(!in_nmi()) in
__nmi_exit() and trigger a kernel panic?

Also, doesn't this permanently leak a carry bit into the higher fields of
preempt_count (e.g., bit 21), which can accumulate and corrupt other states
like PREEMPT_NEED_RESCHED?

I note that a later patch in the series (commit "preempt: Introduce
HAS_SEPARATE_PREEMPT_RESCHED_BITS") fixes this for 64-bit architectures by
restoring the old algebraic logic, but 32-bit architectures without
CONFIG_HAS_SEPARATE_PREEMPT_RESCHED_BITS appear to continue using this flawed
bitwise-OR logic and remain vulnerable.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260605054128.5925-1-boqun@kernel.org?part=1

  reply	other threads:[~2026-06-05  5:59 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-05  5:41 [PATCH v3 00/13] Refcounted interrupt disable and SpinLockIrq for rust (Part 1) Boqun Feng
2026-06-05  5:41 ` [PATCH v3 01/13] preempt: Track NMI nesting to separate per-CPU counter Boqun Feng
2026-06-05  5:59   ` sashiko-bot [this message]
2026-06-05  5:41 ` [PATCH v3 02/13] preempt: Introduce HARDIRQ_DISABLE_BITS Boqun Feng
2026-06-05  6:01   ` sashiko-bot
2026-06-05  5:41 ` [PATCH v3 03/13] preempt: Introduce __preempt_count_{sub, add}_return() Boqun Feng
2026-06-05  5:59   ` sashiko-bot
2026-06-05  6:30   ` bot+bpf-ci
2026-06-05  6:45     ` Boqun Feng
2026-06-05  5:41 ` [PATCH v3 04/13] openrisc: Include <linux/cpumask.h> in smp.h Boqun Feng
2026-06-05  5:41 ` [PATCH v3 05/13] irq & spin_lock: Add counted interrupt disabling/enabling Boqun Feng
2026-06-05  6:01   ` sashiko-bot
2026-06-05  6:27     ` Boqun Feng
2026-06-05  6:30   ` bot+bpf-ci
2026-06-05  6:40     ` Boqun Feng
2026-06-05  5:41 ` [PATCH v3 06/13] irq: Add KUnit test for refcounted interrupt enable/disable Boqun Feng
2026-06-05  5:53   ` sashiko-bot
2026-06-05  5:41 ` [PATCH v3 07/13] locking: Switch to _irq_{disable,enable}() variants in cleanup guards Boqun Feng
2026-06-05  5:57   ` sashiko-bot
2026-06-05  5:41 ` [PATCH v3 08/13] sched: Remove the unused preempt_offset parameter of __cant_sleep() Boqun Feng
2026-06-05  5:41 ` [PATCH v3 09/13] sched: Avoid signed comparison of preempt_count() in __cant_migrate() Boqun Feng
2026-06-05  5:41 ` [PATCH v3 10/13] preempt: Introduce HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
2026-06-05  5:59   ` sashiko-bot
2026-06-05  6:30   ` bot+bpf-ci
2026-06-05  5:41 ` [PATCH v3 11/13] arm64: sched/preempt: Enable HAS_SEPARATE_PREEMPT_RESCHED_BITS Boqun Feng
2026-06-05  5:41 ` [PATCH v3 12/13] s390/preempt: " Boqun Feng
2026-06-05  5:41 ` [PATCH v3 13/13] irq: Optimize reschedule check in local_interrupt_enable() Boqun Feng
2026-06-05  6:04   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260605055917.755661F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=boqun@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=ojeda@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox