Re: [PATCH] arm64/trap: fix broken ct->nmi_nesting when die() is called in a kthread

public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed

From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Will Deacon <will@kernel.org>
Cc: mark.rutland@arm.com, ptosi@google.com, catalin.marinas@arm.com,
	bigeasy@linutronix.de, linux-kernel@vger.kernel.org,
	kristina.martsenko@arm.com, mcgrof@kernel.org,
	broonie@kernel.org, geert@linux-m68k.org, joey.gouly@arm.com,
	james.morse@arm.com, pcc@google.com, rppt@kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/trap: fix broken ct->nmi_nesting when die() is called in a kthread
Date: Mon, 2 Jun 2025 15:54:19 +0100	[thread overview]
Message-ID: <aD27GxSWsFekORcy@e129823.arm.com> (raw)
In-Reply-To: <20250602124738.GD1227@willie-the-truck>

Hi Will,

> [+Ada]
>
> On Fri, May 30, 2025 at 10:27:23AM +0100, Yeoreum Yun wrote:
> > When a kernel thread hits BUG() outside of an interrupt handler and
> > panic_on_oops is not set, it exits via make_task_dead(), which is called by die().
> > In this case, the nmi_nesting value in context_tracking becomes
> > inconsistent because the proper context tracking exit functions are not reached.
> >
> > Here’s an example scenario on arm64:
> >   1. A kernel thread hits the BUG() macro outside an interrupt handler,
> >      and panic_on_oops is not set (ct->nmi_nesting == CT_NESTING_IRQ_NONIDLE).
> >
> >   2. The exception handler jumps to el1_dbg() and calls arm64_enter_el1_dbg(),
> >      which invokes ct_nmi_enter(). (ct->nmi_nesting == CT_NESTING_IRQ_NONIDLE + 2)
> >
> >   3. bug_handler() runs, and if the bug type is BUG_TRAP_TYPE_BUG, it calls die().
> >
> >   4. die() then calls make_task_dead(), which terminates the kernel thread and
> >      schedules another task—assume this is the idle_task.
>
> This sounds like there is a genuine imbalance, then, as we're scheduling
> in the context of an exception taken from EL1.

TBH, this "scheduling" is called in do_exit() to kill BUG()
happend task:

 el1_dbg()
    -> arm64_enter_el1_dbg()
      -> do_debug_exception()
        -> die()
         -> make_task_dead
           -> do_exit()
             -> schedule()
    // unreachable
    -> arm64_exit_el1_dbg()

Because arm64_enter_el1_dbg() always call ct_nmi_enter(),
If do_debug_exception determined to call die(), there is no point to
call ct_nmi_exit().

>
> >   5. The idle_task attempts to enter the idle state by calling ct_idle_enter().
> >      However, since the current ct->nmi_nesting value is CT_NESTING_IRQ_NONIDLE + 2,
> >      ct_kernel_exit() triggers a WARN_ON_ONCE() warning.
> >
> > Because the kernel thread couldn’t call the appropriate context tracking exit
> > function in step 3, the ct->nmi_nesting value remains incorrect.
> > This leads to warnings like the following:
> >
> > [    7.133093] ------------[ cut here ]------------
> > [    7.133129] WARNING: CPU: 2 PID: 0 at kernel/context_tracking.c:127 ct_kernel
> > [    7.134157] Modules linked in:
> > [    7.134158]     not ok 62 kasan_strings
> > [    7.134280]
> > [    7.134506] CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Tainted: G    B D W        N
> > [    7.134930] Tainted: [B]=BAD_PAGE, [D]=DIE, [W]=WARN, [N]=TEST
> > [    7.135150] pstate: 204003c5 (nzCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    7.135441] pc : ct_kernel_exit+0xa4/0xb0
> > [    7.135656] lr : ct_kernel_exit+0x1c/0xb0
> > [    7.135866] sp : ffff8000829bbd90
> > [    7.136011] x29: ffff8000829bbd90 x28: ffff80008224ecf0 x27: 0000000000000004
> > [    7.136379] x26: ffff80008228ed30 x25: ffff80008228e000 x24: 0000000000000000
> > [    7.137016] x23: f3ff000800a52280 x22: 0000000000000000 x21: ffff00087b6c7408
> > [    7.137380] x20: ffff80008224b408 x19: 0000000000000005 x18: 0000000000000000
> > [    7.137741] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > [    7.311316] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > [    7.311673] x11: 0000000000000000 x10: 0000000000000000 x9 : 4000000000000000
> > [    7.312031] x8 : 4000000000000002 x7 : 0000000000000000 x6 : 0000000000000000
> > [    7.312387] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
> > [    7.312740] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff8007f947c000
> > [    7.313096] Call trace:
> > [    7.313210]  ct_kernel_exit+0xa4/0xb0 (P)
> > [    7.313445]  ct_idle_enter+0x14/0x28
> > [    7.313666]  default_idle_call+0x2c/0x60
> > [    7.313902]  do_idle+0xec/0x320
> > [    7.314104]  cpu_startup_entry+0x40/0x50
> > [    7.314331]  secondary_start_kernel+0x120/0x1a0
> >
> > This behavior is specific to the arm64 architecture, where ct_nmi_enter()
> > is called when handling a debug exception.
> > In contrast, on other architectures, ct_nmi_enter() is not called when
> > handling BUG().
> > (i.e) when handling X86_TRAP_UD via handle_invalid_op(), it doesn't call
> > ct_nmi_enter(). so it doesn’t cause any issues
> > (While irq_entry_enter() does call ct_nmi_enter() for idle tasks,
> > that doesn’t apply to debug exception handling).
>
> It sounds like you're suggesting that we don't update the
> context-tracking NMI state for BRK exceptions from EL1, to align
> with x86.

If el1_dbg() doesn't be called in idle_task(),
I think it doesn't need to call ct_nmi_enter() in arm64_enter_el1_debug()
since its nmi_nesting is always >= CT_NESTING_IRQ_NONIDLE and RCU wathcing this cpu.

But, it seems el1_dbg() could be called ct_idle_enter() and ct_idle_exit().
actually this situation seems be possible in theory when
some idle code have BUG() -- i.e) cpuidle driver's enter callback have BUG().
However, this case triggers another quetions. what happen if idle_task was
killed (I think it seems panic() case...)

So, If arm64_enter_el1_debug() doesn't need to call the ct_nmi_enter()
instead, __nmi_enter() should be called only for idle_task().

Am I wrong?

> I think Ada's pending series might make that easier, but then
> the patch you propose:
>
> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > index 529cff825531..9cf03b9ce691 100644
> > --- a/arch/arm64/kernel/traps.c
> > +++ b/arch/arm64/kernel/traps.c
> > @@ -227,8 +227,14 @@ void die(const char *str, struct pt_regs *regs, long err)
> >
> >  	raw_spin_unlock_irqrestore(&die_lock, flags);
> >
> > -	if (ret != NOTIFY_STOP)
> > +	if (ret != NOTIFY_STOP) {
> > +#ifdef CONFIG_CONTEXT_TRACKING_IDLE
> > +		long nmi_nesting = ct_nmi_nesting();
> > +		if (nmi_nesting && nmi_nesting != CT_NESTING_IRQ_NONIDLE)
> > +			ct_nmi_exit();
> > +#endif
>
> tries to undo the context-tracking when we realise we're going to kill
> the task, which feels like a hack.

Although her patches is applied,
I think this problem still exist if arm64_enter_el1_dbg() calls ct_nmi_enter().
I agree it's a hacky way for handling kernel task die() in debug
exception since in case of user task will be killed via signal.
However, unless arm64_enter_el1_dbg() calls ct_nmi_enter(),
In my narrow view, it seems the best...

--
Sincerely,
Yeoreum Yun

next prev parent reply	other threads:[~2025-06-02 14:57 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-30  9:27 [PATCH] arm64/trap: fix broken ct->nmi_nesting when die() is called in a kthread Yeoreum Yun
2025-06-02 12:47 ` Will Deacon
2025-06-02 14:54   ` Yeoreum Yun [this message]
2025-06-02 15:11     ` Mark Rutland
2025-06-02 17:50       ` Yeoreum Yun
2025-06-03  9:44         ` Mark Rutland
2025-06-03 11:14           ` Yeoreum Yun
2025-06-03 13:46             ` Mark Rutland
2025-06-03 15:23               ` Yeoreum Yun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aD27GxSWsFekORcy@e129823.arm.com \
    --to=yeoreum.yun@arm.com \
    --cc=bigeasy@linutronix.de \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=geert@linux-m68k.org \
    --cc=james.morse@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=kristina.martsenko@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mcgrof@kernel.org \
    --cc=pcc@google.com \
    --cc=ptosi@google.com \
    --cc=rppt@kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox