CSD lockup during kexec due to unbounded busy-wait in pl011_console_write

linux-arm-kernel.lists.infradead.org archive mirror

* CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
@ 2025-11-25 16:02 Breno Leitao
  2025-11-26 14:13 ` Breno Leitao
  2025-11-28 16:08 ` Petr Mladek
  0 siblings, 2 replies; 11+ messages in thread
From: Breno Leitao @ 2025-11-25 16:02 UTC (permalink / raw)
  To: john.ogness, pmladek, linux, paulmck
  Cc: usamaarif642, leo.yan, linux-arm-kernel, linux-kernel,
	kernel-team, rmikey

Hello,

I am reporting a CSD lockup issue that occurs during kexec on ARM64 hosts,
which I have traced to the amba-pl011 serial driver waiting for hardware with
IRQs disabled in the nbcon atomic write path.

PROBLEM SUMMARY:
================
During kexec, a CSD lockup occurs when pl011_console_write_atomic() performs
an unbounded busy-wait for hardware synchronization while IRQs are disabled.
This blocks other CPUs for extended periods (>11 seconds observed), triggering
CSD lock timeouts.

KERNEL VERSION:
===============
Observed on kernel 6.13, but the code path appears similar in upstream.

ERROR MESSAGE:
==============
  mlx5_core 0000:03:00.0: Shutdown was called
  kvm: exiting hardware virtualization
  arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102]
  smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057)
  smp:     csd: CSD lock (#1) unresponsive.
  Sending NMI from CPU 4 to CPUs 0:
  NMI backtrace for cpu 0
  pstate: 03401009 (nzcv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
  pc : pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540)
  lr : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:292 drivers/tty/serial/amba-pl011.c:298 drivers/tty/serial/amba-pl011.c:2539)
  sp : ffff80010e26fae0
  pmr: 000000c0
  x29: ffff80010e26fae0 x28: ffff800082ddb000 x27: 00000000000000e0
  x26: 0000000000000001 x25: ffff8000826a8de8 x24: 00000000000008eb
  x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
  x20: ffff00009c19c880 x19: ffff80010e26fb88 x18: 0000000000000018
  x17: 696f70646e452065 x16: 4943502032303830 x15: 3130783020737361
  x14: 6c63203030206570 x13: 746e696f70646e45 x12: 0000000000000000
  x11: 0000000000000008 x10: 0000000000000000 x9 : ffff800081888d80
  x8 : 0000000000000018 x7 : 205d313332363336 x6 : 362e31202020205b
  x5 : ffff000097d4700f x4 : ffff80010e26f99f x3 : ffff800081125220
  x2 : 0000000000000052 x1 : 000000000000000a x0 : ffff00009c19c880
  Call trace:
  pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
  nbcon_emit_next_record (kernel/printk/nbcon.c:1049)
  __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517)
  __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612)
  nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629)
  printk_kthreads_shutdown (kernel/printk/printk.c:?)
  syscore_shutdown (drivers/base/syscore.c:120)
  kernel_kexec (kernel/kexec_core.c:1045)
  __arm64_sys_reboot (kernel/reboot.c:794 kernel/reboot.c:722 kernel/reboot.c:722)
  invoke_syscall (arch/arm64/kernel/syscall.c:50)
  el0_svc_common.llvm.14158405452757855239 (arch/arm64/kernel/syscall.c:?)
  do_el0_svc (arch/arm64/kernel/syscall.c:152)
  el0_svc (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:73 arch/arm64/kernel/entry-common.c:169 arch/arm64/kernel/entry-common.c:182 arch/arm64/kernel/entry-common.c:749)
  el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:820)
  el0t_64_sync (arch/arm64/kernel/entry.S:600)
  smp: csd: Re-sending CSD lock (#1) IPI from CPU#04 to CPU#00
  Workqueue: events_unbound toggle_allocation_gate

  Call trace:
  show_stack (arch/arm64/kernel/stacktrace.c:503) (C)
  dump_stack_lvl (lib/dump_stack.c:122)
  smp_call_function_many_cond.llvm.3022501501692466737 (lib/dump_stack.c:? kernel/smp.c:305 kernel/smp.c:326 kernel/smp.c:336 kernel/smp.c:884)
  kick_all_cpus_sync (kernel/smp.c:1076)
  __jump_label_update (kernel/jump_label.c:522)
  jump_label_update (kernel/jump_label.c:921)
  static_key_enable_cpuslocked (kernel/jump_label.c:?)
  toggle_allocation_gate (kernel/jump_label.c:224 mm/kfence/core.c:849)
  process_scheduled_works (kernel/workqueue.c:3245 kernel/workqueue.c:3321)
  worker_thread (./include/linux/list.h:373 kernel/workqueue.c:950 kernel/workqueue.c:3403)
  kthread (kernel/kthread.c:391)
  ret_from_fork (arch/arm64/kernel/entry.S:863)
  smp: csd: CSD lock (#1) got unstuck on CPU#04, CPU#00 released the lock.
  kexec_core: Starting new kernel

ROOT CAUSE ANALYSIS:
====================
The issue occurs through the following sequence:

1. System initiates kexec shutdown on an ARM64 host
2. NBCON enters atomic mode during shutdown (printk_kthreads_shutdown)
3. NBCON calls pl011_console_write_atomic() with the following call path:

   local_irq_save()
     __nbcon_atomic_flush_pending_con()
       pl011_console_write_atomic()

4. Inside pl011_console_write_atomic(), the driver performs an unbounded
busy-wait for the hardware to become ready before leaving ->write_atomic():

   while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
       cpu_relax();            // drivers/tty/serial/amba-pl011.c:2540

5. With IRQs disabled, this busy-wait blocks the CPU for >11 seconds waiting
for the hardware to clear its busy state.

6. Meanwhile, kfence's toggle_allocation_gate() on another CPU attempts to
perform a synchronous operation across all CPUs, which correctly triggers a CSD
lock timeout because CPU#0 is stuck in the busy loop with IRQs disabled.
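
For illustration only, here is a minimal userspace sketch of what a bounded
variant of the wait loop in step 4 could look like. It is not the amba-pl011
code: struct fake_uart, fake_read_fr(), wait_not_busy_bounded() and the
max_polls limit are all hypothetical names invented for this model, and the
model assumes the busy bit is not part of inv_fr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a UART; not the real PL011 register layout. */
struct fake_uart {
	uint32_t fr;             /* flag register bits other than busy */
	uint32_t inv_fr;         /* inverted-bit mask, as in the loop above */
	uint32_t fr_busy;        /* mask of the busy bit */
	unsigned int busy_reads; /* model: how many more reads report busy */
};

/* Model of a flag-register read: reports busy for the next busy_reads reads. */
static uint32_t fake_read_fr(struct fake_uart *u)
{
	if (u->busy_reads > 0) {
		u->busy_reads--;
		return u->fr | u->fr_busy;
	}
	return u->fr & ~u->fr_busy;
}

/*
 * Poll the busy bit at most max_polls times.
 * Returns true if the device went idle, false if the poll budget ran out.
 */
static bool wait_not_busy_bounded(struct fake_uart *u, unsigned int max_polls)
{
	assert(u != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (!((fake_read_fr(u) ^ u->inv_fr) & u->fr_busy))
			return true;
		/* Kernel code would call cpu_relax() here. */
	}
	return false;
}
```

A caller would have to decide what to do on a false return (for example give
up on the record or retry later); that policy is outside this sketch.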

NOTES:
======

This is slightly similar to a report I gave a while ago [1] that got
fixed by Petr's a7df4ed0af77 ("printk: Allow to use the printk kthread
immediately even for 1st nbcon")

[1] https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/

QUESTION
========

1) Should nbcon wait for hardware synchronization with IRQs disabled?
2) Can the hardware synchronization be moved out of the IRQ-disabled path?

Thanks
--breno


* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-25 16:02 CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64) Breno Leitao
@ 2025-11-26 14:13 ` Breno Leitao
  2025-11-26 14:54   ` Marco Elver
  2025-11-28 16:08 ` Petr Mladek
  1 sibling, 1 reply; 11+ messages in thread
From: Breno Leitao @ 2025-11-26 14:13 UTC (permalink / raw)
  To: glider, elver, dvyukov
  Cc: usamaarif642, leo.yan, linux-arm-kernel, linux-kernel,
	kernel-team, rmikey, john.ogness, pmladek, linux, paulmck,
	kasan-dev

On Tue, Nov 25, 2025 at 08:02:16AM -0800, Breno Leitao wrote:
> 6. Meanwhile, kfence's toggle_allocation_gate() on another CPU attempts to
> perform a synchronous operation across all CPUs, which correctly triggers a CSD
> lock timeout because CPU#0 is stuck in the busy loop with IRQs disabled.
 
I've hacked a patch to disable kfence IPIs during machine shutdown, and
with it loaded, I don't reproduce the problem described in this thread.

	Author: Breno Leitao <leitao@debian.org>
	Date:   Tue Nov 25 07:21:55 2025 -0800

	mm/kfence: add reboot notifier to disable KFENCE on shutdown
	
	Register a reboot notifier to disable KFENCE and cancel any pending
	timer work during system shutdown. This prevents potential IPI
	synchronization issues that can occur when KFENCE is active during
	the reboot process.
	
	The notifier runs with high priority (INT_MAX) to ensure KFENCE is
	disabled early in the shutdown sequence.
	
	Signed-off-by: Breno Leitao <leitao@debian.org>

	diff --git a/mm/kfence/core.c b/mm/kfence/core.c
	index 727c20c94ac5..5810afaaf6b4 100644
	--- a/mm/kfence/core.c
	+++ b/mm/kfence/core.c
	@@ -26,6 +26,7 @@
	#include <linux/panic_notifier.h>
	#include <linux/random.h>
	#include <linux/rcupdate.h>
	+#include <linux/reboot.h>
	#include <linux/sched/clock.h>
	#include <linux/seq_file.h>
	#include <linux/slab.h>
	@@ -819,6 +820,21 @@ static struct notifier_block kfence_check_canary_notifier = {
	
	static struct delayed_work kfence_timer;
	
	+static int kfence_reboot_callback(struct notifier_block *nb,
	+				  unsigned long action, void *data)
	+{
	+	/* Disable KFENCE to avoid IPI synchronization during shutdown */
	+	WRITE_ONCE(kfence_enabled, false);
	+	/* Cancel any pending timer work */
	+	cancel_delayed_work_sync(&kfence_timer);
	+	return NOTIFY_OK;
	+}
	+
	+static struct notifier_block kfence_reboot_notifier = {
	+	.notifier_call = kfence_reboot_callback,
	+	.priority = INT_MAX, /* Run early to stop timers ASAP */
	+};
	+
	#ifdef CONFIG_KFENCE_STATIC_KEYS
	/* Wait queue to wake up allocation-gate timer task. */
	static DECLARE_WAIT_QUEUE_HEAD(allocation_wait);
	@@ -901,6 +917,8 @@ static void kfence_init_enable(void)
		if (kfence_check_on_panic)
			atomic_notifier_chain_register(&panic_notifier_list, &kfence_check_canary_notifier);
	
	+	register_reboot_notifier(&kfence_reboot_notifier);
	+
		WRITE_ONCE(kfence_enabled, true);
		queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
 

Alexander, Marco and Kasan maintainers:

What is the potential impact of disabling KFENCE during reboot
procedures?

The primary motivation is to avoid triggering IPIs during the machine
teardown process, mainly when the nbconsole is not running in threaded
mode.
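
As a side note on the .priority = INT_MAX choice in the patch above: notifier
chains are kept sorted by descending priority, so a higher value runs earlier.
The following is a small userspace model of that ordering, not kernel code;
struct model_notifier, model_register() and model_call_chain() are
hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of a notifier entry; id only identifies the callback. */
struct model_notifier {
	int priority;
	int id;
	struct model_notifier *next;
};

/* Insert n so that the list stays sorted by descending priority. */
static void model_register(struct model_notifier **head,
			   struct model_notifier *n)
{
	assert(head != NULL && n != NULL);

	/* Skip entries whose priority is higher than or equal to the new one. */
	while (*head != NULL && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Walk the chain in call order, recording ids; returns how many were seen. */
static size_t model_call_chain(const struct model_notifier *head,
			       int *out, size_t cap)
{
	size_t k = 0;

	for (; head != NULL && k < cap; head = head->next)
		out[k++] = head->id;
	return k;
}
```

In this model an entry registered with INT_MAX is visited before any entry
with a lower priority, regardless of registration order.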



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-26 14:13 ` Breno Leitao
@ 2025-11-26 14:54   ` Marco Elver
  2025-11-26 15:54     ` Breno Leitao
  0 siblings, 1 reply; 11+ messages in thread
From: Marco Elver @ 2025-11-26 14:54 UTC (permalink / raw)
  To: Breno Leitao
  Cc: glider, dvyukov, usamaarif642, leo.yan, linux-arm-kernel,
	linux-kernel, kernel-team, rmikey, john.ogness, pmladek, linux,
	paulmck, kasan-dev

On Wed, 26 Nov 2025 at 15:13, Breno Leitao <leitao@debian.org> wrote:
>
> On Tue, Nov 25, 2025 at 08:02:16AM -0800, Breno Leitao wrote:
> > 6. Meanwhile, kfence's toggle_allocation_gate() on another CPU attempts to
> > perform a synchronous operation across all CPUs, which correctly triggers a CSD
> > lock timeout because CPU#0 is stuck in the busy loop with IRQs disabled.
>
> I've hacked a patch to disable kfence IPIs during machine shutdown, and
> with it loaded, I don't reproduce the problem described in this thread.
>
>         Author: Breno Leitao <leitao@debian.org>
>         Date:   Tue Nov 25 07:21:55 2025 -0800
>
>         mm/kfence: add reboot notifier to disable KFENCE on shutdown
>
>         Register a reboot notifier to disable KFENCE and cancel any pending
>         timer work during system shutdown. This prevents potential IPI
>         synchronization issues that can occur when KFENCE is active during
>         the reboot process.
>
>         The notifier runs with high priority (INT_MAX) to ensure KFENCE is
>         disabled early in the shutdown sequence.
>
>         Signed-off-by: Breno Leitao <leitao@debian.org>
>
>         diff --git a/mm/kfence/core.c b/mm/kfence/core.c
>         index 727c20c94ac5..5810afaaf6b4 100644
>         --- a/mm/kfence/core.c
>         +++ b/mm/kfence/core.c
>         @@ -26,6 +26,7 @@
>         #include <linux/panic_notifier.h>
>         #include <linux/random.h>
>         #include <linux/rcupdate.h>
>         +#include <linux/reboot.h>
>         #include <linux/sched/clock.h>
>         #include <linux/seq_file.h>
>         #include <linux/slab.h>
>         @@ -819,6 +820,21 @@ static struct notifier_block kfence_check_canary_notifier = {
>
>         static struct delayed_work kfence_timer;
>
>         +static int kfence_reboot_callback(struct notifier_block *nb,
>         +                                 unsigned long action, void *data)
>         +{
>         +       /* Disable KFENCE to avoid IPI synchronization during shutdown */
>         +       WRITE_ONCE(kfence_enabled, false);
>         +       /* Cancel any pending timer work */
>         +       cancel_delayed_work_sync(&kfence_timer);
>         +       return NOTIFY_OK;
>         +}
>         +
>         +static struct notifier_block kfence_reboot_notifier = {
>         +       .notifier_call = kfence_reboot_callback,
>         +       .priority = INT_MAX, /* Run early to stop timers ASAP */
>         +};

Just place it under the #ifdef CONFIG_KFENCE_STATIC_KEYS below, I do
not think this is required if CONFIG_KFENCE_STATIC_KEYS is unset.

>         #ifdef CONFIG_KFENCE_STATIC_KEYS
>         /* Wait queue to wake up allocation-gate timer task. */
>         static DECLARE_WAIT_QUEUE_HEAD(allocation_wait);
>         @@ -901,6 +917,8 @@ static void kfence_init_enable(void)
>                 if (kfence_check_on_panic)
>                         atomic_notifier_chain_register(&panic_notifier_list, &kfence_check_canary_notifier);
>
>         +       register_reboot_notifier(&kfence_reboot_notifier);
>         +
>                 WRITE_ONCE(kfence_enabled, true);
>                 queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
>
>
> Alexander, Marco and Kasan maintainers:
>
> What is the potential impact of disabling KFENCE during reboot
> procedures?

But only if CONFIG_KFENCE_STATIC_KEYS is enabled?
That would be reasonable, given our recommendation has been to disable
CONFIG_KFENCE_STATIC_KEYS since
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4f612ed3f748962cbef1316ff3d323e2b9055b6e
in most cases.

I believe some low-CPU count systems are still benefiting from it, but
in general, I'd advise against it.

> The primary motivation is to avoid triggering IPIs during the machine
> teardown process, mainly when the nbconsole is not running in threaded
> mode.

Thanks,
-- Marco



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-26 14:54   ` Marco Elver
@ 2025-11-26 15:54     ` Breno Leitao
  2025-11-26 16:08       ` Marco Elver
  0 siblings, 1 reply; 11+ messages in thread
From: Breno Leitao @ 2025-11-26 15:54 UTC (permalink / raw)
  To: Marco Elver
  Cc: glider, dvyukov, usamaarif642, leo.yan, linux-arm-kernel,
	linux-kernel, kernel-team, rmikey, john.ogness, pmladek, linux,
	paulmck, kasan-dev

Hello Marco,

On Wed, Nov 26, 2025 at 03:54:26PM +0100, Marco Elver wrote:
> On Wed, 26 Nov 2025 at 15:13, Breno Leitao <leitao@debian.org> wrote:
> >         +static int kfence_reboot_callback(struct notifier_block *nb,
> >         +                                 unsigned long action, void *data)
> >         +{
> >         +       /* Disable KFENCE to avoid IPI synchronization during shutdown */
> >         +       WRITE_ONCE(kfence_enabled, false);
> >         +       /* Cancel any pending timer work */
> >         +       cancel_delayed_work_sync(&kfence_timer);
> >         +       return NOTIFY_OK;
> >         +}
> >         +
> >         +static struct notifier_block kfence_reboot_notifier = {
> >         +       .notifier_call = kfence_reboot_callback,
> >         +       .priority = INT_MAX, /* Run early to stop timers ASAP */
> >         +};
> 
> Just place it under the #ifdef CONFIG_KFENCE_STATIC_KEYS below, I do
> not think this is required if CONFIG_KFENCE_STATIC_KEYS is unset.

Ack. This is only needed for CONFIG_KFENCE_STATIC_KEYS, my bad.

> > Alexander, Marco and Kasan maintainers:
> >
> > What is the potential impact of disabling KFENCE during reboot
> > procedures?
> 
> But only if CONFIG_KFENCE_STATIC_KEYS is enabled?
> That would be reasonable, given our recommendation has been to disable
> CONFIG_KFENCE_STATIC_KEYS since
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4f612ed3f748962cbef1316ff3d323e2b9055b6e
> in most cases.
> 
> I believe some low-CPU count systems are still benefiting from it, but
> in general, I'd advise against it.

Thanks for your review and guidance.

Just to confirm my understanding: You’re okay with me adding this
notifier specifically for CONFIG_KFENCE_STATIC_KEYS (which is what
I need), but you would not support adding it for the general case where
!CONFIG_KFENCE_STATIC_KEYS, correct?

Thanks again,
--breno



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-26 15:54     ` Breno Leitao
@ 2025-11-26 16:08       ` Marco Elver
  2025-11-26 16:37         ` Breno Leitao
  0 siblings, 1 reply; 11+ messages in thread
From: Marco Elver @ 2025-11-26 16:08 UTC (permalink / raw)
  To: Breno Leitao
  Cc: glider, dvyukov, usamaarif642, leo.yan, linux-arm-kernel,
	linux-kernel, kernel-team, rmikey, john.ogness, pmladek, linux,
	paulmck, kasan-dev

On Wed, 26 Nov 2025 at 16:54, Breno Leitao <leitao@debian.org> wrote:
[..]
> > > Alexander, Marco and Kasan maintainers:
> > >
> > > What is the potential impact of disabling KFENCE during reboot
> > > procedures?
> >
> > But only if CONFIG_KFENCE_STATIC_KEYS is enabled?
> > That would be reasonable, given our recommendation has been to disable
> > CONFIG_KFENCE_STATIC_KEYS since
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4f612ed3f748962cbef1316ff3d323e2b9055b6e
> > in most cases.
> >
> > I believe some low-CPU count systems are still benefiting from it, but
> > in general, I'd advise against it.
>
> Thanks for your review and guidance.
>
> Just to confirm my understanding: You’re okay with me adding this
> notifier specifically for CONFIG_KFENCE_STATIC_KEYS (which is what
> I need), but you would not support adding it for the general case where
> !CONFIG_KFENCE_STATIC_KEYS, correct?

Yes, correct. If there's a real issue with CONFIG_KFENCE_STATIC_KEYS,
it's worth fixing if there are still valid uses for it. But I wouldn't
pessimize the now default mode, which is !CONFIG_KFENCE_STATIC_KEYS,
as it doesn't appear to have this problem.

Thanks,
-- Marco



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-26 16:08       ` Marco Elver
@ 2025-11-26 16:37         ` Breno Leitao
  0 siblings, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2025-11-26 16:37 UTC (permalink / raw)
  To: Marco Elver
  Cc: glider, dvyukov, usamaarif642, leo.yan, linux-arm-kernel,
	linux-kernel, kernel-team, rmikey, john.ogness, pmladek, linux,
	paulmck, kasan-dev

On Wed, Nov 26, 2025 at 05:08:59PM +0100, Marco Elver wrote:
> On Wed, 26 Nov 2025 at 16:54, Breno Leitao <leitao@debian.org> wrote:
> [..]
> > > > Alexander, Marco and Kasan maintainers:
> > > >
> > > > What is the potential impact of disabling KFENCE during reboot
> > > > procedures?
> > >
> > > But only if CONFIG_KFENCE_STATIC_KEYS is enabled?
> > > That would be reasonable, given our recommendation has been to disable
> > > CONFIG_KFENCE_STATIC_KEYS since
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4f612ed3f748962cbef1316ff3d323e2b9055b6e
> > > in most cases.
> > >
> > > I believe some low-CPU count systems are still benefiting from it, but
> > > in general, I'd advise against it.
> >
> > Thanks for your review and guidance.
> >
> > Just to confirm my understanding: You’re okay with me adding this
> > notifier specifically for CONFIG_KFENCE_STATIC_KEYS (which is what
> > I need), but you would not support adding it for the general case where
> > !CONFIG_KFENCE_STATIC_KEYS, correct?
> 
> Yes, correct. If there's a real issue with CONFIG_KFENCE_STATIC_KEYS,
> it's worth fixing if there are still valid uses for it.

Thanks for clarifying. I'll submit the patch with changes limited to
CONFIG_KFENCE_STATIC_KEYS.

--breno



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-25 16:02 CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64) Breno Leitao
  2025-11-26 14:13 ` Breno Leitao
@ 2025-11-28 16:08 ` Petr Mladek
  2025-12-01 12:58   ` John Ogness
  2025-12-01 17:04   ` Breno Leitao
  1 sibling, 2 replies; 11+ messages in thread
From: Petr Mladek @ 2025-11-28 16:08 UTC (permalink / raw)
  To: Breno Leitao
  Cc: john.ogness, linux, paulmck, usamaarif642, leo.yan,
	linux-arm-kernel, linux-kernel, kernel-team, rmikey

On Tue 2025-11-25 08:02:16, Breno Leitao wrote:
> Hello,
> 
> I am reporting a CSD lockup issue that occurs during kexec on ARM64 hosts,
> which I have traced to the amba-pl011 serial driver waiting for hardware with
> IRQs disabled in the nbcon atomic write path.
> 
> 
> PROBLEM SUMMARY:
> ================
> During kexec, a CSD lockup occurs when pl011_console_write_atomic() performs
> an unbounded busy-wait for hardware synchronization while IRQs are disabled.
> This blocks other CPUs for extended periods (>11 seconds observed), triggering
> CSD lock timeouts.

I do _not_ think that the CPU was waiting in pl011_console_write_atomic() in
the following loop for the entire 11 secs:

	while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
		cpu_relax();

A more likely scenario was that pl011_console_write_atomic() was
called several times during this period because there were more
pending messages.

See below.

> KERNEL VERSION:
> ===============
> Observed on kernel 6.13, but the code path appears similar in upstream.
> 
> 
> ERROR MESSAGE:
> ==============
>   mlx5_core 0000:03:00.0: Shutdown was called
>   kvm: exiting hardware virtualization
>   arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102]
>   smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057)
>   smp:     csd: CSD lock (#1) unresponsive.
>   Sending NMI from CPU 4 to CPUs 0:
>   NMI backtrace for cpu 0
>   pstate: 03401009 (nzcv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
>   pc : pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540)

This seems to be the loop:

	while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
		cpu_relax();

>   lr : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:292 drivers/tty/serial/amba-pl011.c:298 drivers/tty/serial/amba-pl011.c:2539)
>   sp : ffff80010e26fae0
>   pmr: 000000c0
>   x29: ffff80010e26fae0 x28: ffff800082ddb000 x27: 00000000000000e0
>   x26: 0000000000000001 x25: ffff8000826a8de8 x24: 00000000000008eb
>   x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
>   x20: ffff00009c19c880 x19: ffff80010e26fb88 x18: 0000000000000018
>   x17: 696f70646e452065 x16: 4943502032303830 x15: 3130783020737361
>   x14: 6c63203030206570 x13: 746e696f70646e45 x12: 0000000000000000
>   x11: 0000000000000008 x10: 0000000000000000 x9 : ffff800081888d80
>   x8 : 0000000000000018 x7 : 205d313332363336 x6 : 362e31202020205b
>   x5 : ffff000097d4700f x4 : ffff80010e26f99f x3 : ffff800081125220
>   x2 : 0000000000000052 x1 : 000000000000000a x0 : ffff00009c19c880
>   Call trace:
>   pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
>   nbcon_emit_next_record (kernel/printk/nbcon.c:1049)
>   __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517)
>   __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612)
>   nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629)

This code looks like:

static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
					   bool allow_unsafe_takeover)
{
[...]
	/*
	 * Atomic flushing does not use console driver synchronization (i.e.
	 * it does not hold the port lock for uart consoles). Therefore IRQs
	 * must be disabled to avoid being interrupted and then calling into
	 * a driver that will deadlock trying to acquire console ownership.
	 */
	local_irq_save(flags);

	err = __nbcon_atomic_flush_pending_con(con, stop_seq, allow_unsafe_takeover);

	local_irq_restore(flags);
[...]
}

It means that IRQs are disabled until all pending messages are flushed.

>   printk_kthreads_shutdown (kernel/printk/printk.c:?)

But the function seems to be called with IRQs enabled, so it might
help to restore IRQs after each flushed message.

>   syscore_shutdown (drivers/base/syscore.c:120)
>   kernel_kexec (kernel/kexec_core.c:1045)
> 
> NOTES:
> ======
> 
> This is slightly similar to a report I gave a while ago [1] that got
> fixed by Petr's a7df4ed0af77 ("printk: Allow to use the printk kthread
> immediately even for 1st nbcon")
> 
> https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/
> 
> QUESTION
> ========
> 
> 1) Should nbcon wait for hardware synchronization with IRQs disabled?
> 2) Can the hardware synchronization be moved out of the IRQ-disabled path?

This would be complicated because the nbcon console ownership has
to be acquired with IRQs disabled. Otherwise, it might cause a
deadlock because uart_port_lock() has to acquire the nbcon console
as well.

But we could extend the existing commit d5d399efff6577 ("printk/nbcon:
Release nbcon consoles ownership in atomic flush after each emitted
record") and restore IRQs after each emitted record.
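
To make the expected effect concrete, here is a rough userspace model (not
kernel code) comparing the longest IRQs-off window when all records are
flushed under one local_irq_save() with the window when IRQs are restored
after each record. struct irq_model, the model_*() helpers and the per-record
"cost" units are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for how long "IRQs" stay disabled. */
struct irq_model {
	int disabled;             /* 1 while the modeled IRQs are off */
	unsigned long cur_window; /* cost accumulated in the current window */
	unsigned long max_window; /* longest window seen so far */
};

static void model_irq_save(struct irq_model *m)
{
	m->disabled = 1;
	m->cur_window = 0;
}

static void model_irq_restore(struct irq_model *m)
{
	if (m->cur_window > m->max_window)
		m->max_window = m->cur_window;
	m->disabled = 0;
}

/* Emitting a record costs time; it counts only while IRQs are off. */
static void model_emit_record(struct irq_model *m, unsigned long cost)
{
	if (m->disabled)
		m->cur_window += cost;
}

/* One IRQs-off window around the whole flush. */
static unsigned long flush_all_in_one_window(const unsigned long *costs,
					     size_t n)
{
	struct irq_model m = { 0, 0, 0 };

	assert(costs != NULL || n == 0);
	model_irq_save(&m);
	for (size_t i = 0; i < n; i++)
		model_emit_record(&m, costs[i]);
	model_irq_restore(&m);
	return m.max_window;
}

/* IRQs restored after each emitted record. */
static unsigned long flush_restore_per_record(const unsigned long *costs,
					      size_t n)
{
	struct irq_model m = { 0, 0, 0 };

	assert(costs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		model_irq_save(&m);
		model_emit_record(&m, costs[i]);
		model_irq_restore(&m);
	}
	return m.max_window;
}
```

In this model the first variant's window grows with the sum of all record
costs, while the second is bounded by the most expensive single record. A
single slow record would still keep IRQs off for its full duration.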

I wonder if the following patch would help in this scenario.
It is made on top of "for-next" branch in printk/linux.git.
But the most important pre-requisite is the above mentioned commit
in the branch "rework/atomic-flush-hardlockup".

Note that the patch is only compile tested.

From 6173069ae66fbb3b903cbc3798c16d3b8046da08 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 28 Nov 2025 16:16:19 +0100
Subject: [RFC] printk/nbcon: Restore IRQ in atomic flush after each emitted
 record

The commit d5d399efff6577 ("printk/nbcon: Release nbcon consoles ownership
in atomic flush after each emitted record") prevented stall of a CPU
which lost nbcon console ownership because another CPU entered
an emergency flush.

But there is still the problem that the CPU doing the emergency flush
might cause a stall on its own.

Let's go even further and restore IRQ in the atomic flush after
each emitted record.

It is not a complete solution. The interrupts and/or scheduling might
still be blocked when the emergency atomic flush was called with
IRQs and/or scheduling disabled. But it should remove the following
lockup:

  mlx5_core 0000:03:00.0: Shutdown was called
  kvm: exiting hardware virtualization
  arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102]
  smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057)
  smp:     csd: CSD lock (#1) unresponsive.
  [...]
  Call trace:
  pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
  nbcon_emit_next_record (kernel/printk/nbcon.c:1049)
  __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517)
  __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612)
  nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629)
  printk_kthreads_shutdown (kernel/printk/printk.c:?)
  syscore_shutdown (drivers/base/syscore.c:120)
  kernel_kexec (kernel/kexec_core.c:1045)
  __arm64_sys_reboot (kernel/reboot.c:794 kernel/reboot.c:722 kernel/reboot.c:722)
  invoke_syscall (arch/arm64/kernel/syscall.c:50)
  el0_svc_common.llvm.14158405452757855239 (arch/arm64/kernel/syscall.c:?)
  do_el0_svc (arch/arm64/kernel/syscall.c:152)
  el0_svc (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:73 arch/arm64/kernel/entry-common.c:169 arch/arm64/kernel/entry-common.c:182 arch/arm64/kernel/entry-common.c:749)
  el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:820)
  el0t_64_sync (arch/arm64/kernel/entry.S:600)

In this case, nbcon_atomic_flush_pending() is called from
printk_kthreads_shutdown() with IRQs and scheduling enabled.

An ultimate solution would be touching the watchdog. But it would hide
all problems. Let's do it later when anyone reports a stall which does
not have a better solution.

Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/nbcon.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 3fa403f9831f..6b8becb6ecd9 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1549,6 +1549,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
 {
 	struct nbcon_write_context wctxt = { };
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+	unsigned long flags;
 	int err = 0;

 	ctxt->console			= con;
@@ -1557,18 +1558,31 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
 	ctxt->allow_unsafe_takeover	= nbcon_allow_unsafe_takeover();

 	while (nbcon_seq_read(con) < stop_seq) {
-		if (!nbcon_context_try_acquire(ctxt, false))
+		/*
+		 * Atomic flushing does not use console driver synchronization
+		 * (i.e. it does not hold the port lock for uart consoles).
+		 * Therefore IRQs must be disabled to avoid being interrupted
+		 * and then calling into a driver that will deadlock trying
+		 * to acquire console ownership.
+		 */
+		local_irq_save(flags);
+		if (!nbcon_context_try_acquire(ctxt, false)) {
+			local_irq_restore(flags);
 			return -EPERM;
+		}

 		/*
 		 * nbcon_emit_next_record() returns false when the console was
 		 * handed over or taken over. In both cases the context is no
 		 * longer valid.
 		 */
-		if (!nbcon_emit_next_record(&wctxt, true))
+		if (!nbcon_emit_next_record(&wctxt, true)) {
+			local_irq_restore(flags);
 			return -EAGAIN;
+		}

 		nbcon_context_release(ctxt);
+		local_irq_restore(flags);

 		if (!ctxt->backlog) {
 			/* Are there reserved but not yet finalized records? */
@@ -1595,22 +1609,11 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
 static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
 {
 	struct console_flush_type ft;
-	unsigned long flags;
 	int err;

 again:
-	/*
-	 * Atomic flushing does not use console driver synchronization (i.e.
-	 * it does not hold the port lock for uart consoles). Therefore IRQs
-	 * must be disabled to avoid being interrupted and then calling into
-	 * a driver that will deadlock trying to acquire console ownership.
-	 */
-	local_irq_save(flags);
-
 	err = __nbcon_atomic_flush_pending_con(con, stop_seq);

-	local_irq_restore(flags);
-
 	/*
 	 * If there was a new owner (-EPERM, -EAGAIN), that context is
 	 * responsible for completing.
-- 
2.52.0


* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-28 16:08 ` Petr Mladek
@ 2025-12-01 12:58   ` John Ogness
  2025-12-01 13:21     ` John Ogness
  2025-12-01 17:04   ` Breno Leitao
  1 sibling, 1 reply; 11+ messages in thread
From: John Ogness @ 2025-12-01 12:58 UTC (permalink / raw)
  To: Petr Mladek, Breno Leitao
  Cc: linux, paulmck, usamaarif642, leo.yan, linux-arm-kernel,
	linux-kernel, kernel-team, rmikey

On 2025-11-28, Petr Mladek <pmladek@suse.com> wrote:

> On Tue 2025-11-25 08:02:16, Breno Leitao wrote:
>> Hello,
>> 
>> I am reporting a CSD lockup issue that occurs during kexec on ARM64 hosts,
>> which I have traced to the amba-pl011 serial driver waiting for hardware with
>> IRQs disabled in the nbcon atomic write path.
>> 
>> 
>> PROBLEM SUMMARY:
>> ================
>> During kexec, a CSD lockup occurs when pl011_console_write_atomic() performs
>> an unbounded busy-wait for hardware synchronization while IRQs are disabled.
>> This blocks other CPUs for extended periods (>11 seconds observed), triggering
>> CSD lock timeouts.
>
> I do _not_ think that the CPU was waiting in pl011_console_write_atomic() in
> the following cycle the entire 11 secs:
>
> 	while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
> 		cpu_relax();
>
> A more likely scenario was that pl011_console_write_atomic() was
> called several times during this period because there were more
> pending messages.
>
> See below.
>
>> KERNEL VERSION:
>> ===============
>> Observed on kernel 6.13, but the code path appears similar in upstream.
>> 
>> 
>> ERROR MESSAGE:
>> ==============
>>   mlx5_core 0000:03:00.0: Shutdown was called
>>   kvm: exiting hardware virtualization
>>   arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102]
>>   smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057)
>>   smp:     csd: CSD lock (#1) unresponsive.
>>   Sending NMI from CPU 4 to CPUs 0:
>>   NMI backtrace for cpu 0
>>   pstate: 03401009 (nzcv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--)
>>   pc : pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540)
>
> This seems to be the cycle:
>
> 	while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
> 		cpu_relax();
>
>>   lr : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:292 drivers/tty/serial/amba-pl011.c:298 drivers/tty/serial/amba-pl011.c:2539)
>>   sp : ffff80010e26fae0
>>   pmr: 000000c0
>>   x29: ffff80010e26fae0 x28: ffff800082ddb000 x27: 00000000000000e0
>>   x26: 0000000000000001 x25: ffff8000826a8de8 x24: 00000000000008eb
>>   x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
>>   x20: ffff00009c19c880 x19: ffff80010e26fb88 x18: 0000000000000018
>>   x17: 696f70646e452065 x16: 4943502032303830 x15: 3130783020737361
>>   x14: 6c63203030206570 x13: 746e696f70646e45 x12: 0000000000000000
>>   x11: 0000000000000008 x10: 0000000000000000 x9 : ffff800081888d80
>>   x8 : 0000000000000018 x7 : 205d313332363336 x6 : 362e31202020205b
>>   x5 : ffff000097d4700f x4 : ffff80010e26f99f x3 : ffff800081125220
>>   x2 : 0000000000000052 x1 : 000000000000000a x0 : ffff00009c19c880
>>   Call trace:
>>   pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
>>   nbcon_emit_next_record (kernel/printk/nbcon.c:1049)
>>   __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517)
>>   __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612)
>>   nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629)
>
> This code looks like:
>
> static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
> 					   bool allow_unsafe_takeover)
> {
> [...]
> 	/*
> 	 * Atomic flushing does not use console driver synchronization (i.e.
> 	 * it does not hold the port lock for uart consoles). Therefore IRQs
> 	 * must be disabled to avoid being interrupted and then calling into
> 	 * a driver that will deadlock trying to acquire console ownership.
> 	 */
> 	local_irq_save(flags);
>
> 	err = __nbcon_atomic_flush_pending_con(con, stop_seq, allow_unsafe_takeover);
>
> 	local_irq_restore(flags);
> [...]
> }
>
> It means that IRQs are disabled until all pending messages are flushed.
>
>>   printk_kthreads_shutdown (kernel/printk/printk.c:?)
>
> But the function seems to be called with IRQs enabled, so it might
> help to restore IRQs after each flushed message.
>
>>   syscore_shutdown (drivers/base/syscore.c:120)
>>   kernel_kexec (kernel/kexec_core.c:1045)
>> 
>> NOTES:
>> ======
>> 
>> This is slightly similar to a report I gave a while ago [1] that got
>> fixed by Petr's a7df4ed0af77 ("printk: Allow to use the printk kthread
>> immediately even for 1st nbcon")
>> 
>> [1] https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/
>> 
>> QUESTION
>> ========
>> 
>> 1) Should nbcon wait for hardware synchronization with IRQs disabled?
>> 2) Can the hardware synchronization be moved out of the IRQ-disabled path?
>
> This would be complicated because the nbcon console ownership has
> to be acquired with IRQs disabled. Otherwise, it might cause a
> deadlock because uart_port_lock() has to acquire the nbcon console
> as well.
>
> But we could extend the existing commit d5d399efff6577 ("printk/nbcon:
> Release nbcon consoles ownership in atomic flush after each emitted
> record") and restore IRQs after each emitted record.
>
> I wonder if the following patch would help in this scenario.
> It is made on top of "for-next" branch in printk/linux.git.
> But the most important pre-requisite is the above mentioned commit
> in the branch "rework/atomic-flush-hardlockup".
>
> Note that the patch is only compile tested.
>
> From 6173069ae66fbb3b903cbc3798c16d3b8046da08 Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Fri, 28 Nov 2025 16:16:19 +0100
> Subject: [RFC] printk/nbcon: Restore IRQ in atomic flush after each emitted
>  record
>
> The commit d5d399efff6577 ("printk/nbcon: Release nbcon consoles ownership
> in atomic flush after each emitted record") prevented stall of a CPU
> which lost nbcon console ownership because another CPU entered
> an emergency flush.
>
> But there is still the problem that the CPU doing the emergency flush
> might cause a stall on its own.
>
> Let's go even further and restore IRQ in the atomic flush after
> each emitted record.
>
> It is not a complete solution. The interrupts and/or scheduling might
> still be blocked when the emergency atomic flush was called with
> IRQs and/or scheduling disabled. But it should remove the following
> lockup:
>
>   mlx5_core 0000:03:00.0: Shutdown was called
>   kvm: exiting hardware virtualization
>   arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102]
>   smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057)
>   smp:     csd: CSD lock (#1) unresponsive.
>   [...]
>   Call trace:
>   pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
>   nbcon_emit_next_record (kernel/printk/nbcon.c:1049)
>   __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517)
>   __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612)
>   nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629)
>   printk_kthreads_shutdown (kernel/printk/printk.c:?)
>   syscore_shutdown (drivers/base/syscore.c:120)
>   kernel_kexec (kernel/kexec_core.c:1045)
>   __arm64_sys_reboot (kernel/reboot.c:794 kernel/reboot.c:722 kernel/reboot.c:722)
>   invoke_syscall (arch/arm64/kernel/syscall.c:50)
>   el0_svc_common.llvm.14158405452757855239 (arch/arm64/kernel/syscall.c:?)
>   do_el0_svc (arch/arm64/kernel/syscall.c:152)
>   el0_svc (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:73 arch/arm64/kernel/entry-common.c:169 arch/arm64/kernel/entry-common.c:182 arch/arm64/kernel/entry-common.c:749)
>   el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:820)
>   el0t_64_sync (arch/arm64/kernel/entry.S:600)
>
> In this case, nbcon_atomic_flush_pending() is called from
> printk_kthreads_shutdown() with IRQs and scheduling enabled.
>
> An ultimate solution would be touching the watchdog. But it would hide
> all problems. Let's do it later when anyone reports a stall which does
> not have a better solution.
>
> Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  kernel/printk/nbcon.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 3fa403f9831f..6b8becb6ecd9 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1549,6 +1549,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
>  {
>  	struct nbcon_write_context wctxt = { };
>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +	unsigned long flags;
>  	int err = 0;
>  
>  	ctxt->console			= con;
> @@ -1557,18 +1558,31 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
>  	ctxt->allow_unsafe_takeover	= nbcon_allow_unsafe_takeover();
>  
>  	while (nbcon_seq_read(con) < stop_seq) {
> -		if (!nbcon_context_try_acquire(ctxt, false))
> +		/*
> +		 * Atomic flushing does not use console driver synchronization
> +		 * (i.e. it does not hold the port lock for uart consoles).
> +		 * Therefore IRQs must be disabled to avoid being interrupted
> +		 * and then calling into a driver that will deadlock trying
> +		 * to acquire console ownership.
> +		 */
> +		local_irq_save(flags);
> +		if (!nbcon_context_try_acquire(ctxt, false)) {
> +			local_irq_restore(flags);
>  			return -EPERM;
> +		}
>  
>  		/*
>  		 * nbcon_emit_next_record() returns false when the console was
>  		 * handed over or taken over. In both cases the context is no
>  		 * longer valid.
>  		 */
> -		if (!nbcon_emit_next_record(&wctxt, true))
> +		if (!nbcon_emit_next_record(&wctxt, true)) {
> +			local_irq_restore(flags);
>  			return -EAGAIN;
> +		}
>  
>  		nbcon_context_release(ctxt);
> +		local_irq_restore(flags);

Using local_irq_save()/_restore() here is not acceptable for PREEMPT_RT
because __nbcon_atomic_flush_pending_con() is also used by
nbcon_device_release().

Using local_lock_irqsave()/_irqrestore() instead is also not acceptable
because __nbcon_atomic_flush_pending_con() is called by vprintk_emit(),
which can be a context that does not allow sleeping locks.

If we want this kind of a solution, nbcon_device_release() will need an
atomic flushing variant that does not explicitly disable interrupts.

John Ogness



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-12-01 12:58   ` John Ogness
@ 2025-12-01 13:21     ` John Ogness
  2025-12-02 10:34       ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: John Ogness @ 2025-12-01 13:21 UTC (permalink / raw)
  To: Petr Mladek, Breno Leitao
  Cc: linux, paulmck, usamaarif642, leo.yan, linux-arm-kernel,
	linux-kernel, kernel-team, rmikey

On 2025-12-01, John Ogness <john.ogness@linutronix.de> wrote:
>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> index 3fa403f9831f..6b8becb6ecd9 100644
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -1549,6 +1549,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
>>  {
>>  	struct nbcon_write_context wctxt = { };
>>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
>> +	unsigned long flags;
>>  	int err = 0;
>>  
>>  	ctxt->console			= con;
>> @@ -1557,18 +1558,31 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
>>  	ctxt->allow_unsafe_takeover	= nbcon_allow_unsafe_takeover();
>>  
>>  	while (nbcon_seq_read(con) < stop_seq) {
>> -		if (!nbcon_context_try_acquire(ctxt, false))
>> +		/*
>> +		 * Atomic flushing does not use console driver synchronization
>> +		 * (i.e. it does not hold the port lock for uart consoles).
>> +		 * Therefore IRQs must be disabled to avoid being interrupted
>> +		 * and then calling into a driver that will deadlock trying
>> +		 * to acquire console ownership.
>> +		 */
>> +		local_irq_save(flags);
>> +		if (!nbcon_context_try_acquire(ctxt, false)) {
>> +			local_irq_restore(flags);
>>  			return -EPERM;
>> +		}
>>  
>>  		/*
>>  		 * nbcon_emit_next_record() returns false when the console was
>>  		 * handed over or taken over. In both cases the context is no
>>  		 * longer valid.
>>  		 */
>> -		if (!nbcon_emit_next_record(&wctxt, true))
>> +		if (!nbcon_emit_next_record(&wctxt, true)) {
>> +			local_irq_restore(flags);
>>  			return -EAGAIN;
>> +		}
>>  
>>  		nbcon_context_release(ctxt);
>> +		local_irq_restore(flags);
>
> Using local_irq_save()/_restore() here is not acceptable for PREEMPT_RT
> because __nbcon_atomic_flush_pending_con() is also used by
> nbcon_device_release().

After thinking about this more, this would be acceptable. If
printk_get_console_flush_type() is reporting nbcon_atomic==true, then
the system is in a state where latencies are irrelevant.

John Ogness



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-11-28 16:08 ` Petr Mladek
  2025-12-01 12:58   ` John Ogness
@ 2025-12-01 17:04   ` Breno Leitao
  1 sibling, 0 replies; 11+ messages in thread
From: Breno Leitao @ 2025-12-01 17:04 UTC (permalink / raw)
  To: Petr Mladek
  Cc: john.ogness, linux, paulmck, usamaarif642, leo.yan,
	linux-arm-kernel, linux-kernel, kernel-team, rmikey

Hello Petr,

On Fri, Nov 28, 2025 at 05:08:17PM +0100, Petr Mladek wrote:
> On Tue 2025-11-25 08:02:16, Breno Leitao wrote:
>
> I do _not_ think that the CPU was waiting in pl011_console_write_atomic() in
> the following cycle the entire 11 secs:
> 
> 	while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
> 		cpu_relax();
> 
> A more likely scenario was that pl011_console_write_atomic() was
> called several times during this period because there were more
> pending messages.

Probably. Most of the messages come from CPUs being powered off:

	[   44.119433] psci: CPU1 killed (polled 0 ms)
	[   44.146057] psci: CPU2 killed (polled 0 ms)
	[   44.182058] psci: CPU3 killed (polled 0 ms)
	[   44.218031] psci: CPU4 killed (polled 0 ms)
	[   44.252962] psci: CPU5 killed (polled 0 ms)
	[   44.276939] psci: CPU6 killed (polled 0 ms)
	[   44.296152] psci: CPU7 killed (polled 1 ms)
	....

And this only happens on "large" machines, where the host flushes
a lot of messages during kexec teardown.

> >   printk_kthreads_shutdown (kernel/printk/printk.c:?)
> 
> But the function seems to be called with IRQs enabled, so it might
> help to restore IRQs after each flushed message.

Agreed. This would make the irq-disabled sections much smaller, with
a higher chance of servicing IPIs and NMIs (on arm64 hosts without FEAT_NMI).

> But we could extend the existing commit d5d399efff6577 ("printk/nbcon:
> Release nbcon consoles ownership in atomic flush after each emitted
> record") and restore IRQs after each emitted record.
> 
> I wonder if the following patch would help in this scenario.
> It is made on top of "for-next" branch in printk/linux.git.
> But the most important pre-requisite is the above mentioned commit
> in the branch "rework/atomic-flush-hardlockup".
> 
> Note that the patch is only compile tested.

I've tested the patch and I don't see the CSD lockups anymore.
Thanks for the quick fix.

> Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu
> Signed-off-by: Petr Mladek <pmladek@suse.com>

Tested-by: Breno Leitao <leitao@debian.org>

Thanks to everyone involved here. With this last patch (which makes
the irq-disabled section smaller), and kfence not sending IPIs during
kexec, I consider this issue closed.

--breno



* Re: CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64)
  2025-12-01 13:21     ` John Ogness
@ 2025-12-02 10:34       ` Petr Mladek
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Mladek @ 2025-12-02 10:34 UTC (permalink / raw)
  To: John Ogness
  Cc: Breno Leitao, linux, paulmck, usamaarif642, leo.yan,
	linux-arm-kernel, linux-kernel, kernel-team, rmikey

On Mon 2025-12-01 14:27:32, John Ogness wrote:
> On 2025-12-01, John Ogness <john.ogness@linutronix.de> wrote:
> >> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> >> index 3fa403f9831f..6b8becb6ecd9 100644
> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -1549,6 +1549,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> >>  {
> >>  	struct nbcon_write_context wctxt = { };
> >>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> >> +	unsigned long flags;
> >>  	int err = 0;
> >>  
> >>  	ctxt->console			= con;
> >> @@ -1557,18 +1558,31 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> >>  	ctxt->allow_unsafe_takeover	= nbcon_allow_unsafe_takeover();
> >>  
> >>  	while (nbcon_seq_read(con) < stop_seq) {
> >> -		if (!nbcon_context_try_acquire(ctxt, false))
> >> +		/*
> >> +		 * Atomic flushing does not use console driver synchronization
> >> +		 * (i.e. it does not hold the port lock for uart consoles).
> >> +		 * Therefore IRQs must be disabled to avoid being interrupted
> >> +		 * and then calling into a driver that will deadlock trying
> >> +		 * to acquire console ownership.
> >> +		 */
> >> +		local_irq_save(flags);
> >> +		if (!nbcon_context_try_acquire(ctxt, false)) {
> >> +			local_irq_restore(flags);
> >>  			return -EPERM;
> >> +		}
> >>  
> >>  		/*
> >>  		 * nbcon_emit_next_record() returns false when the console was
> >>  		 * handed over or taken over. In both cases the context is no
> >>  		 * longer valid.
> >>  		 */
> >> -		if (!nbcon_emit_next_record(&wctxt, true))
> >> +		if (!nbcon_emit_next_record(&wctxt, true)) {
> >> +			local_irq_restore(flags);
> >>  			return -EAGAIN;
> >> +		}
> >>  
> >>  		nbcon_context_release(ctxt);
> >> +		local_irq_restore(flags);
> >
> > Using local_irq_save()/_restore() here is not acceptable for PREEMPT_RT
> > because __nbcon_atomic_flush_pending_con() is also used by
> > nbcon_device_release().

Great catch! I did not think about this code path.

> After thinking about this more, this would be acceptable. If
> printk_get_console_flush_type() is reporting nbcon_atomic==true, then
> the system is in a state where latencies are irrelevant.

I agree. It might be possible to create a special variant for
the nbcon_device_release() code path, but it is probably not
worth it.

I am going to mention this in the commit message and send
it as a proper patch.

Thanks a lot for the review and feedback.

Best Regards,
Petr



Thread overview: 11+ messages
2025-11-25 16:02 CSD lockup during kexec due to unbounded busy-wait in pl011_console_write_atomic (arm64) Breno Leitao
2025-11-26 14:13 ` Breno Leitao
2025-11-26 14:54   ` Marco Elver
2025-11-26 15:54     ` Breno Leitao
2025-11-26 16:08       ` Marco Elver
2025-11-26 16:37         ` Breno Leitao
2025-11-28 16:08 ` Petr Mladek
2025-12-01 12:58   ` John Ogness
2025-12-01 13:21     ` John Ogness
2025-12-02 10:34       ` Petr Mladek
2025-12-01 17:04   ` Breno Leitao
