linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* arm64: csdlock at early boot due to slow serial (?)
@ 2025-07-02 17:10 Breno Leitao
  2025-07-02 17:20 ` Leo Yan
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Breno Leitao @ 2025-07-02 17:10 UTC (permalink / raw)
  To: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial
  Cc: rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel,
	paulmck

Hello,

I'm observing two unusual behaviors during the boot process on my SBSA
ARM machine, with upstream kernel (6.16-rc4):

1) A 9-second pause during early boot:

	[    0.000000] ACPI: SPCR: console: pl011,mmio32,0xc280000,115200
	[    0.420120] Serial: AMBA PL011 UART driver
	[    0.875263] printk: console [ttyAMA0] enabled
	[    9.848263] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])

2) Occasional CSD lock during early boot:

Intermittently, I encounter a CSD lock. Diagnosing this was challenging, but
after enabling PSEUDO NMI, I was able to capture the stack trace:

	printk: console [ttyAMA0] enabled
	smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#02 do_nothing (kernel/smp.c:1058)
	smp:     csd: CSD lock (#1) unresponsive.
	Sending NMI from CPU 0 to CPUs 2:
	....
	pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
	nbcon_emit_next_record (kernel/printk/nbcon.c:1030)
	__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
	__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
	nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
	vprintk_emit (kernel/printk/printk.c:2429)

On reviewing the amba-pl011.c code, I noticed that each message being flushed
causes the following loop to iterate roughly 20,000 times:

          while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
                  cpu_relax();

Tracing this, I found that flushing early boot messages is taking a significant
amount of time. For example, trace_printk() output from that function shows:

       swapper/0-1       [000] dN...     9.695941: pl011_console_write_atomic: "[    0.928995] printk: console [ttyAMA0] enabled"
											|
       											-> This is trace_printk of wctxt->outbuf

At timestamp 9.69 seconds, the serial console is still flushing messages from
0.92 seconds, indicating that the initial 9-second gap is spent looping in
cpu_relax()—about 20,000 times per message, which is clearly suboptimal.

Further debugging revealed the following sequence with the pl011 registers:

	1) uart_console_write()
	2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
	3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()

Michael has reported a hardware issue where the BUSY bit could get
stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.

If I am having the same hardware issue, I suppose I need to change that logic
to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0
instead of BUSY.

Anyway, any one familar with this weird behaviour?

Thanks
--breno


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-08-06 15:53 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao
2025-07-02 17:20 ` Leo Yan
2025-07-03 10:02   ` Breno Leitao
2025-07-03 11:45     ` Leo Yan
2025-07-03 15:01       ` Breno Leitao
2025-07-03 16:17         ` Leo Yan
2025-07-03 10:28 ` Mark Rutland
2025-07-03 14:13   ` Breno Leitao
2025-07-03 16:31     ` Mark Rutland
2025-07-08 14:00       ` Breno Leitao
2025-07-09 14:23         ` Breno Leitao
2025-07-10 13:35           ` Leo Yan
2025-07-10 17:30             ` Breno Leitao
2025-07-11  9:50               ` Leo Yan
2025-07-11 10:45                 ` Breno Leitao
2025-07-14 10:52                   ` Leo Yan
2025-08-06 11:44 ` Nirmoy Das
2025-08-06 15:53   ` Breno Leitao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).