linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Breno Leitao <leitao@debian.org>
To: cov@codeaurora.org, rmk+kernel@armlinux.org.uk,
	mark.rutland@arm.com, catalin.marinas@arm.com,
	linux-serial@vger.kernel.org
Cc: rmikey@meta.com, linux-arm-kernel@lists.infradead.org,
	usamaarif642@gmail.com, leo.yan@arm.com,
	linux-kernel@vger.kernel.org, paulmck@kernel.org
Subject: arm64: csdlock at early boot due to slow serial (?)
Date: Wed, 2 Jul 2025 10:10:21 -0700	[thread overview]
Message-ID: <aGVn/SnOvwWewkOW@gmail.com> (raw)

Hello,

I'm observing two unusual behaviors during the boot process on my SBSA
ARM machine, with upstream kernel (6.16-rc4):

1) A 9-second pause during early boot:

	[    0.000000] ACPI: SPCR: console: pl011,mmio32,0xc280000,115200
	[    0.420120] Serial: AMBA PL011 UART driver
	[    0.875263] printk: console [ttyAMA0] enabled
	[    9.848263] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])

2) Occasional CSD lock during early boot:

Intermittently, I encounter a CSD lock. Diagnosing this was challenging, but
after enabling PSEUDO NMI, I was able to capture the stack trace:

	printk: console [ttyAMA0] enabled
	smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#02 do_nothing (kernel/smp.c:1058)
	smp:     csd: CSD lock (#1) unresponsive.
	Sending NMI from CPU 0 to CPUs 2:
	....
	pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
	nbcon_emit_next_record (kernel/printk/nbcon.c:1030)
	__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
	__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
	nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
	vprintk_emit (kernel/printk/printk.c:2429)

On reviewing the amba-pl011.c code, I noticed that each message being flushed
causes the following loop to iterate roughly 20,000 times:

          while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
                  cpu_relax();

Tracing this, I found that flushing early boot messages is taking a significant
amount of time. For example, trace_printk() output from that function shows:

       swapper/0-1       [000] dN...     9.695941: pl011_console_write_atomic: "[    0.928995] printk: console [ttyAMA0] enabled"
											|
       											-> This is trace_printk of wctxt->outbuf

At timestamp 9.69 seconds, the serial console is still flushing messages from
0.92 seconds, indicating that the initial 9-second gap is spent looping in
cpu_relax()—about 20,000 times per message, which is clearly suboptimal.

Further debugging revealed the following sequence with the pl011 registers:

	1) uart_console_write()
	2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
	3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()

Michael has reported a hardware issue where the BUSY bit could get
stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.

If I am having the same hardware issue, I suppose I need to change that logic
to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0
instead of BUSY.

Anyway, any one familar with this weird behaviour?

Thanks
--breno


             reply	other threads:[~2025-07-02 17:10 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-02 17:10 Breno Leitao [this message]
2025-07-02 17:20 ` arm64: csdlock at early boot due to slow serial (?) Leo Yan
2025-07-03 10:02   ` Breno Leitao
2025-07-03 11:45     ` Leo Yan
2025-07-03 15:01       ` Breno Leitao
2025-07-03 16:17         ` Leo Yan
2025-07-03 10:28 ` Mark Rutland
2025-07-03 14:13   ` Breno Leitao
2025-07-03 16:31     ` Mark Rutland
2025-07-08 14:00       ` Breno Leitao
2025-07-09 14:23         ` Breno Leitao
2025-07-10 13:35           ` Leo Yan
2025-07-10 17:30             ` Breno Leitao
2025-07-11  9:50               ` Leo Yan
2025-07-11 10:45                 ` Breno Leitao
2025-07-14 10:52                   ` Leo Yan
2025-08-06 11:44 ` Nirmoy Das
2025-08-06 15:53   ` Breno Leitao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aGVn/SnOvwWewkOW@gmail.com \
    --to=leitao@debian.org \
    --cc=catalin.marinas@arm.com \
    --cc=cov@codeaurora.org \
    --cc=leo.yan@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=paulmck@kernel.org \
    --cc=rmikey@meta.com \
    --cc=rmk+kernel@armlinux.org.uk \
    --cc=usamaarif642@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).