* arm64: csdlock at early boot due to slow serial (?)
@ 2025-07-02 17:10 Breno Leitao
2025-07-02 17:20 ` Leo Yan
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Breno Leitao @ 2025-07-02 17:10 UTC (permalink / raw)
To: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial
Cc: rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel,
paulmck
Hello,
I'm observing two unusual behaviors during the boot process on my SBSA
ARM machine, with upstream kernel (6.16-rc4):
1) A 9-second pause during early boot:
[ 0.000000] ACPI: SPCR: console: pl011,mmio32,0xc280000,115200
[ 0.420120] Serial: AMBA PL011 UART driver
[ 0.875263] printk: console [ttyAMA0] enabled
[ 9.848263] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])
2) Occasional CSD lock during early boot:
Intermittently, I encounter a CSD lock. Diagnosing this was challenging, but
after enabling PSEUDO NMI, I was able to capture the stack trace:
printk: console [ttyAMA0] enabled
smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#02 do_nothing (kernel/smp.c:1058)
smp: csd: CSD lock (#1) unresponsive.
Sending NMI from CPU 0 to CPUs 2:
....
pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
nbcon_emit_next_record (kernel/printk/nbcon.c:1030)
__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
vprintk_emit (kernel/printk/printk.c:2429)
On reviewing the amba-pl011.c code, I noticed that each message being flushed
causes the following loop to iterate roughly 20,000 times:
while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
cpu_relax();
Tracing this, I found that flushing early boot messages is taking a significant
amount of time. For example, trace_printk() output from that function shows:
swapper/0-1 [000] dN... 9.695941: pl011_console_write_atomic: "[ 0.928995] printk: console [ttyAMA0] enabled"
|
-> This is trace_printk of wctxt->outbuf
At timestamp 9.69 seconds, the serial console is still flushing messages from
0.92 seconds, indicating that the initial 9-second gap is spent looping in
cpu_relax()—about 20,000 times per message, which is clearly suboptimal.
Further debugging revealed the following sequence with the pl011 registers:
1) uart_console_write()
2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()
Michael has reported a hardware issue where the BUSY bit could get
stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.
If I am having the same hardware issue, I suppose I need to change that logic
to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0
instead of BUSY.
Anyway, any one familar with this weird behaviour?
Thanks
--breno
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao
@ 2025-07-02 17:20 ` Leo Yan
2025-07-03 10:02 ` Breno Leitao
2025-07-03 10:28 ` Mark Rutland
2025-08-06 11:44 ` Nirmoy Das
2 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2025-07-02 17:20 UTC (permalink / raw)
To: Breno Leitao
Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial,
rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck
Hi Breno,
On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote:
[...]
> Further debugging revealed the following sequence with the pl011 registers:
>
> 1) uart_console_write()
> 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
> 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()
>
> Michael has reported a hardware issue where the BUSY bit could get
> stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
> bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.
>
> If I am having the same hardware issue, I suppose I need to change that logic
> to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0
> instead of BUSY.
>
> Anyway, any one familar with this weird behaviour?
To be clear, I am not familiar with pl011 driver.
For the first step, could you confirm the UART port is only used by
Linux kernel?
In some cases, if normal world and secure world share the same UART
port, it can cause the UART state machine malfunction and long wait.
Thanks,
Leo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-02 17:20 ` Leo Yan
@ 2025-07-03 10:02 ` Breno Leitao
2025-07-03 11:45 ` Leo Yan
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2025-07-03 10:02 UTC (permalink / raw)
To: Leo Yan
Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial,
rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck
On Wed, Jul 02, 2025 at 06:20:36PM +0100, Leo Yan wrote:
> Hi Breno,
>
> On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote:
>
> [...]
>
> > Further debugging revealed the following sequence with the pl011 registers:
> >
> > 1) uart_console_write()
> > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
> > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()
> >
> > Michael has reported a hardware issue where the BUSY bit could get
> > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
> > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.
> >
> > If I am having the same hardware issue, I suppose I need to change that logic
> > to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0
> > instead of BUSY.
> >
> > Anyway, any one familar with this weird behaviour?
>
> To be clear, I am not familiar with pl011 driver.
>
> For the first step, could you confirm the UART port is only used by
> Linux kernel?
>
> In some cases, if normal world and secure world share the same UART
> port, it can cause the UART state machine malfunction and long wait.
I don't know how to check it for sure, but, looking at the serial
console output, I don't see anything else using the UART. The only
output I see on the console at that time is coming from linux kernel.
Would you recommend any additional check?
Thanks for the reply,
--breno
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao
2025-07-02 17:20 ` Leo Yan
@ 2025-07-03 10:28 ` Mark Rutland
2025-07-03 14:13 ` Breno Leitao
2025-08-06 11:44 ` Nirmoy Das
2 siblings, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2025-07-03 10:28 UTC (permalink / raw)
To: Breno Leitao
Cc: cov, rmk+kernel, catalin.marinas, linux-serial, rmikey,
linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck
Hi Breno,
On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote:
> Hello,
>
> I'm observing two unusual behaviors during the boot process on my SBSA
> ARM machine, with upstream kernel (6.16-rc4):
Can you say which SoC in particular that is? Knowing that would help to
identify whether there's some known erratum, clocking issue, etc.
Likewise that might imply more folk to add to Cc.
[...]
> At timestamp 9.69 seconds, the serial console is still flushing messages from
> 0.92 seconds, indicating that the initial 9-second gap is spent looping in
> cpu_relax()—about 20,000 times per message, which is clearly suboptimal.
>
> Further debugging revealed the following sequence with the pl011 registers:
>
> 1) uart_console_write()
> 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
> 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()
>
> Michael has reported a hardware issue where the BUSY bit could get
> stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
> bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.
Looking at the commit message, that was an issue with the a "custom
(non-PrimeCell) implementation of the SBSA UART" present on QDF400. I
assume that was soemthing that Qualcomm Datacenter Technologies designed
themselves.
It's possible that your SoC has a similar issue with whatever IP block
is being used as the UART, but the issue in that commit certainly
doesn't apply to most PL011 / SBSA-UART implementations.
> If I am having the same hardware issue, I suppose I need to change that logic
> to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0
> instead of BUSY.
If you have the same issue, then applying the same workaround makes
sense. As above, it's not clear that this is necessarily the same issue.
Any more detail that you can share regarding the SoC would be very
helpful.
Mark.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-03 10:02 ` Breno Leitao
@ 2025-07-03 11:45 ` Leo Yan
2025-07-03 15:01 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2025-07-03 11:45 UTC (permalink / raw)
To: Breno Leitao
Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial,
rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck
On Thu, Jul 03, 2025 at 03:02:39AM -0700, Breno Leitao wrote:
[...]
> > In some cases, if normal world and secure world share the same UART
> > port, it can cause the UART state machine malfunction and long wait.
>
> I don't know how to check it for sure, but, looking at the serial
> console output, I don't see anything else using the UART. The only
> output I see on the console at that time is coming from linux kernel.
>
> Would you recommend any additional check?
I have no experience in the driver, I should avoid any noise. But two
things in my head for quick trying.
- First, you could try earlycon mode, e.g., add option in kernel
command line:
earlycon=pl011,mmio32,0xc280000
earlycon=pl011,0xc280000
This would be possible to give more printing logs. If earlycon works
well, then the issue might be caused by later init code (clock, or
UART driver itself).
- Try another UART port if this is possible.
Just curious, you mentioned you are working on the mainline kernel. If
you can confirm a workable kernel version, then this info can help to
rule out hardware issue and we can review the delta changes.
Thanks,
Leo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-03 10:28 ` Mark Rutland
@ 2025-07-03 14:13 ` Breno Leitao
2025-07-03 16:31 ` Mark Rutland
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2025-07-03 14:13 UTC (permalink / raw)
To: Mark Rutland
Cc: cov, rmk+kernel, catalin.marinas, linux-serial, rmikey,
linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck
On Thu, Jul 03, 2025 at 11:28:50AM +0100, Mark Rutland wrote:
> On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote:
> > I'm observing two unusual behaviors during the boot process on my SBSA
> > ARM machine, with upstream kernel (6.16-rc4):
>
> Can you say which SoC in particular that is? Knowing that would help to
> identify whether there's some known erratum, clocking issue, etc.
This is custom made rack mounted machine based on Grace CPU. Here are
some info about the hardware:
# lscpu:
Vendor ID: ARM
Model name: Neoverse-V2
Model: 0
Thread(s) per core: 1
Core(s) per socket: 72
Socket(s): 1
Stepping: r0p0
# /proc/cpuinfo
processor : 71
BogoMIPS : 2000.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd4f
CPU revision : 0
# lshw
description: Rack Mount Chassis
product: <Internal name>
vendor: Quanta
version: <Internal name>
width: 64 bits
capabilities: smbios-3.6.0 dmi-3.6.0 smp sve_default_vector_length tagged_addr_disabled
configuration: boot=normal chassis=rackmount family=Default string sku=Default string uuid=...
How do I find the SoC exactly?
> Likewise that might imply more folk to add to Cc.
>
> [...]
>
> > At timestamp 9.69 seconds, the serial console is still flushing messages from
> > 0.92 seconds, indicating that the initial 9-second gap is spent looping in
> > cpu_relax()-about 20,000 times per message, which is clearly suboptimal.
> >
> > Further debugging revealed the following sequence with the pl011 registers:
> >
> > 1) uart_console_write()
> > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
> > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()
> >
> > Michael has reported a hardware issue where the BUSY bit could get
> > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
> > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.
>
> Looking at the commit message, that was an issue with the a "custom
> (non-PrimeCell) implementation of the SBSA UART" present on QDF400. I
> assume that was soemthing that Qualcomm Datacenter Technologies designed
> themselves.
>
> It's possible that your SoC has a similar issue with whatever IP block
> is being used as the UART, but the issue in that commit certainly
> doesn't apply to most PL011 / SBSA-UART implementations.
That makes total sense. Decoding SPCR I see the following:
# iasl -d spcr.dat
Intel ACPI Component Architecture
ASL+ Optimizing Compiler/Disassembler version 20210604
Copyright (c) 2000 - 2021 Intel Corporation
File appears to be binary: found 56 non-ASCII characters, disassembling
Binary file appears to be a valid ACPI table, disassembling
Input file spcr.dat, Length 0x50 (80) bytes
ACPI: SPCR 0x0000000000000000 000050 (v02 NVIDIA A M I 00000001 ARMH 00010000)
Acpi Data Table [SPCR] decoded
Formatted output: spcr.dsl - 2624 bytes
Thanks,
--breno
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-03 11:45 ` Leo Yan
@ 2025-07-03 15:01 ` Breno Leitao
2025-07-03 16:17 ` Leo Yan
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2025-07-03 15:01 UTC (permalink / raw)
To: Leo Yan
Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial,
rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck
Hello Leo,
On Thu, Jul 03, 2025 at 12:45:57PM +0100, Leo Yan wrote:
> On Thu, Jul 03, 2025 at 03:02:39AM -0700, Breno Leitao wrote:
> > > In some cases, if normal world and secure world share the same UART
> > > port, it can cause the UART state machine malfunction and long wait.
> >
> > I don't know how to check it for sure, but, looking at the serial
> > console output, I don't see anything else using the UART. The only
> > output I see on the console at that time is coming from linux kernel.
> >
> > Would you recommend any additional check?
>
> I have no experience in the driver, I should avoid any noise. But two
> things in my head for quick trying.
>
> - First, you could try earlycon mode, e.g., add option in kernel
> command line:
>
> earlycon=pl011,mmio32,0xc280000
> earlycon=pl011,0xc280000
>
> This would be possible to give more printing logs. If earlycon works
> well, then the issue might be caused by later init code (clock, or
> UART driver itself).
That is an interesting experiment, thanks for the heads-up.
I got `earlycon=pl011,mmio32,0xc280000` working, the machine boots fine,
and I see different behaviours.
Booting up the system the following lines (as examples):
[ 0.000000] earlycon: pl11 at MMIO32 0x000000000c280000 (options '115200')
[ 0.000000] printk: legacy bootconsole [pl11] enabled
[ 0.493205] smp: Brought up 1 node, 72 CPUs
[ 4.232864] SMP: Total of 72 processors activated.
[ 4.242788] CPU: All CPU(s) started at EL2
[ 6.318137] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
[ 6.978245] netpoll: netconsole: local port 6666
[ 7.234429] Run /init as init process
Also, I have a trace_printk() in pl011_console_write_atomic, and it only
start later in the code. More precisely at:
swapper/0-1 [000] dN.1. 6.334581: pl011_console_write_atomic: Printing 49 [ 6.334573] printk: console [ttyAMA0] enabled
I understand that the early boot messages got sent by some other
function? (Or maybe the tracing framework was not initialized?)
Anyway, pl011_console_write_atomic() continues to call ~20K times cpu_relax()
waiting for the BUSY bit to disappear, as before.
For comparison, these are the same line as above, when earlycon is not
set:
[ 0.154009] smp: Brought up 1 node, 72 CPUs
[ 0.154012] SMP: Total of 72 processors activated.
[ 0.154013] CPU: All CPU(s) started at EL2
[ 0.466949] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
[ 0.466967] printk: console [ttyAMA0] enabled
[ 0.466949] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
[ 5.513307] smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#05 do_nothing+0x0/0x10(0x0).
[ 10.719305] netpoll: netconsole: local port 6666
[ 10.961798] Run /init as init process
From what I see, using earlycon boots the kernel faster. Here are some
ideas I got:
1) Earlycon enables the message to be printed earlier, thus, it smooth
out the serial console
2) without earlycon, the messages are printed at once, with interrupt
disabled
2.a) This causes CSD locks
2.b) CSD locks caused an additional 3 seconds delay
3) pl011_console_write_atomic is slow on both cases
Thanks for the good suggestion,
--breno
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-03 15:01 ` Breno Leitao
@ 2025-07-03 16:17 ` Leo Yan
0 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-03 16:17 UTC (permalink / raw)
To: Breno Leitao
Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial,
rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck
Hi Breno,
On Thu, Jul 03, 2025 at 04:01:54PM +0100, Breno Leitao wrote:
[...]
> That is an interesting experiment, thanks for the heads-up.
>
> I got `earlycon=pl011,mmio32,0xc280000` working, the machine boots fine,
> and I see different behaviours.
>
> Booting up the system the following lines (as examples):
>
> [ 0.000000] earlycon: pl11 at MMIO32 0x000000000c280000 (options '115200')
> [ 0.000000] printk: legacy bootconsole [pl11] enabled
> [ 0.493205] smp: Brought up 1 node, 72 CPUs
> [ 4.232864] SMP: Total of 72 processors activated.
> [ 4.242788] CPU: All CPU(s) started at EL2
> [ 6.318137] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
> [ 6.978245] netpoll: netconsole: local port 6666
> [ 7.234429] Run /init as init process
>
> Also, I have a trace_printk() in pl011_console_write_atomic, and it only
> start later in the code. More precisely at:
>
> swapper/0-1 [000] dN.1. 6.334581: pl011_console_write_atomic: Printing 49 [ 6.334573] printk: console [ttyAMA0] enabled
>
> I understand that the early boot messages got sent by some other
> function? (Or maybe the tracing framework was not initialized?)
>
> Anyway, pl011_console_write_atomic() continues to call ~20K times cpu_relax()
> waiting for the BUSY bit to disappear, as before.
If earlycon works correctly, it means the UART clock has been prepared
before the kernel boots.
I noticed that pl011_console_write_atomic() enables and disables the
uap->clk clock dynamically. The clock is a SoC implementation. If the
UART clock is already properly enabled for earlycon, we can assume it
doesn't need to be managed again in the UART driver.
So, perhaps it's worth an experiment by commenting out the clock
operations in the UART driver?
Thanks,
Leo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-03 14:13 ` Breno Leitao
@ 2025-07-03 16:31 ` Mark Rutland
2025-07-08 14:00 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2025-07-03 16:31 UTC (permalink / raw)
To: Breno Leitao
Cc: cov, rmk+kernel, catalin.marinas, linux-serial, rmikey,
linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck,
Ankit Agrawal, Besar Wicaksono
On Thu, Jul 03, 2025 at 03:13:26PM +0100, Breno Leitao wrote:
> On Thu, Jul 03, 2025 at 11:28:50AM +0100, Mark Rutland wrote:
> > On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote:
> > > I'm observing two unusual behaviors during the boot process on my SBSA
> > > ARM machine, with upstream kernel (6.16-rc4):
> >
> > Can you say which SoC in particular that is? Knowing that would help to
> > identify whether there's some known erratum, clocking issue, etc.
>
> This is custom made rack mounted machine based on Grace CPU. Here are
> some info about the hardware:
>
> # lscpu:
> Vendor ID: ARM
> Model name: Neoverse-V2
> Model: 0
> Thread(s) per core: 1
> Core(s) per socket: 72
> Socket(s): 1
> Stepping: r0p0
>
> # /proc/cpuinfo
> processor : 71
> BogoMIPS : 2000.00
> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
> CPU implementer : 0x41
> CPU architecture: 8
> CPU variant : 0x0
> CPU part : 0xd4f
> CPU revision : 0
>
> # lshw
> description: Rack Mount Chassis
> product: <Internal name>
> vendor: Quanta
> version: <Internal name>
> width: 64 bits
> capabilities: smbios-3.6.0 dmi-3.6.0 smp sve_default_vector_length tagged_addr_disabled
> configuration: boot=normal chassis=rackmount family=Default string sku=Default string uuid=...
>
> How do I find the SoC exactly?
From what you've told me above, the SoC is Nvidia Grace; what they call
the CPU is the whole SoC.
> > Likewise that might imply more folk to add to Cc.
I've added Ankit and Besar, since they've both worked on some system
level bits on Grace, and might have an idea.
Ankit, Besar, are you aware of any UART issues on Grace (as described in
Breno's messages below), or do you know of anyone who might have an
idea?
Thanks,
Mark.
> > [...]
> >
> > > At timestamp 9.69 seconds, the serial console is still flushing messages from
> > > 0.92 seconds, indicating that the initial 9-second gap is spent looping in
> > > cpu_relax()-about 20,000 times per message, which is clearly suboptimal.
> > >
> > > Further debugging revealed the following sequence with the pl011 registers:
> > >
> > > 1) uart_console_write()
> > > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax())
> > > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax()
> > >
> > > Michael has reported a hardware issue where the BUSY bit could get
> > > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY
> > > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long.
> >
> > Looking at the commit message, that was an issue with the a "custom
> > (non-PrimeCell) implementation of the SBSA UART" present on QDF400. I
> > assume that was soemthing that Qualcomm Datacenter Technologies designed
> > themselves.
> >
> > It's possible that your SoC has a similar issue with whatever IP block
> > is being used as the UART, but the issue in that commit certainly
> > doesn't apply to most PL011 / SBSA-UART implementations.
>
> That makes total sense. Decoding SPCR I see the following:
>
> # iasl -d spcr.dat
> Intel ACPI Component Architecture
> ASL+ Optimizing Compiler/Disassembler version 20210604
> Copyright (c) 2000 - 2021 Intel Corporation
>
> File appears to be binary: found 56 non-ASCII characters, disassembling
> Binary file appears to be a valid ACPI table, disassembling
> Input file spcr.dat, Length 0x50 (80) bytes
> ACPI: SPCR 0x0000000000000000 000050 (v02 NVIDIA A M I 00000001 ARMH 00010000)
> Acpi Data Table [SPCR] decoded
> Formatted output: spcr.dsl - 2624 bytes
>
> Thanks,
> --breno
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-03 16:31 ` Mark Rutland
@ 2025-07-08 14:00 ` Breno Leitao
2025-07-09 14:23 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2025-07-08 14:00 UTC (permalink / raw)
To: Mark Rutland, ankita, bwicaksono
Cc: rmk+kernel, catalin.marinas, linux-serial, rmikey,
linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck,
Ankit Agrawal, Besar Wicaksono
On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote:
> > How do I find the SoC exactly?
>
> >From what you've told me above, the SoC is Nvidia Grace; what they call
> the CPU is the whole SoC.
>
> > > Likewise that might imply more folk to add to Cc.
>
> I've added Ankit and Besar, since they've both worked on some system
> level bits on Grace, and might have an idea.
>
> Ankit, Besar, are you aware of any UART issues on Grace (as described in
> Breno's messages below), or do you know of anyone who might have an
> idea?
Here is more information I got about this problem. TL;DR: While the
machine is booting, it is throttled by the UART speed, while having IRQ
disabled.
Here is my line of investigation:
1) when the kernel start to flush the messages, there are ~700 messages
in the buffer that will be flushed using
pl011_console_write_atomic()
[ 0.653673] printk: console [ttyAMA0] enabled
[ 9.239225] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])
a) pl011_console_write_atomic() is called ~700 times. Each message
have around 50 chars.
b) For each char, it needs to wait for it to be sent before proceeding.
3) pl011_console_write_atomic() have two big while()/cpu_relax() blocks.
One per character and one per message.
a) the first one is in pl011_console_putchar(), where it checks if the
FIFO is full before writing a single character. This is done one per
character.
while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF)
cpu_relax();
b) After the line is written, then it hits the second one, in
pl011_console_write_atomic(), where it waits until the controller is
not busy anymore:
while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
cpu_relax();
4) The controller has FIFO enabled, but, it doesn't help much, because
the CPU is waiting for the 700 * 50 chars to be transmitted.
5) At this time we just have one single CPU online (?)
6) This is done with IRQ disabled, which causes a CSD lock to trigger,
given that the CPU has IRQ disabled
a) Following Paul's suggestion, I added
a `lockdep_assert_irqs_enabled()` at the entrance of
pl011_console_write_atomic(), and that is what I see.
hardirqs last enabled at (256267): [<ffff800080364980>] vprintk_store+0x6f8/0x820
hardirqs last disabled at (256268): [<ffff80008036e1c4>] __nbcon_atomic_flush_pending+0x2c4/0x3e8
softirqs last enabled at (255640): [<ffff8000801bb020>] handle_softirqs+0x910/0xe78
softirqs last disabled at (255635): [<ffff800080010938>] __do_softirq+0x18/0x20
b) This is the full stack in how we are calling cpu_relax() with irq
disabled million times:
WARNING: CPU: 0 PID: 1 at drivers/tty/serial/amba-pl011.c:2545 pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545)
Modules linked in:
pstate: 234003c9 (nzCv DAIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545)
lr : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545)
sp : ffff80008980dca0
pmr: 000000f0
x29: ffff80008980dca0 x28: ffff0000aebcc080 x27: dfff800000000000
x26: 1fffe00015d7985a x25: 0000000000000000 x24: 0000000000000000
x23: ffff0000a1800000 x22: ffff800087e6d7f8 x21: 1fffe00015d7985b
x20: ffff80008980dee0 x19: 0000000000000003 x18: 00000000ffffffff
x17: 30313478305b2030 x16: 3030303230303030 x15: 3078302055504320
x14: 6c61636973796870 x13: 0a2a2a2065676173 x12: ffff700011301b8b
x11: 1ffff00011301b8a x10: ffff700011301b8a x9 : ffff800081eafdb4
x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000000
x5 : 1ffff00011301be2 x4 : 0000000000000018 x3 : 0000000000000000
x2 : 1ffff00010fcdb57 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545) (P)
nbcon_emit_next_record (kernel/printk/nbcon.c:1026)
__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
vprintk_emit (kernel/printk/printk.c:2429)
vprintk_default (kernel/printk/printk.c:2466)
vprintk (kernel/printk/printk_safe.c:83)
_printk (kernel/printk/printk.c:2470)
__warn (kernel/panic.c:768)
report_bug (lib/bug.c:197 lib/bug.c:215)
bug_handler (arch/arm64/kernel/traps.c:1002)
call_break_hook (arch/arm64/kernel/debug-monitors.c:319)
brk_handler (arch/arm64/kernel/debug-monitors.c:325)
do_debug_exception (arch/arm64/mm/fault.c:1002)
el1_dbg (arch/arm64/kernel/entry-common.c:514)
el1h_64_sync_handler (arch/arm64/kernel/entry-common.c:567)
el1h_64_sync (arch/arm64/kernel/entry.S:595)
pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2524) (P)
nbcon_emit_next_record (kernel/printk/nbcon.c:1026)
__nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498)
__nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593)
nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610)
vprintk_emit (kernel/printk/printk.c:2429)
vprintk_default (kernel/printk/printk.c:2466)
vprintk (kernel/printk/printk_safe.c:83)
_printk (kernel/printk/printk.c:2470)
register_console (kernel/printk/printk.c:4126 (discriminator 9))
serial_core_register_port (drivers/tty/serial/serial_core.c:2637 drivers/tty/serial/serial_core.c:3157 drivers/tty/serial/serial_core.c:3388)
serial_ctrl_register_port (drivers/tty/serial/serial_ctrl.c:42)
uart_add_one_port (drivers/tty/serial/serial_port.c:144)
pl011_register_port (drivers/tty/serial/amba-pl011.c:2867)
sbsa_uart_probe (drivers/tty/serial/amba-pl011.c:3041)
platform_probe (drivers/base/platform.c:1405)
really_probe (drivers/base/dd.c:579 drivers/base/dd.c:657)
__driver_probe_device (drivers/base/dd.c:799)
driver_probe_device (drivers/base/dd.c:829)
__device_attach_driver (drivers/base/dd.c:958)
bus_for_each_drv (drivers/base/bus.c:462)
__device_attach (drivers/base/dd.c:1031)
device_initial_probe (drivers/base/dd.c:1079)
bus_probe_device (drivers/base/bus.c:537)
device_add (drivers/base/core.c:3699)
platform_device_add (drivers/base/platform.c:716)
platform_device_register_full (drivers/base/platform.c:845)
acpi_create_platform_device (./include/linux/err.h:70 drivers/acpi/acpi_platform.c:178)
acpi_bus_attach (./include/linux/acpi.h:738 drivers/acpi/scan.c:2206 drivers/acpi/scan.c:2198 drivers/acpi/scan.c:2316)
acpi_dev_for_one_check (drivers/acpi/bus.c:1146)
device_for_each_child (drivers/base/core.c:4021)
acpi_dev_for_each_child (drivers/acpi/bus.c:1151)
acpi_bus_attach (drivers/acpi/scan.c:2323)
acpi_dev_for_one_check (drivers/acpi/bus.c:1146)
device_for_each_child (drivers/base/core.c:4021)
acpi_dev_for_each_child (drivers/acpi/bus.c:1151)
acpi_bus_attach (drivers/acpi/scan.c:2323)
acpi_bus_scan (drivers/acpi/scan.c:2533 drivers/acpi/scan.c:2606)
acpi_scan_init (drivers/acpi/scan.c:2739)
acpi_init (drivers/acpi/bus.c:1470)
do_one_initcall (init/main.c:1274)
kernel_init_freeable (init/main.c:1335 init/main.c:1352 init/main.c:1371 init/main.c:1584)
kernel_init (init/main.c:1476)
ret_from_fork (arch/arm64/kernel/entry.S:863)
irq event stamp: 256268
hardirqs last enabled at (256267): vprintk_store (kernel/printk/printk.c:2356 (discriminator 1))
hardirqs last disabled at (256268): __nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1539 kernel/printk/nbcon.c:1593)
softirqs last enabled at (255640): handle_softirqs (./arch/arm64/include/asm/current.h:19 ./arch/arm64/include/asm/preempt.h:13 kernel/softirq.c:426 kernel/softirq.c:607)
softirqs last disabled at (255635): __do_softirq (kernel/softirq.c:614)
---[ end trace 0000000000000000 ]---
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-08 14:00 ` Breno Leitao
@ 2025-07-09 14:23 ` Breno Leitao
2025-07-10 13:35 ` Leo Yan
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2025-07-09 14:23 UTC (permalink / raw)
To: Mark Rutland, ankita, bwicaksono
Cc: rmk+kernel, catalin.marinas, linux-serial, rmikey,
linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck
On Tue, Jul 08, 2025 at 07:00:45AM -0700, Breno Leitao wrote:
> On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote:
>
> Here is more information I got about this problem. TL;DR: While the
> machine is booting, it is throttled by the UART speed, while having IRQ
> disabled.
quick update: I've identified a solution that significantly improves the
situation. I've found that the serial issue was heavily affecting boot
time, which is unleashed now.
After applying the following fix, the boot speed has improved
dramatically. It's the fastest I've seen, and the CSD lockups are gone.
If no concerns raise in the next days, I will send it officially to the
serial maintainers.
Author: Breno Leitao <leitao@debian.org>
Date: Wed Jul 9 05:57:06 2025 -0700
serial: amba-pl011: Fix boot performance by switching to console_initcall()
Replace arch_initcall() with console_initcall() for PL011 driver initialization
to resolve severe boot performance issues.
The current arch_initcall() registration causes the console to initialize
before the printk subsystem is ready, forcing the driver into atomic mode
during early boot. This results in:
- 5-8 second boot delay while ~700 boot messages are processed
- System freeze with IRQs disabled during message output
- Each character transmitted synchronously with cpu_relax() polling
This is what is driving the driver to atomic mode in the early boot:
static inline void printk_get_console_flush_type(struct console_flush_type *ft)
{
....
if (printk_kthreads_running)
ft->nbcon_offload = true;
The atomic path processes each character individually through
pl011_console_putchar(), waiting for UART transmission completion
before proceeding. With only one CPU online during early boot,
this creates a bottleneck where the system spends excessive time
in interrupt-disabled state.
Here is how the code looks like:
1) disable interrupt
2) for each of these 700 messages, call pl011_console_write_atomic()
3) for each character in the message, calls pl011_console_putchar(),
which waits for the character to be transmitted
4) once all the line is transmitted, wait for the UART to be idle
5) re-enable interrupt
Here is the code representation of the above:
pl011_console_write_atomic() {
...
// For each char in the message
pl011_console_putchar() {
while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF)
cpu_relax();
}
while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
cpu_relax();
Using console_initcall() ensures proper initialization order,
allowing the printk subsystem to use threaded output instead
of atomic mode, eliminating the performance bottleneck.
Performance improvement: 16x faster kernel boot time at my GRACE SoC
machine.
- Before: 10.08s to reach init process
- After: 0.62s to reach init process
Here are more timing details, collected from Linus' upstream, where the
only different is this patch:
Linus upstream:
[ 0.616203] printk: legacy console [netcon_ext0] enabled
[ 0.627469] Run /init as init process
[ 0.837477] loop: module loaded
[ 8.354803] Adding 134199360k swap on /swapvol/swapfile.
With this patch:
[ 0.305109] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
[ 10.081742] Run /init as init process
[ 13.288717] loop: module loaded
[ 22.919934] Adding 134199168k swap on /swapvol/swapfile.
Link: https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/ [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 22939841b1de..0cf251365825 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -3116,7 +3116,7 @@ static void __exit pl011_exit(void)
* While this can be a module, if builtin it's most likely the console
* So let's leave module_exit but move module_init to an earlier place
*/
-arch_initcall(pl011_init);
+console_initcall(pl011_init);
module_exit(pl011_exit);
MODULE_AUTHOR("ARM Ltd/Deep Blue Solutions Ltd");
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-09 14:23 ` Breno Leitao
@ 2025-07-10 13:35 ` Leo Yan
2025-07-10 17:30 ` Breno Leitao
0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2025-07-10 13:35 UTC (permalink / raw)
To: Breno Leitao
Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas,
linux-serial, rmikey, linux-arm-kernel, usamaarif642,
linux-kernel, paulmck
Hi Breno,
On Wed, Jul 09, 2025 at 07:23:44AM -0700, Breno Leitao wrote:
> On Tue, Jul 08, 2025 at 07:00:45AM -0700, Breno Leitao wrote:
> > On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote:
> >
> > Here is more information I got about this problem. TL;DR: While the
> > machine is booting, it is throttled by the UART speed, while having IRQ
> > disabled.
>
> quick update: I've identified a solution that significantly improves the
> situation. I've found that the serial issue was heavily affecting boot
> time, which is unleashed now.
>
> After applying the following fix, the boot speed has improved
> dramatically. It's the fastest I've seen, and the CSD lockups are gone.
Thanks for trying to fix the issue.
> If no concerns raise in the next days, I will send it officially to the
> serial maintainers.
I am not an expert on the PL011 driver, however, I do have concerns
after review the change. Please see my comments below.
> Author: Breno Leitao <leitao@debian.org>
> Date: Wed Jul 9 05:57:06 2025 -0700
>
> serial: amba-pl011: Fix boot performance by switching to console_initcall()
>
> Replace arch_initcall() with console_initcall() for PL011 driver initialization
> to resolve severe boot performance issues.
pl011_init() registers as an AMBA device, so the PL011 driver depends
on the AMBA bus initialization. The AMBA bus is initialized with:
postcore_initcall(amba_init);
Therefore, the PL011 driver is initialized with arch_initcall(), which
occurs later than the postcore init.
My understanding is that console_initcall() is invoked much earlier
than other initcalls triggered by do_initcalls(). With your change, I
saw the PL011 driver fails to register on Juno-r2 board, due to AMBA bus
driver is not ready for a console init.
Driver 'uart-pl011' was unable to register with bus_type 'amba'
because the bus was not initialized.
> The current arch_initcall() registration causes the console to initialize
> before the printk subsystem is ready, forcing the driver into atomic mode
> during early boot. This results in:
>
> - 5-8 second boot delay while ~700 boot messages are processed
> - System freeze with IRQs disabled during message output
> - Each character transmitted synchronously with cpu_relax() polling
>
> This is what is driving the driver to atomic mode in the early boot:
>
> static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> {
> ....
> if (printk_kthreads_running)
> ft->nbcon_offload = true;
>
> The atomic path processes each character individually through
> pl011_console_putchar(), waiting for UART transmission completion
> before proceeding. With only one CPU online during early boot,
> this creates a bottleneck where the system spends excessive time
> in interrupt-disabled state.
The atomic path is introduced recently by the commit:
2eb2608618ce ("serial: amba-pl011: Implement nbcon console")
My conclusion is that changing the initcall will not disable the atomic
path, changing to console_initcall() will cause AMBA device init
failure, and as a result, the clock operations will not be invoked.
Thus, I am curious if you have ruled out the issue is caused by the UART
clock (as I mentioned in another reply).
BTW, since the atomic path is enabled in the commit 2eb2608618ce, what
is the result after reverting the commit?
Thanks,
Leo
> Here is how the code looks like:
>
> 1) disable interrupt
> 2) for each of these 700 messages, call pl011_console_write_atomic()
> 3) for each character in the message, calls pl011_console_putchar(),
> which waits for the character to be transmitted
> 4) once all the line is transmitted, wait for the UART to be idle
> 5) re-enable interrupt
>
> Here is the code representation of the above:
>
> pl011_console_write_atomic() {
> ...
> // For each char in the message
> pl011_console_putchar() {
> while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF)
> cpu_relax();
> }
> while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy)
> cpu_relax();
>
> Using console_initcall() ensures proper initialization order,
> allowing the printk subsystem to use threaded output instead
> of atomic mode, eliminating the performance bottleneck.
>
> Performance improvement: 16x faster kernel boot time at my GRACE SoC
> machine.
>
> - Before: 10.08s to reach init process
> - After: 0.62s to reach init process
>
> Here are more timing details, collected from Linus' upstream, where the
> only different is this patch:
>
> Linus upstream:
> [ 0.616203] printk: legacy console [netcon_ext0] enabled
> [ 0.627469] Run /init as init process
> [ 0.837477] loop: module loaded
> [ 8.354803] Adding 134199360k swap on /swapvol/swapfile.
>
> With this patch:
> [ 0.305109] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
> [ 10.081742] Run /init as init process
> [ 13.288717] loop: module loaded
> [ 22.919934] Adding 134199168k swap on /swapvol/swapfile.
>
> Link: https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/ [1]
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
>
> diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
> index 22939841b1de..0cf251365825 100644
> --- a/drivers/tty/serial/amba-pl011.c
> +++ b/drivers/tty/serial/amba-pl011.c
> @@ -3116,7 +3116,7 @@ static void __exit pl011_exit(void)
> * While this can be a module, if builtin it's most likely the console
> * So let's leave module_exit but move module_init to an earlier place
> */
> -arch_initcall(pl011_init);
> +console_initcall(pl011_init);
> module_exit(pl011_exit);
>
> MODULE_AUTHOR("ARM Ltd/Deep Blue Solutions Ltd");
>
>
>
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-10 13:35 ` Leo Yan
@ 2025-07-10 17:30 ` Breno Leitao
2025-07-11 9:50 ` Leo Yan
0 siblings, 1 reply; 17+ messages in thread
From: Breno Leitao @ 2025-07-10 17:30 UTC (permalink / raw)
To: Leo Yan
Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas,
linux-serial, rmikey, linux-arm-kernel, usamaarif642,
linux-kernel, paulmck
Hello Leo,
On Thu, Jul 10, 2025 at 02:35:57PM +0100, Leo Yan wrote:
> > serial: amba-pl011: Fix boot performance by switching to console_initcall()
> >
> > Replace arch_initcall() with console_initcall() for PL011 driver initialization
> > to resolve severe boot performance issues.
>
> pl011_init() registers as an AMBA device, so the PL011 driver depends
> on the AMBA bus initialization. The AMBA bus is initialized with:
>
> postcore_initcall(amba_init);
>
> Therefore, the PL011 driver is initialized with arch_initcall(), which
> occurs later than the postcore init.
>
> My understanding is that console_initcall() is invoked much earlier
> than other initcalls triggered by do_initcalls(). With your change, I
> saw the PL011 driver fails to register on Juno-r2 board, due to AMBA bus
> driver is not ready for a console init.
You are absolutely correct. I confirm that the machine is fast because
pl011 is not working, which is sad and embarassing.
> The atomic path is introduced recently by the commit:
>
> 2eb2608618ce ("serial: amba-pl011: Implement nbcon console")
>
> My conclusion is that changing the initcall will not disable the atomic
> path, changing to console_initcall() will cause AMBA device init
> failure, and as a result, the clock operations will not be invoked.
> Thus, I am curious if you have ruled out the issue is caused by the UART
> clock (as I mentioned in another reply).
>
> BTW, since the atomic path is enabled in the commit 2eb2608618ce, what
> is the result after reverting the commit?
I've reverted commit 2eb2608618ce ("serial: amba-pl011: Implement nbcon
console"), and I don't see the CSD locks anymoer. The serial speed is
the same and continue to be slow, but, the CSD lock is not there. Here
is the time spent on the serial flush when reverting the commit above
[ 0.309561] printk: legacy console [ttyAMA0] enabled
[ 8.657938] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])
Thanks
--breno
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-10 17:30 ` Breno Leitao
@ 2025-07-11 9:50 ` Leo Yan
[not found] ` <cno3lsprrz36gqu27omvwrw27d2ubqtshac3cahfgkhcm2rla7@bpnulcmsyx2u>
0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2025-07-11 9:50 UTC (permalink / raw)
To: Breno Leitao
Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas,
linux-serial, rmikey, linux-arm-kernel, usamaarif642,
linux-kernel, paulmck
Hi Breno,
On Thu, Jul 10, 2025 at 10:30:45AM -0700, Breno Leitao wrote:
[...]
> > The atomic path is introduced recently by the commit:
> >
> > 2eb2608618ce ("serial: amba-pl011: Implement nbcon console")
> >
> > My conclusion is that changing the initcall will not disable the atomic
> > path, changing to console_initcall() will cause AMBA device init
> > failure, and as a result, the clock operations will not be invoked.
> > Thus, I am curious if you have ruled out the issue is caused by the UART
> > clock (as I mentioned in another reply).
> >
> > BTW, since the atomic path is enabled in the commit 2eb2608618ce, what
> > is the result after reverting the commit?
>
> I've reverted commit 2eb2608618ce ("serial: amba-pl011: Implement nbcon
> console"), and I don't see the CSD locks anymoer. The serial speed is
> the same and continue to be slow, but, the CSD lock is not there. Here
> is the time spent on the serial flush when reverting the commit above
>
> [ 0.309561] printk: legacy console [ttyAMA0] enabled
> [ 8.657938] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])
From this result, we can know both the atomic path and the thread path
take a long time polling.
Since both paths configure the UART clock, I'm curious about the
behaviour if the UART clock is untouched. The relevant code is shown
below.
I may seem a bit stubborn in suspecting a clock issue :) But if you
have confirmed that a standard pl011 UART IP is being used, then the
only external factor I am aware of is the clock.
Thanks,
Leo
---8<---
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 71482d639a6d..b04773ba2602 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -64,6 +64,8 @@ static int amba_get_enable_pclk(struct amba_device *pcdev)
{
int ret;
+ return 0;
+
pcdev->pclk = clk_get(&pcdev->dev, "apb_pclk");
if (IS_ERR(pcdev->pclk))
return PTR_ERR(pcdev->pclk);
@@ -77,8 +79,8 @@ static int amba_get_enable_pclk(struct amba_device *pcdev)
static void amba_put_disable_pclk(struct amba_device *pcdev)
{
- clk_disable_unprepare(pcdev->pclk);
- clk_put(pcdev->pclk);
+ //clk_disable_unprepare(pcdev->pclk);
+ //clk_put(pcdev->pclk);
}
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 22939841b1de..9ba24afb6e4f 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1777,7 +1777,7 @@ static int pl011_hwinit(struct uart_port *port)
/*
* Try to enable the clock producer.
*/
- retval = clk_prepare_enable(uap->clk);
+ retval = clk_prepare(uap->clk);
if (retval)
return retval;
@@ -1934,7 +1934,7 @@ static int pl011_startup(struct uart_port *port)
return 0;
clk_dis:
- clk_disable_unprepare(uap->clk);
+ //clk_disable_unprepare(uap->clk);
return retval;
}
@@ -2025,7 +2025,7 @@ static void pl011_shutdown(struct uart_port *port)
/*
* Shut down the clock producer
*/
- clk_disable_unprepare(uap->clk);
+ //clk_disable_unprepare(uap->clk);
/* Optionally let pins go into sleep states */
pinctrl_pm_select_sleep_state(port->dev);
@@ -2524,7 +2524,7 @@ pl011_console_write_atomic(struct console *co, struct nbcon_write_context *wctxt
if (!nbcon_enter_unsafe(wctxt))
return;
- clk_enable(uap->clk);
+ //clk_enable(uap->clk);
if (!uap->vendor->always_enabled) {
old_cr = pl011_read(uap, REG_CR);
@@ -2542,7 +2542,7 @@ pl011_console_write_atomic(struct console *co, struct nbcon_write_context *wctxt
if (!uap->vendor->always_enabled)
pl011_write(old_cr, uap, REG_CR);
- clk_disable(uap->clk);
+ //clk_disable(uap->clk);
nbcon_exit_unsafe(wctxt);
}
@@ -2556,7 +2556,7 @@ pl011_console_write_thread(struct console *co, struct nbcon_write_context *wctxt
if (!nbcon_enter_unsafe(wctxt))
return;
- clk_enable(uap->clk);
+ //clk_enable(uap->clk);
if (!uap->vendor->always_enabled) {
old_cr = pl011_read(uap, REG_CR);
@@ -2586,7 +2586,7 @@ pl011_console_write_thread(struct console *co, struct nbcon_write_context *wctxt
if (!uap->vendor->always_enabled)
pl011_write(old_cr, uap, REG_CR);
- clk_disable(uap->clk);
+ //clk_disable(uap->clk);
nbcon_exit_unsafe(wctxt);
}
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
[not found] ` <cno3lsprrz36gqu27omvwrw27d2ubqtshac3cahfgkhcm2rla7@bpnulcmsyx2u>
@ 2025-07-14 10:52 ` Leo Yan
0 siblings, 0 replies; 17+ messages in thread
From: Leo Yan @ 2025-07-14 10:52 UTC (permalink / raw)
To: Breno Leitao
Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas,
linux-serial, rmikey, linux-arm-kernel, usamaarif642,
linux-kernel, paulmck
On Fri, Jul 11, 2025 at 03:45:03AM -0700, Breno Leitao wrote:
> On Fri, Jul 11, 2025 at 10:50:23AM +0100, Leo Yan wrote:
> > > I've reverted commit 2eb2608618ce ("serial: amba-pl011: Implement nbcon
> > > console"), and I don't see the CSD locks anymoer. The serial speed is
> > > the same and continue to be slow, but, the CSD lock is not there. Here
> > > is the time spent on the serial flush when reverting the commit above
> > >
> > > [ 0.309561] printk: legacy console [ttyAMA0] enabled
> > > [ 8.657938] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff])
> >
> > From this result, we can know both the atomic path and the thread path
> > take a long time polling.
>
> I am wondering if this slowdown is by design, or, if there is a bug
> somewhere.
Polling for over 5 seconds (5,001,000,000 ns) seems too long; it
should not be by design.
> > Since both paths configure the UART clock, I'm curious about the
> > behaviour if the UART clock is untouched. The relevant code is shown
> > below.
>
> Is this the clock frequency that is connected to pl011 coming from AMBA?
The programming clock is for programming registers, my understanding is
the clock is not provided by bus.
> > I may seem a bit stubborn in suspecting a clock issue :) But if you
> > have confirmed that a standard pl011 UART IP is being used.
>
> How do I double check this is a pl011 UART IP or if this is being
> emulated by firmware/ACPI.
The log shows:
ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
This should be emulated by firmware by reading Peripheral Identification
Registers (see [1]).
The most reliable way is to confirm with SoC vendor if the IP is
standard or includes any modification.
[1] https://developer.arm.com/documentation/ddi0183/g/programmers-model/register-descriptions/peripheral-identification-registers--uartperiphid0-3
> > diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> > index 71482d639a6d..b04773ba2602 100644
> > --- a/drivers/amba/bus.c
> > +++ b/drivers/amba/bus.c
> >
> I've tested it, but, it seems it didnt' help much. here is the full log
[...]
> [ 0.307809] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA
> [ 0.307816] printk: console [ttyAMA0] enabled
Based on this log and followed OOPS, the issue happens just after
enabling console. It is a bit tricky that it prints logs during
console registration (and configuration).
Seems to me, it is good to check pl011_console_setup() or
sbsa_uart_set_termios() has any impaction on the long waiting. For
example, the flow below will configure pin control and which might
impact the data transaction:
pl011_console_setup()
`> pinctrl_pm_select_default_state()
Based on the information shared earlier, the UART FIFO appears to be
stalled, which is causing the long wait times.
Apologies if my suggestion was unreliable or misled anything, and
welcome others to freely correct me.
Thanks,
Leo
> [ 5.414515] smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#01 do_nothing+0x0/0x10(0x0).
> [ 5.414532] smp: csd: CSD lock (#1) unresponsive.
> [ 5.414535] Sending NMI from CPU 0 to CPUs 1:
> [ 5.414541] NMI backtrace for cpu 1
> [ 5.414545] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.16.0-rc5-00056-gd7753fd13f9e #86 NONE
> [ 5.414548] Hardware name: Quanta JAVA ISLAND PVT 29F0EMAZ049/Java Island, BIOS F0EJ3A13 06/04/2025
> [ 5.414550] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [ 5.414552] pc : pl011_console_write_atomic+0x94/0x170
> [ 5.414559] lr : pl011_console_write_atomic+0x84/0x170
> [ 5.414562] sp : ffff80008492f1a0
> [ 5.414564] pmr: 000000c0
> [ 5.414565] x29: ffff80008492f1a0 x28: 0000000000000000 x27: 00000000000000e0
> [ 5.414572] x26: 0000000000000190 x25: 0000000000000000 x24: 0000000000000000
> [ 5.414578] x23: 0000000000000001 x22: 0000000000000000 x21: ffff800080c38ea8
> [ 5.414584] x20: ffff80008492f288 x19: ffff0000b2060080 x18: 00000000ffffffff
> [ 5.414590] x17: 6362323230303030 x16: 3030303078303a30 x15: 206e6f6967657220
> [ 5.414595] x14: 3030303064322072 x13: 3030303063623232 x12: 3030303030303030
> [ 5.414601] x11: 78303a30206e6f69 x10: 6765722030303030 x9 : ffff800080c390e4
> [ 5.414607] x8 : 6964657220646e75 x7 : 205d313533383630 x6 : 302e30202020205b
> [ 5.414612] x5 : ffff0000ad10200f x4 : 0000000000000000 x3 : 0000000000000008
> [ 5.414618] x2 : ffff8000816336c0 x1 : 0000000000000018 x0 : 0000000000000018
> [ 5.414624] Call trace:
> [ 5.414626] pl011_console_write_atomic+0x94/0x170 (P)
> [ 5.414630] nbcon_emit_next_record+0x234/0x388
> [ 5.414634] __nbcon_atomic_flush_pending_con+0x88/0x108
> [ 5.414637] __nbcon_atomic_flush_pending+0x108/0x198
> [ 5.414640] nbcon_atomic_flush_pending+0x24/0x38
> [ 5.414643] vprintk_emit+0x200/0x348
> [ 5.414645] vprintk_default+0x3c/0x50
> [ 5.414647] vprintk+0x2c/0x40
> [ 5.414650] _printk+0x50/0x60
> [ 5.414652] register_console+0x424/0x560
> [ 5.414654] serial_core_register_port+0x878/0x898
> [ 5.414659] serial_ctrl_register_port+0x14/0x28
> [ 5.414661] uart_add_one_port+0x14/0x28
> [ 5.414664] pl011_register_port+0x74/0x130
> [ 5.414667] sbsa_uart_probe+0x164/0x1b8
> [ 5.414670] platform_probe+0x8c/0x100
> [ 5.414674] really_probe+0xc4/0x398
> [ 5.414676] __driver_probe_device+0x80/0x1a8
> [ 5.414679] driver_probe_device+0x44/0x120
> [ 5.414681] __device_attach_driver+0xb8/0x158
> [ 5.414683] bus_for_each_drv+0x74/0xc0
> [ 5.414685] __device_attach+0xac/0x1e0
> [ 5.414687] device_initial_probe+0x18/0x28
> [ 5.414690] bus_probe_device+0xa8/0xb8
> [ 5.414692] device_add+0x648/0x830
> [ 5.414697] platform_device_add+0x114/0x280
> [ 5.414700] platform_device_register_full+0x148/0x1b8
> [ 5.414702] acpi_create_platform_device+0x264/0x388
> [ 5.414706] acpi_bus_attach+0x2c8/0x2e8
> [ 5.414709] acpi_dev_for_one_check+0x38/0x48
> [ 5.414711] device_for_each_child+0x54/0xa0
> [ 5.414714] acpi_dev_for_each_child+0x30/0x40
> [ 5.414716] acpi_bus_attach+0x6c/0x2e8
> [ 5.414718] acpi_dev_for_one_check+0x38/0x48
> [ 5.414720] device_for_each_child+0x54/0xa0
> [ 5.414723] acpi_dev_for_each_child+0x30/0x40
> [ 5.414725] acpi_bus_attach+0x6c/0x2e8
> [ 5.414727] acpi_bus_scan+0x5c/0x1b0
> [ 5.414730] acpi_scan_init+0xa0/0x288
> [ 5.414737] acpi_init+0x1e8/0x4b8
> [ 5.414740] do_one_initcall+0x3c/0x2a0
> [ 5.414743] kernel_init_freeable+0x22c/0x460
> [ 5.414746] kernel_init+0x28/0x1f0
> [ 5.414750] ret_from_fork+0x10/0x20
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao
2025-07-02 17:20 ` Leo Yan
2025-07-03 10:28 ` Mark Rutland
@ 2025-08-06 11:44 ` Nirmoy Das
2025-08-06 15:53 ` Breno Leitao
2 siblings, 1 reply; 17+ messages in thread
From: Nirmoy Das @ 2025-08-06 11:44 UTC (permalink / raw)
To: leitao
Cc: catalin.marinas, cov, leo.yan, linux-arm-kernel, linux-kernel,
linux-serial, mark.rutland, paulmck, rmikey, rmk+kernel,
usamaarif642
Hi Breno,
I tried the latest kernel which seems to contain a fix. Bisecting it led me to https://patch.msgid.link/20250604142045.253301-1-pmladek@suse.com and that indeed fixes it for me.
Could you please give it a try.
Regards,
Nirmoy
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?)
2025-08-06 11:44 ` Nirmoy Das
@ 2025-08-06 15:53 ` Breno Leitao
0 siblings, 0 replies; 17+ messages in thread
From: Breno Leitao @ 2025-08-06 15:53 UTC (permalink / raw)
To: Nirmoy Das
Cc: catalin.marinas, cov, leo.yan, linux-arm-kernel, linux-kernel,
linux-serial, mark.rutland, paulmck, rmikey, rmk+kernel,
usamaarif642
Hello Nirmoy,
On Wed, Aug 06, 2025 at 04:44:41AM -0700, Nirmoy Das wrote:
> Hi Breno,
>
> I tried the latest kernel which seems to contain a fix. Bisecting it led me to https://patch.msgid.link/20250604142045.253301-1-pmladek@suse.com and that indeed fixes it for me.
> Could you please give it a try.
That is awesome. I've played a little bit with it, and I see it solving
my problem as well.
The machine is booting very fast now, taking 0.8 seconds to hit init.
[ 0.811320] Run /init as init process
Thanks for the heads-up!
--breno
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-08-06 16:16 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao
2025-07-02 17:20 ` Leo Yan
2025-07-03 10:02 ` Breno Leitao
2025-07-03 11:45 ` Leo Yan
2025-07-03 15:01 ` Breno Leitao
2025-07-03 16:17 ` Leo Yan
2025-07-03 10:28 ` Mark Rutland
2025-07-03 14:13 ` Breno Leitao
2025-07-03 16:31 ` Mark Rutland
2025-07-08 14:00 ` Breno Leitao
2025-07-09 14:23 ` Breno Leitao
2025-07-10 13:35 ` Leo Yan
2025-07-10 17:30 ` Breno Leitao
2025-07-11 9:50 ` Leo Yan
[not found] ` <cno3lsprrz36gqu27omvwrw27d2ubqtshac3cahfgkhcm2rla7@bpnulcmsyx2u>
2025-07-14 10:52 ` Leo Yan
2025-08-06 11:44 ` Nirmoy Das
2025-08-06 15:53 ` Breno Leitao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).