* arm64: csdlock at early boot due to slow serial (?) @ 2025-07-02 17:10 Breno Leitao 2025-07-02 17:20 ` Leo Yan ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Breno Leitao @ 2025-07-02 17:10 UTC (permalink / raw) To: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial Cc: rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck Hello, I'm observing two unusual behaviors during the boot process on my SBSA ARM machine, with upstream kernel (6.16-rc4): 1) A 9-second pause during early boot: [ 0.000000] ACPI: SPCR: console: pl011,mmio32,0xc280000,115200 [ 0.420120] Serial: AMBA PL011 UART driver [ 0.875263] printk: console [ttyAMA0] enabled [ 9.848263] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff]) 2) Occasional CSD lock during early boot: Intermittently, I encounter a CSD lock. Diagnosing this was challenging, but after enabling PSEUDO NMI, I was able to capture the stack trace: printk: console [ttyAMA0] enabled smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#02 do_nothing (kernel/smp.c:1058) smp: csd: CSD lock (#1) unresponsive. Sending NMI from CPU 0 to CPUs 2: .... pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P) nbcon_emit_next_record (kernel/printk/nbcon.c:1030) __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498) __nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593) nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610) vprintk_emit (kernel/printk/printk.c:2429) On reviewing the amba-pl011.c code, I noticed that each message being flushed causes the following loop to iterate roughly 20,000 times: while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy) cpu_relax(); Tracing this, I found that flushing early boot messages is taking a significant amount of time. For example, trace_printk() output from that function shows: swapper/0-1 [000] dN... 9.695941: pl011_console_write_atomic: "[ 0.928995] printk: console [ttyAMA0] enabled" | -> This is trace_printk of wctxt->outbuf At timestamp 9.69 seconds, the serial console is still flushing messages from 0.92 seconds, indicating that the initial 9-second gap is spent looping in cpu_relax()—about 20,000 times per message, which is clearly suboptimal. Further debugging revealed the following sequence with the pl011 registers: 1) uart_console_write() 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax()) 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax() Michael has reported a hardware issue where the BUSY bit could get stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long. If I am having the same hardware issue, I suppose I need to change that logic to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0 instead of BUSY. Anyway, any one familar with this weird behaviour? Thanks --breno ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao @ 2025-07-02 17:20 ` Leo Yan 2025-07-03 10:02 ` Breno Leitao 2025-07-03 10:28 ` Mark Rutland 2025-08-06 11:44 ` Nirmoy Das 2 siblings, 1 reply; 17+ messages in thread From: Leo Yan @ 2025-07-02 17:20 UTC (permalink / raw) To: Breno Leitao Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck Hi Breno, On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote: [...] > Further debugging revealed the following sequence with the pl011 registers: > > 1) uart_console_write() > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax()) > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax() > > Michael has reported a hardware issue where the BUSY bit could get > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long. > > If I am having the same hardware issue, I suppose I need to change that logic > to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0 > instead of BUSY. > > Anyway, any one familar with this weird behaviour? To be clear, I am not familiar with pl011 driver. For the first step, could you confirm the UART port is only used by Linux kernel? In some cases, if normal world and secure world share the same UART port, it can cause the UART state machine malfunction and long wait. Thanks, Leo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-02 17:20 ` Leo Yan @ 2025-07-03 10:02 ` Breno Leitao 2025-07-03 11:45 ` Leo Yan 0 siblings, 1 reply; 17+ messages in thread From: Breno Leitao @ 2025-07-03 10:02 UTC (permalink / raw) To: Leo Yan Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck On Wed, Jul 02, 2025 at 06:20:36PM +0100, Leo Yan wrote: > Hi Breno, > > On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote: > > [...] > > > Further debugging revealed the following sequence with the pl011 registers: > > > > 1) uart_console_write() > > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax()) > > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax() > > > > Michael has reported a hardware issue where the BUSY bit could get > > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY > > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long. > > > > If I am having the same hardware issue, I suppose I need to change that logic > > to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0 > > instead of BUSY. > > > > Anyway, any one familar with this weird behaviour? > > To be clear, I am not familiar with pl011 driver. > > For the first step, could you confirm the UART port is only used by > Linux kernel? > > In some cases, if normal world and secure world share the same UART > port, it can cause the UART state machine malfunction and long wait. I don't know how to check it for sure, but, looking at the serial console output, I don't see anything else using the UART. The only output I see on the console at that time is coming from linux kernel. Would you recommend any additional check? Thanks for the reply, --breno ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-03 10:02 ` Breno Leitao @ 2025-07-03 11:45 ` Leo Yan 2025-07-03 15:01 ` Breno Leitao 0 siblings, 1 reply; 17+ messages in thread From: Leo Yan @ 2025-07-03 11:45 UTC (permalink / raw) To: Breno Leitao Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck On Thu, Jul 03, 2025 at 03:02:39AM -0700, Breno Leitao wrote: [...] > > In some cases, if normal world and secure world share the same UART > > port, it can cause the UART state machine malfunction and long wait. > > I don't know how to check it for sure, but, looking at the serial > console output, I don't see anything else using the UART. The only > output I see on the console at that time is coming from linux kernel. > > Would you recommend any additional check? I have no experience in the driver, I should avoid any noise. But two things in my head for quick trying. - First, you could try earlycon mode, e.g., add option in kernel command line: earlycon=pl011,mmio32,0xc280000 earlycon=pl011,0xc280000 This would be possible to give more printing logs. If earlycon works well, then the issue might be caused by later init code (clock, or UART driver itself). - Try another UART port if this is possible. Just curious, you mentioned you are working on the mainline kernel. If you can confirm a workable kernel version, then this info can help to rule out hardware issue and we can review the delta changes. Thanks, Leo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-03 11:45 ` Leo Yan @ 2025-07-03 15:01 ` Breno Leitao 2025-07-03 16:17 ` Leo Yan 0 siblings, 1 reply; 17+ messages in thread From: Breno Leitao @ 2025-07-03 15:01 UTC (permalink / raw) To: Leo Yan Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck Hello Leo, On Thu, Jul 03, 2025 at 12:45:57PM +0100, Leo Yan wrote: > On Thu, Jul 03, 2025 at 03:02:39AM -0700, Breno Leitao wrote: > > > In some cases, if normal world and secure world share the same UART > > > port, it can cause the UART state machine malfunction and long wait. > > > > I don't know how to check it for sure, but, looking at the serial > > console output, I don't see anything else using the UART. The only > > output I see on the console at that time is coming from linux kernel. > > > > Would you recommend any additional check? > > I have no experience in the driver, I should avoid any noise. But two > things in my head for quick trying. > > - First, you could try earlycon mode, e.g., add option in kernel > command line: > > earlycon=pl011,mmio32,0xc280000 > earlycon=pl011,0xc280000 > > This would be possible to give more printing logs. If earlycon works > well, then the issue might be caused by later init code (clock, or > UART driver itself). That is an interesting experiment, thanks for the heads-up. I got `earlycon=pl011,mmio32,0xc280000` working, the machine boots fine, and I see different behaviours. Booting up the system the following lines (as examples): [ 0.000000] earlycon: pl11 at MMIO32 0x000000000c280000 (options '115200') [ 0.000000] printk: legacy bootconsole [pl11] enabled [ 0.493205] smp: Brought up 1 node, 72 CPUs [ 4.232864] SMP: Total of 72 processors activated. [ 4.242788] CPU: All CPU(s) started at EL2 [ 6.318137] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA [ 6.978245] netpoll: netconsole: local port 6666 [ 7.234429] Run /init as init process Also, I have a trace_printk() in pl011_console_write_atomic, and it only start later in the code. More precisely at: swapper/0-1 [000] dN.1. 6.334581: pl011_console_write_atomic: Printing 49 [ 6.334573] printk: console [ttyAMA0] enabled I understand that the early boot messages got sent by some other function? (Or maybe the tracing framework was not initialized?) Anyway, pl011_console_write_atomic() continues to call ~20K times cpu_relax() waiting for the BUSY bit to disappear, as before. For comparison, these are the same line as above, when earlycon is not set: [ 0.154009] smp: Brought up 1 node, 72 CPUs [ 0.154012] SMP: Total of 72 processors activated. [ 0.154013] CPU: All CPU(s) started at EL2 [ 0.466949] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA [ 0.466967] printk: console [ttyAMA0] enabled [ 0.466949] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA [ 5.513307] smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#05 do_nothing+0x0/0x10(0x0). [ 10.719305] netpoll: netconsole: local port 6666 [ 10.961798] Run /init as init process From what I see, using earlycon boots the kernel faster. Here are some ideas I got: 1) Earlycon enables the message to be printed earlier, thus, it smooth out the serial console 2) without earlycon, the messages are printed at once, with interrupt disabled 2.a) This causes CSD locks 2.b) CSD locks caused an additional 3 seconds delay 3) pl011_console_write_atomic is slow on both cases Thanks for the good suggestion, --breno ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-03 15:01 ` Breno Leitao @ 2025-07-03 16:17 ` Leo Yan 0 siblings, 0 replies; 17+ messages in thread From: Leo Yan @ 2025-07-03 16:17 UTC (permalink / raw) To: Breno Leitao Cc: cov, rmk+kernel, mark.rutland, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck Hi Breno, On Thu, Jul 03, 2025 at 04:01:54PM +0100, Breno Leitao wrote: [...] > That is an interesting experiment, thanks for the heads-up. > > I got `earlycon=pl011,mmio32,0xc280000` working, the machine boots fine, > and I see different behaviours. > > Booting up the system the following lines (as examples): > > [ 0.000000] earlycon: pl11 at MMIO32 0x000000000c280000 (options '115200') > [ 0.000000] printk: legacy bootconsole [pl11] enabled > [ 0.493205] smp: Brought up 1 node, 72 CPUs > [ 4.232864] SMP: Total of 72 processors activated. > [ 4.242788] CPU: All CPU(s) started at EL2 > [ 6.318137] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA > [ 6.978245] netpoll: netconsole: local port 6666 > [ 7.234429] Run /init as init process > > Also, I have a trace_printk() in pl011_console_write_atomic, and it only > start later in the code. More precisely at: > > swapper/0-1 [000] dN.1. 6.334581: pl011_console_write_atomic: Printing 49 [ 6.334573] printk: console [ttyAMA0] enabled > > I understand that the early boot messages got sent by some other > function? (Or maybe the tracing framework was not initialized?) > > Anyway, pl011_console_write_atomic() continues to call ~20K times cpu_relax() > waiting for the BUSY bit to disappear, as before. If earlycon works correctly, it means the UART clock has been prepared before the kernel boots. I noticed that pl011_console_write_atomic() enables and disables the uap->clk clock dynamically. The clock is a SoC implementation. If the UART clock is already properly enabled for earlycon, we can assume it doesn't need to be managed again in the UART driver. So, perhaps it's worth an experiment by commenting out the clock operations in the UART driver? Thanks, Leo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao 2025-07-02 17:20 ` Leo Yan @ 2025-07-03 10:28 ` Mark Rutland 2025-07-03 14:13 ` Breno Leitao 2025-08-06 11:44 ` Nirmoy Das 2 siblings, 1 reply; 17+ messages in thread From: Mark Rutland @ 2025-07-03 10:28 UTC (permalink / raw) To: Breno Leitao Cc: cov, rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck Hi Breno, On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote: > Hello, > > I'm observing two unusual behaviors during the boot process on my SBSA > ARM machine, with upstream kernel (6.16-rc4): Can you say which SoC in particular that is? Knowing that would help to identify whether there's some known erratum, clocking issue, etc. Likewise that might imply more folk to add to Cc. [...] > At timestamp 9.69 seconds, the serial console is still flushing messages from > 0.92 seconds, indicating that the initial 9-second gap is spent looping in > cpu_relax()—about 20,000 times per message, which is clearly suboptimal. > > Further debugging revealed the following sequence with the pl011 registers: > > 1) uart_console_write() > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax()) > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax() > > Michael has reported a hardware issue where the BUSY bit could get > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long. Looking at the commit message, that was an issue with the a "custom (non-PrimeCell) implementation of the SBSA UART" present on QDF400. I assume that was soemthing that Qualcomm Datacenter Technologies designed themselves. It's possible that your SoC has a similar issue with whatever IP block is being used as the UART, but the issue in that commit certainly doesn't apply to most PL011 / SBSA-UART implementations. > If I am having the same hardware issue, I suppose I need to change that logic > to exist the cpu_relax() loop by checking when Transmit FIFO Empty (TXFE) is 0 > instead of BUSY. If you have the same issue, then applying the same workaround makes sense. As above, it's not clear that this is necessarily the same issue. Any more detail that you can share regarding the SoC would be very helpful. Mark. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-03 10:28 ` Mark Rutland @ 2025-07-03 14:13 ` Breno Leitao 2025-07-03 16:31 ` Mark Rutland 0 siblings, 1 reply; 17+ messages in thread From: Breno Leitao @ 2025-07-03 14:13 UTC (permalink / raw) To: Mark Rutland Cc: cov, rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck On Thu, Jul 03, 2025 at 11:28:50AM +0100, Mark Rutland wrote: > On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote: > > I'm observing two unusual behaviors during the boot process on my SBSA > > ARM machine, with upstream kernel (6.16-rc4): > > Can you say which SoC in particular that is? Knowing that would help to > identify whether there's some known erratum, clocking issue, etc. This is custom made rack mounted machine based on Grace CPU. Here are some info about the hardware: # lscpu: Vendor ID: ARM Model name: Neoverse-V2 Model: 0 Thread(s) per core: 1 Core(s) per socket: 72 Socket(s): 1 Stepping: r0p0 # /proc/cpuinfo processor : 71 BogoMIPS : 2000.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd4f CPU revision : 0 # lshw description: Rack Mount Chassis product: <Internal name> vendor: Quanta version: <Internal name> width: 64 bits capabilities: smbios-3.6.0 dmi-3.6.0 smp sve_default_vector_length tagged_addr_disabled configuration: boot=normal chassis=rackmount family=Default string sku=Default string uuid=... How do I find the SoC exactly? > Likewise that might imply more folk to add to Cc. > > [...] > > > At timestamp 9.69 seconds, the serial console is still flushing messages from > > 0.92 seconds, indicating that the initial 9-second gap is spent looping in > > cpu_relax()-about 20,000 times per message, which is clearly suboptimal. > > > > Further debugging revealed the following sequence with the pl011 registers: > > > > 1) uart_console_write() > > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax()) > > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax() > > > > Michael has reported a hardware issue where the BUSY bit could get > > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY > > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long. > > Looking at the commit message, that was an issue with the a "custom > (non-PrimeCell) implementation of the SBSA UART" present on QDF400. I > assume that was soemthing that Qualcomm Datacenter Technologies designed > themselves. > > It's possible that your SoC has a similar issue with whatever IP block > is being used as the UART, but the issue in that commit certainly > doesn't apply to most PL011 / SBSA-UART implementations. That makes total sense. Decoding SPCR I see the following: # iasl -d spcr.dat Intel ACPI Component Architecture ASL+ Optimizing Compiler/Disassembler version 20210604 Copyright (c) 2000 - 2021 Intel Corporation File appears to be binary: found 56 non-ASCII characters, disassembling Binary file appears to be a valid ACPI table, disassembling Input file spcr.dat, Length 0x50 (80) bytes ACPI: SPCR 0x0000000000000000 000050 (v02 NVIDIA A M I 00000001 ARMH 00010000) Acpi Data Table [SPCR] decoded Formatted output: spcr.dsl - 2624 bytes Thanks, --breno ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-03 14:13 ` Breno Leitao @ 2025-07-03 16:31 ` Mark Rutland 2025-07-08 14:00 ` Breno Leitao 0 siblings, 1 reply; 17+ messages in thread From: Mark Rutland @ 2025-07-03 16:31 UTC (permalink / raw) To: Breno Leitao Cc: cov, rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck, Ankit Agrawal, Besar Wicaksono On Thu, Jul 03, 2025 at 03:13:26PM +0100, Breno Leitao wrote: > On Thu, Jul 03, 2025 at 11:28:50AM +0100, Mark Rutland wrote: > > On Wed, Jul 02, 2025 at 10:10:21AM -0700, Breno Leitao wrote: > > > I'm observing two unusual behaviors during the boot process on my SBSA > > > ARM machine, with upstream kernel (6.16-rc4): > > > > Can you say which SoC in particular that is? Knowing that would help to > > identify whether there's some known erratum, clocking issue, etc. > > This is custom made rack mounted machine based on Grace CPU. Here are > some info about the hardware: > > # lscpu: > Vendor ID: ARM > Model name: Neoverse-V2 > Model: 0 > Thread(s) per core: 1 > Core(s) per socket: 72 > Socket(s): 1 > Stepping: r0p0 > > # /proc/cpuinfo > processor : 71 > BogoMIPS : 2000.00 > Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part : 0xd4f > CPU revision : 0 > > # lshw > description: Rack Mount Chassis > product: <Internal name> > vendor: Quanta > version: <Internal name> > width: 64 bits > capabilities: smbios-3.6.0 dmi-3.6.0 smp sve_default_vector_length tagged_addr_disabled > configuration: boot=normal chassis=rackmount family=Default string sku=Default string uuid=... > > How do I find the SoC exactly? From what you've told me above, the SoC is Nvidia Grace; what they call the CPU is the whole SoC. > > Likewise that might imply more folk to add to Cc. I've added Ankit and Besar, since they've both worked on some system level bits on Grace, and might have an idea. Ankit, Besar, are you aware of any UART issues on Grace (as described in Breno's messages below), or do you know of anyone who might have an idea? Thanks, Mark. > > [...] > > > > > At timestamp 9.69 seconds, the serial console is still flushing messages from > > > 0.92 seconds, indicating that the initial 9-second gap is spent looping in > > > cpu_relax()-about 20,000 times per message, which is clearly suboptimal. > > > > > > Further debugging revealed the following sequence with the pl011 registers: > > > > > > 1) uart_console_write() > > > 2) REG_FR has BUSY | RXFE | TXFF for a while (~1k cpu_relax()) > > > 3) RXFE and TXFF are cleaned, and BUSY stay on for another 17k-19k cpu_relax() > > > > > > Michael has reported a hardware issue where the BUSY bit could get > > > stuck (see commit d8a4995bcea1: "tty: pl011: Work around QDF2400 E44 stuck BUSY > > > bit"), which is very similar. TXFE goes down, but BUSY is(?) still stuck for long. > > > > Looking at the commit message, that was an issue with the a "custom > > (non-PrimeCell) implementation of the SBSA UART" present on QDF400. I > > assume that was soemthing that Qualcomm Datacenter Technologies designed > > themselves. > > > > It's possible that your SoC has a similar issue with whatever IP block > > is being used as the UART, but the issue in that commit certainly > > doesn't apply to most PL011 / SBSA-UART implementations. > > That makes total sense. Decoding SPCR I see the following: > > # iasl -d spcr.dat > Intel ACPI Component Architecture > ASL+ Optimizing Compiler/Disassembler version 20210604 > Copyright (c) 2000 - 2021 Intel Corporation > > File appears to be binary: found 56 non-ASCII characters, disassembling > Binary file appears to be a valid ACPI table, disassembling > Input file spcr.dat, Length 0x50 (80) bytes > ACPI: SPCR 0x0000000000000000 000050 (v02 NVIDIA A M I 00000001 ARMH 00010000) > Acpi Data Table [SPCR] decoded > Formatted output: spcr.dsl - 2624 bytes > > Thanks, > --breno ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-03 16:31 ` Mark Rutland @ 2025-07-08 14:00 ` Breno Leitao 2025-07-09 14:23 ` Breno Leitao 0 siblings, 1 reply; 17+ messages in thread From: Breno Leitao @ 2025-07-08 14:00 UTC (permalink / raw) To: Mark Rutland, ankita, bwicaksono Cc: rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck, Ankit Agrawal, Besar Wicaksono On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote: > > How do I find the SoC exactly? > > >From what you've told me above, the SoC is Nvidia Grace; what they call > the CPU is the whole SoC. > > > > Likewise that might imply more folk to add to Cc. > > I've added Ankit and Besar, since they've both worked on some system > level bits on Grace, and might have an idea. > > Ankit, Besar, are you aware of any UART issues on Grace (as described in > Breno's messages below), or do you know of anyone who might have an > idea? Here is more information I got about this problem. TL;DR: While the machine is booting, it is throttled by the UART speed, while having IRQ disabled. Here is my line of investigation: 1) when the kernel start to flush the messages, there are ~700 messages in the buffer that will be flushed using pl011_console_write_atomic() [ 0.653673] printk: console [ttyAMA0] enabled [ 9.239225] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff]) a) pl011_console_write_atomic() is called ~700 times. Each message have around 50 chars. b) For each char, it needs to wait for it to be sent before proceeding. 3) pl011_console_write_atomic() have two big while()/cpu_relax() blocks. One per character and one per message. a) the first one is in pl011_console_putchar(), where it checks if the FIFO is full before writing a single character. This is done one per character. while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF) cpu_relax(); b) After the line is written, then it hits the second one, in pl011_console_write_atomic(), where it waits until the controller is not busy anymore: while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy) cpu_relax(); 4) The controller has FIFO enabled, but, it doesn't help much, because the CPU is waiting for the 700 * 50 chars to be transmitted. 5) At this time we just have one single CPU online (?) 6) This is done with IRQ disabled, which causes a CSD lock to trigger, given that the CPU has IRQ disabled a) Following Paul's suggestion, I added a `lockdep_assert_irqs_enabled()` at the entrance of pl011_console_write_atomic(), and that is what I see. hardirqs last enabled at (256267): [<ffff800080364980>] vprintk_store+0x6f8/0x820 hardirqs last disabled at (256268): [<ffff80008036e1c4>] __nbcon_atomic_flush_pending+0x2c4/0x3e8 softirqs last enabled at (255640): [<ffff8000801bb020>] handle_softirqs+0x910/0xe78 softirqs last disabled at (255635): [<ffff800080010938>] __do_softirq+0x18/0x20 b) This is the full stack in how we are calling cpu_relax() with irq disabled million times: WARNING: CPU: 0 PID: 1 at drivers/tty/serial/amba-pl011.c:2545 pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545) Modules linked in: pstate: 234003c9 (nzCv DAIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545) lr : pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545) sp : ffff80008980dca0 pmr: 000000f0 x29: ffff80008980dca0 x28: ffff0000aebcc080 x27: dfff800000000000 x26: 1fffe00015d7985a x25: 0000000000000000 x24: 0000000000000000 x23: ffff0000a1800000 x22: ffff800087e6d7f8 x21: 1fffe00015d7985b x20: ffff80008980dee0 x19: 0000000000000003 x18: 00000000ffffffff x17: 30313478305b2030 x16: 3030303230303030 x15: 3078302055504320 x14: 6c61636973796870 x13: 0a2a2a2065676173 x12: ffff700011301b8b x11: 1ffff00011301b8a x10: ffff700011301b8a x9 : ffff800081eafdb4 x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000000 x5 : 1ffff00011301be2 x4 : 0000000000000018 x3 : 0000000000000000 x2 : 1ffff00010fcdb57 x1 : 0000000000000000 x0 : 0000000000000001 Call trace: pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2545) (P) nbcon_emit_next_record (kernel/printk/nbcon.c:1026) __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498) __nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593) nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610) vprintk_emit (kernel/printk/printk.c:2429) vprintk_default (kernel/printk/printk.c:2466) vprintk (kernel/printk/printk_safe.c:83) _printk (kernel/printk/printk.c:2470) __warn (kernel/panic.c:768) report_bug (lib/bug.c:197 lib/bug.c:215) bug_handler (arch/arm64/kernel/traps.c:1002) call_break_hook (arch/arm64/kernel/debug-monitors.c:319) brk_handler (arch/arm64/kernel/debug-monitors.c:325) do_debug_exception (arch/arm64/mm/fault.c:1002) el1_dbg (arch/arm64/kernel/entry-common.c:514) el1h_64_sync_handler (arch/arm64/kernel/entry-common.c:567) el1h_64_sync (arch/arm64/kernel/entry.S:595) pl011_console_write_atomic (drivers/tty/serial/amba-pl011.c:2524) (P) nbcon_emit_next_record (kernel/printk/nbcon.c:1026) __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1498) __nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1541 kernel/printk/nbcon.c:1593) nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1610) vprintk_emit (kernel/printk/printk.c:2429) vprintk_default (kernel/printk/printk.c:2466) vprintk (kernel/printk/printk_safe.c:83) _printk (kernel/printk/printk.c:2470) register_console (kernel/printk/printk.c:4126 (discriminator 9)) serial_core_register_port (drivers/tty/serial/serial_core.c:2637 drivers/tty/serial/serial_core.c:3157 drivers/tty/serial/serial_core.c:3388) serial_ctrl_register_port (drivers/tty/serial/serial_ctrl.c:42) uart_add_one_port (drivers/tty/serial/serial_port.c:144) pl011_register_port (drivers/tty/serial/amba-pl011.c:2867) sbsa_uart_probe (drivers/tty/serial/amba-pl011.c:3041) platform_probe (drivers/base/platform.c:1405) really_probe (drivers/base/dd.c:579 drivers/base/dd.c:657) __driver_probe_device (drivers/base/dd.c:799) driver_probe_device (drivers/base/dd.c:829) __device_attach_driver (drivers/base/dd.c:958) bus_for_each_drv (drivers/base/bus.c:462) __device_attach (drivers/base/dd.c:1031) device_initial_probe (drivers/base/dd.c:1079) bus_probe_device (drivers/base/bus.c:537) device_add (drivers/base/core.c:3699) platform_device_add (drivers/base/platform.c:716) platform_device_register_full (drivers/base/platform.c:845) acpi_create_platform_device (./include/linux/err.h:70 drivers/acpi/acpi_platform.c:178) acpi_bus_attach (./include/linux/acpi.h:738 drivers/acpi/scan.c:2206 drivers/acpi/scan.c:2198 drivers/acpi/scan.c:2316) acpi_dev_for_one_check (drivers/acpi/bus.c:1146) device_for_each_child (drivers/base/core.c:4021) acpi_dev_for_each_child (drivers/acpi/bus.c:1151) acpi_bus_attach (drivers/acpi/scan.c:2323) acpi_dev_for_one_check (drivers/acpi/bus.c:1146) device_for_each_child (drivers/base/core.c:4021) acpi_dev_for_each_child (drivers/acpi/bus.c:1151) acpi_bus_attach (drivers/acpi/scan.c:2323) acpi_bus_scan (drivers/acpi/scan.c:2533 drivers/acpi/scan.c:2606) acpi_scan_init (drivers/acpi/scan.c:2739) acpi_init (drivers/acpi/bus.c:1470) do_one_initcall (init/main.c:1274) kernel_init_freeable (init/main.c:1335 init/main.c:1352 init/main.c:1371 init/main.c:1584) kernel_init (init/main.c:1476) ret_from_fork (arch/arm64/kernel/entry.S:863) irq event stamp: 256268 hardirqs last enabled at (256267): vprintk_store (kernel/printk/printk.c:2356 (discriminator 1)) hardirqs last disabled at (256268): __nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1539 kernel/printk/nbcon.c:1593) softirqs last enabled at (255640): handle_softirqs (./arch/arm64/include/asm/current.h:19 ./arch/arm64/include/asm/preempt.h:13 kernel/softirq.c:426 kernel/softirq.c:607) softirqs last disabled at (255635): __do_softirq (kernel/softirq.c:614) ---[ end trace 0000000000000000 ]--- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-08 14:00 ` Breno Leitao @ 2025-07-09 14:23 ` Breno Leitao 2025-07-10 13:35 ` Leo Yan 0 siblings, 1 reply; 17+ messages in thread From: Breno Leitao @ 2025-07-09 14:23 UTC (permalink / raw) To: Mark Rutland, ankita, bwicaksono Cc: rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, leo.yan, linux-kernel, paulmck On Tue, Jul 08, 2025 at 07:00:45AM -0700, Breno Leitao wrote: > On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote: > > Here is more information I got about this problem. TL;DR: While the > machine is booting, it is throttled by the UART speed, while having IRQ > disabled. quick update: I've identified a solution that significantly improves the situation. I've found that the serial issue was heavily affecting boot time, which is unleashed now. After applying the following fix, the boot speed has improved dramatically. It's the fastest I've seen, and the CSD lockups are gone. If no concerns raise in the next days, I will send it officially to the serial maintainers. Author: Breno Leitao <leitao@debian.org> Date: Wed Jul 9 05:57:06 2025 -0700 serial: amba-pl011: Fix boot performance by switching to console_initcall() Replace arch_initcall() with console_initcall() for PL011 driver initialization to resolve severe boot performance issues. The current arch_initcall() registration causes the console to initialize before the printk subsystem is ready, forcing the driver into atomic mode during early boot. This results in: - 5-8 second boot delay while ~700 boot messages are processed - System freeze with IRQs disabled during message output - Each character transmitted synchronously with cpu_relax() polling This is what is driving the driver to atomic mode in the early boot: static inline void printk_get_console_flush_type(struct console_flush_type *ft) { .... if (printk_kthreads_running) ft->nbcon_offload = true; The atomic path processes each character individually through pl011_console_putchar(), waiting for UART transmission completion before proceeding. With only one CPU online during early boot, this creates a bottleneck where the system spends excessive time in interrupt-disabled state. Here is how the code looks like: 1) disable interrupt 2) for each of these 700 messages, call pl011_console_write_atomic() 3) for each character in the message, calls pl011_console_putchar(), which waits for the character to be transmitted 4) once all the line is transmitted, wait for the UART to be idle 5) re-enable interrupt Here is the code representation of the above: pl011_console_write_atomic() { ... // For each char in the message pl011_console_putchar() { while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF) cpu_relax(); } while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy) cpu_relax(); Using console_initcall() ensures proper initialization order, allowing the printk subsystem to use threaded output instead of atomic mode, eliminating the performance bottleneck. Performance improvement: 16x faster kernel boot time at my GRACE SoC machine. - Before: 10.08s to reach init process - After: 0.62s to reach init process Here are more timing details, collected from Linus' upstream, where the only different is this patch: Linus upstream: [ 0.616203] printk: legacy console [netcon_ext0] enabled [ 0.627469] Run /init as init process [ 0.837477] loop: module loaded [ 8.354803] Adding 134199360k swap on /swapvol/swapfile. With this patch: [ 0.305109] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA [ 10.081742] Run /init as init process [ 13.288717] loop: module loaded [ 22.919934] Adding 134199168k swap on /swapvol/swapfile. Link: https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/ [1] Signed-off-by: Breno Leitao <leitao@debian.org> diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c index 22939841b1de..0cf251365825 100644 --- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -3116,7 +3116,7 @@ static void __exit pl011_exit(void) * While this can be a module, if builtin it's most likely the console * So let's leave module_exit but move module_init to an earlier place */ -arch_initcall(pl011_init); +console_initcall(pl011_init); module_exit(pl011_exit); MODULE_AUTHOR("ARM Ltd/Deep Blue Solutions Ltd"); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-09 14:23 ` Breno Leitao @ 2025-07-10 13:35 ` Leo Yan 2025-07-10 17:30 ` Breno Leitao 0 siblings, 1 reply; 17+ messages in thread From: Leo Yan @ 2025-07-10 13:35 UTC (permalink / raw) To: Breno Leitao Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck Hi Breno, On Wed, Jul 09, 2025 at 07:23:44AM -0700, Breno Leitao wrote: > On Tue, Jul 08, 2025 at 07:00:45AM -0700, Breno Leitao wrote: > > On Thu, Jul 03, 2025 at 05:31:09PM +0100, Mark Rutland wrote: > > > > Here is more information I got about this problem. TL;DR: While the > > machine is booting, it is throttled by the UART speed, while having IRQ > > disabled. > > quick update: I've identified a solution that significantly improves the > situation. I've found that the serial issue was heavily affecting boot > time, which is unleashed now. > > After applying the following fix, the boot speed has improved > dramatically. It's the fastest I've seen, and the CSD lockups are gone. Thanks for trying to fix the issue. > If no concerns raise in the next days, I will send it officially to the > serial maintainers. I am not an expert on the PL011 driver, however, I do have concerns after review the change. Please see my comments below. > Author: Breno Leitao <leitao@debian.org> > Date: Wed Jul 9 05:57:06 2025 -0700 > > serial: amba-pl011: Fix boot performance by switching to console_initcall() > > Replace arch_initcall() with console_initcall() for PL011 driver initialization > to resolve severe boot performance issues. pl011_init() registers as an AMBA device, so the PL011 driver depends on the AMBA bus initialization. The AMBA bus is initialized with: postcore_initcall(amba_init); Therefore, the PL011 driver is initialized with arch_initcall(), which occurs later than the postcore init. My understanding is that console_initcall() is invoked much earlier than other initcalls triggered by do_initcalls(). With your change, I saw the PL011 driver fails to register on Juno-r2 board, due to AMBA bus driver is not ready for a console init. Driver 'uart-pl011' was unable to register with bus_type 'amba' because the bus was not initialized. > The current arch_initcall() registration causes the console to initialize > before the printk subsystem is ready, forcing the driver into atomic mode > during early boot. This results in: > > - 5-8 second boot delay while ~700 boot messages are processed > - System freeze with IRQs disabled during message output > - Each character transmitted synchronously with cpu_relax() polling > > This is what is driving the driver to atomic mode in the early boot: > > static inline void printk_get_console_flush_type(struct console_flush_type *ft) > { > .... > if (printk_kthreads_running) > ft->nbcon_offload = true; > > The atomic path processes each character individually through > pl011_console_putchar(), waiting for UART transmission completion > before proceeding. With only one CPU online during early boot, > this creates a bottleneck where the system spends excessive time > in interrupt-disabled state. The atomic path is introduced recently by the commit: 2eb2608618ce ("serial: amba-pl011: Implement nbcon console") My conclusion is that changing the initcall will not disable the atomic path, changing to console_initcall() will cause AMBA device init failure, and as a result, the clock operations will not be invoked. Thus, I am curious if you have ruled out the issue is caused by the UART clock (as I mentioned in another reply). BTW, since the atomic path is enabled in the commit 2eb2608618ce, what is the result after reverting the commit? Thanks, Leo > Here is how the code looks like: > > 1) disable interrupt > 2) for each of these 700 messages, call pl011_console_write_atomic() > 3) for each character in the message, calls pl011_console_putchar(), > which waits for the character to be transmitted > 4) once all the line is transmitted, wait for the UART to be idle > 5) re-enable interrupt > > Here is the code representation of the above: > > pl011_console_write_atomic() { > ... > // For each char in the message > pl011_console_putchar() { > while (pl011_read(uap, REG_FR) & UART01x_FR_TXFF) > cpu_relax(); > } > while ((pl011_read(uap, REG_FR) ^ uap->vendor->inv_fr) & uap->vendor->fr_busy) > cpu_relax(); > > Using console_initcall() ensures proper initialization order, > allowing the printk subsystem to use threaded output instead > of atomic mode, eliminating the performance bottleneck. > > Performance improvement: 16x faster kernel boot time at my GRACE SoC > machine. > > - Before: 10.08s to reach init process > - After: 0.62s to reach init process > > Here are more timing details, collected from Linus' upstream, where the > only different is this patch: > > Linus upstream: > [ 0.616203] printk: legacy console [netcon_ext0] enabled > [ 0.627469] Run /init as init process > [ 0.837477] loop: module loaded > [ 8.354803] Adding 134199360k swap on /swapvol/swapfile. > > With this patch: > [ 0.305109] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA > [ 10.081742] Run /init as init process > [ 13.288717] loop: module loaded > [ 22.919934] Adding 134199168k swap on /swapvol/swapfile. > > Link: https://lore.kernel.org/all/aGVn%2FSnOvwWewkOW@gmail.com/ [1] > > Signed-off-by: Breno Leitao <leitao@debian.org> > > diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c > index 22939841b1de..0cf251365825 100644 > --- a/drivers/tty/serial/amba-pl011.c > +++ b/drivers/tty/serial/amba-pl011.c > @@ -3116,7 +3116,7 @@ static void __exit pl011_exit(void) > * While this can be a module, if builtin it's most likely the console > * So let's leave module_exit but move module_init to an earlier place > */ > -arch_initcall(pl011_init); > +console_initcall(pl011_init); > module_exit(pl011_exit); > > MODULE_AUTHOR("ARM Ltd/Deep Blue Solutions Ltd"); > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-10 13:35 ` Leo Yan @ 2025-07-10 17:30 ` Breno Leitao 2025-07-11 9:50 ` Leo Yan 0 siblings, 1 reply; 17+ messages in thread From: Breno Leitao @ 2025-07-10 17:30 UTC (permalink / raw) To: Leo Yan Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck Hello Leo, On Thu, Jul 10, 2025 at 02:35:57PM +0100, Leo Yan wrote: > > serial: amba-pl011: Fix boot performance by switching to console_initcall() > > > > Replace arch_initcall() with console_initcall() for PL011 driver initialization > > to resolve severe boot performance issues. > > pl011_init() registers as an AMBA device, so the PL011 driver depends > on the AMBA bus initialization. The AMBA bus is initialized with: > > postcore_initcall(amba_init); > > Therefore, the PL011 driver is initialized with arch_initcall(), which > occurs later than the postcore init. > > My understanding is that console_initcall() is invoked much earlier > than other initcalls triggered by do_initcalls(). With your change, I > saw the PL011 driver fails to register on Juno-r2 board, due to AMBA bus > driver is not ready for a console init. You are absolutely correct. I confirm that the machine is fast because pl011 is not working, which is sad and embarassing. > The atomic path is introduced recently by the commit: > > 2eb2608618ce ("serial: amba-pl011: Implement nbcon console") > > My conclusion is that changing the initcall will not disable the atomic > path, changing to console_initcall() will cause AMBA device init > failure, and as a result, the clock operations will not be invoked. > Thus, I am curious if you have ruled out the issue is caused by the UART > clock (as I mentioned in another reply). > > BTW, since the atomic path is enabled in the commit 2eb2608618ce, what > is the result after reverting the commit? I've reverted commit 2eb2608618ce ("serial: amba-pl011: Implement nbcon console"), and I don't see the CSD locks anymoer. The serial speed is the same and continue to be slow, but, the CSD lock is not there. Here is the time spent on the serial flush when reverting the commit above [ 0.309561] printk: legacy console [ttyAMA0] enabled [ 8.657938] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff]) Thanks --breno ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-10 17:30 ` Breno Leitao @ 2025-07-11 9:50 ` Leo Yan [not found] ` <cno3lsprrz36gqu27omvwrw27d2ubqtshac3cahfgkhcm2rla7@bpnulcmsyx2u> 0 siblings, 1 reply; 17+ messages in thread From: Leo Yan @ 2025-07-11 9:50 UTC (permalink / raw) To: Breno Leitao Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck Hi Breno, On Thu, Jul 10, 2025 at 10:30:45AM -0700, Breno Leitao wrote: [...] > > The atomic path is introduced recently by the commit: > > > > 2eb2608618ce ("serial: amba-pl011: Implement nbcon console") > > > > My conclusion is that changing the initcall will not disable the atomic > > path, changing to console_initcall() will cause AMBA device init > > failure, and as a result, the clock operations will not be invoked. > > Thus, I am curious if you have ruled out the issue is caused by the UART > > clock (as I mentioned in another reply). > > > > BTW, since the atomic path is enabled in the commit 2eb2608618ce, what > > is the result after reverting the commit? > > I've reverted commit 2eb2608618ce ("serial: amba-pl011: Implement nbcon > console"), and I don't see the CSD locks anymoer. The serial speed is > the same and continue to be slow, but, the CSD lock is not there. Here > is the time spent on the serial flush when reverting the commit above > > [ 0.309561] printk: legacy console [ttyAMA0] enabled > [ 8.657938] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff]) From this result, we can know both the atomic path and the thread path take a long time polling. Since both paths configure the UART clock, I'm curious about the behaviour if the UART clock is untouched. The relevant code is shown below. I may seem a bit stubborn in suspecting a clock issue :) But if you have confirmed that a standard pl011 UART IP is being used, then the only external factor I am aware of is the clock. Thanks, Leo ---8<--- diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c index 71482d639a6d..b04773ba2602 100644 --- a/drivers/amba/bus.c +++ b/drivers/amba/bus.c @@ -64,6 +64,8 @@ static int amba_get_enable_pclk(struct amba_device *pcdev) { int ret; + return 0; + pcdev->pclk = clk_get(&pcdev->dev, "apb_pclk"); if (IS_ERR(pcdev->pclk)) return PTR_ERR(pcdev->pclk); @@ -77,8 +79,8 @@ static int amba_get_enable_pclk(struct amba_device *pcdev) static void amba_put_disable_pclk(struct amba_device *pcdev) { - clk_disable_unprepare(pcdev->pclk); - clk_put(pcdev->pclk); + //clk_disable_unprepare(pcdev->pclk); + //clk_put(pcdev->pclk); } diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c index 22939841b1de..9ba24afb6e4f 100644 --- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -1777,7 +1777,7 @@ static int pl011_hwinit(struct uart_port *port) /* * Try to enable the clock producer. */ - retval = clk_prepare_enable(uap->clk); + retval = clk_prepare(uap->clk); if (retval) return retval; @@ -1934,7 +1934,7 @@ static int pl011_startup(struct uart_port *port) return 0; clk_dis: - clk_disable_unprepare(uap->clk); + //clk_disable_unprepare(uap->clk); return retval; } @@ -2025,7 +2025,7 @@ static void pl011_shutdown(struct uart_port *port) /* * Shut down the clock producer */ - clk_disable_unprepare(uap->clk); + //clk_disable_unprepare(uap->clk); /* Optionally let pins go into sleep states */ pinctrl_pm_select_sleep_state(port->dev); @@ -2524,7 +2524,7 @@ pl011_console_write_atomic(struct console *co, struct nbcon_write_context *wctxt if (!nbcon_enter_unsafe(wctxt)) return; - clk_enable(uap->clk); + //clk_enable(uap->clk); if (!uap->vendor->always_enabled) { old_cr = pl011_read(uap, REG_CR); @@ -2542,7 +2542,7 @@ pl011_console_write_atomic(struct console *co, struct nbcon_write_context *wctxt if (!uap->vendor->always_enabled) pl011_write(old_cr, uap, REG_CR); - clk_disable(uap->clk); + //clk_disable(uap->clk); nbcon_exit_unsafe(wctxt); } @@ -2556,7 +2556,7 @@ pl011_console_write_thread(struct console *co, struct nbcon_write_context *wctxt if (!nbcon_enter_unsafe(wctxt)) return; - clk_enable(uap->clk); + //clk_enable(uap->clk); if (!uap->vendor->always_enabled) { old_cr = pl011_read(uap, REG_CR); @@ -2586,7 +2586,7 @@ pl011_console_write_thread(struct console *co, struct nbcon_write_context *wctxt if (!uap->vendor->always_enabled) pl011_write(old_cr, uap, REG_CR); - clk_disable(uap->clk); + //clk_disable(uap->clk); nbcon_exit_unsafe(wctxt); } ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <cno3lsprrz36gqu27omvwrw27d2ubqtshac3cahfgkhcm2rla7@bpnulcmsyx2u>]
* Re: arm64: csdlock at early boot due to slow serial (?) [not found] ` <cno3lsprrz36gqu27omvwrw27d2ubqtshac3cahfgkhcm2rla7@bpnulcmsyx2u> @ 2025-07-14 10:52 ` Leo Yan 0 siblings, 0 replies; 17+ messages in thread From: Leo Yan @ 2025-07-14 10:52 UTC (permalink / raw) To: Breno Leitao Cc: Mark Rutland, ankita, bwicaksono, rmk+kernel, catalin.marinas, linux-serial, rmikey, linux-arm-kernel, usamaarif642, linux-kernel, paulmck On Fri, Jul 11, 2025 at 03:45:03AM -0700, Breno Leitao wrote: > On Fri, Jul 11, 2025 at 10:50:23AM +0100, Leo Yan wrote: > > > I've reverted commit 2eb2608618ce ("serial: amba-pl011: Implement nbcon > > > console"), and I don't see the CSD locks anymoer. The serial speed is > > > the same and continue to be slow, but, the CSD lock is not there. Here > > > is the time spent on the serial flush when reverting the commit above > > > > > > [ 0.309561] printk: legacy console [ttyAMA0] enabled > > > [ 8.657938] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-ff]) > > > > From this result, we can know both the atomic path and the thread path > > take a long time polling. > > I am wondering if this slowdown is by design, or, if there is a bug > somewhere. Polling for over 5 seconds (5,001,000,000 ns) seems too long; it should not be by design. > > Since both paths configure the UART clock, I'm curious about the > > behaviour if the UART clock is untouched. The relevant code is shown > > below. > > Is this the clock frequency that is connected to pl011 coming from AMBA? The programming clock is for programming registers, my understanding is the clock is not provided by bus. > > I may seem a bit stubborn in suspecting a clock issue :) But if you > > have confirmed that a standard pl011 UART IP is being used. > > How do I double check this is a pl011 UART IP or if this is being > emulated by firmware/ACPI. The log shows: ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA This should be emulated by firmware by reading Peripheral Identification Registers (see [1]). The most reliable way is to confirm with SoC vendor if the IP is standard or includes any modification. [1] https://developer.arm.com/documentation/ddi0183/g/programmers-model/register-descriptions/peripheral-identification-registers--uartperiphid0-3 > > diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c > > index 71482d639a6d..b04773ba2602 100644 > > --- a/drivers/amba/bus.c > > +++ b/drivers/amba/bus.c > > > I've tested it, but, it seems it didnt' help much. here is the full log [...] > [ 0.307809] ARMH0011:00: ttyAMA0 at MMIO 0xc280000 (irq = 66, base_baud = 0) is a SBSA > [ 0.307816] printk: console [ttyAMA0] enabled Based on this log and followed OOPS, the issue happens just after enabling console. It is a bit tricky that it prints logs during console registration (and configuration). Seems to me, it is good to check pl011_console_setup() or sbsa_uart_set_termios() has any impaction on the long waiting. For example, the flow below will configure pin control and which might impact the data transaction: pl011_console_setup() `> pinctrl_pm_select_default_state() Based on the information shared earlier, the UART FIFO appears to be stalled, which is causing the long wait times. Apologies if my suggestion was unreliable or misled anything, and welcome others to freely correct me. Thanks, Leo > [ 5.414515] smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5001000000 ns for CPU#01 do_nothing+0x0/0x10(0x0). > [ 5.414532] smp: csd: CSD lock (#1) unresponsive. > [ 5.414535] Sending NMI from CPU 0 to CPUs 1: > [ 5.414541] NMI backtrace for cpu 1 > [ 5.414545] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.16.0-rc5-00056-gd7753fd13f9e #86 NONE > [ 5.414548] Hardware name: Quanta JAVA ISLAND PVT 29F0EMAZ049/Java Island, BIOS F0EJ3A13 06/04/2025 > [ 5.414550] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) > [ 5.414552] pc : pl011_console_write_atomic+0x94/0x170 > [ 5.414559] lr : pl011_console_write_atomic+0x84/0x170 > [ 5.414562] sp : ffff80008492f1a0 > [ 5.414564] pmr: 000000c0 > [ 5.414565] x29: ffff80008492f1a0 x28: 0000000000000000 x27: 00000000000000e0 > [ 5.414572] x26: 0000000000000190 x25: 0000000000000000 x24: 0000000000000000 > [ 5.414578] x23: 0000000000000001 x22: 0000000000000000 x21: ffff800080c38ea8 > [ 5.414584] x20: ffff80008492f288 x19: ffff0000b2060080 x18: 00000000ffffffff > [ 5.414590] x17: 6362323230303030 x16: 3030303078303a30 x15: 206e6f6967657220 > [ 5.414595] x14: 3030303064322072 x13: 3030303063623232 x12: 3030303030303030 > [ 5.414601] x11: 78303a30206e6f69 x10: 6765722030303030 x9 : ffff800080c390e4 > [ 5.414607] x8 : 6964657220646e75 x7 : 205d313533383630 x6 : 302e30202020205b > [ 5.414612] x5 : ffff0000ad10200f x4 : 0000000000000000 x3 : 0000000000000008 > [ 5.414618] x2 : ffff8000816336c0 x1 : 0000000000000018 x0 : 0000000000000018 > [ 5.414624] Call trace: > [ 5.414626] pl011_console_write_atomic+0x94/0x170 (P) > [ 5.414630] nbcon_emit_next_record+0x234/0x388 > [ 5.414634] __nbcon_atomic_flush_pending_con+0x88/0x108 > [ 5.414637] __nbcon_atomic_flush_pending+0x108/0x198 > [ 5.414640] nbcon_atomic_flush_pending+0x24/0x38 > [ 5.414643] vprintk_emit+0x200/0x348 > [ 5.414645] vprintk_default+0x3c/0x50 > [ 5.414647] vprintk+0x2c/0x40 > [ 5.414650] _printk+0x50/0x60 > [ 5.414652] register_console+0x424/0x560 > [ 5.414654] serial_core_register_port+0x878/0x898 > [ 5.414659] serial_ctrl_register_port+0x14/0x28 > [ 5.414661] uart_add_one_port+0x14/0x28 > [ 5.414664] pl011_register_port+0x74/0x130 > [ 5.414667] sbsa_uart_probe+0x164/0x1b8 > [ 5.414670] platform_probe+0x8c/0x100 > [ 5.414674] really_probe+0xc4/0x398 > [ 5.414676] __driver_probe_device+0x80/0x1a8 > [ 5.414679] driver_probe_device+0x44/0x120 > [ 5.414681] __device_attach_driver+0xb8/0x158 > [ 5.414683] bus_for_each_drv+0x74/0xc0 > [ 5.414685] __device_attach+0xac/0x1e0 > [ 5.414687] device_initial_probe+0x18/0x28 > [ 5.414690] bus_probe_device+0xa8/0xb8 > [ 5.414692] device_add+0x648/0x830 > [ 5.414697] platform_device_add+0x114/0x280 > [ 5.414700] platform_device_register_full+0x148/0x1b8 > [ 5.414702] acpi_create_platform_device+0x264/0x388 > [ 5.414706] acpi_bus_attach+0x2c8/0x2e8 > [ 5.414709] acpi_dev_for_one_check+0x38/0x48 > [ 5.414711] device_for_each_child+0x54/0xa0 > [ 5.414714] acpi_dev_for_each_child+0x30/0x40 > [ 5.414716] acpi_bus_attach+0x6c/0x2e8 > [ 5.414718] acpi_dev_for_one_check+0x38/0x48 > [ 5.414720] device_for_each_child+0x54/0xa0 > [ 5.414723] acpi_dev_for_each_child+0x30/0x40 > [ 5.414725] acpi_bus_attach+0x6c/0x2e8 > [ 5.414727] acpi_bus_scan+0x5c/0x1b0 > [ 5.414730] acpi_scan_init+0xa0/0x288 > [ 5.414737] acpi_init+0x1e8/0x4b8 > [ 5.414740] do_one_initcall+0x3c/0x2a0 > [ 5.414743] kernel_init_freeable+0x22c/0x460 > [ 5.414746] kernel_init+0x28/0x1f0 > [ 5.414750] ret_from_fork+0x10/0x20 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao 2025-07-02 17:20 ` Leo Yan 2025-07-03 10:28 ` Mark Rutland @ 2025-08-06 11:44 ` Nirmoy Das 2025-08-06 15:53 ` Breno Leitao 2 siblings, 1 reply; 17+ messages in thread From: Nirmoy Das @ 2025-08-06 11:44 UTC (permalink / raw) To: leitao Cc: catalin.marinas, cov, leo.yan, linux-arm-kernel, linux-kernel, linux-serial, mark.rutland, paulmck, rmikey, rmk+kernel, usamaarif642 Hi Breno, I tried the latest kernel which seems to contain a fix. Bisecting it led me to https://patch.msgid.link/20250604142045.253301-1-pmladek@suse.com and that indeed fixes it for me. Could you please give it a try. Regards, Nirmoy ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: arm64: csdlock at early boot due to slow serial (?) 2025-08-06 11:44 ` Nirmoy Das @ 2025-08-06 15:53 ` Breno Leitao 0 siblings, 0 replies; 17+ messages in thread From: Breno Leitao @ 2025-08-06 15:53 UTC (permalink / raw) To: Nirmoy Das Cc: catalin.marinas, cov, leo.yan, linux-arm-kernel, linux-kernel, linux-serial, mark.rutland, paulmck, rmikey, rmk+kernel, usamaarif642 Hello Nirmoy, On Wed, Aug 06, 2025 at 04:44:41AM -0700, Nirmoy Das wrote: > Hi Breno, > > I tried the latest kernel which seems to contain a fix. Bisecting it led me to https://patch.msgid.link/20250604142045.253301-1-pmladek@suse.com and that indeed fixes it for me. > Could you please give it a try. That is awesome. I've played a little bit with it, and I see it solving my problem as well. The machine is booting very fast now, taking 0.8 seconds to hit init. [ 0.811320] Run /init as init process Thanks for the heads-up! --breno ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-08-06 16:16 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-07-02 17:10 arm64: csdlock at early boot due to slow serial (?) Breno Leitao 2025-07-02 17:20 ` Leo Yan 2025-07-03 10:02 ` Breno Leitao 2025-07-03 11:45 ` Leo Yan 2025-07-03 15:01 ` Breno Leitao 2025-07-03 16:17 ` Leo Yan 2025-07-03 10:28 ` Mark Rutland 2025-07-03 14:13 ` Breno Leitao 2025-07-03 16:31 ` Mark Rutland 2025-07-08 14:00 ` Breno Leitao 2025-07-09 14:23 ` Breno Leitao 2025-07-10 13:35 ` Leo Yan 2025-07-10 17:30 ` Breno Leitao 2025-07-11 9:50 ` Leo Yan [not found] ` <cno3lsprrz36gqu27omvwrw27d2ubqtshac3cahfgkhcm2rla7@bpnulcmsyx2u> 2025-07-14 10:52 ` Leo Yan 2025-08-06 11:44 ` Nirmoy Das 2025-08-06 15:53 ` Breno Leitao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).