From: John Ogness <john.ogness@linutronix.de>
To: pmladek@suse.com
Cc: "Toshiyuki Sato (Fujitsu)" <fj6611ie@fujitsu.com>,
'Michael Kelley' <mhklinux@outlook.com>,
'Ryo Takakura' <ryotkkr98@gmail.com>,
Russell King <linux@armlinux.org.uk>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jiri Slaby <jirislaby@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-serial@vger.kernel.org" <linux-serial@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: RE: Problem with nbcon console and amba-pl011 serial port
Date: Tue, 03 Jun 2025 13:15:38 +0206
Message-ID: <84plfl5bf1.fsf@jogness.linutronix.de>
In-Reply-To: <84y0u95e0j.fsf@jogness.linutronix.de>
Hi Petr,

On 2025-06-03, John Ogness <john.ogness@linutronix.de> wrote:
> On 2025-06-03, "Toshiyuki Sato (Fujitsu)" <fj6611ie@fujitsu.com> wrote:
>>> 4. pr_emerg() has a high logging level, and it effectively steals the console
>>> from the "pr/ttyAMA0" task, which I believe is intentional in the nbcon design.
>>> Down in pl011_console_write_thread(), the "pr/ttyAMA0" task is doing
>>> nbcon_enter_unsafe() and nbcon_exit_unsafe() around each character
>>> that it outputs. When pr_emerg() steals the console, nbcon_exit_unsafe()
>>> returns 0, so the "for" loop exits. pl011_console_write_thread() then
>>> enters a busy "while" loop waiting to reclaim the console. It's doing this
>>> busy "while" loop with interrupts disabled, and because of the panic,
>>> it never succeeds. Whatever CPU is running "pr/ttyAMA0" is effectively
>>> stuck at this point.
>>>
>>> 5. Meanwhile panic() continues, calling panic_other_cpus_shutdown(). On
>>> ARM64, other CPUs are stopped by sending them an IPI. Each CPU receives
>>> the IPI and calls the PSCI function to stop itself. But the CPU running
>>> "pr/ttyAMA0" is looping forever with interrupts disabled, so it never
>>> processes the IPI and it never stops. ARM64 doesn't have a true NMI that
>>> can override the looping with interrupts disabled, so there's no way to
>>> stop that CPU.
>>>
>>> 6. The failure to stop the "pr/ttyAMA0" CPU then causes downstream
>>> problems, such as when loading and running a kdump kernel.
>
> [...]
>
>> After reproducing the issue,
>> I plan to try a workaround that forcibly terminates the nbcon_reacquire_nobuf
>> loop in pl011_console_write_thread if other_cpu_in_panic is true.
>> Please comment if you have any other ideas.
>
> For panic, if it is OK to leave uap->clk enabled and not restore REG_CR,
> then it should be fine to just return. But only for panic.
>
> So something like:
>
> while (!nbcon_enter_unsafe(wctxt)) {
> 	if (other_cpu_in_panic())
> 		return;
> 	nbcon_reacquire_nobuf(wctxt);
> }

Actually, this is not enough, because there is also a loop inside
nbcon_reacquire_nobuf().

nbcon_reacquire_nobuf() needs to return an error for the panic case,
because in that case it will never succeed. This is the only case where
it can never succeed. Should we use a bool, or return an error code such
as -EPERM?

So the above code becomes:

while (!nbcon_enter_unsafe(wctxt)) {
	if (!nbcon_reacquire_nobuf(wctxt))
		return;
}

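The proposed control flow can be exercised in a small single-threaded userspace model. This is a sketch, not kernel code: every sim_* name, the struct sim_console layout, and the way a takeover is simulated are hypothetical. The model assumes only what is described above: a console taken over by a CPU in panic is never handed back, while any other owner eventually releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-threaded model of nbcon console ownership.
 * None of the sim_* names exist in the kernel; they only mimic the
 * roles of nbcon_enter_unsafe(), nbcon_exit_unsafe(),
 * nbcon_reacquire_nobuf() and other_cpu_in_panic().
 */
#define NO_CPU (-1)

struct sim_console {
	int owner;      /* CPU owning the console, NO_CPU if free */
	int panic_cpu;  /* CPU in panic, NO_CPU if none */
	int thief;      /* CPU that takes over during the write */
	int steal_at;   /* takeover after this many chars, -1 = never */
	char out[64];   /* characters emitted by the printer thread */
	size_t len;
};

static bool sim_other_cpu_in_panic(const struct sim_console *con, int cpu)
{
	return con->panic_cpu != NO_CPU && con->panic_cpu != cpu;
}

/* Entering/leaving an unsafe section only succeeds while we own it. */
static bool sim_enter_unsafe(const struct sim_console *con, int cpu)
{
	return con->owner == cpu;
}

static bool sim_exit_unsafe(const struct sim_console *con, int cpu)
{
	return con->owner == cpu;
}

/* Returns false only when another CPU is in panic: it never hands back. */
static bool sim_reacquire_nobuf(struct sim_console *con, int cpu)
{
	for (;;) {
		if (con->owner == NO_CPU || con->owner == cpu) {
			con->owner = cpu;
			return true;
		}
		if (sim_other_cpu_in_panic(con, cpu))
			return false;
		/* Model: a non-panic owner finishes its record and releases. */
		con->owner = NO_CPU;
	}
}

/* Returns true if the hardware-restore step was reached. */
static bool sim_write_thread(struct sim_console *con, int cpu, const char *msg)
{
	for (size_t i = 0; msg[i] != '\0' && con->len < sizeof(con->out); i++) {
		if (!sim_enter_unsafe(con, cpu))
			break;
		con->out[con->len++] = msg[i];
		if ((int)con->len == con->steal_at)
			con->owner = con->thief; /* e.g. pr_emerg() taking over */
		if (!sim_exit_unsafe(con, cpu))
			break;
	}

	/* Proposed shape: give up instead of spinning forever. */
	while (!sim_enter_unsafe(con, cpu)) {
		if (!sim_reacquire_nobuf(con, cpu))
			return false; /* real driver: clk left enabled, REG_CR not restored */
	}

	/* ... restore hardware state, then exit the unsafe section ... */
	return true;
}
```

In this model, when CPU 1 is in panic and takes over after two characters, sim_write_thread() returns false instead of spinning; a takeover by a non-panic CPU is waited out as before. An int return (0 or -EPERM) would work the same way, with the caller testing for a non-zero value; the bool keeps the call site shortest.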
We should also add __must_check to the prototype.
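
The kernel's __must_check expands to the GCC/Clang warn_unused_result attribute, so with it on the prototype a caller that silently ignores the new return value gets a compiler warning. A minimal userspace sketch follows; MUST_CHECK and the sim_* functions are hypothetical stand-ins, not the kernel's declarations.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's __must_check, which expands to the
 * GCC/Clang warn_unused_result attribute.
 */
#define MUST_CHECK __attribute__((warn_unused_result))

/* Hypothetical prototype shape; not the kernel's actual declaration. */
static bool MUST_CHECK sim_reacquire_nobuf(bool other_cpu_in_panic)
{
	/* Model: reacquiring fails only while another CPU is in panic. */
	return !other_cpu_in_panic;
}

/* Returns 0 if the restore step may run, -1 if the caller must bail out. */
static int sim_write_tail(bool other_cpu_in_panic)
{
	/*
	 * A bare "sim_reacquire_nobuf(other_cpu_in_panic);" statement here
	 * would make the compiler warn (-Wunused-result), which is what
	 * the annotation is meant to catch.
	 */
	if (!sim_reacquire_nobuf(other_cpu_in_panic))
		return -1;
	return 0;
}
```
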
Thoughts?

John