From: John Ogness <john.ogness@linutronix.de>
To: pmladek@suse.com
Cc: "Toshiyuki Sato (Fujitsu)" <fj6611ie@fujitsu.com>,
'Michael Kelley' <mhklinux@outlook.com>,
'Ryo Takakura' <ryotkkr98@gmail.com>,
Russell King <linux@armlinux.org.uk>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jiri Slaby <jirislaby@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-serial@vger.kernel.org" <linux-serial@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>
Subject: RE: Problem with nbcon console and amba-pl011 serial port
Date: Tue, 03 Jun 2025 13:15:38 +0206
Message-ID: <84plfl5bf1.fsf@jogness.linutronix.de>
In-Reply-To: <84y0u95e0j.fsf@jogness.linutronix.de>
Hi Petr,
On 2025-06-03, John Ogness <john.ogness@linutronix.de> wrote:
> On 2025-06-03, "Toshiyuki Sato (Fujitsu)" <fj6611ie@fujitsu.com> wrote:
>>> 4. pr_emerg() has a high logging level, and it effectively steals the console
>>> from the "pr/ttyAMA0" task, which I believe is intentional in the nbcon design.
>>> Down in pl011_console_write_thread(), the "pr/ttyAMA0" task is doing
>>> nbcon_enter_unsafe() and nbcon_exit_unsafe() around each character
>>> that it outputs. When pr_emerg() steals the console, nbcon_exit_unsafe()
>>> returns 0, so the "for" loop exits. pl011_console_write_thread() then
>>> enters a busy "while" loop waiting to reclaim the console. It's doing this
>>> busy "while" loop with interrupts disabled, and because of the panic,
>>> it never succeeds. Whatever CPU is running "pr/ttyAMA0" is effectively
>>> stuck at this point.
>>>
>>> 5. Meanwhile panic() continues, calling panic_other_cpus_shutdown(). On
>>> ARM64, other CPUs are stopped by sending them an IPI. Each CPU receives
>>> the IPI and calls the PSCI function to stop itself. But the CPU running
>>> "pr/ttyAMA0" is looping forever with interrupts disabled, so it never
>>> processes the IPI and it never stops. ARM64 doesn't have a true NMI that
>>> can override the looping with interrupts disabled, so there's no way to
>>> stop that CPU.
>>>
>>> 6. The failure to stop the "pr/ttyAMA0" CPU then causes downstream
>>> problems, such as when loading and running a kdump kernel.
>
> [...]
>
>> After reproducing the issue,
>> I plan to try a workaround that forcibly terminates the nbcon_reacquire_nobuf
>> loop in pl011_console_write_thread if other_cpu_in_panic is true.
>> Please comment if you have any other ideas.
>
> For panic, if it is OK to leave uap->clk enabled and not restore REG_CR,
> then it should be fine to just return. But only for panic.
>
> So something like:
>
> 	while (!nbcon_enter_unsafe(wctxt)) {
> 		if (other_cpu_in_panic())
> 			return;
> 		nbcon_reacquire_nobuf(wctxt);
> 	}

Actually this is not enough because there is also a loop inside
nbcon_reacquire_nobuf().

nbcon_reacquire_nobuf() needs to return an error for the panic case
because it will never succeed. This is the only case where it will never
succeed. Should we use a bool? Or return some code like -EPERM?

So the above code becomes:

	while (!nbcon_enter_unsafe(wctxt)) {
		if (!nbcon_reacquire_nobuf(wctxt))
			return;
	}

We should also add __must_check to the prototype.

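To make the control flow concrete, here is a rough user-space sketch of
the bool variant. It is not kernel code: every mock_* name, the struct
layout, and the "console becomes free after 3 polls" behavior are
assumptions made up for illustration, and MOCK_MUST_CHECK only stands in
for __must_check via the GCC/Clang warn_unused_result attribute.

```c
#include <stdbool.h>

/* Stand-in for the kernel's __must_check (GCC/Clang attribute). */
#define MOCK_MUST_CHECK __attribute__((warn_unused_result))

/* Hypothetical mock of a console write context; not the kernel struct. */
struct mock_wctxt {
	bool owned;           /* does this context currently own the console? */
	bool panic_elsewhere; /* models other_cpu_in_panic() returning true */
	int spins;            /* how many times the reacquire loop polled */
};

/* Models other_cpu_in_panic(). */
static bool mock_other_cpu_in_panic(const struct mock_wctxt *w)
{
	return w->panic_elsewhere;
}

/* Models nbcon_enter_unsafe(): succeeds only while we own the console. */
static bool mock_enter_unsafe(const struct mock_wctxt *w)
{
	return w->owned;
}

/*
 * Models the proposed variant of nbcon_reacquire_nobuf(): returns true
 * once ownership is regained, false if another CPU is in panic and
 * ownership can therefore never be regained.
 */
static MOCK_MUST_CHECK bool mock_reacquire_nobuf(struct mock_wctxt *w)
{
	while (!w->owned) {
		if (mock_other_cpu_in_panic(w))
			return false;
		w->spins++;
		/* Assumption of this mock: the console frees up after 3 polls. */
		if (w->spins >= 3)
			w->owned = true;
	}
	return true;
}

/*
 * Models the tail of the driver's write callback. Returns true if the
 * caller may go on to restore hardware state, false if it must bail out.
 */
static bool mock_write_thread_tail(struct mock_wctxt *w)
{
	while (!mock_enter_unsafe(w)) {
		if (!mock_reacquire_nobuf(w))
			return false; /* panic: return without restoring state */
	}
	/* Ownership regained: a real driver would restore registers here. */
	return true;
}
```

With the attribute in place, a caller that ignores the return value gets
a compiler warning, which is the point of adding __must_check: the only
way out of the loop in the panic case is to check the result and return.
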
Thoughts?
John