Re: Problem with nbcon console and amba-pl011 serial port

All of lore.kernel.org
 help / color / mirror / Atom feed

From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: "Toshiyuki Sato (Fujitsu)" <fj6611ie@fujitsu.com>,
	'Michael Kelley' <mhklinux@outlook.com>,
	'Ryo Takakura' <ryotkkr98@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jirislaby@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-serial@vger.kernel.org" <linux-serial@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: Problem with nbcon console and amba-pl011 serial port
Date: Fri, 06 Jun 2025 19:04:22 +0206	[thread overview]
Message-ID: <84msakdcy9.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <aEL0tZgSEhsR9qbf@pathway.suse.cz>

On 2025-06-06, Petr Mladek <pmladek@suse.com> wrote:
>> What if during non-panic-CPU shutdown, we allow reacquires to succeed
>> only for _direct_ acquires? The below diff shows how this could be
>> implemented. Since it only supports direct acquires, it does not violate
>> any state rules. And also, since it only involves the reacquire, there
>> is no ugly battling for console printing between the panic and non-panic
>> CPUs.
>
> Interesting idea. I thought a lot about it, see below.
>
>
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index 5b462029d03c1..d58ebdc8170b3 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index b0b9a8bf4560d..8f572630c9f7e 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -304,6 +310,8 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>  		smp_send_stop();
>>  	else
>>  		crash_smp_send_stop();
>> +
>> +	nbcon_panic_allow_reacquire_set(false);
>>  }
>
> I have two concerns here:
>
> 1. I wonder whether this is reliable enough. It seems that
>    smp_send_stop() waits at least 1 sec until the CPUs
>    get stopped. But is this enough on virtualized systems?
>
> 2. It might increase a risk when CPUs are stopped using NMI.
>    The change would allow a non-panic CPU to reacquire the ownership
>    and enter _unsafe_ section right before being stopped by NMI.
>
>
> The 1st problem might be avoided by allowing the reacquire all
> the time unless an "unsafe" takeover happened.
>
> The 2nd problem is worse. But allowing the reacquire all the time
> might actually help as well.
>
> Note that the information about the "unsafe_takeover" is stored
> in struct console so that we even won't need a new global
> variable.

That is a good idea. We should add unsafe_takeover as a condition as
well. That can only happen way after the smp_send_stop() anyway. But it
means that only the atomic printing code would ever need to worry about
unsafe_takeover (assuming a driver were to implement some sort of
handling of it).

>>  /**
>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> index d60596777d278..d960cb8a05558 100644
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -235,7 +235,8 @@ static void nbcon_seq_try_update(struct nbcon_context *ctxt, u64 new_seq)
>>   *			the handover acquire method.
>>   */
>>  static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>> -					    struct nbcon_state *cur)
>> +					    struct nbcon_state *cur,
>> +					    bool ignore_other_cpu_in_panic)
>>  {
>>  	unsigned int cpu = smp_processor_id();
>>  	struct console *con = ctxt->console;
>> @@ -249,7 +250,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>>  		 * nbcon_waiter_matches(). In particular, the assumption that
>>  		 * lower priorities are ignored during panic.
>>  		 */
>> -		if (other_cpu_in_panic())
>> +		if (other_cpu_in_panic() && !ignore_other_cpu_in_panic)
>
> If you agree with allowing the reacquire all the time then I would
> rename the parameter to @is_reacquire and do something like:
>
> 	if (other_cpu_in_panic() &&
> 	   (!is_reacquire || cur->unsafe_takeover))

Looks fine to me.

>>  			return -EPERM;
>>  
>>  		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
>> @@ -913,7 +920,7 @@ void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt)
>>  {
>>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
>>  
>> -	while (!nbcon_context_try_acquire(ctxt))
>> +	while (!nbcon_context_try_acquire(ctxt, READ_ONCE(nbcon_panic_allow_reacquire)))
>
> And here it would be:
>
> 	while (!nbcon_context_try_acquire(ctxt, true))

Right.

>>  		cpu_relax();
>>  
>>  	nbcon_write_context_set_buf(wctxt, NULL, 0);
>
>
> Summary:
>
> I open to give this alternative approach a chance when we allow the
> reacquire all the time. It might work well. And it won't require
> any special "panic" handling in all console drivers.

Agreed. Thanks for being open about this approach. I will put together
an official patch.

John

next prev parent reply	other threads:[~2025-06-06 17:00 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-03  3:18 Problem with nbcon console and amba-pl011 serial port Michael Kelley
2025-06-03  9:03 ` Ryo Takakura
2025-06-03  9:36 ` Toshiyuki Sato (Fujitsu)
2025-06-03 10:13   ` John Ogness
2025-06-03 10:44     ` John Ogness
2025-06-04  1:22       ` Toshiyuki Sato (Fujitsu)
2025-06-04  7:44         ` John Ogness
2025-06-04  8:11           ` Russell King (Oracle)
2025-06-03 11:09     ` John Ogness
2025-06-04  4:11       ` Toshiyuki Sato (Fujitsu)
2025-06-04  7:52         ` John Ogness
2025-06-04 11:08         ` Petr Mladek
2025-06-04 11:50           ` John Ogness
2025-06-04 13:42             ` Petr Mladek
2025-06-05  5:27               ` Toshiyuki Sato (Fujitsu)
2025-06-05 13:39                 ` Petr Mladek
2025-06-06  6:46                   ` Toshiyuki Sato (Fujitsu)
2025-06-06 10:19                   ` John Ogness
2025-06-06 10:35                     ` John Ogness
2025-06-06 14:01                     ` Petr Mladek
2025-06-06 16:58                       ` John Ogness [this message]
2025-06-05  2:49         ` Michael Kelley
2025-06-05  6:22           ` Toshiyuki Sato (Fujitsu)
2025-06-05  7:42             ` John Ogness
2025-06-09  3:38               ` Michael Kelley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84msakdcy9.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=fj6611ie@fujitsu.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jirislaby@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=mhklinux@outlook.com \
    --cc=pmladek@suse.com \
    --cc=ryotkkr98@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.