public inbox for linux-serial@vger.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: "Toshiyuki Sato (Fujitsu)" <fj6611ie@fujitsu.com>,
	'Michael Kelley' <mhklinux@outlook.com>,
	'Ryo Takakura' <ryotkkr98@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jirislaby@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-serial@vger.kernel.org" <linux-serial@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: Problem with nbcon console and amba-pl011 serial port
Date: Fri, 06 Jun 2025 19:04:22 +0206	[thread overview]
Message-ID: <84msakdcy9.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <aEL0tZgSEhsR9qbf@pathway.suse.cz>

On 2025-06-06, Petr Mladek <pmladek@suse.com> wrote:
>> What if during non-panic-CPU shutdown, we allow reacquires to succeed
>> only for _direct_ acquires? The below diff shows how this could be
>> implemented. Since it only supports direct acquires, it does not violate
>> any state rules. And also, since it only involves the reacquire, there
>> is no ugly battling for console printing between the panic and non-panic
>> CPUs.
>
> Interesting idea. I thought a lot about it, see below.
>
>
>> diff --git a/include/linux/printk.h b/include/linux/printk.h
>> index 5b462029d03c1..d58ebdc8170b3 100644
>> --- a/include/linux/printk.h
>> +++ b/include/linux/printk.h
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index b0b9a8bf4560d..8f572630c9f7e 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -304,6 +310,8 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>  		smp_send_stop();
>>  	else
>>  		crash_smp_send_stop();
>> +
>> +	nbcon_panic_allow_reacquire_set(false);
>>  }
>
> I have two concerns here:
>
> 1. I wonder whether this is reliable enough. It seems that
>    smp_send_stop() waits at least 1 sec until the CPUs
>    get stopped. But is this enough on virtualized systems?
>
> 2. It might increase a risk when CPUs are stopped using NMI.
>    The change would allow a non-panic CPU to reacquire the ownership
>    and enter _unsafe_ section right before being stopped by NMI.
>
>
> The 1st problem might be avoided by allowing the reacquire all
> the time unless an "unsafe" takeover happened.
>
> The 2nd problem is worse. But allowing the reacquire all the time
> might actually help as well.
>
> Note that the information about the "unsafe_takeover" is stored
> in struct console so that we even won't need a new global
> variable.

That is a good idea. We should add unsafe_takeover as a condition as
well. That can only happen way after the smp_send_stop() anyway. But it
means that only the atomic printing code would ever need to worry about
unsafe_takeover (assuming a driver were to implement some sort of
handling of it).

>>  /**
>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> index d60596777d278..d960cb8a05558 100644
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -235,7 +235,8 @@ static void nbcon_seq_try_update(struct nbcon_context *ctxt, u64 new_seq)
>>   *			the handover acquire method.
>>   */
>>  static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>> -					    struct nbcon_state *cur)
>> +					    struct nbcon_state *cur,
>> +					    bool ignore_other_cpu_in_panic)
>>  {
>>  	unsigned int cpu = smp_processor_id();
>>  	struct console *con = ctxt->console;
>> @@ -249,7 +250,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>>  		 * nbcon_waiter_matches(). In particular, the assumption that
>>  		 * lower priorities are ignored during panic.
>>  		 */
>> -		if (other_cpu_in_panic())
>> +		if (other_cpu_in_panic() && !ignore_other_cpu_in_panic)
>
> If you agree with allowing the reacquire all the time then I would
> rename the parameter to @is_reacquire and do something like:
>
> 	if (other_cpu_in_panic() &&
> 	   (!is_reacquire || cur->unsafe_takeover))

Looks fine to me.

>>  			return -EPERM;
>>  
>>  		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
>> @@ -913,7 +920,7 @@ void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt)
>>  {
>>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
>>  
>> -	while (!nbcon_context_try_acquire(ctxt))
>> +	while (!nbcon_context_try_acquire(ctxt, READ_ONCE(nbcon_panic_allow_reacquire)))
>
> And here it would be:
>
> 	while (!nbcon_context_try_acquire(ctxt, true))

Right.

>>  		cpu_relax();
>>  
>>  	nbcon_write_context_set_buf(wctxt, NULL, 0);
>
>
> Summary:
>
> I open to give this alternative approach a chance when we allow the
> reacquire all the time. It might work well. And it won't require
> any special "panic" handling in all console drivers.

Agreed. Thanks for being open about this approach. I will put together
an official patch.

John

  reply	other threads:[~2025-06-06 16:58 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-03  3:18 Problem with nbcon console and amba-pl011 serial port Michael Kelley
2025-06-03  9:03 ` Ryo Takakura
2025-06-03  9:36 ` Toshiyuki Sato (Fujitsu)
2025-06-03 10:13   ` John Ogness
2025-06-03 10:44     ` John Ogness
2025-06-04  1:22       ` Toshiyuki Sato (Fujitsu)
2025-06-04  7:44         ` John Ogness
2025-06-04  8:11           ` Russell King (Oracle)
2025-06-03 11:09     ` John Ogness
2025-06-04  4:11       ` Toshiyuki Sato (Fujitsu)
2025-06-04  7:52         ` John Ogness
2025-06-04 11:08         ` Petr Mladek
2025-06-04 11:50           ` John Ogness
2025-06-04 13:42             ` Petr Mladek
2025-06-05  5:27               ` Toshiyuki Sato (Fujitsu)
2025-06-05 13:39                 ` Petr Mladek
2025-06-06  6:46                   ` Toshiyuki Sato (Fujitsu)
2025-06-06 10:19                   ` John Ogness
2025-06-06 10:35                     ` John Ogness
2025-06-06 14:01                     ` Petr Mladek
2025-06-06 16:58                       ` John Ogness [this message]
2025-06-05  2:49         ` Michael Kelley
2025-06-05  6:22           ` Toshiyuki Sato (Fujitsu)
2025-06-05  7:42             ` John Ogness
2025-06-09  3:38               ` Michael Kelley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84msakdcy9.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=fj6611ie@fujitsu.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jirislaby@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-serial@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=mhklinux@outlook.com \
    --cc=pmladek@suse.com \
    --cc=ryotkkr98@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox