public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Donghyeok Choe <d7271.choe@samsung.com>,
	linux-kernel@vger.kernel.org, takakura@valinux.co.jp,
	youngmin.nam@samsung.com, hajun.sung@samsung.com,
	seungh.jung@samsung.com, jh1012.choi@samsung.com
Subject: Re: printk: selective deactivation of feature ignoring non panic cpu's messages
Date: Fri, 28 Feb 2025 15:26:34 +0106	[thread overview]
Message-ID: <8434fytakt.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <Z78eGNIuG_-CVOGl@pathway.suse.cz>

On 2025-02-26, Petr Mladek <pmladek@suse.com> wrote:
> I wonder if this is actually safe. I recall that we simplified the
> design somewhere because we expected that non-panic CPUs will not
> add messages.

Perhaps you are recalling commit 7412dc6d55ee ("dump_stack: Do not get
cpu_sync for panic CPU").

> I am not sure that I found all locations. But
> we might want to revise:
>
>
> 1st problem: _prb_read_valid() skips non-finalized records on non-panic CPUs.
>
>    opinion: We should not do it in this case.

I don't know. This could result in seeing even less output if some CPU
didn't finalize a record.

> 2nd problem: Is _prb_read_valid() actually safe when
> 	panic_triggering_all_cpu_backtrace is true?
>
>    opinion: It should be safe because the backtraces from different CPUs
> 	are serialized via printk_cpu_sync_get_irqsave().

To clarify, by "safe" you mean it does not skip backtrace records from
other CPUs.

It does not skip their records because trigger_all_cpu_backtrace() waits
(up to 10 seconds) for the other CPUs to finish storing their backtraces
before continuing.

The use of the printk_cpu_sync is only to avoid interweaving the
multiple non-panic CPU backtraces.

> 3rd problem: nbcon_get_default_prio() returns NBCON_PRIO_NORMAL on
> 	non-panic CPUs. As a result, printk_get_console_flush_type()
> 	would suggest flushing like when the system works as expected.
>
> 	But the legacy-loop will bail out after flushing one
> 	message on one console, see console_flush_all(). It is weird
> 	behavior.

I believe you are talking about commit 8ebc476fd51e ("printk: Drop
console_sem during panic")? And also be aware of commit 51a1d258e50e
("printk: Keep non-panic-CPUs out of console lock").

> 	Another question is who would flush the messages when the panic()
> 	CPU does not reach the explicit flush.

Nobody. That is by design.

>    opinion: We should probably try to flush the messages on non-panic
> 	CPUs in this mode when safe. This is why I prefer the name
> 	"printk_debug_non_panic_cpus".
>
> 	We should update console_flush_all() to do not bail out when
> 	the new option is set.

It is not that simple. Legacy printing involves acquiring the
console_lock, which currently is not possible for non-panic CPUs during
panic.

> 	We should call nbcon_atomic_flush_pending() on non-panic CPUs
> 	when the new option is set. printk_get_console_flush_type()
> 	should behave like with NBCON_PRIO_EMERGENCY.

Note that non-panic CPUs during panic are also forbidden from acquiring
console ownership.

> 	Maybe, nbcon_get_default_prio() should actually return
> 	NBCON_PRIO_EMERGENCY on non-panic CPUs when this option is set.
> 	It allow the non-panic CPUs to take over the nbcon context
> 	from the potentially frozen kthread.

Note that nbcon_waiter_matches() requires that "Only one CPU is allowed
to request PANIC priority."

IMO it is opening up a huge can of worms to start allowing non-panic
CPUs to acquire the console_lock, take over console ownership, and print
(possibly with PANIC priority).

You could argue that this is just a debug mode. But I don't think that
is justification to allow things to explode and possibly deadlock, when
they otherwise would not have.

Perhaps Donghyeok might be happy enough if the debug option simply
allowed the non-panic CPUs to store records. There are plenty of tools
available to get at the dmesg buffer.

John

  reply	other threads:[~2025-02-28 14:20 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20250226031756epcas2p3674cccc82687effb40575aa5fa2956e0@epcas2p3.samsung.com>
2025-02-26  3:16 ` printk: selective deactivation of feature ignoring non panic cpu's messages Donghyeok Choe
2025-02-26  4:25   ` John Ogness
2025-02-26 13:58     ` Petr Mladek
2025-02-28 14:20       ` John Ogness [this message]
2025-03-04  2:01         ` Donghyeok Choe
2025-03-04 13:22         ` Petr Mladek
2025-03-04 13:59           ` John Ogness
2025-03-04 14:15             ` Petr Mladek
2025-03-17  5:06               ` Donghyeok Choe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8434fytakt.fsf@jogness.linutronix.de \
    --to=john.ogness@linutronix.de \
    --cc=d7271.choe@samsung.com \
    --cc=hajun.sung@samsung.com \
    --cc=jh1012.choi@samsung.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=seungh.jung@samsung.com \
    --cc=takakura@valinux.co.jp \
    --cc=youngmin.nam@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox