public inbox for linux-kernel@vger.kernel.org
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Donghyeok Choe <d7271.choe@samsung.com>,
	linux-kernel@vger.kernel.org, takakura@valinux.co.jp,
	youngmin.nam@samsung.com, hajun.sung@samsung.com,
	seungh.jung@samsung.com, jh1012.choi@samsung.com
Subject: Re: printk: selective deactivation of feature ignoring non panic cpu's messages
Date: Wed, 26 Feb 2025 14:58:48 +0100
Message-ID: <Z78eGNIuG_-CVOGl@pathway.suse.cz>
In-Reply-To: <84ikoxxrfy.fsf@jogness.linutronix.de>

On Wed 2025-02-26 05:31:53, John Ogness wrote:
> Hi Donghyeok,
> 
> On 2025-02-26, Donghyeok Choe <d7271.choe@samsung.com> wrote:
> > I would like to print out the message of non panic cpu as it is.
> > Can I use early_param to selectively disable that feature?
> 
> I have no issues about allowing this type of feature for debugging
> purposes.

Yes. It makes sense. Another scenario might be when
panic_other_cpus_shutdown() is not able to stop some CPUs.
It might be useful to see messages from the problematic ones.

> I do not know if early_param is the best approach. I expect
> Petr will offer good insight here.

early_param() looks good to me. There are already similar early
parameters, for example, "ignore_loglevel".


> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index fb242739aec8..3f420e8bdb2c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2368,6 +2368,17 @@ void printk_legacy_allow_panic_sync(void)
> >         }
> >  }
> >
> > +static bool __read_mostly keep_printk_all_cpu_in_panic;
> > +
> > +static int __init keep_printk_all_cpu_in_panic_setup(char *str)
> > +{
> > +       keep_printk_all_cpu_in_panic = true;
> > +       pr_info("printk: keep printk all cpu in panic.\n");
> > +
> > +       return 0;
> > +}
> > +early_param("keep_printk_all_cpu_in_panic", keep_printk_all_cpu_in_panic_setup);
> 
> Quite a long argument. I am horrible at naming. I expect Petr would have
> a good suggestion (if early_param is the way to go).

Heh. It seems to be hard to find a good name ;-)

Anyway, I would use the "printk_" prefix to make it clear that
it is printk-related. The following come to mind:

  + printk_allow_non_panic_cpus
  + printk_keep_non_panic_cpus
  + printk_debug_non_panic_cpus

I prefer "printk_debug_non_panic_cpus", see below.


> >  asmlinkage int vprintk_emit(int facility, int level,
> >                             const struct dev_printk_info *dev_info,
> >                             const char *fmt, va_list args)
> > @@ -2379,13 +2390,15 @@ asmlinkage int vprintk_emit(int facility, int level,
> >         if (unlikely(suppress_printk))
> >                 return 0;
> >
> > -       /*
> > -        * The messages on the panic CPU are the most important. If
> > -        * non-panic CPUs are generating any messages, they will be
> > -        * silently dropped.
> > -        */
> > -       if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > -               return 0;
> > +       if (!keep_printk_all_cpu_in_panic) {
> > +               /*
> > +                * The messages on the panic CPU are the most important. If
> > +                * non-panic CPUs are generating any messages, they will be
> > +                * silently dropped.
> > +                */
> > +               if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > +                       return 0;
> > +       }
> 
> I would not nest it. Just something like:
> 
> 	/*
> 	 * The messages on the panic CPU are the most important. If
> 	 * non-panic CPUs are generating any messages, they may be
> 	 * silently dropped.
> 	 */
> 	if (!keep_printk_all_cpu_in_panic &&
> 	    !panic_triggering_all_cpu_backtrace &&
> 	    other_cpu_in_panic()) {
> 		return 0;
> 	}

I would prefer this form as well.
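
To make the flattened condition concrete, here is a minimal userspace
sketch of the drop decision. It is only a model, not kernel code: the
three booleans stand in for the kernel state, the function name
model_drop_message() is made up for illustration, and
printk_debug_non_panic_cpus is just the option name proposed above.

```c
#include <stdbool.h>

/*
 * Userspace model only. The parameters stand in for the proposed
 * option (hypothetical name), the panic_triggering_all_cpu_backtrace
 * flag, and the result of other_cpu_in_panic().
 */
bool model_drop_message(bool printk_debug_non_panic_cpus,
			bool panic_triggering_all_cpu_backtrace,
			bool other_cpu_in_panic)
{
	/* A message is dropped only when neither exception applies. */
	return !printk_debug_non_panic_cpus &&
	       !panic_triggering_all_cpu_backtrace &&
	       other_cpu_in_panic;
}
```

With the option set, a message from a non-panic CPU is no longer
dropped; with the option clear, the behavior matches the existing check.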

Thinking out loud:

I wonder if this is actually safe. I recall that we simplified the
design in some places because we expected that non-panic CPUs would
not add messages. I am not sure that I have found all the locations,
but we might want to revisit the following:


1st problem: _prb_read_valid() skips non-finalized records on non-panic CPUs.

   opinion: We should not do it in this case.


2nd problem: Is _prb_read_valid() actually safe when
	panic_triggering_all_cpu_backtrace is true?

   opinion: It should be safe because the backtraces from different CPUs
	are serialized via printk_cpu_sync_get_irqsave().


3rd problem: nbcon_get_default_prio() returns NBCON_PRIO_NORMAL on
	non-panic CPUs. As a result, printk_get_console_flush_type()
	would suggest flushing like when the system works as expected.

	But the legacy loop will bail out after flushing one
	message on one console, see console_flush_all(). That is
	weird behavior.

	Another question is who would flush the messages when the panic()
	CPU does not reach the explicit flush.

   opinion: We should probably try to flush the messages on non-panic
	CPUs in this mode when safe. This is why I prefer the name
	"printk_debug_non_panic_cpus".

	We should update console_flush_all() so that it does not bail
	out when the new option is set.

	We should call nbcon_atomic_flush_pending() on non-panic CPUs
	when the new option is set. printk_get_console_flush_type()
	should behave like with NBCON_PRIO_EMERGENCY.

	Maybe nbcon_get_default_prio() should actually return
	NBCON_PRIO_EMERGENCY on non-panic CPUs when this option is set.
	It would allow the non-panic CPUs to take over the nbcon context
	from the potentially frozen kthread.
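
A rough userspace sketch of the last idea, under stated assumptions:
the enum is assumed to mirror the ordering of the kernel's nbcon
priorities, all model_* and MODEL_* names are hypothetical, and the
booleans stand in for this_cpu_in_panic(), other_cpu_in_panic(), the
per-CPU emergency nesting state, and the proposed option. It only
illustrates the priority choice, not the flushing itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the nbcon priorities, lowest first. */
enum model_prio {
	MODEL_PRIO_NONE = 0,
	MODEL_PRIO_NORMAL,
	MODEL_PRIO_EMERGENCY,
	MODEL_PRIO_PANIC,
};

enum model_prio model_default_prio(bool this_cpu_in_panic,
				   bool other_cpu_in_panic,
				   bool in_emergency_section,
				   bool printk_debug_non_panic_cpus)
{
	if (this_cpu_in_panic)
		return MODEL_PRIO_PANIC;

	if (in_emergency_section)
		return MODEL_PRIO_EMERGENCY;

	/*
	 * Proposed change: with the debug option set, a non-panic CPU
	 * is raised to emergency priority while another CPU panics, so
	 * that it could take over from a potentially frozen kthread.
	 */
	if (printk_debug_non_panic_cpus && other_cpu_in_panic)
		return MODEL_PRIO_EMERGENCY;

	return MODEL_PRIO_NORMAL;
}
```

The panic CPU still outranks everything else in this model, so the
option would not let a non-panic CPU take the context away from it.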


Best Regards,
Petr
