All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Donghyeok Choe <d7271.choe@samsung.com>,
	linux-kernel@vger.kernel.org, takakura@valinux.co.jp,
	youngmin.nam@samsung.com, hajun.sung@samsung.com,
	seungh.jung@samsung.com, jh1012.choi@samsung.com
Subject: Re: printk: selective deactivation of feature ignoring non panic cpu's messages
Date: Wed, 26 Feb 2025 14:58:48 +0100	[thread overview]
Message-ID: <Z78eGNIuG_-CVOGl@pathway.suse.cz> (raw)
In-Reply-To: <84ikoxxrfy.fsf@jogness.linutronix.de>

On Wed 2025-02-26 05:31:53, John Ogness wrote:
> Hi Donghyeok,
> 
> On 2025-02-26, Donghyeok Choe <d7271.choe@samsung.com> wrote:
> > I would like to print out the message of non panic cpu as it is.
> > Can I use early_param to selectively disable that feature?
> 
> I have no issues about allowing this type of feature for debugging
> purposes.

Yes. It makes sense. Another scenario might be when
panic_other_cpus_shutdown() is not able to stop some CPUs.
It might be useful to see messages from the problematic ones.

> I do not know if early_param is the best approach. I expect
> Petr will offer good insight here.

early_param() looks good to me. There are already similar early
parameters, for example, "ignore_loglevel".


> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index fb242739aec8..3f420e8bdb2c 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2368,6 +2368,17 @@ void printk_legacy_allow_panic_sync(void)
> >         }
> >  }
> >
> > +static bool __read_mostly keep_printk_all_cpu_in_panic;
> > +
> > +static int __init keep_printk_all_cpu_in_panic_setup(char *str)
> > +{
> > +       keep_printk_all_cpu_in_panic = true;
> > +       pr_info("printk: keep printk all cpu in panic.\n");
> > +
> > +       return 0;
> > +}
> > +early_param("keep_printk_all_cpu_in_panic", keep_printk_all_cpu_in_panic_setup);
> 
> Quite a long argument. I am horrible at naming. I expect Petr would have
> a good suggestion (if early_param is the way to go).

Heh. It seems to be hard to find a good name ;-)

Anyway, I would use "printk_" prefix to make it clear that
it is printk-related. The following comes to my mind:

  + printk_allow_non_panic_cpus
  + printk_keep_non_panic_cpus
  + printk_debug_non_panic_cpus

I prefer "printk_debug_non_panic_cpus", see below.


> >  asmlinkage int vprintk_emit(int facility, int level,
> >                             const struct dev_printk_info *dev_info,
> >                             const char *fmt, va_list args)
> > @@ -2379,13 +2390,15 @@ asmlinkage int vprintk_emit(int facility, int level,
> >         if (unlikely(suppress_printk))
> >                 return 0;
> >
> > -       /*
> > -        * The messages on the panic CPU are the most important. If
> > -        * non-panic CPUs are generating any messages, they will be
> > -        * silently dropped.
> > -        */
> > -       if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > -               return 0;
> > +       if (!keep_printk_all_cpu_in_panic) {
> > +               /*
> > +                * The messages on the panic CPU are the most important. If
> > +                * non-panic CPUs are generating any messages, they will be
> > +                * silently dropped.
> > +                */
> > +               if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> > +                       return 0;
> > +       }
> 
> I would not nest it. Just something like:
> 
> 	/*
> 	 * The messages on the panic CPU are the most important. If
> 	 * non-panic CPUs are generating any messages, they may be
> 	 * silently dropped.
> 	 */
> 	if (!keep_printk_all_cpu_in_panic &&
> 	    !panic_triggering_all_cpu_backtrace &&
> 	    other_cpu_in_panic()) {
> 		return 0;
> 	}

I would prefer this form as well.

Thinking loudly:

I wonder if this is actually safe. I recall that we simplified the
design somewhere because we expected that non-panic CPUs will not
add messages. I am not sure that I found all locations. But
we might want to revise:


1st problem: _prb_read_valid() skips non-finalized records on non-panic CPUs.

   opinion: We should not do it in this case.


2nd problem: Is _prb_read_valid() actually safe when
	panic_triggering_all_cpu_backtrace is true?

   opinion: It should be safe because the backtraces from different CPUs
	are serialized via printk_cpu_sync_get_irqsave().


3rd problem: nbcon_get_default_prio() returns NBCON_PRIO_NORMAL on
	non-panic CPUs. As a result, printk_get_console_flush_type()
	would suggest flushing like when the system works as expected.

	But the legacy-loop will bail out after flushing one
	message on one console, see console_flush_all(). It is weird
	behavior.

	Another question is who would flush the messages when the panic()
	CPU does not reach the explicit flush.

   opinion: We should probably try to flush the messages on non-panic
	CPUs in this mode when safe. This is why I prefer the name
	"printk_debug_non_panic_cpus".

	We should update console_flush_all() to do not bail out when
	the new option is set.

	We should call nbcon_atomic_flush_pending() on non-panic CPUs
	when the new option is set. printk_get_console_flush_type()
	should behave like with NBCON_PRIO_EMERGENCY.

	Maybe, nbcon_get_default_prio() should actually return
	NBCON_PRIO_EMERGENCY on non-panic CPUs when this option is set.
	It allow the non-panic CPUs to take over the nbcon context
	from the potentially frozen kthread.


Best Regards,
Petr

  reply	other threads:[~2025-02-26 13:58 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20250226031756epcas2p3674cccc82687effb40575aa5fa2956e0@epcas2p3.samsung.com>
2025-02-26  3:16 ` printk: selective deactivation of feature ignoring non panic cpu's messages Donghyeok Choe
2025-02-26  4:25   ` John Ogness
2025-02-26 13:58     ` Petr Mladek [this message]
2025-02-28 14:20       ` John Ogness
2025-03-04  2:01         ` Donghyeok Choe
2025-03-04 13:22         ` Petr Mladek
2025-03-04 13:59           ` John Ogness
2025-03-04 14:15             ` Petr Mladek
2025-03-17  5:06               ` Donghyeok Choe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z78eGNIuG_-CVOGl@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=d7271.choe@samsung.com \
    --cc=hajun.sung@samsung.com \
    --cc=jh1012.choi@samsung.com \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=seungh.jung@samsung.com \
    --cc=takakura@valinux.co.jp \
    --cc=youngmin.nam@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.