From: Nicholas Piggin <npiggin@gmail.com>
To: John Ogness <john.ogness@linutronix.de>, Petr Mladek <pmladek@suse.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>, linux-kernel@vger.kernel.org
Subject: Re: Removal of printk safe buffers delays NMI context printk
Date: Sat, 06 Nov 2021 09:57:47 +1000 [thread overview]
Message-ID: <1636156115.ubjgngowl8.astroid@bobo.none> (raw)
In-Reply-To: <YYVakNdzjrYuBmhf@alley>
Excerpts from Petr Mladek's message of November 6, 2021 2:23 am:
> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>> On 2021-11-05, Nicholas Piggin <npiggin@gmail.com> wrote:
>> >> What was removed from 93d102f094b was irq_work triggering on all
>> >> CPUs.
>> >
>> > No, it was the caller executing the flush for all remote CPUs itself.
>> > irq work was not involved (and irq work can't be raised in a remote
>> > CPU from NMI context).
>>
>> Maybe I am missing something. In 93d102f094b~1 I see:
>>
>> watchdog_smp_panic
>> printk_safe_flush
>> __printk_safe_flush
>> printk_safe_flush_buffer
>> printk_safe_flush_line
>> printk_deferred
>> vprintk_deferred
>> vprintk_emit (but no direct printing)
>> defer_console_output
>> irq_work_queue
>>
>> AFAICT, using defer_console_output() instead of your new printk_flush()
>> should cause the exact behavior as before.
>
> I agree. printk_safe_flush() used printk_deferred(). It only queued
> the irq_work and never called consoles directly.
>
>> > but we do need that printk flush capability back there and for
>> > nmi_backtrace.
>>
>> Agreed. I had not considered this necessary side-effect when I removed
>> the NMI safe buffers.
>
> Honestly, I do not understand why it stopped working or how
> it worked before.
>
> printk() calls vprintk(). Current vprintk() does:
>
> asmlinkage int vprintk(const char *fmt, va_list args)
> {
> [...]
> /*
> * Use the main logbuf even in NMI. But avoid calling console
> * drivers that might have their own locks.
> */
> if (this_cpu_read(printk_context) || in_nmi()) {
> int len;
>
> len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
> defer_console_output();
> return len;
> }
>
> /* No obstacles. */
> return vprintk_default(fmt, args);
> }
>
> By other words, current vprintk():
>
> + queues irq_work() in NMI context
> + tries to flush consoles immeditely otherwise
>
>
>> I am just wondering if we should fix the regression by going back to
>> using irq_work (such as defer_console_output()) or if we want to
>> introduce something new that introduces direct printing.
>
> Yup, defer_console_output() should do the same as printk_safe_flush()
> before. We do not longer need to copy the messages because they are
> already in the main lockless log buffer.
>
> Well, I am curious about the original code. The commit 93d102f094be9beab28e5
> ("printk: remove safe buffers") did the following:
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index c9a8f4781a10..dc17d8903d4f 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -183,11 +183,6 @@ static void watchdog_smp_panic(int cpu, u64 tb)
>
> wd_smp_unlock(&flags);
>
> - printk_safe_flush();
> - /*
> - * printk_safe_flush() seems to require another print
> - * before anything actually goes out to console.
> - */
> if (sysctl_hardlockup_all_cpu_backtrace)
> trigger_allbutself_cpu_backtrace();
>
> And I am curious because:
>
> + Why was printk_safe_flush() called before triggering backtraces
> on other CPUs?
I commented about that earlier in the thread. The flush is for a
previous non-maskable IPI. trigger_allbutself_cpu_backtrace is not NMI
on powerpc it's just regular irq context so if they did respond to it
they would also be able to run the irq work.
> + The comment says that another print is needed before the messages
> goes to the console. It makes sense because printk_safe_flush()
> only set irq_work. But the patch did not remove any printk().
> So, nobody called any printk() even before.
I think that was my misunderstanding as to the problem. The NMI IPI is
actually asynchronous so it was not guaranteed that targets had
completed their backtraces at this point. powerpc watchdog needs a
separate fix (after which it seems to work fine with printk_safe_flush)
Thanks,
Nick
next prev parent reply other threads:[~2021-11-05 23:57 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-04 15:54 Removal of printk safe buffers delays NMI context printk Nicholas Piggin
2021-11-04 16:18 ` John Ogness
2021-11-05 1:26 ` Nicholas Piggin
2021-11-05 9:55 ` John Ogness
2021-11-05 11:43 ` Nicholas Piggin
2021-11-05 13:57 ` John Ogness
2021-11-05 16:23 ` Petr Mladek
2021-11-05 16:44 ` John Ogness
2021-11-06 0:26 ` Nicholas Piggin
2021-11-06 20:05 ` John Ogness
2021-11-05 23:57 ` Nicholas Piggin [this message]
2021-11-05 23:48 ` Nicholas Piggin
2021-11-05 16:01 ` Petr Mladek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1636156115.ubjgngowl8.astroid@bobo.none \
--to=npiggin@gmail.com \
--cc=john.ogness@linutronix.de \
--cc=ldufour@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pmladek@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox