Re: [PATCH RFC] panic: call panic handlers before panic_other_cpus_shutdown()

linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Ryo Takakura <ryotkkr98@gmail.com>
To: pmladek@suse.com
Cc: akpm@linux-foundation.org, bhe@redhat.com, decui@microsoft.com,
	gregkh@linuxfoundation.org, haiyangz@microsoft.com,
	hamzamahfooz@linux.microsoft.com, jani.nikula@intel.com,
	jfalempe@redhat.com, joel.granados@kernel.org,
	john.ogness@linutronix.de, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, ryotkkr98@gmail.com,
	wei.liu@kernel.org
Subject: Re: [PATCH RFC] panic: call panic handlers before panic_other_cpus_shutdown()
Date: Sun,  2 Mar 2025 15:36:38 +0900	[thread overview]
Message-ID: <20250302063638.7096-1-ryotkkr98@gmail.com> (raw)
In-Reply-To: <Z784BiUZohADyoOW@pathway.suse.cz>

Hi Petr!

On Wed, 26 Feb 2025 16:49:26 +0100, Petr Mladek wrote:
>On Sat 2025-02-22 14:44:05, Ryo Takakura wrote:
>> On Fri, 21 Feb 2025 16:23:07 -0500, Hamza Mahfooz wrote:
>> >On Fri, Feb 21, 2025 at 11:23:28AM +0900, Ryo Takakura wrote:
>> >> On Thu, 20 Feb 2025 17:53:00 -0500, Hamza Mahfooz wrote:
>> >> >Since, the panic handlers may require certain cpus to be online to panic
>> >> >gracefully, we should call them before turning off SMP. Without this
>> >> >re-ordering, on Hyper-V hv_panic_vmbus_unload() times out, because the
>> >> >vmbus channel is bound to VMBUS_CONNECT_CPU and unless the crashing cpu
>> >> >is the same as VMBUS_CONNECT_CPU, VMBUS_CONNECT_CPU will be offlined by
>> >> >crash_smp_send_stop() before the vmbus channel can be deconstructed.
>> >> >
>> >> So maybe panic_other_cpus_shutdown() should be palced after 
>> >> atomic_notifier_call_chain() along with printk_legacy_allow_panic_sync()
>> >> like below?
>> >> 
>> >> ----- BEGIN -----
>> >> diff --git a/kernel/panic.c b/kernel/panic.c
>> >> index d8635d5cecb2..7ac40e85ee27 100644
>> >> --- a/kernel/panic.c
>> >> +++ b/kernel/panic.c
>> >> @@ -372,16 +372,16 @@ void panic(const char *fmt, ...)
>> >>         if (!_crash_kexec_post_notifiers)
>> >>                 __crash_kexec(NULL);
>> >> 
>> >> -       panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>> >> -
>> >> -       printk_legacy_allow_panic_sync();
>> >> -
>> >>         /*
>> >>          * Run any panic handlers, including those that might need to
>> >>          * add information to the kmsg dump output.
>> >>          */
>> >>         atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >> 
>> >> +       panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>> >> +
>> >> +       printk_legacy_allow_panic_sync();
>> >> 
>> >>         panic_print_sys_info(false);
>> >> 
>> >>         kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>> >> ----- END -----
>> >
>> >Ya, that looks fine to me, that's actually how I had it initally, but I
>> >wasn't sure if it had to go before the panic handlers. So, I erred on
>> >the side of caution.
>
>The ordering (stopping CPUs before allowing printk_legacy loop)
>is important from the printk POV. So, keep it, please.

Thanks for the check.

>> I see, sorry that I was only speaking in relation to stored backtraces.
>> It seems that printk_legacy_allow_panic_sync() is placed before
>> atomic_notifier_call_chain() so that it can handle flushing before calling
>> any panic handlers as described [0].
>
>> [0] https://lore.kernel.org/lkml/ZeHSgZs9I3Ihvpye@alley/
>
>> I'm not really familar with the problems associated with panic handlers
>> so I hope maybe John and Petr can help on this matter...
>
>Honestly, I do not have much experience with failures of the panic
>notifiers. But I saw a patchset which tried to add filtering of
>some problematic ones, see
>https://lore.kernel.org/lkml/20220108153451.195121-1-gpiccoli@igalia.com/
>
>I did not like the way of ad-hoc filtering. The right solution was to
>fix the problematic notifiers.
>
>Anyway, it went out that the situation was not that easy. The notifiers
>do various things. Some of them just printing extra information. Others
>stopped or suspended some devices or services. Some should be called
>before and some after crash_dump.
>
>The outcome was a monster-patchset which tried to fix some problematic
>notifiers and split them into more notifier chains, see
>https://lore.kernel.org/all/20220427224924.592546-1-gpiccoli@igalia.com/
>
>Some of the fixes were accepted but the split has never been done.

I see. I went through some of the discussions on the thread [0]
and I can see how complicated the subject is...

>My opinion:
>
>1. The best solution would be to make the problematic notifier working
>   with stopped CPUs. The discussion around [v2] suggests that the author
>   made it working at least for x86_64, see
>   https://lore.kernel.org/r/20250221213055.133849-1-hamzamahfooz@linux.microsoft.com

I agree. But I also like the below solution as well!

>2. Another good solution might be to do the split of the notifier
>   chain, for an example, see
>   https://lore.kernel.org/lkml/Yn0TnsWVxCcdB2yO@alley/
>
>   The problematic notifier can be then added into a chain which
>   is called before stopping CPUs.

Thanks for sharing this! Such an interesting discussion on what and when 
should be handled in panic path. I think I have a better picture of panic()
now :).

>3. In the worst case, you could change the ordering as proposed above.
>   I am just afraid that it might bring in new problems. There might
>   be notifiers which were not tested with more running CPUs...
>
>
>In general, the system is in an unpredictable state when panic() is
>called. Notifiers should not expect that non-panic CPUs will be
>able to handle any requests.
>
>Also it looks like a good idea to stop non-panic CPUs as soon as possible.
>Otherwise, they might create more harm than good.

I see. I also think that 3. might introduce new problems...
It goes against the general statements which you pointed out 
in which case its not really a problem of handler anymore.

Sincerely,
Ryo Takakura

>Best Regards,
>Petr

[0] https://lore.kernel.org/lkml/20220427224924.592546-25-gpiccoli@igalia.com/

     prev parent reply	other threads:[~2025-03-02  6:36 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-20 22:53 [PATCH RFC] panic: call panic handlers before panic_other_cpus_shutdown() Hamza Mahfooz
2025-02-21  2:23 ` Ryo Takakura
2025-02-21 21:23   ` Hamza Mahfooz
2025-02-22  5:44     ` Ryo Takakura
2025-02-26 15:49       ` Petr Mladek
2025-03-02  6:36         ` Ryo Takakura [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250302063638.7096-1-ryotkkr98@gmail.com \
    --to=ryotkkr98@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=decui@microsoft.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=haiyangz@microsoft.com \
    --cc=hamzamahfooz@linux.microsoft.com \
    --cc=jani.nikula@intel.com \
    --cc=jfalempe@redhat.com \
    --cc=joel.granados@kernel.org \
    --cc=john.ogness@linutronix.de \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).