From: Ryo Takakura <ryotkkr98@gmail.com>
To: pmladek@suse.com
Cc: akpm@linux-foundation.org, bhe@redhat.com, decui@microsoft.com,
gregkh@linuxfoundation.org, haiyangz@microsoft.com,
hamzamahfooz@linux.microsoft.com, jani.nikula@intel.com,
jfalempe@redhat.com, joel.granados@kernel.org,
john.ogness@linutronix.de, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, ryotkkr98@gmail.com,
wei.liu@kernel.org
Subject: Re: [PATCH RFC] panic: call panic handlers before panic_other_cpus_shutdown()
Date: Sun, 2 Mar 2025 15:36:38 +0900 [thread overview]
Message-ID: <20250302063638.7096-1-ryotkkr98@gmail.com> (raw)
In-Reply-To: <Z784BiUZohADyoOW@pathway.suse.cz>
Hi Petr!
On Wed, 26 Feb 2025 16:49:26 +0100, Petr Mladek wrote:
>On Sat 2025-02-22 14:44:05, Ryo Takakura wrote:
>> On Fri, 21 Feb 2025 16:23:07 -0500, Hamza Mahfooz wrote:
>> >On Fri, Feb 21, 2025 at 11:23:28AM +0900, Ryo Takakura wrote:
>> >> On Thu, 20 Feb 2025 17:53:00 -0500, Hamza Mahfooz wrote:
>> >> >Since, the panic handlers may require certain cpus to be online to panic
>> >> >gracefully, we should call them before turning off SMP. Without this
>> >> >re-ordering, on Hyper-V hv_panic_vmbus_unload() times out, because the
>> >> >vmbus channel is bound to VMBUS_CONNECT_CPU and unless the crashing cpu
>> >> >is the same as VMBUS_CONNECT_CPU, VMBUS_CONNECT_CPU will be offlined by
>> >> >crash_smp_send_stop() before the vmbus channel can be deconstructed.
>> >> >
>> >> So maybe panic_other_cpus_shutdown() should be palced after
>> >> atomic_notifier_call_chain() along with printk_legacy_allow_panic_sync()
>> >> like below?
>> >>
>> >> ----- BEGIN -----
>> >> diff --git a/kernel/panic.c b/kernel/panic.c
>> >> index d8635d5cecb2..7ac40e85ee27 100644
>> >> --- a/kernel/panic.c
>> >> +++ b/kernel/panic.c
>> >> @@ -372,16 +372,16 @@ void panic(const char *fmt, ...)
>> >> if (!_crash_kexec_post_notifiers)
>> >> __crash_kexec(NULL);
>> >>
>> >> - panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>> >> -
>> >> - printk_legacy_allow_panic_sync();
>> >> -
>> >> /*
>> >> * Run any panic handlers, including those that might need to
>> >> * add information to the kmsg dump output.
>> >> */
>> >> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>> >>
>> >> + panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>> >> +
>> >> + printk_legacy_allow_panic_sync();
>> >>
>> >> panic_print_sys_info(false);
>> >>
>> >> kmsg_dump_desc(KMSG_DUMP_PANIC, buf);
>> >> ----- END -----
>> >
>> >Ya, that looks fine to me, that's actually how I had it initally, but I
>> >wasn't sure if it had to go before the panic handlers. So, I erred on
>> >the side of caution.
>
>The ordering (stopping CPUs before allowing printk_legacy loop)
>is important from the printk POV. So, keep it, please.
Thanks for the check.
>> I see, sorry that I was only speaking in relation to stored backtraces.
>> It seems that printk_legacy_allow_panic_sync() is placed before
>> atomic_notifier_call_chain() so that it can handle flushing before calling
>> any panic handlers as described [0].
>
>> [0] https://lore.kernel.org/lkml/ZeHSgZs9I3Ihvpye@alley/
>
>> I'm not really familar with the problems associated with panic handlers
>> so I hope maybe John and Petr can help on this matter...
>
>Honestly, I do not have much experience with failures of the panic
>notifiers. But I saw a patchset which tried to add filtering of
>some problematic ones, see
>https://lore.kernel.org/lkml/20220108153451.195121-1-gpiccoli@igalia.com/
>
>I did not like the way of ad-hoc filtering. The right solution was to
>fix the problematic notifiers.
>
>Anyway, it went out that the situation was not that easy. The notifiers
>do various things. Some of them just printing extra information. Others
>stopped or suspended some devices or services. Some should be called
>before and some after crash_dump.
>
>The outcome was a monster-patchset which tried to fix some problematic
>notifiers and split them into more notifier chains, see
>https://lore.kernel.org/all/20220427224924.592546-1-gpiccoli@igalia.com/
>
>Some of the fixes were accepted but the split has never been done.
I see. I went through some of the discussions on the thread [0]
and I can see how complicated the subject is...
>My opinion:
>
>1. The best solution would be to make the problematic notifier working
> with stopped CPUs. The discussion around [v2] suggests that the author
> made it working at least for x86_64, see
> https://lore.kernel.org/r/20250221213055.133849-1-hamzamahfooz@linux.microsoft.com
I agree. But I also like the below solution as well!
>2. Another good solution might be to do the split of the notifier
> chain, for an example, see
> https://lore.kernel.org/lkml/Yn0TnsWVxCcdB2yO@alley/
>
> The problematic notifier can be then added into a chain which
> is called before stopping CPUs.
Thanks for sharing this! Such an interesting discussion on what and when
should be handled in panic path. I think I have a better picture of panic()
now :).
>3. In the worst case, you could change the ordering as proposed above.
> I am just afraid that it might bring in new problems. There might
> be notifiers which were not tested with more running CPUs...
>
>
>In general, the system is in an unpredictable state when panic() is
>called. Notifiers should not expect that non-panic CPUs will be
>able to handle any requests.
>
>Also it looks like a good idea to stop non-panic CPUs as soon as possible.
>Otherwise, they might create more harm than good.
I see. I also think that 3. might introduce new problems...
It goes against the general statements which you pointed out
in which case its not really a problem of handler anymore.
Sincerely,
Ryo Takakura
>Best Regards,
>Petr
[0] https://lore.kernel.org/lkml/20220427224924.592546-25-gpiccoli@igalia.com/
prev parent reply other threads:[~2025-03-02 6:36 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-20 22:53 [PATCH RFC] panic: call panic handlers before panic_other_cpus_shutdown() Hamza Mahfooz
2025-02-21 2:23 ` Ryo Takakura
2025-02-21 21:23 ` Hamza Mahfooz
2025-02-22 5:44 ` Ryo Takakura
2025-02-26 15:49 ` Petr Mladek
2025-03-02 6:36 ` Ryo Takakura [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250302063638.7096-1-ryotkkr98@gmail.com \
--to=ryotkkr98@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=decui@microsoft.com \
--cc=gregkh@linuxfoundation.org \
--cc=haiyangz@microsoft.com \
--cc=hamzamahfooz@linux.microsoft.com \
--cc=jani.nikula@intel.com \
--cc=jfalempe@redhat.com \
--cc=joel.granados@kernel.org \
--cc=john.ogness@linutronix.de \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pmladek@suse.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).