From: Hari Bathini <hbathini@linux.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Sourabh Jain <sourabhjain@linux.ibm.com>
Subject: Re: [PATCH] powerpc/crash: save cpu register data in crash_smp_send_stop()
Date: Tue, 24 May 2022 11:42:05 +0530
Message-ID: <1c95a54a-e96d-1b70-e9fb-6dbba9c7ce98@linux.ibm.com>
In-Reply-To: <1652171381.tcl5f5aq9f.astroid@bobo.none>
Hi Nick,
Thanks for the review.
On 10/05/22 2:31 pm, Nicholas Piggin wrote:
> Excerpts from Hari Bathini's message of May 7, 2022 2:39 am:
>> Capture register data for secondary CPUs in crash_smp_send_stop()
>> instead of doing it much later in crash_kexec_prepare_cpus() function
>> with another set of NMI IPIs to secondary CPUs. This change avoids
>> unnecessarily tricky post processing of data to get the right
>> backtrace for these CPUs.
>
> Is the tricky post processing done in crash tools?
Yeah. In tools like crash-utility, which try to make sense of the
captured register data.
> Is it buggy in
> some situations or just fragile code you want to deprecate? Seems
> like a good goal either way
The post-processing may require looking up the emergency stack and
then tracing back to the regular stack. crash-utility currently has no
code to handle that. So far, this meant no proper backtrace in
crash-utility only for cases like "crash_kexec_post_notifiers". But
with series [0] moving crash_smp_send_stop() around, the default cases
will start producing improper backtraces as well.
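To illustrate what "tracing back to the regular stack" involves, here is
a minimal userspace sketch. It assumes a simplified frame layout (a
back-chain pointer plus a saved return address) and known bounds for both
stacks. All names (struct frame, struct stack_range, walk_back_chain(),
demo_trace()) are hypothetical; this is neither crash-utility code nor the
real powerpc interrupt-frame layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical frame; real powerpc frames carry more fields. */
struct frame {
	struct frame *back_chain;	/* caller's frame, NULL at the stack base */
	uintptr_t saved_lr;		/* return address recorded for this frame */
};

/* Address bounds [lo, hi) of one stack. */
struct stack_range {
	uintptr_t lo, hi;
};

static int on_stack(const struct stack_range *r, const struct frame *f)
{
	uintptr_t a = (uintptr_t)f;

	return a >= r->lo && a < r->hi;
}

/*
 * Follow back chains from sp and record return addresses in out[].
 * The walk may start on the emergency stack and continue on the regular
 * stack; it stops at a NULL back chain or at a frame on neither stack.
 * *crossed is set to 1 if the walk moved from emergency to regular stack.
 */
static size_t walk_back_chain(const struct frame *sp,
			      const struct stack_range *emerg,
			      const struct stack_range *regular,
			      uintptr_t *out, size_t max, int *crossed)
{
	size_t n = 0;
	int was_on_emerg = 0;

	assert(out && crossed);
	*crossed = 0;
	while (sp && n < max) {
		if (on_stack(emerg, sp)) {
			was_on_emerg = 1;
		} else if (on_stack(regular, sp)) {
			if (was_on_emerg)
				*crossed = 1;
		} else {
			break;	/* frame lies on neither known stack */
		}
		out[n++] = sp->saved_lr;
		sp = sp->back_chain;
	}
	return n;
}

/* Demo data: two emergency-stack frames chained onto two regular ones. */
static size_t demo_trace(uintptr_t *out, size_t max, int *crossed)
{
	static struct frame regular_stk[2], emerg_stk[2];
	struct stack_range regular = { (uintptr_t)regular_stk,
				       (uintptr_t)(regular_stk + 2) };
	struct stack_range emerg = { (uintptr_t)emerg_stk,
				     (uintptr_t)(emerg_stk + 2) };

	regular_stk[0] = (struct frame){ NULL, 0x400 };
	regular_stk[1] = (struct frame){ &regular_stk[0], 0x300 };
	emerg_stk[0] = (struct frame){ &regular_stk[1], 0x200 };
	emerg_stk[1] = (struct frame){ &emerg_stk[0], 0x100 };

	return walk_back_chain(&emerg_stk[1], &emerg, &regular,
			       out, max, crossed);
}
```

A tool without the emergency-stack bounds would stop (or go wrong) at the
first frame, which is the kind of extra handling capturing registers
earlier is meant to avoid.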
> I assume the desire to stop secondaries ASAP is not just to get
> register data but also to limit the amount of damage they might
> cause to the crash process. Can they take interrupts or trigger
> the hard lockup watchdog, for example?
True. The intention is to stop the secondaries ASAP and make dump
capture as smooth as possible.
>> -void crash_smp_send_stop(void)
>> -{
>> - static bool stopped = false;
>> -
>> - /*
>> - * In case of fadump, register data for all CPUs is captured by f/w
>> - * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
>> - * this rtas call to avoid tricky post processing of those CPUs'
>> - * backtraces.
>> - */
>> - if (should_fadump_crash())
>> - return;
>
> This is not actually code you changed, but I wonder if it's wrong,
> if fadump is enabled then panic runs without stopping secondaries?
> Doesn't seem quite right.
So far, I haven't seen any problem. F/W seems to handle it all right
without the secondaries being stopped before the ibm,os-term call.
But I do agree that stopping the secondaries before calling into rtas
sounds like the right thing to do, even if that means processing more
than one stack to get the proper backtrace.
>> -
>> - if (stopped)
>> - return;
>> -
>> - stopped = true;
>> -
>> -#ifdef CONFIG_NMI_IPI
>> - smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 1000000);
>> -#else
>> - smp_call_function(crash_stop_this_cpu, NULL, 0);
>> -#endif /* CONFIG_NMI_IPI */
>> -}
>
> Now if kexec is not configured do we lose our crash_smp_send_stop
> function, or is it only ever called if kexec is enabled?
With [1], crash_smp_send_stop() is proposed to be called in both the
kdump and non-kdump cases.
>> -
>> #ifdef CONFIG_NMI_IPI
>> static void nmi_stop_this_cpu(struct pt_regs *regs)
>> {
>> diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
>> index 22ceeeb705ab..f06dfe71caca 100644
>> --- a/arch/powerpc/kexec/crash.c
>> +++ b/arch/powerpc/kexec/crash.c
>> @@ -25,6 +25,7 @@
>> #include <asm/setjmp.h>
>> #include <asm/debug.h>
>> #include <asm/interrupt.h>
>> +#include <asm/fadump.h>
>>
>> /*
>> * The primary CPU waits a while for all secondary CPUs to enter. This is to
>> @@ -102,7 +103,7 @@ void crash_ipi_callback(struct pt_regs *regs)
>> /* NOTREACHED */
>> }
>>
>> -static void crash_kexec_prepare_cpus(int cpu)
>> +static void crash_kexec_prepare_cpus(void)
>> {
>> unsigned int msecs;
>> volatile unsigned int ncpus = num_online_cpus() - 1;/* Excluding the panic cpu */
>> @@ -203,7 +204,7 @@ void crash_kexec_secondary(struct pt_regs *regs)
>>
>> #else /* ! CONFIG_SMP */
>>
>> -static void crash_kexec_prepare_cpus(int cpu)
>> +static void crash_kexec_prepare_cpus(void)
>> {
>> /*
>> * move the secondaries to us so that we can copy
>> @@ -249,6 +250,42 @@ static void __maybe_unused crash_kexec_wait_realmode(int cpu)
>> static inline void crash_kexec_wait_realmode(int cpu) {}
>> #endif /* CONFIG_SMP && CONFIG_PPC64 */
>>
>> +void crash_smp_send_stop(void)
>> +{
>> + static int cpus_stopped;
>> +
>> + /*
>> + * In case of fadump, register data for all CPUs is captured by f/w
>> + * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
>> + * this rtas call to avoid tricky post processing of those CPUs'
>> + * backtraces.
>> + */
>> + if (should_fadump_crash())
>> + return;
>> +
>> + if (cpus_stopped)
>> + return;
>> +
>> + cpus_stopped = 1;
>> +
>> + /* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
>> + printk_deferred_enter();
>> +
>> + /*
>> + * This function is only called after the system
>> + * has panicked or is otherwise in a critical state.
>> + * The minimum amount of code to allow a kexec'd kernel
>> + * to run successfully needs to happen here.
>> + *
>> + * In practice this means stopping other cpus in
>> + * an SMP system.
>> + * The kernel is broken so disable interrupts.
>> + */
>> + hard_irq_disable();
>> +
>> + crash_kexec_prepare_cpus();
>
> This seems to move a bit of the kexec code around so this runs
> before notifiers in the panic path now. Maybe that's okay, I don't
> know this code too well, but how feasible would it be to have
> crash_stop_this_cpu() call crash_save_cpu()? And keeping the
> second IPI.
Yeah. With series [0] being proposed, it makes sense to move the
crash_save_cpu() call into crash_stop_this_cpu() itself.
> I do like the idea of removing the second IPI if possible, but
> that could be done later by moving the logic into crash_save_cpu()
> (it could just poll on a flag until the primary releases it to
> the next phase, rather than have the primary send another IPI).
Also, polling on a flag in crash_stop_this_cpu()/crash_save_cpu()
and avoiding the second IPI sounds right. Will work on it.
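A rough userspace sketch of that flag-polling idea follows, only to make
the ordering concrete. It is not kernel code: threads stand in for
secondary CPUs, thread creation stands in for the single NMI IPI, and
every name (stop_this_cpu(), primary_stop_and_release(), cpus_in_crash,
release_secondaries, NR_SECONDARIES) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SECONDARIES 4	/* hypothetical CPU count for the demo */

/* Hypothetical stand-ins for per-CPU saved state. */
static int saved_regs[NR_SECONDARIES];
static int phase_seen[NR_SECONDARIES];

static atomic_int cpus_in_crash;	/* secondaries that saved their state */
static atomic_bool release_secondaries;	/* set by the primary for phase two */

/* Models the single stop callback run on each secondary. */
static void *stop_this_cpu(void *arg)
{
	int cpu = (int)(intptr_t)arg;

	saved_regs[cpu] = 0x1000 + cpu;		/* the "save registers" step */
	atomic_fetch_add(&cpus_in_crash, 1);	/* tell the primary we are parked */

	/* Poll on the flag instead of waiting for a second IPI. */
	while (!atomic_load(&release_secondaries))
		sched_yield();

	phase_seen[cpu] = 1;			/* next-phase work would go here */
	return NULL;
}

/*
 * Models the primary: one "IPI", wait for all secondaries to park, then
 * release them. Returns how many secondaries had saved state at release.
 */
static int primary_stop_and_release(void)
{
	pthread_t tid[NR_SECONDARIES];
	int i, rc, parked;

	atomic_store(&cpus_in_crash, 0);
	atomic_store(&release_secondaries, false);

	for (i = 0; i < NR_SECONDARIES; i++) {
		rc = pthread_create(&tid[i], NULL, stop_this_cpu,
				    (void *)(intptr_t)i);
		assert(rc == 0);
	}

	/* A real crash path would bound this wait with a timeout. */
	while (atomic_load(&cpus_in_crash) < NR_SECONDARIES)
		sched_yield();
	parked = atomic_load(&cpus_in_crash);

	atomic_store(&release_secondaries, true);
	for (i = 0; i < NR_SECONDARIES; i++)
		pthread_join(tid[i], NULL);

	return parked;
}
```

The point of the sketch is only that state is saved in the first (and
only) callback, and the move to the next phase is signalled through
shared memory rather than another interrupt.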
Sorry for the delayed response. I lost track of this one.
Thanks
Hari
[0] https://lore.kernel.org/linuxppc-dev/20220427224924.592546-1-gpiccoli@igalia.com/
[1] https://lore.kernel.org/linuxppc-dev/20220427224924.592546-25-gpiccoli@igalia.com/