From: Nicholas Piggin <npiggin@gmail.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au
Subject: Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
Date: Thu, 14 Dec 2017 11:51:37 +1000 [thread overview]
Message-ID: <20171214115137.5603f77d@roar.ozlabs.ibm.com> (raw)
In-Reply-To: <20171214111213.07ba99a4@gmail.com>
On Thu, 14 Dec 2017 11:12:13 +1100
Balbir Singh <bsingharora@gmail.com> wrote:
> On Wed, 13 Dec 2017 20:51:01 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
>
> > This is looking pretty nice now...
> >
> > On Wed, 13 Dec 2017 19:08:28 +1100
> > Balbir Singh <bsingharora@gmail.com> wrote:
> >
> > > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
> > > #ifdef CONFIG_KEXEC_CORE
> > > void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
> > > {
> > > + int cpu;
> > > +
> > > smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
> > > + if (kdump_in_progress() && crash_wake_offline) {
> > > + for_each_present_cpu(cpu) {
> > > + if (cpu_online(cpu))
> > > + continue;
> > > + /*
> > > + * crash_ipi_callback will wait for
> > > + * all cpus, including offline CPUs.
> > > + * We don't care about nmi_ipi_function.
> > > + * Offline cpus will jump straight into
> > > + * crash_ipi_callback, we can skip the
> > > + * entire NMI dance and waiting for
> > > + * cpus to clear pending mask, etc.
> > > + */
> > > + do_smp_send_nmi_ipi(cpu);
> >
> > Still a little bit concerned about using NMI IPI for this.
> >
>
> OK -- for offline CPUs you mean?
Yes.
> > If you take an NMI IPI from stop, the idle code should do the
> > right thing and we would just return the system reset wakeup
> > reason in SRR1 here (which does not need to be cleared).
> >
> > If you take the system reset anywhere else in the loop, it's
> > going to go out via system_reset_exception. I guess that
> > would end up doing the right thing, it probably gets to
> > crash_ipi_callback from crash_kexec_secondary?
>
> You mean like if we are online at the time of NMI'ing? If so
> the original loop will NMI us back into crash_ipi_callback
> anyway. We don't expect this to occur for offline CPUs
No, if the offline CPU is executing any instruction except for
stop when the crash occurs.
>
> >
> > It's just going to be a very untested code path :( What we
> > gain I suppose is better ability to handle a CPU that's locked
> > up somewhere in the cpu offline path. Assuming the uncommon
> > case works...
> >
> > Actually, if you *always* go via the system reset exception
> > handler, then code paths will be shared. That might be the
> > way to go. So I would check for system reset wakeup SRR1 reason
> > and call replay_system_reset() for it. What do you think?
> >
>
> We could do that, but that would call pnv_system_reset_exception
> and try to call the NMI function, but we've not used that path
> to initiate the NMI, so it should call the stale nmi_ipi_function
> which is crash_ipi_callback and not go via the crash_kexec path.
It shouldn't, if the CPU is not set in the NMI bitmask, I think
it should go back out and do the rest of the system_reset_exception
handler.
Anyway we have to get this case right, because it can already hit
as I said if the offline CPU takes the NMI when it is not stopped.
This is why I want to try to use a unified code path.
> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and
> I'm worried about calling a stale nmi_ipi_function via the
> system_reset_exception path, if we are OK with it, I can revisit
> the code path
You shouldn't get a stale one, that would also be a bug -- we
have to cope with NMIs coming in at any time that are triggered
externally (not by smp_send_nmi_ipi), so if you see any bugs
there those need to be fixed separately.
Thanks,
Nick
next prev parent reply other threads:[~2017-12-14 1:51 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-13 8:08 [PATCH 1/2] powerpc/crash: Remove the test for cpu_online in the IPI callback Balbir Singh
2017-12-13 8:08 ` [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's Balbir Singh
2017-12-13 10:51 ` Nicholas Piggin
2017-12-14 0:12 ` Balbir Singh
2017-12-14 1:51 ` Nicholas Piggin [this message]
2017-12-14 12:16 ` Balbir Singh
2017-12-14 13:32 ` Nicholas Piggin
2017-12-15 2:55 ` Balbir Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171214115137.5603f77d@roar.ozlabs.ibm.com \
--to=npiggin@gmail.com \
--cc=bsingharora@gmail.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).