From: Borislav Petkov <bp@alien8.de>
To: "河合英宏 / KAWAI,HIDEHIRO" <hidehiro.kawai.ez@hitachi.com>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
"Peter Zijlstra" <peterz@infradead.org>,
"Ingo Molnar" <mingo@kernel.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
"H. Peter Anvin" <hpa@zytor.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Vivek Goyal" <vgoyal@redhat.com>, "Baoquan He" <bhe@redhat.com>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Ingo Molnar" <mingo@redhat.com>,
"平松雅巳 / HIRAMATU,MASAMI" <masami.hiramatsu.pt@hitachi.com>
Subject: Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
Date: Wed, 25 Nov 2015 09:56:21 +0100 [thread overview]
Message-ID: <20151125085620.GA29499@pd.tnic> (raw)
In-Reply-To: <04EAB7311EE43145B2D3536183D1A84454A2A425@GSjpTKYDCembx31.service.hitachi.net>
On Wed, Nov 25, 2015 at 05:51:59AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > `Infinite loop in NMI context' can happen:
> > >
> > > a. when a cpu panics on NMI while another cpu is processing panic
> > > b. when a cpu received an external or unknown NMI while another
> > > cpu is processing panic on NMI
> > >
> > > In the case of a, it loops in panic_smp_self_stop(). In the case
> > > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> >
> > Please describe those two cases more verbosely - it takes slow people
> > like me a while to figure out what exactly can happen.
>
> a. when a cpu panics on NMI while another cpu is processing panic
> Ex.
> CPU 0 CPU 1
> ================= =================
> panic()
> panic_cpu <-- 0
> check panic_cpu
> crash_kexec()
> receive an unknown NMI
> unknown_nmi_error()
> nmi_panic()
> panic()
> check panic_cpu
> panic_smp_self_stop()
> infinite loop in NMI context
>
> b. when a cpu received an external or unknown NMI while another
> cpu is processing panic on NMI
> Ex.
> CPU 0 CPU 1
> ====================== ==================
> receive an unknown NMI
> unknown_nmi_error()
> nmi_panic() receive an unknown NMI
> panic_cpu <-- 0 unknown_nmi_error()
> panic() nmi_panic()
> check panic_cpu panic
> crash_kexec() check panic_cpu
> panic_smp_self_stop()
> infinite loop in NMI context
Ok, that's what I saw too, thanks for confirming.
But please write those examples with the *old* code in the commit
message, i.e. without panic_cpu and nmi_panic() which you're adding.
Basically, you want to structure your commit message this way:
This is the problem the current code has: ...
But we need to do this: ...
We fix it by doing that: ...
This will be of great help now when reading the commit message and of
invaluable help later, when we all have forgotten about the issue and
are scratching heads over why stuff was added.
> > > To save registers in these case too, this patch does following things:
> > >
> > > 1. Move the timing of `infinite loop in NMI context' (actually
> > > done by panic_smp_self_stop()) outside of panic() to enable us to
> > > refer pt_regs
> >
> > I can't parse that sentence. And I really tried :-\
> > panic_smp_self_stop() is still in panic().
>
> panic_smp_self_stop() is still used when a CPU in normal context
> should go into infinite loop. Only when a CPU is in NMI context,
> nmi_panic_self_stop() is called and the CPU loops infinitely
> without entering panic().
>
> I'll try to revise this sentense.
FWIW, it sounds better already! :)
> > > + */
> > > + while (!raw_spin_trylock(&nmi_reason_lock))
> > > + poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
>
> As Steven commented, poll_crash_ipi_and_callback() does nothing
> if we're not crash dumping. But yes, this is confusing. I'll add
> more detailed comment, or change the logic a bit if I come up with
> better one.
Thanks, much appreciated!
> > > +/*
> > > + * Wait for the timing of IPI for crash dumping, and then call its callback
> >
> > "Wait for the crash dumping IPI to complete... "
>
> So, I think "Wait for the crash dumping IPI to be issued..." is better.
> "complete" would be ambiguous in this context.
Ok.
>
> > > + * directly. This function is used when we have already been in NMI handler.
> > > + */
> > > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> >
> > Why "poll"? We won't return from crash_nmi_callback() if we're not the
> > crashing CPU.
>
> This function polls that crash IPI has been issued by checking
> crash_ipi_done, then invokes the callback. This is different
> from so-called "poll" functions. Do you have some good name?
Maybe something as simple as "run_crash_callback"?
Or since we're calling it from other places, maybe add the "crash"
prefix:
while (!raw_spin_trylock(&nmi_reason_lock))
crash_run_callback(regs);
Looks even better to me in code context. :)
Thanks!
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
next prev parent reply other threads:[~2015-11-25 8:56 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-20 9:36 [V5 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
2015-11-20 9:36 ` [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
2015-11-23 18:49 ` Borislav Petkov
2015-11-24 4:06 ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-24 12:45 ` Michal Hocko
2015-11-24 15:05 ` Steven Rostedt
2015-11-24 15:12 ` Steven Rostedt
2015-11-24 20:27 ` Michal Hocko
2015-11-24 20:45 ` Steven Rostedt
2015-11-20 9:36 ` [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Hidehiro Kawai
2015-11-24 10:48 ` Borislav Petkov
2015-11-24 19:37 ` Steven Rostedt
2015-11-24 20:16 ` Borislav Petkov
2015-11-25 5:57 ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25 5:51 ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25 8:56 ` Borislav Petkov [this message]
2015-11-25 9:46 ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25 9:57 ` Borislav Petkov
2015-11-25 15:11 ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-24 12:58 ` Michal Hocko
2015-12-03 2:23 ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-20 9:36 ` [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
2015-11-24 13:05 ` Michal Hocko
2015-11-24 20:35 ` Steven Rostedt
2015-11-25 6:28 ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25 9:54 ` Borislav Petkov
2015-12-02 11:57 ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-02 15:40 ` Borislav Petkov
2015-12-03 2:01 ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-03 9:35 ` Borislav Petkov
2015-12-03 11:29 ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-03 12:22 ` Borislav Petkov
2015-11-20 9:36 ` [V5 PATCH 4/4] x86/apic: Introduce apic_extnmi boot option Hidehiro Kawai
2015-11-25 11:49 ` Borislav Petkov
2015-11-25 15:29 ` 河合英宏 / KAWAI,HIDEHIRO
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151125085620.GA29499@pd.tnic \
--to=bp@alien8.de \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=corbet@lwn.net \
--cc=ebiederm@xmission.com \
--cc=hidehiro.kawai.ez@hitachi.com \
--cc=hpa@zytor.com \
--cc=kexec@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mhocko@kernel.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=vgoyal@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox