From: Xunlei Pang <xpang@redhat.com>
To: "Luck, Tony" <tony.luck@intel.com>, Xunlei Pang <xlpang@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com>,
x86@kernel.org, kexec@lists.infradead.org,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Junichi Nomura <j-nomura@ce.jp.nec.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Dave Young <dyoung@redhat.com>
Subject: Re: [PATCH v3] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
Date: Thu, 23 Feb 2017 14:04:50 +0800
Message-ID: <58AE7B82.2040008@redhat.com>
In-Reply-To: <20170222185015.GA6141@intel.com>
On 02/23/2017 at 02:50 AM, Luck, Tony wrote:
> On Wed, Feb 22, 2017 at 12:11:14PM +0800, Xunlei Pang wrote:
>> + /*
>> + * Cases to bail out to avoid rendezvous process timeout:
>> + * 1)If this CPU is offline.
>> + * 2)If crashing_cpu was set, e.g. entering kdump,
>> + * we need to skip cpus remaining in 1st kernel.
>> + */
>> + if (cpu_is_offline(cpu) ||
>> + (crashing_cpu != -1 && crashing_cpu != cpu)) {
>> u64 mcgstatus;
>>
>> mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>
> I think we should document the remaining race conditions. I don't
> think there is any good way to eliminate them, and they are already
> pretty small windows.
>
> I think the sequence of events looks like:
>
> 1 Panic occurs
> 2 nmi_shootdown_cpus() sets crashing_cpu
> 3 send NMI to everyone else
> 4 wait up to a second for other CPUs to take NMI
> 5 go to kexec code
> 6 start new kernel
> 7 new kernel establishes #MC handler
>
> If one of the other cpus triggers a machine check while
> getting to, or in, the NMI handler ... then that cpu will
> skip processing (if RIPV is set).
>
> Between '2' and '5' if crashing_cpu gets a machine check it
> will execute in the old kernel handler, and do the right thing.
>
> There's a fuzzy area between '6' and '7' where a machine check
> might not end up in the right code.
>
> From '7' onwards the kexec kernel will handle any machine
> checks caused by kdump.
>
Agree, will update the comment.
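For reference, the bail-out condition quoted above can be modelled in
userspace roughly as below. This is only a sketch, not the kernel code:
the helper name mce_should_bail_out() and passing crashing_cpu in as a
parameter are assumptions made for illustration, and -1 stands for "no
crash in progress" as in the quoted check.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the quoted check (not a kernel API).
 *
 * Returns true when this CPU should skip the MCE rendezvous:
 *  1) this CPU is offline, or
 *  2) crashing_cpu has been set (e.g. by nmi_shootdown_cpus() on the
 *     way into kdump) and this CPU is not the crashing one.
 */
static bool mce_should_bail_out(int cpu, bool cpu_offline, int crashing_cpu)
{
	if (cpu_offline)
		return true;

	return crashing_cpu != -1 && crashing_cpu != cpu;
}
```

With crashing_cpu == -1 only offline CPUs bail out. Once crashing_cpu
is set to, say, 0, every CPU other than 0 bails out, while CPU 0 still
runs the old kernel's handler, which matches the window between steps
'2' and '5' described above.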
Regards,
Xunlei