From: Chen Yucong <slaoub@gmail.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
"bp@alien8.de" <bp@alien8.de>,
"ak@linux.intel.com" <ak@linux.intel.com>,
"Huang, Ying" <ying.huang@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
Date: Fri, 23 May 2014 09:32:19 +0800
Message-ID: <1400808739.19982.48.camel@debian>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F328115FD@ORSMSX114.amr.corp.intel.com>
On Wed, 2014-05-21 at 21:09 +0000, Luck, Tony wrote:
> Please do give us more detail on the scenario that you see that would
> make your new version behave better.
>
> I'm sure the current code has no races w.r.t. clearing mces_seen. The
> monarch clears them all in mce_reign() before clearing mce_executing
> at the foot of mce_end() and allowing the others to run again.
>
Right. There are no races in clearing mces_seen. But if a timeout
occurs in the monarch, mces_seen will not be cleared, and that will
affect all other CPUs.
As Borislav Petkov says, if we reach a timeout, there is very little
chance of recovering. Although the probability of this situation is
very small, it is not impossible. Indeed, it is hard to know the
precise causes of a timeout.
As Naoya Horiguchi says, this patch also has a small benefit: it
reduces the processing time of the monarch CPU.
> Your code has the monarch release all the other cpus from the spinloop
> in mce_end() so they will all rush together through the final lines of
> do_machine_check().
No. My code just distributes the clearing operation to each CPU. All
other CPUs still have to wait for the monarch to clear mce_executing.
In fact, mces_seen is only used so that the system can panic as quickly
as possible if there is a truly data-corrupting error. So there is no
advantage in clearing mces_seen in the monarch.
> Some of them will have work to do if they saw
> errors - they may have to send signals, or log the error. Others can
> fly directly to the end of do_machine_check() and clear MCG_STATUS
> and return to executing whatever code was interrupted.
>
> So it is possible that some processors will be out doing things that can
> generate another machine check, before others have finished their
> tasks and got to the point to clear mces_seen.(*)
>
> -Tony
>
> (*) maybe that doesn't matter because they haven't zeroed MCG_STATUS
> yet - so this second machine check will force those cpus to shutdown. See MCIP
> description in "15.3.1.2 IA32_MCG_STATUS_MSR" section of software
> developer manual.
Right. I know this too. This is why you can find the following
snippet in my code:
	/*
	 * Now clear the mces_seen of current CPU - *final - so that it
	 * does not reappear on the next mce.
	 */
	memset(final, 0, sizeof(struct mce));
	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
---
Thanks very much for your reply.
cyc