From: Borislav Petkov <bp@amd64.org>
To: Tony Luck <tony.luck@intel.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Borislav Petkov <bp@amd64.org>,
Chen Gong <gong.chen@linux.intel.com>,
"Huang, Ying" <ying.huang@intel.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Subject: Re: [PATCH 4/6] x86, mce: Add mechanism to safely save information in MCE handler
Date: Fri, 16 Dec 2011 09:14:10 +0100
Message-ID: <20111216081410.GA9508@aftab>
In-Reply-To: <5d6588ab3dadabd8334cffee06b3e87abe3b81b7.1323979146.git.tony.luck@intel.com>
On Wed, Dec 14, 2011 at 03:55:20PM -0800, Tony Luck wrote:
> Machine checks on Intel cpus interrupt execution on all cpus, regardless
> of interrupt masking. We have a need to save some data about the cause
> of the machine check (physical address) in the machine check handler that
> can be retrieved later to attempt recovery in a more flexible execution
> state.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 43 ++++++++++++++++++++++++++++++++++++++
> 1 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 645070f..7d7303a 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -887,6 +887,49 @@ static void mce_clear_state(unsigned long *toclear)
> }
>
> /*
> + * Need to save faulting physical address associated with a process
> + * in the machine check handler some place where we can grab it back
> + * later in mce_notify_process()
> + */
> +#define MAX_MCE_INFO 16
> +
> +struct mce_info {
> + atomic_t inuse;
> + struct task_struct *t;
> + __u64 paddr;
> +} mce_info[MAX_MCE_INFO];
> +
> +static void mce_save_info(__u64 addr)
> +{
> + struct mce_info *mi;
> +
> + for (mi = mce_info; mi < &mce_info[MAX_MCE_INFO]; mi++) {
This looks strange, although it is valid. I thought the usual idiom for
such loops is

	for (i = 0; i < MAX_MCE_INFO; i++) {
		struct mce_info *mi = &mce_info[i];
		...

Just a nitpick, I guess.
> + if (atomic_cmpxchg(&mi->inuse, 0, 1) == 0) {
> + mi->t = current;
> + mi->paddr = addr;
> + return;
> + }
> + }
> +
> + mce_panic("Too many concurrent recoverable errors", NULL, NULL);
So we're setting an artificial limit of 16 in-flight AR (action
required) errors, and panicking if more than 16 are outstanding at
once? Do we really want to do that? I guess we do... I have nothing
better to suggest anyway.
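To make the trade-off concrete, here is a minimal userspace sketch of the same fixed-size slot scheme, written with the index-based loop form mentioned above. It is only an illustration under stated assumptions: struct slot, slot_save(), slot_clear() and MAX_SLOTS are hypothetical names, C11 atomic_compare_exchange_strong() stands in for the kernel's atomic_cmpxchg(), the task pointer is left out, and a full table returns -1 where the patch calls mce_panic(). The release path is not part of the quoted hunk, so slot_clear() is a guess at what it needs to do.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 16	/* hypothetical; mirrors MAX_MCE_INFO in the patch */

/* Hypothetical userspace stand-in for struct mce_info (no task pointer). */
struct slot {
	atomic_int inuse;	/* 0 = free, 1 = claimed */
	uint64_t paddr;		/* saved physical address */
};

/* Static storage, so every inuse flag starts out as 0 (free). */
static struct slot slots[MAX_SLOTS];

/*
 * Claim the first free slot and record addr in it.
 * Returns the slot index, or -1 if all slots are busy
 * (the patch panics at that point instead of returning).
 */
static int slot_save(uint64_t addr)
{
	for (int i = 0; i < MAX_SLOTS; i++) {
		struct slot *s = &slots[i];
		int expected = 0;

		/* Only the caller that flips inuse from 0 to 1 owns the slot. */
		if (atomic_compare_exchange_strong(&s->inuse, &expected, 1)) {
			s->paddr = addr;
			return i;
		}
	}
	return -1;
}

/* Release a slot so that a later error can reuse it. */
static void slot_clear(int i)
{
	atomic_store(&slots[i].inuse, 0);
}
```

With an empty table, 16 consecutive slot_save() calls claim indices 0 through 15 and a 17th returns -1. Once a slot is released with slot_clear(), the next save reuses it, so the limit applies only to errors that are outstanding at the same time.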
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551