From: Borislav Petkov <bp@amd64.org>
To: Tony Luck <tony.luck@intel.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Borislav Petkov <bp@amd64.org>,
Chen Gong <gong.chen@linux.intel.com>,
"Huang, Ying" <ying.huang@intel.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Subject: Re: [PATCH 4/6] x86, mce: Add mechanism to safely save information in MCE handler
Date: Fri, 16 Dec 2011 09:14:10 +0100
Message-ID: <20111216081410.GA9508@aftab>
In-Reply-To: <5d6588ab3dadabd8334cffee06b3e87abe3b81b7.1323979146.git.tony.luck@intel.com>

On Wed, Dec 14, 2011 at 03:55:20PM -0800, Tony Luck wrote:
> Machine checks on Intel cpus interrupt execution on all cpus, regardless
> of interrupt masking. We have a need to save some data about the cause
> of the machine check (physical address) in the machine check handler that
> can be retrieved later to attempt recovery in a more flexible execution
> state.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 43 ++++++++++++++++++++++++++++++++++++++
> 1 files changed, 43 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 645070f..7d7303a 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -887,6 +887,49 @@ static void mce_clear_state(unsigned long *toclear)
> }
>
> /*
> + * Need to save faulting physical address associated with a process
> + * in the machine check handler some place where we can grab it back
> + * later in mce_notify_process()
> + */
> +#define MAX_MCE_INFO 16
> +
> +struct mce_info {
> +	atomic_t inuse;
> +	struct task_struct *t;
> +	__u64 paddr;
> +} mce_info[MAX_MCE_INFO];
> +
> +static void mce_save_info(__u64 addr)
> +{
> +	struct mce_info *mi;
> +
> +	for (mi = mce_info; mi < &mce_info[MAX_MCE_INFO]; mi++) {

This looks strange, although valid. I thought we usually write

	for (i = 0; i < MAX_MCE_INFO; i++) {
		struct mce_info *mi = &mce_info[i];
		...

in such loops. Just a nitpick, I guess.

> +		if (atomic_cmpxchg(&mi->inuse, 0, 1) == 0) {
> +			mi->t = current;
> +			mi->paddr = addr;
> +			return;
> +		}
> +	}
> +
> +	mce_panic("Too many concurrent recoverable errors", NULL, NULL);

So we're setting an artificial limit of 16 in-flight AR (action required)
errors, and panicking if there are more than 16? Do we really want to do
that? I guess we do... I've got nothing better anyway.
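
For illustration only, here is a minimal userspace sketch of the slot
scheme discussed above, written with the index-based loop. It is not the
kernel code and rests on several assumptions: C11 atomics stand in for
atomic_cmpxchg(), an integer "owner" stands in for the task_struct
pointer, the save routine returns -1 where the patch calls mce_panic(),
and mce_find_info()/mce_clear_info() are hypothetical helpers that do not
appear in the quoted hunk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MCE_INFO 16

/* Userspace stand-in for the patch's struct; "owner" replaces the task pointer. */
struct mce_info {
	atomic_int inuse;
	int owner;
	uint64_t paddr;
};

static struct mce_info mce_info[MAX_MCE_INFO];

/*
 * Claim the first free slot with a compare-and-swap on "inuse".
 * Returns the slot index, or -1 when all slots are taken (the patch
 * panics at that point instead).
 */
static int mce_save_info(int owner, uint64_t addr)
{
	for (int i = 0; i < MAX_MCE_INFO; i++) {
		struct mce_info *mi = &mce_info[i];
		int expected = 0;

		if (atomic_compare_exchange_strong(&mi->inuse, &expected, 1)) {
			mi->owner = owner;
			mi->paddr = addr;
			return i;
		}
	}
	return -1;
}

/* Hypothetical helper: look up the in-use slot recorded for "owner". */
static struct mce_info *mce_find_info(int owner)
{
	for (int i = 0; i < MAX_MCE_INFO; i++) {
		struct mce_info *mi = &mce_info[i];

		if (atomic_load(&mi->inuse) && mi->owner == owner)
			return mi;
	}
	return NULL;
}

/* Hypothetical helper: release a slot so it can be claimed again. */
static void mce_clear_info(struct mce_info *mi)
{
	assert(mi != NULL);
	atomic_store(&mi->inuse, 0);
}
```

The compare-and-swap guarantees that two concurrent callers never claim
the same slot, and the fixed array avoids any allocation in the handler.
The cost is the hard cap: the 17th concurrent caller gets -1 here, which
is the case where the patch panics.
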
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551