From: Borislav Petkov <bp@amd64.org>
To: Tony Luck <tony.luck@intel.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Borislav Petkov <bp@amd64.org>,
	Chen Gong <gong.chen@linux.intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Subject: Re: [PATCH 4/6] x86, mce: Add mechanism to safely save information in MCE handler
Date: Fri, 16 Dec 2011 09:14:10 +0100
Message-ID: <20111216081410.GA9508@aftab>
In-Reply-To: <5d6588ab3dadabd8334cffee06b3e87abe3b81b7.1323979146.git.tony.luck@intel.com>

On Wed, Dec 14, 2011 at 03:55:20PM -0800, Tony Luck wrote:
> Machine checks on Intel cpus interrupt execution on all cpus, regardless
> of interrupt masking.  We have a need to save some data about the cause
> of the machine check (physical address) in the machine check handler that
> can be retrieved later to attempt recovery in a more flexible execution
> state.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   43 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 43 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 645070f..7d7303a 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -887,6 +887,49 @@ static void mce_clear_state(unsigned long *toclear)
>  }
>  
>  /*
> + * Need to save faulting physical address associated with a process
> + * in the machine check handler some place where we can grab it back
> + * later in mce_notify_process()
> + */
> +#define	MAX_MCE_INFO	16
> +
> +struct mce_info {
> +	atomic_t		inuse;
> +	struct task_struct	*t;
> +	__u64			paddr;
> +} mce_info[MAX_MCE_INFO];
> +
> +static void mce_save_info(__u64 addr)
> +{
> +	struct mce_info *mi;
> +
> +	for (mi = mce_info; mi < &mce_info[MAX_MCE_INFO]; mi++) {

This looks strange, although valid. I thought we do

	for (i = 0; i < MAX_MCE_INFO; i++) {
		struct mce_info *mi = &mce_info[i];

		...

in such loops. Just a nitpick I guess.

> +		if (atomic_cmpxchg(&mi->inuse, 0, 1) == 0) {
> +			mi->t = current;
> +			mi->paddr = addr;
> +			return;
> +		}
> +	}
> +
> +	mce_panic("Too many concurrent recoverable errors", NULL, NULL);

So we're setting an artificial limit of 16 in-flight AR (action required)
errors, and if there are more than 16, we panic? Do we really want to do
that? I guess we do... I've got nothing better anyway.
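
For illustration only, here is a minimal user-space sketch of the
slot-claiming scheme discussed above, written with the index-based loop
and C11 atomics standing in for the kernel's atomic_cmpxchg(). The
`owner` field (a stand-in for the task pointer), the mce_clear_info()
helper, and returning NULL instead of calling mce_panic() are all
assumptions made for the sketch; none of them come from the patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MCE_INFO	16

struct mce_info {
	atomic_int	inuse;	/* 0 = free, 1 = claimed */
	int		owner;	/* hypothetical stand-in for struct task_struct *t */
	uint64_t	paddr;	/* faulting physical address */
};

/* Static storage, so every slot starts out free (inuse == 0). */
static struct mce_info mce_info[MAX_MCE_INFO];

/*
 * Claim the first free slot with a compare-and-swap on ->inuse.
 * Only one caller can flip a given slot from 0 to 1, so the fields
 * are written by the winner alone. Returns NULL when all slots are
 * busy (the patch panics at this point instead).
 */
static struct mce_info *mce_save_info(int owner, uint64_t addr)
{
	int i;

	for (i = 0; i < MAX_MCE_INFO; i++) {
		struct mce_info *mi = &mce_info[i];
		int expected = 0;

		if (atomic_compare_exchange_strong(&mi->inuse, &expected, 1)) {
			mi->owner = owner;
			mi->paddr = addr;
			return mi;
		}
	}

	return NULL;
}

/* Hypothetical helper: release a slot once the saved data is consumed. */
static void mce_clear_info(struct mce_info *mi)
{
	atomic_store(&mi->inuse, 0);
}
```

With 16 slots, the 17th concurrent claim is the one that fails, which
is the case the mce_panic() call in the patch covers.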

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
