From: Borislav Petkov <bp@alien8.de>
To: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: x86/MCE, EDAC/mce_amd: Save all aux registers on SMCA systems
Date: Wed, 18 Apr 2018 19:13:47 +0200
Message-ID: <20180418171347.GH4795@pd.tnic>

On Tue, Apr 17, 2018 at 06:30:34PM +0000, Ghannam, Yazen wrote:
> We could but it's an issue of documentation and testing the older systems.
> 
> My first pass at this was to unconditionally read the registers because my
> understanding was that registers that aren't accessible would be read-as-zero.
> I thought this was a common MCA implementation. But Tony pointed out that
> this isn't the case on Intel systems. This is the case on recent AMD systems. But
> I don't know if it's the case on older systems which may or may not have
> followed the Intel implementation more closely.

So if our worry is the #GPs, we can always use the rdmsr*_safe()
variants and look at the return value. And dump an invalid value like
0xdeadbeef or so, if the read failed.
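
A minimal userspace sketch of that idea, not the actual patch: it assumes
a helper shaped like the kernel's rdmsrl_safe() (returns 0 on success,
non-zero when the access would have faulted). The names mock_rdmsrl_safe(),
read_aux_reg() and MSR_READ_FAILED, and the rule for which MSR numbers
"exist", are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Placeholder logged when a register read fails (value suggested above). */
#define MSR_READ_FAILED 0xdeadbeefULL

/*
 * Hypothetical stand-in for a rdmsr*_safe() variant: returns 0 and fills
 * *val on success, non-zero if the read would have raised a #GP.
 * Assumption for this sketch: only MSR numbers below 0x1000 are implemented.
 */
static int mock_rdmsrl_safe(uint32_t msr, uint64_t *val)
{
	if (msr >= 0x1000)
		return -1;

	*val = 0x1234ULL + msr;
	return 0;
}

/* Read an aux register; log the placeholder instead of faulting. */
static uint64_t read_aux_reg(uint32_t msr)
{
	uint64_t val;

	if (mock_rdmsrl_safe(msr, &val))
		return MSR_READ_FAILED;

	return val;
}
```

Calling read_aux_reg() on an implemented register returns its contents,
while a register that would fault yields the placeholder, so the record
still shows that the read was attempted.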

But if any bit of info we get this way helps us debug an MCE, we're
already golden!

> For example,
> 
> Deferred error occurs:
> - MCA_{STATUS,ADDR,DESTAT,DEADDR} all have valid data.
> 
> MCE occurs
> - MCA_{STATUS,ADDR} are overwritten with non-zero data.
> - MCE handler clears MCA_STATUS. MCA_ADDR is non-zero.
> 
> DFR handler finds MCA_STATUS[Deferred] is clear, so it saves
> MCA_DESTAT and MCA_DEADDR which is 0.
> 
> If !m->addr (which has MCA_DEADDR), then we read MCA_ADDR
> which has the address from the MCE.

The code could use a shorter version of this as a comment to state why
we're doing it. Because it is not obvious.
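
For illustration only, here is one way such a comment could sit next to
the fallback. This is a userspace sketch under stated assumptions, not the
code from the patch: struct bank_regs, struct err_record and
log_deferred() are hypothetical names, and the registers are modelled as
plain struct fields rather than MSR reads.

```c
#include <stdint.h>

/* Hypothetical snapshot of one bank's registers. */
struct bank_regs {
	uint64_t status;	/* MCA_STATUS */
	uint64_t addr;		/* MCA_ADDR   */
	uint64_t destat;	/* MCA_DESTAT */
	uint64_t deaddr;	/* MCA_DEADDR */
};

/* Hypothetical, reduced error record. */
struct err_record {
	uint64_t status;
	uint64_t addr;
};

/* Path taken when MCA_STATUS[Deferred] was found clear. */
static void log_deferred(const struct bank_regs *b, struct err_record *m)
{
	m->status = b->destat;
	m->addr   = b->deaddr;

	/*
	 * An MCE may have overwritten MCA_STATUS/MCA_ADDR after the deferred
	 * error, and the MCE handler then cleared MCA_STATUS. If MCA_DEADDR
	 * gave us no address, fall back to MCA_ADDR so one is still logged.
	 */
	if (!m->addr)
		m->addr = b->addr;
}
```

With deaddr == 0 and a non-zero addr, the record ends up carrying the
MCA_ADDR value; when deaddr is non-zero it is kept as-is.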

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
