From: Borislav Petkov <bp@alien8.de>
To: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"tony.luck@intel.com" <tony.luck@intel.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH] x86/MCE, EDAC/mce_amd: Save all aux registers on SMCA systems
Date: Wed, 18 Apr 2018 19:13:47 +0200 [thread overview]
Message-ID: <20180418171347.GH4795@pd.tnic> (raw)
In-Reply-To: <CY4PR12MB15574493593120818D9BFE11F8B70@CY4PR12MB1557.namprd12.prod.outlook.com>
On Tue, Apr 17, 2018 at 06:30:34PM +0000, Ghannam, Yazen wrote:
> We could but it's an issue of documentation and testing the older systems.
>
> My first pass at this was to unconditionally read the registers because my
> understanding was that registers that aren't accessible would be read-as-zero.
> I thought this was a common MCA implementation. But Tony pointed out that
> this isn't the case on Intel systems. This is the case on recent AMD systems. But
> I don't know if it's the case on older systems which may or may not have
> followed the Intel implementation more closely.
So if our worry is the #GPs, we can always use the rdmsr*_safe()
variants and look at the return value. And dump an invalid value like
0xdeadbeef or so, if the read failed.

But if any bit of info we've gotten this way helps us debug an MCE,
we're already golden!
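
To make the suggestion concrete, here is a minimal userspace sketch of the
idea, not code from the patch. It assumes a stand-in called
mock_rdmsrl_safe() which mimics the kernel's rdmsrl_safe() convention of
returning 0 on success and non-zero when the access would have faulted. The
MSR numbers, the "accessible" rule, the MSR_READ_FAILED name and the
read_aux_reg() helper are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel logged when a register could not be read (hypothetical name). */
#define MSR_READ_FAILED 0xdeadbeefULL

/*
 * Userspace stand-in for rdmsrl_safe(): returns 0 and fills *val on
 * success, non-zero if the access would have raised a #GP.
 * The MSR numbers and values here are invented for the sketch.
 */
static int mock_rdmsrl_safe(uint32_t msr, uint64_t *val)
{
	assert(val != NULL);

	/* Pretend only MSRs below 0x10 exist on this imaginary CPU. */
	if (msr >= 0x10)
		return -1;

	*val = 0x1000ULL + msr;
	return 0;
}

/* Read an auxiliary register; fall back to the sentinel if the read faults. */
static uint64_t read_aux_reg(uint32_t msr)
{
	uint64_t val;

	if (mock_rdmsrl_safe(msr, &val))
		return MSR_READ_FAILED;

	return val;
}
```

With this, read_aux_reg() on an accessible register returns its contents,
and on an inaccessible one returns the sentinel instead of faulting, so the
logged record shows at a glance which reads failed.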
> For example,
>
> Deferred error occurs:
> - MCA_{STATUS,ADDR,DESTAT,DEADDR} all have valid data.
>
> MCE occurs
> - MCA_{STATUS,ADDR} are overwritten with non-zero data.
> - MCE handler clears MCA_STATUS. MCA_ADDR is non-zero.
>
> DFR handler finds MCA_STATUS[Deferred] is clear, so it saves
> MCA_DESTAT and MCA_DEADDR which is 0.
>
> If !m->addr (which has MCA_DEADDR), then we read MCA_ADDR
> which has the address from the MCE.
The code could use a shorter version of this as a comment to state why
we're doing it. Because it is not obvious.
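
For reference, the scenario quoted above can be modelled in a few lines of
plain C. This is only a sketch under assumptions: struct bank_regs, struct
err_record and log_deferred() are hypothetical names, the registers are
ordinary struct fields rather than MSRs, and the fallback reads MCA_ADDR
when the saved MCA_DEADDR value is zero. Bit 44 is used for
MCA_STATUS[Deferred].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_DEFERRED (1ULL << 44)	/* MCA_STATUS[Deferred] */

/* Hypothetical snapshot of one bank's registers. */
struct bank_regs {
	uint64_t status;	/* MCA_STATUS */
	uint64_t addr;		/* MCA_ADDR   */
	uint64_t destat;	/* MCA_DESTAT */
	uint64_t deaddr;	/* MCA_DEADDR */
};

/* Hypothetical error record with just the two fields of interest. */
struct err_record {
	uint64_t status;
	uint64_t addr;
};

/*
 * Deferred-error path as described in the quoted example:
 * if MCA_STATUS still shows the deferred error, log from STATUS/ADDR.
 * Otherwise an MCE has overwritten MCA_STATUS and its handler cleared
 * it, so log from DESTAT/DEADDR. If DEADDR is zero, fall back to
 * MCA_ADDR so that at least some address is recorded.
 */
static void log_deferred(const struct bank_regs *b, struct err_record *m)
{
	assert(b != NULL && m != NULL);

	if (b->status & STATUS_DEFERRED) {
		m->status = b->status;
		m->addr = b->addr;
		return;
	}

	m->status = b->destat;
	m->addr = b->deaddr;

	/* MCA_DEADDR was zero: take whatever MCA_ADDR holds instead. */
	if (!m->addr)
		m->addr = b->addr;
}
```

Note that in the fallback case the recorded address is the one left behind
by the MCE, not by the deferred error, which is exactly the non-obvious
part a comment in the real code should spell out.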
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.