public inbox for linux-kernel@vger.kernel.org
From: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: <slaoub@gmail.com>, Tony Luck <tony.luck@intel.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: Fwd: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
Date: Thu, 9 Oct 2014 11:53:39 -0500
Message-ID: <20141009165339.GA11360@arav-dinar>
In-Reply-To: <20141008225750.GH16892@pd.tnic>

On Thu, Oct 09, 2014 at 12:57:50AM +0200, Borislav Petkov wrote:
> On Wed, Oct 08, 2014 at 04:52:06PM -0500, Aravind Gopalakrishnan wrote:
> > I am not understanding why m.bank is assigned this value..
> 
> That's a very good question, see below for some history.
> 
> > 
> > It only causes incorrect decoding-
> > [  608.832916] DEBUG: raise_amd_threshold_event
> > [  608.832926] [Hardware Error]: Corrected error, no action required.
> > [  608.833143] [Hardware Error]: CPU:26 (15:2:0)
> > MC165_STATUS[-|CE|MiscV|-|AddrV|-|-]: 0x8c00000000000000
> > [  608.833551] [Hardware Error]: MC165_ADDR: 0x0000000000000000
> > [  608.833777] [Hardware Error]: cache level: RESV, tx: INSN
> > [  608.834034] amd_inject module loaded ...
> > 
> > 
> > (Obviously, as in amd_decode_mce() we switch (m->bank) for decoding the
> > status and there is no bank 165)
> > 
> > OTOH, if m.bank = bank;
> > Then we get correct decoding info-
> 
> Yes, and I think we should do that only if we're using the *last* error
> to report the overflow with: we're reporting a thresholding counter
> overflow and the bank on which it was detected on should, of course, be
> part of the report.
> 

What do you mean by "last error"?
The interrupt is fired only upon overflow.

> The "funny" bank is some sort of a software defined banks thing which
> got added in 2005 (see the patch I dug out below) and it was supposed
> to be used (I'm guessing here) for reporting thermal events using MCA
> (dumb idea, if you ask me) so since thermal events don't really have
> a bank, they decided to have some sort of a software-defined MCA bank
> which doesn't correspond to any hardware bank.
> 
> Then Jacob decided to use it for some reason too:
> 
> 95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors")
> 
> maybe because thresholding errors don't have a bank associated with them
> but if I'm not missing anything, they do!
> 

Right. The thresholding registers are nothing but the _MISC(x) registers,
where x is the bank number.
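
To illustrate the point about per-bank registers, here is a minimal
user-space sketch (not kernel code). It assumes the architectural x86
layout in which bank 0 starts at MSR 0x400 and each bank owns four
consecutive MSRs (CTL, STATUS, ADDR, MISC). The helper names
mcx_status_msr() and mcx_misc_msr() are made up for this example; they
only mirror what the kernel's MSR_IA32_MCx_STATUS()-style macros compute.

```c
#include <stdint.h>

/* Architectural MSR numbers for bank 0. */
#define MC0_CTL    0x400u
#define MC0_STATUS 0x401u
#define MC0_ADDR   0x402u
#define MC0_MISC   0x403u

/* Each hardware bank occupies four consecutive MSRs. */
#define MSRS_PER_BANK 4u

/* Hypothetical helper: MSR number of MCx_STATUS for a hardware bank. */
static inline uint32_t mcx_status_msr(unsigned int bank)
{
	return MC0_STATUS + MSRS_PER_BANK * bank;
}

/* Hypothetical helper: MSR number of MCx_MISC for a hardware bank. */
static inline uint32_t mcx_misc_msr(unsigned int bank)
{
	return MC0_MISC + MSRS_PER_BANK * bank;
}
```

With a real bank index the STATUS and MISC registers sit side by side,
for example mcx_status_msr(4) is 0x411 and mcx_misc_msr(4) is 0x413. A
software-defined value such as 165 does not map to any hardware bank in
this scheme.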

> Oh oh, ok, it just dawned on me! I think I know what it *might* have
> been: they wanted to report the overflowing with a special error
> signature which uses a software-defined bank. Ok, that actually makes
> sense: when you see an error for a sw-defined bank, you're reporting an
> thresholding counter overflow.
> 
> Which means that we shouldn't be populating m.status either, i.e. what
> we did earlier:
> 
> 	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> 
> because this is a special error type.
>

How is it a "special error type"? It's still the same CE that we get
notified about. The only difference is that the error count has now
crossed a specific 'threshold_limit'.

So I am not getting the rationale behind a S/W-defined bank for reporting
this.

A CE collected through polling gives proper decoding info. So why should
this be any different for the same CE when an interrupt is generated on
crossing a threshold?
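
As a rough illustration, the status value from the log quoted above
(0x8c00000000000000) can be checked against the architectural
MCi_STATUS bit positions. This is a stand-alone sketch, not the kernel's
decoder: the macro names follow the kernel's MCI_STATUS_* convention,
and mci_is_valid_ce() is a hypothetical helper invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural MCi_STATUS bits used in this sketch. */
#define MCI_STATUS_VAL   (1ULL << 63)	/* register contents are valid */
#define MCI_STATUS_UC    (1ULL << 61)	/* error was not corrected     */
#define MCI_STATUS_MISCV (1ULL << 59)	/* MCi_MISC holds valid data   */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* MCi_ADDR holds valid data   */

/* Hypothetical helper: true for a valid, corrected error (CE). */
static inline bool mci_is_valid_ce(uint64_t status)
{
	return (status & MCI_STATUS_VAL) && !(status & MCI_STATUS_UC);
}
```

The logged value has VAL, MiscV and AddrV set and UC clear, which matches
the "CE|MiscV|AddrV" flags printed in the log. Nothing in the status bits
depends on whether the error was found by polling or by the threshold
interrupt.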

Thanks,
-Aravind

> Hmm, it is too late here to think straight, more tomorrow. But Aravind,
> that was a very good question, you actually made me dig into git history
> :-)
> 
> Good night.
> 
