From: Borislav Petkov <bp@alien8.de>
To: Chen Yucong <slaoub@gmail.com>
Cc: tony.luck@intel.com, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86, MCE, AMD: save IA32_MCi_STATUS before machine_check_poll() resets it
Date: Thu, 2 Oct 2014 15:12:06 +0200
Message-ID: <20141002131206.GA16452@pd.tnic>
In-Reply-To: <1412138102.21488.20.camel@debian>

On Wed, Oct 01, 2014 at 12:35:02PM +0800, Chen Yucong wrote:
> On Tue, 2014-09-30 at 12:09 +0200, Borislav Petkov wrote:
> > On Tue, Sep 30, 2014 at 05:56:31PM +0800, Chen Yucong wrote:
> > > I just clear it to avoid that the mce_log() call logs the above
> > > threshold event again in machine_check_poll().
> > 
> > Ok, that's a good point, please put it in the commit message.
> > 
> > > It is just used for scanning other banks for recording other valid
> > > error information.
> > 
> > This is actually not what we want - we want to log the errors which
> > cause the overflow first and then the rest. So you don't need the goto
> > but simply have the machine_check_poll() at the end. 
> 
> 
> From: Chen Yucong <slaoub@gmail.com>
> 
> machine_check_poll() will reset IA32_MCi_STATUS register to zero.
> So we need to save the content of IA32_MCi_STATUS MSRs before
> calling machine_check_poll() for logging threshold interrupt event.
> 
> mce_setup() does not gather the content of IA32_MCG_STATUS, so it
> should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
> to avoid that mce_log() logs the processed threshold event again
> at next time.
> 
> Signed-off-by: Chen Yucong <slaoub@gmail.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c |   18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f8c56bd..643e6a2 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -274,6 +274,7 @@ static void amd_threshold_interrupt(void)
>  	struct mce m;
>  
>  	mce_setup(&m);
> +	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
>  
>  	/* assume first bank caused it */
>  	for (bank = 0; bank < mca_cfg.banks; ++bank) {
> @@ -305,24 +306,27 @@ static void amd_threshold_interrupt(void)
>  			     (high & MASK_LOCKED_HI))
>  				continue;
>  
> -			/*
> -			 * Log the machine check that caused the threshold
> -			 * event.
> -			 */
> -			machine_check_poll(MCP_TIMESTAMP,
> -					this_cpu_ptr(&mce_poll_banks));
> -
>  			if (high & MASK_OVERFLOW_HI) {
>  				rdmsrl(address, m.misc);
>  				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
> +				if (m.status & MCI_STATUS_ADDRV)
> +					rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr);
>  				m.bank = K8_MCE_THRESHOLD_BASE
>  				       + bank * NR_BLOCKS
>  				       + block;
>  				mce_log(&m);
> +
> +				wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
>  				return;

Ok, this return is still bugging me - we're logging the error which
caused the counter overflow, but then we go and explicitly clear _STATUS
so that machine_check_poll() doesn't pick up the same error again.

Even though machine_check_poll() is what is supposed to log the
thresholding error in the first place.

Which makes me think that machine_check_poll() is completely useless
there. IOW, how about this instead:

---
From: Chen Yucong <slaoub@gmail.com>
Date: Thu, 2 Oct 2014 14:48:19 +0200
Subject: [PATCH] x86, MCE, AMD: Correct thresholding error logging

mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
so that mce_log() does not log the already processed threshold event
again the next time.

But we do the logging ourselves and machine_check_poll() is completely
useless there. So kill it.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 1c54d3d61a4d..9ce64955559d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -270,14 +270,13 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 static void amd_threshold_interrupt(void)
 {
 	u32 low = 0, high = 0, address = 0;
+	int cpu = smp_processor_id();
 	unsigned int bank, block;
 	struct mce m;
 
-	mce_setup(&m);
-
 	/* assume first bank caused it */
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
-		if (!(per_cpu(bank_map, m.cpu) & (1 << bank)))
+		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 		for (block = 0; block < NR_BLOCKS; ++block) {
 			if (block == 0) {
@@ -309,20 +308,21 @@ static void amd_threshold_interrupt(void)
 			 * Log the machine check that caused the threshold
 			 * event.
 			 */
-			machine_check_poll(MCP_TIMESTAMP,
-					&__get_cpu_var(mce_poll_banks));
-
-			if (high & MASK_OVERFLOW_HI) {
-				rdmsrl(address, m.misc);
-				rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
-				m.bank = K8_MCE_THRESHOLD_BASE
-				       + bank * NR_BLOCKS
-				       + block;
-				mce_log(&m);
-				return;
-			}
+			if (high & MASK_OVERFLOW_HI)
+				goto log;
 		}
 	}
+	return;
+
+log:
+	mce_setup(&m);
+	rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
+	rdmsrl(address, m.misc);
+	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+	m.bank = K8_MCE_THRESHOLD_BASE + bank * NR_BLOCKS + block;
+	mce_log(&m);
+
+	wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
 }
 
 /*
-- 
2.0.0
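
For anyone who wants to poke at the control flow outside the kernel, here
is a minimal user-space sketch of what the handler does with the patch
above applied: scan banks and blocks, jump to a single log path on the
first overflow, record STATUS and clear it afterwards. Everything named
sim_* or SIM_* is made up for illustration. The arrays stand in for the
MSRs, the constants are placeholders and not the kernel's values, and
resetting the threshold counter is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values, NOT the kernel's mca_cfg.banks, NR_BLOCKS or
 * K8_MCE_THRESHOLD_BASE. */
#define SIM_BANKS		6
#define SIM_BLOCKS		4
#define SIM_THRESHOLD_BASE	100

/* Hypothetical stand-in for the per-CPU MSR state. */
struct sim_cpu {
	bool overflow[SIM_BANKS][SIM_BLOCKS];	/* models MASK_OVERFLOW_HI */
	uint64_t status[SIM_BANKS];		/* models IA32_MCx_STATUS */
};

/* Hypothetical stand-in for the fields of struct mce used here. */
struct sim_record {
	unsigned int bank;
	uint64_t status;
};

/*
 * Returns true and fills *rec if some block reported an overflow,
 * false otherwise. Only the first overflowing block is handled.
 */
static bool sim_threshold_interrupt(struct sim_cpu *cpu,
				    struct sim_record *rec)
{
	unsigned int bank, block;

	assert(cpu && rec);

	for (bank = 0; bank < SIM_BANKS; ++bank)
		for (block = 0; block < SIM_BLOCKS; ++block)
			if (cpu->overflow[bank][block])
				goto log;
	return false;

log:
	/* bank and block still hold the indices of the overflowing block. */
	rec->status = cpu->status[bank];
	rec->bank = SIM_THRESHOLD_BASE + bank * SIM_BLOCKS + block;

	/* Clear STATUS so a later poll does not see the same error again. */
	cpu->status[bank] = 0;
	return true;
}
```

With the overflow flag set for bank 2, block 1, the record gets bank
SIM_THRESHOLD_BASE + 2 * SIM_BLOCKS + 1 plus the saved status, and the
simulated STATUS of bank 2 reads zero afterwards.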

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.