Linux EDAC development
From: Borislav Petkov <bp@alien8.de>
To: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Aristeu Rozanski <aris@redhat.com>,
	linux-edac@vger.kernel.org
Subject: [3/3] EDAC, sb_edac: Fix reporting wrong DIMM when patrol scrubber finds error
Date: Thu, 6 Sep 2018 12:53:36 +0200
Message-ID: <20180906105336.GD10768@zn.tnic>

On Tue, Sep 04, 2018 at 02:07:59PM -0700, Tony Luck wrote:
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> 
> EDAC driver sometimes reports the wrong DIMM for a memory

"EDAC driver"? Which one?

Please be more precise when writing your commit messages. They're not
write-only.

> error found by the patrol scrubber. It's rooted in h/w that
> only provides a 4KB page aligned address for the error case.
> This means that the EDAC driver will point at the DIMM matching
> offset 0x0 in the 4KB page, but because of interleaving across
> channels and ranks the actual DIMM involved may be different
> if the error is on some other cache line within the page.
> 
> For this error case, we could pass the socket/iMC/channel
> information from the "mce" structure passed to the EDAC driver,
> together with "dimm=-1", to the EDAC core, so that it reports
> that any of the DIMMs on that channel may be affected.
> 
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  drivers/edac/sb_edac.c | 95 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 89 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
> index f3678cdada83..f6009c7d452b 100644
> --- a/drivers/edac/sb_edac.c
> +++ b/drivers/edac/sb_edac.c
> @@ -326,6 +326,7 @@ struct sbridge_info {
>  	const struct interleave_pkg *interleave_pkg;
>  	u8		max_sad;
>  	u8		(*get_node_id)(struct sbridge_pvt *pvt);
> +	u8		(*get_ha)(u8 bank);

I'm staring at all this code and wondering what this "ha" is. I could
use a comment somewhere...

>  	enum mem_type	(*get_memory_type)(struct sbridge_pvt *pvt);
>  	enum dev_type	(*get_width)(struct sbridge_pvt *pvt, u32 mtr);
>  	struct pci_dev	*pci_vtd;
> @@ -1002,6 +1003,22 @@ static u8 knl_get_node_id(struct sbridge_pvt *pvt)
>  	return GET_BITFIELD(reg, 0, 2);
>  }
>  
> +static u8 sbridge_get_ha(u8 bank)
> +{
> +	return 0;
> +}
> +
> +static u8 ibridge_get_ha(u8 bank)
> +{
> +	switch (bank) {
> +	case 7 ... 8:
> +		return bank - 7;
> +	case 9 ... 16:
> +		return (bank - 9) / 4;
> +	default:
> +		return -EINVAL;
> +	}
> +}
>  
>  static u64 haswell_get_tolm(struct sbridge_pvt *pvt)
>  {
> @@ -2207,6 +2224,56 @@ static int get_memory_error_data(struct mem_ctl_info *mci,
>  	return 0;
>  }
>  
> +static int get_memory_error_data_from_mce(struct mem_ctl_info *mci,
> +					  const struct mce *m, u8 *socket,
> +					  u8 *ha, long *channel_mask,
> +					  char *msg)
> +{
> +	u32 reg, channel = GET_BITFIELD(m->status, 0, 3);
> +	struct mem_ctl_info *new_mci;
> +	struct sbridge_pvt *pvt;
> +	struct pci_dev *pci_ha;
> +	bool tad0;
> +
> +	if (channel >= NUM_CHANNELS) {
> +		sprintf(msg, "Invalid channel 0x%x", channel);
> +		return -EINVAL;
> +	}
> +
> +	pvt = mci->pvt_info;
> +	*ha = pvt->info.get_ha(m->bank);

You need to check the get_ha pointer before calling it because
KNIGHTS_LANDING assigns NULL to it.
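
Something along these lines would do. This is only a minimal userspace
sketch of the guard, not the driver code: struct model_ops, lookup_ha()
and always_ha0() are hypothetical stand-ins for the per-model callbacks
in struct sbridge_info, and returning -EINVAL for a missing callback is
an assumption about how the caller wants to handle it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-model callbacks in struct sbridge_info. */
struct model_ops {
	uint8_t (*get_ha)(uint8_t bank);	/* may be NULL on some models */
};

/* Hypothetical callback for a model where every bank maps to ha 0. */
static uint8_t always_ha0(uint8_t bank)
{
	(void)bank;
	return 0;
}

/*
 * Look up the ha for a bank. The callback is optional, so test the
 * pointer before calling through it instead of dereferencing NULL.
 * Returns 0 and stores the result in *ha on success, -EINVAL when the
 * model provides no callback (in which case *ha is left untouched).
 */
static int lookup_ha(const struct model_ops *ops, uint8_t bank, uint8_t *ha)
{
	if (!ops->get_ha)
		return -EINVAL;

	*ha = ops->get_ha(bank);
	return 0;
}
```

The error goes out through the int return value and the result through
the pointer, so a model without the callback is reported to the caller
instead of crashing in the error-handling path.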
