linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@amd64.org>
To: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Doug Thompson <norsk5@yahoo.com>
Subject: Re: [EDAC ABI v13 01/25] edac: Initialize the dimm label with the known information
Date: Mon, 7 May 2012 17:52:26 +0200	[thread overview]
Message-ID: <20120507155226.GE11462@aftab.osrc.amd.com> (raw)
In-Reply-To: <1334608729-30803-2-git-send-email-mchehab@redhat.com>

Adding latest version here:

> From 50e9a89aad7045909780d635d73ab2893f8c1f90 Mon Sep 17 00:00:00 2001
> From: Mauro Carvalho Chehab <mchehab@redhat.com>
> Date: Thu, 9 Feb 2012 11:05:20 -0300
> Subject: [PATCH] edac: Initialize the dimm label with the known information
> 
> While userspace doesn't fill the dimm labels, add there the dimm location,
> as described by the used memory model. This could eventually match what
> is described at the dmidecode, making easier for people to identify the

	       in		 making it easier

> memory.

	 stick in error.

> For example, on an Intel motherboard, the memory is described as:
> 
> Memory Device
> 	Array Handle: 0x0029
> 	Error Information Handle: Not Provided
> 	Total Width: 64 bits
> 	Data Width: 64 bits
> 	Size: 2048 MB
> 	Form Factor: DIMM
> 	Set: 1
> 	Locator: A1_DIMM0
> 	Bank Locator: A1_Node0_Channel0_Dimm0
> 	Type: <OUT OF SPEC>
> 	Type Detail: Synchronous
> 	Speed: 800 MHz
> 	Manufacturer: A1_Manufacturer0
> 	Serial Number: A1_SerNum0
> 	Asset Tag: A1_AssetTagNum0
> 	Part Number: A1_PartNum0
> 
> After this patch, the memory label will be filled with:
> 	/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

This is only with the Intel-MCs, right, I still have the csrows here:

tree /sys/devices/system/edac/mc/
/sys/devices/system/edac/mc/
|-- mc0
|   |-- ce_count
|   |-- ce_noinfo_count
|   |-- csrow0
|   |   |-- ce_count
|   |   |-- ch0_ce_count
|   |   |-- ch0_dimm_label
|   |   |-- ch1_ce_count
|   |   |-- ch1_dimm_label
|   |   |-- dev_type
|   |   |-- edac_mode
|   |   |-- mem_type
|   |   |-- size_mb
|   |   `-- ue_count
|   |-- csrow1
|   |   |-- ce_count
|   |   |-- ch0_ce_count
|   |   |-- ch0_dimm_label
|   |   |-- ch1_ce_count
|   |   |-- ch1_dimm_label
|   |   |-- dev_type
|   |   |-- edac_mode
|   |   |-- mem_type
|   |   |-- size_mb
|   |   `-- ue_count
|   |-- csrow2
|   |   |-- ce_count
|   |   |-- ch0_ce_count
|   |   |-- ch0_dimm_label
|   |   |-- ch1_ce_count
|   |   |-- ch1_dimm_label
|   |   |-- dev_type
|   |   |-- edac_mode
|   |   |-- mem_type
|   |   |-- size_mb
|   |   `-- ue_count
 …


> With somewhat matches what it is at the Bank Locator DMI information.

I wouldn't say that - DMI is notoriously unreliable, let's look at some
boxes:

1st box:

Handle 0x0038, DMI type 17, 28 bytes
Memory Device
        Array Handle: 0x0036
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 2048 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM0

2nd box:

Memory Device
        Array Handle: 0x0014
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 4096 bits
        Size: 9 MB
        Form Factor: <OUT OF SPEC>
        Set: 73
        Locator: P0_DIMM_A1
        Bank Locator: CHANNEL A

3rd box:

Handle 0x0033, DMI type 17, 28 bytes
Memory Device
        Array Handle: 0x0031
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 4096 MB
        Form Factor: SODIMM
        Set: 2
        Locator: J401
        Bank Locator: Channel B

and so on.

IOW, DMI fields are almost random permutations of [a-zA-Z0-9].

> So, it is easier to associate the dimm labels, of course assuming that
> the DMI has the Bank Locator filled, and the BIOS doesn't have any bugs.
> 
> Yet, even without it, several motherboards are provided with enough
> info to map from channel/slot (or branch/channel/slot) into the DIMM
> label. So, letting the EDAC core fill it, by default is a good thing.
> 
> It should noticed that, as the label filling happens at the
> edac_mc_alloc(), drivers can override it to better describe the memories
> (and some actually do it).

But I guess having the info is still fine, simply remove the DMI
references in the commit message pls.

> Cc: Aristeu Rozanski <arozansk@redhat.com>
> Cc: Doug Thompson <norsk5@yahoo.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
> ---
>  drivers/edac/edac_mc.c       |   25 +++++++++++++++++++------
>  drivers/edac/edac_mc_sysfs.c |    8 ++++----
>  include/linux/edac.h         |    2 +-
>  3 files changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 4c44cd298c0b..77263b33b7f0 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -210,10 +210,10 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  	struct dimm_info *dimm;
>  	u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
>  	unsigned pos[EDAC_MAX_LAYERS];
> -	void *pvt, *ptr = NULL;
>  	unsigned size, tot_dimms = 1, count = 1;
>  	unsigned tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
> -	int i, j, err, row, chn;
> +	void *pvt, *p, *ptr = NULL;
> +	int i, j, err, row, chn, n, len;
>  	bool per_rank = false;
>  
>  	BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
> @@ -325,9 +325,22 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
>  			i, per_rank ? "rank" : "dimm", (dimm - mci->dimms),
>  			pos[0], pos[1], pos[2], row, chn);
>  
> -		/* Copy DIMM location */
> -		for (j = 0; j < n_layers; j++)
> +		/*
> +		 * Copy DIMM location and initialize the memory location

					 initialize it.

or do you mean two different locations?

> +		 */
> +		len = sizeof(dimm->label);
> +		p = dimm->label;
> +		n = snprintf(p, len, "mc#%u", mc_num);
> +		p += n;
> +		len -= n;
> +		for (j = 0; j < n_layers; j++) {
> +			n = snprintf(p, len, "%s#%u",
> +				     edac_layer_name[layers[j].type],
> +				     pos[j]);
> +			p += n;
> +			len -= n;

Err, you're not checking how much len is left here, i.e.
EDAC_MC_LABEL_LEN. Or even better, each time before you do snprintf.

>  			dimm->location[j] = pos[j];
> +		}
>  
>  		/* Link it to the csrows old API data */
>  		chan->dimm = dimm;
> @@ -837,7 +850,7 @@ static void edac_inc_ce_error(struct mem_ctl_info *mci,
>  {
>  	int i, index = 0;
>  
> -	mci->ce_count++;
> +	mci->ce_mc++;

Oh, renaming them back, ok.

<rest snipped>

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

  reply	other threads:[~2012-05-07 15:52 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-16 20:38 [EDAC ABI v13 00/25] Fix EDAC userspace ABI Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 01/25] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
2012-05-07 15:52   ` Borislav Petkov [this message]
2012-05-14 12:48     ` Borislav Petkov
2012-05-14 13:47       ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 02/25] edac: Cleanup the logs for i7core and sb edac drivers Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 03/25] i5400_edac: improve debug messages to better represent the filled memory Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 04/25] events/hw_event: Create a Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
2012-05-09 12:13   ` Borislav Petkov
2012-05-09 12:50     ` Mauro Carvalho Chehab
2012-05-09 13:22       ` Borislav Petkov
2012-05-09 13:51         ` Mauro Carvalho Chehab
2012-05-09 14:06           ` Borislav Petkov
2012-05-09 14:15             ` Mauro Carvalho Chehab
2012-05-09 14:24               ` Borislav Petkov
2012-05-10 13:16                 ` Mauro Carvalho Chehab
2012-05-10 13:41                   ` Borislav Petkov
2012-05-10 14:53                     ` Mauro Carvalho Chehab
2012-05-10 15:02                       ` Borislav Petkov
2012-05-10 15:08                       ` Mauro Carvalho Chehab
2012-05-10 15:12                         ` Borislav Petkov
2012-05-10 15:16                           ` Mauro Carvalho Chehab
2012-05-10 19:57                             ` [PATCH] edac: Increase version to 3.0.0 (aka: "HERM" version) Mauro Carvalho Chehab
2012-05-11 10:08                               ` Borislav Petkov
2012-05-10 15:20                           ` [EDAC ABI v13 04/25] events/hw_event: Create a Hardware Events Report Mecanism (HERM) Steven Rostedt
2012-05-10 15:27                             ` Borislav Petkov
2012-05-10 15:34                             ` Mauro Carvalho Chehab
2012-05-09 14:19           ` Steven Rostedt
2012-05-10 13:17             ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 05/25] i5000_edac: Fix the logic that retrieves memory information Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 06/25] e752x_edac: provide more info about how DIMMS/ranks are mapped Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 07/25] edac: Rename the parent dev to pdev Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 08/25] edac: use Documentation-nano format for some data structs Mauro Carvalho Chehab
2012-05-09 12:23   ` Borislav Petkov
2012-05-09 12:55     ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 09/25] edac: rewrite the sysfs code to use struct device Mauro Carvalho Chehab
2012-05-09 12:34   ` Borislav Petkov
2012-05-09 13:10     ` Mauro Carvalho Chehab
2012-05-09 13:24       ` Borislav Petkov
2012-05-09 14:09         ` [PATCH v21] edac: rewrite the sysfs code to use struct device - Was: " Mauro Carvalho Chehab
2012-05-09 13:13     ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 10/25] mpc85xx_edac: convert sysfs logic " Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 11/25] amd64_edac: " Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 12/25] i7core_edac: convert it " Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 13/25] edac: Get rid of the old kobj's from the edac mc code Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 14/25] edac: add a new per-dimm API and make the old per-virtual-rank API obsolete Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 15/25] edac: add a sysfs node to report the maximum location for the system Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 16/25] edac: Add debufs nodes to allow doing fake error inject Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 17/25] edac: Create a per-Memory Controller bus Mauro Carvalho Chehab
2012-04-16 23:25   ` Greg K H
2012-04-16 20:38 ` [EDAC ABI v13 18/25] edac: Move grain/dtype/edac_type calculus to be out of channel loop Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 19/25] i82975x_edac: Test nr_pages earlier to save a few CPU cycles Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 20/25] i5100_edac: Fix a warning when compiled with 32 bits Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 21/25] i7300_edac: Get rid of some wrongly-solved rebase conflict Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 22/25] edac: Only expose csrows/channels on legacy API if they're populated Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 23/25] edac: Fix a typo at edac_mc_sysfs Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 24/25] edac: change the mem allocation scheme to make Documentation/kobject.txt happy Mauro Carvalho Chehab
2012-04-17 21:17   ` Joe Perches
2012-04-19 13:14     ` Mauro Carvalho Chehab
2012-04-22  6:37       ` Joe Perches
2012-04-19 13:21     ` [PATCH] " Mauro Carvalho Chehab
2012-04-19 15:28       ` Greg K H
2012-04-16 20:38 ` [EDAC ABI v13 25/25] i7core_edac: " Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120507155226.GE11462@aftab.osrc.amd.com \
    --to=bp@amd64.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@redhat.com \
    --cc=norsk5@yahoo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).