From: Borislav Petkov <bp@amd64.org>
To: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Linux Edac Mailing List <linux-edac@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Doug Thompson <norsk5@yahoo.com>
Subject: Re: [EDAC ABI v13 01/25] edac: Initialize the dimm label with the known information
Date: Mon, 7 May 2012 17:52:26 +0200 [thread overview]
Message-ID: <20120507155226.GE11462@aftab.osrc.amd.com> (raw)
In-Reply-To: <1334608729-30803-2-git-send-email-mchehab@redhat.com>
Adding latest version here:
> From 50e9a89aad7045909780d635d73ab2893f8c1f90 Mon Sep 17 00:00:00 2001
> From: Mauro Carvalho Chehab <mchehab@redhat.com>
> Date: Thu, 9 Feb 2012 11:05:20 -0300
> Subject: [PATCH] edac: Initialize the dimm label with the known information
>
> While userspace doesn't fill the dimm labels, add there the dimm location,
> as described by the used memory model. This could eventually match what
> is described at the dmidecode, making easier for people to identify the
in making it easier
> memory.
stick in error.
> For example, on an Intel motherboard, the memory is described as:
>
> Memory Device
> Array Handle: 0x0029
> Error Information Handle: Not Provided
> Total Width: 64 bits
> Data Width: 64 bits
> Size: 2048 MB
> Form Factor: DIMM
> Set: 1
> Locator: A1_DIMM0
> Bank Locator: A1_Node0_Channel0_Dimm0
> Type: <OUT OF SPEC>
> Type Detail: Synchronous
> Speed: 800 MHz
> Manufacturer: A1_Manufacturer0
> Serial Number: A1_SerNum0
> Asset Tag: A1_AssetTagNum0
> Part Number: A1_PartNum0
>
> After this patch, the memory label will be filled with:
> /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0
This is only with the Intel-MCs, right, I still have the csrows here:
tree /sys/devices/system/edac/mc/
/sys/devices/system/edac/mc/
|-- mc0
| |-- ce_count
| |-- ce_noinfo_count
| |-- csrow0
| | |-- ce_count
| | |-- ch0_ce_count
| | |-- ch0_dimm_label
| | |-- ch1_ce_count
| | |-- ch1_dimm_label
| | |-- dev_type
| | |-- edac_mode
| | |-- mem_type
| | |-- size_mb
| | `-- ue_count
| |-- csrow1
| | |-- ce_count
| | |-- ch0_ce_count
| | |-- ch0_dimm_label
| | |-- ch1_ce_count
| | |-- ch1_dimm_label
| | |-- dev_type
| | |-- edac_mode
| | |-- mem_type
| | |-- size_mb
| | `-- ue_count
| |-- csrow2
| | |-- ce_count
| | |-- ch0_ce_count
| | |-- ch0_dimm_label
| | |-- ch1_ce_count
| | |-- ch1_dimm_label
| | |-- dev_type
| | |-- edac_mode
| | |-- mem_type
| | |-- size_mb
| | `-- ue_count
…
> With somewhat matches what it is at the Bank Locator DMI information.
I wouldn't say that - DMI is notoriously unreliable, let's look at some
boxes:
1st box:
Handle 0x0038, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0036
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: None
Locator: DIMM0
2nd box:
Memory Device
Array Handle: 0x0014
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 4096 bits
Size: 9 MB
Form Factor: <OUT OF SPEC>
Set: 73
Locator: P0_DIMM_A1
Bank Locator: CHANNEL A
3rd box:
Handle 0x0033, DMI type 17, 28 bytes
Memory Device
Array Handle: 0x0031
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 4096 MB
Form Factor: SODIMM
Set: 2
Locator: J401
Bank Locator: Channel B
and so on.
IOW, DMI fields are almost random permutations of [a-zA-Z0-9].
> So, it is easier to associate the dimm labels, of course assuming that
> the DMI has the Bank Locator filled, and the BIOS doesn't have any bugs.
>
> Yet, even without it, several motherboards are provided with enough
> info to map from channel/slot (or branch/channel/slot) into the DIMM
> label. So, letting the EDAC core fill it, by default is a good thing.
>
> It should noticed that, as the label filling happens at the
> edac_mc_alloc(), drivers can override it to better describe the memories
> (and some actually do it).
But I guess having the info is still fine, simply remove the DMI
references in the commit message pls.
> Cc: Aristeu Rozanski <arozansk@redhat.com>
> Cc: Doug Thompson <norsk5@yahoo.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
> ---
> drivers/edac/edac_mc.c | 25 +++++++++++++++++++------
> drivers/edac/edac_mc_sysfs.c | 8 ++++----
> include/linux/edac.h | 2 +-
> 3 files changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 4c44cd298c0b..77263b33b7f0 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -210,10 +210,10 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
> struct dimm_info *dimm;
> u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
> unsigned pos[EDAC_MAX_LAYERS];
> - void *pvt, *ptr = NULL;
> unsigned size, tot_dimms = 1, count = 1;
> unsigned tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
> - int i, j, err, row, chn;
> + void *pvt, *p, *ptr = NULL;
> + int i, j, err, row, chn, n, len;
> bool per_rank = false;
>
> BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
> @@ -325,9 +325,22 @@ struct mem_ctl_info *edac_mc_alloc(unsigned mc_num,
> i, per_rank ? "rank" : "dimm", (dimm - mci->dimms),
> pos[0], pos[1], pos[2], row, chn);
>
> - /* Copy DIMM location */
> - for (j = 0; j < n_layers; j++)
> + /*
> + * Copy DIMM location and initialize the memory location
initialize it.
or do you mean two different locations?
> + */
> + len = sizeof(dimm->label);
> + p = dimm->label;
> + n = snprintf(p, len, "mc#%u", mc_num);
> + p += n;
> + len -= n;
> + for (j = 0; j < n_layers; j++) {
> + n = snprintf(p, len, "%s#%u",
> + edac_layer_name[layers[j].type],
> + pos[j]);
> + p += n;
> + len -= n;
Err, you're not checking how much len is left here, i.e.
EDAC_MC_LABEL_LEN. Or even better, each time before you do snprintf.
> dimm->location[j] = pos[j];
> + }
>
> /* Link it to the csrows old API data */
> chan->dimm = dimm;
> @@ -837,7 +850,7 @@ static void edac_inc_ce_error(struct mem_ctl_info *mci,
> {
> int i, index = 0;
>
> - mci->ce_count++;
> + mci->ce_mc++;
Oh, renaming them back, ok.
<rest snipped>
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
next prev parent reply other threads:[~2012-05-07 15:52 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-16 20:38 [EDAC ABI v13 00/25] Fix EDAC userspace ABI Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 01/25] edac: Initialize the dimm label with the known information Mauro Carvalho Chehab
2012-05-07 15:52 ` Borislav Petkov [this message]
2012-05-14 12:48 ` Borislav Petkov
2012-05-14 13:47 ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 02/25] edac: Cleanup the logs for i7core and sb edac drivers Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 03/25] i5400_edac: improve debug messages to better represent the filled memory Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 04/25] events/hw_event: Create a Hardware Events Report Mecanism (HERM) Mauro Carvalho Chehab
2012-05-09 12:13 ` Borislav Petkov
2012-05-09 12:50 ` Mauro Carvalho Chehab
2012-05-09 13:22 ` Borislav Petkov
2012-05-09 13:51 ` Mauro Carvalho Chehab
2012-05-09 14:06 ` Borislav Petkov
2012-05-09 14:15 ` Mauro Carvalho Chehab
2012-05-09 14:24 ` Borislav Petkov
2012-05-10 13:16 ` Mauro Carvalho Chehab
2012-05-10 13:41 ` Borislav Petkov
2012-05-10 14:53 ` Mauro Carvalho Chehab
2012-05-10 15:02 ` Borislav Petkov
2012-05-10 15:08 ` Mauro Carvalho Chehab
2012-05-10 15:12 ` Borislav Petkov
2012-05-10 15:16 ` Mauro Carvalho Chehab
2012-05-10 19:57 ` [PATCH] edac: Increase version to 3.0.0 (aka: "HERM" version) Mauro Carvalho Chehab
2012-05-11 10:08 ` Borislav Petkov
2012-05-10 15:20 ` [EDAC ABI v13 04/25] events/hw_event: Create a Hardware Events Report Mecanism (HERM) Steven Rostedt
2012-05-10 15:27 ` Borislav Petkov
2012-05-10 15:34 ` Mauro Carvalho Chehab
2012-05-09 14:19 ` Steven Rostedt
2012-05-10 13:17 ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 05/25] i5000_edac: Fix the logic that retrieves memory information Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 06/25] e752x_edac: provide more info about how DIMMS/ranks are mapped Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 07/25] edac: Rename the parent dev to pdev Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 08/25] edac: use Documentation-nano format for some data structs Mauro Carvalho Chehab
2012-05-09 12:23 ` Borislav Petkov
2012-05-09 12:55 ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 09/25] edac: rewrite the sysfs code to use struct device Mauro Carvalho Chehab
2012-05-09 12:34 ` Borislav Petkov
2012-05-09 13:10 ` Mauro Carvalho Chehab
2012-05-09 13:24 ` Borislav Petkov
2012-05-09 14:09 ` [PATCH v21] edac: rewrite the sysfs code to use struct device - Was: " Mauro Carvalho Chehab
2012-05-09 13:13 ` Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 10/25] mpc85xx_edac: convert sysfs logic " Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 11/25] amd64_edac: " Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 12/25] i7core_edac: convert it " Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 13/25] edac: Get rid of the old kobj's from the edac mc code Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 14/25] edac: add a new per-dimm API and make the old per-virtual-rank API obsolete Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 15/25] edac: add a sysfs node to report the maximum location for the system Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 16/25] edac: Add debufs nodes to allow doing fake error inject Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 17/25] edac: Create a per-Memory Controller bus Mauro Carvalho Chehab
2012-04-16 23:25 ` Greg K H
2012-04-16 20:38 ` [EDAC ABI v13 18/25] edac: Move grain/dtype/edac_type calculus to be out of channel loop Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 19/25] i82975x_edac: Test nr_pages earlier to save a few CPU cycles Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 20/25] i5100_edac: Fix a warning when compiled with 32 bits Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 21/25] i7300_edac: Get rid of some wrongly-solved rebase conflict Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 22/25] edac: Only expose csrows/channels on legacy API if they're populated Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 23/25] edac: Fix a typo at edac_mc_sysfs Mauro Carvalho Chehab
2012-04-16 20:38 ` [EDAC ABI v13 24/25] edac: change the mem allocation scheme to make Documentation/kobject.txt happy Mauro Carvalho Chehab
2012-04-17 21:17 ` Joe Perches
2012-04-19 13:14 ` Mauro Carvalho Chehab
2012-04-22 6:37 ` Joe Perches
2012-04-19 13:21 ` [PATCH] " Mauro Carvalho Chehab
2012-04-19 15:28 ` Greg K H
2012-04-16 20:38 ` [EDAC ABI v13 25/25] i7core_edac: " Mauro Carvalho Chehab
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120507155226.GE11462@aftab.osrc.amd.com \
--to=bp@amd64.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab@redhat.com \
--cc=norsk5@yahoo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).