From: Aaron Miller <aaronmiller@fb.com>
To: Borislav Petkov <bp@alien8.de>,
Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <linux-edac@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Aaron Miller <aaronmiller@fb.com>
Subject: [PATCH v3] EDAC: expose per-dimm error counts in sysfs
Date: Thu, 3 Nov 2016 15:01:53 -0700 [thread overview]
Message-ID: <20161103220153.3997328-1-aaronmiller@fb.com> (raw)
In-Reply-To: <20161025232551.3270769-1-aaronmiller@fb.com>
The old 'csrowX' sysfs directories had per-csrow error counters, but the
new 'dimmX' directories do not currently expose error counts.
EDAC already keeps these counts, add them to sysfs so per-dimm counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n
Signed-off-by: Aaron Miller <aaronmiller@fb.com>
---
Notes:
v2: Add commit messsage and documentation
v3: Add ReST documentation on top of Mauro's patchset
Documentation/ABI/testing/sysfs-devices-edac | 17 +++++++++++++
Documentation/admin-guide/ras.rst | 20 +++++++++++++++
drivers/edac/edac_mc_sysfs.c | 38 ++++++++++++++++++++++++++++
3 files changed, 75 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac
index 6568e0010e1a..46ff929fd52a 100644
--- a/Documentation/ABI/testing/sysfs-devices-edac
+++ b/Documentation/ABI/testing/sysfs-devices-edac
@@ -138,3 +138,20 @@ Contact: Mauro Carvalho Chehab <m.chehab@samsung.com>
Description: This attribute file will display what type of memory is
currently on this csrow. Normally, either buffered or
unbuffered memory (for example, Unbuffered-DDR3).
+
+What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_ce_count
+Date: October 2016
+Contact: linux-edac@vger.kernel.org
+Description: This attribute file displays the total count of correctable
+ errors that have occurred on this DIMM. This count is very important
+ to examine. CEs provide early indications that a DIMM is beginning
+ to fail. This count field should be monitored for non-zero values
+ and report such information to the system administrator.
+
+What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_ue_count
+Date: October 2016
+Contact: linux-edac@vger.kernel.org
+Description: This attribute file displays the total count of uncorrectable
+ errors that have occurred on this DIMM. If panic_on_ue is set, this
+ counter will not have a chance to increment, since EDAC will panic the
+ system
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index d71340e86c27..9939348bd4a3 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -438,11 +438,13 @@ A typical EDAC system has the following structure under
│ │ ├── ce_count
│ │ ├── ce_noinfo_count
│ │ ├── dimm0
+ │ │ │ ├── dimm_ce_count
│ │ │ ├── dimm_dev_type
│ │ │ ├── dimm_edac_mode
│ │ │ ├── dimm_label
│ │ │ ├── dimm_location
│ │ │ ├── dimm_mem_type
+ │ │ │ ├── dimm_ue_count
│ │ │ ├── size
│ │ │ └── uevent
│ │ ├── max_location
@@ -457,11 +459,13 @@ A typical EDAC system has the following structure under
│ │ ├── ce_count
│ │ ├── ce_noinfo_count
│ │ ├── dimm0
+ │ │ │ ├── dimm_ce_count
│ │ │ ├── dimm_dev_type
│ │ │ ├── dimm_edac_mode
│ │ │ ├── dimm_label
│ │ │ ├── dimm_location
│ │ │ ├── dimm_mem_type
+ │ │ │ ├── dimm_ue_count
│ │ │ ├── size
│ │ │ └── uevent
│ │ ├── max_location
@@ -483,6 +487,22 @@ this ``X`` memory module:
This attribute file displays, in count of megabytes, the memory
that this csrow contains.
+- ``dimm_ue_count`` - Uncorrectable Errors count attribute file
+
+ This attribute file displays the total count of uncorrectable
+ errors that have occurred on this DIMM. If panic_on_ue is set
+ this counter will not have a chance to increment, since EDAC
+ will panic the system.
+
+- ``dimm_ce_count`` - Correctable Errors count attribute file
+
+ This attribute file displays the total count of correctable
+ errors that have occurred on this DIMM. This count is very
+ important to examine. CEs provide early indications that a
+ DIMM is beginning to fail. This count field should be
+ monitored for non-zero values and report such information
+ to the system administrator.
+
- ``dimm_dev_type`` - Device type attribute file
This attribute file will display what type of DRAM device is
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 39dbab7d62f1..184fed2b005d 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -569,6 +569,40 @@ static ssize_t dimmdev_edac_mode_show(struct device *dev,
return sprintf(data, "%s\n", edac_caps[dimm->edac_mode]);
}
+static ssize_t dimmdev_ce_count_show(struct device *dev,
+ struct device_attribute *mattr,
+ char *data)
+{
+ struct dimm_info *dimm = to_dimm(dev);
+ u32 count;
+ int off;
+
+ off = EDAC_DIMM_OFF(dimm->mci->layers,
+ dimm->mci->n_layers,
+ dimm->location[0],
+ dimm->location[1],
+ dimm->location[2]);
+ count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][off];
+ return sprintf(data, "%u\n", count);
+}
+
+static ssize_t dimmdev_ue_count_show(struct device *dev,
+ struct device_attribute *mattr,
+ char *data)
+{
+ struct dimm_info *dimm = to_dimm(dev);
+ u32 count;
+ int off;
+
+ off = EDAC_DIMM_OFF(dimm->mci->layers,
+ dimm->mci->n_layers,
+ dimm->location[0],
+ dimm->location[1],
+ dimm->location[2]);
+ count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][off];
+ return sprintf(data, "%u\n", count);
+}
+
/* dimm/rank attribute files */
static DEVICE_ATTR(dimm_label, S_IRUGO | S_IWUSR,
dimmdev_label_show, dimmdev_label_store);
@@ -577,6 +611,8 @@ static DEVICE_ATTR(size, S_IRUGO, dimmdev_size_show, NULL);
static DEVICE_ATTR(dimm_mem_type, S_IRUGO, dimmdev_mem_type_show, NULL);
static DEVICE_ATTR(dimm_dev_type, S_IRUGO, dimmdev_dev_type_show, NULL);
static DEVICE_ATTR(dimm_edac_mode, S_IRUGO, dimmdev_edac_mode_show, NULL);
+static DEVICE_ATTR(dimm_ce_count, S_IRUGO, dimmdev_ce_count_show, NULL);
+static DEVICE_ATTR(dimm_ue_count, S_IRUGO, dimmdev_ue_count_show, NULL);
/* attributes of the dimm<id>/rank<id> object */
static struct attribute *dimm_attrs[] = {
@@ -586,6 +622,8 @@ static struct attribute *dimm_attrs[] = {
&dev_attr_dimm_mem_type.attr,
&dev_attr_dimm_dev_type.attr,
&dev_attr_dimm_edac_mode.attr,
+ &dev_attr_dimm_ce_count.attr,
+ &dev_attr_dimm_ue_count.attr,
NULL,
};
--
2.9.3
next prev parent reply other threads:[~2016-11-03 22:02 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-25 23:25 [PATCH] EDAC: expose per-dimm error counts in sysfs Aaron Miller
2016-10-27 18:07 ` Borislav Petkov
2016-10-27 18:18 ` Mauro Carvalho Chehab
2016-10-27 21:23 ` Aaron Miller
2016-10-28 9:55 ` Aaron Miller
2016-10-27 21:33 ` [PATCH v2] " Aaron Miller
2016-10-28 13:02 ` Borislav Petkov
2016-10-28 18:13 ` Aaron Miller
2016-11-02 10:54 ` Borislav Petkov
2016-11-03 22:01 ` Aaron Miller [this message]
2016-11-04 11:21 ` [PATCH v3] " Borislav Petkov
2017-01-19 8:56 ` Aaron Miller
2017-01-19 9:29 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161103220153.3997328-1-aaronmiller@fb.com \
--to=aaronmiller@fb.com \
--cc=bp@alien8.de \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox