From: Aaron Miller <aaronmiller@fb.com>
To: Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: <linux-edac@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Aaron Miller <aaronmiller@fb.com>
Subject: [PATCH v3] EDAC: expose per-dimm error counts in sysfs
Date: Thu, 3 Nov 2016 15:01:53 -0700
Message-ID: <20161103220153.3997328-1-aaronmiller@fb.com>
In-Reply-To: <20161025232551.3270769-1-aaronmiller@fb.com>

The old 'csrowX' sysfs directories had per-csrow error counters, but the
new 'dimmX' directories do not currently expose error counts.

EDAC already keeps these counts; add them to sysfs so that per-dimm
counts are still available when CONFIG_EDAC_LEGACY_SYSFS=n.

Signed-off-by: Aaron Miller <aaronmiller@fb.com>
---

Notes:
    v2: Add commit message and documentation
    v3: Add ReST documentation on top of Mauro's patchset

 Documentation/ABI/testing/sysfs-devices-edac | 17 +++++++++++++
 Documentation/admin-guide/ras.rst            | 20 +++++++++++++++
 drivers/edac/edac_mc_sysfs.c                 | 38 ++++++++++++++++++++++++++++
 3 files changed, 75 insertions(+)
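
For illustration only, here is a minimal userspace sketch of how a
monitoring tool might consume the new attribute files. It assumes the
sysfs paths documented in this patch; the helper name edac_sum_counts()
is hypothetical and not part of the kernel or of any existing tool.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum the decimal counters in every file matching `pattern`.
 * Returns the number of files read successfully (0 if nothing
 * matched), or -1 if glob() fails for another reason.
 */
static int edac_sum_counts(const char *pattern, unsigned long long *total)
{
	glob_t g;
	size_t i;
	int nread = 0;
	int rc;

	*total = 0;
	memset(&g, 0, sizeof(g));

	rc = glob(pattern, 0, NULL, &g);
	if (rc != 0) {
		globfree(&g);
		return rc == GLOB_NOMATCH ? 0 : -1;
	}

	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");
		unsigned long long v;

		if (!f)
			continue;	/* skip files we cannot open */
		if (fscanf(f, "%llu", &v) == 1) {
			*total += v;
			nread++;
		}
		fclose(f);
	}

	globfree(&g);
	return nread;
}
```

A caller could pass "/sys/devices/system/edac/mc/mc*/dimm*/dimm_ce_count"
(and the matching rank* pattern) and alert when the total is non-zero.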

diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac
index 6568e0010e1a..46ff929fd52a 100644
--- a/Documentation/ABI/testing/sysfs-devices-edac
+++ b/Documentation/ABI/testing/sysfs-devices-edac
@@ -138,3 +138,20 @@ Contact:	Mauro Carvalho Chehab <m.chehab@samsung.com>
 Description:	This attribute file will display what type of memory is
 		currently on this csrow. Normally, either buffered or
 		unbuffered memory (for example, Unbuffered-DDR3).
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_ce_count
+Date:		October 2016
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the total count of correctable
+		errors that have occurred on this DIMM. This count is important
+		to examine because CEs provide an early indication that a DIMM
+		is beginning to fail. It should be monitored, and any non-zero
+		value should be reported to the system administrator.
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_ue_count
+Date:		October 2016
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the total count of uncorrectable
+		errors that have occurred on this DIMM. If panic_on_ue is set, this
+		counter will not have a chance to increment, since EDAC will panic the
+		system.
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index d71340e86c27..9939348bd4a3 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -438,11 +438,13 @@ A typical EDAC system has the following structure under
 	│   │   ├── ce_count
 	│   │   ├── ce_noinfo_count
 	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
 	│   │   │   ├── dimm_dev_type
 	│   │   │   ├── dimm_edac_mode
 	│   │   │   ├── dimm_label
 	│   │   │   ├── dimm_location
 	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
 	│   │   │   ├── size
 	│   │   │   └── uevent
 	│   │   ├── max_location
@@ -457,11 +459,13 @@ A typical EDAC system has the following structure under
 	│   │   ├── ce_count
 	│   │   ├── ce_noinfo_count
 	│   │   ├── dimm0
+	│   │   │   ├── dimm_ce_count
 	│   │   │   ├── dimm_dev_type
 	│   │   │   ├── dimm_edac_mode
 	│   │   │   ├── dimm_label
 	│   │   │   ├── dimm_location
 	│   │   │   ├── dimm_mem_type
+	│   │   │   ├── dimm_ue_count
 	│   │   │   ├── size
 	│   │   │   └── uevent
 	│   │   ├── max_location
@@ -483,6 +487,22 @@ this ``X`` memory module:
 	This attribute file displays, in count of megabytes, the memory
 	that this csrow contains.
 
+- ``dimm_ue_count`` - Uncorrectable Errors count attribute file
+
+	This attribute file displays the total count of uncorrectable
+	errors that have occurred on this DIMM. If panic_on_ue is set,
+	this counter will not have a chance to increment, since EDAC
+	will panic the system.
+
+- ``dimm_ce_count`` - Correctable Errors count attribute file
+
+	This attribute file displays the total count of correctable
+	errors that have occurred on this DIMM. This count is
+	important to examine because CEs provide an early
+	indication that a DIMM is beginning to fail. It should
+	be monitored, and any non-zero value should be reported
+	to the system administrator.
+
 - ``dimm_dev_type``  - Device type attribute file
 
 	This attribute file will display what type of DRAM device is
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 39dbab7d62f1..184fed2b005d 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -569,6 +569,40 @@ static ssize_t dimmdev_edac_mode_show(struct device *dev,
 	return sprintf(data, "%s\n", edac_caps[dimm->edac_mode]);
 }
 
+static ssize_t dimmdev_ce_count_show(struct device *dev,
+				      struct device_attribute *mattr,
+				      char *data)
+{
+	struct dimm_info *dimm = to_dimm(dev);
+	u32 count;
+	int off;
+
+	off = EDAC_DIMM_OFF(dimm->mci->layers,
+			    dimm->mci->n_layers,
+			    dimm->location[0],
+			    dimm->location[1],
+			    dimm->location[2]);
+	count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][off];
+	return sprintf(data, "%u\n", count);
+}
+
+static ssize_t dimmdev_ue_count_show(struct device *dev,
+				      struct device_attribute *mattr,
+				      char *data)
+{
+	struct dimm_info *dimm = to_dimm(dev);
+	u32 count;
+	int off;
+
+	off = EDAC_DIMM_OFF(dimm->mci->layers,
+			    dimm->mci->n_layers,
+			    dimm->location[0],
+			    dimm->location[1],
+			    dimm->location[2]);
+	count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][off];
+	return sprintf(data, "%u\n", count);
+}
+
 /* dimm/rank attribute files */
 static DEVICE_ATTR(dimm_label, S_IRUGO | S_IWUSR,
 		   dimmdev_label_show, dimmdev_label_store);
@@ -577,6 +611,8 @@ static DEVICE_ATTR(size, S_IRUGO, dimmdev_size_show, NULL);
 static DEVICE_ATTR(dimm_mem_type, S_IRUGO, dimmdev_mem_type_show, NULL);
 static DEVICE_ATTR(dimm_dev_type, S_IRUGO, dimmdev_dev_type_show, NULL);
 static DEVICE_ATTR(dimm_edac_mode, S_IRUGO, dimmdev_edac_mode_show, NULL);
+static DEVICE_ATTR(dimm_ce_count, S_IRUGO, dimmdev_ce_count_show, NULL);
+static DEVICE_ATTR(dimm_ue_count, S_IRUGO, dimmdev_ue_count_show, NULL);
 
 /* attributes of the dimm<id>/rank<id> object */
 static struct attribute *dimm_attrs[] = {
@@ -586,6 +622,8 @@ static struct attribute *dimm_attrs[] = {
 	&dev_attr_dimm_mem_type.attr,
 	&dev_attr_dimm_dev_type.attr,
 	&dev_attr_dimm_edac_mode.attr,
+	&dev_attr_dimm_ce_count.attr,
+	&dev_attr_dimm_ue_count.attr,
 	NULL,
 };
 
-- 
2.9.3
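
The show functions in the patch above pass the DIMM's location
coordinates to EDAC_DIMM_OFF and use the result to index the last
layer's counter array. As a rough illustration, and assuming the macro
flattens the coordinates in row-major order (an assumption about
include/linux/edac.h, not something this patch shows), the index
computation might look like the sketch below. struct layer and
dimm_off() are hypothetical stand-ins, not kernel names.

```c
/* Hypothetical stand-in for the kernel's per-layer description. */
struct layer {
	unsigned int size;	/* number of components in this layer */
};

/*
 * Flatten up to three layer coordinates into a single array index,
 * row-major. Returns -1 for an unsupported number of layers.
 */
static int dimm_off(const struct layer *layers, int n_layers,
		    unsigned int l0, unsigned int l1, unsigned int l2)
{
	switch (n_layers) {
	case 1:
		return (int)l0;
	case 2:
		return (int)(l1 + layers[1].size * l0);
	case 3:
		return (int)(l2 + layers[2].size * (l1 + layers[1].size * l0));
	default:
		return -1;
	}
}
```

Under that assumption, with two layers of sizes 2 and 4, location
(1, 3) maps to index 7; the third coordinate is ignored when fewer
than three layers are in use.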
