From: Borislav Petkov <bp@amd64.org>
To: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Linux Edac Mailing List <linux-edac@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/6] Add a per-dimm structure
Date: Sun, 11 Mar 2012 12:34:11 +0100 [thread overview]
Message-ID: <20120311113411.GB29175@aftab> (raw)
In-Reply-To: <4F5A5E2D.4090408@redhat.com>
On Fri, Mar 09, 2012 at 04:46:53PM -0300, Mauro Carvalho Chehab wrote:
[..]
> > Right, what I mean is that the rank?/ already contains some info:
> >
> > rank0/
> > |-- dimm_dev_type
> > |-- dimm_edac_mode
> > |-- dimm_label
> > |-- dimm_location
> > |-- dimm_mem_type
> > `-- dimm_size
> >
> > Now, we do the CE/UE error counting on a per-rank granularity anyway, so
> > the most natural way to have that is to add those counts to the ranks:
> >
> > rank0/
> > |-- dimm_dev_type
> > |-- dimm_edac_mode
> > |-- dimm_label
> > |-- dimm_location
> > |-- dimm_mem_type
> > |-- CE
> > |-- UE
> > `-- dimm_size
> >
> > And this has to be _very_ easy to do without any adding additional
> > sysfs nodes with ugly names to /sys/devices/system/edac etc. This is
> > even better grouping than the mc?/-based hierarchy I suggested above,
> > actually.
>
> Agreed. Yeah, it is easy to add CE/UE there. I actually implemented it
> on one of my internal patches, but there's an issue:
>
> The typical case for UE is to report errors by cacheline (128 bits), and
> not by DIMM. This happens on all FB-DIMM memory controllers, and also on
> several CS-based ones.
>
> For example, this is how (currently) the amd64_handle_ue() handles an
> Uncorrected Error:
>
> error_address_to_page_and_offset(sys_addr, &page, &offset);
> edac_mc_handle_ue(log_mci, page, offset, csrow, EDAC_MOD_STR);
>
> There's no channel info there.
Right, this looks like a largely untested path which has been that way
since forever. But, since UEs generally cause the machine to syncflood
and warm reset (now, at least), I don't think it makes a whole lot of
sense to even have such a counter - if we did, it would either say 0 or
1 :).
So, I'd suggest the UE counter to be optional and to let the driver
decide whether it wants it or not.
[..]
> One alternative would simply to remove all those intermediate
> counters, letting userspace to count the errors via perf (provided
> that we have a proper location field).
Yes, that would be where we want to go eventually because I too don't
see any reason for those counters. Besides, they don't decay over time,
for example, say you have a DIMM which experiences a temporary failure
and generates k CEs. Then, the source of that error disappears and the
DIMM works fine for months.
Now, when you look at the counters, you'll still see k CEs in one of its
ranks which doesn't tell you when those errors happened and what their
rate was, etc.
So, I'm fine with dropping those counters since they don't give you the
flexibility of a userspace tool and they don't work properly anyway.
HOWEVER, I don't know who uses them still so probably a deprecation
warning is in order here...
>
> Regards,
> Mauro
>
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
next prev parent reply other threads:[~2012-03-11 11:35 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-07 11:40 [PATCH 0/6] Add a per-dimm structure Mauro Carvalho Chehab
2012-03-07 11:40 ` [PATCH 1/6] edac: Create a dimm struct and move the labels into it Mauro Carvalho Chehab
2012-03-07 11:40 ` [PATCH 2/6] edac: Add per dimm's sysfs nodes Mauro Carvalho Chehab
2012-03-07 11:40 ` [PATCH 3/6] edac: move dimm properties to struct memset_info Mauro Carvalho Chehab
2012-03-07 11:40 ` [PATCH 4/6] edac: Don't initialize csrow's first_page & friends when not needed Mauro Carvalho Chehab
2012-03-07 11:40 ` [PATCH 5/6] edac: move nr_pages to dimm struct Mauro Carvalho Chehab
2012-03-07 11:40 ` [PATCH 6/6] edac: Add per-dimm sysfs show nodes Mauro Carvalho Chehab
2012-03-08 21:57 ` [PATCH 0/6] Add a per-dimm structure Borislav Petkov
2012-03-09 10:32 ` Mauro Carvalho Chehab
2012-03-09 14:38 ` Borislav Petkov
2012-03-09 16:40 ` Mauro Carvalho Chehab
2012-03-09 18:47 ` Borislav Petkov
2012-03-09 19:46 ` Mauro Carvalho Chehab
2012-03-11 11:34 ` Borislav Petkov [this message]
2012-03-11 12:32 ` Mauro Carvalho Chehab
2012-03-12 16:39 ` Borislav Petkov
2012-03-12 17:03 ` Luck, Tony
2012-03-12 18:10 ` Borislav Petkov
2012-03-13 23:32 ` Greg KH
2012-03-14 19:35 ` Mauro Carvalho Chehab
2012-03-14 20:43 ` Greg KH
2012-03-14 22:20 ` Mauro Carvalho Chehab
2012-03-14 23:32 ` Greg KH
2012-03-15 2:22 ` Mauro Carvalho Chehab
2012-03-15 15:00 ` Greg KH
2012-03-14 22:31 ` Borislav Petkov
2012-03-14 22:40 ` Greg KH
2012-03-15 1:37 ` Mauro Carvalho Chehab
2012-03-15 1:44 ` Mauro Carvalho Chehab
2012-03-15 11:31 ` Borislav Petkov
2012-03-15 12:40 ` Mauro Carvalho Chehab
2012-03-15 21:38 ` Borislav Petkov
2012-03-16 8:47 ` Mauro Carvalho Chehab
2012-03-16 11:15 ` Borislav Petkov
2012-03-16 12:07 ` Mauro Carvalho Chehab
2012-03-16 14:07 ` Mauro Carvalho Chehab
2012-03-16 15:31 ` Greg KH
2012-03-16 16:54 ` Borislav Petkov
2012-03-16 15:30 ` Greg KH
2012-03-16 15:44 ` Mauro Carvalho Chehab
2012-03-16 16:01 ` Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120311113411.GB29175@aftab \
--to=bp@amd64.org \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox