From: Borislav Petkov <bp@alien8.de>
To: linux-edac <linux-edac@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>, Tony Luck <tony.luck@intel.com>
Subject: [RFC PATCH -v2 0/3] RAS: Correctable Errors Collector thing
Date: Thu, 12 Jun 2014 18:22:27 +0200
Message-ID: <1402590150-9798-1-git-send-email-bp@alien8.de>

From: Borislav Petkov <bp@suse.de>

Hi all,
so here's v2, with the feedback from last time addressed (... hopefully).

This is on top of Gong's extlog stuff, which is currently a moving target,
but I've based this on it because we're slowly starting to relocate
generic RAS code into drivers/ras/.

A couple of points I've been thinking about and which we should discuss:
* This version automatically removes the oldest element from the array
when it fills up. With a maximum size of 512 PFNs, I think we should be OK.
* If the CEC (let's call this thing that) can perform all the required
RAS actions, we should not forward correctable errors to userspace,
because userspace simply doesn't need them. Unless there's something
more we want to do in userspace... we could make it configurable, dunno.
This version simply collects the errors and does the soft offlining,
which issues something like this to dmesg:
[ 520.872376] RAS: Soft-offlining pfn: 0xdead
[ 520.874384] soft offline: 0xdead page already poisoned
I'm not sure what we want to do with this info; we need to think about
it more, but we're flexible there, so... :-)
My main reasoning behind not forwarding every single correctable error
is that we don't want to upset the user unnecessarily and cause those
expensive support calls.
* Concerning policy (at which error count we should soft-offline a
page, whether we should make that configurable, and what the interface
would be): we still don't know, and we probably need to talk about it
too. Right now, using 10 bits for that count (i.e., a maximum count of
1023) feels right. The count gets decayed anyway.
But do we need to run it on lots of live systems and gather feedback?
Definitely.
* As to why we're putting this in the kernel and enabling it by default:
a userspace daemon is much more fragile than in-kernel code. And this
way everyone gets it, regardless of distro.
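To make the policy in the points above concrete (a bounded array of
PFNs, a per-page error count that decays, soft-offlining once the count
crosses a threshold, and evicting the oldest entry when the array is
full), here is a minimal userspace sketch in C. It is a hypothetical
illustration only, not the code in drivers/ras/ce.c: the struct layout,
the threshold (saturating the 10-bit count, i.e. 1023), "oldest" meaning
"inserted longest ago", halving as the decay step, and the printf()
standing in for the kernel's soft-offline call are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sizes taken from the mail (512 PFNs, 10-bit count); the threshold
 * and the decay scheme are HYPOTHETICAL choices for illustration. */
#define MAX_ELEMS      512
#define COUNT_BITS     10
#define COUNT_MASK     ((1u << COUNT_BITS) - 1)   /* 1023 */
#define OFFLINE_THRESH COUNT_MASK                 /* assumed policy */

struct ce_elem {
	uint64_t pfn;
	uint16_t count;   /* never exceeds COUNT_MASK, see ce_add() */
	uint64_t stamp;   /* insertion order, used to find the oldest */
};

struct ce_array {
	struct ce_elem e[MAX_ELEMS];
	unsigned int n;   /* number of valid elements */
	uint64_t clock;   /* monotonically increasing insertion stamp */
};

/* Return the index of pfn, or -1 if it is not tracked. */
static int ce_find(const struct ce_array *ca, uint64_t pfn)
{
	for (unsigned int i = 0; i < ca->n; i++)
		if (ca->e[i].pfn == pfn)
			return (int)i;
	return -1;
}

/* Remove element i by moving the last element into its slot. */
static void ce_del(struct ce_array *ca, unsigned int i)
{
	ca->e[i] = ca->e[--ca->n];
}

/* Index of the element inserted longest ago (FIFO eviction). */
static unsigned int ce_oldest(const struct ce_array *ca)
{
	unsigned int old = 0;

	assert(ca->n > 0);
	for (unsigned int i = 1; i < ca->n; i++)
		if (ca->e[i].stamp < ca->e[old].stamp)
			old = i;
	return old;
}

/* Record one corrected error for pfn. Returns true if the page hit
 * the threshold and was (hypothetically) soft-offlined. */
static bool ce_add(struct ce_array *ca, uint64_t pfn)
{
	int i = ce_find(ca, pfn);

	if (i < 0) {
		if (ca->n == MAX_ELEMS)            /* full: evict oldest */
			ce_del(ca, ce_oldest(ca));
		i = (int)ca->n++;
		ca->e[i].pfn = pfn;
		ca->e[i].count = 0;
		ca->e[i].stamp = ca->clock++;
	}

	if (++ca->e[i].count >= OFFLINE_THRESH) {
		/* Stand-in for the real soft-offline action. */
		printf("RAS: Soft-offlining pfn: 0x%llx\n",
		       (unsigned long long)pfn);
		ce_del(ca, (unsigned int)i);
		return true;
	}
	return false;
}

/* Decay step (assumed to run periodically): halve every count so
 * that errors which stop recurring fade out over time. */
static void ce_decay(struct ce_array *ca)
{
	for (unsigned int i = 0; i < ca->n; i++)
		ca->e[i].count >>= 1;
}
```

A linear scan keeps the sketch short and is cheap for 512 entries; a
sorted array with binary search would be an obvious refinement if
lookups ever showed up in profiles.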
Constructive feedback is, as always, appreciated.
Thanks.
Borislav Petkov (3):
MCE, CE: Corrected errors collecting thing
MCE, CE: Wire in the CE collector
MCE, CE: Add debugging glue
arch/x86/kernel/cpu/mcheck/mce.c | 87 ++++++++++-
drivers/ras/Kconfig | 11 ++
drivers/ras/Makefile | 3 +-
drivers/ras/ce.c | 309 +++++++++++++++++++++++++++++++++++++++
include/linux/ras.h | 2 +
5 files changed, 403 insertions(+), 9 deletions(-)
create mode 100644 drivers/ras/ce.c
--
2.0.0