From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>,
Yazen Ghannam <Yazen.Ghannam@amd.com>, X86 ML <x86@kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/4] RAS: Add a Corrected Errors Collector
Date: Tue, 7 Jun 2016 23:04:27 +0200
Message-ID: <20160607210427.GF1152@pd.tnic>
In-Reply-To: <20160607181109.GA23770@intel.com>
On Tue, Jun 07, 2016 at 11:11:09AM -0700, Luck, Tony wrote:
> Is there a reason that we need to call the ce_add_elem() inline
> here instead of having it just register on the mce_notifier chain?
> This series just cleaned out all the /dev/mcelog special code from
> here, and you are adding something back before the ink is dry on
> that change.
>
> I'm also strongly divided about whether this corrected error
> handler should be allowed to preempt anything else from even seeing
> the error.
Well, this is the main reason for adding the CEC: not to disturb
users with random CECC errors which might happen a couple of times
due to alpha particles and then never again. I.e., to absorb all those
sporadic bursts of correctable errors which don't mean that the hw is
going faulty.
If the CEC consumes the error and does the leaky bucket of "forgetting"
about it after a while, once that same PFN stops triggering errors,
then we do that silently and do not scare users. Otherwise, they think
their hw is broken and wonder whether they should start swapping things.
And then there's the aspect of soft-offlining PFNs when the error
threshold has been reached. I don't think we have had any automatic
recovery actions wrt errors so far without external agents.
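To make the leaky bucket and the threshold concrete, here is a minimal
userspace sketch in C. It assumes a small fixed table of tracked PFNs,
a periodic decay call as the "leak", and a stub fake_soft_offline() in
place of real page offlining. All names, the table size and the
threshold are made up for illustration; this is not the code in the
RFC patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; not the values used by the RFC patch. */
#define CE_SLOTS      8  /* how many PFNs we track at once */
#define CE_THRESHOLD  4  /* errors on one PFN before we act on it */

struct ce_slot {
	uint64_t pfn;
	unsigned int count;   /* current "water level" of this bucket */
	bool used;
};

static struct ce_slot slots[CE_SLOTS];

/* Hypothetical stand-in for the kernel's soft-offline action. */
static uint64_t last_offlined_pfn;
static unsigned int offline_calls;

static void fake_soft_offline(uint64_t pfn)
{
	last_offlined_pfn = pfn;
	offline_calls++;
}

/* Add one error to @s; at the threshold, offline the PFN and forget it. */
static bool ce_bump(struct ce_slot *s)
{
	if (++s->count < CE_THRESHOLD)
		return false;
	fake_soft_offline(s->pfn);
	s->used = false;
	s->count = 0;
	return true;
}

/* Record one corrected error on @pfn; returns true if the PFN was offlined. */
static bool ce_add_elem_sketch(uint64_t pfn)
{
	struct ce_slot *free_slot = NULL, *victim = NULL;

	for (size_t i = 0; i < CE_SLOTS; i++) {
		struct ce_slot *s = &slots[i];

		if (!s->used) {
			if (!free_slot)
				free_slot = s;
			continue;
		}
		if (s->pfn == pfn)
			return ce_bump(s);
		if (!victim || s->count < victim->count)
			victim = s;   /* least-hit tracked PFN so far */
	}

	/* New PFN: use a free slot, else evict the least-hit one. */
	struct ce_slot *s = free_slot ? free_slot : victim;
	assert(s != NULL);
	s->used = true;
	s->pfn = pfn;
	s->count = 0;
	return ce_bump(s);
}

/*
 * The "leak": call periodically (e.g. from a timer). Every tracked
 * bucket drains by one; a PFN whose bucket empties is forgotten.
 */
static void ce_decay(void)
{
	for (size_t i = 0; i < CE_SLOTS; i++) {
		struct ce_slot *s = &slots[i];

		if (s->used && --s->count == 0)
			s->used = false;
	}
}
```

A single stray error on a PFN is drained by the next decay pass and
never reaches the offline path, while a PFN that keeps erroring faster
than it drains crosses the threshold. Evicting the least-hit PFN when
the table is full is just one simple policy picked for brevity.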
> Argument for:
> Lonely corrected errors are "No Big Deal"(TM). Just counting them
> and moving on is a good thing.
Yap, exactly.
> Arguments against:
> 1) We may miss out on a one-time opportunity to get extra information
> (from acpi_extlog.c).
> 2) I think this subverts our CMCI storm detection and mitigation code?
...and we can address that by adding "ras=cec_doesnt_consume_errors" or
somesuch so that the rest of the chain sees them too.
I think we can be pretty flexible about it. And again, my main angle is
the "do not disturb users unnecessarily" aspect.
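A rough sketch of what such a knob could mean for the chain, assuming
the CEC sits first and that returning a "stop" value ends the walk,
similar in spirit to NOTIFY_STOP in the kernel's notifier chains. The
knob cec_consume_errors, the handlers and the enum are hypothetical and
only model the behaviour being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the notifier return convention (assumed). */
enum chain_ret { CHAIN_DONE, CHAIN_STOP };

struct err_rec { unsigned long pfn; };

typedef enum chain_ret (*handler_fn)(const struct err_rec *e);

/* Hypothetical knob modelling "ras=cec_doesnt_consume_errors". */
static bool cec_consume_errors = true;

static unsigned int cec_seen, later_seen;

static enum chain_ret cec_handler(const struct err_rec *e)
{
	(void)e;
	cec_seen++;   /* a real collector would count e->pfn here */
	return cec_consume_errors ? CHAIN_STOP : CHAIN_DONE;
}

static enum chain_ret later_handler(const struct err_rec *e)
{
	(void)e;
	later_seen++;   /* e.g. an extlog- or EDAC-style consumer */
	return CHAIN_DONE;
}

/* Handlers in call order: the CEC first. */
static handler_fn chain[] = { cec_handler, later_handler };

static void call_chain(const struct err_rec *e)
{
	assert(e != NULL);
	for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
		if (chain[i](e) == CHAIN_STOP)
			break;   /* consumed: later handlers never see it */
	}
}
```

With the knob off, later handlers (such as the acpi_extlog one Tony
mentions) still see every error, at the cost of passing on the noise
the CEC is meant to absorb.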
> We could make the chain more caller friendly by adding a filter
> argument so users could say "just tell me about memory errors"
> (currently each of the EDAC drivers has inline code to do the same
> as "memory_error(m) && mce_usable_address(m)")
Sure, that too.
And the CEC can work on any system, without the need for an EDAC driver.
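The filter-argument idea can be sketched as a registration call that
takes a mask, with the chain itself applying the check instead of every
subscriber open-coding it. The error record, the FILTER_* bits and the
two boolean fields are hypothetical simplifications standing in for
struct mce, memory_error() and mce_usable_address(); they do not
reproduce the kernel's logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified error record (not struct mce). */
struct err_rec {
	bool is_memory;     /* stand-in for what memory_error(m) decides */
	bool addr_usable;   /* stand-in for mce_usable_address(m) */
};

/* Filter bits a subscriber can ask for at registration time. */
enum err_filter {
	FILTER_ALL    = 0,
	FILTER_MEMORY = 1 << 0,
	FILTER_USABLE = 1 << 1,
};

struct subscriber {
	unsigned int filter;
	void (*cb)(const struct err_rec *e);
};

#define MAX_SUBS 4
static struct subscriber subs[MAX_SUBS];
static size_t nr_subs;

static void chain_register(unsigned int filter,
			   void (*cb)(const struct err_rec *))
{
	assert(nr_subs < MAX_SUBS);
	subs[nr_subs++] = (struct subscriber){ .filter = filter, .cb = cb };
}

/* Does @e pass every check the subscriber asked for? */
static bool matches(unsigned int filter, const struct err_rec *e)
{
	if ((filter & FILTER_MEMORY) && !e->is_memory)
		return false;
	if ((filter & FILTER_USABLE) && !e->addr_usable)
		return false;
	return true;
}

/* The chain, not each subscriber, applies the filter. */
static void chain_call(const struct err_rec *e)
{
	for (size_t i = 0; i < nr_subs; i++)
		if (matches(subs[i].filter, e))
			subs[i].cb(e);
}

/* Two example subscribers that just count their invocations. */
static unsigned int edac_calls, logger_calls;
static void edac_cb(const struct err_rec *e)   { (void)e; edac_calls++; }
static void logger_cb(const struct err_rec *e) { (void)e; logger_calls++; }
```

Here an EDAC-style subscriber registered with
FILTER_MEMORY | FILTER_USABLE is skipped for a non-memory error, while a
catch-all logger registered with FILTER_ALL sees everything.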
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.