From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	"ananth@in.ibm.com" <ananth@in.ibm.com>,
	"masbock@linux.vnet.ibm.com" <masbock@linux.vnet.ibm.com>,
	"lcm@linux.vnet.ibm.com" <lcm@linux.vnet.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"Huang, Ying" <ying.huang@intel.com>,
	Robert Richter <rric@kernel.org>
Subject: Re: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
Date: Wed, 19 Jun 2013 23:41:45 +0200
Message-ID: <20130619214145.GS28300@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F2DA884F0@ORSMSX106.amr.corp.intel.com>

On Wed, Jun 19, 2013 at 09:28:50PM +0000, Luck, Tony wrote:
> > Ok, where is that semantics? What in a CPER record does say "this error
> > should tell you that you need to offline the containing page and I'm
> > telling you this exactly only once"? Error Severity 0, i.e. Recoverable?
> 
> Naveen - this one is for you (or for your BIOS team).  Can you get us a sample
> CPER that you plan to provide when the BIOS decides that its threshold has
> been exceeded?  How will it be different from what old WSM-EX platforms
> were sending to us?  Hopefully the answer is encoded in the CPER record
> and not in some code we have to put in Linux to say "if (IBMplatform) do_thing_1(); else ... "

If we're going to be vendor-specific (which I'm pretty sure we will be),
we're going to have to export knobs to userspace which each vendor can
tweak for themselves. Otherwise we'll get the DMI quirks ugliness all
over again, and we'd better nip it in the bud while we can.

> > Ok, we're talking about the S in RAS now. Do we have error recovery
> > strategies specified anywhere? Are they per-platform or generic? Is this
> > CPER strategy above, for example, only valid for some platforms or for
> > all APEI-using hardware?
> 
> The mcelog(8) daemon has been doing this for years ... but it used the "predictive
> failure analysis" buzzwords that were popular way back then (today the
> marketing people seem to prefer "self healing"). Whatever the name, the
> concept is the same ... take some set of corrected event reports, infer
> from them that something worse may happen soon, and use that information
> to try to avoid the (possibly) impending crash.

Ok, so some sort of userspace is enforcing policy based on collected
data/heuristics.

The question above about what to do *without* going to userspace and
back is perhaps the more interesting one, and we'd need a clean design
there... we'll see.
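To make the "collect corrected events, infer, act" idea concrete, here is a minimal sketch. It is hypothetical and not mcelog's actual algorithm: it assumes a simple fixed-window counter per page frame number, and CE_THRESHOLD, CE_WINDOW_SECS, struct page_ce_stat and ce_record() are all illustrative names. It also shows the "tell me exactly once" semantics asked about above. Once a page has crossed the threshold, it is never reported again.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knobs: illustrative values, not from mcelog or the kernel. */
#define CE_THRESHOLD   10U      /* corrected errors tolerated ...      */
#define CE_WINDOW_SECS 86400U   /* ... within this many seconds        */
#define MAX_TRACKED    64U      /* pages tracked at once (fixed table) */

struct page_ce_stat {
	uint64_t pfn;          /* page frame number the errors hit        */
	uint64_t window_start; /* time (s) the current window was opened  */
	unsigned int count;    /* corrected errors in the current window  */
	bool offlined;         /* action already requested: report once   */
	bool used;             /* slot in use                             */
};

static struct page_ce_stat table[MAX_TRACKED];

/* Find the slot for @pfn, or claim a free one; NULL if the table is full. */
static struct page_ce_stat *lookup(uint64_t pfn)
{
	struct page_ce_stat *free_slot = NULL;

	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (table[i].used && table[i].pfn == pfn)
			return &table[i];
		if (!table[i].used && !free_slot)
			free_slot = &table[i];
	}
	if (free_slot) {
		free_slot->used = true;
		free_slot->pfn = pfn;
		free_slot->count = 0;
		free_slot->window_start = 0;
		free_slot->offlined = false;
	}
	return free_slot;
}

/*
 * Account one corrected error on @pfn at time @now (seconds).
 * Returns true exactly once per page: when CE_THRESHOLD errors land
 * within one CE_WINDOW_SECS window. The caller would then try to
 * offline the page (e.g. from a userspace daemon).
 */
bool ce_record(uint64_t pfn, uint64_t now)
{
	struct page_ce_stat *s = lookup(pfn);

	if (!s || s->offlined)
		return false;

	/* Start a fresh window on first use or once the old one expired. */
	if (s->count == 0 || now - s->window_start >= CE_WINDOW_SECS) {
		s->window_start = now;
		s->count = 0;
	}

	if (++s->count >= CE_THRESHOLD) {
		s->offlined = true;
		return true;
	}
	return false;
}
```

Whether such a threshold lives in the kernel, in firmware, or in a userspace daemon is exactly the policy question above. A fixed window is just the simplest choice; a decaying (leaky-bucket) counter would smooth out bursts at window boundaries.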

> > Questions over questions...
> 
> Questions are good - they help fill in gaps.

I know - that's why I'm trying to poke holes early...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


Thread overview: 42+ messages
2013-06-19 17:57 [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC Naveen N. Rao
2013-06-19 17:57 ` [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors Naveen N. Rao
2013-06-19 18:04   ` Borislav Petkov
2013-06-19 18:17     ` Naveen N. Rao
2013-06-19 18:19     ` Luck, Tony
2013-06-19 18:36       ` Borislav Petkov
2013-06-19 19:05         ` Luck, Tony
2013-06-19 20:14           ` Borislav Petkov
2013-06-19 20:33             ` Luck, Tony
2013-06-19 21:07               ` Borislav Petkov
2013-06-19 21:28                 ` Luck, Tony
2013-06-19 21:41                   ` Borislav Petkov [this message]
2013-06-19 22:08                     ` Luck, Tony
2013-06-20  5:35                       ` Borislav Petkov
2013-06-20 21:21                   ` Naveen N. Rao
2013-06-20 22:11                     ` Luck, Tony
2013-06-21  7:27                       ` Borislav Petkov
2013-06-21 16:43                         ` Naveen N. Rao
2013-06-28 12:04                         ` Naveen N. Rao
2013-06-28 17:31                           ` Tony Luck
2013-07-01 15:07                             ` Naveen N. Rao
2013-07-01 15:38                               ` Borislav Petkov
2013-07-01 15:41                                 ` Naveen N. Rao
2013-06-20  7:48   ` Borislav Petkov
2013-06-20 19:02     ` Naveen N. Rao
2013-06-20  7:39 ` [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC Borislav Petkov
2013-06-20 19:08   ` Naveen N. Rao
2013-06-20 19:29     ` Borislav Petkov
2013-06-20 20:14       ` Naveen N. Rao
2013-06-20 20:57         ` Borislav Petkov
2013-06-20 21:22           ` Naveen N. Rao
2013-06-21  7:34             ` Borislav Petkov
2013-06-21  7:46               ` Naveen N. Rao
2013-06-21  8:36                 ` Borislav Petkov
2013-06-21  9:32                   ` Naveen N. Rao
2013-06-21 14:08                     ` Borislav Petkov
2013-06-21 16:47                   ` Tony Luck
2013-06-21 17:40                     ` Borislav Petkov
2013-06-25 17:46                       ` Naveen N. Rao
2013-06-25 17:53                         ` Borislav Petkov
2013-06-25 17:55                         ` Luck, Tony
2013-06-25 18:28                           ` Naveen N. Rao
