linux-kernel.vger.kernel.org archive mirror
From: Mauro Carvalho Chehab <mchehab@redhat.com>
To: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@amd64.org>, Borislav Petkov <bp@alien8.de>,
	Linux Edac Mailing List <linux-edac@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC 2/2] events/hw_event: Create a Hardware Anomaly Report Mechanism (HARM)
Date: Sat, 26 Mar 2011 08:56:27 -0300
Message-ID: <4D8DD46B.1030903@redhat.com>
In-Reply-To: <AANLkTi=jGB_nFO0nnWWZ60gqgdELAQjv5dpAZmz5zKSz@mail.gmail.com>

On 25-03-2011 19:37, Tony Luck wrote:
> On Fri, Mar 25, 2011 at 2:22 PM, Mauro Carvalho Chehab
> <mchehab@redhat.com> wrote:
>> On 25-03-2011 11:13, Borislav Petkov wrote:
>>> However, there's
>>> another issue with fatal errors - you want to execute as little code as
>>> possible in the wake of a fatal error.
>>
>> Yes. That's one of the reasons why it may make sense to have a separate event
>> for fatal errors.
> 
> We have three categories (severities):
> 1) Corrected - log these
> 2) Uncorrected-but-not-immediately-fatal - log these too
> 3) Fatal - all we can do with these is log to some persistent store (or
>     to a serial console connected to a logging device). perf style event
>     tracing doesn't help when all the userland daemons will never get a
>     chance to run.

OK. Assuming that fatal errors are saved to some persistent store, the daemon
will be able to pick them up on the next boot. So I think it would be a nice
feature to have three different trace events, one per severity, so that users
can filter among them. Alternatively, we could implement the filtering in
userspace, but as perf already has this, I'm in favor of using what's there.
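
As a rough illustration of the one-event-per-severity idea, here is a minimal
C sketch. The enum, the event names and both helpers are hypothetical and are
not taken from the patch; the only thing it borrows from the discussion is the
three severities listed above.

```c
#include <assert.h>

/* The three severities from the quoted list. */
enum hw_severity {
	HW_SEV_CORRECTED,
	HW_SEV_UNCORRECTED,	/* uncorrected, but not immediately fatal */
	HW_SEV_FATAL,
	HW_SEV_COUNT
};

/* Hypothetical event names, one per severity (assumption, not from the patch). */
static const char *const hw_event_names[HW_SEV_COUNT] = {
	"hw_event_corrected",
	"hw_event_uncorrected",
	"hw_event_fatal",
};

/* Map a severity to the name of the event that would carry it. */
const char *hw_event_name(enum hw_severity sev)
{
	assert((unsigned int)sev < HW_SEV_COUNT);
	return hw_event_names[sev];
}

/* Consumer-side filter: 'mask' has bit N set if severity N is wanted. */
int hw_event_wanted(unsigned int mask, enum hw_severity sev)
{
	return (mask >> sev) & 1u;
}
```

With separate events, a consumer that only cares about fatal errors would
enable just that one event instead of filtering every record by hand.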

>> It would be good to use some non-volatile ram for these. I was told that
>> APEI spec defines a way for that, but I'm not sure if low end machines would
>> be shipped with that.
> 
> You are talking about ERST - and you are right, this is generally not going
> to be present on low-end machines.  drivers/acpi/apei/erst.c was accepted
> in 2.6.35.  My /dev/pstore changes are in the current merge for 2.6.39 (but
> currently only show dmesg traces to the user).

It makes sense to integrate it with perf, once we add a way there to recover
the persistent data when the daemon starts.
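
A minimal sketch of what "recover when the daemon starts" could look like from
userspace, assuming the saved records show up as files in a directory. The
function name, the directory path and the file-name prefix are all parameters
or hypothetical names here; nothing in this sketch is an existing perf or
pstore interface.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the saved records in 'dir_path' whose file name starts with
 * 'prefix'. Returns -1 if the directory cannot be opened, for example
 * because the machine has no persistent store at all.
 */
int count_saved_records(const char *dir_path, const char *prefix)
{
	DIR *dir;
	struct dirent *ent;
	size_t plen;
	int count = 0;

	assert(dir_path != NULL && prefix != NULL);
	plen = strlen(prefix);

	dir = opendir(dir_path);
	if (dir == NULL)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		if (strncmp(ent->d_name, prefix, plen) == 0)
			count++;
	}
	closedir(dir);
	return count;
}
```

A daemon would call something like this once at startup, before it begins
listening for live events, and treat a return value of -1 as "nothing to
recover" rather than as an error.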

>> Alternatively, edac could fill a translation table, and the decoding code in
>> mce would be just a table lookup routine (in order to speed up translation
>> in the case of fatal errors).
> 
> Eventually the translation table should move above edac (to the drivers/ras/
> area that Borislav suggested earlier?) so that both mce and edac can access.
> I think we'll need this for some time as SMBIOS continues to disappoint
> me with its inaccuracies.

That makes sense to me. Currently, the translation table there covers only memory.

The /ras table needs to be generic enough to cover other types of translation,
such as translating the kernel's representation of a CPU into a CPU socket label,
and a PCI bus ID into a PCI slot number.
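
To make "generic enough" concrete, here is a minimal sketch of a table keyed by
resource kind plus a kernel-side identifier. All type names, function names and
labels below are hypothetical and chosen for illustration only; the real table
layout is still to be defined.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of resource the table can translate (hypothetical). */
enum ras_kind { RAS_KIND_MEMORY, RAS_KIND_CPU, RAS_KIND_PCI };

/* One entry: kernel-side identifier -> label printed on the board. */
struct ras_label {
	enum ras_kind kind;
	unsigned int kernel_id;
	const char *label;
};

/* Example data only; a driver would fill the real table at probe time. */
const struct ras_label ras_example_table[] = {
	{ RAS_KIND_MEMORY, 0,      "DIMM_A1" },
	{ RAS_KIND_CPU,    1,      "CPU socket 2" },
	{ RAS_KIND_PCI,    0x0300, "PCI slot 4" },
};
const size_t ras_example_len =
	sizeof(ras_example_table) / sizeof(ras_example_table[0]);

/*
 * Plain table lookup with no allocation and no locking, which keeps the
 * amount of code run on the fatal-error path small.
 * Returns NULL when there is no matching entry.
 */
const char *ras_lookup(const struct ras_label *table, size_t n,
		       enum ras_kind kind, unsigned int kernel_id)
{
	size_t i;

	assert(table != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (table[i].kind == kind && table[i].kernel_id == kernel_id)
			return table[i].label;
	}
	return NULL;
}
```

A linear scan is enough for a sketch; with many entries, one sorted table per
kind would make the lookup cheaper.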

Mauro.
