From: Mauro Carvalho Chehab <m.chehab@samsung.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	tony.luck@intel.com, bhelgaas@google.com, rostedt@goodmis.org,
	rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
Date: Thu, 15 Aug 2013 11:14:07 -0300
Message-ID: <20130815111407.4080a744@samsung.com>
In-Reply-To: <20130815134454.GF27616@pd.tnic>

On Thu, 15 Aug 2013 15:44:54 +0200,
Borislav Petkov <bp@alien8.de> wrote:

> On Thu, Aug 15, 2013 at 10:26:07AM -0300, Mauro Carvalho Chehab wrote:
> > I mean that the edac core needs to know that, on a given system, the
> > BIOS is accessing the hardware registers and sending the data via
> > ghes_edac.
> 
> Right, that's the firmware-first thing which Naveen did - see
> mce_disable_bank.
> 
> > No. As we want that fatal errors to also be properly reported, the
> > kernel will still need to know the memory layout.
> 
> Read what I said: if you have the silkscreen label you don't need the
> memory layout - you *already* *know* which DIMM is affected.

AFAICT, APEI doesn't provide the silkscreen label. Some code (or some
datasheet) is needed to translate what APEI provides into the
silkscreen label.

Naveen, please correct me if I'm wrong.

> Also, fatal errors are a whole different beast where we run in NMI
> context or we even don't get to run the #MC handler on some systems.

I see.

Yes, APEI currently only prints a raw event for high-severity errors
in ghes_notify_nmi(), and doesn't call ghes_edac. Changing that would
require parsing the error in __ghes_print_estatus(). I'm not sure how
easy that would be.

On Thu, 15 Aug 2013 15:51:06 +0200,
Borislav Petkov <bp@alien8.de> wrote:

> On Thu, Aug 15, 2013 at 10:34:21AM -0300, Mauro Carvalho Chehab wrote:
> > Yes, but the thing is that it is not safe to use the hardware driver
> > if the BIOS is also reading the hardware error registers directly, as,
> > on several hardware, a read cause the error data to be cleaned on such
> > register.
> 
> Here's the deal:
> 
> * We parse some APEI table and disable those MCA banks which the BIOS
> wants to handle first.
> 
> * When the BIOS decides to report an error from that handling, it does
> so over another BIOS table.

OK.

> 
> * Now you have two possibilities:
> 
> ** On systems without an edac driver or where it doesn't make sense to
> have the ghes_edac driver, we call trace_mc_event() straight from APEI
> code (this is what we're currently discussing).
> 
> ** On other systems, where we need ghes_edac, we *don't* use the
> trace_mc_event() tracepoint in the APEI code but let it come from
> ghes_edac with additional information collected by edac.

I don't see why we should have those two alternatives since, in the
worst case (e.g. if ghes_edac can't enrich the APEI data with labels),
they'll provide basically the very same data to userspace, and the
extra overhead of EDAC's error report logic is small.

The risk of doing the very same thing in two different places is that
the logic to encapsulate APEI data into trace_mc_event() would live
in two separate places. Someone could change one of the drivers and
forget to apply the very same change to the other, causing parse
errors in userspace, depending on the source.
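
To make the duplication concern concrete, here is a minimal userspace
sketch in plain C. It is not kernel code and not taken from any patch in
this thread: struct mem_err, format_mc_event(), report_from_apei() and
report_from_ghes_edac() are hypothetical names, and the record layout is
invented for illustration. It only shows the idea that, if both sources
go through one formatting helper, userspace sees a single format no
matter where the event came from.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for the data APEI hands over. */
struct mem_err {
	unsigned long long phys_addr;
	int node, card, module;
	const char *label;	/* NULL when no DIMM label is known */
};

/* The single place that turns an error record into the event text. */
static int format_mc_event(const struct mem_err *e, char *buf, size_t len)
{
	return snprintf(buf, len,
			"addr=0x%llx node=%d card=%d module=%d label=%s",
			e->phys_addr, e->node, e->card, e->module,
			e->label ? e->label : "unknown");
}

/* Path 1 (assumed): report straight from the APEI side, no label. */
static int report_from_apei(const struct mem_err *e, char *buf, size_t len)
{
	struct mem_err copy = *e;

	copy.label = NULL;
	return format_mc_event(&copy, buf, len);
}

/* Path 2 (assumed): report via the EDAC side, which may add a label. */
static int report_from_ghes_edac(const struct mem_err *e, const char *label,
				 char *buf, size_t len)
{
	struct mem_err copy = *e;

	copy.label = label;
	return format_mc_event(&copy, buf, len);
}
```

With this shape, a change to the event layout is made once, in
format_mc_event(), and both paths pick it up. When no label is
available, the two paths produce identical text.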

-- 

Cheers,
Mauro
