linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <m.chehab@samsung.com>
To: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>,
	tony.luck@intel.com, bhelgaas@google.com, rostedt@goodmis.org,
	rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Aristeu Rozanski Filho <arozansk@redhat.com>
Subject: Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
Date: Mon, 12 Aug 2013 11:44:04 -0300	[thread overview]
Message-ID: <20130812114404.3bd64fa0@samsung.com> (raw)
In-Reply-To: <5208D80D.5030206@linux.vnet.ibm.com>

Em Mon, 12 Aug 2013 18:11:49 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:

> On 08/12/2013 05:03 PM, Mauro Carvalho Chehab wrote:
> > Em Sat, 10 Aug 2013 20:03:22 +0200
> > Borislav Petkov <bp@alien8.de> escreveu:
> >
> >> On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote:
> >>> Em Thu, 08 Aug 2013 23:57:51 +0530
> >>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> escreveu:
> >>>
> >>>> Enable memory error trace event in cper.c
> >>>
> >>> Why do we need to do that? Memory error events are already handled
> >>> via edac_ghes module,
> >>
> >> If APEI gives me all the required information in order to deal with the
> >> hardware error - and it looks like it does - then the additional layer
> >> of ghes_edac is not needed.
> >
> > APEI is just the mechanism that collects the data, not the mechanism
> > that reports to userspace.
> 
> I think what Boris is saying is that ghes_edac isn't adding anything 
> more here given what we get from APEI structures. So, there doesn't seem 
> to be a need to add dependency on edac for this purpose.
> 
> Further, ghes_edac seems to require EDAC_MM_EDAC to be compiled into the 
> kernel (not a module). So, more dependencies.
> 
> >
> > The current implementation is that APEI already reports those errors
> > via ghes_edac driver. It also reports the very same error via MCE
> > (although the APEI interface to MCE is currently broken for everything
> > that it is not Nehalem-EX - as it basically emulates the MCE log for
> > that specific architecture).
> 
> So, I looked at ghes_edac and it basically seems to boil down to 
> trace_mc_event. 

Yes. It also provides the sysfs nodes that describe the memory.

> But, this only seems to expose the APEI data as a string 
> and doesn't look to really make all the fields available to user-space 
> in a raw manner. Not sure how well this can be utilised by a user-space 
> tool. Do you have suggestions on how we can do this?

There's already an userspace tool that handes it:
	https://git.fedorahosted.org/cgit/rasdaemon.git/

What is missing there on the current version is the bits that would allow
to translate from APEI way to report an error (memory node, card, module,
bank, device) into a DIMM label[1]. 

At the end, what really matters is to be able to point to the DIMM(s)
in a way that the user can replace them (e. g. using the silk screen
labels on the system motherboard).

[1] It does such translation for the other EDAC drivers, via a
configuration file that does such per-system mapping. Extending it to
also handle APEI errors shouldn't be hard.

> 
> Thanks,
> Naveen
> 


-- 

Cheers,
Mauro

  parent reply	other threads:[~2013-08-12 14:44 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-08 18:27 [PATCH 0/3] Add trace event for ghes memory error Naveen N. Rao
2013-08-08 18:27 ` [PATCH 1/3] mce: acpi/apei: trace: Include PCIe AER trace event conditionally Naveen N. Rao
2013-08-08 19:23   ` Steven Rostedt
2013-08-12 11:37     ` Naveen N. Rao
2013-08-12 13:13       ` Steven Rostedt
2013-08-12 13:26         ` Borislav Petkov
2013-08-08 18:27 ` [PATCH 2/3] mce: acpi/apei: trace: Add trace event for ghes memory error Naveen N. Rao
2013-08-08 19:17   ` Borislav Petkov
2013-08-12 11:28     ` Naveen N. Rao
2013-08-08 18:27 ` [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event Naveen N. Rao
2013-08-08 19:38   ` Mauro Carvalho Chehab
2013-08-10 18:03     ` Borislav Petkov
2013-08-12 11:33       ` Mauro Carvalho Chehab
2013-08-12 12:38         ` Borislav Petkov
2013-08-12 14:49           ` Mauro Carvalho Chehab
2013-08-12 15:04             ` Borislav Petkov
2013-08-12 17:25               ` Mauro Carvalho Chehab
2013-08-12 17:54                 ` Luck, Tony
2013-08-12 17:56                 ` Borislav Petkov
2013-08-13 11:36                   ` Naveen N. Rao
2013-08-13 12:21                     ` Mauro Carvalho Chehab
2013-08-13 12:33                       ` Borislav Petkov
2013-08-13 16:55                       ` Naveen N. Rao
2013-08-14 23:54                         ` Mauro Carvalho Chehab
2013-08-12 12:41         ` Naveen N. Rao
2013-08-12 12:53           ` Borislav Petkov
2013-08-13 11:21             ` Naveen N. Rao
2013-08-13 12:42               ` Borislav Petkov
2013-08-13 17:32                 ` Naveen N. Rao
2013-08-13 17:58                   ` Borislav Petkov
2013-08-13 18:05                     ` Luck, Tony
2013-08-13 18:10                       ` Borislav Petkov
2013-08-13 20:13                         ` Luck, Tony
2013-08-14  5:43                           ` Borislav Petkov
2013-08-14 18:38                             ` Luck, Tony
2013-08-15 10:14                               ` Borislav Petkov
2013-08-15 19:14                                 ` Luck, Tony
2013-08-15 19:43                                   ` Borislav Petkov
2013-08-15  0:05                             ` Mauro Carvalho Chehab
2013-08-14 10:57                     ` Naveen N. Rao
2013-08-15  0:22                       ` Mauro Carvalho Chehab
2013-08-15  9:38                         ` Borislav Petkov
2013-08-15 13:26                           ` Mauro Carvalho Chehab
2013-08-15 13:44                             ` Borislav Petkov
2013-08-15 14:14                               ` Mauro Carvalho Chehab
2013-08-15 16:11                                 ` Borislav Petkov
2013-08-15 19:20                                 ` Luck, Tony
2013-08-15 19:41                                   ` Borislav Petkov
2013-08-15  0:00                   ` Mauro Carvalho Chehab
2013-08-15  9:43                     ` Borislav Petkov
2013-08-12 14:44           ` Mauro Carvalho Chehab [this message]
2013-08-13 11:41             ` Naveen N. Rao
2013-08-13 12:41               ` Mauro Carvalho Chehab
2013-08-13 17:17                 ` Naveen N. Rao
2013-08-13 17:39                   ` Luck, Tony
2013-08-14 10:47                     ` Naveen N. Rao
2013-08-14 12:18                       ` Borislav Petkov
2013-08-15  0:15                       ` Mauro Carvalho Chehab
2013-08-15 10:01                         ` Borislav Petkov
2013-08-15 13:34                           ` Mauro Carvalho Chehab
2013-08-15 13:51                             ` Borislav Petkov
2013-08-15 18:16                               ` Luck, Tony
2013-08-15 18:41                                 ` Borislav Petkov
2013-08-14 23:56                   ` Mauro Carvalho Chehab
2013-08-15 10:02                     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130812114404.3bd64fa0@samsung.com \
    --to=m.chehab@samsung.com \
    --cc=arozansk@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=bp@alien8.de \
    --cc=lance.ortiz@hp.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=naveen.n.rao@linux.vnet.ibm.com \
    --cc=rjw@sisk.pl \
    --cc=rostedt@goodmis.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).