From: Mauro Carvalho Chehab <m.chehab@samsung.com>
To: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>,
tony.luck@intel.com, bhelgaas@google.com, rostedt@goodmis.org,
rjw@sisk.pl, lance.ortiz@hp.com, linux-pci@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
Aristeu Rozanski Filho <arozansk@redhat.com>
Subject: Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
Date: Mon, 12 Aug 2013 11:44:04 -0300
Message-ID: <20130812114404.3bd64fa0@samsung.com>
In-Reply-To: <5208D80D.5030206@linux.vnet.ibm.com>
On Mon, 12 Aug 2013 18:11:49 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> On 08/12/2013 05:03 PM, Mauro Carvalho Chehab wrote:
> > On Sat, 10 Aug 2013 20:03:22 +0200
> > Borislav Petkov <bp@alien8.de> wrote:
> >
> >> On Thu, Aug 08, 2013 at 04:38:22PM -0300, Mauro Carvalho Chehab wrote:
> >>> On Thu, 08 Aug 2013 23:57:51 +0530
> >>> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> >>>
> >>>> Enable memory error trace event in cper.c
> >>>
> >>> Why do we need to do that? Memory error events are already handled
> >>> via edac_ghes module,
> >>
> >> If APEI gives me all the required information in order to deal with the
> >> hardware error - and it looks like it does - then the additional layer
> >> of ghes_edac is not needed.
> >
> > APEI is just the mechanism that collects the data, not the mechanism
> > that reports to userspace.
>
> I think what Boris is saying is that ghes_edac isn't adding anything
> more here given what we get from APEI structures. So, there doesn't seem
> to be a need to add dependency on edac for this purpose.
>
> Further, ghes_edac seems to require EDAC_MM_EDAC to be compiled into the
> kernel (not a module). So, more dependencies.
>
> >
> > The current implementation is that APEI already reports those errors
> > via ghes_edac driver. It also reports the very same error via MCE
> > (although the APEI interface to MCE is currently broken for everything
> > that is not Nehalem-EX - as it basically emulates the MCE log for
> > that specific architecture).
>
> So, I looked at ghes_edac and it basically seems to boil down to
> trace_mc_event.
Yes. It also provides the sysfs nodes that describe the memory.
> But, this only seems to expose the APEI data as a string
> and doesn't look to really make all the fields available to user-space
> in a raw manner. Not sure how well this can be utilised by a user-space
> tool. Do you have suggestions on how we can do this?
There's already a userspace tool that handles it:
https://git.fedorahosted.org/cgit/rasdaemon.git/
What the current version still lacks is the code that would translate
the way APEI reports an error (memory node, card, module, bank, device)
into a DIMM label[1].
In the end, what really matters is being able to point to the DIMM(s)
in a way that lets the user replace them (e.g. using the silk screen
labels on the system motherboard).
[1] It already does this translation for the other EDAC drivers, via a
configuration file that holds the per-system mapping. Extending it to
also handle APEI errors shouldn't be hard.
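To make the idea in [1] concrete, here is a minimal sketch of such a
per-system lookup. It is not rasdaemon's code or its configuration
format: the ApeiLocation type, the DIMM_LABELS table, the label strings
and both helper functions are hypothetical, and it assumes that node,
card and module together are enough to identify one DIMM (bank and
device are left out on that assumption).

```python
from typing import NamedTuple, Optional


class ApeiLocation(NamedTuple):
    """Location fields as reported via APEI (hypothetical container)."""
    node: int
    card: int
    module: int


# Hypothetical per-system table. In practice it would be loaded from a
# configuration file written for one specific motherboard model.
DIMM_LABELS = {
    ApeiLocation(node=0, card=0, module=0): "CPU1_DIMM_A1",
    ApeiLocation(node=0, card=0, module=1): "CPU1_DIMM_A2",
    ApeiLocation(node=1, card=0, module=0): "CPU2_DIMM_A1",
}


def dimm_label(loc: ApeiLocation) -> Optional[str]:
    """Return the silk screen label for loc, or None if it is unmapped."""
    return DIMM_LABELS.get(loc)


def describe(loc: ApeiLocation) -> str:
    """Return the label, falling back to the raw location fields."""
    label = dimm_label(loc)
    if label is not None:
        return label
    return (f"unknown DIMM (node {loc.node}, card {loc.card}, "
            f"module {loc.module})")
```

The fallback in describe() matters on systems that have no mapping
entry yet: the raw location is still reported, so no error is silently
dropped.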
>
> Thanks,
> Naveen
>
--
Cheers,
Mauro