From: Mauro Carvalho Chehab <m.chehab@samsung.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
"Chen, Gong" <gong.chen@linux.intel.com>,
tony.luck@intel.com, linux-kernel@vger.kernel.org,
linux-acpi@vger.kernel.org,
Aristeu Rozanski Filho <arozansk@redhat.com>,
Steven Rostedt <srostedt@redhat.com>
Subject: Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver
Date: Wed, 16 Oct 2013 08:55:58 -0300
Message-ID: <20131016085558.19fe143a@samsung.com>
In-Reply-To: <20131016104221.GC13608@pd.tnic>
On Wed, 16 Oct 2013 12:42:21 +0200,
Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Oct 16, 2013 at 07:35:39AM -0300, Mauro Carvalho Chehab wrote:
> > Well, try to write some code on userspace to discover what's the error.
> >
> > An error threshold mechanism on userspace will only work if userspace
> > knows that the error belongs to the same DIMM.
>
> Just read the first mail again:
>
> <idle>-0 [000] d.h. 56068.488759: extlog_mem_event: 3 corrected errors:unknown on Memriser1 CHANNEL A DIMM 0(FRU: 00000000-0000-0000-0000-000000000000 physical addr: 0x0000000851fe0000 node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 28927 column: 1296)
In that log, "physical addr: 0x0000000851fe0000 node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 28927 column: 1296"
is a free-form string, not a hierarchical position like the one EDAC
provides.
Worse than that, not all of that data may be available, as CPER allows
some fields to be omitted.
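To make the problem concrete, here is a minimal sketch of what a userspace
tool would have to do with that string. It is not code from this thread:
the helper name parse_location and the regular expression are assumptions
made for illustration, and the "partial" string is a made-up example of a
record with omitted fields.

```python
import re

# Hypothetical helper, for illustration only. The field names are the
# ones that appear in the log line quoted above.
FIELD_RE = re.compile(
    r"(physical addr|node|card|module|rank|bank|row|column):\s*"
    r"(0x[0-9a-fA-F]+|\d+)"
)


def parse_location(text):
    """Return a dict with whichever location fields are present in text."""
    fields = {}
    for key, value in FIELD_RE.findall(text):
        # Hex values carry a "0x" prefix; everything else is decimal.
        base = 16 if value.lower().startswith("0x") else 10
        fields[key] = int(value, base)
    return fields


full = ("physical addr: 0x0000000851fe0000 node: 0 card: 0 module: 0 "
        "rank: 0 bank: 0 row: 28927 column: 1296")
# Made-up record in which some fields were omitted.
partial = "physical addr: 0x0000000851fe0000 node: 0 row: 28927"

loc_full = parse_location(full)
loc_partial = parse_location(partial)

# The caller must check for every field before using it.
missing = {"card", "module", "rank", "bank", "column"} - loc_partial.keys()
```

Even in this small sketch the caller cannot tell whether a missing field
means "not applicable" or "not reported", which is the ambiguity a
structured, hierarchical location avoids.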
Also, I suspect that if an error affects more than one DIMM
(e.g. because part of the location is not available for a given error),
the DIMM label will not be shown properly either.
Also, writing a userspace counterpart that works properly is
extremely hard if the memory layout is not known in advance.
So, in practice, if the above memory error is reported, all
userspace will likely be able to do is store it and require someone
to identify manually what's happening.
On the other hand, if the node, channel and DIMM number information is
properly filled in (as happens on EDAC), userspace can rely on that
data to apply per-DIMM, per-channel and per-node thresholds.
It may even use the physical address to identify whether the problem is
confined to a certain region of a physical DIMM and poison that region
for as long as the damaged component cannot be replaced.
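As a rough sketch of the thresholding described above, assuming each event
arrives with numeric node, channel and DIMM fields already filled in: the
class name, the method name and the threshold values below are hypothetical
and chosen only for illustration. They are not taken from EDAC or from any
existing tool.

```python
from collections import Counter

# Arbitrary placeholder limits; a real tool would make these configurable.
DEFAULT_THRESHOLDS = {"node": 100, "channel": 50, "dimm": 10}


class ErrorCounter:
    """Count corrected errors per node, per channel and per DIMM."""

    def __init__(self, thresholds=None):
        self.thresholds = dict(thresholds or DEFAULT_THRESHOLDS)
        self.counts = Counter()

    def record(self, node, channel, dimm, count=1):
        """Account for count errors; return the levels at or over threshold."""
        keys = {
            "node": (node,),
            "channel": (node, channel),
            "dimm": (node, channel, dimm),
        }
        exceeded = []
        for level, key in keys.items():
            self.counts[(level, key)] += count
            if self.counts[(level, key)] >= self.thresholds[level]:
                exceeded.append(level)
        return exceeded


counter = ErrorCounter()
first = counter.record(node=0, channel=0, dimm=0, count=3)   # below all limits
second = counter.record(node=0, channel=0, dimm=0, count=7)  # DIMM reaches 10
other = counter.record(node=0, channel=0, dimm=1, count=3)   # different DIMM
```

The point of the sketch is that the keys are plain tuples built from the
hierarchical fields. None of this works if the only input is a label
string whose layout differs from machine to machine.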
Regards,
Mauro