From: Mauro Carvalho Chehab <m.chehab@samsung.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
"Chen, Gong" <gong.chen@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
Aristeu Rozanski Filho <arozansk@redhat.com>,
Steven Rostedt <srostedt@redhat.com>
Subject: Re: [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver
Date: Thu, 17 Oct 2013 07:34:43 -0300
Message-ID: <20131017073443.675e97f2@samsung.com>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F31D31C65@ORSMSX106.amr.corp.intel.com>
On Wed, 16 Oct 2013 20:47:05 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:
> > Also, I suspect that, if an error happens to affect more than one DIMM
> > (e. g. part of the location is not available for a given error),
> > that the DIMM label will also not be properly shown.
>
> There are a couple of cases here:
>
> 1) There are a number of DIMMs behind some flaky h/w that introduces errors
> that are apparently blamed onto each of those DIMMs.
>
> All we can do here is statistical correlations ... each error is reported independently,
> it is up to some entity to notice the higher level topology connection. There is enough
> information in the UEFI error record to do that (assuming that BIOS filled out the
> necessary fields).
>
> 2) There is a single reported error that spans more than one DIMM.
>
> This can happen with a UC error in a pair of lock-step DIMMs. Since the error is UC
> we know that two (or more) bits are bad. But we have no way to tell whether the
> bad bits came from the same DIMM, or one bit from each (because we don't know
> which bits are bad - if we knew that, we could fix them :-) The eMCA case should
> log two subsections in this case - one for each of the lockstep DIMMs involved. A user
> seeing this should probably just replace both DIMMs to be safe. If they wanted to
> diagnose further they should swap DIMMs around so this pair are no longer lockstepped
> and see if they start seeing correctable errors from each of the split pair - or if the UC
> errors move with one or the other of the DIMMs.
There's also a third case: mirrored memories.

For consistency with hw-based reports, in cases (2) and (3) the error
trace should display both memories that are affected by a UC error
(or by a CE on a mirrored address space).
Regards,
Mauro