From: Ingo Molnar <mingo@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Yazen Ghannam <Yazen.Ghannam@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 3/6] x86/mce: Add support for new MCA_SYND register
Date: Fri, 8 Jul 2016 12:26:48 +0200
Message-ID: <20160708102648.GA22597@gmail.com>
In-Reply-To: <20160708101452.GD3808@pd.tnic>

* Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Jul 08, 2016 at 11:46:53AM +0200, Ingo Molnar wrote:
> > I'm not sure I can parse that: how can a reported error have bits corrupted?
>
> No, it is about the actual bits in memory that the ECC error is generated
> for. So, for example, if an ECC error reports that memory location X had
> some bit flips, the syndrome value which gets reported together with
> the same ECC error shows which bits actually flipped.
>
> Here's an example from the AMD BKDG, maybe that'll make it more clear:
>
> http://support.amd.com/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
>
> Go to page 246, where it says:
>
> "For example, assume the ECC syndrome is 03EAh. First search row EAh
> for the complete syndrome. Since it is not found, search row 03h for
> the complete syndrome. It is found in column 9h, so symbol 9h has the
> error. Since the error bitmask indicates value 3h (0011b), bits 0 and 1
> within that symbol are corrupted. Symbol 9h maps to bits 72-79, so the
> corrupted bits are 72 and 73 of the line."
>
> So you basically search the table of x8 ECC correctable syndromes: first
> in row EAh (the second syndrome byte), and if you don't find the complete
> syndrome there, you search row 03h for it.
>
> It is in column 9, which means symbol 9. There are 16 data symbols, one
> for each byte in a 128-bit DRAM word, plus 3 special symbols for the ECC
> bits.
>
> The row number 3h is also the error bitmask, so bits 0 and 1 are the
> ones which are corrupted.
>
> Which means, when you look at the value in DRAM at the address where the
> error was reported, you need to go to symbol 9: 9*8 = 72, i.e. bits
> 72-79, and the first 2 bits in that byte are bits 72 and 73.
>
> So if you want to correct them, you simply flip them back, since the
> syndrome tells you that those 2 are corrupted.
>
> Ok?

So is 'ECC syndrome' a fancy word and a complicated process for identifying what
data got corrupted, in a more accurate fashion than what we had before?

Because previously we already had a memory address of the memory corruption,
right?

What is the typical 'scope' of that memory corruption address - a cache line, a
machine word, a byte or maybe a variable unit that is memory hardware dependent?

Thanks,

	Ingo

Thread overview: 18+ messages
2016-07-08 9:09 [PATCH 0/6] x86/RAS queue Borislav Petkov
2016-07-08 9:09 ` [PATCH 1/6] x86/mce/AMD: Increase size of bank_map type Borislav Petkov
2016-07-08 9:21 ` Ingo Molnar
2016-07-08 9:32 ` Borislav Petkov
2016-07-08 12:05 ` [tip:ras/core] x86/mce/AMD: Increase size of the " tip-bot for Aravind Gopalakrishnan
2016-07-08 9:09 ` [PATCH 2/6] x86/RAS/AMD: Reduce number of IPIs when prepping error injection Borislav Petkov
2016-07-08 12:06 ` [tip:ras/core] x86/RAS/AMD: Reduce the " tip-bot for Yazen Ghannam
2016-07-08 9:09 ` [PATCH 3/6] x86/mce: Add support for new MCA_SYND register Borislav Petkov
2016-07-08 9:26 ` Ingo Molnar
2016-07-08 9:37 ` Borislav Petkov
2016-07-08 9:46 ` Ingo Molnar
2016-07-08 10:14 ` Borislav Petkov
2016-07-08 10:26 ` Ingo Molnar [this message]
2016-07-08 10:48 ` Borislav Petkov
2016-07-08 9:09 ` [PATCH 4/6] x86/mce: Fix mce_rdmsrl() warning message Borislav Petkov
2016-07-08 12:06 ` [tip:ras/core] " tip-bot for Borislav Petkov
2016-07-08 9:09 ` [PATCH 5/6] EDAC, mce_amd: Print syndrome register value on SMCA systems Borislav Petkov
2016-07-08 9:09 ` [PATCH 6/6] x86/RAS: Add syndrome support to mce_amd_inj Borislav Petkov