From: Ingo Molnar <mingo@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Yazen Ghannam <Yazen.Ghannam@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 3/6] x86/mce: Add support for new MCA_SYND register
Date: Fri, 8 Jul 2016 12:26:48 +0200
Message-ID: <20160708102648.GA22597@gmail.com>
In-Reply-To: <20160708101452.GD3808@pd.tnic>

* Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Jul 08, 2016 at 11:46:53AM +0200, Ingo Molnar wrote:
> > I'm not sure I can parse that: how can a reported error have bits corrupted?
>
> No, it is about the actual bits in memory for which the ECC error is
> generated. So, for example, if an ECC error reports that memory location X
> had some bit flips, the syndrome value reported together with that same
> ECC error shows which actual bits have flipped.
>
> Here's an example from the AMD BKDG, maybe that'll make it more clear:
>
> http://support.amd.com/TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
>
> Go to page 246, where it says:
>
> "For example, assume the ECC syndrome is 03EAh. First search row EAh
> for the complete syndrome. Since it is not found, search row 03h for
> the complete syndrome. It is found in column 9h, so symbol 9h has the
> error. Since the error bitmask indicates value 3h (0011b), bits 0 and 1
> within that symbol are corrupted. Symbol 9h maps to bits 72-79, so the
> corrupted bits are 72 and 73 of the line."
>
> So you basically search the table of x8 ECC correctable syndromes: first
> in row EAh (the second syndrome byte), and if you don't find the complete
> syndrome there, in row 03h (the first syndrome byte).
>
> It is in column 9, which means symbol 9. There are 16 data symbols, one
> for each byte of a 128-bit DRAM word, plus 3 special symbols for the ECC
> bits.
>
> The row number 3h is also the error bitmask, so bits 0 and 1 are the
> ones which are corrupted.
>
> Which means that when you look at the value in DRAM at the address where
> the error was reported, you go to symbol 9: 9*8 = 72, i.e. bits 72-79,
> and the first 2 bits in that byte are bits 72 and 73.
>
> So if you want to correct them, you simply flip them as the syndrome
> tells you that those 2 are corrupted.
>
> Ok?
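The lookup described above can be sketched as a small userspace C program. It assumes a caller-filled table laid out as in the BKDG example, where the row is the error bitmask, the column is the symbol, and each entry is the complete 16-bit syndrome. The table here is NOT the real BKDG table. Only the worked-example entry is filled in: syndrome 03EAh at row 03h, column 9h. The names syndrome_table, decode_syndrome(), corrupted_bits(), correct_word() and worked_example() are hypothetical and are not kernel API. NUM_SYMBOLS = 19 follows the "16 data symbols plus 3 special symbols" description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HYPOTHETICAL stand-in for the BKDG table of x8 ECC correctable
 * syndromes: row index = error bitmask, column index = symbol,
 * entry = complete 16-bit syndrome. Only the worked-example entry
 * (row 03h, column 9h) is filled in; everything else stays 0.
 */
#define NUM_ROWS	256
#define NUM_SYMBOLS	19	/* 16 data symbols + 3 special ECC symbols */
#define NUM_DATA_SYMS	16	/* one per byte of a 128-bit DRAM word */

static uint16_t syndrome_table[NUM_ROWS][NUM_SYMBOLS];

/* Return the column holding the complete syndrome in @row, or -1. */
static int search_row(uint8_t row, uint16_t syndrome)
{
	for (int col = 0; col < NUM_SYMBOLS; col++)
		if (syndrome_table[row][col] == syndrome)
			return col;
	return -1;
}

/*
 * Search order from the BKDG example: the row named by the low syndrome
 * byte (EAh for 03EAh) first, then the row named by the high byte (03h).
 * The matching column is the symbol; the matching row is the bitmask.
 */
static int decode_syndrome(uint16_t syndrome, int *symbol, uint8_t *mask)
{
	const uint8_t rows[2] = { (uint8_t)(syndrome & 0xff),
				  (uint8_t)(syndrome >> 8) };

	for (int i = 0; i < 2; i++) {
		int col = search_row(rows[i], syndrome);

		if (col >= 0) {
			*symbol = col;
			*mask = rows[i];
			return 0;
		}
	}
	return -1;	/* not a correctable syndrome in this table */
}

/* Data symbol N covers bits N*8..N*8+7; list the bits set in @mask. */
static int corrupted_bits(int symbol, uint8_t mask, int bits[8])
{
	int n = 0;

	for (int b = 0; b < 8; b++)
		if (mask & (1u << b))
			bits[n++] = symbol * 8 + b;
	return n;
}

/* Flip the corrupted bits in place; word[N] holds bits N*8..N*8+7. */
static int correct_word(uint8_t word[NUM_DATA_SYMS], int symbol, uint8_t mask)
{
	if (symbol < 0 || symbol >= NUM_DATA_SYMS)
		return -1;	/* special ECC symbol, not a data byte */
	word[symbol] ^= mask;
	return 0;
}

/* BKDG worked example: syndrome 03EAh -> symbol 9h, bits 72 and 73. */
static void worked_example(void)
{
	int symbol = -1, bits[8], n, ret;
	uint8_t mask = 0, word[NUM_DATA_SYMS] = { 0 };

	syndrome_table[0x03][0x9] = 0x03EA;	/* the only entry filled in */

	ret = decode_syndrome(0x03EA, &symbol, &mask);
	assert(ret == 0 && symbol == 0x9 && mask == 0x03);

	n = corrupted_bits(symbol, mask, bits);
	assert(n == 2 && bits[0] == 72 && bits[1] == 73);

	ret = correct_word(word, symbol, mask);
	assert(ret == 0 && word[9] == 0x03);
}
```

Calling worked_example() reproduces the numbers from the BKDG quote. Correction itself is just an XOR of the row's bitmask into the affected byte, which is the "simply flip them" step. The sketch also treats the row index as a byte, so rows 00h-FFh are covered.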

So is 'ECC syndrome' a fancy word and a complicated process for identifying
which data got corrupted, more accurately than what we had before? Because
previously we already had the memory address of the corruption, right?

What is the typical 'scope' of that memory corruption address: a cache line,
a machine word, a byte, or maybe a variable unit that depends on the memory
hardware?

Thanks,

	Ingo