From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Aristeu Rozanski <aris@redhat.com>,
linux-edac@vger.kernel.org
Subject: [v4,2/2] EDAC, sb_edac: Fix reporting wrong DIMM when patrol scrubber finds error
Date: Tue, 11 Sep 2018 11:15:27 +0200
Message-ID: <20180911091527.GB12094@zn.tnic>

On Mon, Sep 10, 2018 at 02:11:45PM -0700, Luck, Tony wrote:
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>
> The sb_edac driver sometimes reports the wrong DIMM for a memory
> error found by the patrol scrubber. It's rooted in h/w that
> only provides a 4KB page aligned address for the error case.
> This means that the EDAC driver will point at the DIMM matching
> offset 0x0 in the 4KB page, but because of interleaving across
> channels and ranks the actual DIMM involved may be different
> if the error is on some other cache line within the page.
>
> For this case, we can reconstruct the socket/iMC/channel
> information from the "mce" structure passed to the EDAC driver.
> We cannot determine the DIMM, so pass "dimm=-1" to the EDAC core.
> It will report that all the DIMMs on that channel may be affected.
>
> [Tony: Improved comments on the functions to convert bank number
> to memory controller number. Minor cleanup to commit comment]
>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> v4:
> Fixed typo s/detemine/determine/
> Changed the "msg" when we can't determine the "ha" to
> include the bank number. I've left this in the "msg"
> because we do use that in the error path to the edac
> core code. I don't think a KERN_ERROR message will
> help (especially as this is a "Can't happen(TM)" error).

Fair enough.

All applied, thx.