From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Tony Luck <tony.luck@intel.com>,
linux-edac <linux-edac@vger.kernel.org>,
Borislav Petkov <bp@suse.de>
Subject: [PATCH 4.14 01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present
Date: Sun, 19 Nov 2017 15:59:35 +0100 [thread overview]
Message-ID: <20171119145951.202879437@linuxfoundation.org> (raw)
In-Reply-To: <20171119145951.136379453@linuxfoundation.org>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream.
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.
The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.
The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:
4024 addresses fell in TOLM hole area
12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
106400 addresses were injected totally.
The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.
In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.
Thus, do not create a second memory controller if the key HA1 is absent.
[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/edac/sb_edac.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -462,6 +462,7 @@ static const struct pci_id_table pci_dev
static const struct pci_id_descr pci_dev_descr_ibridge[] = {
/* Processor Home Agent */
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0, 0, IMC0) },
+ { PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
/* Memory controller */
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA, 0, IMC0) },
@@ -472,7 +473,6 @@ static const struct pci_id_descr pci_dev
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3, 0, IMC0) },
/* Optional, mode 2HA */
- { PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0, 1, IMC1) },
@@ -2291,6 +2291,13 @@ static int sbridge_get_onedevice(struct
next_imc:
sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
if (!sbridge_dev) {
+ /* If the HA1 wasn't found, don't create EDAC second memory controller */
+ if (dev_descr->dom == IMC1 && devno != 1) {
+ edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+ PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+ pci_dev_put(pdev);
+ return 0;
+ }
if (dev_descr->dom == SOCK)
goto out_imc;
next prev parent reply other threads:[~2017-11-19 15:00 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-19 14:59 [PATCH 4.14 00/31] 4.14.1-stable review Greg Kroah-Hartman
2017-11-19 14:59 ` Greg Kroah-Hartman [this message]
2017-11-19 14:59 ` [PATCH 4.14 02/31] dmaengine: dmatest: warn user when dma test times out Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 03/31] media: imon: Fix null-ptr-deref in imon_probe Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 04/31] media: dib0700: fix invalid dvb_detach argument Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 05/31] crypto: dh - Fix double free of ctx->p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 06/31] crypto: dh - Dont permit p to be 0 Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 07/31] crypto: dh - Dont permit key or g size longer than p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 08/31] crypto: brcm - Explicity ACK mailbox message Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 09/31] USB: early: Use new USB product ID and strings for DbC device Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 10/31] USB: usbfs: compute urb->actual_length for isochronous Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 11/31] USB: Add delay-init quirk for Corsair K70 LUX keyboards Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 12/31] usb: gadget: f_fs: Fix use-after-free in ffs_free_inst Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 13/31] USB: serial: metro-usb: stop I/O after failed open Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 14/31] USB: serial: Change DbC debug device binding ID Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 15/31] USB: serial: qcserial: add pid/vid for Sierra Wireless EM7355 fw update Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 16/31] USB: serial: garmin_gps: fix I/O after failed probe and remove Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 17/31] USB: serial: garmin_gps: fix memory leak on probe errors Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 18/31] selftests/x86/protection_keys: Fix syscall NR redefinition warnings Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 19/31] x86/MCE/AMD: Always give panic severity for UC errors in kernel context Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 20/31] platform/x86: peaq-wmi: Add DMI check before binding to the WMI interface Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 21/31] platform/x86: peaq_wmi: Fix missing terminating entry for peaq_dmi_table Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 23/31] HID: wacom: generic: Recognize WACOM_HID_WD_PEN as a type of pen collection Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 24/31] rpmsg: glink: Add missing MODULE_LICENSE Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 25/31] staging: wilc1000: Fix bssid buffer offset in Txq Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 26/31] staging: sm750fb: Fix parameter mistake in poke32 Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 27/31] staging: ccree: fix 64 bit scatter/gather DMA ops Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 28/31] staging: greybus: spilib: fix use-after-free after deregistration Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 29/31] staging: vboxvideo: Fix reporting invalid suggested-offset-properties Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 30/31] staging: rtl8188eu: Revert 4 commits breaking ARP Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 31/31] spi: fix use-after-free at controller deregistration Greg Kroah-Hartman
2017-11-20 14:27 ` [PATCH 4.14 00/31] 4.14.1-stable review Guenter Roeck
2017-11-21 15:26 ` Ben Hutchings
2017-11-21 16:35 ` Greg Kroah-Hartman
2017-11-21 16:46 ` Ben Hutchings
2017-11-21 17:09 ` Greg Kroah-Hartman
2017-11-21 19:07 ` Ben Hutchings
2017-11-21 19:38 ` Guenter Roeck
2017-11-22 16:06 ` Ben Hutchings
2017-11-22 17:00 ` Greg Kroah-Hartman
2017-11-22 18:52 ` Guenter Roeck
2017-11-20 18:21 ` Guenter Roeck
2017-11-20 19:16 ` Greg Kroah-Hartman
2017-11-20 21:19 ` Shuah Khan
2017-11-21 7:22 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171119145951.202879437@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bp@suse.de \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=qiuxu.zhuo@intel.com \
--cc=stable@vger.kernel.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).