From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Tony Luck <tony.luck@intel.com>,
linux-edac <linux-edac@vger.kernel.org>,
Borislav Petkov <bp@suse.de>
Subject: [4.14,01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present
Date: Sun, 19 Nov 2017 15:59:35 +0100 [thread overview]
Message-ID: <20171119145951.202879437@linuxfoundation.org> (raw)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream.
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.
The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.
The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:
4024 addresses fell in TOLM hole area
12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
106400 addresses were injected totally.
The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.
In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.
Thus, do not create a second memory controller if the key HA1 is absent.
[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/edac/sb_edac.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -462,6 +462,7 @@ static const struct pci_id_table pci_dev
static const struct pci_id_descr pci_dev_descr_ibridge[] = {
/* Processor Home Agent */
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0, 0, IMC0) },
+ { PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
/* Memory controller */
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA, 0, IMC0) },
@@ -472,7 +473,6 @@ static const struct pci_id_descr pci_dev
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3, 0, IMC0) },
/* Optional, mode 2HA */
- { PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0, 1, IMC1) },
@@ -2291,6 +2291,13 @@ static int sbridge_get_onedevice(struct
next_imc:
sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
if (!sbridge_dev) {
+ /* If the HA1 wasn't found, don't create EDAC second memory controller */
+ if (dev_descr->dom == IMC1 && devno != 1) {
+ edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+ PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+ pci_dev_put(pdev);
+ return 0;
+ }
if (dev_descr->dom == SOCK)
goto out_imc;
WARNING: multiple messages have this Message-ID (diff)
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
Tony Luck <tony.luck@intel.com>,
linux-edac <linux-edac@vger.kernel.org>,
Borislav Petkov <bp@suse.de>
Subject: [PATCH 4.14 01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present
Date: Sun, 19 Nov 2017 15:59:35 +0100 [thread overview]
Message-ID: <20171119145951.202879437@linuxfoundation.org> (raw)
In-Reply-To: <20171119145951.136379453@linuxfoundation.org>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream.
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.
The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.
The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:
4024 addresses fell in TOLM hole area
12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
106400 addresses were injected totally.
The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.
In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.
Thus, do not create a second memory controller if the key HA1 is absent.
[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/edac/sb_edac.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -462,6 +462,7 @@ static const struct pci_id_table pci_dev
static const struct pci_id_descr pci_dev_descr_ibridge[] = {
/* Processor Home Agent */
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0, 0, IMC0) },
+ { PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
/* Memory controller */
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA, 0, IMC0) },
@@ -472,7 +473,6 @@ static const struct pci_id_descr pci_dev
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3, 0, IMC0) },
/* Optional, mode 2HA */
- { PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS, 1, IMC1) },
{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0, 1, IMC1) },
@@ -2291,6 +2291,13 @@ static int sbridge_get_onedevice(struct
next_imc:
sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
if (!sbridge_dev) {
+ /* If the HA1 wasn't found, don't create EDAC second memory controller */
+ if (dev_descr->dom == IMC1 && devno != 1) {
+ edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+ PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+ pci_dev_put(pdev);
+ return 0;
+ }
if (dev_descr->dom == SOCK)
goto out_imc;
next reply other threads:[~2017-11-19 14:59 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-19 14:59 Greg Kroah-Hartman [this message]
2017-11-19 14:59 ` [PATCH 4.14 01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present Greg Kroah-Hartman
-- strict thread matches above, loose matches on Subject: below --
2017-11-19 14:59 [4.14,19/31] x86/MCE/AMD: Always give panic severity for UC errors in kernel context Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 19/31] " Greg Kroah-Hartman
2017-11-19 14:59 [PATCH 4.14 00/31] 4.14.1-stable review Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 02/31] dmaengine: dmatest: warn user when dma test times out Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 03/31] media: imon: Fix null-ptr-deref in imon_probe Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 04/31] media: dib0700: fix invalid dvb_detach argument Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 05/31] crypto: dh - Fix double free of ctx->p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 06/31] crypto: dh - Dont permit p to be 0 Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 07/31] crypto: dh - Dont permit key or g size longer than p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 08/31] crypto: brcm - Explicity ACK mailbox message Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 09/31] USB: early: Use new USB product ID and strings for DbC device Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 10/31] USB: usbfs: compute urb->actual_length for isochronous Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 11/31] USB: Add delay-init quirk for Corsair K70 LUX keyboards Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 12/31] usb: gadget: f_fs: Fix use-after-free in ffs_free_inst Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 13/31] USB: serial: metro-usb: stop I/O after failed open Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 14/31] USB: serial: Change DbC debug device binding ID Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 15/31] USB: serial: qcserial: add pid/vid for Sierra Wireless EM7355 fw update Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 16/31] USB: serial: garmin_gps: fix I/O after failed probe and remove Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 17/31] USB: serial: garmin_gps: fix memory leak on probe errors Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 18/31] selftests/x86/protection_keys: Fix syscall NR redefinition warnings Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 20/31] platform/x86: peaq-wmi: Add DMI check before binding to the WMI interface Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 21/31] platform/x86: peaq_wmi: Fix missing terminating entry for peaq_dmi_table Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 23/31] HID: wacom: generic: Recognize WACOM_HID_WD_PEN as a type of pen collection Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 24/31] rpmsg: glink: Add missing MODULE_LICENSE Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 25/31] staging: wilc1000: Fix bssid buffer offset in Txq Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 26/31] staging: sm750fb: Fix parameter mistake in poke32 Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 27/31] staging: ccree: fix 64 bit scatter/gather DMA ops Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 28/31] staging: greybus: spilib: fix use-after-free after deregistration Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 29/31] staging: vboxvideo: Fix reporting invalid suggested-offset-properties Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 30/31] staging: rtl8188eu: Revert 4 commits breaking ARP Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 31/31] spi: fix use-after-free at controller deregistration Greg Kroah-Hartman
2017-11-20 14:27 ` [PATCH 4.14 00/31] 4.14.1-stable review Guenter Roeck
2017-11-21 15:26 ` Ben Hutchings
2017-11-21 16:35 ` Greg Kroah-Hartman
2017-11-21 16:35 ` Greg Kroah-Hartman
2017-11-21 16:46 ` Ben Hutchings
2017-11-21 17:09 ` Greg Kroah-Hartman
2017-11-21 17:09 ` Greg Kroah-Hartman
2017-11-21 19:07 ` Ben Hutchings
2017-11-21 19:38 ` Guenter Roeck
2017-11-21 19:38 ` Guenter Roeck
2017-11-22 16:06 ` Ben Hutchings
2017-11-22 17:00 ` Greg Kroah-Hartman
2017-11-22 17:00 ` Greg Kroah-Hartman
2017-11-22 18:52 ` Guenter Roeck
2017-11-22 18:52 ` Guenter Roeck
2017-11-20 18:21 ` Guenter Roeck
2017-11-20 19:16 ` Greg Kroah-Hartman
2017-11-20 21:19 ` Shuah Khan
2017-11-21 7:22 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171119145951.202879437@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=bp@suse.de \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=qiuxu.zhuo@intel.com \
--cc=stable@vger.kernel.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.