All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	linux-edac <linux-edac@vger.kernel.org>,
	Borislav Petkov <bp@suse.de>
Subject: [4.14,01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present
Date: Sun, 19 Nov 2017 15:59:35 +0100	[thread overview]
Message-ID: <20171119145951.202879437@linuxfoundation.org> (raw)

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream.

Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/edac/sb_edac.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -462,6 +462,7 @@ static const struct pci_id_table pci_dev
 static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 		/* Processor Home Agent */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0,        0, IMC0) },
+	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 
 		/* Memory controller */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA,     0, IMC0) },
@@ -472,7 +473,6 @@ static const struct pci_id_descr pci_dev
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3,   0, IMC0) },
 
 		/* Optional, mode 2HA */
-	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA,     1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS,    1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0,   1, IMC1) },
@@ -2291,6 +2291,13 @@ static int sbridge_get_onedevice(struct
 next_imc:
 	sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
 	if (!sbridge_dev) {
+		/* If the HA1 wasn't found, don't create EDAC second memory controller */
+		if (dev_descr->dom == IMC1 && devno != 1) {
+			edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+				 PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+			pci_dev_put(pdev);
+			return 0;
+		}
 
 		if (dev_descr->dom == SOCK)
 			goto out_imc;

WARNING: multiple messages have this Message-ID (diff)
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	linux-edac <linux-edac@vger.kernel.org>,
	Borislav Petkov <bp@suse.de>
Subject: [PATCH 4.14 01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present
Date: Sun, 19 Nov 2017 15:59:35 +0100	[thread overview]
Message-ID: <20171119145951.202879437@linuxfoundation.org> (raw)
In-Reply-To: <20171119145951.136379453@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream.

Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/edac/sb_edac.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -462,6 +462,7 @@ static const struct pci_id_table pci_dev
 static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 		/* Processor Home Agent */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0,        0, IMC0) },
+	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 
 		/* Memory controller */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA,     0, IMC0) },
@@ -472,7 +473,6 @@ static const struct pci_id_descr pci_dev
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3,   0, IMC0) },
 
 		/* Optional, mode 2HA */
-	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA,     1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS,    1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0,   1, IMC1) },
@@ -2291,6 +2291,13 @@ static int sbridge_get_onedevice(struct
 next_imc:
 	sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
 	if (!sbridge_dev) {
+		/* If the HA1 wasn't found, don't create EDAC second memory controller */
+		if (dev_descr->dom == IMC1 && devno != 1) {
+			edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+				 PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+			pci_dev_put(pdev);
+			return 0;
+		}
 
 		if (dev_descr->dom == SOCK)
 			goto out_imc;

             reply	other threads:[~2017-11-19 14:59 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-19 14:59 Greg Kroah-Hartman [this message]
2017-11-19 14:59 ` [PATCH 4.14 01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present Greg Kroah-Hartman
  -- strict thread matches above, loose matches on Subject: below --
2017-11-19 14:59 [4.14,19/31] x86/MCE/AMD: Always give panic severity for UC errors in kernel context Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 19/31] " Greg Kroah-Hartman
2017-11-19 14:59 [PATCH 4.14 00/31] 4.14.1-stable review Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 02/31] dmaengine: dmatest: warn user when dma test times out Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 03/31] media: imon: Fix null-ptr-deref in imon_probe Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 04/31] media: dib0700: fix invalid dvb_detach argument Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 05/31] crypto: dh - Fix double free of ctx->p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 06/31] crypto: dh - Dont permit p to be 0 Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 07/31] crypto: dh - Dont permit key or g size longer than p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 08/31] crypto: brcm - Explicity ACK mailbox message Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 09/31] USB: early: Use new USB product ID and strings for DbC device Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 10/31] USB: usbfs: compute urb->actual_length for isochronous Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 11/31] USB: Add delay-init quirk for Corsair K70 LUX keyboards Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 12/31] usb: gadget: f_fs: Fix use-after-free in ffs_free_inst Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 13/31] USB: serial: metro-usb: stop I/O after failed open Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 14/31] USB: serial: Change DbC debug device binding ID Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 15/31] USB: serial: qcserial: add pid/vid for Sierra Wireless EM7355 fw update Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 16/31] USB: serial: garmin_gps: fix I/O after failed probe and remove Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 17/31] USB: serial: garmin_gps: fix memory leak on probe errors Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 18/31] selftests/x86/protection_keys: Fix syscall NR redefinition warnings Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 20/31] platform/x86: peaq-wmi: Add DMI check before binding to the WMI interface Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 21/31] platform/x86: peaq_wmi: Fix missing terminating entry for peaq_dmi_table Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 23/31] HID: wacom: generic: Recognize WACOM_HID_WD_PEN as a type of pen collection Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 24/31] rpmsg: glink: Add missing MODULE_LICENSE Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 25/31] staging: wilc1000: Fix bssid buffer offset in Txq Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 26/31] staging: sm750fb: Fix parameter mistake in poke32 Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 27/31] staging: ccree: fix 64 bit scatter/gather DMA ops Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 28/31] staging: greybus: spilib: fix use-after-free after deregistration Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 29/31] staging: vboxvideo: Fix reporting invalid suggested-offset-properties Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 30/31] staging: rtl8188eu: Revert 4 commits breaking ARP Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 31/31] spi: fix use-after-free at controller deregistration Greg Kroah-Hartman
2017-11-20 14:27 ` [PATCH 4.14 00/31] 4.14.1-stable review Guenter Roeck
2017-11-21 15:26   ` Ben Hutchings
2017-11-21 16:35     ` Greg Kroah-Hartman
2017-11-21 16:35       ` Greg Kroah-Hartman
2017-11-21 16:46       ` Ben Hutchings
2017-11-21 17:09         ` Greg Kroah-Hartman
2017-11-21 17:09           ` Greg Kroah-Hartman
2017-11-21 19:07           ` Ben Hutchings
2017-11-21 19:38             ` Guenter Roeck
2017-11-21 19:38               ` Guenter Roeck
2017-11-22 16:06               ` Ben Hutchings
2017-11-22 17:00                 ` Greg Kroah-Hartman
2017-11-22 17:00                   ` Greg Kroah-Hartman
2017-11-22 18:52                   ` Guenter Roeck
2017-11-22 18:52                     ` Guenter Roeck
2017-11-20 18:21 ` Guenter Roeck
2017-11-20 19:16   ` Greg Kroah-Hartman
2017-11-20 21:19 ` Shuah Khan
2017-11-21  7:22   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171119145951.202879437@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bp@suse.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.