stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	linux-edac <linux-edac@vger.kernel.org>,
	Borislav Petkov <bp@suse.de>
Subject: [PATCH 4.14 01/31] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present
Date: Sun, 19 Nov 2017 15:59:35 +0100	[thread overview]
Message-ID: <20171119145951.202879437@linuxfoundation.org> (raw)
In-Reply-To: <20171119145951.136379453@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream.

Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/edac/sb_edac.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -462,6 +462,7 @@ static const struct pci_id_table pci_dev
 static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 		/* Processor Home Agent */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0,        0, IMC0) },
+	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 
 		/* Memory controller */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA,     0, IMC0) },
@@ -472,7 +473,6 @@ static const struct pci_id_descr pci_dev
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3,   0, IMC0) },
 
 		/* Optional, mode 2HA */
-	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA,     1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS,    1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0,   1, IMC1) },
@@ -2291,6 +2291,13 @@ static int sbridge_get_onedevice(struct
 next_imc:
 	sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
 	if (!sbridge_dev) {
+		/* If the HA1 wasn't found, don't create EDAC second memory controller */
+		if (dev_descr->dom == IMC1 && devno != 1) {
+			edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+				 PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+			pci_dev_put(pdev);
+			return 0;
+		}
 
 		if (dev_descr->dom == SOCK)
 			goto out_imc;

  reply	other threads:[~2017-11-19 15:00 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-19 14:59 [PATCH 4.14 00/31] 4.14.1-stable review Greg Kroah-Hartman
2017-11-19 14:59 ` Greg Kroah-Hartman [this message]
2017-11-19 14:59 ` [PATCH 4.14 02/31] dmaengine: dmatest: warn user when dma test times out Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 03/31] media: imon: Fix null-ptr-deref in imon_probe Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 04/31] media: dib0700: fix invalid dvb_detach argument Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 05/31] crypto: dh - Fix double free of ctx->p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 06/31] crypto: dh - Dont permit p to be 0 Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 07/31] crypto: dh - Dont permit key or g size longer than p Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 08/31] crypto: brcm - Explicity ACK mailbox message Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 09/31] USB: early: Use new USB product ID and strings for DbC device Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 10/31] USB: usbfs: compute urb->actual_length for isochronous Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 11/31] USB: Add delay-init quirk for Corsair K70 LUX keyboards Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 12/31] usb: gadget: f_fs: Fix use-after-free in ffs_free_inst Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 13/31] USB: serial: metro-usb: stop I/O after failed open Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 14/31] USB: serial: Change DbC debug device binding ID Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 15/31] USB: serial: qcserial: add pid/vid for Sierra Wireless EM7355 fw update Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 16/31] USB: serial: garmin_gps: fix I/O after failed probe and remove Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 17/31] USB: serial: garmin_gps: fix memory leak on probe errors Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 18/31] selftests/x86/protection_keys: Fix syscall NR redefinition warnings Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 19/31] x86/MCE/AMD: Always give panic severity for UC errors in kernel context Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 20/31] platform/x86: peaq-wmi: Add DMI check before binding to the WMI interface Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 21/31] platform/x86: peaq_wmi: Fix missing terminating entry for peaq_dmi_table Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 23/31] HID: wacom: generic: Recognize WACOM_HID_WD_PEN as a type of pen collection Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 24/31] rpmsg: glink: Add missing MODULE_LICENSE Greg Kroah-Hartman
2017-11-19 14:59 ` [PATCH 4.14 25/31] staging: wilc1000: Fix bssid buffer offset in Txq Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 26/31] staging: sm750fb: Fix parameter mistake in poke32 Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 27/31] staging: ccree: fix 64 bit scatter/gather DMA ops Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 28/31] staging: greybus: spilib: fix use-after-free after deregistration Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 29/31] staging: vboxvideo: Fix reporting invalid suggested-offset-properties Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 30/31] staging: rtl8188eu: Revert 4 commits breaking ARP Greg Kroah-Hartman
2017-11-19 15:00 ` [PATCH 4.14 31/31] spi: fix use-after-free at controller deregistration Greg Kroah-Hartman
2017-11-20 14:27 ` [PATCH 4.14 00/31] 4.14.1-stable review Guenter Roeck
2017-11-21 15:26   ` Ben Hutchings
2017-11-21 16:35     ` Greg Kroah-Hartman
2017-11-21 16:46       ` Ben Hutchings
2017-11-21 17:09         ` Greg Kroah-Hartman
2017-11-21 19:07           ` Ben Hutchings
2017-11-21 19:38             ` Guenter Roeck
2017-11-22 16:06               ` Ben Hutchings
2017-11-22 17:00                 ` Greg Kroah-Hartman
2017-11-22 18:52                   ` Guenter Roeck
2017-11-20 18:21 ` Guenter Roeck
2017-11-20 19:16   ` Greg Kroah-Hartman
2017-11-20 21:19 ` Shuah Khan
2017-11-21  7:22   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171119145951.202879437@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bp@suse.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).