public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Qiuxu Zhuo <qiuxu.zhuo@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	linux-edac <linux-edac@vger.kernel.org>,
	Borislav Petkov <bp@suse.de>
Subject: [PATCH 4.13 43/44] EDAC, sb_edac: Dont create a second memory controller if HA1 is not present
Date: Thu, 16 Nov 2017 18:43:07 +0100	[thread overview]
Message-ID: <20171116172825.483949271@linuxfoundation.org> (raw)
In-Reply-To: <20171116172823.336649076@linuxfoundation.org>

4.13-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

commit 15cc3ae001873845b5d842e212478a6570c7d938 upstream.

Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/edac/sb_edac.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -455,6 +455,7 @@ static const struct pci_id_table pci_dev
 static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 		/* Processor Home Agent */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0,        0, IMC0) },
+	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 
 		/* Memory controller */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA,     0, IMC0) },
@@ -465,7 +466,6 @@ static const struct pci_id_descr pci_dev
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3,   0, IMC0) },
 
 		/* Optional, mode 2HA */
-	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1,        1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA,     1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS,    1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0,   1, IMC1) },
@@ -2260,6 +2260,13 @@ static int sbridge_get_onedevice(struct
 next_imc:
 	sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
 	if (!sbridge_dev) {
+		/* If the HA1 wasn't found, don't create EDAC second memory controller */
+		if (dev_descr->dom == IMC1 && devno != 1) {
+			edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+				 PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+			pci_dev_put(pdev);
+			return 0;
+		}
 
 		if (dev_descr->dom == SOCK)
 			goto out_imc;

  parent reply	other threads:[~2017-11-16 17:53 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-16 17:42 [PATCH 4.13 00/44] 4.13.14-stable review Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 01/44] ppp: fix race in ppp device destruction Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 02/44] gso: fix payload length when gso_size is zero Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 03/44] ipv4: Fix traffic triggered IPsec connections Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 04/44] ipv6: " Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 05/44] netlink: do not set cb_running if dumps start() errs Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 06/44] net: call cgroup_sk_alloc() earlier in sk_clone_lock() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 07/44] macsec: fix memory leaks when skb_to_sgvec fails Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 08/44] l2tp: check ps->sock before running pppol2tp_session_ioctl() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 09/44] tun: call dev_get_valid_name() before register_netdevice() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 10/44] netlink: fix netlink_ack() extack race Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 11/44] sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 12/44] tcp/dccp: fix ireq->opt races Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 13/44] packet: avoid panic in packet_getsockopt() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 14/44] geneve: Fix function matching VNI and tunnel ID on big-endian Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 15/44] net: bridge: fix returning of vlan range op errors Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 16/44] soreuseport: fix initialization race Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 17/44] ipv6: flowlabel: do not leave opt->tot_len with garbage Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 18/44] sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 19/44] tcp/dccp: fix lockdep splat in inet_csk_route_req() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 21/44] net: dsa: check master device before put Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 22/44] net/unix: dont show information about sockets from other namespaces Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 23/44] tap: double-free in error path in tap_open() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 24/44] net/mlx5: Fix health work queue spin lock to IRQ safe Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 25/44] net/mlx5e: Properly deal with encap flows add/del under neigh update Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 26/44] ipip: only increase err_count for some certain type icmp in ipip_err Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 27/44] ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 28/44] ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 29/44] tcp: refresh tp timestamp before tcp_mtu_probe() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 30/44] tap: reference to KVA of an unloaded module causes kernel panic Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 31/44] tun: allow positive return values on dev_get_valid_name() call Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 32/44] sctp: reset owner sk for data chunks on out queues when migrating a sock Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 33/44] net_sched: avoid matching qdisc with zero handle Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 34/44] l2tp: hold tunnel in pppol2tp_connect() Greg Kroah-Hartman
2017-11-16 17:42 ` [PATCH 4.13 35/44] tun/tap: sanitize TUNSETSNDBUF input Greg Kroah-Hartman
2017-11-16 17:43 ` [PATCH 4.13 36/44] ipv6: addrconf: increment ifp refcount before ipv6_del_addr() Greg Kroah-Hartman
2017-11-16 17:43 ` [PATCH 4.13 37/44] tcp: fix tcp_mtu_probe() vs highest_sack Greg Kroah-Hartman
2017-11-16 17:43 ` [PATCH 4.13 38/44] mac80211: accept key reinstall without changing anything Greg Kroah-Hartman
2017-11-16 17:43 ` [PATCH 4.13 39/44] mac80211: use constant time comparison with keys Greg Kroah-Hartman
2017-11-16 17:43 ` [PATCH 4.13 40/44] mac80211: dont compare TKIP TX MIC key in reinstall prevention Greg Kroah-Hartman
2017-11-16 17:43 ` [PATCH 4.13 41/44] usb: usbtest: fix NULL pointer dereference Greg Kroah-Hartman
2017-11-16 17:43 ` [PATCH 4.13 42/44] Input: ims-psu - check if CDC union descriptor is sane Greg Kroah-Hartman
2017-11-16 17:43 ` Greg Kroah-Hartman [this message]
2017-11-16 17:43 ` [PATCH 4.13 44/44] dmaengine: dmatest: warn user when dma test times out Greg Kroah-Hartman
2017-11-16 22:45 ` [PATCH 4.13 00/44] 4.13.14-stable review Shuah Khan
2017-11-17  8:10   ` Greg Kroah-Hartman
2017-11-17  2:03 ` Guenter Roeck
2017-11-17  8:11   ` Greg Kroah-Hartman
     [not found] ` <5a0e2714.c5badf0a.3983c.7eaa@mx.google.com>
2017-11-17  8:14   ` Greg Kroah-Hartman
2017-11-18  0:41     ` Kevin Hilman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171116172825.483949271@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bp@suse.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qiuxu.zhuo@intel.com \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox