From: Roland Dreier <roland@kernel.org>
To: Mauro Carvalho Chehab <mchehab@redhat.com>,
Doug Thompson <dougthompson@xmission.com>
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] sb_edac: Only register mce_decode_chain once
Date: Thu, 31 May 2012 11:19:51 -0700
Message-ID: <1338488391-20755-1-git-send-email-roland@kernel.org>

From: Roland Dreier <roland@purestorage.com>

I was lucky enough to get a 4-socket Sandy Bridge system.
Unfortunately, it hangs on boot when loading the sb_edac module, with
the NMI watchdog giving the following trace:

EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC MC1: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#1': DEV 0000:7f:0e.0
EDAC MC2: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#2': DEV 0000:bf:0e.0
------------[ cut here ]------------
WARNING: at /home/roland/linux-2.6/kernel/watchdog.c:242 watchdog_overflow_callback+0x9a/0xc0()
Hardware name:
Watchdog detected hard LOCKUP on cpu 11
Modules linked in: sb_edac(+) edac_core kvm_intel coretemp kvm mei acpi_pad joydev ghash_clmulni_intel hid_generic aesni_intel cryptd aes_x86_64 acpi_power_meter lpc_ich shpchp microcode usbhid hid ses enclosure bnx2x libcrc32c megaraid_sas mdio
Pid: 2408, comm: modprobe Tainted: G W 3.4.0+ #1
Call Trace:
<NMI> [<ffffffff810515ff>] warn_slowpath_common+0x7f/0xc0
[<ffffffff810516f6>] warn_slowpath_fmt+0x46/0x50
[<ffffffff810db3da>] watchdog_overflow_callback+0x9a/0xc0
[<ffffffff81115ffc>] __perf_event_overflow+0x9c/0x220
[<ffffffff81024faa>] ? x86_perf_event_set_period+0xda/0x150
[<ffffffff81116ad4>] perf_event_overflow+0x14/0x20
[<ffffffff8102a190>] intel_pmu_handle_irq+0x180/0x300
[<ffffffff813a0ed0>] ? ghes_read_estatus+0x90/0x180
[<ffffffff816550d1>] perf_event_nmi_handler+0x21/0x30
[<ffffffff81654851>] nmi_handle.isra.0+0x51/0x80
[<ffffffff81654a21>] do_nmi+0x1a1/0x380
[<ffffffff81653e7c>] end_repeat_nmi+0x1a/0x1e
[<ffffffff8107aa05>] ? atomic_notifier_chain_register+0x35/0x60
[<ffffffff8107aa05>] ? atomic_notifier_chain_register+0x35/0x60
[<ffffffff8107aa05>] ? atomic_notifier_chain_register+0x35/0x60
<<EOE>> [<ffffffff8102b5dd>] mce_register_decode_chain+0x2d/0x120
[<ffffffffa00fdbdb>] sbridge_probe+0xa86/0xbab [sb_edac]
[<ffffffff811eaf05>] ? sysfs_link_sibling+0xa5/0xe0
[<ffffffff81339c1c>] local_pci_probe+0x5c/0xd0
[<ffffffff8133b551>] pci_device_probe+0x101/0x120
[<ffffffff813fbf3e>] driver_probe_device+0x7e/0x220
[<ffffffff813fc18b>] __driver_attach+0xab/0xb0
[<ffffffff813fc0e0>] ? driver_probe_device+0x220/0x220
[<ffffffff813fa376>] bus_for_each_dev+0x56/0x90
[<ffffffff813fba5e>] driver_attach+0x1e/0x20
[<ffffffff813fb610>] bus_add_driver+0x1a0/0x270
[<ffffffffa000a000>] ? 0xffffffffa0009fff
[<ffffffffa000a000>] ? 0xffffffffa0009fff
[<ffffffff813fc6e6>] driver_register+0x76/0x130
[<ffffffffa000a000>] ? 0xffffffffa0009fff
[<ffffffff8133b225>] __pci_register_driver+0x55/0xd0
[<ffffffffa000a033>] sbridge_init+0x33/0x1000 [sb_edac]
[<ffffffff8100203f>] do_one_initcall+0x3f/0x170
[<ffffffff810b2c0e>] sys_init_module+0xbe/0x230
[<ffffffff8165b7a9>] system_call_fastpath+0x16/0x1b
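---[ end trace a7919e7f17c0a727 ]---

The problem is that the system has multiple memory controllers, but
sbridge_register_mci() registers the same static notifier_block
(sbridge_mce_dec) once per controller. Registering a notifier_block that
is already on the chain corrupts the chain's list (in the simple case,
the block's next pointer ends up pointing at itself), so the list walk
in a later atomic_notifier_chain_register() call never terminates. Fix
this by moving the registration and unregistration to the module init
and exit functions.
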
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
---
drivers/edac/sb_edac.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 4adaf4b..a21ace0 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -1604,8 +1604,6 @@ static void sbridge_unregister_mci(struct sbridge_dev *sbridge_dev)
debugf0("MC: " __FILE__ ": %s(): mci = %p, dev = %p\n",
__func__, mci, &sbridge_dev->pdev[0]->dev);
- mce_unregister_decode_chain(&sbridge_mce_dec);
-
/* Remove MC sysfs nodes */
edac_mc_del_mc(mci->dev);
@@ -1682,7 +1680,6 @@ static int sbridge_register_mci(struct sbridge_dev *sbridge_dev)
goto fail0;
}
- mce_register_decode_chain(&sbridge_mce_dec);
return 0;
fail0:
@@ -1811,8 +1808,10 @@ static int __init sbridge_init(void)
pci_rc = pci_register_driver(&sbridge_driver);
- if (pci_rc >= 0)
+ if (pci_rc >= 0) {
+ mce_register_decode_chain(&sbridge_mce_dec);
return 0;
+ }
sbridge_printk(KERN_ERR, "Failed to register device with error %d.\n",
pci_rc);
@@ -1828,6 +1827,7 @@ static void __exit sbridge_exit(void)
{
debugf2("MC: " __FILE__ ": %s()\n", __func__);
pci_unregister_driver(&sbridge_driver);
+ mce_unregister_decode_chain(&sbridge_mce_dec);
}
module_init(sbridge_init);
--
1.7.9.5

Thread overview: 5+ messages
2012-05-31 18:19 Roland Dreier [this message]
2012-06-01 2:37 ` [PATCH] sb_edac: Only register mce_decode_chain once Chen Gong
2012-06-01 6:04 ` Roland Dreier
2012-06-01 7:16 ` Chen Gong
2012-06-11 17:23 ` Mauro Carvalho Chehab