From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Chen Gong <gong.chen@linux.intel.com>,
Mauro Carvalho Chehab <mchehab@redhat.com>
Subject: [ 03/48] edac: avoid mce decoding crash after edac driver unloaded
Date: Sun, 01 Jul 2012 18:20:09 +0100 [thread overview]
Message-ID: <20120701172007.137742223@decadent.org.uk> (raw)
In-Reply-To: <20120701172006.535271340@decadent.org.uk>
3.2-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Gong <gong.chen@linux.intel.com>
commit e35fca4791fcdd43dc1fd769797df40c562ab491 upstream.
Some edac drivers register themselves as mce decoders via
notifier_chain. But in current notifier_chain implementation logic,
it doesn't accept same notifier registered twice. If so, it will be
wrong when adding/removing the element from the list. For example,
on one SandyBridge platform, remove module sb_edac and then trigger
one error, it will hit oops because it has no mce decoder registered
but related notifier_chain still points to an invalid callback
function. Here is an example:
Call Trace:
[<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20
[<ffffffff8102b936>] mce_log+0x46/0x180
[<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60
[<ffffffff812e19d2>] ghes_do_proc+0x192/0x210
[<ffffffff812e2066>] ghes_proc+0x46/0x70
[<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80
[<ffffffff8150ef05>] notifier_call_chain+0x55/0x80
[<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80
[<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23
[<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b
[<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b
[<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f
[<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36
[<ffffffff81069dc2>] process_one_work+0x132/0x450
[<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0
[<ffffffff8106ba50>] ? manage_workers+0x120/0x120
[<ffffffff81070aee>] kthread+0x9e/0xb0
[<ffffffff81514724>] kernel_thread_helper+0x4/0x10
[<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff81514720>] ? gs_change+0x13/0x13
Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41
0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b
79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41
RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80
RSP <ffff88042868fb20>
CR2: ffffffffa01af838
---[ end trace 0100930068e73e6f ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff810705b0>] kthread_data+0x10/0x20
PGD 1a0d067 PUD 1a0e067 PMD 0
Oops: 0000 [#2] SMP
Only i7core_edac and sb_edac have such issues because they have more
than one memory controller which means they have to register mce
decoder many times.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[bwh: Backported to 3.2: drivers call atomic_notifier_chain_{,un}register()
directly]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1932,12 +1932,6 @@
if (mce->bank != 8)
return NOTIFY_DONE;
-#ifdef CONFIG_SMP
- /* Only handle if it is the right mc controller */
- if (mce->socketid != pvt->i7core_dev->socket)
- return NOTIFY_DONE;
-#endif
-
smp_rmb();
if ((pvt->mce_out + 1) % MCE_LOG_LEN == pvt->mce_in) {
smp_wmb();
@@ -2234,8 +2228,6 @@
if (pvt->enable_scrub)
disable_sdram_scrub_setting(mci);
- atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &i7_mce_dec);
-
/* Disable EDAC polling */
i7core_pci_ctl_release(pvt);
@@ -2336,8 +2328,6 @@
/* DCLK for scrub rate setting */
pvt->dclk_freq = get_dclk_freq();
- atomic_notifier_chain_register(&x86_mce_decoder_chain, &i7_mce_dec);
-
return 0;
fail0:
@@ -2481,8 +2471,10 @@
pci_rc = pci_register_driver(&i7core_driver);
- if (pci_rc >= 0)
+ if (pci_rc >= 0) {
+ atomic_notifier_chain_register(&x86_mce_decoder_chain, &i7_mce_dec);
return 0;
+ }
i7core_printk(KERN_ERR, "Failed to register device with error %d.\n",
pci_rc);
@@ -2498,6 +2490,7 @@
{
debugf2("MC: " __FILE__ ": %s()\n", __func__);
pci_unregister_driver(&i7core_driver);
+ atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &i7_mce_dec);
}
module_init(i7core_init);
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -1661,9 +1661,6 @@
debugf0("MC: " __FILE__ ": %s(): mci = %p, dev = %p\n",
__func__, mci, &sbridge_dev->pdev[0]->dev);
- atomic_notifier_chain_unregister(&x86_mce_decoder_chain,
- &sbridge_mce_dec);
-
/* Remove MC sysfs nodes */
edac_mc_del_mc(mci->dev);
@@ -1731,8 +1728,6 @@
goto fail0;
}
- atomic_notifier_chain_register(&x86_mce_decoder_chain,
- &sbridge_mce_dec);
return 0;
fail0:
@@ -1861,8 +1856,10 @@
pci_rc = pci_register_driver(&sbridge_driver);
- if (pci_rc >= 0)
+ if (pci_rc >= 0) {
+ atomic_notifier_chain_register(&x86_mce_decoder_chain, &sbridge_mce_dec);
return 0;
+ }
sbridge_printk(KERN_ERR, "Failed to register device with error %d.\n",
pci_rc);
@@ -1878,6 +1875,7 @@
{
debugf2("MC: " __FILE__ ": %s()\n", __func__);
pci_unregister_driver(&sbridge_driver);
+ atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &sbridge_mce_dec);
}
module_init(sbridge_init);
next prev parent reply other threads:[~2012-07-01 18:16 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-01 17:20 [ 00/48] 3.2.22-stable review Ben Hutchings
2012-07-01 17:20 ` [ 01/48] staging:iio:ad7606: Re-add missing scale attribute Ben Hutchings
2012-07-01 17:20 ` [ 02/48] Tools: hv: verify origin of netlink connector message Ben Hutchings
2012-07-01 17:20 ` Ben Hutchings [this message]
2012-07-01 17:20 ` [ 04/48] hwrng: atmel-rng - fix data valid check Ben Hutchings
2012-07-01 17:20 ` [ 05/48] staging: r8712u: Add new USB IDs Ben Hutchings
2012-07-01 17:20 ` [ 06/48] hwmon: (applesmc) Limit key length in warning messages Ben Hutchings
2012-07-01 17:20 ` [ 07/48] mm: fix slab->page _count corruption when using slub Ben Hutchings
2012-07-02 23:46 ` Herton Ronaldo Krzesinski
2012-07-02 23:56 ` Herton Ronaldo Krzesinski
2012-07-03 1:17 ` Herton Ronaldo Krzesinski
2012-07-03 20:19 ` Pravin Shelar
2012-07-04 4:36 ` Ben Hutchings
2012-07-01 17:20 ` [ 08/48] mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition Ben Hutchings
2012-07-01 17:20 ` [ 09/48] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE Ben Hutchings
2012-07-01 17:20 ` [ 10/48] nilfs2: ensure proper cache clearing for gc-inodes Ben Hutchings
2012-07-01 17:20 ` [ 11/48] mm: correctly synchronize rss-counters at exit/exec Ben Hutchings
2012-07-01 19:02 ` Hugh Dickins
2012-07-02 2:01 ` Ben Hutchings
2012-07-02 16:46 ` Oleg Nesterov
2012-07-04 4:31 ` Ben Hutchings
2012-07-01 17:20 ` [ 12/48] drm/i915: Finish any pending operations on the framebuffer before disabling Ben Hutchings
2012-07-01 17:20 ` [ 13/48] drm/i915: Remove use of the autoreported ringbuffer HEAD position Ben Hutchings
2012-07-01 17:20 ` [ 14/48] e1000e: Disable ASPM L1 on 82574 Ben Hutchings
2012-07-01 17:20 ` [ 15/48] e1000e: Remove special case for 82573/82574 ASPM L1 disablement Ben Hutchings
2012-07-01 22:59 ` Jonathan Nieder
2012-07-02 7:21 ` Chris Boot
2012-07-01 17:20 ` [ 16/48] drm/i915: Do the fallback non-IRQ wait in ring throttle, too Ben Hutchings
2012-07-01 17:20 ` [ 17/48] staging:rts_pstor:Fix possible panic by NULL pointer dereference Ben Hutchings
2012-07-01 17:20 ` [ 18/48] [media] gspca-core: Fix buffers staying in queued state after a stream_off Ben Hutchings
2012-07-01 17:20 ` [ 19/48] [media] smsusb: add autodetection support for USB ID 2040:f5a0 Ben Hutchings
2012-07-01 17:20 ` [ 20/48] drm/edid: dont return stack garbage from supports_rb Ben Hutchings
2012-07-01 17:20 ` [ 21/48] drm/nouveau/fbcon: using nv_two_heads is not a good idea Ben Hutchings
2012-07-01 17:20 ` [ 22/48] dm thin: reinstate missing mempool_free in cell_release_singleton Ben Hutchings
2012-07-01 17:20 ` [ 23/48] ath9k: Fix a WARNING on suspend/resume with IBSS Ben Hutchings
2012-07-01 17:20 ` [ 24/48] cfg80211: fix potential deadlock in regulatory Ben Hutchings
2012-07-01 17:20 ` [ 25/48] ath9k: Fix softlockup in AR9485 Ben Hutchings
2012-07-01 17:20 ` [ 26/48] can: c_can: precedence error in c_can_chip_config() Ben Hutchings
2012-07-01 17:20 ` [ 27/48] ath9k: fix a tx rate duration calculation bug Ben Hutchings
2012-07-01 17:20 ` [ 28/48] batman-adv: fix skb->data assignment Ben Hutchings
2012-07-01 17:20 ` [ 29/48] ARM: SAMSUNG: Should check for IS_ERR(clk) instead of NULL Ben Hutchings
2012-07-01 17:20 ` [ 30/48] ath9k_hw: avoid possible infinite loop in ar9003_get_pll_sqsum_dvc Ben Hutchings
2012-07-01 17:20 ` [ 31/48] iwlwifi: remove log_event debugfs file debugging is disabled Ben Hutchings
2012-07-01 17:20 ` [ 32/48] ARM: SAMSUNG: Fix for S3C2412 EBI memory mapping Ben Hutchings
2012-07-01 17:20 ` [ 33/48] USB: option: add id for Cellient MEN-200 Ben Hutchings
2012-07-01 17:20 ` [ 34/48] oprofile: perf: use NR_CPUS instead or nr_cpumask_bits for static array Ben Hutchings
2012-07-01 17:20 ` [ 35/48] drm/i915: Refactor the deferred PM_IIR handling into a single function Ben Hutchings
2012-07-01 17:20 ` [ 36/48] drm/i915: rip out the PM_IIR WARN Ben Hutchings
2012-07-01 17:20 ` [ 37/48] drm/i915: Fix eDP blank screen after S3 resume on HP desktops Ben Hutchings
2012-07-01 17:20 ` [ 38/48] SCSI & usb-storage: add try_rc_10_first flag Ben Hutchings
2012-07-02 7:10 ` Hans de Goede
2012-07-02 18:52 ` Linus Torvalds
2012-07-02 20:39 ` James Bottomley
2012-07-02 22:23 ` Linus Torvalds
2012-07-03 0:41 ` Matthew Wilcox
2012-07-03 6:18 ` James Bottomley
2012-07-03 15:49 ` Alan Stern
2012-07-03 17:32 ` Matthew Wilcox
2012-07-03 19:50 ` Alan Stern
2012-07-03 20:07 ` James Bottomley
2012-07-03 20:25 ` Alan Stern
2012-07-03 20:35 ` Matthew Wilcox
2012-07-05 21:40 ` Alan Stern
2012-07-06 3:05 ` Matthew Wilcox
2012-07-06 14:00 ` Alan Stern
2012-07-04 4:39 ` Ben Hutchings
2012-07-01 17:20 ` [ 39/48] PM / Sleep: Prevent waiting forever on asynchronous suspend after abort Ben Hutchings
2012-07-01 17:20 ` [ 40/48] x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM Ben Hutchings
2012-07-01 17:20 ` [ 41/48] stable: Allow merging of backports for serious user-visible performance issues Ben Hutchings
2012-07-01 17:20 ` [ 42/48] ALSA: hda - Add Realtek ALC280 codec support Ben Hutchings
2012-07-01 17:20 ` [ 43/48] USB: option: Add USB ID for Novatel Ovation MC551 Ben Hutchings
2012-07-01 17:20 ` [ 44/48] USB: CP210x Add 10 Device IDs Ben Hutchings
2012-07-01 17:20 ` [ 45/48] xen/netfront: teardown the device before unregistering it Ben Hutchings
2012-07-01 17:20 ` [ 46/48] can: flexcan: use be32_to_cpup to handle the value of dt entry Ben Hutchings
2012-07-01 17:20 ` [ 47/48] acpi_pad: fix power_saving thread deadlock Ben Hutchings
2012-07-01 17:20 ` [ 48/48] batman-adv: only drop packets of known wifi clients Ben Hutchings
2012-07-01 19:11 ` [ 00/48] 3.2.22-stable review Ben Hutchings
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120701172007.137742223@decadent.org.uk \
--to=ben@decadent.org.uk \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=gong.chen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab@redhat.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox