From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Chen Gong <gong.chen@linux.intel.com>,
	Mauro Carvalho Chehab <mchehab@redhat.com>
Subject: [ 03/48] edac: avoid mce decoding crash after edac driver unloaded
Date: Sun, 01 Jul 2012 18:20:09 +0100
Message-ID: <20120701172007.137742223@decadent.org.uk>
In-Reply-To: <20120701172006.535271340@decadent.org.uk>

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chen Gong <gong.chen@linux.intel.com>

commit e35fca4791fcdd43dc1fd769797df40c562ab491 upstream.

Some edac drivers register themselves as mce decoders via a
notifier chain. The current notifier chain implementation does not
support registering the same notifier twice; doing so corrupts the
list when elements are added or removed. For example, on a
SandyBridge platform, removing the sb_edac module and then triggering
an error causes an oops: no mce decoder is registered any more, but
the notifier chain still points to an invalid callback function.
Here is an example:

Call Trace:
 [<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff8102b936>] mce_log+0x46/0x180
 [<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60
 [<ffffffff812e19d2>] ghes_do_proc+0x192/0x210
 [<ffffffff812e2066>] ghes_proc+0x46/0x70
 [<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80
 [<ffffffff8150ef05>] notifier_call_chain+0x55/0x80
 [<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80
 [<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23
 [<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b
 [<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b
 [<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f
 [<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36
 [<ffffffff81069dc2>] process_one_work+0x132/0x450
 [<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0
 [<ffffffff8106ba50>] ? manage_workers+0x120/0x120
 [<ffffffff81070aee>] kthread+0x9e/0xb0
 [<ffffffff81514724>] kernel_thread_helper+0x4/0x10
 [<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81514720>] ? gs_change+0x13/0x13
Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41
0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b
79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41
RIP  [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80
 RSP <ffff88042868fb20>
CR2: ffffffffa01af838
---[ end trace 0100930068e73e6f ]---
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff810705b0>] kthread_data+0x10/0x20
PGD 1a0d067 PUD 1a0e067 PMD 0
Oops: 0000 [#2] SMP

Only i7core_edac and sb_edac are affected, because they handle more
than one memory controller and therefore register the mce decoder
multiple times.
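
The list corruption described above can be illustrated with a rough
user-space sketch. It is not the kernel code: the struct layout and the
helpers chain_register(), chain_unregister(), stale_after_per_controller()
and stale_after_per_module() are names made up for this illustration,
and locking and RCU are left out. It assumes a singly linked chain kept
in priority order, with no check for a block that is already on the list.

```c
/* Simplified user-space model of a notifier chain (illustration only). */
#include <assert.h>
#include <stddef.h>

struct notifier_block {
	int (*notifier_call)(void);
	struct notifier_block *next;
	int priority;
};

/* Insert n in priority order; nothing checks whether n is already listed. */
static void chain_register(struct notifier_block **nl, struct notifier_block *n)
{
	while (*nl != NULL) {
		if (n->priority > (*nl)->priority)
			break;
		nl = &(*nl)->next;
	}
	n->next = *nl;
	*nl = n;
}

/* Unlink the first occurrence of n; returns 0 on success, -1 if absent. */
static int chain_unregister(struct notifier_block **nl, struct notifier_block *n)
{
	while (*nl != NULL) {
		if (*nl == n) {
			*nl = n->next;
			return 0;
		}
		nl = &(*nl)->next;
	}
	return -1;
}

static int decode(void) { return 0; }

/* Old pattern: one shared block registered for each of two controllers. */
static int stale_after_per_controller(void)
{
	struct notifier_block *chain = NULL;
	struct notifier_block dec = { decode, NULL, 0 };

	chain_register(&chain, &dec);
	chain_register(&chain, &dec);	/* now dec.next == &dec (self-loop) */
	/* A third chain_register() would walk the loop forever in this model. */

	chain_unregister(&chain, &dec);	/* chain = dec.next, i.e. &dec again */
	chain_unregister(&chain, &dec);	/* same result: dec never leaves */
	return chain == &dec;		/* nonzero: stale entry left behind */
}

/* New pattern: register once at module init, unregister once at exit. */
static int stale_after_per_module(void)
{
	struct notifier_block *chain = NULL;
	struct notifier_block dec = { decode, NULL, 0 };

	chain_register(&chain, &dec);
	chain_unregister(&chain, &dec);
	return chain == &dec;		/* zero: chain is empty again */
}

/* Exercise both patterns; call this from main() to run the model. */
void self_check(void)
{
	assert(stale_after_per_controller() != 0);
	assert(stale_after_per_module() == 0);
}
```

In the model, the second registration makes the block point at itself,
so every later unregister just writes the same pointer back into the
chain head. Once the module is unloaded, that leftover entry would refer
to memory that is gone, which is consistent with the oops above.
Registering once per module, as this patch does, avoids the self-loop.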

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[bwh: Backported to 3.2: drivers call atomic_notifier_chain_{,un}register()
 directly]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1932,12 +1932,6 @@
 	if (mce->bank != 8)
 		return NOTIFY_DONE;
 
-#ifdef CONFIG_SMP
-	/* Only handle if it is the right mc controller */
-	if (mce->socketid != pvt->i7core_dev->socket)
-		return NOTIFY_DONE;
-#endif
-
 	smp_rmb();
 	if ((pvt->mce_out + 1) % MCE_LOG_LEN == pvt->mce_in) {
 		smp_wmb();
@@ -2234,8 +2228,6 @@
 	if (pvt->enable_scrub)
 		disable_sdram_scrub_setting(mci);
 
-	atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &i7_mce_dec);
-
 	/* Disable EDAC polling */
 	i7core_pci_ctl_release(pvt);
 
@@ -2336,8 +2328,6 @@
 	/* DCLK for scrub rate setting */
 	pvt->dclk_freq = get_dclk_freq();
 
-	atomic_notifier_chain_register(&x86_mce_decoder_chain, &i7_mce_dec);
-
 	return 0;
 
 fail0:
@@ -2481,8 +2471,10 @@
 
 	pci_rc = pci_register_driver(&i7core_driver);
 
-	if (pci_rc >= 0)
+	if (pci_rc >= 0) {
+		atomic_notifier_chain_register(&x86_mce_decoder_chain, &i7_mce_dec);
 		return 0;
+	}
 
 	i7core_printk(KERN_ERR, "Failed to register device with error %d.\n",
 		      pci_rc);
@@ -2498,6 +2490,7 @@
 {
 	debugf2("MC: " __FILE__ ": %s()\n", __func__);
 	pci_unregister_driver(&i7core_driver);
+	atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &i7_mce_dec);
 }
 
 module_init(i7core_init);
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -1661,9 +1661,6 @@
 	debugf0("MC: " __FILE__ ": %s(): mci = %p, dev = %p\n",
 		__func__, mci, &sbridge_dev->pdev[0]->dev);
 
-	atomic_notifier_chain_unregister(&x86_mce_decoder_chain,
-					 &sbridge_mce_dec);
-
 	/* Remove MC sysfs nodes */
 	edac_mc_del_mc(mci->dev);
 
@@ -1731,8 +1728,6 @@
 		goto fail0;
 	}
 
-	atomic_notifier_chain_register(&x86_mce_decoder_chain,
-				       &sbridge_mce_dec);
 	return 0;
 
 fail0:
@@ -1861,8 +1856,10 @@
 
 	pci_rc = pci_register_driver(&sbridge_driver);
 
-	if (pci_rc >= 0)
+	if (pci_rc >= 0) {
+		atomic_notifier_chain_register(&x86_mce_decoder_chain, &sbridge_mce_dec);
 		return 0;
+	}
 
 	sbridge_printk(KERN_ERR, "Failed to register device with error %d.\n",
 		      pci_rc);
@@ -1878,6 +1875,7 @@
 {
 	debugf2("MC: " __FILE__ ": %s()\n", __func__);
 	pci_unregister_driver(&sbridge_driver);
+	atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &sbridge_mce_dec);
 }
 
 module_init(sbridge_init);


