From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Robin Holt" <holt@sgi.com>,
	"Xiao Guangrong" <xiaoguangrong@linux.vnet.ibm.com>
Subject: [61/94] mm: mmu_notifier: re-fix freed page still mapped in secondary MMU
Date: Tue, 28 May 2013 04:49:53 +0100
Message-ID: <lsq.1369712993.897650264@decadent.org.uk>
In-Reply-To: <lsq.1369712992.755341692@decadent.org.uk>

3.2.46-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

commit d34883d4e35c0a994e91dd847a82b4c9e0c31d83 upstream.

Commit 751efd8610d3 ("mmu_notifier_unregister NULL Pointer deref and
multiple ->release()") breaks the fix 3ad3d901bbcf ("mm: mmu_notifier:
fix freed page still mapped in secondary MMU").

Since hlist_for_each_entry_rcu() has changed in the meantime, we cannot
revert that patch directly, so this patch reverts the commit by hand and
simply fixes the bug spotted by that patch.

The bug spotted by commit 751efd8610d3 is:

    There is a race condition between mmu_notifier_unregister() and
    __mmu_notifier_release().

    Assume two tasks, one calling mmu_notifier_unregister() as a result
    of a filp_close() ->flush() callout (task A), and the other calling
    mmu_notifier_release() from an mmput() (task B).

                        A                               B
    t1                                            srcu_read_lock()
    t2            if (!hlist_unhashed())
    t3                                            srcu_read_unlock()
    t4            srcu_read_lock()
    t5                                            hlist_del_init_rcu()
    t6                                            synchronize_srcu()
    t7            srcu_read_unlock()
    t8            hlist_del_rcu()  <--- NULL pointer deref.

This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu.
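
To make the difference concrete, here is a minimal userspace sketch of
the two unlink steps in the timeline above (t5 in task B, t8 in task A).
It is not kernel code: struct hnode, struct hhead, add_head(),
unlink_node(), del_init() and replay_race() are hypothetical stand-ins
that only model the pprev handling of struct hlist_node, with no RCU or
locking. The point it illustrates is that an "init" style delete checks
for an already-unhashed node and becomes a no-op, whereas a plain unlink
would write through the NULL pprev left behind by the first delete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct hlist_node / hlist_head. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

/* A node is "unhashed" when it is not linked into any list. */
static int unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

static void add_head(struct hnode *n, struct hhead *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Plain unlink: writes through pprev, so it faults if pprev is NULL. */
static void unlink_node(struct hnode *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/*
 * "init" style delete: skip nodes that are already unhashed and leave
 * the node unhashed afterwards.  Returns 1 if it unlinked the node.
 */
static int del_init(struct hnode *n)
{
	if (unhashed(n))
		return 0;
	unlink_node(n);
	n->pprev = NULL;
	return 1;
}

/* Replays t5 and t8 sequentially; returns how many unlinks happened. */
int replay_race(void)
{
	struct hhead list = { NULL };
	struct hnode mn;
	int unlinked = 0;

	add_head(&mn, &list);
	unlinked += del_init(&mn);	/* task B, t5 */
	unlinked += del_init(&mn);	/* task A, t8: no-op, no deref */

	assert(unhashed(&mn));
	assert(list.first == NULL);
	return unlinked;
}
```

Calling replay_race() returns 1: only the first delete touches the
list. Replacing the second del_init() with unlink_node() reproduces the
NULL pointer dereference in this model.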

The other issue spotted in that commit is "multiple ->release()
callouts". We need not worry about it too much because it is really rare
(e.g., it cannot happen on kvm since the mmu notifier is unregistered
after exit_mmap()), and any later ->release call should be fast since
all the pages have already been released by the first call.
Anyway, this issue should be fixed in a separate patch.

-stable suggestions: This fix needs to be backported to any version that
has commit 751efd8610d3.  The oldest version I found with this commit is
3.0-stable.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: hlist_for_each_entry_rcu() still requires the
 struct hlist_node pointer]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 mm/mmu_notifier.c |   79 ++++++++++++++++++++++++++---------------------------
 1 file changed, 39 insertions(+), 40 deletions(-)

--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -37,51 +37,48 @@ static struct srcu_struct srcu;
 void __mmu_notifier_release(struct mm_struct *mm)
 {
 	struct mmu_notifier *mn;
+	struct hlist_node *n;
 	int id;
 
 	/*
-	 * srcu_read_lock() here will block synchronize_srcu() in
-	 * mmu_notifier_unregister() until all registered
-	 * ->release() callouts this function makes have
-	 * returned.
+	 * SRCU here will block mmu_notifier_unregister until
+	 * ->release returns.
 	 */
 	id = srcu_read_lock(&srcu);
+	hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist)
+		/*
+		 * If ->release runs before mmu_notifier_unregister it must be
+		 * handled, as it's the only way for the driver to flush all
+		 * existing sptes and stop the driver from establishing any more
+		 * sptes before all the pages in the mm are freed.
+		 */
+		if (mn->ops->release)
+			mn->ops->release(mn, mm);
+	srcu_read_unlock(&srcu, id);
+
 	spin_lock(&mm->mmu_notifier_mm->lock);
 	while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
 		mn = hlist_entry(mm->mmu_notifier_mm->list.first,
 				 struct mmu_notifier,
 				 hlist);
-
 		/*
-		 * Unlink.  This will prevent mmu_notifier_unregister()
-		 * from also making the ->release() callout.
+		 * We arrived before mmu_notifier_unregister so
+		 * mmu_notifier_unregister will do nothing other than to wait
+		 * for ->release to finish and for mmu_notifier_unregister to
+		 * return.
 		 */
 		hlist_del_init_rcu(&mn->hlist);
-		spin_unlock(&mm->mmu_notifier_mm->lock);
-
-		/*
-		 * Clear sptes. (see 'release' description in mmu_notifier.h)
-		 */
-		if (mn->ops->release)
-			mn->ops->release(mn, mm);
-
-		spin_lock(&mm->mmu_notifier_mm->lock);
 	}
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 
 	/*
-	 * All callouts to ->release() which we have done are complete.
-	 * Allow synchronize_srcu() in mmu_notifier_unregister() to complete
-	 */
-	srcu_read_unlock(&srcu, id);
-
-	/*
-	 * mmu_notifier_unregister() may have unlinked a notifier and may
-	 * still be calling out to it.	Additionally, other notifiers
-	 * may have been active via vmtruncate() et. al. Block here
-	 * to ensure that all notifier callouts for this mm have been
-	 * completed and the sptes are really cleaned up before returning
-	 * to exit_mmap().
+	 * synchronize_srcu here prevents mmu_notifier_release from returning to
+	 * exit_mmap (which would proceed with freeing all pages in the mm)
+	 * until the ->release method returns, if it was invoked by
+	 * mmu_notifier_unregister.
+	 *
+	 * The mmu_notifier_mm can't go away from under us because one mm_count
+	 * is held by exit_mmap.
 	 */
 	synchronize_srcu(&srcu);
 }
@@ -302,31 +299,34 @@ void mmu_notifier_unregister(struct mmu_
 {
 	BUG_ON(atomic_read(&mm->mm_count) <= 0);
 
-	spin_lock(&mm->mmu_notifier_mm->lock);
 	if (!hlist_unhashed(&mn->hlist)) {
+		/*
+		 * SRCU here will force exit_mmap to wait for ->release to
+		 * finish before freeing the pages.
+		 */
 		int id;
 
+		id = srcu_read_lock(&srcu);
 		/*
-		 * Ensure we synchronize up with __mmu_notifier_release().
+		 * exit_mmap will block in mmu_notifier_release to guarantee
+		 * that ->release is called before freeing the pages.
 		 */
-		id = srcu_read_lock(&srcu);
-
-		hlist_del_rcu(&mn->hlist);
-		spin_unlock(&mm->mmu_notifier_mm->lock);
-
 		if (mn->ops->release)
 			mn->ops->release(mn, mm);
+		srcu_read_unlock(&srcu, id);
 
+		spin_lock(&mm->mmu_notifier_mm->lock);
 		/*
-		 * Allow __mmu_notifier_release() to complete.
+		 * Can not use list_del_rcu() since __mmu_notifier_release
+		 * can delete it before we hold the lock.
 		 */
-		srcu_read_unlock(&srcu, id);
-	} else
+		hlist_del_init_rcu(&mn->hlist);
 		spin_unlock(&mm->mmu_notifier_mm->lock);
+	}
 
 	/*
-	 * Wait for any running method to finish, including ->release() if it
-	 * was run by __mmu_notifier_release() instead of us.
+	 * Wait for any running method to finish, of course including
+	 * ->release if it was run by mmu_notifier_relase instead of us.
 	 */
 	synchronize_srcu(&srcu);
 

