public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>,
	Robin Holt <holt@sgi.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 24/44] mm: mmu_notifier: re-fix freed page still mapped in secondary MMU
Date: Wed,  5 Jun 2013 14:12:22 -0700	[thread overview]
Message-ID: <20130605211224.406437292@linuxfoundation.org> (raw)
In-Reply-To: <20130605211221.858177087@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>

commit d34883d4e35c0a994e91dd847a82b4c9e0c31d83 upstream.

Commit 751efd8610d3 ("mmu_notifier_unregister NULL Pointer deref and
multiple ->release()") breaks the fix 3ad3d901bbcf ("mm: mmu_notifier:
fix freed page still mapped in secondary MMU").

Since hlist_for_each_entry_rcu() is changed now, we can not revert that
patch directly, so this patch reverts the commit and simply fix the bug
spotted by that patch

This bug spotted by commit 751efd8610d3 is:

    There is a race condition between mmu_notifier_unregister() and
    __mmu_notifier_release().

    Assume two tasks, one calling mmu_notifier_unregister() as a result
    of a filp_close() ->flush() callout (task A), and the other calling
    mmu_notifier_release() from an mmput() (task B).

                        A                               B
    t1                                            srcu_read_lock()
    t2            if (!hlist_unhashed())
    t3                                            srcu_read_unlock()
    t4            srcu_read_lock()
    t5                                            hlist_del_init_rcu()
    t6                                            synchronize_srcu()
    t7            srcu_read_unlock()
    t8            hlist_del_rcu()  <--- NULL pointer deref.

This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu.

The another issue spotted in the commit is "multiple ->release()
callouts", we needn't care it too much because it is really rare (e.g,
can not happen on kvm since mmu-notify is unregistered after
exit_mmap()) and the later call of multiple ->release should be fast
since all the pages have already been released by the first call.
Anyway, this issue should be fixed in a separate patch.

-stable suggestions: Any version that has commit 751efd8610d3 need to be
backported.  I find the oldest version has this commit is 3.0-stable.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/mmu_notifier.c |   80 +++++++++++++++++++++++++++---------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -37,51 +37,48 @@ static struct srcu_struct srcu;
 void __mmu_notifier_release(struct mm_struct *mm)
 {
 	struct mmu_notifier *mn;
+	struct hlist_node *node;
 	int id;
 
 	/*
-	 * srcu_read_lock() here will block synchronize_srcu() in
-	 * mmu_notifier_unregister() until all registered
-	 * ->release() callouts this function makes have
-	 * returned.
+	 * SRCU here will block mmu_notifier_unregister until
+	 * ->release returns.
 	 */
 	id = srcu_read_lock(&srcu);
+	hlist_for_each_entry_rcu(mn, node, &mm->mmu_notifier_mm->list, hlist)
+		/*
+		 * If ->release runs before mmu_notifier_unregister it must be
+		 * handled, as it's the only way for the driver to flush all
+		 * existing sptes and stop the driver from establishing any more
+		 * sptes before all the pages in the mm are freed.
+		 */
+		if (mn->ops->release)
+			mn->ops->release(mn, mm);
+	srcu_read_unlock(&srcu, id);
+
 	spin_lock(&mm->mmu_notifier_mm->lock);
 	while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) {
 		mn = hlist_entry(mm->mmu_notifier_mm->list.first,
 				 struct mmu_notifier,
 				 hlist);
-
 		/*
-		 * Unlink.  This will prevent mmu_notifier_unregister()
-		 * from also making the ->release() callout.
+		 * We arrived before mmu_notifier_unregister so
+		 * mmu_notifier_unregister will do nothing other than to wait
+		 * for ->release to finish and for mmu_notifier_unregister to
+		 * return.
 		 */
 		hlist_del_init_rcu(&mn->hlist);
-		spin_unlock(&mm->mmu_notifier_mm->lock);
-
-		/*
-		 * Clear sptes. (see 'release' description in mmu_notifier.h)
-		 */
-		if (mn->ops->release)
-			mn->ops->release(mn, mm);
-
-		spin_lock(&mm->mmu_notifier_mm->lock);
 	}
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 
 	/*
-	 * All callouts to ->release() which we have done are complete.
-	 * Allow synchronize_srcu() in mmu_notifier_unregister() to complete
-	 */
-	srcu_read_unlock(&srcu, id);
-
-	/*
-	 * mmu_notifier_unregister() may have unlinked a notifier and may
-	 * still be calling out to it.	Additionally, other notifiers
-	 * may have been active via vmtruncate() et. al. Block here
-	 * to ensure that all notifier callouts for this mm have been
-	 * completed and the sptes are really cleaned up before returning
-	 * to exit_mmap().
+	 * synchronize_srcu here prevents mmu_notifier_release from returning to
+	 * exit_mmap (which would proceed with freeing all pages in the mm)
+	 * until the ->release method returns, if it was invoked by
+	 * mmu_notifier_unregister.
+	 *
+	 * The mmu_notifier_mm can't go away from under us because one mm_count
+	 * is held by exit_mmap.
 	 */
 	synchronize_srcu(&srcu);
 }
@@ -302,31 +299,34 @@ void mmu_notifier_unregister(struct mmu_
 {
 	BUG_ON(atomic_read(&mm->mm_count) <= 0);
 
-	spin_lock(&mm->mmu_notifier_mm->lock);
 	if (!hlist_unhashed(&mn->hlist)) {
+		/*
+		 * SRCU here will force exit_mmap to wait for ->release to
+		 * finish before freeing the pages.
+		 */
 		int id;
 
+		id = srcu_read_lock(&srcu);
 		/*
-		 * Ensure we synchronize up with __mmu_notifier_release().
+		 * exit_mmap will block in mmu_notifier_release to guarantee
+		 * that ->release is called before freeing the pages.
 		 */
-		id = srcu_read_lock(&srcu);
-
-		hlist_del_rcu(&mn->hlist);
-		spin_unlock(&mm->mmu_notifier_mm->lock);
-
 		if (mn->ops->release)
 			mn->ops->release(mn, mm);
+		srcu_read_unlock(&srcu, id);
 
+		spin_lock(&mm->mmu_notifier_mm->lock);
 		/*
-		 * Allow __mmu_notifier_release() to complete.
+		 * Can not use list_del_rcu() since __mmu_notifier_release
+		 * can delete it before we hold the lock.
 		 */
-		srcu_read_unlock(&srcu, id);
-	} else
+		hlist_del_init_rcu(&mn->hlist);
 		spin_unlock(&mm->mmu_notifier_mm->lock);
+	}
 
 	/*
-	 * Wait for any running method to finish, including ->release() if it
-	 * was run by __mmu_notifier_release() instead of us.
+	 * Wait for any running method to finish, of course including
+	 * ->release if it was run by mmu_notifier_relase instead of us.
 	 */
 	synchronize_srcu(&srcu);
 



  parent reply	other threads:[~2013-06-05 21:18 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-05 21:11 [ 00/44] 3.4.48-stable review Greg Kroah-Hartman
2013-06-05 21:11 ` [ 01/44] avr32: fix relocation check for signed 18-bit offset Greg Kroah-Hartman
2013-06-05 21:12 ` [ 02/44] ARM: plat-orion: Fix num_resources and id for ge10 and ge11 Greg Kroah-Hartman
2013-06-05 21:12 ` [ 03/44] staging: vt6656: use free_netdev instead of kfree Greg Kroah-Hartman
2013-06-05 21:12 ` [ 04/44] usb: option: Add Telewell TW-LTE 4G Greg Kroah-Hartman
2013-06-05 21:12 ` [ 05/44] USB: option: add device IDs for Dell 5804 (Novatel E371) WWAN card Greg Kroah-Hartman
2013-06-05 21:12 ` [ 06/44] USB: ftdi_sio: Add support for Newport CONEX motor drivers Greg Kroah-Hartman
2013-06-05 21:12 ` [ 07/44] USB: cxacru: potential underflow in cxacru_cm_get_array() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 08/44] TTY: Fix tty miss restart after we turn off flow-control Greg Kroah-Hartman
2013-06-05 21:12 ` [ 09/44] USB: Blacklisted Cinterions PLxx WWAN Interface Greg Kroah-Hartman
2013-06-05 21:12 ` [ 10/44] USB: reset resume quirk needed by a hub Greg Kroah-Hartman
2013-06-05 21:12 ` [ 11/44] USB: xHCI: override bogus bulk wMaxPacketSize values Greg Kroah-Hartman
2013-06-05 21:12 ` [ 12/44] USB: UHCI: fix for suspend of virtual HP controller Greg Kroah-Hartman
2013-06-05 21:12 ` [ 13/44] cifs: only set ops for inodes in I_NEW state Greg Kroah-Hartman
2013-06-05 21:12 ` [ 14/44] fat: fix possible overflow for fat_clusters Greg Kroah-Hartman
2013-06-05 21:12 ` [ 15/44] perf: net_dropmonitor: Fix trace parameter order Greg Kroah-Hartman
2013-06-05 21:12 ` [ 16/44] perf: net_dropmonitor: Fix symbol-relative addresses Greg Kroah-Hartman
2013-06-05 21:12 ` [ 17/44] ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 18/44] Kirkwood: Enable PCIe port 1 on QNAP TS-11x/TS-21x Greg Kroah-Hartman
2013-06-05 21:12 ` [ 19/44] drivers/leds/leds-ot200.c: fix error caused by shifted mask Greg Kroah-Hartman
2013-06-05 21:12 ` [ 20/44] mm compaction: fix of improper cache flush in migration code Greg Kroah-Hartman
2013-06-05 21:12 ` [ 21/44] klist: del waiter from klist_remove_waiters before wakeup waitting process Greg Kroah-Hartman
2013-06-05 21:12 ` [ 22/44] wait: fix false timeouts when using wait_event_timeout() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 23/44] nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary Greg Kroah-Hartman
2013-06-05 21:12 ` Greg Kroah-Hartman [this message]
2013-06-05 21:12 ` [ 25/44] drivers/block/brd.c: fix brd_lookup_page() race Greg Kroah-Hartman
2013-06-05 21:12 ` [ 26/44] mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas Greg Kroah-Hartman
2013-06-05 21:12 ` [ 27/44] mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer Greg Kroah-Hartman
2013-06-05 21:12 ` [ 28/44] iscsi-target: fix heap buffer overflow on error Greg Kroah-Hartman
2013-06-05 21:12 ` [ 29/44] NFSv4: Fix a thinko in nfs4_try_open_cached Greg Kroah-Hartman
2013-06-05 21:12 ` [ 30/44] xfs: kill suid/sgid through the truncate path Greg Kroah-Hartman
2013-06-05 21:12 ` [ 31/44] drm/radeon: fix card_posted check for newer asics Greg Kroah-Hartman
2013-06-05 21:12 ` [ 32/44] cifs: fix potential buffer overrun when composing a new options string Greg Kroah-Hartman
2013-06-05 21:12 ` [ 33/44] USB: io_ti: Fix NULL dereference in chase_port() Greg Kroah-Hartman
2013-06-05 21:12 ` [ 34/44] ata_piix: add PCI IDs for Intel BayTail Greg Kroah-Hartman
2013-06-05 21:12 ` [ 35/44] libata: make ata_exec_internal_sg honor DMADIR Greg Kroah-Hartman
2013-06-05 21:12 ` [ 36/44] m68k/mac: Fix unexpected interrupt with CONFIG_EARLY_PRINTK Greg Kroah-Hartman
2013-06-05 21:12 ` [ 37/44] xen/events: Handle VIRQ_TIMER before any other hardirq in event loop Greg Kroah-Hartman
2013-06-05 21:12 ` [ 38/44] jfs: fix a couple races Greg Kroah-Hartman
2013-06-05 21:12 ` [ 39/44] xen-netback: remove skb in xen_netbk_alloc_page Greg Kroah-Hartman
2013-06-05 21:12 ` [ 40/44] iommu/amd: Re-enable IOMMU event log interrupt after handling Greg Kroah-Hartman
2013-06-05 21:12 ` [ 41/44] iommu/amd: Workaround for ERBT1312 Greg Kroah-Hartman
2013-06-05 21:12 ` [ 42/44] x86, um: Correct syscall table type attributes breaking gcc 4.8 Greg Kroah-Hartman
2013-06-05 21:12 ` [ 43/44] mac80211: close AP_VLAN interfaces before unregistering all Greg Kroah-Hartman
2013-06-05 21:12 ` [ 44/44] thinkpad-acpi: recognize latest V-Series using DMI_BIOS_VENDOR Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130605211224.406437292@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=holt@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox