From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jann Horn <jannh@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>
Subject: [PATCH 3.18 03/15] fix mntput/mntput race
Date: Thu, 16 Aug 2018 20:41:40 +0200
Message-ID: <20180816171633.673294381@linuxfoundation.org>
In-Reply-To: <20180816171633.546734046@linuxfoundation.org>

3.18-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Al Viro <viro@zeniv.linux.org.uk>

commit 9ea0a46ca2c318fcc449c1e6b62a7230a17888f1 upstream.

mntput_no_expire() does the calculation of total refcount under mount_lock;
unfortunately, the decrement (as well as all increments) are done outside
of it, leading to false positives in the "are we dropping the last reference"
test.  Consider the following situation:
	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
mntput(mnt) (called from __fput()) drops one reference, decrementing component
#42, and starts adding up the components under mount_lock.
	* After it has looked at component #0, the process on CPU 0 does
mntget(), incrementing component #0, gets preempted and gets to run again -
on CPU 69.  There it does mntput(), which drops the reference (component #69)
and proceeds to spin on mount_lock.
	* On CPU 42 our first mntput() finishes counting.  It observes the
decrement of component #69, but not the increment of component #0.  As a
result, the total it gets is not 1 as it should've been - it's 0.  At which
point we decide that vfsmount needs to be killed and proceed to free it and
shut the filesystem down.  However, there's still another opened file
on that filesystem, with reference to (now freed) vfsmount, etc. and we are
screwed.

It's not a wide race, but it can be reproduced with artificial slowdown of
the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

Fix consists of moving the refcount decrement under mount_lock; the tricky
part is that we want (and can) keep the fast case (i.e. mount that still
has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
be dropped.

Reported-by: Jann Horn <jannh@google.com>
Tested-by: Jann Horn <jannh@google.com>
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/namespace.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1049,12 +1049,22 @@ static DECLARE_DELAYED_WORK(delayed_mntp
 static void mntput_no_expire(struct mount *mnt)
 {
 	rcu_read_lock();
-	mnt_add_count(mnt, -1);
-	if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
+	if (likely(READ_ONCE(mnt->mnt_ns))) {
+		/*
+		 * Since we don't do lock_mount_hash() here,
+		 * ->mnt_ns can change under us.  However, if it's
+		 * non-NULL, then there's a reference that won't
+		 * be dropped until after an RCU delay done after
+		 * turning ->mnt_ns NULL.  So if we observe it
+		 * non-NULL under rcu_read_lock(), the reference
+		 * we are dropping is not the final one.
+		 */
+		mnt_add_count(mnt, -1);
 		rcu_read_unlock();
 		return;
 	}
 	lock_mount_hash();
+	mnt_add_count(mnt, -1);
 	if (mnt_get_count(mnt)) {
 		rcu_read_unlock();
 		unlock_mount_hash();

