From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.suse.de ([195.135.220.15]:38869 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1757228AbdIIAOY (ORCPT ); Fri, 8 Sep 2017 20:14:24 -0400 From: NeilBrown To: Goldwyn Rodrigues , linux-fsdevel@vger.kernel.org Date: Sat, 09 Sep 2017 10:14:11 +1000 Cc: viro@ZenIV.linux.org.uk, alexey.lyashkov@gmail.com, Goldwyn Rodrigues Subject: Re: [PATCH] d_move() vs d_unhashed() race: retry under d_lock In-Reply-To: <20170908162109.17906-1-rgoldwyn@suse.de> References: <20170908162109.17906-1-rgoldwyn@suse.de> Message-ID: <87d170d8z0.fsf@notabene.neil.brown.name> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Sender: linux-fsdevel-owner@vger.kernel.org List-ID: --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Fri, Sep 08 2017, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues > > This is a follow-up of Alexey's patch at > https://patchwork.kernel.org/patch/9455345/ > with suggestions proposed by Al Viro. > > d_move() and d_unhashed() may race because there is a small window > where the dentry is unhashed. This may result in ENOENT (for getcwd). > This must be checked under d_lock. However, in order to keep the fast > path, perform the d_unhashed without d_lock first, and in the unlikely > event that it succeeds, perform the check again under d_lock. For your consideration, here is an alternate patch which - I believe - achieves the same end. I think this approach is a little more robust, but there isn't a lot in it - Goldwyn's is arguably simpler so might be better for that reason. NeilBrown From=20dfaa166e2afaed051c388dc9f43d1468020b5e22 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Fri, 8 Sep 2017 16:03:42 +1000 Subject: [PATCH] VFS: close race between getcwd() and d_move() d_move() will call __d_drop() and then __d_rehash() on the dentry being moved. This creates a small window when the dentry appears to be unhashed. Many tests of d_unhashed() are made under ->d_lock and so are safe from racing with this window, but some aren't. In particular, getcwd() calls d_unlinked() (which calls d_unhashed()) without d_lock protection, so it can race. This races has been seen in practice with lustre, which uses d_move() as part of name lookup. See: https://jira.hpdd.intel.com/browse/LU-9735 It could race with a regular rename(), and result in ENOENT instead of either the 'before' or 'after' name. We could fix this race by taking d_lock an rechecking when d_unhashed() reports true. Alternately when can remove the window, which is the approach this patch takes. When __d_drop and __d_rehash are used to move a dentry, an extra flag is passed which causes d_hash.pprev to not be cleared, and to not be tested. Signed-off-by: NeilBrown =2D-- fs/dcache.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index f90141387f01..3d1f14c6c306 100644 =2D-- a/fs/dcache.c +++ b/fs/dcache.c @@ -469,8 +469,11 @@ static void dentry_lru_add(struct dentry *dentry) * reason (NFS timeouts or autofs deletes). * * __d_drop requires dentry->d_lock. + * ___d_drop takes an extra @moving argument. + * If true, d_hash.pprev is not cleared, so there is no transient d_unhash= ed() + * state. */ =2Dvoid __d_drop(struct dentry *dentry) +static void inline ___d_drop(struct dentry *dentry, bool moving) { if (!d_unhashed(dentry)) { struct hlist_bl_head *b; @@ -486,12 +489,18 @@ void __d_drop(struct dentry *dentry) =20 hlist_bl_lock(b); __hlist_bl_del(&dentry->d_hash); =2D dentry->d_hash.pprev =3D NULL; + if (likely(!moving)) + dentry->d_hash.pprev =3D NULL; hlist_bl_unlock(b); /* After this call, in-progress rcu-walk path lookup will fail. */ write_seqcount_invalidate(&dentry->d_seq); } } + +void __d_drop(struct dentry *dentry) +{ + ___d_drop(dentry, false); +} EXPORT_SYMBOL(__d_drop); =20 void d_drop(struct dentry *dentry) @@ -2378,10 +2387,10 @@ void d_delete(struct dentry * dentry) } EXPORT_SYMBOL(d_delete); =20 =2Dstatic void __d_rehash(struct dentry *entry) +static void __d_rehash(struct dentry *entry, bool moving) { struct hlist_bl_head *b =3D d_hash(entry->d_name.hash); =2D BUG_ON(!d_unhashed(entry)); + BUG_ON(!moving && !d_unhashed(entry)); hlist_bl_lock(b); hlist_bl_add_head_rcu(&entry->d_hash, b); hlist_bl_unlock(b); @@ -2397,7 +2406,7 @@ static void __d_rehash(struct dentry *entry) void d_rehash(struct dentry * entry) { spin_lock(&entry->d_lock); =2D __d_rehash(entry); + __d_rehash(entry, false); spin_unlock(&entry->d_lock); } EXPORT_SYMBOL(d_rehash); @@ -2571,7 +2580,7 @@ static inline void __d_add(struct dentry *dentry, str= uct inode *inode) raw_write_seqcount_end(&dentry->d_seq); fsnotify_update_flags(dentry); } =2D __d_rehash(dentry); + __d_rehash(dentry, false); if (dir) end_dir_add(dir, n); spin_unlock(&dentry->d_lock); @@ -2633,7 +2642,7 @@ struct dentry *d_exact_alias(struct dentry *entry, st= ruct inode *inode) alias =3D NULL; } else { __dget_dlock(alias); =2D __d_rehash(alias); + __d_rehash(alias, false); spin_unlock(&alias->d_lock); } spin_unlock(&inode->i_lock); @@ -2819,8 +2828,8 @@ static void __d_move(struct dentry *dentry, struct de= ntry *target, =20 /* unhash both */ /* __d_drop does write_seqcount_barrier, but they're OK to nest. */ =2D __d_drop(dentry); =2D __d_drop(target); + ___d_drop(dentry, true); + ___d_drop(target, exchange); =20 /* Switch the names.. */ if (exchange) @@ -2829,9 +2838,9 @@ static void __d_move(struct dentry *dentry, struct de= ntry *target, copy_name(dentry, target); =20 /* rehash in new place(s) */ =2D __d_rehash(dentry); + __d_rehash(dentry, true); if (exchange) =2D __d_rehash(target); + __d_rehash(target, true); =20 /* ... and switch them in the tree */ if (IS_ROOT(dentry)) { =2D-=20 2.14.0.rc0.dirty --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEG8Yp69OQ2HB7X0l6Oeye3VZigbkFAlmzMlYACgkQOeye3VZi gbk+KA/6A40S4Pzit9EZedGEEDSQQgWvNWjwENUw651f8LeF4EwYLiTnv41f4y0c zNhxQjTPh2xp/QDnQRV8jgdTaeAiLGIZFemfdX9lnfsUJCvhn/A6jjXCNm5XEexX 0eR8+FGXWYooMhPOgHFTvzWAhPC09My4/rJVz24uyaZEk6Q2gr4tr9Wa5cBu1GGt UYVDBAa60mV5LniZl2TK+YFe6wLcXM5Wy8+U8cgXAGdpo7+hCd4JMBjQExWkk+Ib Mdy1X2EdER6EKWMKKVyQp70Kjf93mS/DwVHz1VpXwfzrHTGHDCVoCca33z/hCCvt 1xYNaHxjLcocrOJ9Y0wZr8VVOsHRgJYJ0vlvA/zr8lWydYXIlfluDxtjIeZ2+8EA srGjq2r2DNVxIVfVrlkoJwbQWWm/rFPBxWlRGeDdRAO8fcFx7HHgaOzdKJxO38sM RbOxC8v9y3bzu4LGPeSbYF4kmsUO3K1N19dsytixq+YtmRjIqUQkDYxVQbcT+td6 MvJ4seDbESms76VckxbLnOQnUoX3fNdIedmF0LAp+CYFXSiiJOpHdVl7HIhw/FfY 7Dp3oXwjSwZahIDbziTH0ZXHC3LXSH9xNkaSygd8IDmFHWPFjs0x3kzhSD1rLVO2 g40L1xaDx25O5t8xBgeP7S9dj1hlGKmXCsACGi5fg4MyfiqXNt8= =OnOj -----END PGP SIGNATURE----- --=-=-=--