From: Ian Kent <raven@themaw.net>
To: Sage Weil <sage@newdream.net>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
Andreas Dilger <adilger@sun.com>,
Yehuda Saheh <yehuda@newdream.net>,
Jim Garlick <garlick@llnl.gov>
Subject: [RFC PATCH 01/11] Subject: [PATCH] vfs: make real_lookup do dentry revalidation with i_mutex held
Date: Thu, 24 Sep 2009 16:21:25 +0800 [thread overview]
Message-ID: <20090924082125.22151.94452.stgit@zeus.themaw.net> (raw)
In-Reply-To: <20090924082036.22151.85151.stgit@zeus.themaw.net>
From: Sage Weil <sage@newdream.net>
real_lookup() is called by do_lookup() if dentry revalidation fails. If
the cache is re-populated while waiting for i_mutex, it may find that
a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
revalidate failed _again_, however, it would give up with -ENOENT. The
problem here that network file systems may be invalidating dentries via
server callbacks, e.g. due to concurrent access from another client, and
-ENOENT is frequently the wrong answer.
This problem has been seen with both Lustre and Ceph. It seems possible
to hit this case with NFS as well if the cache lifetime is very short.
Instead, we should do_revalidate() while i_mutex is still held. If
revalidation fails, we can move on to a ->lookup() and ensure a correct
result without worrying about any subsequent races.
Note that do_revalidate() is called with i_mutex held elsewhere. For
example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
and possibly others all take the directory i_mutex, and then
-> lookup_hash
-> __lookup_hash
-> cached_lookup
-> do_revalidate
so this does not introduce any new locking rules for d_revalidate
implementations.
Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
---
fs/namei.c | 58 +++++++++++++++++++++++++++++++---------------------------
1 files changed, 31 insertions(+), 27 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c
index d11f404..d68ea6d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -477,6 +477,7 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
{
struct dentry * result;
struct inode *dir = parent->d_inode;
+ struct dentry *dentry;
mutex_lock(&dir->i_mutex);
/*
@@ -494,38 +495,41 @@ static struct dentry * real_lookup(struct dentry * parent, struct qstr * name, s
* so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
*/
result = d_lookup(parent, name);
- if (!result) {
- struct dentry *dentry;
-
- /* Don't create child dentry for a dead directory. */
- result = ERR_PTR(-ENOENT);
- if (IS_DEADDIR(dir))
- goto out_unlock;
-
- dentry = d_alloc(parent, name);
- result = ERR_PTR(-ENOMEM);
- if (dentry) {
- result = dir->i_op->lookup(dir, dentry, nd);
+ if (result) {
+ /*
+ * The cache was re-populated while we waited on the
+ * mutex. We need to revalidate, this time while
+ * holding i_mutex (to avoid another race).
+ */
+ if (result->d_op && result->d_op->d_revalidate) {
+ result = do_revalidate(result, nd);
if (result)
- dput(dentry);
- else
- result = dentry;
+ goto out_unlock;
+ /*
+ * The dentry was left behind invalid. Just
+ * do the lookup.
+ */
+ } else {
+ goto out_unlock;
}
-out_unlock:
- mutex_unlock(&dir->i_mutex);
- return result;
}
- /*
- * Uhhuh! Nasty case: the cache was re-populated while
- * we waited on the semaphore. Need to revalidate.
- */
- mutex_unlock(&dir->i_mutex);
- if (result->d_op && result->d_op->d_revalidate) {
- result = do_revalidate(result, nd);
- if (!result)
- result = ERR_PTR(-ENOENT);
+ /* Don't create child dentry for a dead directory. */
+ result = ERR_PTR(-ENOENT);
+ if (IS_DEADDIR(dir))
+ goto out_unlock;
+
+ dentry = d_alloc(parent, name);
+ result = ERR_PTR(-ENOMEM);
+ if (dentry) {
+ result = dir->i_op->lookup(dir, dentry, nd);
+ if (result)
+ dput(dentry);
+ else
+ result = dentry;
}
+out_unlock:
+ mutex_unlock(&dir->i_mutex);
return result;
}
next prev parent reply other threads:[~2009-09-24 8:31 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-24 8:21 [RFC PATCH 00/11] autofs4 - update autofs4 to deal with VFS locking change Ian Kent
2009-09-24 8:21 ` Ian Kent [this message]
2009-09-24 8:21 ` [RFC PATCH 02/11] autofs4 - use macros for active list handling Ian Kent
2009-09-24 8:21 ` [RFC PATCH 03/11] autofs4 - use macros for expiring list Ian Kent
2009-09-24 8:21 ` [RFC PATCH 04/11] autofs4 - use macro for need mount check Ian Kent
2009-09-24 8:21 ` [RFC PATCH 05/11] autofs4 - use autofs_info for pending flag Ian Kent
2009-09-24 8:21 ` [RFC PATCH 06/11] autofs4 - renamer unhashed to active in autofs4_lookup() Ian Kent
2009-09-24 8:22 ` [RFC PATCH 07/11] autofs4 - cleanup active and expire lookup Ian Kent
2009-09-24 8:22 ` [RFC PATCH 08/11] autofs4 - eliminate d_unhashed in path walk checks Ian Kent
2009-09-24 8:22 ` [RFC PATCH 09/11] autofs4 - rename dentry to active in autofs4_lookup_active() Ian Kent
2009-09-24 8:22 ` [RFC PATCH 10/11] autofs4 - rename dentry to expiring in autofs4_lookup_expiring() Ian Kent
2009-09-24 8:22 ` [RFC PATCH 11/11] autofs4 - always use lookup for lookup Ian Kent
2009-09-24 9:19 ` [RFC PATCH 00/11] autofs4 - update autofs4 to deal with VFS locking change Ian Kent
2009-09-24 16:10 ` Sage Weil
2009-09-28 7:41 ` Ian Kent
2009-09-28 7:53 ` Ian Kent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090924082125.22151.94452.stgit@zeus.themaw.net \
--to=raven@themaw.net \
--cc=adilger@sun.com \
--cc=garlick@llnl.gov \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sage@newdream.net \
--cc=viro@ZenIV.linux.org.uk \
--cc=yehuda@newdream.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).