From mboxrd@z Thu Jan 1 00:00:00 1970
From: akpm@linux-foundation.org
Subject: [patch 7/9] vfs: make real_lookup do dentry revalidation with i_mutex held
Date: Tue, 17 Nov 2009 14:56:33 -0800
Message-ID: <200911172256.nAHMuXQl027650@imap1.linux-foundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ANSI_X3.4-1968
Content-Transfer-Encoding: 8bit
Cc: linux-fsdevel@vger.kernel.org, akpm@linux-foundation.org,
	sage@newdream.net, adilger@sun.com, hch@infradead.org,
	raven@themaw.net, yehuda@newdream.net
To: viro@zeniv.linux.org.uk
Return-path:
Received: from smtp1.linux-foundation.org ([140.211.169.13]:37694 "EHLO
	smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1756200AbZKQW45 (ORCPT );
	Tue, 17 Nov 2009 17:56:57 -0500
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

From: Sage Weil

real_lookup() is called by do_lookup() if dentry revalidation fails.  If
the cache is re-populated while waiting for i_mutex, it may find that a
d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).

Previously, real_lookup() would drop i_mutex and do_revalidate() again.  If
revalidate failed _again_, however, it would give up with -ENOENT.  The
problem here is that network file systems may be invalidating dentries via
server callbacks, e.g. due to concurrent access from another client, and
-ENOENT is frequently the wrong answer.

This problem has been seen with both Lustre and Ceph.  It seems possible
to hit this case with NFS as well if the cache lifetime is very short.

Instead, we should do_revalidate() while i_mutex is still held.  If
revalidation fails, we can move on to a ->lookup() and ensure a correct
result without worrying about any subsequent races.

Note that do_revalidate() is called with i_mutex held elsewhere.
For example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(),
and possibly others all take the directory i_mutex, and then

	-> lookup_hash
	  -> __lookup_hash
	    -> cached_lookup
	      -> do_revalidate

so this does not introduce any new locking rules for d_revalidate
implementations.

Yes, the goto is ugly.  A cleanup patch follows.

Cc: Ian Kent
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Andreas Dilger
Signed-off-by: Yehuda Sadeh
Signed-off-by: Sage Weil
Signed-off-by: Andrew Morton
---

 fs/namei.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN fs/namei.c~vfs-make-real_lookup-do-dentry-revalidation-with-i_mutex-held fs/namei.c
--- a/fs/namei.c~vfs-make-real_lookup-do-dentry-revalidation-with-i_mutex-held
+++ a/fs/namei.c
@@ -497,6 +497,7 @@ static struct dentry * real_lookup(struc
 	if (!result) {
 		struct dentry *dentry;
 
+do_the_lookup:
 		/* Don't create child dentry for a dead directory. */
 		result = ERR_PTR(-ENOENT);
 		if (IS_DEADDIR(dir))
@@ -520,12 +521,12 @@ out_unlock:
 	 * Uhhuh! Nasty case: the cache was re-populated while
 	 * we waited on the semaphore. Need to revalidate.
 	 */
-	mutex_unlock(&dir->i_mutex);
 	if (result->d_op && result->d_op->d_revalidate) {
 		result = do_revalidate(result, nd);
 		if (!result)
-			result = ERR_PTR(-ENOENT);
+			goto do_the_lookup;
 	}
+	mutex_unlock(&dir->i_mutex);
 	return result;
 }
 
_