linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: hooanon05@yahoo.co.jp
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	hch@infradead.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: Q: NFSD readdir in linux-2.6.28
Date: Fri, 17 Apr 2009 10:32:19 +0100	[thread overview]
Message-ID: <1239960739.3428.33.camel@macbook.infradead.org> (raw)
In-Reply-To: <8913.1237476890@jrobl>

On Fri, 2009-03-20 at 00:34 +0900, hooanon05@yahoo.co.jp wrote:
> David Woodhouse:
> > Yes, well spotted. It didn't matter when the buffered readdir() was
> > purely internal to XFS, because it didn't matter there that we called
> > ->lookup() without i_mutex set. But now we're exposing arbitrary file
> > systems to it, we need to make sure we follow the locking rules. 
> > 
> > I _think_ it's sufficient to make the affected callers of
> > lookup_one_len() lock the parent's i_mutex for themselves before calling
> > it. I'll take a closer look...
> 
> If you remember why you discarded the FS_NO_LOOKUP_IN_READDIR flag
> approach, please let me know. URL or something is enough.

I think someone talked me into doing it in the interest of simplicity,
so we confine the necessary hack into the NFS code alone and _always_
just use the "safe" version. I can't remember who it was, but I'm
guessing Al or Christoph -- both of whom are CC'd in case they want to
object:

Given the fun with i_mutex I think the best answer is to bring it back.
I'm about to test this patch which restores it (and sets it for btrfs,
jffs2 and xfs)...

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index deeeed0..06f539c 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -357,7 +357,10 @@ otherwise noted.
 	If you wish to overload the dentry methods then you should
 	initialise the "d_dop" field in the dentry; this is a pointer
 	to a struct "dentry_operations".
-	This method is called with the directory inode semaphore held
+	This method is called with the directory inode mutex held
+	unless the file system sets the FS_NO_LOOKUP_IN_READDIR flag
+	its file system flags, in which case lookup() calls from NFS
+	readdirplus() requests will be made without directory mutex.
 
   link: called by the link(2) system call. Only required if you want
 	to support hard links. You will probably need to call
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 9744af9..f0e9d18 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -619,7 +619,7 @@ static struct file_system_type btrfs_fs_type = {
 	.name		= "btrfs",
 	.get_sb		= btrfs_get_sb,
 	.kill_sb	= kill_anon_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV | FS_NO_LOOKUP_IN_READDIR,
 };
 
 /*
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c
index 4c4e18c..0e97a35 100644
--- a/fs/jffs2/super.c
+++ b/fs/jffs2/super.c
@@ -209,6 +209,7 @@ static struct file_system_type jffs2_fs_type = {
 	.name =		"jffs2",
 	.get_sb =	jffs2_get_sb,
 	.kill_sb =	jffs2_kill_sb,
+	.flags =	FS_NO_LOOKUP_IN_READDIR,
 };
 
 static int __init init_jffs2_fs(void)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..230e30e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1841,6 +1841,30 @@ out:
 	return err;
 }
 
+
+static int nfsd_real_readdir(struct file *file, filldir_t func,
+			     struct readdir_cd *cdp, loff_t *offsetp)
+{
+	int host_err;
+
+	/*
+	 * Read the directory entries. This silly loop is necessary because
+	 * readdir() is not guaranteed to fill up the entire buffer, but
+	 * may choose to do less.
+	 */
+	do {
+		cdp->err = nfserr_eof; /* will be cleared on successful read */
+		host_err = vfs_readdir(file, func, cdp);
+	} while (host_err >=0 && cdp->err == nfs_ok);
+
+	*offsetp = vfs_llseek(file, 0, 1);
+
+	if (host_err)
+		return nfserrno(host_err);
+	else
+		return cdp->err;
+}
+
 /*
  * We do this buffering because we must not call back into the file
  * system's ->lookup() method from the filldir callback. That may well
@@ -1970,7 +1994,11 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
 		goto out_close;
 	}
 
-	err = nfsd_buffered_readdir(file, func, cdp, offsetp);
+	if ((file->f_path.dentry->d_inode->i_sb->s_type->fs_flags &
+	     FS_NO_LOOKUP_IN_READDIR))
+		err = nfsd_buffered_readdir(file, func, cdp, offsetp);
+	else
+		err = nfsd_real_readdir(file, func, cdp, offsetp);
 
 	if (err == nfserr_eof || err == nfserr_toosmall)
 		err = nfs_ok; /* can still be found in ->err */
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index bb68526..7bba46a 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1546,7 +1546,7 @@ static struct file_system_type xfs_fs_type = {
 	.name			= "xfs",
 	.get_sb			= xfs_fs_get_sb,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV,
+	.fs_flags		= FS_REQUIRES_DEV | FS_NO_LOOKUP_IN_READDIR,
 };
 
 STATIC int __init
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e766be0..caf2a94 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -176,6 +176,11 @@ struct inodes_stat_t {
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
 					 * during rename() internally.
 					 */
+#define FS_NO_LOOKUP_IN_READDIR	65536	/* FS has its own locking and will
+	   deadlock if you call its lookup() method from within a filldir
+	   function, as NFSd wants to. A file system setting this flag must
+	   be happy for its ->lookup() method to be called without i_mutex,
+	   which is what the NFSd workaround code will do. */
 
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported


-- 
dwmw2


  parent reply	other threads:[~2009-04-17  9:32 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-19 14:54 Q: NFSD readdir in linux-2.6.28 hooanon05
2009-03-19 15:17 ` David Woodhouse
2009-03-19 15:34   ` hooanon05
2009-03-19 15:51     ` David Woodhouse
2009-04-17  9:32     ` David Woodhouse [this message]
2009-04-17 19:32       ` Al Viro
2009-04-17 22:17         ` David Woodhouse
2009-04-17 22:43           ` Al Viro
2009-04-17 22:51             ` David Woodhouse
2009-04-17 22:53             ` Al Viro
2009-04-17 22:55               ` David Woodhouse
2009-04-17 23:23               ` David Woodhouse
2009-04-17 23:37                 ` Al Viro
2009-04-17 23:39                   ` Al Viro
2009-04-18  0:15               ` [PATCH] Fix i_mutex handling in nfsd readdir David Woodhouse
2009-04-18  3:11                 ` hooanon05
2009-04-18 14:25                   ` Al Viro
2009-04-19  7:18                   ` David Woodhouse
2009-04-19 12:27                 ` [PATCH v2] " David Woodhouse
2009-04-19 20:51                   ` J. Bruce Fields
2009-04-20 19:50                     ` J. Bruce Fields
2009-04-21  0:29                       ` Al Viro
2009-04-21 21:15                         ` J. Bruce Fields
2009-04-21 21:54                           ` Al Viro
2009-05-11 23:16                       ` J. Bruce Fields
2009-04-22  4:41                   ` hooanon05
2009-04-22 19:12                     ` J. Bruce Fields
2009-04-23  6:40                       ` hooanon05
2009-04-23 20:27                         ` J. Bruce Fields
2009-05-05 23:35                           ` J. Bruce Fields
2009-05-06  5:09                             ` hooanon05
2009-05-06 20:20                               ` J. Bruce Fields
2009-05-07  4:38                                 ` hooanon05
2009-05-08 18:47                                   ` J. Bruce Fields
2009-03-19 15:37   ` Q: NFSD readdir in linux-2.6.28 Matthew Wilcox
2009-03-19 15:45     ` hooanon05
2009-03-19 16:01     ` David Woodhouse
2009-04-16 21:45   ` J. Bruce Fields
2009-04-17  4:13     ` hooanon05

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1239960739.3428.33.camel@macbook.infradead.org \
    --to=dwmw2@infradead.org \
    --cc=hch@infradead.org \
    --cc=hooanon05@yahoo.co.jp \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).