public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()
@ 2010-01-29 20:19 Trond Myklebust
  2010-01-29 20:35 ` Trond Myklebust
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Trond Myklebust @ 2010-01-29 20:19 UTC (permalink / raw)
  To: Dr. J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Currently, the nfs server holds the inode->i_mutex across both the
filemap_write_and_wait() call and the fsync() call itself. However we know
that filemap_write_and_wait() is already safe against livelocks, so we only
need to hold the mutex across the fsync() call.

Fix this by reusing vfs_fsync(), which already does the right thing.
Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
improve the efficiency for clients that do specify a range.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfsd/vfs.c |   58 +++++++++++++++++++++------------------------------------
 1 files changed, 21 insertions(+), 37 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c194793..e4568d6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -769,40 +769,17 @@ nfsd_close(struct file *filp)
 }
 
 /*
- * Sync a file
- * As this calls fsync (not fdatasync) there is no need for a write_inode
- * after it.
+ * Sync a directory
+ * returns 0 if the directory had no fsync method
  */
-static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
-			      const struct file_operations *fop)
-{
-	struct inode *inode = dp->d_inode;
-	int (*fsync) (struct file *, struct dentry *, int);
-	int err;
-
-	err = filemap_write_and_wait(inode->i_mapping);
-	if (err == 0 && fop && (fsync = fop->fsync))
-		err = fsync(filp, dp, 0);
-	return err;
-}
-
-static int
-nfsd_sync(struct file *filp)
-{
-        int err;
-	struct inode *inode = filp->f_path.dentry->d_inode;
-	dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
-	mutex_lock(&inode->i_mutex);
-	err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
-	mutex_unlock(&inode->i_mutex);
-
-	return err;
-}
-
 int
 nfsd_sync_dir(struct dentry *dp)
 {
-	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
+	int err;
+
+	dprintk("nfsd: sync directory %s\n", dp->d_name.name);
+	err = vfs_fsync(NULL, dp, 0);
+	return (err != -EINVAL) ? err : 0;
 }
 
 /*
@@ -1008,7 +985,9 @@ static int wait_for_concurrent_writes(struct file *file)
 
 	if (inode->i_state & I_DIRTY) {
 		dprintk("nfsd: write sync %d\n", task_pid_nr(current));
-		err = nfsd_sync(file);
+		err = vfs_fsync(file, file->f_path.dentry, 0);
+		if (err == -EINVAL)
+			err = 0;
 	}
 	last_ino = inode->i_ino;
 	last_dev = inode->i_sb->s_dev;
@@ -1156,8 +1135,6 @@ out:
 #ifdef CONFIG_NFSD_V3
 /*
  * Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
  *
  * Unfortunately we cannot lock the file to make sure we return full WCC
  * data to the client, as locking happens lower down in the filesystem.
@@ -1176,11 +1153,18 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err)
 		return err;
 	if (EX_ISSYNC(fhp->fh_export)) {
-		if (file->f_op && file->f_op->fsync) {
-			err = nfserrno(nfsd_sync(file));
-		} else {
+		loff_t end = LLONG_MAX;
+		int err2;
+		
+		if (count != 0)
+			end = offset + count;
+		err2 = vfs_fsync_range(file, file->f_path.dentry,
+				offset, end, 0);
+
+		if (err2 != -EINVAL)
+			err = nfserrno(err2);
+		else
 			err = nfserr_notsupp;
-		}
 	}
 
 	nfsd_close(file);


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()
  2010-01-29 20:19 nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range() Trond Myklebust
@ 2010-01-29 20:35 ` Trond Myklebust
  2010-01-29 20:44 ` [PATCH v2] " Trond Myklebust
  2010-01-29 20:50 ` Christoph Hellwig
  2 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2010-01-29 20:35 UTC (permalink / raw)
  To: Dr. J. Bruce Fields; +Cc: linux-nfs

On Fri, 2010-01-29 at 15:19 -0500, Trond Myklebust wrote: 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> Currently, the nfs server holds the inode->i_mutex across both the
> filemap_write_and_wait() call and the fsync() call itself. However we know
> that filemap_write_and_wait() is already safe against livelocks, so we only
> need to hold the mutex across the fsync() call.
> 
> Fix this by reusing vfs_fsync(), which already does the right thing.
> Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
> improve the efficiency for clients that do specify a range.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> 
>  fs/nfsd/vfs.c |   58 +++++++++++++++++++++------------------------------------
>  1 files changed, 21 insertions(+), 37 deletions(-)
> 
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index c194793..e4568d6 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -769,40 +769,17 @@ nfsd_close(struct file *filp)
>  }
>  
>  /*
> - * Sync a file
> - * As this calls fsync (not fdatasync) there is no need for a write_inode
> - * after it.
> + * Sync a directory
> + * returns 0 if the directory had no fsync method
>   */
> -static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
> -			      const struct file_operations *fop)
> -{
> -	struct inode *inode = dp->d_inode;
> -	int (*fsync) (struct file *, struct dentry *, int);
> -	int err;
> -
> -	err = filemap_write_and_wait(inode->i_mapping);
> -	if (err == 0 && fop && (fsync = fop->fsync))
> -		err = fsync(filp, dp, 0);
> -	return err;
> -}
> -
> -static int
> -nfsd_sync(struct file *filp)
> -{
> -        int err;
> -	struct inode *inode = filp->f_path.dentry->d_inode;
> -	dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
> -	mutex_lock(&inode->i_mutex);
> -	err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
> -	mutex_unlock(&inode->i_mutex);
> -
> -	return err;
> -}
> -
>  int
>  nfsd_sync_dir(struct dentry *dp)
>  {
> -	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
> +	int err;
> +
> +	dprintk("nfsd: sync directory %s\n", dp->d_name.name);
> +	err = vfs_fsync(NULL, dp, 0);
> +	return (err != -EINVAL) ? err : 0;
>  }
>  
>  /*
> @@ -1008,7 +985,9 @@ static int wait_for_concurrent_writes(struct file *file)
>  
>  	if (inode->i_state & I_DIRTY) {
>  		dprintk("nfsd: write sync %d\n", task_pid_nr(current));
> -		err = nfsd_sync(file);
> +		err = vfs_fsync(file, file->f_path.dentry, 0);
> +		if (err == -EINVAL)
> +			err = 0;
>  	}
>  	last_ino = inode->i_ino;
>  	last_dev = inode->i_sb->s_dev;
> @@ -1156,8 +1135,6 @@ out:
>  #ifdef CONFIG_NFSD_V3
>  /*
>   * Commit all pending writes to stable storage.
> - * Strictly speaking, we could sync just the indicated file region here,
> - * but there's currently no way we can ask the VFS to do so.
>   *
>   * Unfortunately we cannot lock the file to make sure we return full WCC
>   * data to the client, as locking happens lower down in the filesystem.
> @@ -1176,11 +1153,18 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (err)
>  		return err;
>  	if (EX_ISSYNC(fhp->fh_export)) {
> -		if (file->f_op && file->f_op->fsync) {
> -			err = nfserrno(nfsd_sync(file));
> -		} else {
> +		loff_t end = LLONG_MAX;
> +		int err2;
> +		
> +		if (count != 0)
> +			end = offset + count;
> +		err2 = vfs_fsync_range(file, file->f_path.dentry,
> +				offset, end, 0);
> +
Actually, we should probably check for offset < 0 || end < offset. I'll
add that...

Cheers
  Trond


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2] nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()
  2010-01-29 20:19 nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range() Trond Myklebust
  2010-01-29 20:35 ` Trond Myklebust
@ 2010-01-29 20:44 ` Trond Myklebust
  2010-01-29 20:50 ` Christoph Hellwig
  2 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2010-01-29 20:44 UTC (permalink / raw)
  To: Dr. J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Currently, the nfs server holds the inode->i_mutex across both the
filemap_write_and_wait() call and the fsync() call itself. However we know
that filemap_write_and_wait() is already safe against livelocks, so we only
need to hold the mutex across the fsync() call.

Fix this by reusing vfs_fsync(), which already does the right thing.
Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
improve the efficiency for clients that do specify a range.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfsd/vfs.c |   59 ++++++++++++++++++++-------------------------------------
 1 files changed, 21 insertions(+), 38 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c194793..26d0d2c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -769,40 +769,17 @@ nfsd_close(struct file *filp)
 }
 
 /*
- * Sync a file
- * As this calls fsync (not fdatasync) there is no need for a write_inode
- * after it.
+ * Sync a directory
+ * returns 0 if the directory had no fsync method
  */
-static inline int nfsd_dosync(struct file *filp, struct dentry *dp,
-			      const struct file_operations *fop)
-{
-	struct inode *inode = dp->d_inode;
-	int (*fsync) (struct file *, struct dentry *, int);
-	int err;
-
-	err = filemap_write_and_wait(inode->i_mapping);
-	if (err == 0 && fop && (fsync = fop->fsync))
-		err = fsync(filp, dp, 0);
-	return err;
-}
-
-static int
-nfsd_sync(struct file *filp)
-{
-        int err;
-	struct inode *inode = filp->f_path.dentry->d_inode;
-	dprintk("nfsd: sync file %s\n", filp->f_path.dentry->d_name.name);
-	mutex_lock(&inode->i_mutex);
-	err=nfsd_dosync(filp, filp->f_path.dentry, filp->f_op);
-	mutex_unlock(&inode->i_mutex);
-
-	return err;
-}
-
 int
 nfsd_sync_dir(struct dentry *dp)
 {
-	return nfsd_dosync(NULL, dp, dp->d_inode->i_fop);
+	int err;
+
+	dprintk("nfsd: sync directory %s\n", dp->d_name.name);
+	err = vfs_fsync(NULL, dp, 0);
+	return (err != -EINVAL) ? err : 0;
 }
 
 /*
@@ -1008,7 +985,9 @@ static int wait_for_concurrent_writes(struct file *file)
 
 	if (inode->i_state & I_DIRTY) {
 		dprintk("nfsd: write sync %d\n", task_pid_nr(current));
-		err = nfsd_sync(file);
+		err = vfs_fsync(file, file->f_path.dentry, 0);
+		if (err == -EINVAL)
+			err = 0;
 	}
 	last_ino = inode->i_ino;
 	last_dev = inode->i_sb->s_dev;
@@ -1156,8 +1135,6 @@ out:
 #ifdef CONFIG_NFSD_V3
 /*
  * Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
  *
  * Unfortunately we cannot lock the file to make sure we return full WCC
  * data to the client, as locking happens lower down in the filesystem.
@@ -1167,20 +1144,26 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
                loff_t offset, unsigned long count)
 {
 	struct file	*file;
+	loff_t end = LLONG_MAX;
 	__be32		err;
 
-	if ((u64)count > ~(u64)offset)
+	if (count != 0)
+		end = offset + count - 1;
+
+	if (offset < 0 || end < offset)
 		return nfserr_inval;
 
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
 	if (err)
 		return err;
 	if (EX_ISSYNC(fhp->fh_export)) {
-		if (file->f_op && file->f_op->fsync) {
-			err = nfserrno(nfsd_sync(file));
-		} else {
+		int err2 = vfs_fsync_range(file, file->f_path.dentry,
+				offset, end, 0);
+
+		if (err2 != -EINVAL)
+			err = nfserrno(err2);
+		else
 			err = nfserr_notsupp;
-		}
 	}
 
 	nfsd_close(file);


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()
  2010-01-29 20:19 nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range() Trond Myklebust
  2010-01-29 20:35 ` Trond Myklebust
  2010-01-29 20:44 ` [PATCH v2] " Trond Myklebust
@ 2010-01-29 20:50 ` Christoph Hellwig
  2010-01-29 20:57   ` Trond Myklebust
  2 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2010-01-29 20:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Dr. J. Bruce Fields, linux-nfs

On Fri, Jan 29, 2010 at 03:19:13PM -0500, Trond Myklebust wrote:
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> Currently, the nfs server holds the inode->i_mutex across both the
> filemap_write_and_wait() call and the fsync() call itself. However we know
> that filemap_write_and_wait() is already safe against livelocks, so we only
> need to hold the mutex across the fsync() call.
> 
> Fix this by reusing vfs_fsync(), which already does the right thing.
> Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
> improve the efficiency for clients that do specify a range.

I already sent a patch to replace nfsd_sync with it that should be
queued up.  We can't use vfs_fsync for nfsd_sync_dir as we're already
holding i_mutex when calling it.  The vfs_fsync_range optimizations
seems like something that should be applied ontop, though.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range()
  2010-01-29 20:50 ` Christoph Hellwig
@ 2010-01-29 20:57   ` Trond Myklebust
  2010-01-29 21:18     ` nfsd: Use vfs_fsync_range() in nfsd_commit Trond Myklebust
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2010-01-29 20:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dr. J. Bruce Fields, linux-nfs

On Fri, 2010-01-29 at 15:50 -0500, Christoph Hellwig wrote: 
> On Fri, Jan 29, 2010 at 03:19:13PM -0500, Trond Myklebust wrote:
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > 
> > Currently, the nfs server holds the inode->i_mutex across both the
> > filemap_write_and_wait() call and the fsync() call itself. However we know
> > that filemap_write_and_wait() is already safe against livelocks, so we only
> > need to hold the mutex across the fsync() call.
> > 
> > Fix this by reusing vfs_fsync(), which already does the right thing.
> > Also make sure that we use vfs_fsync_range() in the COMMIT operation, to
> > improve the efficiency for clients that do specify a range.
> 
> I already sent a patch to replace nfsd_sync with it that should be
> queued up.  We can't use vfs_fsync for nfsd_sync_dir as we're already
> holding i_mutex when calling it.  The vfs_fsync_range optimizations
> seems like something that should be applied ontop, though.
> 

OK. I missed your patch as it flew by on the list, but I assume it is
this one:

http://git.linux-nfs.org/?p=bfields/linux.git;a=commitdiff;h=6a68f89ee1f2d177af4a5410fa7a45734c975fd6;hp=de3cab793c6a5c8505d66bee111edcc7098380ba


I'll separate out the vfs_sync_range() changes and cobble up a patch on
top of the above changeset...

Cheers
  Trond

^ permalink raw reply	[flat|nested] 14+ messages in thread

* nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 20:57   ` Trond Myklebust
@ 2010-01-29 21:18     ` Trond Myklebust
  2010-01-29 21:27       ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2010-01-29 21:18 UTC (permalink / raw)
  To: Christoph Hellwig, Dr. J. Bruce Fields; +Cc: linux-nfs

On Fri, 2010-01-29 at 15:57 -0500, Trond Myklebust wrote: 
> OK. I missed your patch as it flew by on the list, but I assume it is
> this one:
> 
> http://git.linux-nfs.org/?p=bfields/linux.git;a=commitdiff;h=6a68f89ee1f2d177af4a5410fa7a45734c975fd6;hp=de3cab793c6a5c8505d66bee111edcc7098380ba
> 
> 
> I'll separate out the vfs_sync_range() changes and cobble up a patch on
> top of the above changeset...
> 
> Cheers
>   Trond

Here it is....

Cheers
  Trond
------------------------------------------------------------------------------------------------------------------- 
nfsd: Use vfs_fsync_range() in nfsd_commit

From: Trond Myklebust <Trond.Myklebust@netapp.com>

The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfsd/vfs.c |   27 +++++++++++++++++----------
 1 files changed, 17 insertions(+), 10 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 79d216f..cc046ca 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1141,8 +1141,6 @@ out:
 #ifdef CONFIG_NFSD_V3
 /*
  * Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
  *
  * Unfortunately we cannot lock the file to make sure we return full WCC
  * data to the client, as locking happens lower down in the filesystem.
@@ -1152,23 +1150,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
                loff_t offset, unsigned long count)
 {
 	struct file	*file;
-	__be32		err;
+	loff_t		end = LLONG_MAX;
+	__be32		err = nfserr_inval;
 
-	if ((u64)count > ~(u64)offset)
-		return nfserr_inval;
+	if (offset < 0)
+		goto out;
+	if (count != 0) {
+		end = offset + (loff_t)count - 1;
+		if (end < offset)
+			goto out;
+	}
 
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
 	if (err)
-		return err;
+		goto out;
 	if (EX_ISSYNC(fhp->fh_export)) {
-		if (file->f_op && file->f_op->fsync) {
-			err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
-		} else {
+		int err2 = vfs_fsync_range(file, file->f_path.dentry,
+				offset, end, 0);
+
+		if (err2 != -EINVAL)
+			err = nfserrno(err2);
+		else
 			err = nfserr_notsupp;
-		}
 	}
 
 	nfsd_close(file);
+out:
 	return err;
 }
 #endif /* CONFIG_NFSD_V3 */


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 21:18     ` nfsd: Use vfs_fsync_range() in nfsd_commit Trond Myklebust
@ 2010-01-29 21:27       ` Christoph Hellwig
  2010-01-29 21:39         ` Trond Myklebust
  2010-01-29 21:44         ` [PATCH v4] " Trond Myklebust
  0 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2010-01-29 21:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Christoph Hellwig, Dr. J. Bruce Fields, linux-nfs

On Fri, Jan 29, 2010 at 04:18:26PM -0500, Trond Myklebust wrote:
> nfsd: Use vfs_fsync_range() in nfsd_commit
> 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> The NFS COMMIT operation allows the client to specify the exact byte range
> that it wishes to sync to disk in order to optimise server performance.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Looks good to me,


Reviewed-by: Christoph Hellwig <hch@lst.de>

>   * Commit all pending writes to stable storage.
> - * Strictly speaking, we could sync just the indicated file region here,
> - * but there's currently no way we can ask the VFS to do so.

The commit above could use some addition that the commit only happens
for the specified range.


And not actually touched by your patch, but that is the reason to
open/close the file if we don't actually do anything with it for an
async export?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 21:27       ` Christoph Hellwig
@ 2010-01-29 21:39         ` Trond Myklebust
  2010-01-29 23:58           ` Dr. J. Bruce Fields
  2010-01-29 21:44         ` [PATCH v4] " Trond Myklebust
  1 sibling, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2010-01-29 21:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dr. J. Bruce Fields, linux-nfs

On Fri, 2010-01-29 at 16:27 -0500, Christoph Hellwig wrote: 
> On Fri, Jan 29, 2010 at 04:18:26PM -0500, Trond Myklebust wrote:
> > nfsd: Use vfs_fsync_range() in nfsd_commit
> > 
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > 
> > The NFS COMMIT operation allows the client to specify the exact byte range
> > that it wishes to sync to disk in order to optimise server performance.
> > 
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> Looks good to me,
> 
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

> >   * Commit all pending writes to stable storage.
> > - * Strictly speaking, we could sync just the indicated file region here,
> > - * but there's currently no way we can ask the VFS to do so.
> 
> The commit above could use some addition that the commit only happens
> for the specified range.

I'll add in a couple of words. 

> And not actually touched by your patch, but that is the reason to
> open/close the file if we don't actually do anything with it for an
> async export?

I must admit that I was wondering about that too. I'm assuming that the
reason is to provide consistent behaviour w.r.t. access checks and
possibly also to ensure that NFSv4 delegations are revoked. Perhaps
Bruce could comment?

Cheers
  Trond

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4] nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 21:27       ` Christoph Hellwig
  2010-01-29 21:39         ` Trond Myklebust
@ 2010-01-29 21:44         ` Trond Myklebust
  2010-01-29 23:53           ` Dr. J. Bruce Fields
  1 sibling, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2010-01-29 21:44 UTC (permalink / raw)
  To: Christoph Hellwig, Dr. J. Bruce Fields; +Cc: linux-nfs

nfsd: Use vfs_fsync_range() in nfsd_commit

From: Trond Myklebust <Trond.Myklebust@netapp.com>

The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---

 fs/nfsd/vfs.c |   30 ++++++++++++++++++++----------
 1 files changed, 20 insertions(+), 10 deletions(-)


diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 79d216f..ed024d3 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1141,8 +1141,9 @@ out:
 #ifdef CONFIG_NFSD_V3
 /*
  * Commit all pending writes to stable storage.
- * Strictly speaking, we could sync just the indicated file region here,
- * but there's currently no way we can ask the VFS to do so.
+ *
+ * Note: we only guarantee that data that lies within the range specified
+ * by the 'offset' and 'count' parameters will be synced.
  *
  * Unfortunately we cannot lock the file to make sure we return full WCC
  * data to the client, as locking happens lower down in the filesystem.
@@ -1152,23 +1153,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
                loff_t offset, unsigned long count)
 {
 	struct file	*file;
-	__be32		err;
+	loff_t		end = LLONG_MAX;
+	__be32		err = nfserr_inval;
 
-	if ((u64)count > ~(u64)offset)
-		return nfserr_inval;
+	if (offset < 0)
+		goto out;
+	if (count != 0) {
+		end = offset + (loff_t)count - 1;
+		if (end < offset)
+			goto out;
+	}
 
 	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
 	if (err)
-		return err;
+		goto out;
 	if (EX_ISSYNC(fhp->fh_export)) {
-		if (file->f_op && file->f_op->fsync) {
-			err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
-		} else {
+		int err2 = vfs_fsync_range(file, file->f_path.dentry,
+				offset, end, 0);
+
+		if (err2 != -EINVAL)
+			err = nfserrno(err2);
+		else
 			err = nfserr_notsupp;
-		}
 	}
 
 	nfsd_close(file);
+out:
 	return err;
 }
 #endif /* CONFIG_NFSD_V3 */


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v4] nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 21:44         ` [PATCH v4] " Trond Myklebust
@ 2010-01-29 23:53           ` Dr. J. Bruce Fields
  2010-01-29 23:54             ` Dr. J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Dr. J. Bruce Fields @ 2010-01-29 23:53 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Christoph Hellwig, linux-nfs

On Fri, Jan 29, 2010 at 04:44:25PM -0500, Trond Myklebust wrote:
> nfsd: Use vfs_fsync_range() in nfsd_commit
> 
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> The NFS COMMIT operation allows the client to specify the exact byte range
> that it wishes to sync to disk in order to optimise server performance.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Applied, thanks!

--b.

> ---
> 
>  fs/nfsd/vfs.c |   30 ++++++++++++++++++++----------
>  1 files changed, 20 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 79d216f..ed024d3 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1141,8 +1141,9 @@ out:
>  #ifdef CONFIG_NFSD_V3
>  /*
>   * Commit all pending writes to stable storage.
> - * Strictly speaking, we could sync just the indicated file region here,
> - * but there's currently no way we can ask the VFS to do so.
> + *
> + * Note: we only guarantee that data that lies within the range specified
> + * by the 'offset' and 'count' parameters will be synced.
>   *
>   * Unfortunately we cannot lock the file to make sure we return full WCC
>   * data to the client, as locking happens lower down in the filesystem.
> @@ -1152,23 +1153,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
>                 loff_t offset, unsigned long count)
>  {
>  	struct file	*file;
> -	__be32		err;
> +	loff_t		end = LLONG_MAX;
> +	__be32		err = nfserr_inval;
>  
> -	if ((u64)count > ~(u64)offset)
> -		return nfserr_inval;
> +	if (offset < 0)
> +		goto out;
> +	if (count != 0) {
> +		end = offset + (loff_t)count - 1;
> +		if (end < offset)
> +			goto out;
> +	}
>  
>  	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
>  	if (err)
> -		return err;
> +		goto out;
>  	if (EX_ISSYNC(fhp->fh_export)) {
> -		if (file->f_op && file->f_op->fsync) {
> -			err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
> -		} else {
> +		int err2 = vfs_fsync_range(file, file->f_path.dentry,
> +				offset, end, 0);
> +
> +		if (err2 != -EINVAL)
> +			err = nfserrno(err2);
> +		else
>  			err = nfserr_notsupp;
> -		}
>  	}
>  
>  	nfsd_close(file);
> +out:
>  	return err;
>  }
>  #endif /* CONFIG_NFSD_V3 */
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4] nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 23:53           ` Dr. J. Bruce Fields
@ 2010-01-29 23:54             ` Dr. J. Bruce Fields
  0 siblings, 0 replies; 14+ messages in thread
From: Dr. J. Bruce Fields @ 2010-01-29 23:54 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Christoph Hellwig, linux-nfs

On Fri, Jan 29, 2010 at 06:53:54PM -0500, Dr. J. Bruce Fields wrote:
> On Fri, Jan 29, 2010 at 04:44:25PM -0500, Trond Myklebust wrote:
> > nfsd: Use vfs_fsync_range() in nfsd_commit
> > 
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > 
> > The NFS COMMIT operation allows the client to specify the exact byte range
> > that it wishes to sync to disk in order to optimise server performance.
> > 
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Applied, thanks!

Do you have any performance results, by the way?

--b.

> 
> --b.
> 
> > ---
> > 
> >  fs/nfsd/vfs.c |   30 ++++++++++++++++++++----------
> >  1 files changed, 20 insertions(+), 10 deletions(-)
> > 
> > 
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 79d216f..ed024d3 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1141,8 +1141,9 @@ out:
> >  #ifdef CONFIG_NFSD_V3
> >  /*
> >   * Commit all pending writes to stable storage.
> > - * Strictly speaking, we could sync just the indicated file region here,
> > - * but there's currently no way we can ask the VFS to do so.
> > + *
> > + * Note: we only guarantee that data that lies within the range specified
> > + * by the 'offset' and 'count' parameters will be synced.
> >   *
> >   * Unfortunately we cannot lock the file to make sure we return full WCC
> >   * data to the client, as locking happens lower down in the filesystem.
> > @@ -1152,23 +1153,32 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >                 loff_t offset, unsigned long count)
> >  {
> >  	struct file	*file;
> > -	__be32		err;
> > +	loff_t		end = LLONG_MAX;
> > +	__be32		err = nfserr_inval;
> >  
> > -	if ((u64)count > ~(u64)offset)
> > -		return nfserr_inval;
> > +	if (offset < 0)
> > +		goto out;
> > +	if (count != 0) {
> > +		end = offset + (loff_t)count - 1;
> > +		if (end < offset)
> > +			goto out;
> > +	}
> >  
> >  	err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_WRITE, &file);
> >  	if (err)
> > -		return err;
> > +		goto out;
> >  	if (EX_ISSYNC(fhp->fh_export)) {
> > -		if (file->f_op && file->f_op->fsync) {
> > -			err = nfserrno(vfs_fsync(file, file->f_path.dentry, 0));
> > -		} else {
> > +		int err2 = vfs_fsync_range(file, file->f_path.dentry,
> > +				offset, end, 0);
> > +
> > +		if (err2 != -EINVAL)
> > +			err = nfserrno(err2);
> > +		else
> >  			err = nfserr_notsupp;
> > -		}
> >  	}
> >  
> >  	nfsd_close(file);
> > +out:
> >  	return err;
> >  }
> >  #endif /* CONFIG_NFSD_V3 */
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 21:39         ` Trond Myklebust
@ 2010-01-29 23:58           ` Dr. J. Bruce Fields
  2010-03-17 19:08             ` Jeff Layton
  0 siblings, 1 reply; 14+ messages in thread
From: Dr. J. Bruce Fields @ 2010-01-29 23:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Christoph Hellwig, linux-nfs

On Fri, Jan 29, 2010 at 04:39:11PM -0500, Trond Myklebust wrote:
> On Fri, 2010-01-29 at 16:27 -0500, Christoph Hellwig wrote: 
> > And not actually touched by your patch, but that is the reason to
> > open/close the file if we don't actually do anything with it for an
> > async export?
> 
> I must admit that I was wondering about that too. I'm assuming that the
> reason is to provide consistent behaviour w.r.t. access checks and
> possibly also to ensure that NFSv4 delegations are revoked. Perhaps
> Bruce could comment?

Do delegations need to be revoked on commit?  (And do we care about
access checks?)

We could do both without the need for an actual open, but I don't know
that it matters much either way.

--b.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nfsd: Use vfs_fsync_range() in nfsd_commit
  2010-01-29 23:58           ` Dr. J. Bruce Fields
@ 2010-03-17 19:08             ` Jeff Layton
       [not found]               ` <20100317150813.43815b5a-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Layton @ 2010-03-17 19:08 UTC (permalink / raw)
  To: Dr. J. Bruce Fields; +Cc: Trond Myklebust, Christoph Hellwig, linux-nfs

On Fri, 29 Jan 2010 18:58:52 -0500
"Dr. J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Fri, Jan 29, 2010 at 04:39:11PM -0500, Trond Myklebust wrote:
> > On Fri, 2010-01-29 at 16:27 -0500, Christoph Hellwig wrote: 
> > > And not actually touched by your patch, but that is the reason to
> > > open/close the file if we don't actually do anything with it for an
> > > async export?
> > 
> > I must admit that I was wondering about that too. I'm assuming that the
> > reason is to provide consistent behaviour w.r.t. access checks and
> > possibly also to ensure that NFSv4 delegations are revoked. Perhaps
> > Bruce could comment?
> 
> Do delegations need to be revoked on commit?  (And do we care about
> access checks?)
> 
> We could do both without the need for an actual open, but I don't know
> that it matters much either way.
> 

Is there any reason that we need to revoke the delegation on a commit?

The only real effect is that writes would be flushed to stable storage.
The VM on the server could just decide the flush the data to stable
storage whenever it feels like it without revoking the lease. I don't
see why we'd need to revoke the lease just because a client asked for
the flush.

Bruce has already chimed in on it, but a deadlock of sorts has popped
up relating to this:

    https://bugzilla.redhat.com/show_bug.cgi?id=551028#c10

...and it seems like not recalling the delegation on a COMMIT might
help resolve it.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: nfsd: Use vfs_fsync_range() in nfsd_commit
       [not found]               ` <20100317150813.43815b5a-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2010-03-17 19:35                 ` Trond Myklebust
  0 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2010-03-17 19:35 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Dr. J. Bruce Fields, Christoph Hellwig, linux-nfs

On Wed, 2010-03-17 at 15:08 -0400, Jeff Layton wrote: 
> Is there any reason that we need to revoke the delegation on a commit?
> 
> The only real effect is that writes would be flushed to stable storage.
> The VM on the server could just decide the flush the data to stable
> storage whenever it feels like it without revoking the lease. I don't
> see why we'd need to revoke the lease just because a client asked for
> the flush.
> 
> Bruce has already chimed in on it, but a deadlock of sorts has popped
> up relating to this:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=551028#c10
> 
> ...and it seems like not recalling the delegation on a COMMIT might
> help resolve it.

I agree. A commit doesn't change the inode in any way, so it shouldn't
require you to revoke the delegation.

Furthermore, I think that the write access check there is incorrect too.
The posix definition of fsync() doesn't specify that the file descriptor
has be writeable, and since COMMIT semantics are supposed to be modelled
on fsync...

Cheers
   Trond

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-03-17 19:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-29 20:19 nfsd: Replace nfsd_sync() with vfs_fsync() and vfs_fsync_range() Trond Myklebust
2010-01-29 20:35 ` Trond Myklebust
2010-01-29 20:44 ` [PATCH v2] " Trond Myklebust
2010-01-29 20:50 ` Christoph Hellwig
2010-01-29 20:57   ` Trond Myklebust
2010-01-29 21:18     ` nfsd: Use vfs_fsync_range() in nfsd_commit Trond Myklebust
2010-01-29 21:27       ` Christoph Hellwig
2010-01-29 21:39         ` Trond Myklebust
2010-01-29 23:58           ` Dr. J. Bruce Fields
2010-03-17 19:08             ` Jeff Layton
     [not found]               ` <20100317150813.43815b5a-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2010-03-17 19:35                 ` Trond Myklebust
2010-01-29 21:44         ` [PATCH v4] " Trond Myklebust
2010-01-29 23:53           ` Dr. J. Bruce Fields
2010-01-29 23:54             ` Dr. J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox