linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Trond Myklebust
	<Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	Steve Rago <sar-a+KepyhlMvJWk0Htik3J/w@public.gmane.org>,
	"linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"jens.axboe" <jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Peter Staubach <staubach-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Arjan van de Ven <arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>,
	"linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes
Date: Thu, 7 Jan 2010 10:18:49 +0800	[thread overview]
Message-ID: <20100107021849.GC9475@localhost> (raw)
In-Reply-To: <20100106205110.22547.17971.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>

On Thu, Jan 07, 2010 at 04:51:10AM +0800, Trond Myklebust wrote:
> If the call to do_writepages() succeeded in starting writeback, we do not
> know whether or not we will need to COMMIT any unstable writes until after
> the write RPC calls are finished. Currently, we assume that at least one
> write RPC call will have finished, and set I_DIRTY_DATASYNC by the time
> do_writepages is done, so that write_inode() is triggered.
> 
> In order to ensure reliable operation (i.e. ensure that a single call to
> writeback_single_inode() with WB_SYNC_ALL set suffices to ensure that pages
> are on disk) we need to first wait for filemap_fdatawait() to complete,
> then test for unstable pages.
> 
> Since NFS is currently the only filesystem that has unstable pages, we can
> add a new inode state I_UNSTABLE_PAGES that NFS alone will set. When set,
> this will trigger a callback to a new address_space_operation to call the
> COMMIT.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
> Acked-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> ---
> 
>  fs/fs-writeback.c  |   31 ++++++++++++++++++++++++++++++-
>  fs/nfs/file.c      |    1 +
>  fs/nfs/inode.c     |   16 ----------------
>  fs/nfs/internal.h  |    3 ++-
>  fs/nfs/super.c     |    2 --
>  fs/nfs/write.c     |   33 ++++++++++++++++++++++++++++++++-
>  include/linux/fs.h |    9 +++++++++
>  7 files changed, 74 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 1a7c42c..3bc0a96 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -389,6 +389,17 @@ static int write_inode(struct inode *inode, int sync)
>  }
>  
>  /*
> + * Commit the NFS unstable pages.
> + */
> +static int commit_unstable_pages(struct address_space *mapping,
> +		struct writeback_control *wbc)
> +{
> +	if (mapping->a_ops && mapping->a_ops->commit_unstable_pages)
> +		return mapping->a_ops->commit_unstable_pages(mapping, wbc);
> +	return 0;
> +}
> +
> +/*
>   * Wait for writeback on an inode to complete.
>   */
>  static void inode_wait_for_writeback(struct inode *inode)
> @@ -475,6 +486,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
>  	}
>  
>  	spin_lock(&inode_lock);
> +	/*
> +	 * Special state for cleaning NFS unstable pages
> +	 */
> +	if (inode->i_state & I_UNSTABLE_PAGES) {
> +		int err;
> +		inode->i_state &= ~I_UNSTABLE_PAGES;
> +		spin_unlock(&inode_lock);
> +		err = commit_unstable_pages(mapping, wbc);
> +		if (ret == 0)
> +			ret = err;
> +		spin_lock(&inode_lock);
> +	}
>  	inode->i_state &= ~I_SYNC;
>  	if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
>  		if ((inode->i_state & I_DIRTY_PAGES) && wbc->for_kupdate) {
> @@ -533,6 +556,12 @@ select_queue:
>  				inode->i_state |= I_DIRTY_PAGES;
>  				redirty_tail(inode);
>  			}
> +		} else if (inode->i_state & I_UNSTABLE_PAGES) {
> +			/*
> +			 * The inode has got yet more unstable pages to
> +			 * commit. Requeue on b_more_io
> +			 */
> +			requeue_io(inode);

This risks "busy retrying" inodes with unstable pages, when

- nfs_commit_unstable_pages() don't think it's time to commit
- NFS server somehow response slowly

The workaround is to use redirty_tail() for now. But that risks delay
the COMMIT for up to 30s, which obviously might stuck applications in
balance_dirty_pages() for too long.

I have a patch to shorten the retry time to 1s (or other constant)
by introducing b_more_io_wait. It currently sits in my writeback queue
series whose main blocking issue is the constantly broken NFS pipeline..

>  		} else if (atomic_read(&inode->i_count)) {
>  			/*
>  			 * The inode is clean, inuse
> @@ -1051,7 +1080,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  
>  	spin_lock(&inode_lock);
>  	if ((inode->i_state & flags) != flags) {
> -		const int was_dirty = inode->i_state & I_DIRTY;
> +		const int was_dirty = inode->i_state & (I_DIRTY|I_UNSTABLE_PAGES);
>  
>  		inode->i_state |= flags;
>  
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 6b89132..67e50ac 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -526,6 +526,7 @@ const struct address_space_operations nfs_file_aops = {
>  	.migratepage = nfs_migrate_page,
>  	.launder_page = nfs_launder_page,
>  	.error_remove_page = generic_error_remove_page,
> +	.commit_unstable_pages = nfs_commit_unstable_pages,
>  };
>  
>  /*
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index faa0918..8341709 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -97,22 +97,6 @@ u64 nfs_compat_user_ino64(u64 fileid)
>  	return ino;
>  }
>  
> -int nfs_write_inode(struct inode *inode, int sync)
> -{
> -	int ret;
> -
> -	if (sync) {
> -		ret = filemap_fdatawait(inode->i_mapping);
> -		if (ret == 0)
> -			ret = nfs_commit_inode(inode, FLUSH_SYNC);
> -	} else
> -		ret = nfs_commit_inode(inode, 0);
> -	if (ret >= 0)
> -		return 0;
> -	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> -	return ret;
> -}
> -
>  void nfs_clear_inode(struct inode *inode)
>  {
>  	/*
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 29e464d..7bb326f 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -211,7 +211,6 @@ extern int nfs_access_cache_shrinker(int nr_to_scan, gfp_t gfp_mask);
>  extern struct workqueue_struct *nfsiod_workqueue;
>  extern struct inode *nfs_alloc_inode(struct super_block *sb);
>  extern void nfs_destroy_inode(struct inode *);
> -extern int nfs_write_inode(struct inode *,int);
>  extern void nfs_clear_inode(struct inode *);
>  #ifdef CONFIG_NFS_V4
>  extern void nfs4_clear_inode(struct inode *);
> @@ -253,6 +252,8 @@ extern int nfs4_path_walk(struct nfs_server *server,
>  extern void nfs_read_prepare(struct rpc_task *task, void *calldata);
>  
>  /* write.c */
> +extern int nfs_commit_unstable_pages(struct address_space *mapping,
> +		struct writeback_control *wbc);
>  extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
>  #ifdef CONFIG_MIGRATION
>  extern int nfs_migrate_page(struct address_space *,
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index ce907ef..805c1a0 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -265,7 +265,6 @@ struct file_system_type nfs_xdev_fs_type = {
>  static const struct super_operations nfs_sops = {
>  	.alloc_inode	= nfs_alloc_inode,
>  	.destroy_inode	= nfs_destroy_inode,
> -	.write_inode	= nfs_write_inode,
>  	.statfs		= nfs_statfs,
>  	.clear_inode	= nfs_clear_inode,
>  	.umount_begin	= nfs_umount_begin,
> @@ -334,7 +333,6 @@ struct file_system_type nfs4_referral_fs_type = {
>  static const struct super_operations nfs4_sops = {
>  	.alloc_inode	= nfs_alloc_inode,
>  	.destroy_inode	= nfs_destroy_inode,
> -	.write_inode	= nfs_write_inode,
>  	.statfs		= nfs_statfs,
>  	.clear_inode	= nfs4_clear_inode,
>  	.umount_begin	= nfs_umount_begin,
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index d171696..910be28 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -441,7 +441,7 @@ nfs_mark_request_commit(struct nfs_page *req)
>  	spin_unlock(&inode->i_lock);
>  	inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
>  	inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
> -	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> +	mark_inode_unstable_pages(inode);

Then we shall mark I_DIRTY_DATASYNC on other places that extend i_size.

>  }
>  
>  static int
> @@ -1406,11 +1406,42 @@ int nfs_commit_inode(struct inode *inode, int how)
>  	}
>  	return res;
>  }
> +
> +int nfs_commit_unstable_pages(struct address_space *mapping,
> +		struct writeback_control *wbc)
> +{
> +	struct inode *inode = mapping->host;
> +	int flags = FLUSH_SYNC;
> +	int ret;
> +
> +	/* Don't commit yet if this is a non-blocking flush and there are
> +	 * outstanding writes for this mapping.
> +	 */
> +	if (wbc->sync_mode != WB_SYNC_ALL &&
> +	    radix_tree_tagged(&NFS_I(inode)->nfs_page_tree,
> +		    NFS_PAGE_TAG_LOCKED)) {
> +		mark_inode_unstable_pages(inode);
> +		return 0;
> +	}

A dumb question: does NFS_PAGE_TAG_LOCKED means either flying COMMITs
or WRITEs? As an NFS newbie, I'm only confident on the COMMIT part :)

> +	if (wbc->nonblocking)
> +		flags = 0;
> +	ret = nfs_commit_inode(inode, flags);
> +	if (ret > 0)
> +		ret = 0;
> +	return ret;
> +}
> +
>  #else
>  static inline int nfs_commit_list(struct inode *inode, struct list_head *head, int how)
>  {
>  	return 0;
>  }
> +
> +int nfs_commit_unstable_pages(struct address_space *mapping,
> +		struct writeback_control *wbc)
> +{
> +	return 0;
> +}
>  #endif
>  
>  long nfs_sync_mapping_wait(struct address_space *mapping, struct writeback_control *wbc, int how)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9147ca8..ea0b7a3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -602,6 +602,8 @@ struct address_space_operations {
>  	int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
>  					unsigned long);
>  	int (*error_remove_page)(struct address_space *, struct page *);
> +	int (*commit_unstable_pages)(struct address_space *,
> +			struct writeback_control *);
>  };
>  
>  /*
> @@ -1635,6 +1637,8 @@ struct super_operations {
>  #define I_CLEAR			64
>  #define __I_SYNC		7
>  #define I_SYNC			(1 << __I_SYNC)
> +#define __I_UNSTABLE_PAGES	9
> +#define I_UNSTABLE_PAGES	(1 << __I_UNSTABLE_PAGES)
>  
>  #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
>  
> @@ -1649,6 +1653,11 @@ static inline void mark_inode_dirty_sync(struct inode *inode)
>  	__mark_inode_dirty(inode, I_DIRTY_SYNC);
>  }
>  
> +static inline void mark_inode_unstable_pages(struct inode *inode)
> +{
> +	__mark_inode_dirty(inode, I_UNSTABLE_PAGES);
> +}
> +
>  /**
>   * inc_nlink - directly increment an inode's link count
>   * @inode: inode
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2010-01-07  2:18 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1261015420.1947.54.camel@serenity>
     [not found] ` <1261037877.27920.36.camel@laptop>
     [not found]   ` <20091219122033.GA11360@localhost>
     [not found]     ` <1261232747.1947.194.camel@serenity>
2009-12-22  1:59       ` [PATCH] improve the performance of large sequential write NFS workloads Wu Fengguang
2009-12-22 12:35         ` Jan Kara
     [not found]           ` <20091222123538.GB604-jyMamyUUXNJG4ohzP4jBZS1Fcj925eT/@public.gmane.org>
2009-12-23  8:43             ` Christoph Hellwig
     [not found]               ` <20091223084302.GA14912-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2009-12-23 13:32                 ` Jan Kara
     [not found]                   ` <20091223133244.GB3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-24  5:25                     ` Wu Fengguang
2009-12-24  1:26           ` Wu Fengguang
2009-12-22 16:41         ` Steve Rago
2009-12-24  1:21           ` Wu Fengguang
2009-12-24 14:49             ` Steve Rago
2009-12-25  7:37               ` Wu Fengguang
2009-12-23 14:21         ` Trond Myklebust
2009-12-23 18:05           ` Jan Kara
     [not found]             ` <20091223180551.GD3159-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2009-12-23 19:12               ` Trond Myklebust
2009-12-24  2:52                 ` Wu Fengguang
2009-12-24 12:04                   ` Trond Myklebust
2009-12-25  5:56                     ` Wu Fengguang
2009-12-30 16:22                       ` Trond Myklebust
2009-12-31  5:04                         ` Wu Fengguang
2009-12-31 19:13                           ` Trond Myklebust
2010-01-06  3:03                             ` Wu Fengguang
2010-01-06 16:56                               ` Trond Myklebust
2010-01-06 18:26                                 ` Trond Myklebust
2010-01-06 18:37                                   ` Peter Zijlstra
2010-01-06 18:52                                     ` Trond Myklebust
2010-01-06 19:07                                       ` Peter Zijlstra
2010-01-06 19:21                                         ` Trond Myklebust
2010-01-06 19:53                                           ` Trond Myklebust
2010-01-06 20:09                                             ` Jan Kara
     [not found]                                               ` <20100106200928.GB22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 20:51                                                 ` [PATCH 0/6] " Trond Myklebust
     [not found]                                                   ` <20100106205110.22547.85345.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 20:51                                                     ` [PATCH 3/6] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.93554.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  1:48                                                         ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 2/6] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07  2:29                                                       ` Wu Fengguang
2010-01-07  4:49                                                         ` Trond Myklebust
2010-01-07  5:03                                                           ` Wu Fengguang
2010-01-07  5:30                                                             ` Trond Myklebust
2010-01-07 14:37                                                               ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 5/6] VM: Use per-bdi unstable accounting to improve use of wbc->force_commit Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.32584.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:34                                                         ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 6/6] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.31434.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  2:32                                                         ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 4/6] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07  1:56                                                       ` Wu Fengguang
2010-01-06 20:51                                                     ` [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
     [not found]                                                       ` <20100106205110.22547.17971.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-06 21:38                                                         ` Jan Kara
     [not found]                                                           ` <20100106213843.GD22781-+0h/O2h83AeN3ZZ/Hiejyg@public.gmane.org>
2010-01-06 21:48                                                             ` Trond Myklebust
2010-01-07  2:18                                                         ` Wu Fengguang [this message]
     [not found]                                                           ` <1262839082.2185.15.camel@localhost>
2010-01-07  4:48                                                             ` Wu Fengguang
2010-01-07  4:53                                                               ` [PATCH 0/5] Re: [PATCH] improve the performance of large sequential write NFS workloads Trond Myklebust
     [not found]                                                                 ` <20100107045330.5986.55090.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-01-07  4:53                                                                   ` [PATCH 3/5] VM: Don't call bdi_stat(BDI_UNSTABLE) on non-nfs backing-devices Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 5/5] NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 2/5] VM: Split out the accounting of unstable writes from BDI_RECLAIMABLE Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 4/5] VM/NFS: The VM must tell the filesystem when to free reclaimable pages Trond Myklebust
2010-01-07  4:53                                                                   ` [PATCH 1/5] VFS: Ensure that writeback_single_inode() commits unstable writes Trond Myklebust
2010-01-07 14:56                                                             ` [PATCH 1/6] " Wu Fengguang
2010-01-07 15:10                                                               ` Trond Myklebust
2010-01-08  1:17                                                                 ` Wu Fengguang
2010-01-08  1:37                                                                   ` Trond Myklebust
2010-01-08  1:53                                                                     ` Wu Fengguang
2010-01-08  9:25                                                                 ` Christoph Hellwig
2010-01-08 13:46                                                                   ` Trond Myklebust
2010-01-08 13:54                                                                     ` Christoph Hellwig
2010-01-08 14:15                                                                       ` Trond Myklebust
2010-01-06 21:44                                                     ` [PATCH 0/6] Re: [PATCH] improve the performance of large sequential write NFS workloads Jan Kara
2010-01-06 22:03                                                       ` Trond Myklebust
2010-01-07  8:16                                                   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100107021849.GC9475@localhost \
    --to=fengguang.wu-ral2jqcrhueavxtiumwx3w@public.gmane.org \
    --cc=Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
    --cc=arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=jack-AlSwsSmVLrQ@public.gmane.org \
    --cc=jens.axboe-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mingo-X9Un+BFzKDI@public.gmane.org \
    --cc=peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
    --cc=sar-a+KepyhlMvJWk0Htik3J/w@public.gmane.org \
    --cc=staubach-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).