All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()
Date: Fri, 7 Oct 2011 22:29:28 +0800	[thread overview]
Message-ID: <20111007142928.GA14427@localhost> (raw)
In-Reply-To: <20111007142201.GB30754@quack.suse.cz>

On Fri, Oct 07, 2011 at 10:22:01PM +0800, Jan Kara wrote:
> On Fri 07-10-11 21:43:47, Wu Fengguang wrote:
> > >   Great, thanks for review! I'll resend the two patches to Christoph so
> > > that he can try them.
> > 
> > Jan, I'd like to test out your updated patches with my stupid dd
> > workloads. Would you (re)send them publicly?
>   Ah, I resent them publicly on Wednesday
> (http://comments.gmane.org/gmane.linux.kernel/1199713) but git send-email
> apparently does not include emails from Acked-by into list of recipients so
> you didn't get them. Sorry for that. The patches are attached for your
> convenience.

OK thanks. I only checked the linux-fsdevel list before asking..
The results should be ready tomorrow.

Thanks,
Fengguang

> From a042c2a839ad3cf89d8ee158b2bb4b94b573f578 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 8 Sep 2011 01:05:25 +0200
> Subject: [PATCH 1/2] writeback: Improve busyloop prevention
> 
> Writeback of an inode can be stalled by things like internal fs locks being
> held. So in case we didn't write anything during a pass through b_io list,
> just wait for a moment and try again.
> 
> CC: Christoph Hellwig <hch@infradead.org>
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/fs-writeback.c |   26 ++++++++++++++------------
>  1 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 04cf3b9..bdeb26a 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -699,8 +699,8 @@ static long wb_writeback(struct bdi_writeback *wb,
>  	unsigned long wb_start = jiffies;
>  	long nr_pages = work->nr_pages;
>  	unsigned long oldest_jif;
> -	struct inode *inode;
>  	long progress;
> +	long pause = 1;
>  
>  	oldest_jif = jiffies;
>  	work->older_than_this = &oldest_jif;
> @@ -755,25 +755,27 @@ static long wb_writeback(struct bdi_writeback *wb,
>  		 * mean the overall work is done. So we keep looping as long
>  		 * as made some progress on cleaning pages or inodes.
>  		 */
> -		if (progress)
> +		if (progress) {
> +			pause = 1;
>  			continue;
> +		}
>  		/*
>  		 * No more inodes for IO, bail
>  		 */
>  		if (list_empty(&wb->b_more_io))
>  			break;
>  		/*
> -		 * Nothing written. Wait for some inode to
> -		 * become available for writeback. Otherwise
> -		 * we'll just busyloop.
> +		 * Nothing written (some internal fs locks were unavailable or
> +		 * inode was under writeback from balance_dirty_pages() or
> +		 * similar conditions).  Wait for a while to avoid busylooping.
>  		 */
> -		if (!list_empty(&wb->b_more_io))  {
> -			trace_writeback_wait(wb->bdi, work);
> -			inode = wb_inode(wb->b_more_io.prev);
> -			spin_lock(&inode->i_lock);
> -			inode_wait_for_writeback(inode, wb);
> -			spin_unlock(&inode->i_lock);
> -		}
> +		trace_writeback_wait(wb->bdi, work);
> +		spin_unlock(&wb->list_lock);
> +		__set_current_state(TASK_INTERRUPTIBLE);
> +		schedule_timeout(pause);
> +		if (pause < HZ / 10)
> +			pause <<= 1;
> +		spin_lock(&wb->list_lock);
>  	}
>  	spin_unlock(&wb->list_lock);
>  
> -- 
> 1.7.1
> 

> From 0a4a2cb4d5432f5446215b1e6e44f7d83032dba3 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 8 Sep 2011 01:46:42 +0200
> Subject: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()
> 
> Calling redirty_tail() can put off inode writeback for upto 30 seconds (or
> whatever dirty_expire_centisecs is). This is unnecessarily big delay in some
> cases and in other cases it is a really bad thing. In particular XFS tries to
> be nice to writeback and when ->write_inode is called for an inode with locked
> ilock, it just redirties the inode and returns EAGAIN. That currently causes
> writeback_single_inode() to redirty_tail() the inode. As contended ilock is
> common thing with XFS while extending files the result can be that inode
> writeout is put off for a really long time.
> 
> Now that we have more robust busyloop prevention in wb_writeback() we can
> call requeue_io() in cases where quick retry is required without fear of
> raising CPU consumption too much.
> 
> CC: Christoph Hellwig <hch@infradead.org>
> Acked-by: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/fs-writeback.c |   61 ++++++++++++++++++++++++----------------------------
>  1 files changed, 28 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index bdeb26a..c786023 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -356,6 +356,7 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
>  	long nr_to_write = wbc->nr_to_write;
>  	unsigned dirty;
>  	int ret;
> +	bool inode_written = false;
>  
>  	assert_spin_locked(&wb->list_lock);
>  	assert_spin_locked(&inode->i_lock);
> @@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
>  	/* Don't write the inode if only I_DIRTY_PAGES was set */
>  	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
>  		int err = write_inode(inode, wbc);
> +		if (!err)
> +			inode_written = true;
>  		if (ret == 0)
>  			ret = err;
>  	}
> @@ -430,42 +433,39 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
>  	if (!(inode->i_state & I_FREEING)) {
>  		/*
>  		 * Sync livelock prevention. Each inode is tagged and synced in
> -		 * one shot. If still dirty, it will be redirty_tail()'ed below.
> -		 * Update the dirty time to prevent enqueue and sync it again.
> +		 * one shot. If still dirty, update dirty time and put it back
> +		 * to dirty list to prevent enqueue and syncing it again.
>  		 */
>  		if ((inode->i_state & I_DIRTY) &&
> -		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages))
> +		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)) {
>  			inode->dirtied_when = jiffies;
> -
> -		if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> +			redirty_tail(inode, wb);
> +		} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
>  			/*
> -			 * We didn't write back all the pages.  nfs_writepages()
> -			 * sometimes bales out without doing anything.
> +			 * We didn't write back all the pages. nfs_writepages()
> +			 * sometimes bales out without doing anything or we
> +			 * just run our of our writeback slice.
>  			 */
>  			inode->i_state |= I_DIRTY_PAGES;
> -			if (wbc->nr_to_write <= 0) {
> -				/*
> -				 * slice used up: queue for next turn
> -				 */
> -				requeue_io(inode, wb);
> -			} else {
> -				/*
> -				 * Writeback blocked by something other than
> -				 * congestion. Delay the inode for some time to
> -				 * avoid spinning on the CPU (100% iowait)
> -				 * retrying writeback of the dirty page/inode
> -				 * that cannot be performed immediately.
> -				 */
> -				redirty_tail(inode, wb);
> -			}
> +			requeue_io(inode, wb);
>  		} else if (inode->i_state & I_DIRTY) {
>  			/*
>  			 * Filesystems can dirty the inode during writeback
>  			 * operations, such as delayed allocation during
>  			 * submission or metadata updates after data IO
> -			 * completion.
> +			 * completion. Also inode could have been dirtied by
> +			 * some process aggressively touching metadata.
> +			 * Finally, filesystem could just fail to write the
> +			 * inode for some reason. We have to distinguish the
> +			 * last case from the previous ones - in the last case
> +			 * we want to give the inode quick retry, in the
> +			 * other cases we want to put it back to the dirty list
> +			 * to avoid livelocking of writeback.
>  			 */
> -			redirty_tail(inode, wb);
> +			if (inode_written)
> +				redirty_tail(inode, wb);
> +			else
> +				requeue_io(inode, wb);
>  		} else {
>  			/*
>  			 * The inode is clean.  At this point we either have
> @@ -583,10 +583,10 @@ static long writeback_sb_inodes(struct super_block *sb,
>  			wrote++;
>  		if (wbc.pages_skipped) {
>  			/*
> -			 * writeback is not making progress due to locked
> -			 * buffers.  Skip this inode for now.
> +			 * Writeback is not making progress due to unavailable
> +			 * fs locks or similar condition. Retry in next round.
>  			 */
> -			redirty_tail(inode, wb);
> +			requeue_io(inode, wb);
>  		}
>  		spin_unlock(&inode->i_lock);
>  		spin_unlock(&wb->list_lock);
> @@ -618,12 +618,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
>  		struct super_block *sb = inode->i_sb;
>  
>  		if (!grab_super_passive(sb)) {
> -			/*
> -			 * grab_super_passive() may fail consistently due to
> -			 * s_umount being grabbed by someone else. Don't use
> -			 * requeue_io() to avoid busy retrying the inode/sb.
> -			 */
> -			redirty_tail(inode, wb);
> +			requeue_io(inode, wb);
>  			continue;
>  		}
>  		wrote += writeback_sb_inodes(sb, wb, work);
> -- 
> 1.7.1
> 


  reply	other threads:[~2011-10-07 14:29 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-08  0:44 [PATCH 1/2] writeback: Improve busyloop prevention Jan Kara
2011-09-08  0:44 ` [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Jan Kara
2011-09-08  1:22   ` Wu Fengguang
2011-09-08 15:03     ` Jan Kara
2011-09-18 14:07       ` Wu Fengguang
2011-10-05 17:39         ` Jan Kara
2011-10-07 13:43           ` Wu Fengguang
2011-10-07 14:22             ` Jan Kara
2011-10-07 14:29               ` Wu Fengguang [this message]
2011-10-07 14:45                 ` Jan Kara
2011-10-07 15:29                   ` Wu Fengguang
2011-10-08  4:00                   ` Wu Fengguang
2011-10-08 11:52                     ` Wu Fengguang
2011-10-08 13:49                       ` Wu Fengguang
2011-10-09  0:27                         ` Wu Fengguang
2011-10-09  8:44                           ` Wu Fengguang
2011-10-10 11:21                     ` Jan Kara
2011-10-10 11:31                       ` Wu Fengguang
2011-10-10 23:30                         ` Jan Kara
2011-10-11  2:36                           ` Wu Fengguang
2011-10-11 21:53                             ` Jan Kara
2011-10-12  2:44                               ` Wu Fengguang
2011-10-12 19:34                                 ` Jan Kara
2011-09-08  0:57 ` [PATCH 1/2] writeback: Improve busyloop prevention Wu Fengguang
2011-09-08 13:49   ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2011-10-05 17:58 [PATCH 0/2] Avoid putting of writeback of inodes for too long (v3) Jan Kara
2011-10-05 17:58 ` [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Jan Kara
2011-10-12 20:57 [PATCH 0/2 v4] writeback: Improve busyloop prevention and inode requeueing Jan Kara
2011-10-12 20:57 ` [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Jan Kara
2011-10-13 14:30   ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111007142928.GA14427@localhost \
    --to=fengguang.wu@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.