Re: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()

From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()
Date: Fri, 7 Oct 2011 16:22:01 +0200
Message-ID: <20111007142201.GB30754@quack.suse.cz>
In-Reply-To: <20111007134347.GA6891@localhost>

[-- Attachment #1: Type: text/plain, Size: 599 bytes --]

On Fri 07-10-11 21:43:47, Wu Fengguang wrote:
> >   Great, thanks for review! I'll resend the two patches to Christoph so
> > that he can try them.
> 
> Jan, I'd like to test out your updated patches with my stupid dd
> workloads. Would you (re)send them publicly?
  Ah, I resent them publicly on Wednesday
(http://comments.gmane.org/gmane.linux.kernel/1199713), but git send-email
apparently does not add addresses from Acked-by tags to the list of
recipients, so you didn't get them. Sorry about that. The patches are
attached for your convenience.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-writeback-Improve-busyloop-prevention.patch --]
[-- Type: text/x-patch, Size: 2241 bytes --]

From a042c2a839ad3cf89d8ee158b2bb4b94b573f578 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 8 Sep 2011 01:05:25 +0200
Subject: [PATCH 1/2] writeback: Improve busyloop prevention

Writeback of an inode can be stalled by things like internal fs locks being
held. So in case we didn't write anything during a pass through the b_io
list, just wait for a moment and try again.

CC: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |   26 ++++++++++++++------------
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 04cf3b9..bdeb26a 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -699,8 +699,8 @@ static long wb_writeback(struct bdi_writeback *wb,
 	unsigned long wb_start = jiffies;
 	long nr_pages = work->nr_pages;
 	unsigned long oldest_jif;
-	struct inode *inode;
 	long progress;
+	long pause = 1;
 
 	oldest_jif = jiffies;
 	work->older_than_this = &oldest_jif;
@@ -755,25 +755,27 @@ static long wb_writeback(struct bdi_writeback *wb,
 		 * mean the overall work is done. So we keep looping as long
 		 * as made some progress on cleaning pages or inodes.
 		 */
-		if (progress)
+		if (progress) {
+			pause = 1;
 			continue;
+		}
 		/*
 		 * No more inodes for IO, bail
 		 */
 		if (list_empty(&wb->b_more_io))
 			break;
 		/*
-		 * Nothing written. Wait for some inode to
-		 * become available for writeback. Otherwise
-		 * we'll just busyloop.
+		 * Nothing written (some internal fs locks were unavailable or
+		 * inode was under writeback from balance_dirty_pages() or
+		 * similar conditions).  Wait for a while to avoid busylooping.
 		 */
-		if (!list_empty(&wb->b_more_io))  {
-			trace_writeback_wait(wb->bdi, work);
-			inode = wb_inode(wb->b_more_io.prev);
-			spin_lock(&inode->i_lock);
-			inode_wait_for_writeback(inode, wb);
-			spin_unlock(&inode->i_lock);
-		}
+		trace_writeback_wait(wb->bdi, work);
+		spin_unlock(&wb->list_lock);
+		__set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(pause);
+		if (pause < HZ / 10)
+			pause <<= 1;
+		spin_lock(&wb->list_lock);
 	}
 	spin_unlock(&wb->list_lock);
 
-- 
1.7.1
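
[Editor's sketch, not part of the patch] The backoff that the first patch adds to wb_writeback() can be modelled in user space. The following is only a rough illustration under stated assumptions: SKETCH_HZ, next_pause() and total_sleep() are hypothetical names invented here, and SKETCH_HZ = 1000 is an assumed tick rate (the kernel's HZ is configuration dependent). Only the reset-to-1 and double-while-below-HZ/10 rules are taken from the diff above.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's HZ (timer ticks per second). */
#define SKETCH_HZ 1000

/*
 * Return the pause (in ticks) to use for the next pass, mirroring the
 * rule in the patch: reset to 1 after progress, otherwise double while
 * the current value is still below HZ / 10.
 */
long next_pause(long pause, int made_progress)
{
	assert(pause >= 1);
	if (made_progress)
		return 1;
	if (pause < SKETCH_HZ / 10)
		pause <<= 1;
	return pause;
}

/* Total ticks slept over `passes` consecutive passes without progress. */
long total_sleep(int passes)
{
	long pause = 1, total = 0;

	for (int i = 0; i < passes; i++) {
		total += pause;	/* corresponds to schedule_timeout(pause) */
		pause = next_pause(pause, 0);
	}
	return total;
}
```

Under the assumed tick rate of 1000, the pause grows 1, 2, 4, ..., 64, 128 and then stays at 128, because the check is made before doubling; the cap can therefore end up slightly above HZ / 10 rather than exactly at it.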


[-- Attachment #3: 0002-writeback-Replace-some-redirty_tail-calls-with-reque.patch --]
[-- Type: text/x-patch, Size: 5390 bytes --]

From 0a4a2cb4d5432f5446215b1e6e44f7d83032dba3 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 8 Sep 2011 01:46:42 +0200
Subject: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()

Calling redirty_tail() can put off inode writeback for up to 30 seconds (or
whatever dirty_expire_centisecs is). This is an unnecessarily big delay in
some cases and in other cases it is a really bad thing. In particular, XFS
tries to be nice to writeback: when ->write_inode is called for an inode with
a locked ilock, it just redirties the inode and returns EAGAIN. That currently
causes writeback_single_inode() to redirty_tail() the inode. As a contended
ilock is a common thing with XFS while extending files, the result can be
that inode writeout is put off for a really long time.

Now that we have more robust busyloop prevention in wb_writeback(), we can
call requeue_io() in cases where a quick retry is required without fear of
raising CPU consumption too much.

CC: Christoph Hellwig <hch@infradead.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |   61 ++++++++++++++++++++++++----------------------------
 1 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index bdeb26a..c786023 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -356,6 +356,7 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 	long nr_to_write = wbc->nr_to_write;
 	unsigned dirty;
 	int ret;
+	bool inode_written = false;
 
 	assert_spin_locked(&wb->list_lock);
 	assert_spin_locked(&inode->i_lock);
@@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
 		int err = write_inode(inode, wbc);
+		if (!err)
+			inode_written = true;
 		if (ret == 0)
 			ret = err;
 	}
@@ -430,42 +433,39 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 	if (!(inode->i_state & I_FREEING)) {
 		/*
 		 * Sync livelock prevention. Each inode is tagged and synced in
-		 * one shot. If still dirty, it will be redirty_tail()'ed below.
-		 * Update the dirty time to prevent enqueue and sync it again.
+		 * one shot. If still dirty, update dirty time and put it back
+		 * to dirty list to prevent enqueue and syncing it again.
 		 */
 		if ((inode->i_state & I_DIRTY) &&
-		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages))
+		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)) {
 			inode->dirtied_when = jiffies;
-
-		if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+			redirty_tail(inode, wb);
+		} else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
 			/*
-			 * We didn't write back all the pages.  nfs_writepages()
-			 * sometimes bales out without doing anything.
+			 * We didn't write back all the pages. nfs_writepages()
+			 * sometimes bails out without doing anything or we
+			 * just run out of our writeback slice.
 			 */
 			inode->i_state |= I_DIRTY_PAGES;
-			if (wbc->nr_to_write <= 0) {
-				/*
-				 * slice used up: queue for next turn
-				 */
-				requeue_io(inode, wb);
-			} else {
-				/*
-				 * Writeback blocked by something other than
-				 * congestion. Delay the inode for some time to
-				 * avoid spinning on the CPU (100% iowait)
-				 * retrying writeback of the dirty page/inode
-				 * that cannot be performed immediately.
-				 */
-				redirty_tail(inode, wb);
-			}
+			requeue_io(inode, wb);
 		} else if (inode->i_state & I_DIRTY) {
 			/*
 			 * Filesystems can dirty the inode during writeback
 			 * operations, such as delayed allocation during
 			 * submission or metadata updates after data IO
-			 * completion.
+			 * completion. Also inode could have been dirtied by
+			 * some process aggressively touching metadata.
+			 * Finally, filesystem could just fail to write the
+			 * inode for some reason. We have to distinguish the
+			 * last case from the previous ones - in the last case
+			 * we want to give the inode quick retry, in the
+			 * other cases we want to put it back to the dirty list
+			 * to avoid livelocking of writeback.
 			 */
-			redirty_tail(inode, wb);
+			if (inode_written)
+				redirty_tail(inode, wb);
+			else
+				requeue_io(inode, wb);
 		} else {
 			/*
 			 * The inode is clean.  At this point we either have
@@ -583,10 +583,10 @@ static long writeback_sb_inodes(struct super_block *sb,
 			wrote++;
 		if (wbc.pages_skipped) {
 			/*
-			 * writeback is not making progress due to locked
-			 * buffers.  Skip this inode for now.
+			 * Writeback is not making progress due to unavailable
+			 * fs locks or similar condition. Retry in next round.
 			 */
-			redirty_tail(inode, wb);
+			requeue_io(inode, wb);
 		}
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&wb->list_lock);
@@ -618,12 +618,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 		struct super_block *sb = inode->i_sb;
 
 		if (!grab_super_passive(sb)) {
-			/*
-			 * grab_super_passive() may fail consistently due to
-			 * s_umount being grabbed by someone else. Don't use
-			 * requeue_io() to avoid busy retrying the inode/sb.
-			 */
-			redirty_tail(inode, wb);
+			requeue_io(inode, wb);
 			continue;
 		}
 		wrote += writeback_sb_inodes(sb, wb, work);
-- 
1.7.1
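
[Editor's sketch, not part of the patch] The requeueing decision that writeback_single_inode() makes after the second patch can be summarised as a small pure function. This is a simplified user-space model under stated assumptions: struct inode_outcome, enum requeue_action and classify() are hypothetical names invented here, the I_FREEING check is left out, and the boolean fields stand in for the kernel conditions noted in the comments.

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes in the patched code. */
enum requeue_action {
	ACT_REDIRTY_TAIL,	/* back to the dirty list, retried after expiry */
	ACT_REQUEUE_IO,		/* queued on b_more_io for a quick retry */
	ACT_CLEAN,		/* inode is clean, dropped from writeback lists */
};

/* Hypothetical summary of the inode state after one writeback attempt. */
struct inode_outcome {
	bool still_dirty;	/* inode->i_state & I_DIRTY */
	bool sync_or_tagged;	/* WB_SYNC_ALL or wbc->tagged_writepages */
	bool has_dirty_pages;	/* mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) */
	bool inode_written;	/* write_inode() was called and returned 0 */
};

enum requeue_action classify(const struct inode_outcome *o)
{
	/* Sync livelock prevention: do not sync the same inode twice. */
	if (o->still_dirty && o->sync_or_tagged)
		return ACT_REDIRTY_TAIL;
	/* Pages left over (slice used up or writeback blocked): retry soon. */
	if (o->has_dirty_pages)
		return ACT_REQUEUE_IO;
	/* Redirtied after a successful write: delay. Failed write: retry. */
	if (o->still_dirty)
		return o->inode_written ? ACT_REDIRTY_TAIL : ACT_REQUEUE_IO;
	return ACT_CLEAN;
}
```

For example, an inode whose ->write_inode returned EAGAIN because of a contended ilock would have still_dirty set and inode_written clear, so this model returns ACT_REQUEUE_IO instead of delaying the inode by the full expiry interval.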
