From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: linux-fsdevel@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>
Subject: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()
Date: Wed, 12 Oct 2011 22:57:23 +0200 [thread overview]
Message-ID: <1318453043-32057-3-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1318453043-32057-1-git-send-email-jack@suse.cz>
Calling redirty_tail() can put off inode writeback for upto 30 seconds (or
whatever dirty_expire_centisecs is). This is unnecessarily big delay in some
cases and in other cases it is a really bad thing. In particular XFS tries to
be nice to writeback and when ->write_inode is called for an inode with locked
ilock, it just redirties the inode and returns EAGAIN. That currently causes
writeback_single_inode() to redirty_tail() the inode. As contended ilock is
common thing with XFS while extending files the result can be that inode
writeout is put off for a really long time.
Now that we have more robust busyloop prevention in wb_writeback() we can
call requeue_io() in cases where quick retry is required without fear of
raising CPU consumption too much.
CC: Christoph Hellwig <hch@infradead.org>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 56 +++++++++++++++++++++++++++-------------------------
1 files changed, 29 insertions(+), 27 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index b619f3a..094afcd 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -356,6 +356,7 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
long nr_to_write = wbc->nr_to_write;
unsigned dirty;
int ret;
+ bool inode_written = false;
assert_spin_locked(&wb->list_lock);
assert_spin_locked(&inode->i_lock);
@@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
int err = write_inode(inode, wbc);
+ if (!err)
+ inode_written = true;
if (ret == 0)
ret = err;
}
@@ -430,17 +433,20 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
if (!(inode->i_state & I_FREEING)) {
/*
* Sync livelock prevention. Each inode is tagged and synced in
- * one shot. If still dirty, it will be redirty_tail()'ed below.
- * Update the dirty time to prevent enqueue and sync it again.
+ * one shot. If still dirty, update dirty time and put it back
+ * to dirty list to prevent enqueue and syncing it again.
*/
if ((inode->i_state & I_DIRTY) &&
- (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages))
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)) {
inode->dirtied_when = jiffies;
-
- if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
+ redirty_tail(inode, wb);
+ } else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
/*
- * We didn't write back all the pages. nfs_writepages()
- * sometimes bales out without doing anything.
+ * We didn't write back all the pages. We may have just
+ * run out of our writeback slice, or nfs_writepages()
+ * sometimes bales out without doing anything, or e.g.
+ * btrfs ignores for_kupdate writeback requests for
+ * metadata inodes.
*/
inode->i_state |= I_DIRTY_PAGES;
if (wbc->nr_to_write <= 0) {
@@ -450,11 +456,9 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
requeue_io(inode, wb);
} else {
/*
- * Writeback blocked by something other than
- * congestion. Delay the inode for some time to
- * avoid spinning on the CPU (100% iowait)
- * retrying writeback of the dirty page/inode
- * that cannot be performed immediately.
+ * Writeback blocked by something. Put inode
+ * back to dirty list to prevent livelocking of
+ * writeback.
*/
redirty_tail(inode, wb);
}
@@ -463,9 +467,19 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
* Filesystems can dirty the inode during writeback
* operations, such as delayed allocation during
* submission or metadata updates after data IO
- * completion.
+ * completion. Also inode could have been dirtied by
+ * some process aggressively touching metadata.
+ * Finally, filesystem could just fail to write the
+ * inode for some reason. We have to distinguish the
+ * last case from the previous ones - in the last case
+ * we want to give the inode quick retry, in the
+ * other cases we want to put it back to the dirty list
+ * to avoid livelocking of writeback.
*/
- redirty_tail(inode, wb);
+ if (inode_written)
+ redirty_tail(inode, wb);
+ else
+ requeue_io(inode, wb);
} else {
/*
* The inode is clean. At this point we either have
@@ -581,13 +595,6 @@ static long writeback_sb_inodes(struct super_block *sb,
wrote += write_chunk - wbc.nr_to_write;
if (!(inode->i_state & I_DIRTY))
wrote++;
- if (wbc.pages_skipped) {
- /*
- * writeback is not making progress due to locked
- * buffers. Skip this inode for now.
- */
- redirty_tail(inode, wb);
- }
spin_unlock(&inode->i_lock);
spin_unlock(&wb->list_lock);
iput(inode);
@@ -618,12 +625,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
struct super_block *sb = inode->i_sb;
if (!grab_super_passive(sb)) {
- /*
- * grab_super_passive() may fail consistently due to
- * s_umount being grabbed by someone else. Don't use
- * requeue_io() to avoid busy retrying the inode/sb.
- */
- redirty_tail(inode, wb);
+ requeue_io(inode, wb);
continue;
}
wrote += writeback_sb_inodes(sb, wb, work);
--
1.7.1
next prev parent reply other threads:[~2011-10-12 20:57 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-12 20:57 [PATCH 0/2 v4] writeback: Improve busyloop prevention and inode requeueing Jan Kara
2011-10-12 20:57 ` [PATCH 1/2] writeback: Improve busyloop prevention Jan Kara
2011-10-13 14:26 ` Wu Fengguang
2011-10-13 20:13 ` Jan Kara
2011-10-14 7:18 ` Christoph Hellwig
2011-10-14 19:31 ` Chris Mason
[not found] ` <20111013143939.GA9691@localhost>
2011-10-13 20:18 ` Jan Kara
2011-10-14 16:00 ` Wu Fengguang
2011-10-14 16:28 ` Wu Fengguang
2011-10-18 0:51 ` Jan Kara
2011-10-18 14:35 ` Wu Fengguang
2011-10-19 11:56 ` Jan Kara
2011-10-19 13:25 ` Wu Fengguang
2011-10-19 13:30 ` Wu Fengguang
2011-10-19 13:35 ` Wu Fengguang
2011-10-20 12:09 ` Wu Fengguang
2011-10-20 12:33 ` Wu Fengguang
2011-10-20 13:39 ` Wu Fengguang
2011-10-20 22:26 ` Jan Kara
2011-10-22 4:20 ` Wu Fengguang
2011-10-24 15:45 ` Jan Kara
[not found] ` <20111027063133.GA10146@localhost>
2011-10-27 20:31 ` Jan Kara
[not found] ` <20111101134231.GA31718@localhost>
2011-11-01 21:53 ` Jan Kara
2011-11-02 17:25 ` Wu Fengguang
[not found] ` <20111102185603.GA4034@localhost>
2011-11-03 1:51 ` Jan Kara
2011-11-03 14:52 ` Wu Fengguang
[not found] ` <20111104152054.GA11577@localhost>
2011-11-08 23:52 ` Jan Kara
2011-11-09 13:51 ` Wu Fengguang
2011-11-10 14:50 ` Jan Kara
2011-12-05 8:02 ` Wu Fengguang
2011-12-07 10:13 ` Jan Kara
2011-12-07 11:45 ` Wu Fengguang
[not found] ` <20111027064745.GA14017@localhost>
2011-10-27 20:50 ` Jan Kara
2011-10-20 9:46 ` Christoph Hellwig
2011-10-20 15:32 ` Jan Kara
2011-10-15 12:41 ` Wu Fengguang
2011-10-12 20:57 ` Jan Kara [this message]
2011-10-13 14:30 ` [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Wu Fengguang
2011-10-13 14:15 ` [PATCH 0/2 v4] writeback: Improve busyloop prevention and inode requeueing Wu Fengguang
-- strict thread matches above, loose matches on Subject: below --
2011-09-08 0:44 [PATCH 1/2] writeback: Improve busyloop prevention Jan Kara
2011-09-08 0:44 ` [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Jan Kara
2011-09-08 1:22 ` Wu Fengguang
2011-09-08 15:03 ` Jan Kara
2011-09-18 14:07 ` Wu Fengguang
2011-10-05 17:39 ` Jan Kara
2011-10-07 13:43 ` Wu Fengguang
2011-10-07 14:22 ` Jan Kara
2011-10-07 14:29 ` Wu Fengguang
2011-10-07 14:45 ` Jan Kara
2011-10-07 15:29 ` Wu Fengguang
2011-10-08 4:00 ` Wu Fengguang
2011-10-08 11:52 ` Wu Fengguang
2011-10-08 13:49 ` Wu Fengguang
2011-10-09 0:27 ` Wu Fengguang
2011-10-09 8:44 ` Wu Fengguang
2011-10-10 11:21 ` Jan Kara
2011-10-10 11:31 ` Wu Fengguang
2011-10-10 23:30 ` Jan Kara
2011-10-11 2:36 ` Wu Fengguang
2011-10-11 21:53 ` Jan Kara
2011-10-12 2:44 ` Wu Fengguang
2011-10-12 19:34 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1318453043-32057-3-git-send-email-jack@suse.cz \
--to=jack@suse.cz \
--cc=david@fromorbit.com \
--cc=fengguang.wu@intel.com \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).