From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()
Date: Fri, 7 Oct 2011 22:29:28 +0800 [thread overview]
Message-ID: <20111007142928.GA14427@localhost> (raw)
In-Reply-To: <20111007142201.GB30754@quack.suse.cz>
On Fri, Oct 07, 2011 at 10:22:01PM +0800, Jan Kara wrote:
> On Fri 07-10-11 21:43:47, Wu Fengguang wrote:
> > > Great, thanks for review! I'll resend the two patches to Christoph so
> > > that he can try them.
> >
> > Jan, I'd like to test out your updated patches with my stupid dd
> > workloads. Would you (re)send them publicly?
> Ah, I resent them publicly on Wednesday
> (http://comments.gmane.org/gmane.linux.kernel/1199713) but git send-email
> apparently does not include emails from Acked-by into list of recipients so
> you didn't get them. Sorry for that. The patches are attached for your
> convenience.
OK thanks. I only checked the linux-fsdevel list before asking..
The results should be ready tomorrow.
Thanks,
Fengguang
> From a042c2a839ad3cf89d8ee158b2bb4b94b573f578 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 8 Sep 2011 01:05:25 +0200
> Subject: [PATCH 1/2] writeback: Improve busyloop prevention
>
> Writeback of an inode can be stalled by things like internal fs locks being
> held. So in case we didn't write anything during a pass through b_io list,
> just wait for a moment and try again.
>
> CC: Christoph Hellwig <hch@infradead.org>
> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/fs-writeback.c | 26 ++++++++++++++------------
> 1 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 04cf3b9..bdeb26a 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -699,8 +699,8 @@ static long wb_writeback(struct bdi_writeback *wb,
> unsigned long wb_start = jiffies;
> long nr_pages = work->nr_pages;
> unsigned long oldest_jif;
> - struct inode *inode;
> long progress;
> + long pause = 1;
>
> oldest_jif = jiffies;
> work->older_than_this = &oldest_jif;
> @@ -755,25 +755,27 @@ static long wb_writeback(struct bdi_writeback *wb,
> * mean the overall work is done. So we keep looping as long
> * as made some progress on cleaning pages or inodes.
> */
> - if (progress)
> + if (progress) {
> + pause = 1;
> continue;
> + }
> /*
> * No more inodes for IO, bail
> */
> if (list_empty(&wb->b_more_io))
> break;
> /*
> - * Nothing written. Wait for some inode to
> - * become available for writeback. Otherwise
> - * we'll just busyloop.
> + * Nothing written (some internal fs locks were unavailable or
> + * inode was under writeback from balance_dirty_pages() or
> + * similar conditions). Wait for a while to avoid busylooping.
> */
> - if (!list_empty(&wb->b_more_io)) {
> - trace_writeback_wait(wb->bdi, work);
> - inode = wb_inode(wb->b_more_io.prev);
> - spin_lock(&inode->i_lock);
> - inode_wait_for_writeback(inode, wb);
> - spin_unlock(&inode->i_lock);
> - }
> + trace_writeback_wait(wb->bdi, work);
> + spin_unlock(&wb->list_lock);
> + __set_current_state(TASK_INTERRUPTIBLE);
> + schedule_timeout(pause);
> + if (pause < HZ / 10)
> + pause <<= 1;
> + spin_lock(&wb->list_lock);
> }
> spin_unlock(&wb->list_lock);
>
> --
> 1.7.1
>
> From 0a4a2cb4d5432f5446215b1e6e44f7d83032dba3 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 8 Sep 2011 01:46:42 +0200
> Subject: [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io()
>
> Calling redirty_tail() can put off inode writeback for upto 30 seconds (or
> whatever dirty_expire_centisecs is). This is unnecessarily big delay in some
> cases and in other cases it is a really bad thing. In particular XFS tries to
> be nice to writeback and when ->write_inode is called for an inode with locked
> ilock, it just redirties the inode and returns EAGAIN. That currently causes
> writeback_single_inode() to redirty_tail() the inode. As contended ilock is
> common thing with XFS while extending files the result can be that inode
> writeout is put off for a really long time.
>
> Now that we have more robust busyloop prevention in wb_writeback() we can
> call requeue_io() in cases where quick retry is required without fear of
> raising CPU consumption too much.
>
> CC: Christoph Hellwig <hch@infradead.org>
> Acked-by: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/fs-writeback.c | 61 ++++++++++++++++++++++++----------------------------
> 1 files changed, 28 insertions(+), 33 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index bdeb26a..c786023 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -356,6 +356,7 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
> long nr_to_write = wbc->nr_to_write;
> unsigned dirty;
> int ret;
> + bool inode_written = false;
>
> assert_spin_locked(&wb->list_lock);
> assert_spin_locked(&inode->i_lock);
> @@ -420,6 +421,8 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
> /* Don't write the inode if only I_DIRTY_PAGES was set */
> if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
> int err = write_inode(inode, wbc);
> + if (!err)
> + inode_written = true;
> if (ret == 0)
> ret = err;
> }
> @@ -430,42 +433,39 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
> if (!(inode->i_state & I_FREEING)) {
> /*
> * Sync livelock prevention. Each inode is tagged and synced in
> - * one shot. If still dirty, it will be redirty_tail()'ed below.
> - * Update the dirty time to prevent enqueue and sync it again.
> + * one shot. If still dirty, update dirty time and put it back
> + * to dirty list to prevent enqueue and syncing it again.
> */
> if ((inode->i_state & I_DIRTY) &&
> - (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages))
> + (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)) {
> inode->dirtied_when = jiffies;
> -
> - if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> + redirty_tail(inode, wb);
> + } else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> /*
> - * We didn't write back all the pages. nfs_writepages()
> - * sometimes bales out without doing anything.
> + * We didn't write back all the pages. nfs_writepages()
> + * sometimes bales out without doing anything or we
> + * just run our of our writeback slice.
> */
> inode->i_state |= I_DIRTY_PAGES;
> - if (wbc->nr_to_write <= 0) {
> - /*
> - * slice used up: queue for next turn
> - */
> - requeue_io(inode, wb);
> - } else {
> - /*
> - * Writeback blocked by something other than
> - * congestion. Delay the inode for some time to
> - * avoid spinning on the CPU (100% iowait)
> - * retrying writeback of the dirty page/inode
> - * that cannot be performed immediately.
> - */
> - redirty_tail(inode, wb);
> - }
> + requeue_io(inode, wb);
> } else if (inode->i_state & I_DIRTY) {
> /*
> * Filesystems can dirty the inode during writeback
> * operations, such as delayed allocation during
> * submission or metadata updates after data IO
> - * completion.
> + * completion. Also inode could have been dirtied by
> + * some process aggressively touching metadata.
> + * Finally, filesystem could just fail to write the
> + * inode for some reason. We have to distinguish the
> + * last case from the previous ones - in the last case
> + * we want to give the inode quick retry, in the
> + * other cases we want to put it back to the dirty list
> + * to avoid livelocking of writeback.
> */
> - redirty_tail(inode, wb);
> + if (inode_written)
> + redirty_tail(inode, wb);
> + else
> + requeue_io(inode, wb);
> } else {
> /*
> * The inode is clean. At this point we either have
> @@ -583,10 +583,10 @@ static long writeback_sb_inodes(struct super_block *sb,
> wrote++;
> if (wbc.pages_skipped) {
> /*
> - * writeback is not making progress due to locked
> - * buffers. Skip this inode for now.
> + * Writeback is not making progress due to unavailable
> + * fs locks or similar condition. Retry in next round.
> */
> - redirty_tail(inode, wb);
> + requeue_io(inode, wb);
> }
> spin_unlock(&inode->i_lock);
> spin_unlock(&wb->list_lock);
> @@ -618,12 +618,7 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
> struct super_block *sb = inode->i_sb;
>
> if (!grab_super_passive(sb)) {
> - /*
> - * grab_super_passive() may fail consistently due to
> - * s_umount being grabbed by someone else. Don't use
> - * requeue_io() to avoid busy retrying the inode/sb.
> - */
> - redirty_tail(inode, wb);
> + requeue_io(inode, wb);
> continue;
> }
> wrote += writeback_sb_inodes(sb, wb, work);
> --
> 1.7.1
>
next prev parent reply other threads:[~2011-10-07 14:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-08 0:44 [PATCH 1/2] writeback: Improve busyloop prevention Jan Kara
2011-09-08 0:44 ` [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Jan Kara
2011-09-08 1:22 ` Wu Fengguang
2011-09-08 15:03 ` Jan Kara
2011-09-18 14:07 ` Wu Fengguang
2011-10-05 17:39 ` Jan Kara
2011-10-07 13:43 ` Wu Fengguang
2011-10-07 14:22 ` Jan Kara
2011-10-07 14:29 ` Wu Fengguang [this message]
2011-10-07 14:45 ` Jan Kara
2011-10-07 15:29 ` Wu Fengguang
2011-10-08 4:00 ` Wu Fengguang
2011-10-08 11:52 ` Wu Fengguang
2011-10-08 13:49 ` Wu Fengguang
2011-10-09 0:27 ` Wu Fengguang
2011-10-09 8:44 ` Wu Fengguang
2011-10-10 11:21 ` Jan Kara
2011-10-10 11:31 ` Wu Fengguang
2011-10-10 23:30 ` Jan Kara
2011-10-11 2:36 ` Wu Fengguang
2011-10-11 21:53 ` Jan Kara
2011-10-12 2:44 ` Wu Fengguang
2011-10-12 19:34 ` Jan Kara
2011-09-08 0:57 ` [PATCH 1/2] writeback: Improve busyloop prevention Wu Fengguang
2011-09-08 13:49 ` Jan Kara
-- strict thread matches above, loose matches on Subject: below --
2011-10-12 20:57 [PATCH 0/2 v4] writeback: Improve busyloop prevention and inode requeueing Jan Kara
2011-10-12 20:57 ` [PATCH 2/2] writeback: Replace some redirty_tail() calls with requeue_io() Jan Kara
2011-10-13 14:30 ` Wu Fengguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111007142928.GA14427@localhost \
--to=fengguang.wu@intel.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).