From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Christoph Hellwig <hch@infradead.org>,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 1/2] writeback: Improve busyloop prevention
Date: Thu, 13 Oct 2011 22:13:54 +0200
Message-ID: <20111013201354.GC27363@quack.suse.cz>
In-Reply-To: <20111013142638.GB6938@localhost>
On Thu 13-10-11 22:26:38, Wu Fengguang wrote:
> On Thu, Oct 13, 2011 at 04:57:22AM +0800, Jan Kara wrote:
> > Writeback of an inode can be stalled by things like internal fs locks being
> > held. So in case we didn't write anything during a pass through b_io list,
> > just wait for a moment and try again. When retrying is fruitless for a long
> > time, or we have some other work to do, we just stop current work to avoid
> > blocking flusher thread.
> >
> > CC: Christoph Hellwig <hch@infradead.org>
> > Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/fs-writeback.c | 39 +++++++++++++++++++++++++++------------
> > 1 files changed, 27 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 04cf3b9..b619f3a 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -699,8 +699,11 @@ static long wb_writeback(struct bdi_writeback *wb,
> > unsigned long wb_start = jiffies;
> > long nr_pages = work->nr_pages;
> > unsigned long oldest_jif;
> > - struct inode *inode;
> > long progress;
> > + long pause = 1;
> > + long max_pause = dirty_writeback_interval ?
> > + msecs_to_jiffies(dirty_writeback_interval * 10) :
> > + HZ;
>
> It's better not to put the flusher to sleep for more than 10ms, so that
> when the condition changes, we don't risk leaving the storage idle for
> too long.
>
> So let's distinguish between accumulated and one-shot max pause time
> in the below code?
I was thinking about this as well, but then I realized that when some work
is queued or when background writeback is necessary, we always wake up the
flusher thread, so these conditions will be noticed promptly. And regarding
the locks we are potentially blocked on, each wait is only about as long as
all the previous waits together, so that doesn't look too defensive to me.
Or are you concerned about something else? I've just noticed the wait was
unnecessarily racy wrt wakeups, so attached is a new version which should
be safe in this regard.

Specifically, I didn't want to wake up every 10 ms because if there are
inodes which are unwriteable for a long time, as in the case of btrfs, we
would just wake up the flusher thread (and thus the CPU) every 10 ms on an
otherwise idle system, and that does draw a considerable amount of power on
a laptop.
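
To make the backoff arithmetic concrete, here is a minimal userspace sketch that models only the pause bookkeeping of the loop in the patch; it is not kernel code. The names `struct backoff_result`, `max_pause_jiffies()`, `simulate_backoff()` and the `max_iters` cap are hypothetical, and the rounding in `max_pause_jiffies()` only approximates msecs_to_jiffies(). It assumes dirty_writeback_interval is kept in centiseconds, which is why the patch multiplies it by 10.

```c
#include <stdbool.h>

/* Hypothetical result record for the model below. */
struct backoff_result {
	int sleeps;		/* number of modelled schedule_timeout() calls */
	long total_jiffies;	/* jiffies slept in total */
	bool bailed_out;	/* true if "pause > max_pause" ended the loop */
};

/* Approximates the max_pause initialiser from the patch (assumption:
 * interval is in centiseconds, hz is the tick rate). */
static long max_pause_jiffies(unsigned int interval_centisecs, long hz)
{
	if (!interval_centisecs)
		return hz;
	/* centiseconds * 10 = milliseconds; round up to whole jiffies */
	return ((long)interval_centisecs * 10 * hz + 999) / 1000;
}

/* Models a run of passes that make no progress and find no other work.
 * max_iters only caps the model; the patch has no such limit. */
static struct backoff_result simulate_backoff(long max_pause, int max_iters)
{
	struct backoff_result r = { 0, 0, false };
	long pause = 1;

	for (int i = 0; i < max_iters; i++) {
		if (pause > max_pause) {
			r.bailed_out = true;
			break;
		}
		r.total_jiffies += pause;	/* stands in for schedule_timeout(pause) */
		r.sleeps++;
		if (pause < max_pause)
			pause <<= 1;
	}
	return r;
}
```

With the default interval of 500 centiseconds and HZ=250 (the HZ value is an assumption here), max_pause is 1250 jiffies and the model gives up after 11 sleeps totalling 2047 jiffies, roughly 8 seconds. While the pause is doubling, each sleep is one jiffy longer than all earlier sleeps combined. The model also suggests an edge case: if max_pause is an exact power of two (1024, say), pause saturates at max_pause and the `pause > max_pause` test never fires, so the loop then depends on its other exit conditions.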
> > oldest_jif = jiffies;
> > work->older_than_this = &oldest_jif;
> > @@ -755,25 +758,37 @@ static long wb_writeback(struct bdi_writeback *wb,
> > * mean the overall work is done. So we keep looping as long
> > * as made some progress on cleaning pages or inodes.
> > */
> > - if (progress)
> > + if (progress) {
> > + pause = 1;
> > continue;
> > + }
> > /*
> > * No more inodes for IO, bail
> > */
> > if (list_empty(&wb->b_more_io))
> > break;
> > /*
> > - * Nothing written. Wait for some inode to
> > - * become available for writeback. Otherwise
> > - * we'll just busyloop.
> > + * Nothing written (some internal fs locks were unavailable or
> > + * inode was under writeback from balance_dirty_pages() or
> > + * similar conditions).
> > */
> > - if (!list_empty(&wb->b_more_io)) {
> > - trace_writeback_wait(wb->bdi, work);
> > - inode = wb_inode(wb->b_more_io.prev);
> > - spin_lock(&inode->i_lock);
> > - inode_wait_for_writeback(inode, wb);
> > - spin_unlock(&inode->i_lock);
> > - }
> > + /* If there's some other work to do, proceed with it... */
> > + if (!list_empty(&wb->bdi->work_list) ||
> > + (!work->for_background && over_bground_thresh()))
> > + break;
> > + /*
> > + * Wait for a while to avoid busylooping unless we waited for
> > + * so long it does not make sense to retry anymore.
> > + */
> > + if (pause > max_pause)
> > + break;
> > + trace_writeback_wait(wb->bdi, work);
> > + spin_unlock(&wb->list_lock);
> > + __set_current_state(TASK_INTERRUPTIBLE);
> > + schedule_timeout(pause);
> > + if (pause < max_pause)
> > + pause <<= 1;
> > + spin_lock(&wb->list_lock);
> > }
> > spin_unlock(&wb->list_lock);
> >
> > --
> > 1.7.1
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
[-- Attachment #2: 0001-writeback-Improve-busyloop-prevention.patch --]
[-- Type: text/x-patch, Size: 3007 bytes --]
From 7d59989e38af2e10101f7d9f0c98343fe551c536 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 8 Sep 2011 01:05:25 +0200
Subject: [PATCH 1/2] writeback: Improve busyloop prevention
Writeback of an inode can be stalled by things like internal fs locks being
held. So in case we didn't write anything during a pass through b_io list,
just wait for a moment and try again. When retrying is fruitless for a long
time, or we have some other work to do, we just stop current work to avoid
blocking flusher thread.
CC: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 43 ++++++++++++++++++++++++++++++++-----------
1 files changed, 32 insertions(+), 11 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 04cf3b9..4ffc07f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -699,8 +699,11 @@ static long wb_writeback(struct bdi_writeback *wb,
unsigned long wb_start = jiffies;
long nr_pages = work->nr_pages;
unsigned long oldest_jif;
- struct inode *inode;
long progress;
+ long pause = 1;
+ long max_pause = dirty_writeback_interval ?
+ msecs_to_jiffies(dirty_writeback_interval * 10) :
+ HZ;
oldest_jif = jiffies;
work->older_than_this = &oldest_jif;
@@ -755,25 +758,43 @@ static long wb_writeback(struct bdi_writeback *wb,
* mean the overall work is done. So we keep looping as long
* as made some progress on cleaning pages or inodes.
*/
- if (progress)
+ if (progress) {
+ pause = 1;
continue;
+ }
/*
* No more inodes for IO, bail
*/
if (list_empty(&wb->b_more_io))
break;
/*
- * Nothing written. Wait for some inode to
- * become available for writeback. Otherwise
- * we'll just busyloop.
+ * Nothing written (some internal fs locks were unavailable or
+ * inode was under writeback from balance_dirty_pages() or
+ * similar conditions).
+ *
+ * Wait for a while to avoid busylooping unless we waited for
+ * so long it does not make sense to retry anymore.
*/
- if (!list_empty(&wb->b_more_io)) {
- trace_writeback_wait(wb->bdi, work);
- inode = wb_inode(wb->b_more_io.prev);
- spin_lock(&inode->i_lock);
- inode_wait_for_writeback(inode, wb);
- spin_unlock(&inode->i_lock);
+ if (pause > max_pause)
+ break;
+ /*
+ * Set state here to prevent races with someone waking us up
+ * (because new work is queued or because background limit is
+ * exceeded).
+ */
+ set_current_state(TASK_INTERRUPTIBLE);
+ /* If there's some other work to do, proceed with it... */
+ if (!list_empty(&wb->bdi->work_list) ||
+ (!work->for_background && over_bground_thresh())) {
+ __set_current_state(TASK_RUNNING);
+ break;
}
+ trace_writeback_wait(wb->bdi, work);
+ spin_unlock(&wb->list_lock);
+ schedule_timeout(pause);
+ if (pause < max_pause)
+ pause <<= 1;
+ spin_lock(&wb->list_lock);
}
spin_unlock(&wb->list_lock);
--
1.7.1
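
The reordering in this version (set the task state first, then check for other work) can be illustrated with a small single-threaded model. This is only a sketch: `struct task_model`, `waker()` and the two `*_ordering_sleeps()` functions are hypothetical names, and the waker is injected at a fixed point to stand in for a concurrent wakeup. It assumes that a wakeup puts the task back into the running state and that schedule_timeout() returns promptly when the task is already running.

```c
#include <stdbool.h>

enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

/* Hypothetical stand-in for the flusher task and its work list. */
struct task_model {
	enum model_state state;
	bool work_queued;
};

/* Models another context queueing work and then waking the flusher. */
static void waker(struct task_model *t)
{
	t->work_queued = true;
	t->state = MODEL_RUNNING;
}

/* Ordering of the first posted version: check for work, then set the
 * state. Returns true if the task would sleep with work queued. */
static bool check_first_ordering_sleeps(void)
{
	struct task_model t = { MODEL_RUNNING, false };
	bool saw_work = t.work_queued;	/* check: nothing queued yet */

	waker(&t);			/* wakeup lands after the check */
	if (saw_work)
		return false;
	t.state = MODEL_INTERRUPTIBLE;	/* overwrites the wakeup */
	return t.state == MODEL_INTERRUPTIBLE && t.work_queued;
}

/* Ordering of the attached version: set the state, then check. */
static bool state_first_ordering_sleeps(bool wake_before_check)
{
	struct task_model t = { MODEL_RUNNING, false };

	t.state = MODEL_INTERRUPTIBLE;	/* set state before checking */
	if (wake_before_check)
		waker(&t);
	if (t.work_queued) {
		t.state = MODEL_RUNNING;
		return false;		/* break out and handle the work */
	}
	if (!wake_before_check)
		waker(&t);		/* wakeup lands after the check */
	return t.state == MODEL_INTERRUPTIBLE && t.work_queued;
}
```

In this model the check-first ordering ends up sleeping for the full pause with work already queued, while the state-first ordering either sees the work in its check or has its state reset by the wakeup, whichever side of the check the wakeup lands on.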