From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Christoph Hellwig <hch@infradead.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 06/17] writeback: sync expired inodes first in background writeback
Date: Mon, 16 May 2011 21:00:55 +0800 [thread overview]
Message-ID: <20110516130055.GA12407@localhost> (raw)
In-Reply-To: <20110512225525.GK19446@dastard>
On Fri, May 13, 2011 at 06:55:25AM +0800, Dave Chinner wrote:
> On Thu, May 12, 2011 at 09:57:12PM +0800, Wu Fengguang wrote:
> > A background flush work may run for ever. So it's reasonable for it to
> > mimic the kupdate behavior of syncing old/expired inodes first.
> >
> > At each queue_io() time, first try enqueuing only newly expired inodes.
> > If there are zero expired inodes to work with, then relax the rule and
> > enqueue all dirty inodes.
> >
> > It at least makes sense from the data integrity point of view.
> >
> > This may also reduce the number of dirty pages encountered by page
> > reclaim, eg. the pageout() calls. Normally older inodes contain older
> > dirty pages, which are more close to the end of the LRU lists. So
> > syncing older inodes first helps reducing the dirty pages reached by the
> > page reclaim code.
> >
> > More background: as Mel put it, "it makes sense to write old pages first
> > to reduce the chances page reclaim is initiating IO."
> >
> > Rik also presented the situation with a graph:
> >
> > LRU head [*] dirty page
> > [ * * * * * * * * * * *]
> >
> > Ideally, most dirty pages should lie close to the LRU tail instead of
> > LRU head. That requires the flusher thread to sync old/expired inodes
> > first (as there are obvious correlations between inode age and page
> > age), and to give fair opportunities to newly expired inodes rather
> > than sticking with some large eldest inodes (as larger inodes have
> > weaker correlations in the inode<=>page ages).
> >
> > This patch helps the flusher to meet both the above requirements.
> >
> > Side effects: it might reduce the batch size and hence reduce
> > inode_wb_list_lock hold time, but in turn make the cluster-by-partition
> > logic in the same function less effective on reducing disk seeks.
> >
> > v2: keep policy changes inside wb_writeback() and keep the
> > wbc.older_than_this visibility as suggested by Dave.
> >
> > CC: Dave Chinner <david@fromorbit.com>
> > Acked-by: Jan Kara <jack@suse.cz>
> > Acked-by: Rik van Riel<riel@redhat.com>
> > Acked-by: Mel Gorman <mel@csn.ul.ie>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> > fs/fs-writeback.c | 16 +++++++++++++++-
> > 1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > --- linux-next.orig/fs/fs-writeback.c 2011-05-05 23:30:25.000000000 +0800
> > +++ linux-next/fs/fs-writeback.c 2011-05-05 23:30:26.000000000 +0800
> > @@ -718,7 +718,7 @@ static long wb_writeback(struct bdi_writ
> > if (work->for_background && !over_bground_thresh())
> > break;
> >
> > - if (work->for_kupdate) {
> > + if (work->for_kupdate || work->for_background) {
> > oldest_jif = jiffies -
> > msecs_to_jiffies(dirty_expire_interval * 10);
> > wbc.older_than_this = &oldest_jif;
> > @@ -729,6 +729,7 @@ static long wb_writeback(struct bdi_writ
> > wbc.pages_skipped = 0;
> > wbc.inodes_cleaned = 0;
> >
> > +retry:
> > trace_wbc_writeback_start(&wbc, wb->bdi);
> > if (work->sb)
> > __writeback_inodes_sb(work->sb, wb, &wbc);
> > @@ -752,6 +753,19 @@ static long wb_writeback(struct bdi_writ
> > if (wbc.inodes_cleaned)
> > continue;
> > /*
> > + * background writeback will start with expired inodes, and
> > + * if none is found, fallback to all inodes. This order helps
> > + * reduce the number of dirty pages reaching the end of LRU
> > + * lists and cause trouble to the page reclaim.
> > + */
> > + if (work->for_background &&
> > + wbc.older_than_this &&
> > + list_empty(&wb->b_io) &&
> > + list_empty(&wb->b_more_io)) {
> > + wbc.older_than_this = NULL;
> > + goto retry;
> > + }
> > + /*
> > * No more inodes for IO, bail
> > */
> > if (!wbc.more_io)
>
> I have to say that I dislike this implicit nested looping structure
> using a goto. It would seem better to me to make it explicit that we
> can do multiple writeback calls by using a do/while loop here and
> moving the logic of setting/resetting wbc.older_than_this to one
> place inside the nested loop...
This is the best I can do. Does it look better? Now the
.older_than_this modifying lines are close to each others :)
Thanks,
Fengguang
---
fs/fs-writeback.c | 58 +++++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 27 deletions(-)
--- linux-next.orig/fs/fs-writeback.c 2011-05-16 20:29:53.000000000 +0800
+++ linux-next/fs/fs-writeback.c 2011-05-16 20:53:31.000000000 +0800
@@ -713,42 +713,22 @@ static long wb_writeback(struct bdi_writ
unsigned long oldest_jif;
struct inode *inode;
long progress;
+ bool bg_retry_all = false;
oldest_jif = jiffies;
work->older_than_this = &oldest_jif;
spin_lock(&wb->list_lock);
for (;;) {
- /*
- * Stop writeback when nr_pages has been consumed
- */
- if (work->nr_pages <= 0)
- break;
-
- /*
- * Background writeout and kupdate-style writeback may
- * run forever. Stop them if there is other work to do
- * so that e.g. sync can proceed. They'll be restarted
- * after the other works are all done.
- */
- if ((work->for_background || work->for_kupdate) &&
- !list_empty(&wb->bdi->work_list))
- break;
-
- /*
- * For background writeout, stop when we are below the
- * background dirty threshold
- */
- if (work->for_background && !over_bground_thresh())
- break;
-
- if (work->for_kupdate || work->for_background) {
+ if (bg_retry_all) {
+ bg_retry_all = false;
+ work->older_than_this = NULL;
+ } else if (work->for_kupdate || work->for_background) {
oldest_jif = jiffies -
msecs_to_jiffies(dirty_expire_interval * 10);
work->older_than_this = &oldest_jif;
}
-retry:
trace_writeback_start(wb->bdi, work);
if (list_empty(&wb->b_io))
queue_io(wb, work->older_than_this);
@@ -778,14 +758,38 @@ retry:
work->older_than_this &&
list_empty(&wb->b_io) &&
list_empty(&wb->b_more_io)) {
- work->older_than_this = NULL;
- goto retry;
+ bg_retry_all = true;
+ continue;
}
/*
* No more inodes for IO, bail
*/
if (list_empty(&wb->b_more_io))
break;
+
+ /*
+ * Stop writeback when nr_pages has been consumed
+ */
+ if (work->nr_pages <= 0)
+ break;
+
+ /*
+ * Background writeout and kupdate-style writeback may
+ * run forever. Stop them if there is other work to do
+ * so that e.g. sync can proceed. They'll be restarted
+ * after the other works are all done.
+ */
+ if ((work->for_background || work->for_kupdate) &&
+ !list_empty(&wb->bdi->work_list))
+ break;
+
+ /*
+ * For background writeout, stop when we are below the
+ * background dirty threshold
+ */
+ if (work->for_background && !over_bground_thresh())
+ break;
+
/*
* Nothing written. Wait for some inode to
* become available for writeback. Otherwise
next prev parent reply other threads:[~2011-05-16 13:01 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-12 13:57 [PATCH 00/17] writeback fixes and cleanups for 2.6.40 (v2) Wu Fengguang
2011-05-12 13:57 ` [PATCH 01/17] writeback: introduce .tagged_sync for the WB_SYNC_NONE sync stage Wu Fengguang
2011-05-12 22:40 ` Dave Chinner
2011-05-13 2:56 ` Wu Fengguang
2011-05-13 10:17 ` Christoph Hellwig
2011-05-15 23:43 ` Dave Chinner
2011-05-16 5:39 ` Wu Fengguang
2011-05-19 21:17 ` Wu Fengguang
2011-05-12 13:57 ` [PATCH 02/17] writeback: update dirtied_when for synced inode to prevent livelock Wu Fengguang
2011-05-12 22:42 ` Dave Chinner
2011-05-13 3:08 ` Wu Fengguang
2011-05-19 21:31 ` Wu Fengguang
2011-05-23 13:14 ` Jan Kara
2011-05-24 3:03 ` Wu Fengguang
2011-05-12 13:57 ` [PATCH 03/17] writeback: introduce writeback_control.inodes_cleaned Wu Fengguang
2011-05-12 22:44 ` Dave Chinner
2011-05-13 3:36 ` Wu Fengguang
2011-05-15 23:50 ` Dave Chinner
2011-05-16 10:40 ` Christoph Hellwig
2011-05-16 11:14 ` Wu Fengguang
2011-05-12 13:57 ` [PATCH 04/17] writeback: try more writeback as long as something was written Wu Fengguang
2011-05-12 13:57 ` [PATCH 05/17] writeback: the kupdate expire timestamp should be a moving target Wu Fengguang
2011-05-12 13:57 ` [PATCH 06/17] writeback: sync expired inodes first in background writeback Wu Fengguang
2011-05-12 22:55 ` Dave Chinner
2011-05-16 13:00 ` Wu Fengguang [this message]
2011-05-12 13:57 ` [PATCH 07/17] writeback: refill b_io iff empty Wu Fengguang
2011-05-12 13:57 ` [PATCH 08/17] writeback: split inode_wb_list_lock into bdi_writeback.list_lock Wu Fengguang
2011-05-12 13:57 ` [PATCH 09/17] writeback: elevate queue_io() into wb_writeback() Wu Fengguang
2011-05-12 13:57 ` [PATCH 10/17] writeback: avoid extra sync work at enqueue time Wu Fengguang
2011-05-12 13:57 ` [PATCH 11/17] writeback: add bdi_dirty_limit() kernel-doc Wu Fengguang
2011-05-12 13:57 ` [PATCH 12/17] writeback: skip balance_dirty_pages() for in-memory fs Wu Fengguang
2011-05-16 10:43 ` Christoph Hellwig
2011-05-16 10:49 ` Wu Fengguang
2011-05-12 13:57 ` [PATCH 13/17] writeback: remove writeback_control.more_io Wu Fengguang
2011-05-12 14:25 ` Minchan Kim
2011-05-12 23:04 ` Dave Chinner
2011-05-13 5:03 ` Wu Fengguang
2011-05-15 23:54 ` Dave Chinner
2011-05-12 13:57 ` [PATCH 14/17] writeback: make writeback_control.nr_to_write straight Wu Fengguang
2011-05-12 14:56 ` Jan Kara
2011-05-12 23:18 ` Dave Chinner
2011-05-13 5:28 ` Wu Fengguang
2011-05-16 0:12 ` Dave Chinner
2011-05-16 12:05 ` Wu Fengguang
2011-05-12 13:57 ` [PATCH 15/17] writeback: remove .nonblocking and .encountered_congestion Wu Fengguang
2011-05-12 13:57 ` [PATCH 16/17] writeback: trace event writeback_single_inode Wu Fengguang
2011-05-12 23:20 ` Dave Chinner
2011-05-13 5:37 ` Wu Fengguang
2011-05-16 0:14 ` Dave Chinner
2011-05-16 12:21 ` Wu Fengguang
2011-05-12 13:57 ` [PATCH 17/17] writeback: trace event writeback_queue_io Wu Fengguang
-- strict thread matches above, loose matches on Subject: below --
2011-05-06 3:08 [PATCH 00/17] writeback fixes and cleanups for 2.6.40 Wu Fengguang
2011-05-06 3:08 ` [PATCH 06/17] writeback: sync expired inodes first in background writeback Wu Fengguang
2011-05-06 19:02 ` Rik van Riel
2011-05-09 16:08 ` Jan Kara
2011-05-09 16:18 ` Rik van Riel
2011-05-10 2:45 ` Wu Fengguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110516130055.GA12407@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).