From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mel@linux.vnet.ibm.com>, Mel Gorman <mel@csn.ul.ie>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
Itaru Kitayama <kitayama@cl.bb4u.ne.jp>,
Minchan Kim <minchan.kim@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [PATCH 3/6] writeback: sync expired inodes first in background writeback
Date: Wed, 20 Apr 2011 10:53:21 +0800
Message-ID: <20110420025321.GA14398@localhost>
In-Reply-To: <20110420012120.GK23985@dastard>

On Wed, Apr 20, 2011 at 09:21:20AM +0800, Dave Chinner wrote:
> On Tue, Apr 19, 2011 at 08:56:16PM +0800, Wu Fengguang wrote:
> > On Tue, Apr 19, 2011 at 05:57:40PM +0800, Jan Kara wrote:
> > > On Tue 19-04-11 17:35:23, Dave Chinner wrote:
> > > > On Tue, Apr 19, 2011 at 11:00:06AM +0800, Wu Fengguang wrote:
> > > > > A background flush work may run forever, so it is reasonable for it to
> > > > > mimic the kupdate behavior of syncing old/expired inodes first.
> > > > >
> > > > > The policy is
> > > > > - enqueue all newly expired inodes at each queue_io() time
> > > > > - enqueue all dirty inodes if there are no more expired inodes to sync
> > > > >
> > > > > This will help reduce the number of dirty pages encountered by page
> > > > > reclaim, e.g. the pageout() calls. Normally older inodes contain older
> > > > > dirty pages, which are closer to the end of the LRU lists. So
> > > > > syncing older inodes first helps reduce the dirty pages reached by
> > > > > the page reclaim code.
> > > >
> > > > Once again I think this is the wrong place to be changing writeback
> > > > policy decisions. for_background writeback only goes through
> > > > wb_writeback() and writeback_inodes_wb() (same as for_kupdate
> > > > writeback), so a decision to change from expired inodes to fresh
> > > > inodes, IMO, should be made in wb_writeback.
> > > >
> > > > That is, for_background and for_kupdate writeback start with the
> > > > same policy (older_than_this set) to writeback expired inodes first,
> > > > then when background writeback runs out of expired inodes, it should
> > > > switch to all remaining inodes by clearing older_than_this instead
> > > > of refreshing it for the next loop.
> > > Yes, I agree with this and my impression is that Fengguang is trying to
> > > achieve exactly this behavior.
> > >
> > > > This keeps all the policy decisions in the one place, all using the
> > > > same (existing) mechanism, and all relatively simple to understand,
> > > > and easy to tracepoint for debugging. Changing writeback policy
> > > > deep in the writeback stack is not a good idea as it will make
> > > > extending writeback policies in future (e.g. for cgroup awareness)
> > > > very messy.
> > > Hmm, I see. I agree the policy decisions should be in one place if
> > > reasonably possible. Fengguang moves them from wb_writeback() to inode
> > > queueing code which looks like a logical place to me as well - there we
> > > have the largest control over which inodes we decide to write and don't
> > > have to pass all the detailed 'instructions' down in wbc structure. So if
> > > we later want to add cgroup awareness to writeback, I imagine we just add
> > > the knowledge to inode queueing code.
> >
> > I actually started with wb_writeback() as a natural choice, and then
> > found it much easier to do the expired-only=>all-inodes switching in
> > move_expired_inodes() since it needs to know the @b_dirty and @tmp
> > lists' emptiness to trigger the switch. It's not sane for
> > wb_writeback() to look into such details. And once you do the switch
> > part in move_expired_inodes(), the whole policy naturally follows.
>
> Well, not really. You didn't need to modify move_expired_inodes() at
> all to implement these changes - all you needed to do was modify how
> older_than_this is configured.
>
> writeback policy is defined by the struct writeback_control.
> move_expired_inodes() is pure mechanism. What you've done is remove
> policy from the struct wbc and moved it to move_expired_inodes(),
> which now defines both policy and mechanism.
> Further, this means that all the tracing that uses the struct wbc
> no longer shows the entire writeback policy that is being worked on,
> so we lose visibility into policy decisions that writeback is
> making.
Good point! I'm convinced: visibility is a necessity for debugging
complex writeback behavior.
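
To make the visibility argument concrete, here is a minimal userspace sketch (not kernel code). It assumes a hypothetical, trimmed-down `struct wbc_model` standing in for struct writeback_control, and a hypothetical `wbc_model_trace()` standing in for a tracepoint. The point is only that when the age cutoff lives in the control structure, a single trace line shows whether the current pass is "expired only" or "all inodes".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for struct writeback_control. */
struct wbc_model {
	const unsigned long *older_than_this;	/* NULL means "no age filter" */
	long nr_to_write;			/* page budget for this pass */
	int for_kupdate;
	int for_background;
};

/*
 * Format the policy carried by the control structure into buf, the way a
 * tracepoint would record it. Returns the number of characters that
 * snprintf() reports (excluding the terminating NUL).
 */
static int wbc_model_trace(const struct wbc_model *wbc, char *buf, size_t len)
{
	assert(wbc != NULL && buf != NULL && len > 0);

	if (wbc->older_than_this)
		return snprintf(buf, len,
				"kupdate=%d background=%d nr_to_write=%ld older_than_this=%lu",
				wbc->for_kupdate, wbc->for_background,
				wbc->nr_to_write, *wbc->older_than_this);

	/* Cutoff cleared: the pass considers every dirty inode. */
	return snprintf(buf, len,
			"kupdate=%d background=%d nr_to_write=%ld older_than_this=none",
			wbc->for_kupdate, wbc->for_background,
			wbc->nr_to_write);
}
```

If the cutoff decision were instead made inside the queueing mechanism, nothing in this structure would change between the two kinds of pass, and the trace line would be identical for both.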
> This same change is as simple as updating wbc->older_than_this
> appropriately after the wb_writeback() call for both background and
> kupdate and leaving the lower layers untouched. It's just a policy
> change. If you think the mechanism is inefficient, copy
> wbc->older_than_this to a local variable inside
> move_expired_inodes()....
How about something like this? (Details will change a bit when I
rearrange the patchset.)

--- linux-next.orig/fs/fs-writeback.c	2011-04-20 10:30:47.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-04-20 10:40:19.000000000 +0800
@@ -660,11 +660,6 @@ static long wb_writeback(struct bdi_writ
 	long write_chunk;
 	struct inode *inode;
 
-	if (wbc.for_kupdate) {
-		wbc.older_than_this = &oldest_jif;
-		oldest_jif = jiffies -
-				msecs_to_jiffies(dirty_expire_interval * 10);
-	}
 	if (!wbc.range_cyclic) {
 		wbc.range_start = 0;
 		wbc.range_end = LLONG_MAX;
@@ -713,10 +708,17 @@ static long wb_writeback(struct bdi_writ
 		if (work->for_background && !over_bground_thresh())
 			break;
 
+		if (work->for_kupdate || work->for_background) {
+			oldest_jif = jiffies -
+				msecs_to_jiffies(dirty_expire_interval * 10);
+			wbc.older_than_this = &oldest_jif;
+		}
+
 		wbc.more_io = 0;
 		wbc.nr_to_write = write_chunk;
 		wbc.pages_skipped = 0;
 
+retry_all:
 		trace_wbc_writeback_start(&wbc, wb->bdi);
 		if (work->sb)
 			__writeback_inodes_sb(work->sb, wb, &wbc);
@@ -733,6 +735,17 @@ static long wb_writeback(struct bdi_writ
 		if (wbc.nr_to_write <= 0)
 			continue;
 		/*
+		 * No expired inode? Try all fresh ones
+		 */
+		if ((work->for_kupdate || work->for_background) &&
+		    wbc.older_than_this &&
+		    wbc.nr_to_write == write_chunk &&
+		    list_empty(&wb->b_io) &&
+		    list_empty(&wb->b_more_io)) {
+			wbc.older_than_this = NULL;
+			goto retry_all;
+		}
+		/*
 		 * Didn't write everything and we don't have more IO, bail
 		 */
 		if (!wbc.more_io)
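
For readers who want to play with the fallback outside the kernel, the retry logic in the last hunk can be modelled roughly as follows. This is a userspace sketch under loud assumptions: `struct model_inode`, `model_writeback()` and `model_background_pass()` are hypothetical names, a flat array replaces the b_dirty/b_io/b_more_io lists, "nothing was written" stands in for the `nr_to_write == write_chunk` plus empty-list test, and timestamps are compared directly (the kernel uses wraparound-safe jiffies helpers).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dirty inode. */
struct model_inode {
	unsigned long dirtied_when;	/* time at which the inode was dirtied */
	int written;			/* non-zero once "written back" */
};

/*
 * Mechanism: write back at most nr_to_write inodes. When older_than_this
 * is non-NULL, only inodes dirtied at or before that time are eligible.
 * Returns the number of inodes written in this call.
 */
static long model_writeback(struct model_inode *inodes, size_t n,
			    const unsigned long *older_than_this,
			    long nr_to_write)
{
	long written = 0;

	for (size_t i = 0; i < n && written < nr_to_write; i++) {
		if (inodes[i].written)
			continue;
		if (older_than_this &&
		    inodes[i].dirtied_when > *older_than_this)
			continue;	/* not expired yet */
		inodes[i].written = 1;
		written++;
	}
	return written;
}

/*
 * Policy: start with the expire cutoff set; if that pass wrote nothing,
 * clear the cutoff and retry once against all dirty inodes.
 * *fell_back reports whether the cutoff had to be cleared.
 */
static long model_background_pass(struct model_inode *inodes, size_t n,
				  unsigned long now,
				  unsigned long expire_interval,
				  long write_chunk, int *fell_back)
{
	unsigned long oldest = now - expire_interval;
	const unsigned long *older_than_this = &oldest;
	long written;

	assert(write_chunk > 0 && now >= expire_interval);
	*fell_back = 0;
retry_all:
	written = model_writeback(inodes, n, older_than_this, write_chunk);
	if (written == 0 && older_than_this) {
		/* No expired inode? Try all fresh ones. */
		older_than_this = NULL;
		*fell_back = 1;
		goto retry_all;
	}
	return written;
}
```

With inodes dirtied at times 100, 950 and 990, `now = 1000` and an interval of 300, the first call writes only the inode from time 100. The second call finds nothing expired, clears the cutoff, and writes the remaining two.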
Thanks,
Fengguang