From: Wu Fengguang <fengguang.wu@intel.com>
To: Jeff Layton <jlayton@redhat.com>
Cc: Ian Kent <raven@themaw.net>, Dave Chinner <david@fromorbit.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"hch@infradead.org" <hch@infradead.org>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] writeback: reset inode dirty time when adding it back to empty s_dirty list
Date: Fri, 27 Mar 2009 10:13:03 +0800
Message-ID: <20090327021303.GA7547@localhost>
In-Reply-To: <20090326130327.3206e00b@barsoom.rdu.redhat.com>
On Fri, Mar 27, 2009 at 01:03:27AM +0800, Jeff Layton wrote:
> On Wed, 25 Mar 2009 22:16:18 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
>
> > >
> > > Actually, I think you were right. We still have this check in
> > > generic_sync_sb_inodes() even with Wu's January 2008 patches:
> > >
> > > 	/* Was this inode dirtied after sync_sb_inodes was called? */
> > > 	if (time_after(inode->dirtied_when, start))
> > > 		break;
> >
> > Yeah, ugly code. Jens' per-bdi flush daemons should eliminate it...
> >
>
> I had a look over Jens' patches and they seem to be more concerned with
> how the queues and daemons are organized (per-bdi rather than per-sb).
> The way that inodes flow between the queues and get written out
> doesn't look like it really changes with his set.
OK, sorry, I haven't carefully reviewed the per-bdi flushing patchset.
> They also don't eliminate the problematic check above. Regardless of
> whether your or Jens' patches make it in, I think we'll still need
> something like the following (untested) patch.
>
> If this looks ok, I'll flesh out the comments some and "officially" post
> it. Thoughts?
It's good in itself. However, with the more_io_wait queue, the first two
chunks (in redirty_tail and move_expired_inodes) will be eliminated.
Mind if I carry this patch in my patchset?
Thanks,
Fengguang
> --------------[snip]-----------------
>
> From d10adff2d5f9a15d19c438119dbb2c410bd26e3c Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@redhat.com>
> Date: Thu, 26 Mar 2009 12:54:52 -0400
> Subject: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks
>
> The dirtied_when value on an inode is supposed to record the time, in
> jiffies, at which one of the inode's pages was first dirtied. Several
> places in the writeback code use it to decide when to write out an
> inode.
>
> The problem is that these checks assume that dirtied_when is updated
> periodically. But if an inode is continuously being used for I/O, it can
> stay marked dirty indefinitely while its dirtied_when value keeps aging.
> Once the difference between dirtied_when and the jiffies value it is
> being compared to is greater than or equal to half the range of the
> jiffies type, the logic of the time_*() macros inverts and they return
> the opposite of what is needed. On 32-bit architectures that's 2^31
> jiffies, just under 25 days (about 24.9 days, assuming HZ == 1000).
>
> As the least-recently dirtied inode, it will be the first one that
> pdflush tries to write out. However, sync_sb_inodes does this check:
>
> 	/* Was this inode dirtied after sync_sb_inodes was called? */
> 	if (time_after(inode->dirtied_when, start))
> 		break;
>
> ...but now dirtied_when appears to be in the future. sync_sb_inodes
> bails out without attempting to write any dirty inodes. When this
> occurs, pdflush will stop writing out inodes for this superblock and
> nothing will unwedge it until jiffies moves out of the problematic
> window.
>
> This patch fixes this problem by changing the time_after checks against
> dirtied_when to also check whether dirtied_when appears to be in the
> future. If it does, then we consider the value to be in the past.
>
> This should shrink the problematic window to such a small period as not
> to matter.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
> fs/fs-writeback.c | 11 +++++++----
> 1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index e3fe991..dba69a5 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -196,8 +196,9 @@ static void redirty_tail(struct inode *inode)
>  		struct inode *tail_inode;
>
>  		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
> -		if (!time_after_eq(inode->dirtied_when,
> -				tail_inode->dirtied_when))
> +		if (time_before(inode->dirtied_when,
> +				tail_inode->dirtied_when) ||
> +		    time_after(inode->dirtied_when, jiffies))
>  			inode->dirtied_when = jiffies;
>  	}
>  	list_move(&inode->i_list, &sb->s_dirty);
> @@ -231,7 +232,8 @@ static void move_expired_inodes(struct list_head *delaying_queue,
>  		struct inode *inode = list_entry(delaying_queue->prev,
>  						struct inode, i_list);
>  		if (older_than_this &&
> -			time_after(inode->dirtied_when, *older_than_this))
> +			time_after(inode->dirtied_when, *older_than_this) &&
> +			time_before_eq(inode->dirtied_when, jiffies))
>  			break;
>  		list_move(&inode->i_list, dispatch_queue);
>  	}
> @@ -493,7 +495,8 @@ void generic_sync_sb_inodes(struct super_block *sb,
>  		}
>
>  		/* Was this inode dirtied after sync_sb_inodes was called? */
> -		if (time_after(inode->dirtied_when, start))
> +		if (time_after(inode->dirtied_when, start) &&
> +		    time_before_eq(inode->dirtied_when, jiffies))
>  			break;
>
>  		/* Is another pdflush already flushing this queue? */
> --
> 1.5.5.6