linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
@ 2009-04-01  0:03 Jeff Layton
From: Jeff Layton @ 2009-04-01  0:03 UTC
  To: akpm; +Cc: linux-kernel, linux-fsdevel

This is the second version of this patch. The only difference from the
first version is the addition of some comments.

The dirtied_when value on an inode is supposed to represent the first
time that an inode has one of its pages dirtied. This value is in units
of jiffies. It's used in several places in the writeback code to
determine when to write out an inode.

The problem is that these checks assume that dirtied_when is updated
periodically. If an inode is continuously being used for I/O it can be
persistently marked as dirty, and its dirtied_when will keep aging. Once
the difference between dirtied_when and the jiffies value it is being
compared to reaches half the range of the jiffies type, the time_*()
macros invert and return the opposite of what is needed. On 32-bit
architectures that happens after 2^31 jiffies, just under 25 days
(assuming HZ == 1000).

Because it is the least-recently dirtied inode, it will be the first one
pdflush tries to write out, and sync_sb_inodes() does this check:

	/* Was this inode dirtied after sync_sb_inodes was called? */
	if (time_after(inode->dirtied_when, start))
		break;

...but now dirtied_when appears to be in the future, so sync_sb_inodes()
bails out without attempting to write any dirty inodes. When this
occurs, pdflush stops writing out inodes for this superblock, and
nothing can unwedge it until jiffies moves out of the problematic
window.

This patch fixes this problem by changing the checks against
dirtied_when to also check whether it appears to be in the future. If it
does, then we consider the value to be far in the past.

This should shrink the problematic window of time to such a small period
as not to matter.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Ian Kent <raven@themaw.net>
---
 fs/fs-writeback.c |   32 +++++++++++++++++++++++++++-----
 1 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e3fe991..0c10c61 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -196,8 +196,13 @@ static void redirty_tail(struct inode *inode)
 		struct inode *tail_inode;
 
 		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
-		if (!time_after_eq(inode->dirtied_when,
-				tail_inode->dirtied_when))
+		/*
+		 * must also check whether dirtied_when appears to be in the
+		 * future, in which case it's actually in the distant past.
+		 */
+		if (time_before(inode->dirtied_when,
+				tail_inode->dirtied_when) ||
+		    time_after(inode->dirtied_when, jiffies))
 			inode->dirtied_when = jiffies;
 	}
 	list_move(&inode->i_list, &sb->s_dirty);
@@ -230,8 +235,13 @@ static void move_expired_inodes(struct list_head *delaying_queue,
 	while (!list_empty(delaying_queue)) {
 		struct inode *inode = list_entry(delaying_queue->prev,
 						struct inode, i_list);
+		/*
+		 * must also check whether dirtied_when appears to be in the
+		 * future, in which case it's actually in the distant past.
+		 */
 		if (older_than_this &&
-			time_after(inode->dirtied_when, *older_than_this))
+			time_after(inode->dirtied_when, *older_than_this) &&
+			time_before_eq(inode->dirtied_when, jiffies))
 			break;
 		list_move(&inode->i_list, dispatch_queue);
 	}
@@ -492,8 +502,20 @@ void generic_sync_sb_inodes(struct super_block *sb,
 			continue;		/* blockdev has wrong queue */
 		}
 
-		/* Was this inode dirtied after sync_sb_inodes was called? */
-		if (time_after(inode->dirtied_when, start))
+		/*
+		 * Was this inode dirtied after sync_sb_inodes was called?
+		 *
+		 * It's not sufficient to just do a time_after() check on
+		 * dirtied_when. That assumes that dirtied_when will always
+		 * change within a period of jiffies that encompasses half the
+		 * machine word size (2^31 jiffies on 32-bit arch). That's not
+		 * necessarily the case if an inode is being constantly
+		 * redirtied. Since dirtied_when can never be in the future,
+		 * we can assume that if it appears to be so then it is
+		 * actually in the distant past.
+		 */
+		if (time_after(inode->dirtied_when, start) &&
+		    time_before_eq(inode->dirtied_when, jiffies))
 			break;
 
 		/* Is another pdflush already flushing this queue? */
-- 
1.5.5.6


* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
@ 2009-04-01  0:20 ` Andrew Morton
From: Andrew Morton @ 2009-04-01  0:20 UTC
  To: Jeff Layton; +Cc: linux-kernel, linux-fsdevel

On Tue, 31 Mar 2009 20:03:59 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> +		 * It's not sufficient to just do a time_after() check on
> +		 * dirtied_when. That assumes that dirtied_when will always
> +		 * change within a period of jiffies that encompasses half the
> +		 * machine word size (2^31 jiffies on 32-bit arch). That's not
> +		 * necessarily the case if an inode is being constantly
> +		 * redirtied. Since dirtied_when can never be in the future,
> +		 * we can assume that if it appears to be so then it is
> +		 * actually in the distant past.

so this really is a 32-bit-only thing.

I guess that isn't worth optimising for though.

otoh, given that all three comparisons are the same:

+			time_after(inode->dirtied_when, *older_than_this) &&
+			time_before_eq(inode->dirtied_when, jiffies))

(although one is inverted (i think?)), it might end up nicer if this was all done
in a little helper function?

That way we only need to comment what's going on at a single site, and
we could omit the additional test if !CONFIG_64BIT.

* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
@ 2009-04-01  0:50   ` Jeff Layton
From: Jeff Layton @ 2009-04-01  0:50 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

On Tue, 31 Mar 2009 17:20:31 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 31 Mar 2009 20:03:59 -0400
> Jeff Layton <jlayton@redhat.com> wrote:
> 
> > +		 * It's not sufficient to just do a time_after() check on
> > +		 * dirtied_when. That assumes that dirtied_when will always
> > +		 * change within a period of jiffies that encompasses half the
> > +		 * machine word size (2^31 jiffies on 32-bit arch). That's not
> > +		 * necessarily the case if an inode is being constantly
> > +		 * redirtied. Since dirtied_when can never be in the future,
> > +		 * we can assume that if it appears to be so then it is
> > +		 * actually in the distant past.
> 
> so this really is a 32-bit-only thing.
> 
> I guess that isn't worth optimising for though.
> 

Yeah, it's pretty much impossible to hit this on a 64-bit machine.

> otoh, given that all three comparisons are the same:
> 
> +			time_after(inode->dirtied_when, *older_than_this) &&
> +			time_before_eq(inode->dirtied_when, jiffies))
> 
> (although one is inverted (i think?)), it might end up nicer if this was all done
> in a little helper function?
> 
> That way we only need to comment what's going on at a single site, and
> we could omit the additional test if !CONFIG_64BIT.

Ok, that seems reasonable.

At one point I had a macro similar to time_in_range(), but dropped it
primarily because time_after_but_before_eq() wasn't easy on the eyes.
Thoughts on better names?

-- 
Jeff Layton <jlayton@redhat.com>

* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
@ 2009-04-01  1:07     ` Andrew Morton
From: Andrew Morton @ 2009-04-01  1:07 UTC
  To: Jeff Layton; +Cc: linux-kernel, linux-fsdevel

On Tue, 31 Mar 2009 20:50:18 -0400 Jeff Layton <jlayton@redhat.com> wrote:

> On Tue, 31 Mar 2009 17:20:31 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Tue, 31 Mar 2009 20:03:59 -0400
> > Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > > +		 * It's not sufficient to just do a time_after() check on
> > > +		 * dirtied_when. That assumes that dirtied_when will always
> > > +		 * change within a period of jiffies that encompasses half the
> > > +		 * machine word size (2^31 jiffies on 32-bit arch). That's not
> > > +		 * necessarily the case if an inode is being constantly
> > > +		 * redirtied. Since dirtied_when can never be in the future,
> > > +		 * we can assume that if it appears to be so then it is
> > > +		 * actually in the distant past.
> > 
> > so this really is a 32-bit-only thing.
> > 
> > I guess that isn't worth optimising for though.
> > 
> 
> Yeah, it's pretty much impossible to hit this on a 64-bit machine.
> 
> > otoh, given that all three comparisons are the same:
> > 
> > +			time_after(inode->dirtied_when, *older_than_this) &&
> > +			time_before_eq(inode->dirtied_when, jiffies))
> > 
> > (although one is inverted (i think?)), it might end up nicer if this was all done
> > in a little helper function?
> > 
> > That way we only need to comment what's going on at a single site, and
> > we could omit the additional test if !CONFIG_64BIT.
> 
> Ok, that seems reasonable.
> 
> At one point I had a macro similar to time_in_range(), but dropped it
> primarily because time_after_but_before_eq() wasn't easy on the eyes.
> Thoughts on better names?

I was thinking

	bool inode_dirtied_after(...);

and just leave the innards using time_after() and time_before_eq()?

* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
@ 2009-04-01  6:56       ` Wu Fengguang
From: Wu Fengguang @ 2009-04-01  6:56 UTC
  To: Andrew Morton; +Cc: Jeff Layton, linux-kernel, linux-fsdevel

On Tue, Mar 31, 2009 at 06:07:30PM -0700, Andrew Morton wrote:
> On Tue, 31 Mar 2009 20:50:18 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Tue, 31 Mar 2009 17:20:31 -0700
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > On Tue, 31 Mar 2009 20:03:59 -0400
> > > Jeff Layton <jlayton@redhat.com> wrote:
> > > 
> > > > +		 * It's not sufficient to just do a time_after() check on
> > > > +		 * dirtied_when. That assumes that dirtied_when will always
> > > > +		 * change within a period of jiffies that encompasses half the
> > > > +		 * machine word size (2^31 jiffies on 32-bit arch). That's not
> > > > +		 * necessarily the case if an inode is being constantly
> > > > +		 * redirtied. Since dirtied_when can never be in the future,
> > > > +		 * we can assume that if it appears to be so then it is
> > > > +		 * actually in the distant past.
> > > 
> > > so this really is a 32-bit-only thing.
> > > 
> > > I guess that isn't worth optimising for though.
> > > 
> > 
> > Yeah, it's pretty much impossible to hit this on a 64-bit machine.
> > 
> > > otoh, given that all three comparisons are the same:
> > > 
> > > +			time_after(inode->dirtied_when, *older_than_this) &&
> > > +			time_before_eq(inode->dirtied_when, jiffies))
> > > 
> > > (although one is inverted (i think?)), it might end up nicer if this was all done
> > > in a little helper function?
> > > 
> > > That way we only need to comment what's going on at a single site, and
> > > we could omit the additional test if !CONFIG_64BIT.
> > 
> > Ok, that seems reasonable.
> > 
> > At one point I had a macro similar to time_in_range(), but dropped it
> > primarily because time_after_but_before_eq() wasn't easy on the eyes.
> > Thoughts on better names?
> 
> I was thinking
> 
> 	bool inode_dirtied_after(...);
> 
> and just leave the innards using time_after() and time_before_eq()?

Andrew, here is the updated patch. Note that the first chunk, for
redirty_tail(), was not strictly necessary and so has been dropped.

Thanks,
Fengguang
---
Subject: writeback: guard against jiffies wraparound on inode->dirtied_when checks
From:	Jeff Layton <jlayton@redhat.com>

The dirtied_when value on an inode is supposed to represent the first
time that an inode has one of its pages dirtied. This value is in units
of jiffies. It's used in several places in the writeback code to
determine when to write out an inode.

The problem is that these checks assume that dirtied_when is updated
periodically. If an inode is continuously being used for I/O it can be
persistently marked as dirty and will continue to age. Once the time
difference between dirtied_when and the jiffies value it is being
compared to is greater than or equal to half the maximum of the jiffies
type, the logic of the time_*() macros inverts and the opposite of what
is needed is returned. On 32-bit architectures that's just under 25 days
(assuming HZ == 1000).

As the least-recently dirtied inode, it'll end up being the first one
that pdflush will try to write out. sync_sb_inodes() does this check:

	/* Was this inode dirtied after sync_sb_inodes was called? */
	if (time_after(inode->dirtied_when, start))
		break;

...but now dirtied_when appears to be in the future. sync_sb_inodes()
bails out without attempting to write any dirty inodes. When this
occurs, pdflush will stop writing out inodes for this superblock.
Nothing can unwedge it until jiffies moves out of the problematic
window.

Fix this problem by changing the checks against dirtied_when to also
check whether it appears to be in the future. If it does, then we
consider the value to be far in the past.

This should shrink the problematic window of time to such a small
period (30s) as not to matter.

Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

--- mm.orig/fs/fs-writeback.c
+++ mm/fs/fs-writeback.c
@@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i
 		struct inode *tail_inode;
 
 		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
-		if (!time_after_eq(inode->dirtied_when,
+		if (time_before(inode->dirtied_when,
 				tail_inode->dirtied_when))
 			inode->dirtied_when = jiffies;
 	}
@@ -220,6 +220,21 @@ static void inode_sync_complete(struct i
 	wake_up_bit(&inode->i_state, __I_SYNC);
 }
 
+static bool inode_dirtied_after(struct inode *inode, unsigned long t)
+{
+	bool ret = time_after(inode->dirtied_when, t);
+#ifndef CONFIG_64BIT
+	/*
+	 * For inodes being constantly redirtied, dirtied_when can get stuck.
+	 * It _appears_ to be in the future, but is actually in distant past.
+	 * This test is necessary to prevent such wrapped-around relative times
+	 * from permanently stopping the whole pdflush writeback.
+	 */
+	ret = ret && time_before_eq(inode->dirtied_when, jiffies);
+#endif
+	return ret;
+}
+
 /*
  * Move expired dirty inodes from @delaying_queue to @dispatch_queue.
  */
@@ -231,7 +246,7 @@ static void move_expired_inodes(struct l
 		struct inode *inode = list_entry(delaying_queue->prev,
 						struct inode, i_list);
 		if (older_than_this &&
-			time_after(inode->dirtied_when, *older_than_this))
+				inode_dirtied_after(inode, *older_than_this))
 			break;
 		list_move(&inode->i_list, dispatch_queue);
 	}
@@ -492,8 +507,11 @@ void generic_sync_sb_inodes(struct super
 			continue;		/* blockdev has wrong queue */
 		}
 
-		/* Was this inode dirtied after sync_sb_inodes was called? */
-		if (time_after(inode->dirtied_when, start))
+		/*
+		 * Was this inode dirtied after sync_sb_inodes was called?
+		 * This keeps sync from extra jobs and livelock.
+		 */
+		if (inode_dirtied_after(inode, start))
 			break;
 
 		/* Is another pdflush already flushing this queue? */

* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
@ 2009-04-01 11:53         ` Jeff Layton
From: Jeff Layton @ 2009-04-01 11:53 UTC
  To: Wu Fengguang; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

On Wed, 1 Apr 2009 14:56:18 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Tue, Mar 31, 2009 at 06:07:30PM -0700, Andrew Morton wrote:
> > On Tue, 31 Mar 2009 20:50:18 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > > On Tue, 31 Mar 2009 17:20:31 -0700
> > > Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > On Tue, 31 Mar 2009 20:03:59 -0400
> > > > Jeff Layton <jlayton@redhat.com> wrote:
> > > > 
> > > > > +		 * It's not sufficient to just do a time_after() check on
> > > > > +		 * dirtied_when. That assumes that dirtied_when will always
> > > > > +		 * change within a period of jiffies that encompasses half the
> > > > > +		 * machine word size (2^31 jiffies on 32-bit arch). That's not
> > > > > +		 * necessarily the case if an inode is being constantly
> > > > > +		 * redirtied. Since dirtied_when can never be in the future,
> > > > > +		 * we can assume that if it appears to be so then it is
> > > > > +		 * actually in the distant past.
> > > > 
> > > > so this really is a 32-bit-only thing.
> > > > 
> > > > I guess that isn't worth optimising for though.
> > > > 
> > > 
> > > Yeah, it's pretty much impossible to hit this on a 64-bit machine.
> > > 
> > > > otoh, given that all three comparisons are the same:
> > > > 
> > > > +			time_after(inode->dirtied_when, *older_than_this) &&
> > > > +			time_before_eq(inode->dirtied_when, jiffies))
> > > > 
> > > > (although one is inverted (i think?)), it might end up nicer if this was all done
> > > > in a little helper function?
> > > > 
> > > > That way we only need to comment what's going on at a single site, and
> > > > we could omit the additional test if !CONFIG_64BIT.
> > > 
> > > Ok, that seems reasonable.
> > > 
> > > At one point I had a macro similar to time_in_range(), but dropped it
> > > primarily because time_after_but_before_eq() wasn't easy on the eyes.
> > > Thoughts on better names?
> > 
> > I was thinking
> > 
> > 	bool inode_dirtied_after(...);
> > 
> > and just leave the innards using time_after() and time_before_eq()?
> 
> Andrew, here is the updated patch. Note that the first chunk for
> redirty_tail() was not absolutely necessary and so removed.
> 
> Thanks,
> Fengguang
> ---
> Subject: writeback: guard against jiffies wraparound on inode->dirtied_when checks
> From:	Jeff Layton <jlayton@redhat.com>
> 
> The dirtied_when value on an inode is supposed to represent the first
> time that an inode has one of its pages dirtied. This value is in units
> of jiffies. It's used in several places in the writeback code to
> determine when to write out an inode.
> 
> The problem is that these checks assume that dirtied_when is updated
> periodically. If an inode is continuously being used for I/O it can be
> persistently marked as dirty and will continue to age. Once the time
> difference between dirtied_when and the jiffies value it is being
> compared to is greater than or equal to half the maximum of the jiffies
> type, the logic of the time_*() macros inverts and the opposite of what
> is needed is returned. On 32-bit architectures that's just under 25 days
> (assuming HZ == 1000).
> 
> As the least-recently dirtied inode, it'll end up being the first one
> that pdflush will try to write out. sync_sb_inodes() does this check:
> 
> 	/* Was this inode dirtied after sync_sb_inodes was called? */
>  	if (time_after(inode->dirtied_when, start))
>  		break;
> 
> ...but now dirtied_when appears to be in the future. sync_sb_inodes()
> bails out without attempting to write any dirty inodes. When this
> occurs, pdflush will stop writing out inodes for this superblock.
> Nothing can unwedge it until jiffies moves out of the problematic
> window.
> 
> Fix this problem by changing the checks against dirtied_when to also
> check whether it appears to be in the future. If it does, then we
> consider the value to be far in the past.
> 
> This should shrink the problematic window of time to such a small
> period(30s) as not to matter.
> 
> Acked-by: Ian Kent <raven@themaw.net>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  fs/fs-writeback.c |   26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> --- mm.orig/fs/fs-writeback.c
> +++ mm/fs/fs-writeback.c
> @@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i
>  		struct inode *tail_inode;
>  
>  		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
> -		if (!time_after_eq(inode->dirtied_when,
> +		if (time_before(inode->dirtied_when,
>  				tail_inode->dirtied_when))
>  			inode->dirtied_when = jiffies;
>  	}

I think we need a similar change in this function in order to maintain
the list order.

Consider this case:

We have an s_dirty list with a head inode that appears to be in the
future. We start writeback and clear out s_dirty (all of the inodes are
moved to s_io). A new inode is dirtied, and goes onto the empty s_dirty
list with a dirtied_when value that equals now. The inode with the
dirtied_when value that looks like it's in the future is redirtied while
being written, and redirty_tail() is called. It goes back on the list
without having its dirtied_when reset, even though it's actually older
than the inode at the tail.

There is another option too that I'll throw out here...

We could just make dirtied_when a 64-bit value on 32-bit machines and
use jiffies_64 there. The upside is that there is no "problematic
window" at all. The downside is that struct inode would grow by 4
bytes on 32-bit arches, and reading jiffies_64 on such an arch is
more computationally intensive. We'd also have to change the size of
the older_than_this value in struct writeback_control if we want to
go this route...


-- 
Jeff Layton <jlayton@redhat.com>

* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
@ 2009-04-01 12:26           ` Wu Fengguang
From: Wu Fengguang @ 2009-04-01 12:26 UTC
  To: Jeff Layton
  Cc: Andrew Morton, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On Wed, Apr 01, 2009 at 07:53:20PM +0800, Jeff Layton wrote:
> On Wed, 1 Apr 2009 14:56:18 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Tue, Mar 31, 2009 at 06:07:30PM -0700, Andrew Morton wrote:
> > > On Tue, 31 Mar 2009 20:50:18 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> > > 
> > > > On Tue, 31 Mar 2009 17:20:31 -0700
> > > > Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > 
> > > > > On Tue, 31 Mar 2009 20:03:59 -0400
> > > > > Jeff Layton <jlayton@redhat.com> wrote:
> > > > > 
> > > > > > +		 * It's not sufficient to just do a time_after() check on
> > > > > > +		 * dirtied_when. That assumes that dirtied_when will always
> > > > > > +		 * change within a period of jiffies that encompasses half the
> > > > > > +		 * machine word size (2^31 jiffies on 32-bit arch). That's not
> > > > > > +		 * necessarily the case if an inode is being constantly
> > > > > > +		 * redirtied. Since dirtied_when can never be in the future,
> > > > > > +		 * we can assume that if it appears to be so then it is
> > > > > > +		 * actually in the distant past.
> > > > > 
> > > > > so this really is a 32-bit-only thing.
> > > > > 
> > > > > I guess that isn't worth optimising for though.
> > > > > 
> > > > 
> > > > Yeah, it's pretty much impossible to hit this on a 64-bit machine.
> > > > 
> > > > > otoh, given that all three comparisons are the same:
> > > > > 
> > > > > +			time_after(inode->dirtied_when, *older_than_this) &&
> > > > > +			time_before_eq(inode->dirtied_when, jiffies))
> > > > > 
> > > > > (although one is inverted (i think?)), it might end up nicer if this was all done
> > > > > in a little helper function?
> > > > > 
> > > > > That way we only need to comment what's going on at a single site, and
> > > > > we could omit the additional test if !CONFIG_64BIT.
> > > > 
> > > > Ok, that seems reasonable.
> > > > 
> > > > At one point I had a macro similar to time_in_range(), but dropped it
> > > > primarily because time_after_but_before_eq() wasn't easy on the eyes.
> > > > Thoughts on better names?
> > > 
> > > I was thinking
> > > 
> > > 	bool inode_dirtied_after(...);
> > > 
> > > and just leave the innards using time_after() and time_before_eq()?
> > 
> > Andrew, here is the updated patch. Note that the first chunk for
> > redirty_tail() was not absolutely necessary and so removed.
> > 
> > Thanks,
> > Fengguang
> > ---
> > Subject: writeback: guard against jiffies wraparound on inode->dirtied_when checks
> > From:	Jeff Layton <jlayton@redhat.com>
> > 
> > The dirtied_when value on an inode is supposed to represent the first
> > time that an inode has one of its pages dirtied. This value is in units
> > of jiffies. It's used in several places in the writeback code to
> > determine when to write out an inode.
> > 
> > The problem is that these checks assume that dirtied_when is updated
> > periodically. If an inode is continuously being used for I/O it can be
> > persistently marked as dirty and will continue to age. Once the time
> > difference between dirtied_when and the jiffies value it is being
> > compared to is greater than or equal to half the maximum of the jiffies
> > type, the logic of the time_*() macros inverts and the opposite of what
> > is needed is returned. On 32-bit architectures that's just under 25 days
> > (assuming HZ == 1000).
> > 
> > As the least-recently dirtied inode, it'll end up being the first one
> > that pdflush will try to write out. sync_sb_inodes() does this check:
> > 
> > 	/* Was this inode dirtied after sync_sb_inodes was called? */
> >  	if (time_after(inode->dirtied_when, start))
> >  		break;
> > 
> > ...but now dirtied_when appears to be in the future. sync_sb_inodes()
> > bails out without attempting to write any dirty inodes. When this
> > occurs, pdflush will stop writing out inodes for this superblock.
> > Nothing can unwedge it until jiffies moves out of the problematic
> > window.
> > 
> > Fix this problem by changing the checks against dirtied_when to also
> > check whether it appears to be in the future. If it does, then we
> > consider the value to be far in the past.
> > 
> > This should shrink the problematic window of time to such a small
> > period(30s) as not to matter.
> > 
> > Acked-by: Ian Kent <raven@themaw.net>
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  fs/fs-writeback.c |   26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> > 
> > --- mm.orig/fs/fs-writeback.c
> > +++ mm/fs/fs-writeback.c
> > @@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i
> >  		struct inode *tail_inode;
> >  
> >  		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
> > -		if (!time_after_eq(inode->dirtied_when,
> > +		if (time_before(inode->dirtied_when,
> >  				tail_inode->dirtied_when))
> >  			inode->dirtied_when = jiffies;
> >  	}
> 
> I think we need a similar change in this function in order to maintain
> the list order.
> 
> Consider this case:
> 
> We have an s_dirty list with a head inode that appears to be in the
> future. We start writeback and clear out s_dirty (all of the inodes are
> moved to s_io). A new inode is dirtied, and goes onto the empty s_dirty
> list with a dirtied_when value that equals now. The inode with the
> dirtied_when value that looks like it's in the future is redirtied while
> being written and redirty_tail is called. It goes back on the list
> without resetting dirtied_when even though it's actually older than the
> inode at the tail.

What's the difference? It _is_ in the past, because both reference
sites are now taught to think so.

So s_dirty is still in order, and the writeback process won't be blocked.

> There is another option too that I'll throw out here...
> 
> We could just make dirtied_when a 64 bit value on 32 bit machines and
> use jiffies_64 there. On the upside there is no "problematic
> window" with that. The downside is that struct inode would grow by 4
> bytes on 32 bit arches, and checking jiffies_64 on such an arch is
> more computationally intensive. We'd also have to change the size of
> older_than_this value in the writeback_control struct too if we want to
> go this route...

Yes, that would eliminate the temporary writeback stall of 30s or
more. The only problem is the extra cost in the normal case, especially
the space cost.

Thanks,
Fengguang

> 
> > @@ -220,6 +220,21 @@ static void inode_sync_complete(struct i
> >  	wake_up_bit(&inode->i_state, __I_SYNC);
> >  }
> >  
> > +static bool inode_dirtied_after(struct inode *inode, unsigned long t)
> > +{
> > +	bool ret = time_after(inode->dirtied_when, t);
> > +#ifndef CONFIG_64BIT
> > +	/*
> > +	 * For inodes being constantly redirtied, dirtied_when can get stuck.
> > +	 * It _appears_ to be in the future, but is actually in distant past.
> > +	 * This test is necessary to prevent such wrapped-around relative times
> > +	 * from permanently stopping the whole pdflush writeback.
> > +	 */
> > +	ret = ret && time_before_eq(inode->dirtied_when, jiffies);
> > +#endif
> > +	return ret;
> > +}
> > +
> >  /*
> >   * Move expired dirty inodes from @delaying_queue to @dispatch_queue.
> >   */
> > @@ -231,7 +246,7 @@ static void move_expired_inodes(struct l
> >  		struct inode *inode = list_entry(delaying_queue->prev,
> >  						struct inode, i_list);
> >  		if (older_than_this &&
> > -			time_after(inode->dirtied_when, *older_than_this))
> > +				inode_dirtied_after(inode, *older_than_this))
> >  			break;
> >  		list_move(&inode->i_list, dispatch_queue);
> >  	}
> > @@ -492,8 +507,11 @@ void generic_sync_sb_inodes(struct super
> >  			continue;		/* blockdev has wrong queue */
> >  		}
> >  
> > -		/* Was this inode dirtied after sync_sb_inodes was called? */
> > -		if (time_after(inode->dirtied_when, start))
> > +		/*
> > +		 * Was this inode dirtied after sync_sb_inodes was called?
> > +		 * This keeps sync from extra jobs and livelock.
> > +		 */
> > +		if (inode_dirtied_after(inode, start))
> >  			break;
> >  
> >  		/* Is another pdflush already flushing this queue? */
> 
> 
> -- 
> Jeff Layton <jlayton@redhat.com>
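To see why the extra `time_before_eq(inode->dirtied_when, jiffies)` test in the quoted patch matters, here is a user-space sketch of the same logic. It is not kernel code: `uint32_t`/`int32_t` stand in for `unsigned long`/`long` on a 32-bit arch, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's time_after(a, b): true if a is after b.
 * Relies on the usual two's-complement conversion to int32_t. */
static bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Same shape as time_before_eq(a, b) == time_after_eq(b, a). */
static bool time_before_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) >= 0;
}

/* Models the !CONFIG_64BIT branch of the proposed inode_dirtied_after():
 * "dirtied after t" only counts if the stamp is not in the future,
 * because a stamp that appears to be in the future has wrapped. */
static bool dirtied_after32(uint32_t dirtied_when, uint32_t t, uint32_t now)
{
    bool ret = time_after32(dirtied_when, t);
    return ret && time_before_eq32(dirtied_when, now);
}
```

For example, with now = 0x90000000 and a cutoff 30000 ticks earlier, an inode stamped 0x100 (dirtied roughly 28 days earlier at HZ == 1000) fools the plain time_after() check because its stamp appears to be in the future, but the guarded check correctly treats it as old.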

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
  2009-04-01 12:26           ` Wu Fengguang
@ 2009-04-01 12:48             ` Jeff Layton
  2009-04-01 13:07               ` Wu Fengguang
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2009-04-01 12:48 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

> > > --- mm.orig/fs/fs-writeback.c
> > > +++ mm/fs/fs-writeback.c
> > > @@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i
> > >  		struct inode *tail_inode;
> > >  
> > >  		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
> > > -		if (!time_after_eq(inode->dirtied_when,
> > > +		if (time_before(inode->dirtied_when,
> > >  				tail_inode->dirtied_when))
> > >  			inode->dirtied_when = jiffies;
> > >  	}
> > 
> > I think we need a similar change in this function in order to maintain
> > the list order.
> > 
> > Consider this case:
> > 
> > We have an s_dirty list with a head inode that appears to be in the
> > future. We start writeback and clear out s_dirty (all of the inodes are
> > moved to s_io). A new inode is dirtied, and goes onto the empty s_dirty
> > list with a dirtied_when value that equals now. The inode with the
> > dirtied_when value that looks like it's in the future is redirtied while
> > being written and redirty_tail is called. It goes back on the list
> > without resetting dirtied_when even though it's actually older than the
> > inode at the tail.
> 
> What's the difference? It _is_ in the past, because both reference sites
> are now taught to treat it that way.
> 
> So s_dirty is still in order, and the writeback process won't be blocked.
> 

Sanity check -- my understanding is this:

head == least-recently dirtied inode
tail == most-recently dirtied inode

...if so, then we are violating the list order if we don't make a
change to redirty_tail. We're putting an inode that was dirtied far in the
past back onto the tail of the list without resetting dirtied_when, so a
more recently dirtied inode will precede one that was dirtied less recently.

Since the newly dirtied inode is closer to the head of the list, the
older inode that's constantly being redirtied won't be written out
until the newly dirtied one passes the older_than_this check (30s or
so in the usual case).
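The scenario above can be checked against the condition in the quoted redirty_tail() hunk. The sketch below is a user-space model, not kernel code (32-bit counters via `uint32_t`, hypothetical helper names). Note that `time_before(a, b)` and `!time_after_eq(a, b)` compute the same thing, so that hunk only changes readability; a wrapped stamp that looks newer than the tail inode's stamp is kept rather than reset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as time_after_eq(a, b). */
static bool time_after_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Same shape as time_before(a, b) == time_after(b, a). */
static bool time_before32(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical model of the dirtied_when adjustment in redirty_tail():
 * returns the stamp a redirtied inode keeps when it is queued next to
 * a tail inode stamped tail_when, at time now. */
static uint32_t redirty_tail_when(uint32_t inode_when, uint32_t tail_when,
                                  uint32_t now)
{
    if (time_before32(inode_when, tail_when))
        return now;        /* looks older than the tail: restamp */
    return inode_when;     /* looks newer (or wrapped): keep the stamp */
}
```

With the tail inode stamped at now = 0x90000000, a redirtied inode stamped 0x100 keeps its stale stamp, while an inode stamped 1000 ticks before the tail is restamped to now.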

> > There is another option too that I'll throw out here...
> > 
> > We could just make dirtied_when a 64-bit value on 32-bit machines and
> > use jiffies_64 there. On the upside, there is no "problematic
> > window" with that. The downside is that struct inode would grow by 4
> > bytes on 32-bit arches, and checking jiffies_64 on such an arch is
> > more computationally intensive. We'd also have to change the size of
> > the older_than_this value in the writeback_control struct if we want to
> > go this route...
> 
> Yes, that could eliminate the temporary writeback stall of 30s or more.
> The only problem is the extra cost in the normal case, especially the
> space cost.
> 

Correct. I'm not necessarily advocating that approach, but it's one to
consider...

If your s_more_io_wait patchset comes to fruition, though, then that
change really won't be needed, so it's probably best not to go that route.

-- 
Jeff Layton <jlayton@redhat.com>


* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
  2009-04-01 12:48             ` Jeff Layton
@ 2009-04-01 13:07               ` Wu Fengguang
  2009-04-01 14:35                 ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: Wu Fengguang @ 2009-04-01 13:07 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Andrew Morton, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On Wed, Apr 01, 2009 at 08:48:43PM +0800, Jeff Layton wrote:
> > > > --- mm.orig/fs/fs-writeback.c
> > > > +++ mm/fs/fs-writeback.c
> > > > @@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i
> > > >  		struct inode *tail_inode;
> > > >  
> > > >  		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
> > > > -		if (!time_after_eq(inode->dirtied_when,
> > > > +		if (time_before(inode->dirtied_when,
> > > >  				tail_inode->dirtied_when))
> > > >  			inode->dirtied_when = jiffies;
> > > >  	}
> > > 
> > > I think we need a similar change in this function in order to maintain
> > > the list order.
> > > 
> > > Consider this case:
> > > 
> > > We have an s_dirty list with a head inode that appears to be in the
> > > future. We start writeback and clear out s_dirty (all of the inodes are
> > > moved to s_io). A new inode is dirtied, and goes onto the empty s_dirty
> > > list with a dirtied_when value that equals now. The inode with the
> > > dirtied_when value that looks like it's in the future is redirtied while
> > > being written and redirty_tail is called. It goes back on the list
> > > without resetting dirtied_when even though it's actually older than the
> > > inode at the tail.
> > 
> > What's the difference? It _is_ in the past, because both reference sites
> > are now taught to treat it that way.
> > 
> > So s_dirty is still in order, and the writeback process won't be blocked.
> > 
> 
> Sanity check -- my understanding is this:
> 
> head == least-recently dirtied inode
> tail == most-recently dirtied inode
> 
> ...if so, then we are violating the list order if we don't make a
> change to redirty_tail. We're putting an inode that was dirtied far in the
> past back onto the tail of the list without resetting dirtied_when, so a
> more recently dirtied inode will precede one that was dirtied less recently.
> 
> Since the newly dirtied inode is closer to the head of the list, the
> older inode that's constantly being redirtied won't be written out
> until the newly dirtied one passes the older_than_this check (30s or
> so in the usual case).

If you call that out of order, then yes, it is. Sadly, it cannot be improved
by playing with dirtied_when: the _physical_ order is still the same.

You know what? That's exactly the drawback of redirtying into s_dirty.
It has nothing to do with resetting dirtied_when. A new s_more_io_wait
list is the only way to solve this problem.
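The point that resetting dirtied_when cannot change the physical order can be illustrated with a toy model. In the sketch below (an assumption, not kernel code), an array stands in for s_dirty with index 0 as the end that move_expired_inodes() scans first, and `count_expired` (a hypothetical name) mimics its scan: it stops at the first entry that is not older than older_than_this. The newly dirtied inode sits ahead of the redirtied one whether or not the redirtied inode's stamp was reset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shapes as time_after() and time_before_eq() on a 32-bit arch. */
static bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

static bool time_before_eq32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) >= 0;
}

/* Guarded "dirtied after t" test, as in the proposed patch. */
static bool dirtied_after32(uint32_t when, uint32_t t, uint32_t now)
{
    return time_after32(when, t) && time_before_eq32(when, now);
}

/* Hypothetical model of the move_expired_inodes() scan: q[0] is the end
 * scanned first; count how many entries would be dispatched before the
 * scan hits one that was dirtied after older_than_this. */
static size_t count_expired(const uint32_t *q, size_t n,
                            uint32_t older_than_this, uint32_t now)
{
    size_t i = 0;
    while (i < n && !dirtied_after32(q[i], older_than_this, now))
        i++;
    return i;
}
```

With a 30000-tick cutoff (30 s at HZ == 1000), nothing is dispatched while the new inode at index 0 is younger than the cutoff, in both the stale-stamp and the reset-stamp case; once the clock advances another 30000 ticks, both entries go out together. Only a separate queue, such as the proposed s_more_io_wait list, would let the redirtied inode avoid waiting behind the newer one.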

> > > There is another option too that I'll throw out here...
> > > 
> > > We could just make dirtied_when a 64-bit value on 32-bit machines and
> > > use jiffies_64 there. On the upside, there is no "problematic
> > > window" with that. The downside is that struct inode would grow by 4
> > > bytes on 32-bit arches, and checking jiffies_64 on such an arch is
> > > more computationally intensive. We'd also have to change the size of
> > > the older_than_this value in the writeback_control struct if we want to
> > > go this route...
> > 
> > Yes, that could eliminate the temporary writeback stall of 30s or more.
> > The only problem is the extra cost in the normal case, especially the
> > space cost.
> > 
> 
> Correct. I'm not necessarily advocating that approach, but it's one to
> consider...
> 
> If your s_more_io_wait patchset comes to fruition, though, then that
> change really won't be needed, so it's probably best not to go that route.

Sorry for the delay. But I'm still curious about the redirty
process and timing of NFS/XFS. The conflicts with Jens' per-bdi pdflush
patches are another concern...

Thanks,
Fengguang


* Re: [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2)
  2009-04-01 13:07               ` Wu Fengguang
@ 2009-04-01 14:35                 ` Jeff Layton
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Layton @ 2009-04-01 14:35 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org

On Wed, 1 Apr 2009 21:07:37 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Wed, Apr 01, 2009 at 08:48:43PM +0800, Jeff Layton wrote:
> > > > > --- mm.orig/fs/fs-writeback.c
> > > > > +++ mm/fs/fs-writeback.c
> > > > > @@ -196,7 +196,7 @@ static void redirty_tail(struct inode *i
> > > > >  		struct inode *tail_inode;
> > > > >  
> > > > >  		tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
> > > > > -		if (!time_after_eq(inode->dirtied_when,
> > > > > +		if (time_before(inode->dirtied_when,
> > > > >  				tail_inode->dirtied_when))
> > > > >  			inode->dirtied_when = jiffies;
> > > > >  	}
> > > > 
> > > > I think we need a similar change in this function in order to maintain
> > > > the list order.
> > > > 
> > > > Consider this case:
> > > > 
> > > > We have an s_dirty list with a head inode that appears to be in the
> > > > future. We start writeback and clear out s_dirty (all of the inodes are
> > > > moved to s_io). A new inode is dirtied, and goes onto the empty s_dirty
> > > > list with a dirtied_when value that equals now. The inode with the
> > > > dirtied_when value that looks like it's in the future is redirtied while
> > > > being written and redirty_tail is called. It goes back on the list
> > > > without resetting dirtied_when even though it's actually older than the
> > > > inode at the tail.
> > > 
> > > What's the difference? It _is_ in the past, because both reference sites
> > > are now taught to treat it that way.
> > > 
> > > So s_dirty is still in order, and the writeback process won't be blocked.
> > > 
> > 
> > Sanity check -- my understanding is this:
> > 
> > head == least-recently dirtied inode
> > tail == most-recently dirtied inode
> > 
> > ...if so, then we are violating the list order if we don't make a
> > change to redirty_tail. We're putting an inode that was dirtied far in the
> > past back onto the tail of the list without resetting dirtied_when, so a
> > more recently dirtied inode will precede one that was dirtied less recently.
> > 
> > Since the newly dirtied inode is closer to the head of the list, the
> > older inode that's constantly being redirtied won't be written out
> > until the newly dirtied one passes the older_than_this check (30s or
> > so in the usual case).
> 
> If you call that out of order, then yes, it is. Sadly, it cannot be improved
> by playing with dirtied_when: the _physical_ order is still the same.
> 
> You know what? That's exactly the drawback of redirtying into s_dirty.
> It has nothing to do with resetting dirtied_when. A new s_more_io_wait
> list is the only way to solve this problem.
> 

Agreed. The consequences are also the same regardless of whether we
update dirtied_when: a 30s delay in writeback when we redirty back
onto s_dirty.

OK, I'm convinced. Ack on that patch, since the behavior is the same
regardless of whether we update dirtied_when.

> > > > There is another option too that I'll throw out here...
> > > > 
> > > > We could just make dirtied_when a 64-bit value on 32-bit machines and
> > > > use jiffies_64 there. On the upside, there is no "problematic
> > > > window" with that. The downside is that struct inode would grow by 4
> > > > bytes on 32-bit arches, and checking jiffies_64 on such an arch is
> > > > more computationally intensive. We'd also have to change the size of
> > > > the older_than_this value in the writeback_control struct if we want to
> > > > go this route...
> > > 
> > > Yes, that could eliminate the temporary writeback stall of 30s or more.
> > > The only problem is the extra cost in the normal case, especially the
> > > space cost.
> > > 
> > 
> > Correct. I'm not necessarily advocating that approach, but it's one to
> > consider...
> > 
> > If your s_more_io_wait patchset comes to fruition, though, then that
> > change really won't be needed, so it's probably best not to go that route.
> 
> Sorry for the delay. But I'm still curious about the redirty
> process and timing of NFS/XFS. The conflicts with Jens' per-bdi pdflush
> patches are another concern...
> 

No worries. Getting this correct is the most important thing.

-- 
Jeff Layton <jlayton@redhat.com>


end of thread, other threads:[~2009-04-01 14:36 UTC | newest]

Thread overview: 10+ messages
2009-04-01  0:03 [PATCH] writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #2) Jeff Layton
2009-04-01  0:20 ` Andrew Morton
2009-04-01  0:50   ` Jeff Layton
2009-04-01  1:07     ` Andrew Morton
2009-04-01  6:56       ` Wu Fengguang
2009-04-01 11:53         ` Jeff Layton
2009-04-01 12:26           ` Wu Fengguang
2009-04-01 12:48             ` Jeff Layton
2009-04-01 13:07               ` Wu Fengguang
2009-04-01 14:35                 ` Jeff Layton
