public inbox for linux-xfs@vger.kernel.org
* [PATCH] xfs: re-enable xfsaild idle mode and fix associated races
From: Brian Foster @ 2012-05-24 16:06 UTC
  To: xfs; +Cc: Brian Foster

The xfsaild idle mode logic currently leads to a couple of hangs:

1.) If xfsaild is scheduled back in during an incremental scan
    (i.e., tout != 0) and the target has been updated since
    the previous run, we can reach the new target and go into
    idle mode with a still-populated AIL.
2.) A wakeup is only issued when the target is pushed forward.
    The wakeup can race with xfsaild if it is in the process
    of entering idle mode, causing future wakeup events to be
    lost.

Both hangs are reproducible by running xfstests 273 in a loop.
Modify xfsaild to enter idle mode only when the AIL is empty
and the push target has not moved forward since the last push.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

This is a lightly tested version against the xfs tree. I will test a
version based on my upstream reproducer tree more heavily over the
next few days and, if all goes well, follow up with similar testing
of this patch. Sending in advance for review.
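
To make the new idle rule concrete, here is a minimal userspace model
in C. It is a sketch under stated assumptions, not XFS code: struct
ail_model and the model_*() functions are hypothetical, LSNs are
modelled as plain int64_t values (no XFS_LSN_CMP()), a pthread mutex
stands in for xa_lock, and a condition variable stands in for
wake_up_process(). The patched kernel loop avoids the lost wakeup in
its own way: it sets the task state before checking the idle condition
under xa_lock, so a wakeup that lands between spin_unlock() and
schedule() just makes schedule() return. pthread_cond_wait() gives the
model the same guarantee by releasing the mutex atomically.

```c
#include <assert.h>   /* used by the usage checks */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the xfs_ail fields this patch touches. */
struct ail_model {
	pthread_mutex_t lock;        /* stands in for xa_lock */
	pthread_cond_t  wake;        /* stands in for wake_up_process() */
	int64_t         target;      /* xa_target: LSN to push up to */
	int64_t         target_prev; /* xa_target_prev: target used by last push */
	unsigned int    nr_items;    /* items currently on the AIL */
};

/* Start of a push pass: sample the target and remember it,
 * mirroring what the patched xfsaild_push() does under xa_lock. */
static int64_t model_push_begin(struct ail_model *ail)
{
	pthread_mutex_lock(&ail->lock);
	int64_t target = ail->target;
	ail->target_prev = target;
	pthread_mutex_unlock(&ail->lock);
	return target;
}

/* Caller holds ail->lock. Idle only if the AIL is empty AND the
 * target has not moved since the last push pass sampled it. */
static bool model_should_idle(const struct ail_model *ail)
{
	return ail->nr_items == 0 && ail->target == ail->target_prev;
}

/* Waker: only a forward move of the target issues a wakeup. */
static void model_move_target(struct ail_model *ail, int64_t new_target)
{
	pthread_mutex_lock(&ail->lock);
	if (new_target > ail->target) {
		ail->target = new_target;
		pthread_cond_signal(&ail->wake);
	}
	pthread_mutex_unlock(&ail->lock);
}

/* Worker: sleep only while the idle condition holds. The predicate is
 * checked under the same lock the waker takes, so a target update
 * cannot slip in between the check and the sleep. */
static void model_idle_wait(struct ail_model *ail)
{
	pthread_mutex_lock(&ail->lock);
	while (model_should_idle(ail))
		pthread_cond_wait(&ail->wake, &ail->lock);
	pthread_mutex_unlock(&ail->lock);
}
```

With the target at 100 and an empty AIL, the model refuses to idle
until a push pass has sampled that target; after model_push_begin() it
may idle, and either a forward target move or a non-empty AIL makes
model_should_idle() false again.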

 fs/xfs/xfs_trans_ail.c  |   29 ++++++++++++++++++++++++++---
 fs/xfs/xfs_trans_priv.h |    1 +
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 9c51448..0819cd3 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -383,6 +383,12 @@ xfsaild_push(
 	}
 
 	spin_lock(&ailp->xa_lock);
+
+	/* barrier matches the xa_target update in xfs_ail_push() */
+	smp_rmb();
+	target = ailp->xa_target;
+	ailp->xa_target_prev = target;
+
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
 	if (!lip) {
 		/*
@@ -397,7 +403,6 @@ xfsaild_push(
 	XFS_STATS_INC(xs_push_ail);
 
 	lsn = lip->li_lsn;
-	target = ailp->xa_target;
 	while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) {
 		int	lock_result;
 
@@ -527,8 +532,26 @@ xfsaild(
 			__set_current_state(TASK_KILLABLE);
 		else
 			__set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout(tout ?
-				 msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);
+
+		spin_lock(&ailp->xa_lock);
+
+		/* barrier matches the xa_target update in xfs_ail_push() */
+		smp_rmb();
+		if (!xfs_ail_min(ailp) && (ailp->xa_target == ailp->xa_target_prev)) {
+			/* the ail is empty and no change to the push target - idle */
+			spin_unlock(&ailp->xa_lock);
+			schedule();
+			tout = 0;
+			continue;
+		}
+		spin_unlock(&ailp->xa_lock);
+
+		if (tout) {
+			/* more work to do soon */
+			schedule_timeout(msecs_to_jiffies(tout));
+		}
+
+		__set_current_state(TASK_RUNNING);
 
 		try_to_freeze();
 
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index fb62377..53b7c9b 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -67,6 +67,7 @@ struct xfs_ail {
 	struct task_struct	*xa_task;
 	struct list_head	xa_ail;
 	xfs_lsn_t		xa_target;
+	xfs_lsn_t		xa_target_prev;
 	struct list_head	xa_cursors;
 	spinlock_t		xa_lock;
 	xfs_lsn_t		xa_last_pushed_lsn;
-- 
1.7.7.6

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: [PATCH] xfs: re-enable xfsaild idle mode and fix associated races
From: Brian Foster @ 2012-06-04 14:39 UTC
  To: xfs

On 05/24/2012 12:06 PM, Brian Foster wrote:
> The xfsaild idle mode logic currently leads to a couple of hangs:
> 
> 1.) If xfsaild is scheduled back in during an incremental scan
>     (i.e., tout != 0) and the target has been updated since
>     the previous run, we can reach the new target and go into
>     idle mode with a still-populated AIL.
> 2.) A wakeup is only issued when the target is pushed forward.
>     The wakeup can race with xfsaild if it is in the process
>     of entering idle mode, causing future wakeup events to be
>     lost.
> 
> Both hangs are reproducible by running xfstests 273 in a loop.
> Modify xfsaild to enter idle mode only when the AIL is empty
> and the push target has not moved forward since the last push.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> This is a lightly tested version against the xfs tree. I will test a
> version based on my upstream reproducer tree more heavily over the
> next few days and, if all goes well, follow up with similar testing
> of this patch. Sending in advance for review.
> 

FYI, I've done a decent amount of testing at this point and can no
longer reproduce the original lockups caused by idle mode. I can still
reproduce the sync worker lockup in the xfs tree, but I have verified
that xfsaild is still running in that case, so it is a different
issue. Unless there are other objections, this patch seems good to me.
Thanks.

Brian

* Re: [PATCH] xfs: re-enable xfsaild idle mode and fix associated races
From: Mark Tinguely @ 2012-06-04 17:25 UTC
  To: Brian Foster; +Cc: xfs

On 05/24/12 11:06, Brian Foster wrote:
> The xfsaild idle mode logic currently leads to a couple of hangs:
>
> 1.) If xfsaild is scheduled back in during an incremental scan
>     (i.e., tout != 0) and the target has been updated since
>     the previous run, we can reach the new target and go into
>     idle mode with a still-populated AIL.
> 2.) A wakeup is only issued when the target is pushed forward.
>     The wakeup can race with xfsaild if it is in the process
>     of entering idle mode, causing future wakeup events to be
>     lost.
>
> Both hangs are reproducible by running xfstests 273 in a loop.
> Modify xfsaild to enter idle mode only when the AIL is empty
> and the push target has not moved forward since the last push.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---

I wouldn't mind keeping a large (a few minutes) wakeup timeout for
the empty AIL case, just to be paranoid.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

* Re: [PATCH] xfs: re-enable xfsaild idle mode and fix associated races
From: Dave Chinner @ 2012-06-07  2:13 UTC
  To: Mark Tinguely; +Cc: Brian Foster, xfs

On Mon, Jun 04, 2012 at 12:25:40PM -0500, Mark Tinguely wrote:
> On 05/24/12 11:06, Brian Foster wrote:
> > The xfsaild idle mode logic currently leads to a couple of hangs:
> >
> > 1.) If xfsaild is scheduled back in during an incremental scan
> >     (i.e., tout != 0) and the target has been updated since
> >     the previous run, we can reach the new target and go into
> >     idle mode with a still-populated AIL.
> > 2.) A wakeup is only issued when the target is pushed forward.
> >     The wakeup can race with xfsaild if it is in the process
> >     of entering idle mode, causing future wakeup events to be
> >     lost.
> >
> > Both hangs are reproducible by running xfstests 273 in a loop.
> > Modify xfsaild to enter idle mode only when the AIL is empty
> > and the push target has not moved forward since the last push.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> 
> I wouldn't mind keeping a large (a few minutes) wakeup timeout for
> the empty AIL case, just to be paranoid.

And then we'll never hear about hangs...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH] xfs: re-enable xfsaild idle mode and fix associated races
From: Dave Chinner @ 2012-06-07  2:24 UTC
  To: Brian Foster; +Cc: xfs

On Thu, May 24, 2012 at 12:06:42PM -0400, Brian Foster wrote:
> The xfsaild idle mode logic currently leads to a couple of hangs:
> 
> 1.) If xfsaild is scheduled back in during an incremental scan
>     (i.e., tout != 0) and the target has been updated since
>     the previous run, we can reach the new target and go into
>     idle mode with a still-populated AIL.
> 2.) A wakeup is only issued when the target is pushed forward.
>     The wakeup can race with xfsaild if it is in the process
>     of entering idle mode, causing future wakeup events to be
>     lost.
> 
> Both hangs are reproducible by running xfstests 273 in a loop.
> Modify xfsaild to enter idle mode only when the AIL is empty
> and the push target has not moved forward since the last push.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> This is a lightly tested version against the xfs tree. I will test a
> version based on my upstream reproducer tree more heavily over the
> next few days and, if all goes well, follow up with similar testing
> of this patch. Sending in advance for review.
> 
>  fs/xfs/xfs_trans_ail.c  |   29 ++++++++++++++++++++++++++---
>  fs/xfs/xfs_trans_priv.h |    1 +
>  2 files changed, 27 insertions(+), 3 deletions(-)

Looks OK, but I have some minor formatting nits.

> @@ -527,8 +532,26 @@ xfsaild(
>  			__set_current_state(TASK_KILLABLE);
>  		else
>  			__set_current_state(TASK_INTERRUPTIBLE);
> -		schedule_timeout(tout ?
> -				 msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);
> +
> +		spin_lock(&ailp->xa_lock);
> +
> +		/* barrier matches the xa_target update in xfs_ail_push() */
> +		smp_rmb();
> +		if (!xfs_ail_min(ailp) && (ailp->xa_target == ailp->xa_target_prev)) {
> +			/* the ail is empty and no change to the push target - idle */

I much prefer comments above the if() statement; the natural order
of reading something is top down: explain why, then do. Also,
there is no reason to write comments in abbreviated English. Make
them verbose so that when you come back to this code in two years'
time you can immediately understand from the comment why the code
is like this. So this:

		/*
		 * Idle if the ail is empty and we are not racing
		 * with a target update. The barrier matches the
		 * xa_target update in xfs_ail_push().
		 */
		smp_rmb();
		if (!xfs_ail_min(ailp) && (ailp->xa_target == ailp->xa_target_prev)) {

> +			spin_unlock(&ailp->xa_lock);
> +			schedule();
> +			tout = 0;
> +			continue;
> +		}
> +		spin_unlock(&ailp->xa_lock);
> +
> +		if (tout) {
> +			/* more work to do soon */
> +			schedule_timeout(msecs_to_jiffies(tout));
> +		}

And here I think this comment is redundant, because the code
is self-documenting: if we have a timeout set, sleep for that
timeout, otherwise continue and do more work...

		if (tout)
			schedule_timeout(msecs_to_jiffies(tout));
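
After the patch, the end of each xfsaild() iteration reduces to a
three-way choice. The sketch below is a hypothetical, pure-C
restatement of that logic, not XFS code: enum aild_action, its AILD_*
values and aild_next_action() are illustrative names. It idles with no
timeout when the AIL is empty and the target is unchanged, sleeps for
tout milliseconds when a timeout is set, and otherwise goes straight
back to pushing.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>

/* Hypothetical names; each value mirrors one branch of the patched xfsaild(). */
enum aild_action {
	AILD_IDLE,          /* schedule() with no timeout until woken */
	AILD_SLEEP_TIMEOUT, /* schedule_timeout(msecs_to_jiffies(tout)) */
	AILD_RUN_NOW,       /* tout == 0: loop straight back to xfsaild_push() */
};

static enum aild_action aild_next_action(bool ail_empty, bool target_moved,
					 long tout_ms)
{
	/* The idle check comes first, as in the patch, so it wins over tout. */
	if (ail_empty && !target_moved)
		return AILD_IDLE;
	/* A timeout is set: sleep for that timeout. */
	if (tout_ms)
		return AILD_SLEEP_TIMEOUT;
	/* Otherwise continue and do more work. */
	return AILD_RUN_NOW;
}
```

In particular, a non-empty AIL never selects AILD_IDLE, whatever the
value of tout; that is the property the first hang in the commit
message violated.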

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com
