* [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races
@ 2012-06-07 16:49 Brian Foster
2012-06-20 8:05 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Brian Foster @ 2012-06-07 16:49 UTC (permalink / raw)
To: xfs
xfsaild idle mode logic currently leads to a couple hangs:
1.) If xfsaild is rescheduled in during an incremental scan
(i.e., tout != 0) and the target has been updated since
the previous run, we can hit the new target and go into
idle mode with a still populated ail.
2.) A wake up is only issued when the target is pushed forward.
The wake up can race with xfsaild if it is currently in the
process of entering idle mode, causing future wake up
events to be lost.
Both hangs are reproducible by running xfstests 273 in a loop.
Modify xfsaild to enter idle mode only when the ail is empty
and the push target has not been moved forward since the last
push.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
v2:
- Fixed up comments in xfsaild() based on Dave Chinner's
review.
fs/xfs/xfs_trans_ail.c | 30 +++++++++++++++++++++++++++---
fs/xfs/xfs_trans_priv.h | 1 +
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 9c51448..881f9c6 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -383,6 +383,12 @@ xfsaild_push(
}
spin_lock(&ailp->xa_lock);
+
+ /* barrier matches the xa_target update in xfs_ail_push() */
+ smp_rmb();
+ target = ailp->xa_target;
+ ailp->xa_target_prev = target;
+
lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
if (!lip) {
/*
@@ -397,7 +403,6 @@ xfsaild_push(
XFS_STATS_INC(xs_push_ail);
lsn = lip->li_lsn;
- target = ailp->xa_target;
while ((XFS_LSN_CMP(lip->li_lsn, target) <= 0)) {
int lock_result;
@@ -527,8 +532,27 @@ xfsaild(
__set_current_state(TASK_KILLABLE);
else
__set_current_state(TASK_INTERRUPTIBLE);
- schedule_timeout(tout ?
- msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);
+
+ spin_lock(&ailp->xa_lock);
+
+ /*
+ * Idle if the AIL is empty and we are not racing with a target
+ * update. The barrier matches the xa_target update in
+ * xfs_ail_push().
+ */
+ smp_rmb();
+ if (!xfs_ail_min(ailp) && (ailp->xa_target == ailp->xa_target_prev)) {
+ spin_unlock(&ailp->xa_lock);
+ schedule();
+ tout = 0;
+ continue;
+ }
+ spin_unlock(&ailp->xa_lock);
+
+ if (tout)
+ schedule_timeout(msecs_to_jiffies(tout));
+
+ __set_current_state(TASK_RUNNING);
try_to_freeze();
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index fb62377..53b7c9b 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -67,6 +67,7 @@ struct xfs_ail {
struct task_struct *xa_task;
struct list_head xa_ail;
xfs_lsn_t xa_target;
+ xfs_lsn_t xa_target_prev;
struct list_head xa_cursors;
spinlock_t xa_lock;
xfs_lsn_t xa_last_pushed_lsn;
--
1.7.7.6
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races
2012-06-07 16:49 [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races Brian Foster
@ 2012-06-20 8:05 ` Christoph Hellwig
2012-06-20 15:35 ` Brian Foster
2012-06-20 15:59 ` Brian Foster
0 siblings, 2 replies; 6+ messages in thread
From: Christoph Hellwig @ 2012-06-20 8:05 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Thu, Jun 07, 2012 at 12:49:53PM -0400, Brian Foster wrote:
> xfsaild idle mode logic currently leads to a couple hangs:
>
> 1.) If xfsaild is rescheduled in during an incremental scan
> (i.e., tout != 0) and the target has been updated since
> the previous run, we can hit the new target and go into
> idle mode with a still populated ail.
> 2.) A wake up is only issued when the target is pushed forward.
> The wake up can race with xfsaild if it is currently in the
> process of entering idle mode, causing future wake up
> events to be lost.
>
> Both hangs are reproducible by running xfstests 273 in a loop.
> Modify xfsaild to enter idle mode only when the ail is empty
> and the push target has not been moved forward since the last
> push.
What tree is this against? The current XFS tree never fully idles,
so something is missing here, probably just in the patch description.
> spin_lock(&ailp->xa_lock);
> +
> + /* barrier matches the xa_target update in xfs_ail_push() */
> + smp_rmb();
> + target = ailp->xa_target;
> + ailp->xa_target_prev = target;
> +
> lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
> if (!lip) {
> /*
> +
> + spin_lock(&ailp->xa_lock);
> +
> + /*
> + * Idle if the AIL is empty and we are not racing with a target
> + * update. The barrier matches the xa_target update in
> + * xfs_ail_push().
> + */
> + smp_rmb();
Given that both sides are under xa_lock I can't see any need for
barriers here.
> + if (!xfs_ail_min(ailp) && (ailp->xa_target == ailp->xa_target_prev)) {
> + spin_unlock(&ailp->xa_lock);
> + schedule();
> + tout = 0;
> + continue;
> + }
This seems to add the actual idling, but in a rather confusing way.
Can you add the xfs_ail_min and target checks to the end of xfsaild_push
so that they are in one place with the other decisions for the timeout.
Please also add a comment explaining the conditions when we want to
idle.
Also two small style nipicks:
- please make sure lines are shorter than 80 characters
- no need for the braces around the target comparism above.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races
2012-06-20 8:05 ` Christoph Hellwig
@ 2012-06-20 15:35 ` Brian Foster
2012-06-20 15:59 ` Brian Foster
1 sibling, 0 replies; 6+ messages in thread
From: Brian Foster @ 2012-06-20 15:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On 06/20/2012 04:05 AM, Christoph Hellwig wrote:
> On Thu, Jun 07, 2012 at 12:49:53PM -0400, Brian Foster wrote:
>> xfsaild idle mode logic currently leads to a couple hangs:
>>
>> 1.) If xfsaild is rescheduled in during an incremental scan
>> (i.e., tout != 0) and the target has been updated since
>> the previous run, we can hit the new target and go into
>> idle mode with a still populated ail.
>> 2.) A wake up is only issued when the target is pushed forward.
>> The wake up can race with xfsaild if it is currently in the
>> process of entering idle mode, causing future wake up
>> events to be lost.
>>
>> Both hangs are reproducible by running xfstests 273 in a loop.
>> Modify xfsaild to enter idle mode only when the ail is empty
>> and the push target has not been moved forward since the last
>> push.
>
> What tree is this against? The current XFS tree never fully idles,
> so something is missing here, probably just in the patch description.
>
Yes, this was tested in an upstream tree with a minor modification to
re-enable idle mode (as previously implemented) and test/reproduce the
hangs. I'll send an updated patch that calls that out in the description.
>> spin_lock(&ailp->xa_lock);
>> +
>> + /* barrier matches the xa_target update in xfs_ail_push() */
>> + smp_rmb();
>> + target = ailp->xa_target;
>> + ailp->xa_target_prev = target;
>> +
>> lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
>> if (!lip) {
>> /*
>> +
>> + spin_lock(&ailp->xa_lock);
>> +
>> + /*
>> + * Idle if the AIL is empty and we are not racing with a target
>> + * update. The barrier matches the xa_target update in
>> + * xfs_ail_push().
>> + */
>> + smp_rmb();
>
> Given that both sides are under xa_lock I can't see any need for
> barriers here.
>
Ok.
>> + if (!xfs_ail_min(ailp) && (ailp->xa_target == ailp->xa_target_prev)) {
>> + spin_unlock(&ailp->xa_lock);
>> + schedule();
>> + tout = 0;
>> + continue;
>> + }
>
> This seems to add the actual idling, but in a rather confusing way.
>
> Can you add the xfs_ail_min and target checks to the end of xfsaild_push
> so that they are in one place with the other decisions for the timeout.
>
A couple points here:
- This code is primarily based on logic from Dave (whom I should also
probably credit in the description), whom I think generally preferred to
see the sleeping logic pulled out from xfsaild_push(). To provide some
context, the relevant thread is here:
http://oss.sgi.com/archives/xfs/2012-05/msg00288.html
- One lockup this actually fixes is a race between xa_target updates and
the decision to idle the xfsaild thread. To be specific, it was possible
to update xa_target and issue a wake_up_process() that is "lost" due to
the idle thread already running. If the idle thread happens to be
somewhere after the decision to sleep (set tout=0) but before the thread
is set !TASK_RUNNING, it might schedule out indefinitely.
I think if we move the sleep check back into xfsaild_push(), we
reintroduce the race, narrow as it may be. Now that I think of it, I
could also elaborate on this particular race in the comment you call out
below.
So if I'm mistaken with this point, I'd at least like to see you and
Dave agree on an approach here before I post v3, so I'll await your
response...
> Please also add a comment explaining the conditions when we want to
> idle.
>
This is the intent of the comment in the previous chunk you commented on
wrt to the barrier ("Idle if the AIL is empty..."). I can remove the
barrier bit if that goes away...
> Also two small style nipicks:
>
> - please make sure lines are shorter than 80 characters
> - no need for the braces around the target comparism above.
>
Ok. Thanks for the review.
Brian
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races
2012-06-20 8:05 ` Christoph Hellwig
2012-06-20 15:35 ` Brian Foster
@ 2012-06-20 15:59 ` Brian Foster
2012-06-20 22:35 ` Dave Chinner
1 sibling, 1 reply; 6+ messages in thread
From: Brian Foster @ 2012-06-20 15:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On 06/20/2012 04:05 AM, Christoph Hellwig wrote:
> On Thu, Jun 07, 2012 at 12:49:53PM -0400, Brian Foster wrote:
[snip]
>> spin_lock(&ailp->xa_lock);
>> +
>> + /* barrier matches the xa_target update in xfs_ail_push() */
>> + smp_rmb();
>> + target = ailp->xa_target;
>> + ailp->xa_target_prev = target;
>> +
>> lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
>> if (!lip) {
>> /*
>> +
>> + spin_lock(&ailp->xa_lock);
>> +
>> + /*
>> + * Idle if the AIL is empty and we are not racing with a target
>> + * update. The barrier matches the xa_target update in
>> + * xfs_ail_push().
>> + */
>> + smp_rmb();
>
> Given that both sides are under xa_lock I can't see any need for
> barriers here.
>
Actually now that I look at it, it appears xfs_trans_ail_copy_lsn() does
not necessarily acquire the xa_lock...
Brian
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races
2012-06-20 15:59 ` Brian Foster
@ 2012-06-20 22:35 ` Dave Chinner
2012-06-21 7:11 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2012-06-20 22:35 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, xfs
On Wed, Jun 20, 2012 at 11:59:58AM -0400, Brian Foster wrote:
> On 06/20/2012 04:05 AM, Christoph Hellwig wrote:
> > On Thu, Jun 07, 2012 at 12:49:53PM -0400, Brian Foster wrote:
> [snip]
> >> spin_lock(&ailp->xa_lock);
> >> +
> >> + /* barrier matches the xa_target update in xfs_ail_push() */
> >> + smp_rmb();
> >> + target = ailp->xa_target;
> >> + ailp->xa_target_prev = target;
> >> +
> >> lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
> >> if (!lip) {
> >> /*
> >> +
> >> + spin_lock(&ailp->xa_lock);
> >> +
> >> + /*
> >> + * Idle if the AIL is empty and we are not racing with a target
> >> + * update. The barrier matches the xa_target update in
> >> + * xfs_ail_push().
> >> + */
> >> + smp_rmb();
> >
> > Given that both sides are under xa_lock I can't see any need for
> > barriers here.
> >
>
> Actually now that I look at it, it appears xfs_trans_ail_copy_lsn() does
> not necessarily acquire the xa_lock...
Right, it does not acquire the lock on 64 bit systems as reads and
writes are single instructions on 64 bit systems. The lock is
required for 32 bit systems because the write requires separate 32
bit writes to the LSN which can result in unlocked accesses seeing
partially updated (and hence incorrect) LSN values.
So the memory barriers are definitely needed for 64 bit machines
because there is no locking on the update and spinlocks only provide
memory barriers via unlock->lock transitions, not via a single
spin_lock() call.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races
2012-06-20 22:35 ` Dave Chinner
@ 2012-06-21 7:11 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2012-06-21 7:11 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, Brian Foster, xfs
On Thu, Jun 21, 2012 at 08:35:48AM +1000, Dave Chinner wrote:
> writes are single instructions on 64 bit systems. The lock is
> required for 32 bit systems because the write requires separate 32
> bit writes to the LSN which can result in unlocked accesses seeing
> partially updated (and hence incorrect) LSN values.
>
> So the memory barriers are definitely needed for 64 bit machines
> because there is no locking on the update and spinlocks only provide
> memory barriers via unlock->lock transitions, not via a single
> spin_lock() call.
Indeed. So we'll either need the barriers, or just always take xa_lock
in xfs_ail_push. Given that xa_lock and xa_target appear in the same
cache line it probably wouldn't even make much of a difference.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-06-21 7:11 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-07 16:49 [PATCH v2] xfs: re-enable xfsaild idle mode and fix associated races Brian Foster
2012-06-20 8:05 ` Christoph Hellwig
2012-06-20 15:35 ` Brian Foster
2012-06-20 15:59 ` Brian Foster
2012-06-20 22:35 ` Dave Chinner
2012-06-21 7:11 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox