From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [RFC PATCH v3 2/2] xfs: fix xfsaild hang due to lost wake ups
Date: Wed, 23 May 2012 10:58:30 +1000 [thread overview]
Message-ID: <20120523005830.GL25351@dastard> (raw)
In-Reply-To: <1337704714-50235-3-git-send-email-bfoster@redhat.com>
On Tue, May 22, 2012 at 12:38:34PM -0400, Brian Foster wrote:
> Running xfstests 273 in a loop reproduces an XFS lockup due to
> xfsaild entering idle mode indefinitely. The following
> high-level sequence of events leads to the hang:
>
> - xfsaild is running with a cached target lsn
> - xfs_ail_push() is invoked, updates ailp->xa_target_lsn and
> invokes wake_up_process(). wake_up_process() returns 0
> because xfsaild is already running.
> - xfsaild enters idle mode having met its current target.
>
> Once in the described state, xfs_ail_push() is invoked many
> more times with the already set threshold_lsn, but these calls
> do not lead to wake_up_process() calls because no further
> invocations result in moving the threshold_lsn forward. Add a
> flag to xfs_ail to capture whether an issued wake actually
> succeeds. If not, continue issuing wakes until we know one has
> been successful for the current target.
Hi Brian - here's kind of what I was thinking when we were talking
on IRC. basically we move all the idling logic into xfsaild() to
keep it out of xfsaild_push(), and make sure we only idle on an
empty AIL when we haven't raced with a target update.
So, I was thinking that we add a previous target variable to the
xfs_ail structure. Then xfsaild would become something like:
while (!kthread_should_stop()) {
spin_lock(&ailp->xa_lock);
__set_current_state(TASK_INTERRUPTIBLE);
/* barrier matches the xa_target update in xfs_ail_push() */
smp_rmb();
if (!xfs_ail_min(ailp) && ailp->xa_target == ailp->xa_prev_target) {
/* empty ail, not change to push target - idle */
spin_unlock(&ailp->xa_lock);
schedule();
tout = 0;
}
spin_unlock(&ailp->xa_lock);
if (tout) {
/* more work to do soon */
schedule_timeout(msecs_to_jiffies(tout));
}
__set_current_state(TASK_RUNNING);
try_to_freeze();
tout = xfsaild_push(ailp);
}
And in xfsaild_push(), move where we sample the push target to before the cursor
setup, and keep a snapshot of it:
/* barrier matches the xa_target update in xfs_ail_push() */
smp_rmb();
target = ailp->xa_target;
ailp->xa_prev_target = target;
This means we do not idle if a new push target was set while we were pushing,
even if we emptied the AIL (call it paranoia!).
We can avoid the returning of a zero timeout from xfsaild_push, too,
because the idling is not based on the state that we return from the
push. Hence we always will return a 10, 20 or 50ms timeout and we
can avoid complicating xfsaild_push logic with idling logic. i.e.
the logic that is there right now should not need modification...
Finally, rather than calling wake_up_process() in the
xfs_ail_push*() functions, call wake_up(&ailp->xa_idle); There can
only be one thread sleeping on that (the xfsaild) so there is no
need to use the wake_up_all() variant...
FWIW, you might be able to do this without the idle wait queue and
just use wake_up_process() -
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2012-05-23 0:58 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-22 16:38 [RFC PATCH v3 0/2] xfs: fix xfsaild races and re-enable idle mode Brian Foster
2012-05-22 16:38 ` [RFC PATCH v3 1/2] xfs: re-enable xfsaild idle mode when the ail is empty Brian Foster
2012-05-22 16:38 ` [RFC PATCH v3 2/2] xfs: fix xfsaild hang due to lost wake ups Brian Foster
2012-05-23 0:58 ` Dave Chinner [this message]
2012-05-23 13:05 ` Brian Foster
2012-05-24 0:01 ` Dave Chinner
2012-05-23 17:48 ` Brian Foster
2012-05-23 18:19 ` Mark Tinguely
2012-05-23 23:41 ` Brian Foster
2012-05-23 23:53 ` Dave Chinner
2012-05-24 14:38 ` Mark Tinguely
2012-05-24 0:06 ` Dave Chinner
2012-05-24 13:07 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120523005830.GL25351@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox