From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [RFC PATCH v3 2/2] xfs: fix xfsaild hang due to lost wake ups
Date: Wed, 23 May 2012 10:58:30 +1000 [thread overview]
Message-ID: <20120523005830.GL25351@dastard> (raw)
In-Reply-To: <1337704714-50235-3-git-send-email-bfoster@redhat.com>
On Tue, May 22, 2012 at 12:38:34PM -0400, Brian Foster wrote:
> Running xfstests 273 in a loop reproduces an XFS lockup due to
> xfsaild entering idle mode indefinitely. The following
> high-level sequence of events leads to the hang:
>
> - xfsaild is running with a cached target lsn
> - xfs_ail_push() is invoked, updates ailp->xa_target_lsn and
> invokes wake_up_process(). wake_up_process() returns 0
> because xfsaild is already running.
> - xfsaild enters idle mode having met its current target.
>
> Once in the described state, xfs_ail_push() is invoked many
> more times with the already set threshold_lsn, but these calls
> do not lead to wake_up_process() calls because no further
> invocations result in moving the threshold_lsn forward. Add a
> flag to xfs_ail to capture whether an issued wake actually
> succeeds. If not, continue issuing wakes until we know one has
> been successful for the current target.
Hi Brian - here's kind of what I was thinking when we were talking
on IRC. basically we move all the idling logic into xfsaild() to
keep it out of xfsaild_push(), and make sure we only idle on an
empty AIL when we haven't raced with a target update.
So, I was thinking that we add a previous target variable to the
xfs_ail structure. Then xfsaild would become something like:
while (!kthread_should_stop()) {
spin_lock(&ailp->xa_lock);
__set_current_state(TASK_INTERRUPTIBLE);
/* barrier matches the xa_target update in xfs_ail_push() */
smp_rmb();
if (!xfs_ail_min(ailp) && ailp->xa_target == ailp->xa_prev_target) {
/* empty ail, not change to push target - idle */
spin_unlock(&ailp->xa_lock);
schedule();
tout = 0;
}
spin_unlock(&ailp->xa_lock);
if (tout) {
/* more work to do soon */
schedule_timeout(msecs_to_jiffies(tout));
}
__set_current_state(TASK_RUNNING);
try_to_freeze();
tout = xfsaild_push(ailp);
}
And in xfsaild_push(), move where we sample the push target to before the cursor
setup, and keep a snapshot of it:
/* barrier matches the xa_target update in xfs_ail_push() */
smp_rmb();
target = ailp->xa_target;
ailp->xa_prev_target = target;
This means we do not idle if a new push target was set while we were pushing,
even if we emptied the AIL (call it paranoia!).
We can avoid the returning of a zero timeout from xfsaild_push, too,
because the idling is not based on the state that we return from the
push. Hence we always will return a 10, 20 or 50ms timeout and we
can avoid complicating xfsaild_push logic with idling logic. i.e.
the logic that is there right now should not need modification...
Finally, rather than calling wake_up_process() in the
xfs_ail_push*() functions, call wake_up(&ailp->xa_idle); There can
only be one thread sleeping on that (the xfsaild) so there is no
need to use the wake_up_all() variant...
FWIW, you might be able to do this without the idle wait queue and
just use wake_up_process() -
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2012-05-23 0:58 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-22 16:38 [RFC PATCH v3 0/2] xfs: fix xfsaild races and re-enable idle mode Brian Foster
2012-05-22 16:38 ` [RFC PATCH v3 1/2] xfs: re-enable xfsaild idle mode when the ail is empty Brian Foster
2012-05-22 16:38 ` [RFC PATCH v3 2/2] xfs: fix xfsaild hang due to lost wake ups Brian Foster
2012-05-23 0:58 ` Dave Chinner [this message]
2012-05-23 13:05 ` Brian Foster
2012-05-24 0:01 ` Dave Chinner
2012-05-23 17:48 ` Brian Foster
2012-05-23 18:19 ` Mark Tinguely
2012-05-23 23:41 ` Brian Foster
2012-05-23 23:53 ` Dave Chinner
2012-05-24 14:38 ` Mark Tinguely
2012-05-24 0:06 ` Dave Chinner
2012-05-24 13:07 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120523005830.GL25351@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.