From: Dave Chinner <david@fromorbit.com>
To: Mark Tinguely <tinguely@sgi.com>
Cc: Eric Sandeen <sandeen@redhat.com>,
Boris Ranto <branto@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
Date: Tue, 29 Apr 2014 08:18:49 +1000 [thread overview]
Message-ID: <20140428221849.GC18672@dastard> (raw)
In-Reply-To: <535ECAA6.3050200@sgi.com>
On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote:
> On 04/28/14 15:54, Dave Chinner wrote:
> >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
> >>Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
> >>to test for a shut down fs, lest we go down paths we'll
> >>never be able to complete; Boris reported that during some
> >>stress tests he had threads stuck in xlog_cil_force_lsn
> >>via xfs_dir_fsync().
> >>
> >>[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
> >>
> >>[ 3663.394472] Call Trace:
> >>[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
> >>[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
> >>[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
> >>[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
> >>[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
> >>[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
> >>[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
> >
> >Wow, I believe it's taken this long for us to notice that we can't
> >break out of xlog_cil_force_lsn() if we fail on xlog_write()
> >from a CIL push.
....
> Similar to what Jeff Liu mention in Dec:
>
> http://oss.sgi.com/archives/xfs/2013-12/msg00870.html
Which fell through the cracks because of objections to calling
wake_up_all(&ctx->cil->xc_commit_wait) from xlog_cil_committed().
FYI, I just independently wrote a patch to fix this, and part of the
fix is that it calls wake_up_all(&ctx->cil->xc_commit_wait) from
xlog_cil_committed(). The rest of the fix indicates that the above
patch wasn't sufficient. Patch below.
This time it isn't going to fall through the cracks because I don't
think the objections are valid...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
xfs: don't sleep in xlog_cil_force_lsn on shutdown
From: Dave Chinner <dchinner@redhat.com>
Reports of a shutdown hang when fsyncing a directory have surfaced,
such as this:
[ 3663.394472] Call Trace:
[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
will never wake waiters on the current push sequence number, so
anything waiting in xlog_cil_force_lsn() for that push sequence
number to come up will not get woken and hence stall the shutdown.
Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
the push abort handling, in the log shutdown code when waking all
waiters, and adding a shutdown check in the sequence completion wait
loops to ensure they abort when a wakeup due to a shutdown occurs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 7 +++++--
fs/xfs/xfs_log_cil.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 47 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index a5f8bd9..dbba2d7 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3952,11 +3952,14 @@ xfs_log_force_umount(
retval = xlog_state_ioerror(log);
spin_unlock(&log->l_icloglock);
}
+
/*
- * Wake up everybody waiting on xfs_log_force.
- * Callback all log item committed functions as if the
+ * Wake up everybody waiting on xfs_log_force. This needs to wake anyone
+ * waiting on a CIL push that is issued as part of a log force first
+ * before running the log item committed callback functions as if the
* log writes were completed.
*/
+ wake_up_all(&log->l_cilp->xc_commit_wait);
xlog_state_do_callback(log, XFS_LI_ABORTED, NULL);
#ifdef XFSERRORDEBUG
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 7e54553..3a68ddf 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -385,7 +385,15 @@ xlog_cil_committed(
xfs_extent_busy_clear(mp, &ctx->busy_extents,
(mp->m_flags & XFS_MOUNT_DISCARD) && !abort);
+ /*
+ * If we are aborting the commit, wake up anyone waiting on the
+ * committing list. If we don't, then a shutdown we can leave processes
+ * waiting in xlog_cil_force_lsn() waiting on a sequence commit that
+ * will never happen because we aborted it.
+ */
spin_lock(&ctx->cil->xc_push_lock);
+ if (abort)
+ wake_up_all(&cil->xc_commit_wait);
list_del(&ctx->committing);
spin_unlock(&ctx->cil->xc_push_lock);
@@ -564,8 +572,18 @@ restart:
spin_lock(&cil->xc_push_lock);
list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
/*
+ * Avoid getting stuck in this loop because we were woken by the
+ * shutdown, but then went back to sleep once already in the
+ * shutdown state.
+ */
+ if (XLOG_FORCED_SHUTDOWN(log)) {
+ spin_unlock(&cil->xc_push_lock);
+ goto out_abort_free_ticket;
+ }
+
+ /*
* Higher sequences will wait for this one so skip them.
- * Don't wait for own own sequence, either.
+ * Don't wait for our own sequence, either.
*/
if (new_ctx->sequence >= ctx->sequence)
continue;
@@ -810,6 +828,13 @@ restart:
*/
spin_lock(&cil->xc_push_lock);
list_for_each_entry(ctx, &cil->xc_committing, committing) {
+ /*
+ * Avoid getting stuck in this loop because we were woken by the
+ * shutdown, but then went back to sleep once already in the
+ * shutdown state.
+ */
+ if (XLOG_FORCED_SHUTDOWN(log))
+ goto out_shutdown;
if (ctx->sequence > sequence)
continue;
if (!ctx->commit_lsn) {
@@ -833,14 +858,12 @@ restart:
* push sequence after the above wait loop and the CIL still contains
* dirty objects.
*
- * When the push occurs, it will empty the CIL and
- * atomically increment the currect sequence past the push sequence and
- * move it into the committing list. Of course, if the CIL is clean at
- * the time of the push, it won't have pushed the CIL at all, so in that
- * case we should try the push for this sequence again from the start
- * just in case.
+ * When the push occurs, it will empty the CIL and atomically increment
+ * the currect sequence past the push sequence and move it into the
+ * committing list. Of course, if the CIL is clean at the time of the
+ * push, it won't have pushed the CIL at all, so in that case we should
+ * try the push for this sequence again from the start just in case.
*/
-
if (sequence == cil->xc_current_sequence &&
!list_empty(&cil->xc_cil)) {
spin_unlock(&cil->xc_push_lock);
@@ -849,6 +872,17 @@ restart:
spin_unlock(&cil->xc_push_lock);
return commit_lsn;
+
+ /*
+ * We detected a shutdown in progress. We need to trigger the log force
+ * to pass through it's iclog state machine error handling, even though
+ * we are already in a shutdown state. Hence we can't return
+ * NULLCOMMITLSN here as that has special meaning to log forces (i.e.
+ * LSN is already stable), so we return a zero LSN instead.
+ */
+out_shutdown:
+ spin_unlock(&cil->xc_push_lock);
+ return 0;
}
/*
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-04-28 22:19 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen
2014-04-28 16:47 ` Christoph Hellwig
2014-04-28 17:18 ` Eric Sandeen
2014-04-28 17:22 ` Mark Tinguely
2014-04-28 17:26 ` Eric Sandeen
2014-04-28 17:49 ` Mark Tinguely
2014-04-28 17:53 ` Eric Sandeen
2014-04-29 10:24 ` Boris Ranto
2014-04-28 20:54 ` Dave Chinner
2014-04-28 21:39 ` Mark Tinguely
2014-04-28 22:18 ` Dave Chinner [this message]
2014-04-28 23:00 ` Mark Tinguely
2014-04-29 12:56 ` Brian Foster
2014-04-29 20:12 ` Dave Chinner
2014-07-21 15:33 ` Eric Sandeen
2014-09-12 19:29 ` Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140428221849.GC18672@dastard \
--to=david@fromorbit.com \
--cc=branto@redhat.com \
--cc=sandeen@redhat.com \
--cc=tinguely@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.