* [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
@ 2014-04-28 16:35 Eric Sandeen
2014-04-28 16:47 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Eric Sandeen @ 2014-04-28 16:35 UTC (permalink / raw)
To: xfs-oss; +Cc: Boris Ranto
Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
to test for a shut down fs, lest we go down paths we'll
never be able to complete; Boris reported that during some
stress tests he had threads stuck in xlog_cil_force_lsn
via xfs_dir_fsync().
[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
[ 3663.394472] Call Trace:
[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
Reported-by: Boris Ranto <branto@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
NB: While I've not asked Boris to test this yet, it seems
clear (?) that dir_fsync should behave the same as
file_fsync() in the face of a shut-down fs.
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 4c749ab..2b94362 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -146,6 +146,9 @@ xfs_dir_fsync(
trace_xfs_dir_fsync(ip);
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return -XFS_ERROR(EIO);
+
xfs_ilock(ip, XFS_ILOCK_SHARED);
if (xfs_ipincount(ip))
lsn = ip->i_itemp->ili_last_lsn;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen
@ 2014-04-28 16:47 ` Christoph Hellwig
2014-04-28 17:18 ` Eric Sandeen
2014-04-28 20:54 ` Dave Chinner
2014-09-12 19:29 ` Eric Sandeen
2 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2014-04-28 16:47 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Boris Ranto, xfs-oss
On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
> to test for a shut down fs,
It probably should.
> lest we go down paths we'll
> never be able to complete; Boris reported that during some
> stress tests he had threads stuck in xlog_cil_force_lsn
> via xfs_dir_fsync().
But this could still happen if we get a shutdown coming in after that
test.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 16:47 ` Christoph Hellwig
@ 2014-04-28 17:18 ` Eric Sandeen
2014-04-28 17:22 ` Mark Tinguely
0 siblings, 1 reply; 16+ messages in thread
From: Eric Sandeen @ 2014-04-28 17:18 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Boris Ranto, xfs-oss
On 4/28/14, 11:47 AM, Christoph Hellwig wrote:
> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>> to test for a shut down fs,
>
> It probably should.
>
>> lest we go down paths we'll
>> never be able to complete; Boris reported that during some
>> stress tests he had threads stuck in xlog_cil_force_lsn
>> via xfs_dir_fsync().
>
> But this could still happen if we get a shutdown coming in after that
> test.
True... that looked a bit hairier to sort out. :(
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 17:18 ` Eric Sandeen
@ 2014-04-28 17:22 ` Mark Tinguely
2014-04-28 17:26 ` Eric Sandeen
0 siblings, 1 reply; 16+ messages in thread
From: Mark Tinguely @ 2014-04-28 17:22 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Christoph Hellwig, Boris Ranto, xfs-oss
On 04/28/14 12:18, Eric Sandeen wrote:
> On 4/28/14, 11:47 AM, Christoph Hellwig wrote:
>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>>> to test for a shut down fs,
>>
>> It probably should.
>>
>>> lest we go down paths we'll
>>> never be able to complete; Boris reported that during some
>>> stress tests he had threads stuck in xlog_cil_force_lsn
>>> via xfs_dir_fsync().
>>
>> But this could still happen if we get a shutdown coming in after that
>> test.
>
> True... that looked a bit hairier to sort out. :(
>
> -Eric
>
Are the sync lsn look okay? Was there an error writing the iclog buffer?
xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the
xfs_log_force_umount if the log buffer write was not in error. That
should be the same or later than this lsn.
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 17:22 ` Mark Tinguely
@ 2014-04-28 17:26 ` Eric Sandeen
2014-04-28 17:49 ` Mark Tinguely
0 siblings, 1 reply; 16+ messages in thread
From: Eric Sandeen @ 2014-04-28 17:26 UTC (permalink / raw)
To: Mark Tinguely, Eric Sandeen; +Cc: Christoph Hellwig, Boris Ranto, xfs-oss
On 4/28/14, 12:22 PM, Mark Tinguely wrote:
> On 04/28/14 12:18, Eric Sandeen wrote:
>> On 4/28/14, 11:47 AM, Christoph Hellwig wrote:
>>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>>>> to test for a shut down fs,
>>>
>>> It probably should.
>>>
>>>> lest we go down paths we'll
>>>> never be able to complete; Boris reported that during some
>>>> stress tests he had threads stuck in xlog_cil_force_lsn
>>>> via xfs_dir_fsync().
>>>
>>> But this could still happen if we get a shutdown coming in after that
>>> test.
>>
>> True... that looked a bit hairier to sort out. :(
>>
>> -Eric
>>
>
>
> Are the sync lsn look okay? Was there an error writing the iclog buffer?
>
> xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the
> xfs_log_force_umount if the log buffer write was not in error. That
> should be the same or later than this lsn.
I don't have full details, unfortunately, just the sysrq backtraces.
-Eric
> --Mark.
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 17:26 ` Eric Sandeen
@ 2014-04-28 17:49 ` Mark Tinguely
2014-04-28 17:53 ` Eric Sandeen
2014-04-29 10:24 ` Boris Ranto
0 siblings, 2 replies; 16+ messages in thread
From: Mark Tinguely @ 2014-04-28 17:49 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Christoph Hellwig, Eric Sandeen, Boris Ranto, xfs-oss
On 04/28/14 12:26, Eric Sandeen wrote:
> On 4/28/14, 12:22 PM, Mark Tinguely wrote:
>> On 04/28/14 12:18, Eric Sandeen wrote:
>>> On 4/28/14, 11:47 AM, Christoph Hellwig wrote:
>>>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>>>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>>>>> to test for a shut down fs,
>>>>
>>>> It probably should.
>>>>
>>>>> lest we go down paths we'll
>>>>> never be able to complete; Boris reported that during some
>>>>> stress tests he had threads stuck in xlog_cil_force_lsn
>>>>> via xfs_dir_fsync().
>>>>
>>>> But this could still happen if we get a shutdown coming in after that
>>>> test.
>>>
>>> True... that looked a bit hairier to sort out. :(
>>>
>>> -Eric
>>>
>>
>>
>> Are the sync lsn look okay? Was there an error writing the iclog buffer?
>>
>> xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the
>> xfs_log_force_umount if the log buffer write was not in error. That
>> should be the same or later than this lsn.
>
> I don't have full details, unfortunately, just the sysrq backtraces.
>
> -Eric
>
>> --Mark.
anything in the log as to what caused the hang? You mention a forced
shutdown; what caused the forced shutdown?
Is this the latest bits (3.15)?
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 17:49 ` Mark Tinguely
@ 2014-04-28 17:53 ` Eric Sandeen
2014-04-29 10:24 ` Boris Ranto
1 sibling, 0 replies; 16+ messages in thread
From: Eric Sandeen @ 2014-04-28 17:53 UTC (permalink / raw)
To: Mark Tinguely; +Cc: Christoph Hellwig, Eric Sandeen, Boris Ranto, xfs-oss
On 4/28/14, 12:49 PM, Mark Tinguely wrote:
> On 04/28/14 12:26, Eric Sandeen wrote:
>> On 4/28/14, 12:22 PM, Mark Tinguely wrote:
>>> On 04/28/14 12:18, Eric Sandeen wrote:
>>>> On 4/28/14, 11:47 AM, Christoph Hellwig wrote:
>>>>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>>>>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>>>>>> to test for a shut down fs,
>>>>>
>>>>> It probably should.
>>>>>
>>>>>> lest we go down paths we'll
>>>>>> never be able to complete; Boris reported that during some
>>>>>> stress tests he had threads stuck in xlog_cil_force_lsn
>>>>>> via xfs_dir_fsync().
>>>>>
>>>>> But this could still happen if we get a shutdown coming in after that
>>>>> test.
>>>>
>>>> True... that looked a bit hairier to sort out. :(
>>>>
>>>> -Eric
>>>>
>>>
>>>
>>> Are the sync lsn look okay? Was there an error writing the iclog buffer?
>>>
>>> xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the
>>> xfs_log_force_umount if the log buffer write was not in error. That
>>> should be the same or later than this lsn.
>>
>> I don't have full details, unfortunately, just the sysrq backtraces.
>>
>> -Eric
>>
>>> --Mark.
>
> anything in the log as to what caused the hang? You mention a forced shutdown; what caused the forced shutdown?
Perhaps Boris can answer that.
> Is this the latest bits (3.15)?
It was 3.10 with backported xfs bits from about 3.14.
-Eric
> --Mark.
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen
2014-04-28 16:47 ` Christoph Hellwig
@ 2014-04-28 20:54 ` Dave Chinner
2014-04-28 21:39 ` Mark Tinguely
2014-07-21 15:33 ` Eric Sandeen
2014-09-12 19:29 ` Eric Sandeen
2 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2014-04-28 20:54 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Boris Ranto, xfs-oss
On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
> to test for a shut down fs, lest we go down paths we'll
> never be able to complete; Boris reported that during some
> stress tests he had threads stuck in xlog_cil_force_lsn
> via xfs_dir_fsync().
>
> [ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
>
> [ 3663.394472] Call Trace:
> [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
> [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
> [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
> [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
> [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
> [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
> [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
Wow, I believe it's taken this long for us to notice that we can't
break out of xlog_cil_force_lsn() if we fail on xlog_write()
from a CIL push.
I'd say that xlog_cil_force_lsn() needs log shutdown checks before
it goes to sleep in xlog_wait()....
> Reported-by: Boris Ranto <branto@redhat.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
>
> NB: While I've not asked Boris to test this yet, it seems
> clear (?) that dir_fsync should behave the same as
> file_fsync() in the face of a shut-down fs.
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 4c749ab..2b94362 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -146,6 +146,9 @@ xfs_dir_fsync(
>
> trace_xfs_dir_fsync(ip);
>
> + if (XFS_FORCED_SHUTDOWN(mp))
> + return -XFS_ERROR(EIO);
> +
That won't hurt, but it won't fix the problem.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 20:54 ` Dave Chinner
@ 2014-04-28 21:39 ` Mark Tinguely
2014-04-28 22:18 ` Dave Chinner
2014-07-21 15:33 ` Eric Sandeen
1 sibling, 1 reply; 16+ messages in thread
From: Mark Tinguely @ 2014-04-28 21:39 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, Boris Ranto, xfs-oss
On 04/28/14 15:54, Dave Chinner wrote:
> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>> to test for a shut down fs, lest we go down paths we'll
>> never be able to complete; Boris reported that during some
>> stress tests he had threads stuck in xlog_cil_force_lsn
>> via xfs_dir_fsync().
>>
>> [ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
>>
>> [ 3663.394472] Call Trace:
>> [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
>> [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
>> [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
>> [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
>> [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
>> [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
>> [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
>
> Wow, I believe it's taken this long for us to notice that we can't
> break out of xlog_cil_force_lsn() if we fail on xlog_write()
> from a CIL push.
>
> I'd say that xlog_cil_force_lsn() needs log shutdown checks before
> it goes to sleep in xlog_wait()....
>
>> Reported-by: Boris Ranto<branto@redhat.com>
>> Signed-off-by: Eric Sandeen<sandeen@redhat.com>
>> ---
>>
>> NB: While I've not asked Boris to test this yet, it seems
>> clear (?) that dir_fsync should behave the same as
>> file_fsync() in the face of a shut-down fs.
>>
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index 4c749ab..2b94362 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -146,6 +146,9 @@ xfs_dir_fsync(
>>
>> trace_xfs_dir_fsync(ip);
>>
>> + if (XFS_FORCED_SHUTDOWN(mp))
>> + return -XFS_ERROR(EIO);
>> +
>
> That won't hurt, but it won't fix the problem.
>
> Cheers,
>
> Dave.
Similar to what Jeff Liu mention in Dec:
http://oss.sgi.com/archives/xfs/2013-12/msg00870.html
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 21:39 ` Mark Tinguely
@ 2014-04-28 22:18 ` Dave Chinner
2014-04-28 23:00 ` Mark Tinguely
2014-04-29 12:56 ` Brian Foster
0 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2014-04-28 22:18 UTC (permalink / raw)
To: Mark Tinguely; +Cc: Eric Sandeen, Boris Ranto, xfs-oss
On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote:
> On 04/28/14 15:54, Dave Chinner wrote:
> >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
> >>Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
> >>to test for a shut down fs, lest we go down paths we'll
> >>never be able to complete; Boris reported that during some
> >>stress tests he had threads stuck in xlog_cil_force_lsn
> >>via xfs_dir_fsync().
> >>
> >>[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
> >>
> >>[ 3663.394472] Call Trace:
> >>[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
> >>[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
> >>[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
> >>[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
> >>[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
> >>[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
> >>[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
> >
> >Wow, I believe it's taken this long for us to notice that we can't
> >break out of xlog_cil_force_lsn() if we fail on xlog_write()
> >from a CIL push.
....
> Similar to what Jeff Liu mention in Dec:
>
> http://oss.sgi.com/archives/xfs/2013-12/msg00870.html
Which fell through the cracks because of objections to calling
wake_up_all(&ctx->cil->xc_commit_wait) from xlog_cil_committed().
FYI, I just independently wrote a patch to fix this, and part of the
fix is that it calls wake_up_all(&ctx->cil->xc_commit_wait) from
xlog_cil_committed(). The rest of the fix indicates that the above
patch wasn't sufficient. Patch below.
This time it isn't going to fall through the cracks because I don't
think the objections are valid...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
xfs: don't sleep in xlog_cil_force_lsn on shutdown
From: Dave Chinner <dchinner@redhat.com>
Reports of a shutdown hang when fsyncing a directory have surfaced,
such as this:
[ 3663.394472] Call Trace:
[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
will never wake waiters on the current push sequence number, so
anything waiting in xlog_cil_force_lsn() for that push sequence
number to come up will not get woken and hence stall the shutdown.
Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
the push abort handling, in the log shutdown code when waking all
waiters, and adding a shutdown check in the sequence completion wait
loops to ensure they abort when a wakeup due to a shutdown occurs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 7 +++++--
fs/xfs/xfs_log_cil.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 47 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index a5f8bd9..dbba2d7 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3952,11 +3952,14 @@ xfs_log_force_umount(
retval = xlog_state_ioerror(log);
spin_unlock(&log->l_icloglock);
}
+
/*
- * Wake up everybody waiting on xfs_log_force.
- * Callback all log item committed functions as if the
+ * Wake up everybody waiting on xfs_log_force. This needs to wake anyone
+ * waiting on a CIL push that is issued as part of a log force first
+ * before running the log item committed callback functions as if the
* log writes were completed.
*/
+ wake_up_all(&log->l_cilp->xc_commit_wait);
xlog_state_do_callback(log, XFS_LI_ABORTED, NULL);
#ifdef XFSERRORDEBUG
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 7e54553..3a68ddf 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -385,7 +385,15 @@ xlog_cil_committed(
xfs_extent_busy_clear(mp, &ctx->busy_extents,
(mp->m_flags & XFS_MOUNT_DISCARD) && !abort);
+ /*
+ * If we are aborting the commit, wake up anyone waiting on the
+ * committing list. If we don't, then a shutdown we can leave processes
+ * waiting in xlog_cil_force_lsn() waiting on a sequence commit that
+ * will never happen because we aborted it.
+ */
spin_lock(&ctx->cil->xc_push_lock);
+ if (abort)
+ wake_up_all(&cil->xc_commit_wait);
list_del(&ctx->committing);
spin_unlock(&ctx->cil->xc_push_lock);
@@ -564,8 +572,18 @@ restart:
spin_lock(&cil->xc_push_lock);
list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
/*
+ * Avoid getting stuck in this loop because we were woken by the
+ * shutdown, but then went back to sleep once already in the
+ * shutdown state.
+ */
+ if (XLOG_FORCED_SHUTDOWN(log)) {
+ spin_unlock(&cil->xc_push_lock);
+ goto out_abort_free_ticket;
+ }
+
+ /*
* Higher sequences will wait for this one so skip them.
- * Don't wait for own own sequence, either.
+ * Don't wait for our own sequence, either.
*/
if (new_ctx->sequence >= ctx->sequence)
continue;
@@ -810,6 +828,13 @@ restart:
*/
spin_lock(&cil->xc_push_lock);
list_for_each_entry(ctx, &cil->xc_committing, committing) {
+ /*
+ * Avoid getting stuck in this loop because we were woken by the
+ * shutdown, but then went back to sleep once already in the
+ * shutdown state.
+ */
+ if (XLOG_FORCED_SHUTDOWN(log))
+ goto out_shutdown;
if (ctx->sequence > sequence)
continue;
if (!ctx->commit_lsn) {
@@ -833,14 +858,12 @@ restart:
* push sequence after the above wait loop and the CIL still contains
* dirty objects.
*
- * When the push occurs, it will empty the CIL and
- * atomically increment the currect sequence past the push sequence and
- * move it into the committing list. Of course, if the CIL is clean at
- * the time of the push, it won't have pushed the CIL at all, so in that
- * case we should try the push for this sequence again from the start
- * just in case.
+ * When the push occurs, it will empty the CIL and atomically increment
+ * the currect sequence past the push sequence and move it into the
+ * committing list. Of course, if the CIL is clean at the time of the
+ * push, it won't have pushed the CIL at all, so in that case we should
+ * try the push for this sequence again from the start just in case.
*/
-
if (sequence == cil->xc_current_sequence &&
!list_empty(&cil->xc_cil)) {
spin_unlock(&cil->xc_push_lock);
@@ -849,6 +872,17 @@ restart:
spin_unlock(&cil->xc_push_lock);
return commit_lsn;
+
+ /*
+ * We detected a shutdown in progress. We need to trigger the log force
+ * to pass through it's iclog state machine error handling, even though
+ * we are already in a shutdown state. Hence we can't return
+ * NULLCOMMITLSN here as that has special meaning to log forces (i.e.
+ * LSN is already stable), so we return a zero LSN instead.
+ */
+out_shutdown:
+ spin_unlock(&cil->xc_push_lock);
+ return 0;
}
/*
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 22:18 ` Dave Chinner
@ 2014-04-28 23:00 ` Mark Tinguely
2014-04-29 12:56 ` Brian Foster
1 sibling, 0 replies; 16+ messages in thread
From: Mark Tinguely @ 2014-04-28 23:00 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, Boris Ranto, xfs-oss
On 04/28/14 17:18, Dave Chinner wrote:
> On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote:
>> > On 04/28/14 15:54, Dave Chinner wrote:
>>> > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>>>> > >>Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>>>> > >>to test for a shut down fs, lest we go down paths we'll
>>>> > >>never be able to complete; Boris reported that during some
>>>> > >>stress tests he had threads stuck in xlog_cil_force_lsn
>>>> > >>via xfs_dir_fsync().
>>>> > >>
>>>> > >>[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
>>>> > >>
>>>> > >>[ 3663.394472] Call Trace:
>>>> > >>[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
>>>> > >>[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
>>>> > >>[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
>>>> > >>[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
>>>> > >>[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
>>>> > >>[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
>>>> > >>[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
>>> > >
>>> > >Wow, I believe it's taken this long for us to notice that we can't
>>> > >break out of xlog_cil_force_lsn() if we fail on xlog_write()
>> > >from a CIL push.
> ....
>
>> > Similar to what Jeff Liu mention in Dec:
>> >
>> > http://oss.sgi.com/archives/xfs/2013-12/msg00870.html
> Which fell through the cracks because of objections to calling
> wake_up_all(&ctx->cil->xc_commit_wait) from xlog_cil_committed().
>
> FYI, I just independently wrote a patch to fix this, and part of the
> fix is that it calls wake_up_all(&ctx->cil->xc_commit_wait) from
> xlog_cil_committed(). The rest of the fix indicates that the above
> patch wasn't sufficient. Patch below.
>
> This time it isn't going to fall through the cracks because I don't
> think the objections are valid...
>
> Cheers,
>
> Dave.
> --
I did not intend to stall out the patch.
I came to like the idea of always notifying the waiters on an lsn after
the iclog is successfully written out not just when we start the IO.
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 17:49 ` Mark Tinguely
2014-04-28 17:53 ` Eric Sandeen
@ 2014-04-29 10:24 ` Boris Ranto
1 sibling, 0 replies; 16+ messages in thread
From: Boris Ranto @ 2014-04-29 10:24 UTC (permalink / raw)
To: Mark Tinguely; +Cc: Christoph Hellwig, Eric Sandeen, Eric Sandeen, xfs-oss
On Mon, 2014-04-28 at 12:49 -0500, Mark Tinguely wrote:
> On 04/28/14 12:26, Eric Sandeen wrote:
> > On 4/28/14, 12:22 PM, Mark Tinguely wrote:
> >> On 04/28/14 12:18, Eric Sandeen wrote:
> >>> On 4/28/14, 11:47 AM, Christoph Hellwig wrote:
> >>>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
> >>>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
> >>>>> to test for a shut down fs,
> >>>>
> >>>> It probably should.
> >>>>
> >>>>> lest we go down paths we'll
> >>>>> never be able to complete; Boris reported that during some
> >>>>> stress tests he had threads stuck in xlog_cil_force_lsn
> >>>>> via xfs_dir_fsync().
> >>>>
> >>>> But this could still happen if we get a shutdown coming in after that
> >>>> test.
> >>>
> >>> True... that looked a bit hairier to sort out. :(
> >>>
> >>> -Eric
> >>>
> >>
> >>
> >> Are the sync lsn look okay? Was there an error writing the iclog buffer?
> >>
> >> xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the
> >> xfs_log_force_umount if the log buffer write was not in error. That
> >> should be the same or later than this lsn.
> >
> > I don't have full details, unfortunately, just the sysrq backtraces.
> >
> > -Eric
> >
> >> --Mark.
>
> anything in the log as to what caused the hang? You mention a forced
> shutdown; what caused the forced shutdown?
Hi,
I used src/godown utility from xfstests to simulate a power failure and
cause the forced shutdown. The kernel does not report much. All I get is
this kernel message (it repeats every few seconds):
[498704.267273] XFS (dm-3): xfs_log_force: error 5 returned.
Nothing else is reported by the kernel albeit ls /mnt/point already
returns EIO.
The whole test is fairly simple to describe: It runs several threads
that ~randomly create/delete/truncate/mmap/fdatasync files in several
iterations, then (after ~2 minutes) the src/godown utility is run and
when the program gets the message from all the threads that they got
some I/O errors, it will remount the fs and tell all the threads to
continue with verification of fsynced data.
The trouble is that the program will never get the message from all the
threads because some of them are stuck in D state on fsync/fdatasync
syscall after the src/godown utility is run
> Is this the latest bits (3.15)?
>
As Eric mentioned earlier, it is 3.10 with a lot of backports from later
kernels.
Boris
> --Mark.
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 22:18 ` Dave Chinner
2014-04-28 23:00 ` Mark Tinguely
@ 2014-04-29 12:56 ` Brian Foster
2014-04-29 20:12 ` Dave Chinner
1 sibling, 1 reply; 16+ messages in thread
From: Brian Foster @ 2014-04-29 12:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, Mark Tinguely, Boris Ranto, xfs-oss
On Tue, Apr 29, 2014 at 08:18:49AM +1000, Dave Chinner wrote:
> On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote:
> > On 04/28/14 15:54, Dave Chinner wrote:
> > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
...
>
> xfs: don't sleep in xlog_cil_force_lsn on shutdown
>
> From: Dave Chinner <dchinner@redhat.com>
>
> Reports of a shutdown hang when fsyncing a directory have surfaced,
> such as this:
>
> [ 3663.394472] Call Trace:
> [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
> [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
> [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
> [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
> [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
> [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
> [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
>
> If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
> will never wake waiters on the current push sequence number, so
> anything waiting in xlog_cil_force_lsn() for that push sequence
> number to come up will not get woken and hence stall the shutdown.
>
> Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
> the push abort handling, in the log shutdown code when waking all
> waiters, and adding a shutdown check in the sequence completion wait
> loops to ensure they abort when a wakeup due to a shutdown occurs.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 7 +++++--
> fs/xfs/xfs_log_cil.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 47 insertions(+), 10 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index a5f8bd9..dbba2d7 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3952,11 +3952,14 @@ xfs_log_force_umount(
> retval = xlog_state_ioerror(log);
> spin_unlock(&log->l_icloglock);
> }
> +
> /*
> - * Wake up everybody waiting on xfs_log_force.
> - * Callback all log item committed functions as if the
> + * Wake up everybody waiting on xfs_log_force. This needs to wake anyone
> + * waiting on a CIL push that is issued as part of a log force first
> + * before running the log item committed callback functions as if the
> * log writes were completed.
> */
> + wake_up_all(&log->l_cilp->xc_commit_wait);
> xlog_state_do_callback(log, XFS_LI_ABORTED, NULL);
>
Why is this necessary? It looks like xlog_state_do_callback() will hit
the xlog_cil_committed() callback with the aborted flag...
> #ifdef XFSERRORDEBUG
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 7e54553..3a68ddf 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -385,7 +385,15 @@ xlog_cil_committed(
> xfs_extent_busy_clear(mp, &ctx->busy_extents,
> (mp->m_flags & XFS_MOUNT_DISCARD) && !abort);
>
> + /*
> + * If we are aborting the commit, wake up anyone waiting on the
> + * committing list. If we don't, then a shutdown we can leave processes
> + * waiting in xlog_cil_force_lsn() waiting on a sequence commit that
> + * will never happen because we aborted it.
> + */
> spin_lock(&ctx->cil->xc_push_lock);
> + if (abort)
> + wake_up_all(&cil->xc_commit_wait);
There's a compile error here. The parameter should be '&ctx->cil->...'
Brian
> list_del(&ctx->committing);
> spin_unlock(&ctx->cil->xc_push_lock);
>
> @@ -564,8 +572,18 @@ restart:
> spin_lock(&cil->xc_push_lock);
> list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
> /*
> + * Avoid getting stuck in this loop because we were woken by the
> + * shutdown, but then went back to sleep once already in the
> + * shutdown state.
> + */
> + if (XLOG_FORCED_SHUTDOWN(log)) {
> + spin_unlock(&cil->xc_push_lock);
> + goto out_abort_free_ticket;
> + }
> +
> + /*
> * Higher sequences will wait for this one so skip them.
> - * Don't wait for own own sequence, either.
> + * Don't wait for our own sequence, either.
> */
> if (new_ctx->sequence >= ctx->sequence)
> continue;
> @@ -810,6 +828,13 @@ restart:
> */
> spin_lock(&cil->xc_push_lock);
> list_for_each_entry(ctx, &cil->xc_committing, committing) {
> + /*
> + * Avoid getting stuck in this loop because we were woken by the
> + * shutdown, but then went back to sleep once already in the
> + * shutdown state.
> + */
> + if (XLOG_FORCED_SHUTDOWN(log))
> + goto out_shutdown;
> if (ctx->sequence > sequence)
> continue;
> if (!ctx->commit_lsn) {
> @@ -833,14 +858,12 @@ restart:
> * push sequence after the above wait loop and the CIL still contains
> * dirty objects.
> *
> - * When the push occurs, it will empty the CIL and
> - * atomically increment the currect sequence past the push sequence and
> - * move it into the committing list. Of course, if the CIL is clean at
> - * the time of the push, it won't have pushed the CIL at all, so in that
> - * case we should try the push for this sequence again from the start
> - * just in case.
> + * When the push occurs, it will empty the CIL and atomically increment
> + * the currect sequence past the push sequence and move it into the
> + * committing list. Of course, if the CIL is clean at the time of the
> + * push, it won't have pushed the CIL at all, so in that case we should
> + * try the push for this sequence again from the start just in case.
> */
> -
> if (sequence == cil->xc_current_sequence &&
> !list_empty(&cil->xc_cil)) {
> spin_unlock(&cil->xc_push_lock);
> @@ -849,6 +872,17 @@ restart:
>
> spin_unlock(&cil->xc_push_lock);
> return commit_lsn;
> +
> + /*
> + * We detected a shutdown in progress. We need to trigger the log force
> + * to pass through it's iclog state machine error handling, even though
> + * we are already in a shutdown state. Hence we can't return
> + * NULLCOMMITLSN here as that has special meaning to log forces (i.e.
> + * LSN is already stable), so we return a zero LSN instead.
> + */
> +out_shutdown:
> + spin_unlock(&cil->xc_push_lock);
> + return 0;
> }
>
> /*
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-29 12:56 ` Brian Foster
@ 2014-04-29 20:12 ` Dave Chinner
0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2014-04-29 20:12 UTC (permalink / raw)
To: Brian Foster; +Cc: Eric Sandeen, Mark Tinguely, Boris Ranto, xfs-oss
On Tue, Apr 29, 2014 at 08:56:49AM -0400, Brian Foster wrote:
> On Tue, Apr 29, 2014 at 08:18:49AM +1000, Dave Chinner wrote:
> > On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote:
> > > On 04/28/14 15:54, Dave Chinner wrote:
> > > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
> ...
> >
> > xfs: don't sleep in xlog_cil_force_lsn on shutdown
> >
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Reports of a shutdown hang when fsyncing a directory have surfaced,
> > such as this:
> >
> > [ 3663.394472] Call Trace:
> > [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
> > [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
> > [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
> > [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
> > [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
> > [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
> > [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
> >
> > If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
> > will never wake waiters on the current push sequence number, so
> > anything waiting in xlog_cil_force_lsn() for that push sequence
> > number to come up will not get woken and hence stall the shutdown.
> >
> > Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
> > the push abort handling, in the log shutdown code when waking all
> > waiters, and adding a shutdown check in the sequence completion wait
> > loops to ensure they abort when a wakeup due to a shutdown occurs.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_log.c | 7 +++++--
> > fs/xfs/xfs_log_cil.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
> > 2 files changed, 47 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index a5f8bd9..dbba2d7 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -3952,11 +3952,14 @@ xfs_log_force_umount(
> > retval = xlog_state_ioerror(log);
> > spin_unlock(&log->l_icloglock);
> > }
> > +
> > /*
> > - * Wake up everybody waiting on xfs_log_force.
> > - * Callback all log item committed functions as if the
> > + * Wake up everybody waiting on xfs_log_force. This needs to wake anyone
> > + * waiting on a CIL push that is issued as part of a log force first
> > + * before running the log item committed callback functions as if the
> > * log writes were completed.
> > */
> > + wake_up_all(&log->l_cilp->xc_commit_wait);
> > xlog_state_do_callback(log, XFS_LI_ABORTED, NULL);
> >
>
> Why is this necessary? It looks like xlog_state_do_callback() will hit
> the xlog_cil_committed() callback with the aborted flag...
I added it before adding the code into xlog_cil_committed().
Regardless, I think it's a good idea to have a specific wakeup to
for waiters catch the shutdown purely from a defensive POV - if we
screw up something else, we will still get a wakeup after the log is
marked shutdown and not hang...
> > #ifdef XFSERRORDEBUG
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 7e54553..3a68ddf 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -385,7 +385,15 @@ xlog_cil_committed(
> > xfs_extent_busy_clear(mp, &ctx->busy_extents,
> > (mp->m_flags & XFS_MOUNT_DISCARD) && !abort);
> >
> > + /*
> > + * If we are aborting the commit, wake up anyone waiting on the
> > + * committing list. If we don't, then a shutdown we can leave processes
> > + * waiting in xlog_cil_force_lsn() waiting on a sequence commit that
> > + * will never happen because we aborted it.
> > + */
> > spin_lock(&ctx->cil->xc_push_lock);
> > + if (abort)
> > + wake_up_all(&cil->xc_commit_wait);
>
> There's a compile error here. The parameter should be '&ctx->cil->...'
Forgot to refresh the patch - the tested version has this fix.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 20:54 ` Dave Chinner
2014-04-28 21:39 ` Mark Tinguely
@ 2014-07-21 15:33 ` Eric Sandeen
1 sibling, 0 replies; 16+ messages in thread
From: Eric Sandeen @ 2014-07-21 15:33 UTC (permalink / raw)
To: Dave Chinner, Eric Sandeen; +Cc: Boris Ranto, xfs-oss
On 4/28/14, 3:54 PM, Dave Chinner wrote:
> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote:
>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
>> to test for a shut down fs, lest we go down paths we'll
>> never be able to complete; Boris reported that during some
>> stress tests he had threads stuck in xlog_cil_force_lsn
>> via xfs_dir_fsync().
>>
>> [ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
>>
>> [ 3663.394472] Call Trace:
>> [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
>> [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
>> [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
>> [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
>> [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
>> [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
>> [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
>
> Wow, I believe it's taken this long for us to notice that we can't
> break out of xlog_cil_force_lsn() if we fail on xlog_write()
> from a CIL push.
>
> I'd say that xlog_cil_force_lsn() needs log shutdown checks before
> it goes to sleep in xlog_wait()....
>
>> Reported-by: Boris Ranto <branto@redhat.com>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> NB: While I've not asked Boris to test this yet, it seems
>> clear (?) that dir_fsync should behave the same as
>> file_fsync() in the face of a shut-down fs.
>>
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index 4c749ab..2b94362 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -146,6 +146,9 @@ xfs_dir_fsync(
>>
>> trace_xfs_dir_fsync(ip);
>>
>> + if (XFS_FORCED_SHUTDOWN(mp))
>> + return -XFS_ERROR(EIO);
>> +
>
> That won't hurt, but it won't fix the problem.
So, you did solve the problem properly I guess, in
commit ac983517ec5941da0c58cacdbad10a231dc4e001
Author: Dave Chinner <dchinner@redhat.com>
Date: Wed May 7 08:05:50 2014 +1000
xfs: don't sleep in xlog_cil_force_lsn on shutdown
so thanks. :) Should my patch still go in, to be consistent
with file_fsync() paths?
-Eric
> Cheers,
>
> Dave.
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen
2014-04-28 16:47 ` Christoph Hellwig
2014-04-28 20:54 ` Dave Chinner
@ 2014-09-12 19:29 ` Eric Sandeen
2 siblings, 0 replies; 16+ messages in thread
From: Eric Sandeen @ 2014-09-12 19:29 UTC (permalink / raw)
To: Eric Sandeen, xfs-oss; +Cc: Boris Ranto
On 4/28/14 11:35 AM, Eric Sandeen wrote:
> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
> to test for a shut down fs, lest we go down paths we'll
> never be able to complete; Boris reported that during some
> stress tests he had threads stuck in xlog_cil_force_lsn
> via xfs_dir_fsync().
(re-ping)
So Dave, you fixed this with:
So, you did solve the problem properly I guess, in
commit ac983517ec5941da0c58cacdbad10a231dc4e001
Author: Dave Chinner <dchinner@redhat.com>
Date: Wed May 7 08:05:50 2014 +1000
xfs: don't sleep in xlog_cil_force_lsn on shutdown
But should my patch still go in, if only to be consistent
with file_fsync() paths?
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-09-12 19:29 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen
2014-04-28 16:47 ` Christoph Hellwig
2014-04-28 17:18 ` Eric Sandeen
2014-04-28 17:22 ` Mark Tinguely
2014-04-28 17:26 ` Eric Sandeen
2014-04-28 17:49 ` Mark Tinguely
2014-04-28 17:53 ` Eric Sandeen
2014-04-29 10:24 ` Boris Ranto
2014-04-28 20:54 ` Dave Chinner
2014-04-28 21:39 ` Mark Tinguely
2014-04-28 22:18 ` Dave Chinner
2014-04-28 23:00 ` Mark Tinguely
2014-04-29 12:56 ` Brian Foster
2014-04-29 20:12 ` Dave Chinner
2014-07-21 15:33 ` Eric Sandeen
2014-09-12 19:29 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).