* [PATCH] xfs: test for shut down fs in xfs_dir_fsync()
@ 2014-04-28 16:35 Eric Sandeen
2014-04-28 16:47 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Eric Sandeen @ 2014-04-28 16:35 UTC (permalink / raw)
To: xfs-oss; +Cc: Boris Ranto
Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs
to test for a shut down fs, lest we go down paths we'll
never be able to complete; Boris reported that during some
stress tests he had threads stuck in xlog_cil_force_lsn
via xfs_dir_fsync().
[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080
[ 3663.394472] Call Trace:
[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70
[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0
[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
Reported-by: Boris Ranto <branto@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
NB: While I've not asked Boris to test this yet, it seems
clear (?) that dir_fsync should behave the same as
file_fsync() in the face of a shut-down fs.
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 4c749ab..2b94362 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -146,6 +146,9 @@ xfs_dir_fsync(
trace_xfs_dir_fsync(ip);
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return -XFS_ERROR(EIO);
+
xfs_ilock(ip, XFS_ILOCK_SHARED);
if (xfs_ipincount(ip))
lsn = ip->i_itemp->ili_last_lsn;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen @ 2014-04-28 16:47 ` Christoph Hellwig 2014-04-28 17:18 ` Eric Sandeen 2014-04-28 20:54 ` Dave Chinner 2014-09-12 19:29 ` Eric Sandeen 2 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2014-04-28 16:47 UTC (permalink / raw) To: Eric Sandeen; +Cc: Boris Ranto, xfs-oss On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: > Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs > to test for a shut down fs, It probably should. > lest we go down paths we'll > never be able to complete; Boris reported that during some > stress tests he had threads stuck in xlog_cil_force_lsn > via xfs_dir_fsync(). But this could still happen if we get a shutdown coming in after that test. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 16:47 ` Christoph Hellwig @ 2014-04-28 17:18 ` Eric Sandeen 2014-04-28 17:22 ` Mark Tinguely 0 siblings, 1 reply; 16+ messages in thread From: Eric Sandeen @ 2014-04-28 17:18 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Boris Ranto, xfs-oss On 4/28/14, 11:47 AM, Christoph Hellwig wrote: > On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >> to test for a shut down fs, > > It probably should. > >> lest we go down paths we'll >> never be able to complete; Boris reported that during some >> stress tests he had threads stuck in xlog_cil_force_lsn >> via xfs_dir_fsync(). > > But this could still happen if we get a shutdown coming in after that > test. True... that looked a bit hairier to sort out. :( -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 17:18 ` Eric Sandeen @ 2014-04-28 17:22 ` Mark Tinguely 2014-04-28 17:26 ` Eric Sandeen 0 siblings, 1 reply; 16+ messages in thread From: Mark Tinguely @ 2014-04-28 17:22 UTC (permalink / raw) To: Eric Sandeen; +Cc: Christoph Hellwig, Boris Ranto, xfs-oss On 04/28/14 12:18, Eric Sandeen wrote: > On 4/28/14, 11:47 AM, Christoph Hellwig wrote: >> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >>> to test for a shut down fs, >> >> It probably should. >> >>> lest we go down paths we'll >>> never be able to complete; Boris reported that during some >>> stress tests he had threads stuck in xlog_cil_force_lsn >>> via xfs_dir_fsync(). >> >> But this could still happen if we get a shutdown coming in after that >> test. > > True... that looked a bit hairier to sort out. :( > > -Eric > Are the sync lsn look okay? Was there an error writing the iclog buffer? xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the xfs_log_force_umount if the log buffer write was not in error. That should be the same or later than this lsn. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 17:22 ` Mark Tinguely @ 2014-04-28 17:26 ` Eric Sandeen 2014-04-28 17:49 ` Mark Tinguely 0 siblings, 1 reply; 16+ messages in thread From: Eric Sandeen @ 2014-04-28 17:26 UTC (permalink / raw) To: Mark Tinguely, Eric Sandeen; +Cc: Christoph Hellwig, Boris Ranto, xfs-oss On 4/28/14, 12:22 PM, Mark Tinguely wrote: > On 04/28/14 12:18, Eric Sandeen wrote: >> On 4/28/14, 11:47 AM, Christoph Hellwig wrote: >>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >>>> to test for a shut down fs, >>> >>> It probably should. >>> >>>> lest we go down paths we'll >>>> never be able to complete; Boris reported that during some >>>> stress tests he had threads stuck in xlog_cil_force_lsn >>>> via xfs_dir_fsync(). >>> >>> But this could still happen if we get a shutdown coming in after that >>> test. >> >> True... that looked a bit hairier to sort out. :( >> >> -Eric >> > > > Are the sync lsn look okay? Was there an error writing the iclog buffer? > > xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the > xfs_log_force_umount if the log buffer write was not in error. That > should be the same or later than this lsn. I don't have full details, unfortunately, just the sysrq backtraces. -Eric > --Mark. > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 17:26 ` Eric Sandeen @ 2014-04-28 17:49 ` Mark Tinguely 2014-04-28 17:53 ` Eric Sandeen 2014-04-29 10:24 ` Boris Ranto 0 siblings, 2 replies; 16+ messages in thread From: Mark Tinguely @ 2014-04-28 17:49 UTC (permalink / raw) To: Eric Sandeen; +Cc: Christoph Hellwig, Eric Sandeen, Boris Ranto, xfs-oss On 04/28/14 12:26, Eric Sandeen wrote: > On 4/28/14, 12:22 PM, Mark Tinguely wrote: >> On 04/28/14 12:18, Eric Sandeen wrote: >>> On 4/28/14, 11:47 AM, Christoph Hellwig wrote: >>>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >>>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >>>>> to test for a shut down fs, >>>> >>>> It probably should. >>>> >>>>> lest we go down paths we'll >>>>> never be able to complete; Boris reported that during some >>>>> stress tests he had threads stuck in xlog_cil_force_lsn >>>>> via xfs_dir_fsync(). >>>> >>>> But this could still happen if we get a shutdown coming in after that >>>> test. >>> >>> True... that looked a bit hairier to sort out. :( >>> >>> -Eric >>> >> >> >> Are the sync lsn look okay? Was there an error writing the iclog buffer? >> >> xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the >> xfs_log_force_umount if the log buffer write was not in error. That >> should be the same or later than this lsn. > > I don't have full details, unfortunately, just the sysrq backtraces. > > -Eric > >> --Mark. anything in the log as to what caused the hang? You mention a forced shutdown; what caused the forced shutdown? Is this the latest bits (3.15)? --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 17:49 ` Mark Tinguely @ 2014-04-28 17:53 ` Eric Sandeen 2014-04-29 10:24 ` Boris Ranto 1 sibling, 0 replies; 16+ messages in thread From: Eric Sandeen @ 2014-04-28 17:53 UTC (permalink / raw) To: Mark Tinguely; +Cc: Christoph Hellwig, Eric Sandeen, Boris Ranto, xfs-oss On 4/28/14, 12:49 PM, Mark Tinguely wrote: > On 04/28/14 12:26, Eric Sandeen wrote: >> On 4/28/14, 12:22 PM, Mark Tinguely wrote: >>> On 04/28/14 12:18, Eric Sandeen wrote: >>>> On 4/28/14, 11:47 AM, Christoph Hellwig wrote: >>>>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >>>>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >>>>>> to test for a shut down fs, >>>>> >>>>> It probably should. >>>>> >>>>>> lest we go down paths we'll >>>>>> never be able to complete; Boris reported that during some >>>>>> stress tests he had threads stuck in xlog_cil_force_lsn >>>>>> via xfs_dir_fsync(). >>>>> >>>>> But this could still happen if we get a shutdown coming in after that >>>>> test. >>>> >>>> True... that looked a bit hairier to sort out. :( >>>> >>>> -Eric >>>> >>> >>> >>> Are the sync lsn look okay? Was there an error writing the iclog buffer? >>> >>> xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the >>> xfs_log_force_umount if the log buffer write was not in error. That >>> should be the same or later than this lsn. >> >> I don't have full details, unfortunately, just the sysrq backtraces. >> >> -Eric >> >>> --Mark. > > anything in the log as to what caused the hang? You mention a forced shutdown; what caused the forced shutdown? Perhaps Boris can answer that. > Is this the latest bits (3.15)? It was 3.10 with backported xfs bits from about 3.14. -Eric > --Mark. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 17:49 ` Mark Tinguely 2014-04-28 17:53 ` Eric Sandeen @ 2014-04-29 10:24 ` Boris Ranto 1 sibling, 0 replies; 16+ messages in thread From: Boris Ranto @ 2014-04-29 10:24 UTC (permalink / raw) To: Mark Tinguely; +Cc: Christoph Hellwig, Eric Sandeen, Eric Sandeen, xfs-oss On Mon, 2014-04-28 at 12:49 -0500, Mark Tinguely wrote: > On 04/28/14 12:26, Eric Sandeen wrote: > > On 4/28/14, 12:22 PM, Mark Tinguely wrote: > >> On 04/28/14 12:18, Eric Sandeen wrote: > >>> On 4/28/14, 11:47 AM, Christoph Hellwig wrote: > >>>> On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: > >>>>> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs > >>>>> to test for a shut down fs, > >>>> > >>>> It probably should. > >>>> > >>>>> lest we go down paths we'll > >>>>> never be able to complete; Boris reported that during some > >>>>> stress tests he had threads stuck in xlog_cil_force_lsn > >>>>> via xfs_dir_fsync(). > >>>> > >>>> But this could still happen if we get a shutdown coming in after that > >>>> test. > >>> > >>> True... that looked a bit hairier to sort out. :( > >>> > >>> -Eric > >>> > >> > >> > >> Are the sync lsn look okay? Was there an error writing the iclog buffer? > >> > >> xfs_do_force_shutdown() will also do a xlog_cil_force_lsn() via the > >> xfs_log_force_umount if the log buffer write was not in error. That > >> should be the same or later than this lsn. > > > > I don't have full details, unfortunately, just the sysrq backtraces. > > > > -Eric > > > >> --Mark. > > anything in the log as to what caused the hang? You mention a forced > shutdown; what caused the forced shutdown? Hi, I used src/godown utility from xfstests to simulate a power failure and cause the forced shutdown. The kernel does not report much. All I get is this kernel message (it repeats every few seconds): [498704.267273] XFS (dm-3): xfs_log_force: error 5 returned. Nothing else is reported by the kernel albeit ls /mnt/point already returns EIO. The whole test is fairly simple to describe: It runs several threads that ~randomly create/delete/truncate/mmap/fdatasync files in several iterations, then (after ~2 minutes) the src/godown utility is run and when the program gets the message from all the threads that they got some I/O errors, it will remount the fs and tell all the threads to continue with verification of fsynced data. The trouble is that the program will never get the message from all the threads because some of them are stuck in D state on fsync/fdatasync syscall after the src/godown utility is run > Is this the latest bits (3.15)? > As Eric mentioned earlier, it is 3.10 with a lot of backports from later kernels. Boris > --Mark. > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen 2014-04-28 16:47 ` Christoph Hellwig @ 2014-04-28 20:54 ` Dave Chinner 2014-04-28 21:39 ` Mark Tinguely 2014-07-21 15:33 ` Eric Sandeen 2014-09-12 19:29 ` Eric Sandeen 2 siblings, 2 replies; 16+ messages in thread From: Dave Chinner @ 2014-04-28 20:54 UTC (permalink / raw) To: Eric Sandeen; +Cc: Boris Ranto, xfs-oss On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: > Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs > to test for a shut down fs, lest we go down paths we'll > never be able to complete; Boris reported that during some > stress tests he had threads stuck in xlog_cil_force_lsn > via xfs_dir_fsync(). > > [ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080 > > [ 3663.394472] Call Trace: > [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 > [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] > [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] > [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] > [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 > [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 > [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b Wow, I believe it's taken this long for us to notice that we can't break out of xlog_cil_force_lsn() if we fail on xlog_write() from a CIL push. I'd say that xlog_cil_force_lsn() needs log shutdown checks before it goes to sleep in xlog_wait().... > Reported-by: Boris Ranto <branto@redhat.com> > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > --- > > NB: While I've not asked Boris to test this yet, it seems > clear (?) that dir_fsync should behave the same as > file_fsync() in the face of a shut-down fs. > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 4c749ab..2b94362 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -146,6 +146,9 @@ xfs_dir_fsync( > > trace_xfs_dir_fsync(ip); > > + if (XFS_FORCED_SHUTDOWN(mp)) > + return -XFS_ERROR(EIO); > + That won't hurt, but it won't fix the problem. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 20:54 ` Dave Chinner @ 2014-04-28 21:39 ` Mark Tinguely 2014-04-28 22:18 ` Dave Chinner 2014-07-21 15:33 ` Eric Sandeen 1 sibling, 1 reply; 16+ messages in thread From: Mark Tinguely @ 2014-04-28 21:39 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, Boris Ranto, xfs-oss On 04/28/14 15:54, Dave Chinner wrote: > On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >> to test for a shut down fs, lest we go down paths we'll >> never be able to complete; Boris reported that during some >> stress tests he had threads stuck in xlog_cil_force_lsn >> via xfs_dir_fsync(). >> >> [ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080 >> >> [ 3663.394472] Call Trace: >> [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 >> [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] >> [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] >> [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] >> [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 >> [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 >> [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b > > Wow, I believe it's taken this long for us to notice that we can't > break out of xlog_cil_force_lsn() if we fail on xlog_write() > from a CIL push. > > I'd say that xlog_cil_force_lsn() needs log shutdown checks before > it goes to sleep in xlog_wait().... > >> Reported-by: Boris Ranto<branto@redhat.com> >> Signed-off-by: Eric Sandeen<sandeen@redhat.com> >> --- >> >> NB: While I've not asked Boris to test this yet, it seems >> clear (?) that dir_fsync should behave the same as >> file_fsync() in the face of a shut-down fs. >> >> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c >> index 4c749ab..2b94362 100644 >> --- a/fs/xfs/xfs_file.c >> +++ b/fs/xfs/xfs_file.c >> @@ -146,6 +146,9 @@ xfs_dir_fsync( >> >> trace_xfs_dir_fsync(ip); >> >> + if (XFS_FORCED_SHUTDOWN(mp)) >> + return -XFS_ERROR(EIO); >> + > > That won't hurt, but it won't fix the problem. > > Cheers, > > Dave. Similar to what Jeff Liu mention in Dec: http://oss.sgi.com/archives/xfs/2013-12/msg00870.html --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 21:39 ` Mark Tinguely @ 2014-04-28 22:18 ` Dave Chinner 2014-04-28 23:00 ` Mark Tinguely 2014-04-29 12:56 ` Brian Foster 0 siblings, 2 replies; 16+ messages in thread From: Dave Chinner @ 2014-04-28 22:18 UTC (permalink / raw) To: Mark Tinguely; +Cc: Eric Sandeen, Boris Ranto, xfs-oss On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote: > On 04/28/14 15:54, Dave Chinner wrote: > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: > >>Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs > >>to test for a shut down fs, lest we go down paths we'll > >>never be able to complete; Boris reported that during some > >>stress tests he had threads stuck in xlog_cil_force_lsn > >>via xfs_dir_fsync(). > >> > >>[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080 > >> > >>[ 3663.394472] Call Trace: > >>[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 > >>[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] > >>[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] > >>[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] > >>[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 > >>[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 > >>[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b > > > >Wow, I believe it's taken this long for us to notice that we can't > >break out of xlog_cil_force_lsn() if we fail on xlog_write() > >from a CIL push. .... > Similar to what Jeff Liu mention in Dec: > > http://oss.sgi.com/archives/xfs/2013-12/msg00870.html Which fell through the cracks because of objections to calling wake_up_all(&ctx->cil->xc_commit_wait) from xlog_cil_committed(). FYI, I just independently wrote a patch to fix this, and part of the fix is that it calls wake_up_all(&ctx->cil->xc_commit_wait) from xlog_cil_committed(). The rest of the fix indicates that the above patch wasn't sufficient. Patch below. This time it isn't going to fall through the cracks because I don't think the objections are valid... Cheers, Dave. -- Dave Chinner david@fromorbit.com xfs: don't sleep in xlog_cil_force_lsn on shutdown From: Dave Chinner <dchinner@redhat.com> Reports of a shutdown hang when fsyncing a directory have surfaced, such as this: [ 3663.394472] Call Trace: [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b If we trigger a shutdown in xlog_cil_push() from xlog_write(), we will never wake waiters on the current push sequence number, so anything waiting in xlog_cil_force_lsn() for that push sequence number to come up will not get woken and hence stall the shutdown. Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in the push abort handling, in the log shutdown code when waking all waiters, and adding a shutdown check in the sequence completion wait loops to ensure they abort when a wakeup due to a shutdown occurs. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_log.c | 7 +++++-- fs/xfs/xfs_log_cil.c | 50 ++++++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 47 insertions(+), 10 deletions(-) diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index a5f8bd9..dbba2d7 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -3952,11 +3952,14 @@ xfs_log_force_umount( retval = xlog_state_ioerror(log); spin_unlock(&log->l_icloglock); } + /* - * Wake up everybody waiting on xfs_log_force. - * Callback all log item committed functions as if the + * Wake up everybody waiting on xfs_log_force. This needs to wake anyone + * waiting on a CIL push that is issued as part of a log force first + * before running the log item committed callback functions as if the * log writes were completed. */ + wake_up_all(&log->l_cilp->xc_commit_wait); xlog_state_do_callback(log, XFS_LI_ABORTED, NULL); #ifdef XFSERRORDEBUG diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 7e54553..3a68ddf 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -385,7 +385,15 @@ xlog_cil_committed( xfs_extent_busy_clear(mp, &ctx->busy_extents, (mp->m_flags & XFS_MOUNT_DISCARD) && !abort); + /* + * If we are aborting the commit, wake up anyone waiting on the + * committing list. If we don't, then a shutdown we can leave processes + * waiting in xlog_cil_force_lsn() waiting on a sequence commit that + * will never happen because we aborted it. + */ spin_lock(&ctx->cil->xc_push_lock); + if (abort) + wake_up_all(&cil->xc_commit_wait); list_del(&ctx->committing); spin_unlock(&ctx->cil->xc_push_lock); @@ -564,8 +572,18 @@ restart: spin_lock(&cil->xc_push_lock); list_for_each_entry(new_ctx, &cil->xc_committing, committing) { /* + * Avoid getting stuck in this loop because we were woken by the + * shutdown, but then went back to sleep once already in the + * shutdown state. + */ + if (XLOG_FORCED_SHUTDOWN(log)) { + spin_unlock(&cil->xc_push_lock); + goto out_abort_free_ticket; + } + + /* * Higher sequences will wait for this one so skip them. - * Don't wait for own own sequence, either. + * Don't wait for our own sequence, either. */ if (new_ctx->sequence >= ctx->sequence) continue; @@ -810,6 +828,13 @@ restart: */ spin_lock(&cil->xc_push_lock); list_for_each_entry(ctx, &cil->xc_committing, committing) { + /* + * Avoid getting stuck in this loop because we were woken by the + * shutdown, but then went back to sleep once already in the + * shutdown state. + */ + if (XLOG_FORCED_SHUTDOWN(log)) + goto out_shutdown; if (ctx->sequence > sequence) continue; if (!ctx->commit_lsn) { @@ -833,14 +858,12 @@ restart: * push sequence after the above wait loop and the CIL still contains * dirty objects. * - * When the push occurs, it will empty the CIL and - * atomically increment the currect sequence past the push sequence and - * move it into the committing list. Of course, if the CIL is clean at - * the time of the push, it won't have pushed the CIL at all, so in that - * case we should try the push for this sequence again from the start - * just in case. + * When the push occurs, it will empty the CIL and atomically increment + * the currect sequence past the push sequence and move it into the + * committing list. Of course, if the CIL is clean at the time of the + * push, it won't have pushed the CIL at all, so in that case we should + * try the push for this sequence again from the start just in case. */ - if (sequence == cil->xc_current_sequence && !list_empty(&cil->xc_cil)) { spin_unlock(&cil->xc_push_lock); @@ -849,6 +872,17 @@ restart: spin_unlock(&cil->xc_push_lock); return commit_lsn; + + /* + * We detected a shutdown in progress. We need to trigger the log force + * to pass through it's iclog state machine error handling, even though + * we are already in a shutdown state. Hence we can't return + * NULLCOMMITLSN here as that has special meaning to log forces (i.e. + * LSN is already stable), so we return a zero LSN instead. + */ +out_shutdown: + spin_unlock(&cil->xc_push_lock); + return 0; } /* _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 22:18 ` Dave Chinner @ 2014-04-28 23:00 ` Mark Tinguely 2014-04-29 12:56 ` Brian Foster 1 sibling, 0 replies; 16+ messages in thread From: Mark Tinguely @ 2014-04-28 23:00 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, Boris Ranto, xfs-oss On 04/28/14 17:18, Dave Chinner wrote: > On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote: >> > On 04/28/14 15:54, Dave Chinner wrote: >>> > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >>>> > >>Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >>>> > >>to test for a shut down fs, lest we go down paths we'll >>>> > >>never be able to complete; Boris reported that during some >>>> > >>stress tests he had threads stuck in xlog_cil_force_lsn >>>> > >>via xfs_dir_fsync(). >>>> > >> >>>> > >>[ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080 >>>> > >> >>>> > >>[ 3663.394472] Call Trace: >>>> > >>[ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 >>>> > >>[ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] >>>> > >>[ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] >>>> > >>[ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] >>>> > >>[ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 >>>> > >>[ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 >>>> > >>[ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b >>> > > >>> > >Wow, I believe it's taken this long for us to notice that we can't >>> > >break out of xlog_cil_force_lsn() if we fail on xlog_write() >> > >from a CIL push. > .... > >> > Similar to what Jeff Liu mention in Dec: >> > >> > http://oss.sgi.com/archives/xfs/2013-12/msg00870.html > Which fell through the cracks because of objections to calling > wake_up_all(&ctx->cil->xc_commit_wait) from xlog_cil_committed(). > > FYI, I just independently wrote a patch to fix this, and part of the > fix is that it calls wake_up_all(&ctx->cil->xc_commit_wait) from > xlog_cil_committed(). The rest of the fix indicates that the above > patch wasn't sufficient. Patch below. > > This time it isn't going to fall through the cracks because I don't > think the objections are valid... > > Cheers, > > Dave. > -- I did not intend to stall out the patch. I came to like the idea of always notifying the waiters on an lsn after the iclog is successfully written out not just when we start the IO. --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 22:18 ` Dave Chinner 2014-04-28 23:00 ` Mark Tinguely @ 2014-04-29 12:56 ` Brian Foster 2014-04-29 20:12 ` Dave Chinner 1 sibling, 1 reply; 16+ messages in thread From: Brian Foster @ 2014-04-29 12:56 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, Mark Tinguely, Boris Ranto, xfs-oss On Tue, Apr 29, 2014 at 08:18:49AM +1000, Dave Chinner wrote: > On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote: > > On 04/28/14 15:54, Dave Chinner wrote: > > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: ... > > xfs: don't sleep in xlog_cil_force_lsn on shutdown > > From: Dave Chinner <dchinner@redhat.com> > > Reports of a shutdown hang when fsyncing a directory have surfaced, > such as this: > > [ 3663.394472] Call Trace: > [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 > [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] > [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] > [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] > [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 > [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 > [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b > > If we trigger a shutdown in xlog_cil_push() from xlog_write(), we > will never wake waiters on the current push sequence number, so > anything waiting in xlog_cil_force_lsn() for that push sequence > number to come up will not get woken and hence stall the shutdown. > > Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in > the push abort handling, in the log shutdown code when waking all > waiters, and adding a shutdown check in the sequence completion wait > loops to ensure they abort when a wakeup due to a shutdown occurs. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/xfs_log.c | 7 +++++-- > fs/xfs/xfs_log_cil.c | 50 ++++++++++++++++++++++++++++++++++++++++++-------- > 2 files changed, 47 insertions(+), 10 deletions(-) > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > index a5f8bd9..dbba2d7 100644 > --- a/fs/xfs/xfs_log.c > +++ b/fs/xfs/xfs_log.c > @@ -3952,11 +3952,14 @@ xfs_log_force_umount( > retval = xlog_state_ioerror(log); > spin_unlock(&log->l_icloglock); > } > + > /* > - * Wake up everybody waiting on xfs_log_force. > - * Callback all log item committed functions as if the > + * Wake up everybody waiting on xfs_log_force. This needs to wake anyone > + * waiting on a CIL push that is issued as part of a log force first > + * before running the log item committed callback functions as if the > * log writes were completed. > */ > + wake_up_all(&log->l_cilp->xc_commit_wait); > xlog_state_do_callback(log, XFS_LI_ABORTED, NULL); > Why is this necessary? It looks like xlog_state_do_callback() will hit the xlog_cil_committed() callback with the aborted flag... > #ifdef XFSERRORDEBUG > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c > index 7e54553..3a68ddf 100644 > --- a/fs/xfs/xfs_log_cil.c > +++ b/fs/xfs/xfs_log_cil.c > @@ -385,7 +385,15 @@ xlog_cil_committed( > xfs_extent_busy_clear(mp, &ctx->busy_extents, > (mp->m_flags & XFS_MOUNT_DISCARD) && !abort); > > + /* > + * If we are aborting the commit, wake up anyone waiting on the > + * committing list. If we don't, then a shutdown we can leave processes > + * waiting in xlog_cil_force_lsn() waiting on a sequence commit that > + * will never happen because we aborted it. > + */ > spin_lock(&ctx->cil->xc_push_lock); > + if (abort) > + wake_up_all(&cil->xc_commit_wait); There's a compile error here. The parameter should be '&ctx->cil->...' Brian > list_del(&ctx->committing); > spin_unlock(&ctx->cil->xc_push_lock); > > @@ -564,8 +572,18 @@ restart: > spin_lock(&cil->xc_push_lock); > list_for_each_entry(new_ctx, &cil->xc_committing, committing) { > /* > + * Avoid getting stuck in this loop because we were woken by the > + * shutdown, but then went back to sleep once already in the > + * shutdown state. > + */ > + if (XLOG_FORCED_SHUTDOWN(log)) { > + spin_unlock(&cil->xc_push_lock); > + goto out_abort_free_ticket; > + } > + > + /* > * Higher sequences will wait for this one so skip them. > - * Don't wait for own own sequence, either. > + * Don't wait for our own sequence, either. > */ > if (new_ctx->sequence >= ctx->sequence) > continue; > @@ -810,6 +828,13 @@ restart: > */ > spin_lock(&cil->xc_push_lock); > list_for_each_entry(ctx, &cil->xc_committing, committing) { > + /* > + * Avoid getting stuck in this loop because we were woken by the > + * shutdown, but then went back to sleep once already in the > + * shutdown state. > + */ > + if (XLOG_FORCED_SHUTDOWN(log)) > + goto out_shutdown; > if (ctx->sequence > sequence) > continue; > if (!ctx->commit_lsn) { > @@ -833,14 +858,12 @@ restart: > * push sequence after the above wait loop and the CIL still contains > * dirty objects. > * > - * When the push occurs, it will empty the CIL and > - * atomically increment the currect sequence past the push sequence and > - * move it into the committing list. Of course, if the CIL is clean at > - * the time of the push, it won't have pushed the CIL at all, so in that > - * case we should try the push for this sequence again from the start > - * just in case. > + * When the push occurs, it will empty the CIL and atomically increment > + * the currect sequence past the push sequence and move it into the > + * committing list. Of course, if the CIL is clean at the time of the > + * push, it won't have pushed the CIL at all, so in that case we should > + * try the push for this sequence again from the start just in case. > */ > - > if (sequence == cil->xc_current_sequence && > !list_empty(&cil->xc_cil)) { > spin_unlock(&cil->xc_push_lock); > @@ -849,6 +872,17 @@ restart: > > spin_unlock(&cil->xc_push_lock); > return commit_lsn; > + > + /* > + * We detected a shutdown in progress. We need to trigger the log force > + * to pass through it's iclog state machine error handling, even though > + * we are already in a shutdown state. Hence we can't return > + * NULLCOMMITLSN here as that has special meaning to log forces (i.e. > + * LSN is already stable), so we return a zero LSN instead. > + */ > +out_shutdown: > + spin_unlock(&cil->xc_push_lock); > + return 0; > } > > /* > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-29 12:56 ` Brian Foster @ 2014-04-29 20:12 ` Dave Chinner 0 siblings, 0 replies; 16+ messages in thread From: Dave Chinner @ 2014-04-29 20:12 UTC (permalink / raw) To: Brian Foster; +Cc: Eric Sandeen, Mark Tinguely, Boris Ranto, xfs-oss On Tue, Apr 29, 2014 at 08:56:49AM -0400, Brian Foster wrote: > On Tue, Apr 29, 2014 at 08:18:49AM +1000, Dave Chinner wrote: > > On Mon, Apr 28, 2014 at 04:39:50PM -0500, Mark Tinguely wrote: > > > On 04/28/14 15:54, Dave Chinner wrote: > > > >On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: > ... > > > > xfs: don't sleep in xlog_cil_force_lsn on shutdown > > > > From: Dave Chinner <dchinner@redhat.com> > > > > Reports of a shutdown hang when fsyncing a directory have surfaced, > > such as this: > > > > [ 3663.394472] Call Trace: > > [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 > > [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] > > [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] > > [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] > > [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 > > [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 > > [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b > > > > If we trigger a shutdown in xlog_cil_push() from xlog_write(), we > > will never wake waiters on the current push sequence number, so > > anything waiting in xlog_cil_force_lsn() for that push sequence > > number to come up will not get woken and hence stall the shutdown. > > > > Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in > > the push abort handling, in the log shutdown code when waking all > > waiters, and adding a shutdown check in the sequence completion wait > > loops to ensure they abort when a wakeup due to a shutdown occurs. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > --- > > fs/xfs/xfs_log.c | 7 +++++-- > > fs/xfs/xfs_log_cil.c | 50 ++++++++++++++++++++++++++++++++++++++++++-------- > > 2 files changed, 47 insertions(+), 10 deletions(-) > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > index a5f8bd9..dbba2d7 100644 > > --- a/fs/xfs/xfs_log.c > > +++ b/fs/xfs/xfs_log.c > > @@ -3952,11 +3952,14 @@ xfs_log_force_umount( > > retval = xlog_state_ioerror(log); > > spin_unlock(&log->l_icloglock); > > } > > + > > /* > > - * Wake up everybody waiting on xfs_log_force. > > - * Callback all log item committed functions as if the > > + * Wake up everybody waiting on xfs_log_force. This needs to wake anyone > > + * waiting on a CIL push that is issued as part of a log force first > > + * before running the log item committed callback functions as if the > > * log writes were completed. > > */ > > + wake_up_all(&log->l_cilp->xc_commit_wait); > > xlog_state_do_callback(log, XFS_LI_ABORTED, NULL); > > > > Why is this necessary? It looks like xlog_state_do_callback() will hit > the xlog_cil_committed() callback with the aborted flag... I added it before adding the code into xlog_cil_committed(). Regardless, I think it's a good idea to have a specific wakeup to for waiters catch the shutdown purely from a defensive POV - if we screw up something else, we will still get a wakeup after the log is marked shutdown and not hang... > > #ifdef XFSERRORDEBUG > > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c > > index 7e54553..3a68ddf 100644 > > --- a/fs/xfs/xfs_log_cil.c > > +++ b/fs/xfs/xfs_log_cil.c > > @@ -385,7 +385,15 @@ xlog_cil_committed( > > xfs_extent_busy_clear(mp, &ctx->busy_extents, > > (mp->m_flags & XFS_MOUNT_DISCARD) && !abort); > > > > + /* > > + * If we are aborting the commit, wake up anyone waiting on the > > + * committing list. If we don't, then a shutdown we can leave processes > > + * waiting in xlog_cil_force_lsn() waiting on a sequence commit that > > + * will never happen because we aborted it. > > + */ > > spin_lock(&ctx->cil->xc_push_lock); > > + if (abort) > > + wake_up_all(&cil->xc_commit_wait); > > There's a compile error here. The parameter should be '&ctx->cil->...' Forgot to refresh the patch - the tested version has this fix. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 20:54 ` Dave Chinner 2014-04-28 21:39 ` Mark Tinguely @ 2014-07-21 15:33 ` Eric Sandeen 1 sibling, 0 replies; 16+ messages in thread From: Eric Sandeen @ 2014-07-21 15:33 UTC (permalink / raw) To: Dave Chinner, Eric Sandeen; +Cc: Boris Ranto, xfs-oss On 4/28/14, 3:54 PM, Dave Chinner wrote: > On Mon, Apr 28, 2014 at 11:35:16AM -0500, Eric Sandeen wrote: >> Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs >> to test for a shut down fs, lest we go down paths we'll >> never be able to complete; Boris reported that during some >> stress tests he had threads stuck in xlog_cil_force_lsn >> via xfs_dir_fsync(). >> >> [ 3663.361709] sfsuspend-par D ffff88042f0b4540 0 3981 3947 0x00000080 >> >> [ 3663.394472] Call Trace: >> [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 >> [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] >> [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] >> [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] >> [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 >> [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 >> [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b > > Wow, I believe it's taken this long for us to notice that we can't > break out of xlog_cil_force_lsn() if we fail on xlog_write() > from a CIL push. > > I'd say that xlog_cil_force_lsn() needs log shutdown checks before > it goes to sleep in xlog_wait().... > >> Reported-by: Boris Ranto <branto@redhat.com> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com> >> --- >> >> NB: While I've not asked Boris to test this yet, it seems >> clear (?) that dir_fsync should behave the same as >> file_fsync() in the face of a shut-down fs. >> >> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c >> index 4c749ab..2b94362 100644 >> --- a/fs/xfs/xfs_file.c >> +++ b/fs/xfs/xfs_file.c >> @@ -146,6 +146,9 @@ xfs_dir_fsync( >> >> trace_xfs_dir_fsync(ip); >> >> + if (XFS_FORCED_SHUTDOWN(mp)) >> + return -XFS_ERROR(EIO); >> + > > That won't hurt, but it won't fix the problem. So, you did solve the problem properly I guess, in commit ac983517ec5941da0c58cacdbad10a231dc4e001 Author: Dave Chinner <dchinner@redhat.com> Date: Wed May 7 08:05:50 2014 +1000 xfs: don't sleep in xlog_cil_force_lsn on shutdown so thanks. :) Should my patch still go in, to be consistent with file_fsync() paths? -Eric > Cheers, > > Dave. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfs: test for shut down fs in xfs_dir_fsync() 2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen 2014-04-28 16:47 ` Christoph Hellwig 2014-04-28 20:54 ` Dave Chinner @ 2014-09-12 19:29 ` Eric Sandeen 2 siblings, 0 replies; 16+ messages in thread From: Eric Sandeen @ 2014-09-12 19:29 UTC (permalink / raw) To: Eric Sandeen, xfs-oss; +Cc: Boris Ranto On 4/28/14 11:35 AM, Eric Sandeen wrote: > Similar to xfs_file_fsync(), I think xfs_dir_fsync() needs > to test for a shut down fs, lest we go down paths we'll > never be able to complete; Boris reported that during some > stress tests he had threads stuck in xlog_cil_force_lsn > via xfs_dir_fsync(). (re-ping) So Dave, you fixed this with: So, you did solve the problem properly I guess, in commit ac983517ec5941da0c58cacdbad10a231dc4e001 Author: Dave Chinner <dchinner@redhat.com> Date: Wed May 7 08:05:50 2014 +1000 xfs: don't sleep in xlog_cil_force_lsn on shutdown But should my patch still go in, if only to be consistent with file_fsync() paths? -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2014-09-12 19:29 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-04-28 16:35 [PATCH] xfs: test for shut down fs in xfs_dir_fsync() Eric Sandeen 2014-04-28 16:47 ` Christoph Hellwig 2014-04-28 17:18 ` Eric Sandeen 2014-04-28 17:22 ` Mark Tinguely 2014-04-28 17:26 ` Eric Sandeen 2014-04-28 17:49 ` Mark Tinguely 2014-04-28 17:53 ` Eric Sandeen 2014-04-29 10:24 ` Boris Ranto 2014-04-28 20:54 ` Dave Chinner 2014-04-28 21:39 ` Mark Tinguely 2014-04-28 22:18 ` Dave Chinner 2014-04-28 23:00 ` Mark Tinguely 2014-04-29 12:56 ` Brian Foster 2014-04-29 20:12 ` Dave Chinner 2014-07-21 15:33 ` Eric Sandeen 2014-09-12 19:29 ` Eric Sandeen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).