* [PATCH] xfs: unmount does not wait for shutdown during unmount
@ 2014-04-10 4:42 Dave Chinner
2014-04-10 13:29 ` Mark Tinguely
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Dave Chinner @ 2014-04-10 4:42 UTC (permalink / raw)
To: xfs; +Cc: bob.mastors, snitzer
From: Dave Chinner <dchinner@redhat.com>
And interesting situation can occur if a log IO error occurs during
the unmount of a filesystem. The cases reported have the same
signature - the update of the superblock counters fails due to a log
write IO error:
XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
XFS (dm-16): xfs_log_force: error 5 returned.
XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
It can be seen that the last line of output contains a corrupt
device name - this is because the log and xfs_mount structures have
already been freed by the time this message is printed. A kernel
oops closely follows.
The issue is that the shutdown is occurring in a separate IO
completion thread to the unmount. Once the shutdown processing has
started and all the iclogs are marked with XLOG_STATE_IOERROR, the
log shutdown code wakes anyone waiting on a log force so they can
process the shutdown error. This wakes up the unmount code that
is doing a synchronous transaction to update the superblock
counters.
The unmount path now sees all the iclogs are marked with
XLOG_STATE_IOERROR and so never waits on them again, knowing that if
it does, there will not be a wakeup trigger for it and we will hang
the unmount if we do. Hence the unmount runs through all the
remaining code and frees all the filesystem structures while the
xlog_iodone() is still processing the shutdown. When the log
shutdown processing completes, xfs_do_force_shutdown() emits the
"Please umount the filesystem and rectify the problem(s)" message,
and xlog_iodone() then aborts all the objects attached to the iclog.
An iclog that has already been freed....
The real issue here is that there is no serialisation point between
the log IO and the unmount. We have serialisations points for log
writes, log forces, reservations, etc, but we don't actually have
any code that wakes for log IO to fully complete. We do that for all
other types of object, so why not iclogbufs?
Well, it turns out that we can easily do this. We've got xfs_buf
handles, and that's what everyone else uses for IO serialisation.
i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
release the lock in xlog_iodone() when we are finished with the
buffer. That way before we tear down the iclog, we can lock and
unlock the buffer to ensure IO completion has finished completely
before we tear it down.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 8497a00..08624dc 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1181,11 +1181,14 @@ xlog_iodone(xfs_buf_t *bp)
/* log I/O is always issued ASYNC */
ASSERT(XFS_BUF_ISASYNC(bp));
xlog_state_done_syncing(iclog, aborted);
+
/*
- * do not reference the buffer (bp) here as we could race
- * with it being freed after writing the unmount record to the
- * log.
+ * drop the buffer lock now that we are done. Nothing references
+ * the buffer after this, so an unmount waiting on this lock can now
+ * tear it down safely. As such, it is unsafe to reference the buffer
+ * (bp) after the unlock as we could race with it being freed.
*/
+ xfs_buf_unlock(bp);
}
/*
@@ -1368,8 +1371,16 @@ xlog_alloc_log(
bp = xfs_buf_alloc(mp->m_logdev_targp, 0, BTOBB(log->l_iclog_size), 0);
if (!bp)
goto out_free_log;
- bp->b_iodone = xlog_iodone;
+
+ /*
+ * The iclogbuf buffer locks are held over IO but we are not going to do
+ * IO yet. Hence unlock the buffer so that the log IO path can grab it
+ * when appropriately.
+ */
ASSERT(xfs_buf_islocked(bp));
+ xfs_buf_unlock(bp);
+
+ bp->b_iodone = xlog_iodone;
log->l_xbuf = bp;
spin_lock_init(&log->l_icloglock);
@@ -1398,6 +1409,9 @@ xlog_alloc_log(
if (!bp)
goto out_free_iclog;
+ ASSERT(xfs_buf_islocked(bp));
+ xfs_buf_unlock(bp);
+
bp->b_iodone = xlog_iodone;
iclog->ic_bp = bp;
iclog->ic_data = bp->b_addr;
@@ -1422,7 +1436,6 @@ xlog_alloc_log(
iclog->ic_callback_tail = &(iclog->ic_callback);
iclog->ic_datap = (char *)iclog->ic_data + log->l_iclog_hsize;
- ASSERT(xfs_buf_islocked(iclog->ic_bp));
init_waitqueue_head(&iclog->ic_force_wait);
init_waitqueue_head(&iclog->ic_write_wait);
@@ -1631,6 +1644,12 @@ xlog_cksum(
* we transition the iclogs to IOERROR state *after* flushing all existing
* iclogs to disk. This is because we don't want anymore new transactions to be
* started or completed afterwards.
+ *
+ * We lock the iclogbufs here so that we can serialise against IO completion
+ * during unmount. We might be processing a shutdown triggered during unmount,
+ * and that can occur asynchronously to the unmount thread, and hence we need to
+ * ensure that completes before tearing down the iclogbufs. Hence we need to
+ * hold the buffer lock across the log IO to acheive that.
*/
STATIC int
xlog_bdstrat(
@@ -1638,6 +1657,7 @@ xlog_bdstrat(
{
struct xlog_in_core *iclog = bp->b_fspriv;
+ xfs_buf_lock(bp);
if (iclog->ic_state & XLOG_STATE_IOERROR) {
xfs_buf_ioerror(bp, EIO);
xfs_buf_stale(bp);
@@ -1645,7 +1665,8 @@ xlog_bdstrat(
/*
* It would seem logical to return EIO here, but we rely on
* the log state machine to propagate I/O errors instead of
- * doing it here.
+ * doing it here. Similarly, IO completion will unlock the
+ * buffer, so we don't do it here.
*/
return 0;
}
@@ -1847,14 +1868,28 @@ xlog_dealloc_log(
xlog_cil_destroy(log);
/*
- * always need to ensure that the extra buffer does not point to memory
- * owned by another log buffer before we free it.
+ * Cycle all the iclogbuf locks to make sure all log IO completion
+ * is done before we tear down these buffers.
*/
+ iclog = log->l_iclog;
+ for (i = 0; i < log->l_iclog_bufs; i++) {
+ xfs_buf_lock(iclog->ic_bp);
+ xfs_buf_unlock(iclog->ic_bp);
+ iclog = iclog->ic_next;
+ }
+
+ /*
+ * Always need to ensure that the extra buffer does not point to memory
+ * owned by another log buffer before we free it. Also, cycle the lock
+ * first to ensure we've completed IO on it.
+ */
+ xfs_buf_lock(log->l_xbuf);
+ xfs_buf_unlock(log->l_xbuf);
xfs_buf_set_empty(log->l_xbuf, BTOBB(log->l_iclog_size));
xfs_buf_free(log->l_xbuf);
iclog = log->l_iclog;
- for (i=0; i<log->l_iclog_bufs; i++) {
+ for (i = 0; i < log->l_iclog_bufs; i++) {
xfs_buf_free(iclog->ic_bp);
next_iclog = iclog->ic_next;
kmem_free(iclog);
--
1.9.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-10 4:42 [PATCH] xfs: unmount does not wait for shutdown during unmount Dave Chinner
@ 2014-04-10 13:29 ` Mark Tinguely
2014-04-10 21:52 ` Dave Chinner
2014-04-10 16:25 ` Mike Snitzer
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Mark Tinguely @ 2014-04-10 13:29 UTC (permalink / raw)
To: Dave Chinner; +Cc: bob.mastors, snitzer, xfs
On 04/09/14 23:42, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> And interesting situation can occur if a log IO error occurs during
> the unmount of a filesystem. The cases reported have the same
> signature - the update of the superblock counters fails due to a log
> write IO error:
>
> XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> XFS (dm-16): xfs_log_force: error 5 returned.
> XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
>
> It can be seen that the last line of output contains a corrupt
> device name - this is because the log and xfs_mount structures have
> already been freed by the time this message is printed. A kernel
> oops closely follows.
>
> The issue is that the shutdown is occurring in a separate IO
> completion thread to the unmount. Once the shutdown processing has
> started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> log shutdown code wakes anyone waiting on a log force so they can
> process the shutdown error. This wakes up the unmount code that
> is doing a synchronous transaction to update the superblock
> counters.
>
> The unmount path now sees all the iclogs are marked with
> XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> it does, there will not be a wakeup trigger for it and we will hang
> the unmount if we do. Hence the unmount runs through all the
> remaining code and frees all the filesystem structures while the
> xlog_iodone() is still processing the shutdown. When the log
> shutdown processing completes, xfs_do_force_shutdown() emits the
> "Please umount the filesystem and rectify the problem(s)" message,
> and xlog_iodone() then aborts all the objects attached to the iclog.
> An iclog that has already been freed....
>
> The real issue here is that there is no serialisation point between
> the log IO and the unmount. We have serialisations points for log
> writes, log forces, reservations, etc, but we don't actually have
> any code that wakes for log IO to fully complete. We do that for all
> other types of object, so why not iclogbufs?
>
> Well, it turns out that we can easily do this. We've got xfs_buf
> handles, and that's what everyone else uses for IO serialisation.
> i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> release the lock in xlog_iodone() when we are finished with the
> buffer. That way before we tear down the iclog, we can lock and
> unlock the buffer to ensure IO completion has finished completely
> before we tear it down.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
The wait queue "xc_commit_wait" is used for two purposes, first to start
the next ic_log buffer completion and also to wake those waiting for a
syncronous event. Shutdown does syncronous cil pushes but it does not
wait for the IO to happen.
Why not wait for the IO to happen or fail before waking out the sync
waiters? If you want to keep the speedier completion of the next cil
push add another wait queue. There only a few (typically 8) per filesystem.
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: xfs: unmount does not wait for shutdown during unmount
2014-04-10 4:42 [PATCH] xfs: unmount does not wait for shutdown during unmount Dave Chinner
2014-04-10 13:29 ` Mark Tinguely
@ 2014-04-10 16:25 ` Mike Snitzer
2014-04-11 19:21 ` [PATCH] " Bob Mastors
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2014-04-10 16:25 UTC (permalink / raw)
To: Dave Chinner; +Cc: bob.mastors, xfs
On Thu, Apr 10 2014 at 12:42am -0400,
Dave Chinner <david@fromorbit.com> wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> And interesting situation can occur if a log IO error occurs during
> the unmount of a filesystem. The cases reported have the same
> signature - the update of the superblock counters fails due to a log
> write IO error:
>
> XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> XFS (dm-16): xfs_log_force: error 5 returned.
> XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
>
> It can be seen that the last line of output contains a corrupt
> device name - this is because the log and xfs_mount structures have
> already been freed by the time this message is printed. A kernel
> oops closely follows.
>
> The issue is that the shutdown is occurring in a separate IO
> completion thread to the unmount. Once the shutdown processing has
> started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> log shutdown code wakes anyone waiting on a log force so they can
> process the shutdown error. This wakes up the unmount code that
> is doing a synchronous transaction to update the superblock
> counters.
>
> The unmount path now sees all the iclogs are marked with
> XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> it does, there will not be a wakeup trigger for it and we will hang
> the unmount if we do. Hence the unmount runs through all the
> remaining code and frees all the filesystem structures while the
> xlog_iodone() is still processing the shutdown. When the log
> shutdown processing completes, xfs_do_force_shutdown() emits the
> "Please umount the filesystem and rectify the problem(s)" message,
> and xlog_iodone() then aborts all the objects attached to the iclog.
> An iclog that has already been freed....
>
> The real issue here is that there is no serialisation point between
> the log IO and the unmount. We have serialisations points for log
> writes, log forces, reservations, etc, but we don't actually have
> any code that wakes for log IO to fully complete. We do that for all
> other types of object, so why not iclogbufs?
>
> Well, it turns out that we can easily do this. We've got xfs_buf
> handles, and that's what everyone else uses for IO serialisation.
> i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> release the lock in xlog_iodone() when we are finished with the
> buffer. That way before we tear down the iclog, we can lock and
> unlock the buffer to ensure IO completion has finished completely
> before we tear it down.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Works for the dm-thinp test-case that was failing, thanks Dave!
Tested-by: Mike Snitzer <snitzer@redhat.com>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-10 13:29 ` Mark Tinguely
@ 2014-04-10 21:52 ` Dave Chinner
0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2014-04-10 21:52 UTC (permalink / raw)
To: Mark Tinguely; +Cc: bob.mastors, snitzer, xfs
On Thu, Apr 10, 2014 at 08:29:00AM -0500, Mark Tinguely wrote:
> On 04/09/14 23:42, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >And interesting situation can occur if a log IO error occurs during
> >the unmount of a filesystem. The cases reported have the same
> >signature - the update of the superblock counters fails due to a log
> >write IO error:
> >
> >XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> >XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> >XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> >XFS (dm-16): xfs_log_force: error 5 returned.
> >XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
> >
> >It can be seen that the last line of output contains a corrupt
> >device name - this is because the log and xfs_mount structures have
> >already been freed by the time this message is printed. A kernel
> >oops closely follows.
> >
> >The issue is that the shutdown is occurring in a separate IO
> >completion thread to the unmount. Once the shutdown processing has
> >started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> >log shutdown code wakes anyone waiting on a log force so they can
> >process the shutdown error. This wakes up the unmount code that
> >is doing a synchronous transaction to update the superblock
> >counters.
> >
> >The unmount path now sees all the iclogs are marked with
> >XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> >it does, there will not be a wakeup trigger for it and we will hang
> >the unmount if we do. Hence the unmount runs through all the
> >remaining code and frees all the filesystem structures while the
> >xlog_iodone() is still processing the shutdown. When the log
> >shutdown processing completes, xfs_do_force_shutdown() emits the
> >"Please umount the filesystem and rectify the problem(s)" message,
> >and xlog_iodone() then aborts all the objects attached to the iclog.
> >An iclog that has already been freed....
> >
> >The real issue here is that there is no serialisation point between
> >the log IO and the unmount. We have serialisations points for log
> >writes, log forces, reservations, etc, but we don't actually have
> >any code that wakes for log IO to fully complete. We do that for all
> >other types of object, so why not iclogbufs?
> >
> >Well, it turns out that we can easily do this. We've got xfs_buf
> >handles, and that's what everyone else uses for IO serialisation.
> >i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> >release the lock in xlog_iodone() when we are finished with the
> >buffer. That way before we tear down the iclog, we can lock and
> >unlock the buffer to ensure IO completion has finished completely
> >before we tear it down.
> >
> >Signed-off-by: Dave Chinner<dchinner@redhat.com>
>
> The wait queue "xc_commit_wait" is used for two purposes, first to
> start the next ic_log buffer completion and also to wake those
> waiting for a syncronous event. Shutdown does syncronous cil pushes
> but it does not wait for the IO to happen.
First of all, shutdown does not do a CIL push if the source of the
shutdown is a log error, as is this case.
Secondly, shutdown can't wait for log IO completion because it can
be run from log IO completion, as is this case.
Thirdly, xc_commit_wait does not guarantee that *other* iclogbufs
are not under IO, just that specific CIL push is complete. Indeed,
we could be writing the unmount record, which writes to the
iclogbufs and issues IO directly and hence cannot be waited on by a
xc_commit_wait.
If you want to wait on iclogbuf state machine transitions (i.e. when
it considers the iclogbuf to be clean and active), that's what the
iclog->ic_write_wait wait queue is for. But even that is not
sufficient for our purposes, because the IO completion code
references the iclog after it has woken waiters during state machine
transitions.
Finally, the issue is that the superblock counter update transaction
is blocked in _xfs_log_force_lsn() as it is a synchronous
transaction. This is after the CIL has been pushed and now we are
waiting on iclogbuf IO completion blocked on the the
iclog->ic_force_wait wait queue. The last thing
xfs_log_force_unmount() does in a shutdown - after the state
machines are moved to error states - is wake up any waiters on the
iclog->ic_force_wait wait queues. That's where the "xfs_log_force:
error 5" message comes from - the sync transaction failing the
transaction commit.
Again, this happens before the iclogbuf IO completion processing is
finished, and hence we cannot use any of the existing wait queue
mechanisms to wait for iclogbuf IO completion to be finished. Hence,
a new mechanism is required, and that's what the patch provides.
> Why not wait for the IO to happen or fail
> before waking out the sync waiters?
How? You can't wait for IO completion when it's IO completion that
has detected the error and is running the shutdown.
> If you want to keep the speedier completion of the next cil
> push add another wait queue. There only a few (typically 8) per
> filesystem.
I don't follow you. What does this have to do with ensuring
iclogbufs are not still in use when we free them?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-10 4:42 [PATCH] xfs: unmount does not wait for shutdown during unmount Dave Chinner
2014-04-10 13:29 ` Mark Tinguely
2014-04-10 16:25 ` Mike Snitzer
@ 2014-04-11 19:21 ` Bob Mastors
2014-04-14 8:21 ` Dave Chinner
2014-04-14 19:28 ` Brian Foster
4 siblings, 0 replies; 10+ messages in thread
From: Bob Mastors @ 2014-04-11 19:21 UTC (permalink / raw)
To: Dave Chinner; +Cc: snitzer, xfs
[-- Attachment #1.1: Type: text/plain, Size: 8241 bytes --]
Thanks Dave.
This fixed the umount problem I was seeing.
Tried it a few thousand times. Works great.
Bob
On Wed, Apr 9, 2014 at 10:42 PM, Dave Chinner <david@fromorbit.com> wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> And interesting situation can occur if a log IO error occurs during
> the unmount of a filesystem. The cases reported have the same
> signature - the update of the superblock counters fails due to a log
> write IO error:
>
> XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file
> fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> XFS (dm-16): Unable to update superblock counters. Freespace may not be
> correct on next mount.
> XFS (dm-16): xfs_log_force: error 5 returned.
> XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
>
> It can be seen that the last line of output contains a corrupt
> device name - this is because the log and xfs_mount structures have
> already been freed by the time this message is printed. A kernel
> oops closely follows.
>
> The issue is that the shutdown is occurring in a separate IO
> completion thread to the unmount. Once the shutdown processing has
> started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> log shutdown code wakes anyone waiting on a log force so they can
> process the shutdown error. This wakes up the unmount code that
> is doing a synchronous transaction to update the superblock
> counters.
>
> The unmount path now sees all the iclogs are marked with
> XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> it does, there will not be a wakeup trigger for it and we will hang
> the unmount if we do. Hence the unmount runs through all the
> remaining code and frees all the filesystem structures while the
> xlog_iodone() is still processing the shutdown. When the log
> shutdown processing completes, xfs_do_force_shutdown() emits the
> "Please umount the filesystem and rectify the problem(s)" message,
> and xlog_iodone() then aborts all the objects attached to the iclog.
> An iclog that has already been freed....
>
> The real issue here is that there is no serialisation point between
> the log IO and the unmount. We have serialisations points for log
> writes, log forces, reservations, etc, but we don't actually have
> any code that wakes for log IO to fully complete. We do that for all
> other types of object, so why not iclogbufs?
>
> Well, it turns out that we can easily do this. We've got xfs_buf
> handles, and that's what everyone else uses for IO serialisation.
> i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> release the lock in xlog_iodone() when we are finished with the
> buffer. That way before we tear down the iclog, we can lock and
> unlock the buffer to ensure IO completion has finished completely
> before we tear it down.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 53
> ++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 44 insertions(+), 9 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 8497a00..08624dc 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1181,11 +1181,14 @@ xlog_iodone(xfs_buf_t *bp)
> /* log I/O is always issued ASYNC */
> ASSERT(XFS_BUF_ISASYNC(bp));
> xlog_state_done_syncing(iclog, aborted);
> +
> /*
> - * do not reference the buffer (bp) here as we could race
> - * with it being freed after writing the unmount record to the
> - * log.
> + * drop the buffer lock now that we are done. Nothing references
> + * the buffer after this, so an unmount waiting on this lock can
> now
> + * tear it down safely. As such, it is unsafe to reference the
> buffer
> + * (bp) after the unlock as we could race with it being freed.
> */
> + xfs_buf_unlock(bp);
> }
>
> /*
> @@ -1368,8 +1371,16 @@ xlog_alloc_log(
> bp = xfs_buf_alloc(mp->m_logdev_targp, 0,
> BTOBB(log->l_iclog_size), 0);
> if (!bp)
> goto out_free_log;
> - bp->b_iodone = xlog_iodone;
> +
> + /*
> + * The iclogbuf buffer locks are held over IO but we are not going
> to do
> + * IO yet. Hence unlock the buffer so that the log IO path can
> grab it
> + * when appropriately.
> + */
> ASSERT(xfs_buf_islocked(bp));
> + xfs_buf_unlock(bp);
> +
> + bp->b_iodone = xlog_iodone;
> log->l_xbuf = bp;
>
> spin_lock_init(&log->l_icloglock);
> @@ -1398,6 +1409,9 @@ xlog_alloc_log(
> if (!bp)
> goto out_free_iclog;
>
> + ASSERT(xfs_buf_islocked(bp));
> + xfs_buf_unlock(bp);
> +
> bp->b_iodone = xlog_iodone;
> iclog->ic_bp = bp;
> iclog->ic_data = bp->b_addr;
> @@ -1422,7 +1436,6 @@ xlog_alloc_log(
> iclog->ic_callback_tail = &(iclog->ic_callback);
> iclog->ic_datap = (char *)iclog->ic_data +
> log->l_iclog_hsize;
>
> - ASSERT(xfs_buf_islocked(iclog->ic_bp));
> init_waitqueue_head(&iclog->ic_force_wait);
> init_waitqueue_head(&iclog->ic_write_wait);
>
> @@ -1631,6 +1644,12 @@ xlog_cksum(
> * we transition the iclogs to IOERROR state *after* flushing all existing
> * iclogs to disk. This is because we don't want anymore new transactions
> to be
> * started or completed afterwards.
> + *
> + * We lock the iclogbufs here so that we can serialise against IO
> completion
> + * during unmount. We might be processing a shutdown triggered during
> unmount,
> + * and that can occur asynchronously to the unmount thread, and hence we
> need to
> + * ensure that completes before tearing down the iclogbufs. Hence we need
> to
> + * hold the buffer lock across the log IO to acheive that.
> */
> STATIC int
> xlog_bdstrat(
> @@ -1638,6 +1657,7 @@ xlog_bdstrat(
> {
> struct xlog_in_core *iclog = bp->b_fspriv;
>
> + xfs_buf_lock(bp);
> if (iclog->ic_state & XLOG_STATE_IOERROR) {
> xfs_buf_ioerror(bp, EIO);
> xfs_buf_stale(bp);
> @@ -1645,7 +1665,8 @@ xlog_bdstrat(
> /*
> * It would seem logical to return EIO here, but we rely on
> * the log state machine to propagate I/O errors instead of
> - * doing it here.
> + * doing it here. Similarly, IO completion will unlock the
> + * buffer, so we don't do it here.
> */
> return 0;
> }
> @@ -1847,14 +1868,28 @@ xlog_dealloc_log(
> xlog_cil_destroy(log);
>
> /*
> - * always need to ensure that the extra buffer does not point to
> memory
> - * owned by another log buffer before we free it.
> + * Cycle all the iclogbuf locks to make sure all log IO completion
> + * is done before we tear down these buffers.
> */
> + iclog = log->l_iclog;
> + for (i = 0; i < log->l_iclog_bufs; i++) {
> + xfs_buf_lock(iclog->ic_bp);
> + xfs_buf_unlock(iclog->ic_bp);
> + iclog = iclog->ic_next;
> + }
> +
> + /*
> + * Always need to ensure that the extra buffer does not point to
> memory
> + * owned by another log buffer before we free it. Also, cycle the
> lock
> + * first to ensure we've completed IO on it.
> + */
> + xfs_buf_lock(log->l_xbuf);
> + xfs_buf_unlock(log->l_xbuf);
> xfs_buf_set_empty(log->l_xbuf, BTOBB(log->l_iclog_size));
> xfs_buf_free(log->l_xbuf);
>
> iclog = log->l_iclog;
> - for (i=0; i<log->l_iclog_bufs; i++) {
> + for (i = 0; i < log->l_iclog_bufs; i++) {
> xfs_buf_free(iclog->ic_bp);
> next_iclog = iclog->ic_next;
> kmem_free(iclog);
> --
> 1.9.0
>
>
[-- Attachment #1.2: Type: text/html, Size: 9208 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-10 4:42 [PATCH] xfs: unmount does not wait for shutdown during unmount Dave Chinner
` (2 preceding siblings ...)
2014-04-11 19:21 ` [PATCH] " Bob Mastors
@ 2014-04-14 8:21 ` Dave Chinner
2014-04-14 19:28 ` Brian Foster
4 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2014-04-14 8:21 UTC (permalink / raw)
To: xfs; +Cc: bob.mastors, snitzer
ping?
On Thu, Apr 10, 2014 at 02:42:35PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> And interesting situation can occur if a log IO error occurs during
> the unmount of a filesystem. The cases reported have the same
> signature - the update of the superblock counters fails due to a log
> write IO error:
>
> XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> XFS (dm-16): xfs_log_force: error 5 returned.
> XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
>
> It can be seen that the last line of output contains a corrupt
> device name - this is because the log and xfs_mount structures have
> already been freed by the time this message is printed. A kernel
> oops closely follows.
>
> The issue is that the shutdown is occurring in a separate IO
> completion thread to the unmount. Once the shutdown processing has
> started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> log shutdown code wakes anyone waiting on a log force so they can
> process the shutdown error. This wakes up the unmount code that
> is doing a synchronous transaction to update the superblock
> counters.
>
> The unmount path now sees all the iclogs are marked with
> XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> it does, there will not be a wakeup trigger for it and we will hang
> the unmount if we do. Hence the unmount runs through all the
> remaining code and frees all the filesystem structures while the
> xlog_iodone() is still processing the shutdown. When the log
> shutdown processing completes, xfs_do_force_shutdown() emits the
> "Please umount the filesystem and rectify the problem(s)" message,
> and xlog_iodone() then aborts all the objects attached to the iclog.
> An iclog that has already been freed....
>
> The real issue here is that there is no serialisation point between
> the log IO and the unmount. We have serialisations points for log
> writes, log forces, reservations, etc, but we don't actually have
> any code that wakes for log IO to fully complete. We do that for all
> other types of object, so why not iclogbufs?
>
> Well, it turns out that we can easily do this. We've got xfs_buf
> handles, and that's what everyone else uses for IO serialisation.
> i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> release the lock in xlog_iodone() when we are finished with the
> buffer. That way before we tear down the iclog, we can lock and
> unlock the buffer to ensure IO completion has finished completely
> before we tear it down.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 44 insertions(+), 9 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 8497a00..08624dc 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1181,11 +1181,14 @@ xlog_iodone(xfs_buf_t *bp)
> /* log I/O is always issued ASYNC */
> ASSERT(XFS_BUF_ISASYNC(bp));
> xlog_state_done_syncing(iclog, aborted);
> +
> /*
> - * do not reference the buffer (bp) here as we could race
> - * with it being freed after writing the unmount record to the
> - * log.
> + * drop the buffer lock now that we are done. Nothing references
> + * the buffer after this, so an unmount waiting on this lock can now
> + * tear it down safely. As such, it is unsafe to reference the buffer
> + * (bp) after the unlock as we could race with it being freed.
> */
> + xfs_buf_unlock(bp);
> }
>
> /*
> @@ -1368,8 +1371,16 @@ xlog_alloc_log(
> bp = xfs_buf_alloc(mp->m_logdev_targp, 0, BTOBB(log->l_iclog_size), 0);
> if (!bp)
> goto out_free_log;
> - bp->b_iodone = xlog_iodone;
> +
> + /*
> + * The iclogbuf buffer locks are held over IO but we are not going to do
> + * IO yet. Hence unlock the buffer so that the log IO path can grab it
> + * when appropriately.
> + */
> ASSERT(xfs_buf_islocked(bp));
> + xfs_buf_unlock(bp);
> +
> + bp->b_iodone = xlog_iodone;
> log->l_xbuf = bp;
>
> spin_lock_init(&log->l_icloglock);
> @@ -1398,6 +1409,9 @@ xlog_alloc_log(
> if (!bp)
> goto out_free_iclog;
>
> + ASSERT(xfs_buf_islocked(bp));
> + xfs_buf_unlock(bp);
> +
> bp->b_iodone = xlog_iodone;
> iclog->ic_bp = bp;
> iclog->ic_data = bp->b_addr;
> @@ -1422,7 +1436,6 @@ xlog_alloc_log(
> iclog->ic_callback_tail = &(iclog->ic_callback);
> iclog->ic_datap = (char *)iclog->ic_data + log->l_iclog_hsize;
>
> - ASSERT(xfs_buf_islocked(iclog->ic_bp));
> init_waitqueue_head(&iclog->ic_force_wait);
> init_waitqueue_head(&iclog->ic_write_wait);
>
> @@ -1631,6 +1644,12 @@ xlog_cksum(
> * we transition the iclogs to IOERROR state *after* flushing all existing
> * iclogs to disk. This is because we don't want anymore new transactions to be
> * started or completed afterwards.
> + *
> + * We lock the iclogbufs here so that we can serialise against IO completion
> + * during unmount. We might be processing a shutdown triggered during unmount,
> + * and that can occur asynchronously to the unmount thread, and hence we need to
> + * ensure that completes before tearing down the iclogbufs. Hence we need to
> + * hold the buffer lock across the log IO to acheive that.
> */
> STATIC int
> xlog_bdstrat(
> @@ -1638,6 +1657,7 @@ xlog_bdstrat(
> {
> struct xlog_in_core *iclog = bp->b_fspriv;
>
> + xfs_buf_lock(bp);
> if (iclog->ic_state & XLOG_STATE_IOERROR) {
> xfs_buf_ioerror(bp, EIO);
> xfs_buf_stale(bp);
> @@ -1645,7 +1665,8 @@ xlog_bdstrat(
> /*
> * It would seem logical to return EIO here, but we rely on
> * the log state machine to propagate I/O errors instead of
> - * doing it here.
> + * doing it here. Similarly, IO completion will unlock the
> + * buffer, so we don't do it here.
> */
> return 0;
> }
> @@ -1847,14 +1868,28 @@ xlog_dealloc_log(
> xlog_cil_destroy(log);
>
> /*
> - * always need to ensure that the extra buffer does not point to memory
> - * owned by another log buffer before we free it.
> + * Cycle all the iclogbuf locks to make sure all log IO completion
> + * is done before we tear down these buffers.
> */
> + iclog = log->l_iclog;
> + for (i = 0; i < log->l_iclog_bufs; i++) {
> + xfs_buf_lock(iclog->ic_bp);
> + xfs_buf_unlock(iclog->ic_bp);
> + iclog = iclog->ic_next;
> + }
> +
> + /*
> + * Always need to ensure that the extra buffer does not point to memory
> + * owned by another log buffer before we free it. Also, cycle the lock
> + * first to ensure we've completed IO on it.
> + */
> + xfs_buf_lock(log->l_xbuf);
> + xfs_buf_unlock(log->l_xbuf);
> xfs_buf_set_empty(log->l_xbuf, BTOBB(log->l_iclog_size));
> xfs_buf_free(log->l_xbuf);
>
> iclog = log->l_iclog;
> - for (i=0; i<log->l_iclog_bufs; i++) {
> + for (i = 0; i < log->l_iclog_bufs; i++) {
> xfs_buf_free(iclog->ic_bp);
> next_iclog = iclog->ic_next;
> kmem_free(iclog);
> --
> 1.9.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-10 4:42 [PATCH] xfs: unmount does not wait for shutdown during unmount Dave Chinner
` (3 preceding siblings ...)
2014-04-14 8:21 ` Dave Chinner
@ 2014-04-14 19:28 ` Brian Foster
2014-04-15 2:15 ` Dave Chinner
4 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2014-04-14 19:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: bob.mastors, snitzer, xfs
On Thu, Apr 10, 2014 at 02:42:35PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> And interesting situation can occur if a log IO error occurs during
> the unmount of a filesystem. The cases reported have the same
> signature - the update of the superblock counters fails due to a log
> write IO error:
>
> XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> XFS (dm-16): xfs_log_force: error 5 returned.
> XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
>
> It can be seen that the last line of output contains a corrupt
> device name - this is because the log and xfs_mount structures have
> already been freed by the time this message is printed. A kernel
> oops closely follows.
>
> The issue is that the shutdown is occurring in a separate IO
> completion thread to the unmount. Once the shutdown processing has
> started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> log shutdown code wakes anyone waiting on a log force so they can
> process the shutdown error. This wakes up the unmount code that
> is doing a synchronous transaction to update the superblock
> counters.
>
> The unmount path now sees all the iclogs are marked with
> XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> it does, there will not be a wakeup trigger for it and we will hang
> the unmount if we do. Hence the unmount runs through all the
> remaining code and frees all the filesystem structures while the
> xlog_iodone() is still processing the shutdown. When the log
> shutdown processing completes, xfs_do_force_shutdown() emits the
> "Please umount the filesystem and rectify the problem(s)" message,
> and xlog_iodone() then aborts all the objects attached to the iclog.
> An iclog that has already been freed....
>
> The real issue here is that there is no serialisation point between
> the log IO and the unmount. We have serialisations points for log
> writes, log forces, reservations, etc, but we don't actually have
> any code that wakes for log IO to fully complete. We do that for all
> other types of object, so why not iclogbufs?
>
> Well, it turns out that we can easily do this. We've got xfs_buf
> handles, and that's what everyone else uses for IO serialisation.
> i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> release the lock in xlog_iodone() when we are finished with the
> buffer. That way before we tear down the iclog, we can lock and
> unlock the buffer to ensure IO completion has finished completely
> before we tear it down.
>
Thanks for the write up...
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_log.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 44 insertions(+), 9 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 8497a00..08624dc 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1181,11 +1181,14 @@ xlog_iodone(xfs_buf_t *bp)
> /* log I/O is always issued ASYNC */
> ASSERT(XFS_BUF_ISASYNC(bp));
> xlog_state_done_syncing(iclog, aborted);
> +
> /*
> - * do not reference the buffer (bp) here as we could race
> - * with it being freed after writing the unmount record to the
> - * log.
> + * drop the buffer lock now that we are done. Nothing references
> + * the buffer after this, so an unmount waiting on this lock can now
> + * tear it down safely. As such, it is unsafe to reference the buffer
> + * (bp) after the unlock as we could race with it being freed.
> */
> + xfs_buf_unlock(bp);
> }
>
> /*
> @@ -1368,8 +1371,16 @@ xlog_alloc_log(
> bp = xfs_buf_alloc(mp->m_logdev_targp, 0, BTOBB(log->l_iclog_size), 0);
> if (!bp)
> goto out_free_log;
> - bp->b_iodone = xlog_iodone;
> +
> + /*
> + * The iclogbuf buffer locks are held over IO but we are not going to do
> + * IO yet. Hence unlock the buffer so that the log IO path can grab it
> + * when appropriately.
> + */
> ASSERT(xfs_buf_islocked(bp));
> + xfs_buf_unlock(bp);
> +
> + bp->b_iodone = xlog_iodone;
> log->l_xbuf = bp;
>
> spin_lock_init(&log->l_icloglock);
> @@ -1398,6 +1409,9 @@ xlog_alloc_log(
> if (!bp)
> goto out_free_iclog;
>
> + ASSERT(xfs_buf_islocked(bp));
> + xfs_buf_unlock(bp);
> +
> bp->b_iodone = xlog_iodone;
> iclog->ic_bp = bp;
> iclog->ic_data = bp->b_addr;
> @@ -1422,7 +1436,6 @@ xlog_alloc_log(
> iclog->ic_callback_tail = &(iclog->ic_callback);
> iclog->ic_datap = (char *)iclog->ic_data + log->l_iclog_hsize;
>
> - ASSERT(xfs_buf_islocked(iclog->ic_bp));
> init_waitqueue_head(&iclog->ic_force_wait);
> init_waitqueue_head(&iclog->ic_write_wait);
>
> @@ -1631,6 +1644,12 @@ xlog_cksum(
> * we transition the iclogs to IOERROR state *after* flushing all existing
> * iclogs to disk. This is because we don't want anymore new transactions to be
> * started or completed afterwards.
> + *
> + * We lock the iclogbufs here so that we can serialise against IO completion
> + * during unmount. We might be processing a shutdown triggered during unmount,
> + * and that can occur asynchronously to the unmount thread, and hence we need to
> + * ensure that completes before tearing down the iclogbufs. Hence we need to
> + * hold the buffer lock across the log IO to acheive that.
> */
> STATIC int
> xlog_bdstrat(
> @@ -1638,6 +1657,7 @@ xlog_bdstrat(
> {
> struct xlog_in_core *iclog = bp->b_fspriv;
>
> + xfs_buf_lock(bp);
> if (iclog->ic_state & XLOG_STATE_IOERROR) {
> xfs_buf_ioerror(bp, EIO);
> xfs_buf_stale(bp);
> @@ -1645,7 +1665,8 @@ xlog_bdstrat(
> /*
> * It would seem logical to return EIO here, but we rely on
> * the log state machine to propagate I/O errors instead of
> - * doing it here.
> + * doing it here. Similarly, IO completion will unlock the
> + * buffer, so we don't do it here.
> */
> return 0;
> }
> @@ -1847,14 +1868,28 @@ xlog_dealloc_log(
> xlog_cil_destroy(log);
>
> /*
> - * always need to ensure that the extra buffer does not point to memory
> - * owned by another log buffer before we free it.
> + * Cycle all the iclogbuf locks to make sure all log IO completion
> + * is done before we tear down these buffers.
> */
> + iclog = log->l_iclog;
> + for (i = 0; i < log->l_iclog_bufs; i++) {
> + xfs_buf_lock(iclog->ic_bp);
> + xfs_buf_unlock(iclog->ic_bp);
> + iclog = iclog->ic_next;
> + }
> +
> + /*
> + * Always need to ensure that the extra buffer does not point to memory
> + * owned by another log buffer before we free it. Also, cycle the lock
> + * first to ensure we've completed IO on it.
> + */
> + xfs_buf_lock(log->l_xbuf);
> + xfs_buf_unlock(log->l_xbuf);
> xfs_buf_set_empty(log->l_xbuf, BTOBB(log->l_iclog_size));
> xfs_buf_free(log->l_xbuf);
>
> iclog = log->l_iclog;
> - for (i=0; i<log->l_iclog_bufs; i++) {
> + for (i = 0; i < log->l_iclog_bufs; i++) {
> xfs_buf_free(iclog->ic_bp);
> next_iclog = iclog->ic_next;
> kmem_free(iclog);
On reading the code, my initial thought was that the source of this is
the xlog_state_do_callback() call down in the shutdown path, when
invoked from the log I/O completion handler. I think you pointed out in
your previous reply that even if we were to make that call selective
(e.g., based on whether the shutdown is due to a log error and thus we
can expect xlog_state_do_callback() to be invoked), we still access
relevant data structures after the ic_force_wait wait_queue is woken.
Therefore, there would still be a race even if we bypassed the call from
within the shutdown path in this particular case.
The logic seems sane to me. I don't notice any issues. But my only
question is why the use of locking, as opposed to wiring up use of
b_iowait or something into the log I/O handler? I ask because it looks
just a _bit_ funny to see the lock/unlock cycles used purely as a
serialization mechanism. Do we use this kind of pattern in other places?
I guess on the other hand you could argue it protects the I/O in
progress, and yet another wait_queue in this codepath might be overkill
(so I like the use of an existing mechanism from that standpoint).
Brian
> --
> 1.9.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-14 19:28 ` Brian Foster
@ 2014-04-15 2:15 ` Dave Chinner
2014-04-15 14:59 ` Brian Foster
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2014-04-15 2:15 UTC (permalink / raw)
To: Brian Foster; +Cc: bob.mastors, snitzer, xfs
On Mon, Apr 14, 2014 at 03:28:24PM -0400, Brian Foster wrote:
> On Thu, Apr 10, 2014 at 02:42:35PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > And interesting situation can occur if a log IO error occurs during
> > the unmount of a filesystem. The cases reported have the same
> > signature - the update of the superblock counters fails due to a log
> > write IO error:
> >
> > XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> > XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> > XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> > XFS (dm-16): xfs_log_force: error 5 returned.
> > XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
> >
> > It can be seen that the last line of output contains a corrupt
> > device name - this is because the log and xfs_mount structures have
> > already been freed by the time this message is printed. A kernel
> > oops closely follows.
> >
> > The issue is that the shutdown is occurring in a separate IO
> > completion thread to the unmount. Once the shutdown processing has
> > started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> > log shutdown code wakes anyone waiting on a log force so they can
> > process the shutdown error. This wakes up the unmount code that
> > is doing a synchronous transaction to update the superblock
> > counters.
> >
> > The unmount path now sees all the iclogs are marked with
> > XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> > it does, there will not be a wakeup trigger for it and we will hang
> > the unmount if we do. Hence the unmount runs through all the
> > remaining code and frees all the filesystem structures while the
> > xlog_iodone() is still processing the shutdown. When the log
> > shutdown processing completes, xfs_do_force_shutdown() emits the
> > "Please umount the filesystem and rectify the problem(s)" message,
> > and xlog_iodone() then aborts all the objects attached to the iclog.
> > An iclog that has already been freed....
> >
> > The real issue here is that there is no serialisation point between
> > the log IO and the unmount. We have serialisations points for log
> > writes, log forces, reservations, etc, but we don't actually have
> > any code that wakes for log IO to fully complete. We do that for all
> > other types of object, so why not iclogbufs?
> >
> > Well, it turns out that we can easily do this. We've got xfs_buf
> > handles, and that's what everyone else uses for IO serialisation.
> > i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> > release the lock in xlog_iodone() when we are finished with the
> > buffer. That way before we tear down the iclog, we can lock and
> > unlock the buffer to ensure IO completion has finished completely
> > before we tear it down.
> >
>
> Thanks for the write up...
.....
> On reading the code, my initial thought was that the source of this is
> the xlog_state_do_callback() call down in the shutdown path, when
> invoked from the log I/O completion handler. I think you pointed out in
> your previous reply that even if we were to make that call selective
> (e.g., based on whether the shutdown is due to a log error and thus we
> can expect xlog_state_do_callback() to be invoked), we still access
> relevant data structures after the ic_force_wait wait_queue is woken.
> Therefore, there would still be a race even if we bypassed the call from
> within the shutdown path in this particular case.
>
> The logic seems sane to me. I don't notice any issues. But my only
> question is why the use of locking, as opposed to wiring up use of
> b_iowait or something into the log I/O handler?
Log IO is issued B_ASYNC, so the buffer IO completion handlers do
not issue wakeups to b_iowait waiters. And the iclogbufs are
uncached buffers, so we can't use the buffer cache waiting
mechanisms to wait for them, either.
> I ask because it looks
> just a _bit_ funny to see the lock/unlock cycles used purely as a
> serialization mechanism. Do we use this kind of pattern in other places?
Yes, in xfs_log_quiesce(), to wait on the superblock buffer IO
because:
/*
* The superblock buffer is uncached and while xfs_ail_push_all_sync()
* will push it, xfs_wait_buftarg() will not wait for it. Further,
* xfs_buf_iowait() cannot be used because it was pushed with the
* XBF_ASYNC flag set, so we need to use a lock/unlock pair to wait for
* the IO to complete.
*/
Which is exactly the same situation as the iclogbufs - uncached
buffers, B_ASYNC IO....
> I guess on the other hand you could argue it protects the I/O in
> progress, and yet another wait_queue in this codepath might be overkill
> (so I like the use of an existing mechanism from that standpoint).
Mostly I figured that we already have the same IO wait mechanism in
the log shutdown path for other buffers, so it seemed silly to add
waitqueues and other infrastructure for something that could be
handled very easily. It also makes the iclogs look like every other
buffer in the filesystem in that the buffer lock is held over IO.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-15 2:15 ` Dave Chinner
@ 2014-04-15 14:59 ` Brian Foster
2014-04-16 14:27 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2014-04-15 14:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: bob.mastors, snitzer, xfs
On Tue, Apr 15, 2014 at 12:15:46PM +1000, Dave Chinner wrote:
> On Mon, Apr 14, 2014 at 03:28:24PM -0400, Brian Foster wrote:
> > On Thu, Apr 10, 2014 at 02:42:35PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > And interesting situation can occur if a log IO error occurs during
> > > the unmount of a filesystem. The cases reported have the same
> > > signature - the update of the superblock counters fails due to a log
> > > write IO error:
> > >
> > > XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1
> > > XFS (dm-16): Log I/O Error Detected. Shutting down filesystem
> > > XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> > > XFS (dm-16): xfs_log_force: error 5 returned.
> > > XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
> > >
> > > It can be seen that the last line of output contains a corrupt
> > > device name - this is because the log and xfs_mount structures have
> > > already been freed by the time this message is printed. A kernel
> > > oops closely follows.
> > >
> > > The issue is that the shutdown is occurring in a separate IO
> > > completion thread to the unmount. Once the shutdown processing has
> > > started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> > > log shutdown code wakes anyone waiting on a log force so they can
> > > process the shutdown error. This wakes up the unmount code that
> > > is doing a synchronous transaction to update the superblock
> > > counters.
> > >
> > > The unmount path now sees all the iclogs are marked with
> > > XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> > > it does, there will not be a wakeup trigger for it and we will hang
> > > the unmount if we do. Hence the unmount runs through all the
> > > remaining code and frees all the filesystem structures while the
> > > xlog_iodone() is still processing the shutdown. When the log
> > > shutdown processing completes, xfs_do_force_shutdown() emits the
> > > "Please umount the filesystem and rectify the problem(s)" message,
> > > and xlog_iodone() then aborts all the objects attached to the iclog.
> > > An iclog that has already been freed....
> > >
> > > The real issue here is that there is no serialisation point between
> > > the log IO and the unmount. We have serialisations points for log
> > > writes, log forces, reservations, etc, but we don't actually have
> > > any code that wakes for log IO to fully complete. We do that for all
> > > other types of object, so why not iclogbufs?
> > >
> > > Well, it turns out that we can easily do this. We've got xfs_buf
> > > handles, and that's what everyone else uses for IO serialisation.
> > > i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> > > release the lock in xlog_iodone() when we are finished with the
> > > buffer. That way before we tear down the iclog, we can lock and
> > > unlock the buffer to ensure IO completion has finished completely
> > > before we tear it down.
> > >
> >
> > Thanks for the write up...
> .....
> > On reading the code, my initial thought was that the source of this is
> > the xlog_state_do_callback() call down in the shutdown path, when
> > invoked from the log I/O completion handler. I think you pointed out in
> > your previous reply that even if we were to make that call selective
> > (e.g., based on whether the shutdown is due to a log error and thus we
> > can expect xlog_state_do_callback() to be invoked), we still access
> > relevant data structures after the ic_force_wait wait_queue is woken.
> > Therefore, there would still be a race even if we bypassed the call from
> > within the shutdown path in this particular case.
> >
> > The logic seems sane to me. I don't notice any issues. But my only
> > question is why the use of locking, as opposed to wiring up use of
> > b_iowait or something into the log I/O handler?
>
> Log IO is issued B_ASYNC, so the buffer IO completion handlers do
> not issue wakeups to b_iowait waiters. And the iclogbufs are
> uncached buffers, so we can't use the buffer cache waiting
> mechanisms to wait for them, either.
>
Yeah, I noticed that the completion wouldn't fire as is since it appears
that doesn't occur if a b_iodone call is provided. FWIW, the thought
process was more to inherit the use of b_iowait in xlog_iodone().
> > I ask because it looks
> > just a _bit_ funny to see the lock/unlock cycles used purely as a
> > serialization mechanism. Do we use this kind of pattern in other places?
>
> Yes, in xfs_log_quiesce(), to wait on the superblock buffer IO
> because:
>
> /*
> * The superblock buffer is uncached and while xfs_ail_push_all_sync()
> * will push it, xfs_wait_buftarg() will not wait for it. Further,
> * xfs_buf_iowait() cannot be used because it was pushed with the
> * XBF_ASYNC flag set, so we need to use a lock/unlock pair to wait for
> * the IO to complete.
> */
>
> Which is exactly the same situation as the iclogbufs - uncached
> buffers, B_ASYNC IO....
>
Ok, thanks. I see the AIL is pushing buffers through the delwri
interface, which means XBF_ASYNC and thus b_iowait is not fired. What
I'm missing about this example is how this serializes xfs_log_quiesce()
against I/O completion of the superblock...
It looks like we (if necessary), grab/lock the sb at transaction commit
time (xfs_trans_apply_sb_deltas()), make any modifications tracked by
the transaction, log the sb buffer and commit the transaction.
At that point, isn't the buffer lock released at iop_unlock() time..? My
understanding is that this protects against concurrent access. So the
super could be unlocked once the transaction is committed, we do a log
sync, push the ail and do a wait_buftarg(). If the latter won't wait on
sb_bp, what prevents the buf_lock from being acquired before the async
write completes..?
> > I guess on the other hand you could argue it protects the I/O in
> > progress, and yet another wait_queue in this codepath might be overkill
> > (so I like the use of an existing mechanism from that standpoint).
>
> Mostly I figured that we already have the same IO wait mechanism in
> the log shutdown path for other buffers, so it seemed silly to add
> waitqueues and other infrastructure for something that could be
> handled very easily. It also makes the iclogs look like every other
> buffer in the filesystem in that the buffer lock is held over IO.
>
When you point out that all buffers in the fs are locked over I/O, I
suspect I'm missing something fundamental in the buffer I/O path. ;)
Digging a little deeper, I see __xfs_buf_delwri_submit() takes the
buffer lock and sends the buffer down for I/O. The lock is
correspondingly released for XBF_ASYNC buffers in the
xfs_buf_iodone_work() callback.
So in the example used above, the sb can be locked, modifed, logged,
committed and unlocked in the tp commit path. The log sync pushes
everything to the log. The AIL push submits writes for all active items
(thus acquiring sb_bp lock). The wait_buftarg() presumably waits for all
cached buffers to write out and the sb_bp lock serializes against
completion of the sb buffer individually.
So in a sense, you've done what I was thinking about with regard to
inheriting b_iowait into the log callback, just doing so with the
locking mechanism instead. Only that appears to be more consistent with
general buffer handling, given that the log buffer I/O is async...
Reviewed-by: Brian Foster <bfoster@redhat.com>
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
2014-04-15 14:59 ` Brian Foster
@ 2014-04-16 14:27 ` Christoph Hellwig
0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2014-04-16 14:27 UTC (permalink / raw)
To: Brian Foster; +Cc: bob.mastors, snitzer, xfs
On Tue, Apr 15, 2014 at 10:59:25AM -0400, Brian Foster wrote:
> Yeah, I noticed that the completion wouldn't fire as is since it appears
> that doesn't occur if a b_iodone call is provided. FWIW, the thought
> process was more to inherit the use of b_iowait in xlog_iodone().
That doesn't mean the b_iodone callback couldn't do the wakeup on
b_iowait.
I have to say it makes me a bit uneasy that we had to find this
race the hard way for the superblock, and now for the logbufs again.
Seems like all other uncached buffers are only read and written
synchronously, so for this should be the last issue, but I fear about
new ones showing up in the future and would prefer a more general
solution.
Reluctantly:
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-04-16 14:27 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-10 4:42 [PATCH] xfs: unmount does not wait for shutdown during unmount Dave Chinner
2014-04-10 13:29 ` Mark Tinguely
2014-04-10 21:52 ` Dave Chinner
2014-04-10 16:25 ` Mike Snitzer
2014-04-11 19:21 ` [PATCH] " Bob Mastors
2014-04-14 8:21 ` Dave Chinner
2014-04-14 19:28 ` Brian Foster
2014-04-15 2:15 ` Dave Chinner
2014-04-15 14:59 ` Brian Foster
2014-04-16 14:27 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).