All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: bob.mastors@solidfire.com, snitzer@redhat.com, xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: unmount does not wait for shutdown during unmount
Date: Mon, 14 Apr 2014 15:28:24 -0400	[thread overview]
Message-ID: <20140414192824.GC62307@bfoster.bfoster> (raw)
In-Reply-To: <1397104955-7247-1-git-send-email-david@fromorbit.com>

On Thu, Apr 10, 2014 at 02:42:35PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> And interesting situation can occur if a log IO error occurs during
> the unmount of a filesystem. The cases reported have the same
> signature - the update of the superblock counters fails due to a log
> write IO error:
> 
> XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa08a44a1
> XFS (dm-16): Log I/O Error Detected.  Shutting down filesystem
> XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount.
> XFS (dm-16): xfs_log_force: error 5 returned.
> XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s)
> 
> It can be seen that the last line of output contains a corrupt
> device name - this is because the log and xfs_mount structures have
> already been freed by the time this message is printed. A kernel
> oops closely follows.
> 
> The issue is that the shutdown is occurring in a separate IO
> completion thread to the unmount. Once the shutdown processing has
> started and all the iclogs are marked with XLOG_STATE_IOERROR, the
> log shutdown code wakes anyone waiting on a log force so they can
> process the shutdown error. This wakes up the unmount code that
> is doing a synchronous transaction to update the superblock
> counters.
> 
> The unmount path now sees all the iclogs are marked with
> XLOG_STATE_IOERROR and so never waits on them again, knowing that if
> it does, there will not be a wakeup trigger for it and we will hang
> the unmount if we do. Hence the unmount runs through all the
> remaining code and frees all the filesystem structures while the
> xlog_iodone() is still processing the shutdown. When the log
> shutdown processing completes, xfs_do_force_shutdown() emits the
> "Please umount the filesystem and rectify the problem(s)" message,
> and xlog_iodone() then aborts all the objects attached to the iclog.
> An iclog that has already been freed....
> 
> The real issue here is that there is no serialisation point between
> the log IO and the unmount. We have serialisations points for log
> writes, log forces, reservations, etc, but we don't actually have
> any code that wakes for log IO to fully complete. We do that for all
> other types of object, so why not iclogbufs?
> 
> Well, it turns out that we can easily do this. We've got xfs_buf
> handles, and that's what everyone else uses for IO serialisation.
> i.e. bp->b_sema. So, lets hold iclogbufs locked over IO, and only
> release the lock in xlog_iodone() when we are finished with the
> buffer. That way before we tear down the iclog, we can lock and
> unlock the buffer to ensure IO completion has finished completely
> before we tear it down.
> 

Thanks for the write up...

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 44 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 8497a00..08624dc 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1181,11 +1181,14 @@ xlog_iodone(xfs_buf_t *bp)
>  	/* log I/O is always issued ASYNC */
>  	ASSERT(XFS_BUF_ISASYNC(bp));
>  	xlog_state_done_syncing(iclog, aborted);
> +
>  	/*
> -	 * do not reference the buffer (bp) here as we could race
> -	 * with it being freed after writing the unmount record to the
> -	 * log.
> +	 * drop the buffer lock now that we are done. Nothing references
> +	 * the buffer after this, so an unmount waiting on this lock can now
> +	 * tear it down safely. As such, it is unsafe to reference the buffer
> +	 * (bp) after the unlock as we could race with it being freed.
>  	 */
> +	xfs_buf_unlock(bp);
>  }
>  
>  /*
> @@ -1368,8 +1371,16 @@ xlog_alloc_log(
>  	bp = xfs_buf_alloc(mp->m_logdev_targp, 0, BTOBB(log->l_iclog_size), 0);
>  	if (!bp)
>  		goto out_free_log;
> -	bp->b_iodone = xlog_iodone;
> +
> +	/*
> +	 * The iclogbuf buffer locks are held over IO but we are not going to do
> +	 * IO yet.  Hence unlock the buffer so that the log IO path can grab it
> +	 * when appropriately.
> +	 */
>  	ASSERT(xfs_buf_islocked(bp));
> +	xfs_buf_unlock(bp);
> +
> +	bp->b_iodone = xlog_iodone;
>  	log->l_xbuf = bp;
>  
>  	spin_lock_init(&log->l_icloglock);
> @@ -1398,6 +1409,9 @@ xlog_alloc_log(
>  		if (!bp)
>  			goto out_free_iclog;
>  
> +		ASSERT(xfs_buf_islocked(bp));
> +		xfs_buf_unlock(bp);
> +
>  		bp->b_iodone = xlog_iodone;
>  		iclog->ic_bp = bp;
>  		iclog->ic_data = bp->b_addr;
> @@ -1422,7 +1436,6 @@ xlog_alloc_log(
>  		iclog->ic_callback_tail = &(iclog->ic_callback);
>  		iclog->ic_datap = (char *)iclog->ic_data + log->l_iclog_hsize;
>  
> -		ASSERT(xfs_buf_islocked(iclog->ic_bp));
>  		init_waitqueue_head(&iclog->ic_force_wait);
>  		init_waitqueue_head(&iclog->ic_write_wait);
>  
> @@ -1631,6 +1644,12 @@ xlog_cksum(
>   * we transition the iclogs to IOERROR state *after* flushing all existing
>   * iclogs to disk. This is because we don't want anymore new transactions to be
>   * started or completed afterwards.
> + *
> + * We lock the iclogbufs here so that we can serialise against IO completion
> + * during unmount. We might be processing a shutdown triggered during unmount,
> + * and that can occur asynchronously to the unmount thread, and hence we need to
> + * ensure that completes before tearing down the iclogbufs. Hence we need to
> + * hold the buffer lock across the log IO to acheive that.
>   */
>  STATIC int
>  xlog_bdstrat(
> @@ -1638,6 +1657,7 @@ xlog_bdstrat(
>  {
>  	struct xlog_in_core	*iclog = bp->b_fspriv;
>  
> +	xfs_buf_lock(bp);
>  	if (iclog->ic_state & XLOG_STATE_IOERROR) {
>  		xfs_buf_ioerror(bp, EIO);
>  		xfs_buf_stale(bp);
> @@ -1645,7 +1665,8 @@ xlog_bdstrat(
>  		/*
>  		 * It would seem logical to return EIO here, but we rely on
>  		 * the log state machine to propagate I/O errors instead of
> -		 * doing it here.
> +		 * doing it here. Similarly, IO completion will unlock the
> +		 * buffer, so we don't do it here.
>  		 */
>  		return 0;
>  	}
> @@ -1847,14 +1868,28 @@ xlog_dealloc_log(
>  	xlog_cil_destroy(log);
>  
>  	/*
> -	 * always need to ensure that the extra buffer does not point to memory
> -	 * owned by another log buffer before we free it.
> +	 * Cycle all the iclogbuf locks to make sure all log IO completion
> +	 * is done before we tear down these buffers.
>  	 */
> +	iclog = log->l_iclog;
> +	for (i = 0; i < log->l_iclog_bufs; i++) {
> +		xfs_buf_lock(iclog->ic_bp);
> +		xfs_buf_unlock(iclog->ic_bp);
> +		iclog = iclog->ic_next;
> +	}
> +
> +	/*
> +	 * Always need to ensure that the extra buffer does not point to memory
> +	 * owned by another log buffer before we free it. Also, cycle the lock
> +	 * first to ensure we've completed IO on it.
> +	 */
> +	xfs_buf_lock(log->l_xbuf);
> +	xfs_buf_unlock(log->l_xbuf);
>  	xfs_buf_set_empty(log->l_xbuf, BTOBB(log->l_iclog_size));
>  	xfs_buf_free(log->l_xbuf);
>  
>  	iclog = log->l_iclog;
> -	for (i=0; i<log->l_iclog_bufs; i++) {
> +	for (i = 0; i < log->l_iclog_bufs; i++) {
>  		xfs_buf_free(iclog->ic_bp);
>  		next_iclog = iclog->ic_next;
>  		kmem_free(iclog);

On reading the code, my initial thought was that the source of this is
the xlog_state_do_callback() call down in the shutdown path, when
invoked from the log I/O completion handler. I think you pointed out in
your previous reply that even if we were to make that call selective
(e.g., based on whether the shutdown is due to a log error and thus we
can expect xlog_state_do_callback() to be invoked), we still access
relevant data structures after the ic_force_wait wait_queue is woken.
Therefore, there would still be a race even if we bypassed the call from
within the shutdown path in this particular case.

The logic seems sane to me. I don't notice any issues. But my only
question is why the use of locking, as opposed to wiring up use of
b_iowait or something into the log I/O handler? I ask because it looks
just a _bit_ funny to see the lock/unlock cycles used purely as a
serialization mechanism. Do we use this kind of pattern in other places?
I guess on the other hand you could argue it protects the I/O in
progress, and yet another wait_queue in this codepath might be overkill
(so I like the use of an existing mechanism from that standpoint).

Brian

> -- 
> 1.9.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2014-04-14 19:28 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-10  4:42 [PATCH] xfs: unmount does not wait for shutdown during unmount Dave Chinner
2014-04-10 13:29 ` Mark Tinguely
2014-04-10 21:52   ` Dave Chinner
2014-04-10 16:25 ` Mike Snitzer
2014-04-11 19:21 ` [PATCH] " Bob Mastors
2014-04-14  8:21 ` Dave Chinner
2014-04-14 19:28 ` Brian Foster [this message]
2014-04-15  2:15   ` Dave Chinner
2014-04-15 14:59     ` Brian Foster
2014-04-16 14:27       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140414192824.GC62307@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=bob.mastors@solidfire.com \
    --cc=david@fromorbit.com \
    --cc=snitzer@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.