From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: log mount failures don't wait for buffers to be released
Date: Tue, 12 Jan 2016 09:15:41 -0500 [thread overview]
Message-ID: <20160112141540.GD12156@bfoster.bfoster> (raw)
In-Reply-To: <1452562536-32145-1-git-send-email-david@fromorbit.com>
On Tue, Jan 12, 2016 at 12:35:36PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Recently I've been seeing xfs/051 fail on 1k block size filesystems.
> Trying to trace the events during the test lead to the problem going
> away, indicating that it was a race condition that lead to this
> ASSERT failure:
>
> XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156
> .....
> [<ffffffff814e1257>] xfs_free_perag+0x87/0xb0
> [<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900
> [<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0
> [<ffffffff811d8800>] mount_bdev+0x180/0x1b0
> [<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20
> [<ffffffff811d90a8>] mount_fs+0x38/0x170
> [<ffffffff811f4347>] vfs_kern_mount+0x67/0x120
> [<ffffffff811f7018>] do_mount+0x218/0xd60
> [<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0
>
> When I finally caught it with tracing enabled, I saw that AG 2 had
> an elevated reference count and a buffer was responsible for it. I
> tracked down the specific buffer, and found that it was missing the
> final reference count release that would put it back on the LRU and
> hence be found by xfs_wait_buftarg() calls in the log mount failure
> handling.
>
> The last four traces for the buffer before the assert were (trimmed
> for relevance)
>
> kworker/0:1-5259 xfs_buf_iodone: hold 2 lock 0 flags ASYNC
> kworker/0:1-5259 xfs_buf_ioerror: hold 2 lock 0 error -5
> mount-7163 xfs_buf_lock_done: hold 2 lock 0 flags ASYNC
> mount-7163 xfs_buf_unlock: hold 2 lock 1 flags ASYNC
>
> This is an async write that is completing, so there's nobody waiting
> for it directly. Hence we call xfs_buf_relse() once all the
> processing is complete. That does:
>
> static inline void xfs_buf_relse(xfs_buf_t *bp)
> {
> xfs_buf_unlock(bp);
> xfs_buf_rele(bp);
> }
>
> Now, it's clear that mount is waiting on the buffer lock, and that
> it has been released by xfs_buf_relse() and gained by mount. This is
> expected, because at this point the mount process is in
> xfs_buf_delwri_submit() waiting for all the IO it submitted to
> complete.
>
> The mount process, however, is waiting on the lock for the buffer
> because it is in xfs_buf_delwri_submit(). This waits for IO
> completion, but it doesn't wait for the buffer reference owned by
> the IO to go away. The mount process collects all the completions,
> fails the log recovery, and the higher level code then calls
> xfs_wait_buftarg() to free all the remaining buffers in the
> filesystem.
>
> The issue is that on unlocking the buffer, the scheduler has decided
> that the mount process has higher priority than the the kworker
> thread that is running the IO completion, and so immediately
> switched contexts to the mount process from the semaphore unlock
> code, hence preventing the kworker thread from finishing the IO
> completion and releasing the IO reference to the buffer.
>
> Hence by the time that xfs_wait_buftarg() is run, the buffer still
> has an active reference and so isn't on the LRU list that the
> function walks to free the remaining buffers. Hence we miss that
> buffer and continue onwards to tear down the mount structures,
> at which time we get find a stray reference count on the perag
> structure. On a non-debug kernel, this will be ignored and the
> structure torn down and freed. Hence when the kworker thread is then
> rescheduled and the buffer released and freed, it will access a
> freed perag structure.
>
> The problem here is that when the log mount fails, we still need to
> quiesce the log to ensure that the IO workqueues have returned to
> idle before we run xfs_wait_buftarg(). By synchronising the
> workqueues, we ensure that all IO completions are fully processed,
> not just to the point where buffers have been unlocked. This ensures
> we don't end up in the situation above.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_buf.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 04d2bdd..08ad701 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1529,6 +1529,16 @@ xfs_wait_buftarg(
> LIST_HEAD(dispose);
> int loop = 0;
>
> + /*
> + * We need to flush the buffer workqueue to ensure that all IO
> + * completion processing is 100% done. Just waiting on buffer locks is
> + * not sufficient for async IO as the reference count held over IO is
> + * not released until after the buffer lock is dropped. Hence we need to
> + * ensure here that all reference counts have been dropped before we
> + * start walking the LRU list.
> + */
> + drain_workqueue(btp->bt_mount->m_buf_workqueue);
> +
> /* loop until there is nothing left on the lru list. */
> while (list_lru_count(&btp->bt_lru)) {
> list_lru_walk(&btp->bt_lru, xfs_buftarg_wait_rele,
> --
> 2.5.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
prev parent reply other threads:[~2016-01-12 14:15 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-12 1:35 [PATCH] xfs: log mount failures don't wait for buffers to be released Dave Chinner
2016-01-12 14:15 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160112141540.GD12156@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.