* [PATCH] xfs: clear ail delwri queued bufs on unmount of shutdown fs
@ 2018-10-12 17:46 Brian Foster
2018-10-13 2:29 ` Dave Chinner
0 siblings, 1 reply; 4+ messages in thread
From: Brian Foster @ 2018-10-12 17:46 UTC (permalink / raw)
To: linux-xfs
In the typical unmount case, the AIL is forced out by the unmount
sequence before the xfsaild task is stopped. Since AIL items are
removed on writeback completion, this means that the AIL
->ail_buf_list delwri queue has been drained. This is not always
true in the shutdown case, however.
It's possible for buffers to sit on a delwri queue for a period of
time across submission attempts if said items are locked or have
been relogged and pinned since first added to the queue. If the
attempt to log such an item results in a log I/O error, the error
processing can shutdown the fs, remove the item from the AIL, stale
the buffer (dropping the LRU reference) and clear its delwri queue
state. The latter bit means the buffer will be released from a
delwri queue on the next submission attempt, but this might never
occur if the filesystem has shutdown and the AIL is empty.
This means that such buffers are held indefinitely by the AIL delwri
queue across destruction of the AIL. Aside from being a memory leak,
these buffers can also hold references to in-core perag structures.
The latter problem manifests as a generic/475 failure, reproducing
the following asserts at unmount time:
XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
file: fs/xfs/xfs_mount.c, line: 151
XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
file: fs/xfs/xfs_mount.c, line: 132
To prevent this problem, clear the AIL delwri queue as a final step
before xfsaild() exit. The !empty state should never occur in the
normal case, so add an assert to catch unexpected problems going
forward.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
Hi all,
This is a minor issue I ran into via generic/475. That test still
occasionally fails for me with this patch in place, but the failure
appears to be unrelated.
Brian
fs/xfs/xfs_trans_ail.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 55326f971cb3..d3a4e89bf4a0 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -531,17 +531,33 @@ xfsaild(
set_current_state(TASK_INTERRUPTIBLE);
/*
- * Check kthread_should_stop() after we set the task state
- * to guarantee that we either see the stop bit and exit or
- * the task state is reset to runnable such that it's not
- * scheduled out indefinitely and detects the stop bit at
- * next iteration.
- *
+ * Check kthread_should_stop() after we set the task state to
+ * guarantee that we either see the stop bit and exit or the
+ * task state is reset to runnable such that it's not scheduled
+ * out indefinitely and detects the stop bit at next iteration.
* A memory barrier is included in above task state set to
* serialize again kthread_stop().
*/
if (kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
+
+ /*
+ * The caller forces out the AIL before stopping the
+ * thread in the common case, which means the delwri
+ * queue is drained. In the shutdown case, the queue may
+ * still hold relogged buffers that haven't been
+ * submitted because they were pinned since added to the
+ * queue.
+ *
+ * Log I/O error processing stales the underlying buffer
+ * and clears the delwri state, expecting the buf to be
+ * removed on the next submission attempt. That won't
+ * happen if we're shutting down, so this is the last
+ * opportunity to release such buffers from the queue.
+ */
+ ASSERT(list_empty(&ailp->ail_buf_list) ||
+ XFS_FORCED_SHUTDOWN(ailp->ail_mount));
+ xfs_buf_delwri_cancel(&ailp->ail_buf_list);
break;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: clear ail delwri queued bufs on unmount of shutdown fs
2018-10-12 17:46 [PATCH] xfs: clear ail delwri queued bufs on unmount of shutdown fs Brian Foster
@ 2018-10-13 2:29 ` Dave Chinner
2018-10-15 1:57 ` Dave Chinner
0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2018-10-13 2:29 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-xfs
On Fri, Oct 12, 2018 at 01:46:54PM -0400, Brian Foster wrote:
> In the typical unmount case, the AIL is forced out by the unmount
> sequence before the xfsaild task is stopped. Since AIL items are
> removed on writeback completion, this means that the AIL
> ->ail_buf_list delwri queue has been drained. This is not always
> true in the shutdown case, however.
>
> It's possible for buffers to sit on a delwri queue for a period of
> time across submission attempts if said items are locked or have
> been relogged and pinned since first added to the queue.
Can you add this as a comment to xfs_buf_delwri_submit_nowait() to
document that callers either need to check that everything was
submitted and/or cancel the delwri list before they tear it down?
> If the
> attempt to log such an item results in a log I/O error, the error
> processing can shutdown the fs, remove the item from the AIL, stale
> the buffer (dropping the LRU reference) and clear its delwri queue
> state. The latter bit means the buffer will be released from a
> delwri queue on the next submission attempt, but this might never
> occur if the filesystem has shutdown and the AIL is empty.
>
> This means that such buffers are held indefinitely by the AIL delwri
> queue across destruction of the AIL. Aside from being a memory leak,
> these buffers can also hold references to in-core perag structures.
> The latter problem manifests as a generic/475 failure, reproducing
> the following asserts at unmount time:
>
> XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
> file: fs/xfs/xfs_mount.c, line: 151
> XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
> file: fs/xfs/xfs_mount.c, line: 132
Yup, I saw that once a couple of weeks ago, but was not able to
reproduce it to track it down.
> To prevent this problem, clear the AIL delwri queue as a final step
> before xfsaild() exit. The !empty state should never occur in the
> normal case, so add an assert to catch unexpected problems going
> forward.
*nod*.
Apart from needing to document how to use
xfs_buf_delwri_submit_nowait(), this looks fine.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: clear ail delwri queued bufs on unmount of shutdown fs
2018-10-13 2:29 ` Dave Chinner
@ 2018-10-15 1:57 ` Dave Chinner
2018-10-15 12:23 ` Brian Foster
0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2018-10-15 1:57 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-xfs
On Sat, Oct 13, 2018 at 01:29:14PM +1100, Dave Chinner wrote:
> On Fri, Oct 12, 2018 at 01:46:54PM -0400, Brian Foster wrote:
> > In the typical unmount case, the AIL is forced out by the unmount
> > sequence before the xfsaild task is stopped. Since AIL items are
> > removed on writeback completion, this means that the AIL
> > ->ail_buf_list delwri queue has been drained. This is not always
> > true in the shutdown case, however.
> >
> > It's possible for buffers to sit on a delwri queue for a period of
> > time across submission attempts if said items are locked or have
> > been relogged and pinned since first added to the queue.
>
> Can you add this as a comment to xfs_buf_delwri_submit_nowait() to
> document that callers either need to check that everything was
> submitted and/or cancel the delwri list before they tear it down?
I added this comment to the patch when I pulled it in:
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2055,6 +2055,13 @@ xfs_buf_delwri_submit_buffers(
* is only safely useable for callers that can track I/O completion by higher
* level means, e.g. AIL pushing as the @buffer_list is consumed in this
* function.
+ *
+ * Note: this function will skip buffers it would block on, and in doing so
+ * leaves them on @buffer_list so they can be retried on a later pass. As such,
+ * it is up to the caller to ensure that the buffer list is fully submitted or
+ * cancelled appropriately when they are finished with the list. Failure to
+ * cancel or resubmit the list until it is empty will result in leaked buffers
+ * at unmount time.
*/
int
xfs_buf_delwri_submit_nowait(
Does that look reasonable?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: clear ail delwri queued bufs on unmount of shutdown fs
2018-10-15 1:57 ` Dave Chinner
@ 2018-10-15 12:23 ` Brian Foster
0 siblings, 0 replies; 4+ messages in thread
From: Brian Foster @ 2018-10-15 12:23 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Mon, Oct 15, 2018 at 12:57:27PM +1100, Dave Chinner wrote:
> On Sat, Oct 13, 2018 at 01:29:14PM +1100, Dave Chinner wrote:
> > On Fri, Oct 12, 2018 at 01:46:54PM -0400, Brian Foster wrote:
> > > In the typical unmount case, the AIL is forced out by the unmount
> > > sequence before the xfsaild task is stopped. Since AIL items are
> > > removed on writeback completion, this means that the AIL
> > > ->ail_buf_list delwri queue has been drained. This is not always
> > > true in the shutdown case, however.
> > >
> > > It's possible for buffers to sit on a delwri queue for a period of
> > > time across submission attempts if said items are locked or have
> > > been relogged and pinned since first added to the queue.
> >
> > Can you add this as a comment to xfs_buf_delwri_submit_nowait() to
> > document that callers either need to check that everything was
> > submitted and/or cancel the delwri list before they tear it down?
>
> I added this comment to the patch when I pulled it in:
>
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -2055,6 +2055,13 @@ xfs_buf_delwri_submit_buffers(
> * is only safely useable for callers that can track I/O completion by higher
> * level means, e.g. AIL pushing as the @buffer_list is consumed in this
> * function.
> + *
> + * Note: this function will skip buffers it would block on, and in doing so
> + * leaves them on @buffer_list so they can be retried on a later pass. As such,
> + * it is up to the caller to ensure that the buffer list is fully submitted or
> + * cancelled appropriately when they are finished with the list. Failure to
> + * cancel or resubmit the list until it is empty will result in leaked buffers
> + * at unmount time.
> */
> int
> xfs_buf_delwri_submit_nowait(
>
> Does that look reasonable?
>
Looks fine to me, thanks!
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-10-15 20:08 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-10-12 17:46 [PATCH] xfs: clear ail delwri queued bufs on unmount of shutdown fs Brian Foster
2018-10-13 2:29 ` Dave Chinner
2018-10-15 1:57 ` Dave Chinner
2018-10-15 12:23 ` Brian Foster
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).