public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Long Li <leo.lilong@huawei.com>
Cc: david@fromorbit.com, linux-xfs@vger.kernel.org,
	yi.zhang@huawei.com, houtao1@huawei.com, yangerkun@huawei.com
Subject: Re: [PATCH v2 2/3] xfs: abort intent items when recovery intents fail
Date: Tue, 18 Jul 2023 19:07:56 -0700	[thread overview]
Message-ID: <20230719020756.GD11352@frogsfrogsfrogs> (raw)
In-Reply-To: <20230715063647.2094989-3-leo.lilong@huawei.com>

On Sat, Jul 15, 2023 at 02:36:46PM +0800, Long Li wrote:
> When recovering intents, we capture newly created intent items as part of
> committing recovered intent items.  If intent recovery fails at a later
> point, we forget to remove those newly created intent items from the AIL
> and hang:
> 
>     [root@localhost ~]# cat /proc/539/stack
>     [<0>] xfs_ail_push_all_sync+0x174/0x230
>     [<0>] xfs_unmount_flush_inodes+0x8d/0xd0
>     [<0>] xfs_mountfs+0x15f7/0x1e70
>     [<0>] xfs_fs_fill_super+0x10ec/0x1b20
>     [<0>] get_tree_bdev+0x3c8/0x730
>     [<0>] vfs_get_tree+0x89/0x2c0
>     [<0>] path_mount+0xecf/0x1800
>     [<0>] do_mount+0xf3/0x110
>     [<0>] __x64_sys_mount+0x154/0x1f0
>     [<0>] do_syscall_64+0x39/0x80
>     [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> When newly created intent items fail to commit via transaction, intent
> recovery hasn't created done items for these newly created intent items,
> so the capture structure is the sole owner of the captured intent items.
> We must release them explicitly or else they leak:
> 
> unreferenced object 0xffff888016719108 (size 432):
>   comm "mount", pid 529, jiffies 4294706839 (age 144.463s)
>   hex dump (first 32 bytes):
>     08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff  ..q.......q.....
>     18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff  ..q.......q.....
>   backtrace:
>     [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0
>     [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150
>     [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340
>     [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0
>     [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980
>     [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70
>     [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0
>     [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0
>     [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70
>     [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20
>     [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0
>     [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0
>     [<ffffffff81a9f35f>] path_mount+0xecf/0x1800
>     [<ffffffff81a9fd83>] do_mount+0xf3/0x110
>     [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0
>     [<ffffffff83968739>] do_syscall_64+0x39/0x80
> 
> Fix the problem above by abort intent items that don't have a done item
> when recovery intents fail.
> 
> Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery")
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c | 5 +++--
>  fs/xfs/libxfs/xfs_defer.h | 2 +-
>  fs/xfs/xfs_log_recover.c  | 2 +-
>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 88388e12f8e7..f71679ce23b9 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -763,12 +763,13 @@ xfs_defer_ops_capture(
>  
>  /* Release all resources that we used to capture deferred ops. */
>  void
> -xfs_defer_ops_capture_free(
> +xfs_defer_ops_capture_abort(
>  	struct xfs_mount		*mp,
>  	struct xfs_defer_capture	*dfc)
>  {
>  	unsigned short			i;
>  
> +	xfs_defer_pending_abort(mp, &dfc->dfc_dfops);

Aha, so ... intent recovery can itself create new intent items.  These
new items are tracked via t_dfops in the transaction, but they've been
moved into the xfs_defer_capture structure by xfs_defer_ops_capture...

>  	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
>  
>  	for (i = 0; i < dfc->dfc_held.dr_bufs; i++)
> @@ -809,7 +810,7 @@ xfs_defer_ops_capture_and_commit(
>  	/* Commit the transaction and add the capture structure to the list. */
>  	error = xfs_trans_commit(tp);
>  	if (error) {
> -		xfs_defer_ops_capture_free(mp, dfc);
> +		xfs_defer_ops_capture_abort(mp, dfc);

...but then the transaction commit fails.  The capture structure still
has the new intents attached, but nobody actually aborts them.  They
leak.

That's why _capture_free needs to call xfs_defer_pending_abort so that
we call ->abort_intent to free the new intents instead of leaking them.

Ok.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


>  		return error;
>  	}
>  
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 114a3a4930a3..8788ad5f6a73 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -121,7 +121,7 @@ int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
>  		struct list_head *capture_list);
>  void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
>  		struct xfs_defer_resources *dres);
> -void xfs_defer_ops_capture_free(struct xfs_mount *mp,
> +void xfs_defer_ops_capture_abort(struct xfs_mount *mp,
>  		struct xfs_defer_capture *d);
>  void xfs_defer_resources_rele(struct xfs_defer_resources *dres);
>  
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 82c81d20459d..fdaa0ffe029b 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2511,7 +2511,7 @@ xlog_abort_defer_ops(
>  
>  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
>  		list_del_init(&dfc->dfc_list);
> -		xfs_defer_ops_capture_free(mp, dfc);
> +		xfs_defer_ops_capture_abort(mp, dfc);
>  	}
>  }
>  
> -- 
> 2.31.1
> 

  reply	other threads:[~2023-07-19  2:08 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-15  6:36 [PATCH v2 0/3] xfs: fix two problem when recovery intents fails Long Li
2023-07-15  6:36 ` [PATCH v2 1/3] xfs: factor out xfs_defer_pending_abort Long Li
2023-07-19  2:08   ` Darrick J. Wong
2023-07-15  6:36 ` [PATCH v2 2/3] xfs: abort intent items when recovery intents fail Long Li
2023-07-19  2:07   ` Darrick J. Wong [this message]
2023-07-15  6:36 ` [PATCH v2 3/3] xfs: make sure done item committed before cancel intents Long Li
2023-07-19  2:19   ` Darrick J. Wong
2023-07-22  2:16     ` Long Li
2023-07-19  6:41   ` Dave Chinner
2023-07-22  1:19     ` Long Li
2023-07-24  1:05       ` Dave Chinner
2023-07-25 12:25         ` Long Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230719020756.GD11352@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=houtao1@huawei.com \
    --cc=leo.lilong@huawei.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox