public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues
Date: Wed, 26 May 2021 23:16:43 -0700	[thread overview]
Message-ID: <20210527061643.GD202121@locust> (raw)
In-Reply-To: <20210527045202.1155628-3-david@fromorbit.com>

On Thu, May 27, 2021 at 02:51:58PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> large directory block size operations are assert failing because
> xfs_bunmapi() is not completely removing fragmented directory blocks
> like so:
> 
> XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
> ....
> Call Trace:
>  xfs_dir2_shrink_inode+0x1a8/0x210
>  xfs_dir2_block_to_sf+0x2ae/0x410
>  xfs_dir2_block_removename+0x21a/0x280
>  xfs_dir_removename+0x195/0x1d0
>  xfs_rename+0xb79/0xc50
>  ? avc_has_perm+0x8d/0x1a0
>  ? avc_has_perm_noaudit+0x9a/0x120
>  xfs_vn_rename+0xdb/0x150
>  vfs_rename+0x719/0xb50
>  ? __lookup_hash+0x6a/0xa0
>  do_renameat2+0x413/0x5e0
>  __x64_sys_rename+0x45/0x50
>  do_syscall_64+0x3a/0x70
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> We are aborting the bunmapi() pass because of this specific chunk of
> code:
> 
>                 /*
>                  * Make sure we don't touch multiple AGF headers out of order
>                  * in a single transaction, as that could cause AB-BA deadlocks.
>                  */
>                 if (!wasdel && !isrt) {
>                         agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
>                         if (prev_agno != NULLAGNUMBER && prev_agno > agno)
>                                 break;
>                         prev_agno = agno;
>                 }
> 
> This is designed to prevent deadlocks in AGF locking when freeing
> multiple extents by ensuring that we only ever lock in increasing
> AG number order. Unfortunately, this also violates the "bunmapi will
> always succeed" semantic that some high level callers depend on,
> such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
> xfs_inactive_symlink_rmt().
> 
> This AG lock ordering was introduced back in 2017 to fix deadlocks
> triggered by generic/299 as reported here:
> 
> https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/
> 
> This codebase is old enough that it was before we were defering all
> AG based extent freeing from within xfs_bunmapi(). THat is, we never
> actually lock AGs in xfs_bunmapi() any more - every non-rt based
> extent free is added to the defer ops list, as is all BMBT block
> freeing. And RT extents are not RT based, so there's no lock
> ordering issues associated with them.
> 
> Hence this AGF lock ordering code is both broken and dead. Let's
> just remove it so that the large directory block code works reliably
> again.
> 
> Tested against xfs/538 and generic/299 which is the original test
> that exposed the deadlocks that this code fixed.
> 
> Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Seems reasonable nowadays; will throw it at the test cloud and see what
happens.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 3f8b6da09261..a3e0e6f672d6 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5349,7 +5349,6 @@ __xfs_bunmapi(
>  	xfs_fsblock_t		sum;
>  	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
>  	xfs_fileoff_t		max_len;
> -	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
>  	xfs_fileoff_t		end;
>  	struct xfs_iext_cursor	icur;
>  	bool			done = false;
> @@ -5441,16 +5440,6 @@ __xfs_bunmapi(
>  		del = got;
>  		wasdel = isnullstartblock(del.br_startblock);
>  
> -		/*
> -		 * Make sure we don't touch multiple AGF headers out of order
> -		 * in a single transaction, as that could cause AB-BA deadlocks.
> -		 */
> -		if (!wasdel && !isrt) {
> -			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
> -			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
> -				break;
> -			prev_agno = agno;
> -		}
>  		if (got.br_startoff < start) {
>  			del.br_startoff = start;
>  			del.br_blockcount -= start - got.br_startoff;
> -- 
> 2.31.1
> 

  reply	other threads:[~2021-05-27  6:16 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
2021-05-27  6:15   ` Darrick J. Wong
2021-05-27  4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
2021-05-27  6:16   ` Darrick J. Wong [this message]
2021-05-27  4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
2021-05-31 12:55   ` Chandan Babu R
2021-05-31 13:05     ` Chandan Babu R
2021-05-31 23:28       ` Dave Chinner
2021-06-01  6:42         ` Chandan Babu R
2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
2021-05-27  6:38   ` kernel test robot
2021-05-27  6:38   ` kernel test robot
2021-05-27  7:03   ` kernel test robot
2021-05-27  7:03   ` [RFC PATCH] xfs: xfs_allocfree_extent_res can be static kernel test robot
2021-06-02 21:37   ` [PATCH 4/6] xfs: add a free space extent change reservation Darrick J. Wong
2021-05-27  4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
2021-06-02 21:36   ` Darrick J. Wong
2021-05-27  4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
2021-05-27  6:19   ` Darrick J. Wong
2021-05-27  8:52     ` Dave Chinner
2021-05-28  0:01       ` Darrick J. Wong
2021-05-28  2:30         ` Dave Chinner
2021-05-28  5:30           ` Darrick J. Wong
2021-05-31 10:02 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Chandan Babu R
2021-05-31 22:41   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210527061643.GD202121@locust \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox