All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL
Date: Tue, 11 May 2021 17:19:21 +0530	[thread overview]
Message-ID: <87cztxwkvy.fsf@garuda> (raw)
In-Reply-To: <20210506032751.GN63242@dread.disaster.area>

On 06 May 2021 at 08:57, Dave Chinner wrote:
> On Wed, May 05, 2021 at 06:12:41PM +0530, Chandan Babu R wrote:
>> > Hence when doing allocation for the free list, we need to fail the
>> > allocation rather than block on the only remaining free extent in
>> > the AG. If we are freeing extents, the AGFL not being full is not an
>> > issue at all. And if we are allocating extents, the transaction
>> > reservations should have ensured that the AG had sufficient space in
>> > it to complete the entire operation without deadlocking waiting for
>> > space.
>> >
>> > Either way, I don't see a problem with making sure the AGFL
>> > allocations just skip busy extents and fail if the only free extents
>> > are ones this transaction has freed itself.
>> >
>>
>> Hmm. In the scenario where *all* free extents in the AG were originally freed
>> by the current transaction (and hence busy in the transaction),
>
> How does that happen?

I tried in vain to arrive at the above mentioned scenario by consuming away as
many blocks as possible from the filesystem. At best, I could arrive at an AG
with just one free extent record in the cntbt (NOTE: I had to disable global
reservation by invoking "xfs_io -x -c 'resblks 0' $mntpnt"):

recs[1] = [startblock,blockcount]
1:[32767,1]

For each AG available in an FS instance, we take away 8
(i.e. XFS_ALLOC_AGFL_RESERVE + 4) blocks from the global free data blocks
counter. This reservation is applied to the FS as a whole rather than each AG
individually. Hence we could get to a scenario where an AG could have less
than 8 free blocks. I could not find any other restriction in the code that
explicitly prevents an AG from having zero free extents.

However, I could not create such an AG because any fs operation that needs
extent allocation to be done would try to reserve more than 1 extent causing
the above cited AG to not be chosen.

>
>> we would need
>> to be able to recognize this situation and skip invoking
>> xfs_extent_busy_flush() altogether.
>
> If we are freeing extents (i.e XFS_ALLOC_FLAG_FREEING is set) and
> we are doing allocation for AGFL and we only found busy extents,
> then it's OK to fail the allocation.

When freeing an extent, the following patch skips allocation of blocks to AGFL
if all the free extents found are busy,

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index aaa19101bb2a..5310e311d5c6 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1694,6 +1694,7 @@ xfs_alloc_ag_vextent_size(
 	 * are no smaller extents available.
 	 */
 	if (!i) {
+alloc_small_extent:
 		error = xfs_alloc_ag_vextent_small(args, cnt_cur,
 						   &fbno, &flen, &i);
 		if (error)
@@ -1710,6 +1711,9 @@ xfs_alloc_ag_vextent_size(
 		/*
 		 * Search for a non-busy extent that is large enough.
 		 */
+		xfs_agblock_t	orig_fbno = NULLAGBLOCK;
+		xfs_extlen_t	orig_flen;
+
 		for (;;) {
 			error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
 			if (error)
@@ -1719,6 +1723,11 @@ xfs_alloc_ag_vextent_size(
 				goto error0;
 			}

+			if (orig_fbno == NULLAGBLOCK) {
+				orig_fbno = fbno;
+				orig_flen = flen;
+			}
+
 			busy = xfs_alloc_compute_aligned(args, fbno, flen,
 					&rbno, &rlen, &busy_gen);

@@ -1729,6 +1738,13 @@ xfs_alloc_ag_vextent_size(
 			if (error)
 				goto error0;
 			if (i == 0) {
+				if (args->freeing_extent) {
+					error = xfs_alloc_lookup_eq(cnt_cur,
+							orig_fbno, orig_flen, &i);
+					ASSERT(!error && i);
+					goto alloc_small_extent;
+				}
+
 				/*
 				 * Our only valid extents must have been busy.
 				 * Make it unbusy by forcing the log out and
@@ -1819,7 +1835,7 @@ xfs_alloc_ag_vextent_size(
 	 */
 	args->len = rlen;
 	if (rlen < args->minlen) {
-		if (busy) {
+		if (busy && !args->freeing_extent) {
 			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 			trace_xfs_alloc_size_busy(args);
 			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
@@ -2641,6 +2657,7 @@ xfs_alloc_fix_freelist(
 	targs.alignment = targs.minlen = targs.prod = 1;
 	targs.type = XFS_ALLOCTYPE_THIS_AG;
 	targs.pag = pag;
+	targs.freeing_extent = flags & XFS_ALLOC_FLAG_FREEING;
 	error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
 	if (error)
 		goto out_agbp_relse;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index a4427c5775c2..1e0fc28ef87a 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -78,6 +78,7 @@ typedef struct xfs_alloc_arg {
 #ifdef DEBUG
 	bool		alloc_minlen_only; /* allocate exact minlen extent */
 #endif
+	bool		freeing_extent;
 } xfs_alloc_arg_t;

 /*

With the above patch, xfs/538 cause the following call trace to be printed,

   XFS (vdc2): Internal error i != 1 at line 3426 of file fs/xfs/libxfs/xfs_btree.c.  Caller xfs_btree_insert+0x15c/0x1f0
   CPU: 2 PID: 1284 Comm: punch-alternati Not tainted 5.12.0-rc8-next-20210419-chandan #19
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
   Call Trace:
    dump_stack+0x64/0x7c
    xfs_corruption_error+0x85/0x90
    ? xfs_btree_insert+0x15c/0x1f0
    xfs_btree_insert+0x18d/0x1f0
    ? xfs_btree_insert+0x15c/0x1f0
    ? xfs_allocbt_init_common+0x30/0xf0
    xfs_free_ag_extent+0x463/0x9d0
    __xfs_free_extent+0xe5/0x200
    xfs_trans_free_extent+0x3e/0x100
    xfs_extent_free_finish_item+0x24/0x40
    xfs_defer_finish_noroll+0x1f7/0x5c0
    __xfs_trans_commit+0x12f/0x300
    xfs_free_file_space+0x1af/0x2c0
    xfs_file_fallocate+0x1ca/0x430
    ? __cond_resched+0x16/0x40
    ? inode_security+0x22/0x60
    ? selinux_file_permission+0xe2/0x120
    vfs_fallocate+0x146/0x2e0
    __x64_sys_fallocate+0x3e/0x70
    do_syscall_64+0x40/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

The above call trace occurs during execution of the step #2 listed below,
1. Filling up 90% of the free space of the filesystem.
2. Punch alternate blocks of files.

Just before the failure, the filesystem had ~9000 busy extents. So I think we
have to flush busy extents even when refilling AGFL for the purpose of freeing
an extent.

>
> We have options here - once we get to the end of the btree and
> haven't found a candidate that isn't busy, we could fail
> immediately. Or maybe we try an optimisitic flush which forces the
> log and waits for as short while (instead of forever) for the
> generation to change and then fail if we get a timeout response. Or
> maybe there's a more elegant way of doing this that hasn't yet
> rattled out of my poor, overloaded brain right now.
>
> Just because we currently do a blocking flush doesn't mean we always
> must do a blocking flush....

I will try to work out a solution.

--
chandan

  reply	other threads:[~2021-05-11 11:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-28  6:51 [PATCH 1/2] xfs: Introduce XFS_EXTENT_BUSY_IN_TRANS busy extent flag Chandan Babu R
2021-04-28  6:51 ` [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL Chandan Babu R
2021-04-29  1:12   ` Dave Chinner
2021-04-30 13:40     ` Chandan Babu R
2021-04-30 22:44       ` Dave Chinner
2021-05-03  9:52         ` Chandan Babu R
2021-05-04  0:03           ` Dave Chinner
2021-05-05 12:42             ` Chandan Babu R
2021-05-06  3:27               ` Dave Chinner
2021-05-11 11:49                 ` Chandan Babu R [this message]
2021-06-17  4:48                   ` Chandan Babu R
  -- strict thread matches above, loose matches on Subject: below --
2023-01-10 12:24 Xiao Guangrong
2023-01-10 12:48 ` Chandan Babu R
2023-01-11  3:14   ` Xiao Guangrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cztxwkvy.fsf@garuda \
    --to=chandanrlinux@gmail.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.