From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL
Date: Tue, 11 May 2021 17:19:21 +0530 [thread overview]
Message-ID: <87cztxwkvy.fsf@garuda> (raw)
In-Reply-To: <20210506032751.GN63242@dread.disaster.area>
On 06 May 2021 at 08:57, Dave Chinner wrote:
> On Wed, May 05, 2021 at 06:12:41PM +0530, Chandan Babu R wrote:
>> > Hence when doing allocation for the free list, we need to fail the
>> > allocation rather than block on the only remaining free extent in
>> > the AG. If we are freeing extents, the AGFL not being full is not an
>> > issue at all. And if we are allocating extents, the transaction
>> > reservations should have ensured that the AG had sufficient space in
>> > it to complete the entire operation without deadlocking waiting for
>> > space.
>> >
>> > Either way, I don't see a problem with making sure the AGFL
>> > allocations just skip busy extents and fail if the only free extents
>> > are ones this transaction has freed itself.
>> >
>>
>> Hmm. In the scenario where *all* free extents in the AG were originally freed
>> by the current transaction (and hence busy in the transaction),
>
> How does that happen?
I tried in vain to arrive at the above mentioned scenario by consuming away as
many blocks as possible from the filesystem. At best, I could arrive at an AG
with just one free extent record in the cntbt (NOTE: I had to disable global
reservation by invoking "xfs_io -x -c 'resblks 0' $mntpnt"):
recs[1] = [startblock,blockcount]
1:[32767,1]
For each AG available in an FS instance, we take away 8
(i.e. XFS_ALLOC_AGFL_RESERVE + 4) blocks from the global free data blocks
counter. This reservation is applied to the FS as a whole rather than each AG
individually. Hence we could get to a scenario where an AG could have less
than 8 free blocks. I could not find any other restriction in the code that
explicitly prevents an AG from having zero free extents.
However, I could not create such an AG because any fs operation that needs
extent allocation to be done would try to reserve more than 1 extent causing
the above cited AG to not be chosen.
>
>> we would need
>> to be able to recognize this situation and skip invoking
>> xfs_extent_busy_flush() altogether.
>
> If we are freeing extents (i.e XFS_ALLOC_FLAG_FREEING is set) and
> we are doing allocation for AGFL and we only found busy extents,
> then it's OK to fail the allocation.
When freeing an extent, the following patch skips allocation of blocks to AGFL
if all the free extents found are busy,
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index aaa19101bb2a..5310e311d5c6 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1694,6 +1694,7 @@ xfs_alloc_ag_vextent_size(
* are no smaller extents available.
*/
if (!i) {
+alloc_small_extent:
error = xfs_alloc_ag_vextent_small(args, cnt_cur,
&fbno, &flen, &i);
if (error)
@@ -1710,6 +1711,9 @@ xfs_alloc_ag_vextent_size(
/*
* Search for a non-busy extent that is large enough.
*/
+ xfs_agblock_t orig_fbno = NULLAGBLOCK;
+ xfs_extlen_t orig_flen;
+
for (;;) {
error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
if (error)
@@ -1719,6 +1723,11 @@ xfs_alloc_ag_vextent_size(
goto error0;
}
+ if (orig_fbno == NULLAGBLOCK) {
+ orig_fbno = fbno;
+ orig_flen = flen;
+ }
+
busy = xfs_alloc_compute_aligned(args, fbno, flen,
&rbno, &rlen, &busy_gen);
@@ -1729,6 +1738,13 @@ xfs_alloc_ag_vextent_size(
if (error)
goto error0;
if (i == 0) {
+ if (args->freeing_extent) {
+ error = xfs_alloc_lookup_eq(cnt_cur,
+ orig_fbno, orig_flen, &i);
+ ASSERT(!error && i);
+ goto alloc_small_extent;
+ }
+
/*
* Our only valid extents must have been busy.
* Make it unbusy by forcing the log out and
@@ -1819,7 +1835,7 @@ xfs_alloc_ag_vextent_size(
*/
args->len = rlen;
if (rlen < args->minlen) {
- if (busy) {
+ if (busy && !args->freeing_extent) {
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
trace_xfs_alloc_size_busy(args);
xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
@@ -2641,6 +2657,7 @@ xfs_alloc_fix_freelist(
targs.alignment = targs.minlen = targs.prod = 1;
targs.type = XFS_ALLOCTYPE_THIS_AG;
targs.pag = pag;
+ targs.freeing_extent = flags & XFS_ALLOC_FLAG_FREEING;
error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
if (error)
goto out_agbp_relse;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index a4427c5775c2..1e0fc28ef87a 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -78,6 +78,7 @@ typedef struct xfs_alloc_arg {
#ifdef DEBUG
bool alloc_minlen_only; /* allocate exact minlen extent */
#endif
+ bool freeing_extent;
} xfs_alloc_arg_t;
/*
With the above patch, xfs/538 cause the following call trace to be printed,
XFS (vdc2): Internal error i != 1 at line 3426 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x15c/0x1f0
CPU: 2 PID: 1284 Comm: punch-alternati Not tainted 5.12.0-rc8-next-20210419-chandan #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0x64/0x7c
xfs_corruption_error+0x85/0x90
? xfs_btree_insert+0x15c/0x1f0
xfs_btree_insert+0x18d/0x1f0
? xfs_btree_insert+0x15c/0x1f0
? xfs_allocbt_init_common+0x30/0xf0
xfs_free_ag_extent+0x463/0x9d0
__xfs_free_extent+0xe5/0x200
xfs_trans_free_extent+0x3e/0x100
xfs_extent_free_finish_item+0x24/0x40
xfs_defer_finish_noroll+0x1f7/0x5c0
__xfs_trans_commit+0x12f/0x300
xfs_free_file_space+0x1af/0x2c0
xfs_file_fallocate+0x1ca/0x430
? __cond_resched+0x16/0x40
? inode_security+0x22/0x60
? selinux_file_permission+0xe2/0x120
vfs_fallocate+0x146/0x2e0
__x64_sys_fallocate+0x3e/0x70
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The above call trace occurs during execution of the step #2 listed below,
1. Filling up 90% of the free space of the filesystem.
2. Punch alternate blocks of files.
Just before the failure, the filesystem had ~9000 busy extents. So I think we
have to flush busy extents even when refilling AGFL for the purpose of freeing
an extent.
>
> We have options here - once we get to the end of the btree and
> haven't found a candidate that isn't busy, we could fail
> immediately. Or maybe we try an optimisitic flush which forces the
> log and waits for as short while (instead of forever) for the
> generation to change and then fail if we get a timeout response. Or
> maybe there's a more elegant way of doing this that hasn't yet
> rattled out of my poor, overloaded brain right now.
>
> Just because we currently do a blocking flush doesn't mean we always
> must do a blocking flush....
I will try to work out a solution.
--
chandan
next prev parent reply other threads:[~2021-05-11 11:49 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-28 6:51 [PATCH 1/2] xfs: Introduce XFS_EXTENT_BUSY_IN_TRANS busy extent flag Chandan Babu R
2021-04-28 6:51 ` [PATCH 2/2] xfs: Prevent deadlock when allocating blocks for AGFL Chandan Babu R
2021-04-29 1:12 ` Dave Chinner
2021-04-30 13:40 ` Chandan Babu R
2021-04-30 22:44 ` Dave Chinner
2021-05-03 9:52 ` Chandan Babu R
2021-05-04 0:03 ` Dave Chinner
2021-05-05 12:42 ` Chandan Babu R
2021-05-06 3:27 ` Dave Chinner
2021-05-11 11:49 ` Chandan Babu R [this message]
2021-06-17 4:48 ` Chandan Babu R
-- strict thread matches above, loose matches on Subject: below --
2023-01-10 12:24 Xiao Guangrong
2023-01-10 12:48 ` Chandan Babu R
2023-01-11 3:14 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87cztxwkvy.fsf@garuda \
--to=chandanrlinux@gmail.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox