From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, "Darrick J . Wong" <djwong@kernel.org>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
stable@vger.kernel.org
Subject: Re: [PATCH 1/1] xfs: Add cond_resched to xfs_defer_finish_noroll
Date: Mon, 22 Apr 2024 20:40:03 +0530 [thread overview]
Message-ID: <87ttjto2lw.fsf@gmail.com> (raw)
In-Reply-To: <ZiWjbWrD60W/0s/F@dread.disaster.area>
Dave Chinner <david@fromorbit.com> writes:
> On Sun, Apr 21, 2024 at 01:19:44PM +0530, Ritesh Harjani (IBM) wrote:
>> An async dio write to a sparse file can generate a lot of extents
>> and when we unlink this file (using rm), the kernel can be busy in umapping
>> and freeing those extents as part of transaction processing.
>> Add cond_resched() in xfs_defer_finish_noroll() to avoid soft lockups
>> messages. Here is a call trace of such soft lockup.
>>
>> watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [rm:81335]
>> CPU: 1 PID: 81335 Comm: rm Kdump: loaded Tainted: G L X 5.14.21-150500.53-default
>
> Can you reproduce this on a current TOT kernel? 5.14 is pretty old,
> and this stack trace:
>
Yes, I was able to reproduce this on upstream kernel too -
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
Call Trace:
xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
xfs_alloc_compute_aligned+0x5c/0x144
xfs_alloc_ag_vextent_size+0x238/0x8d4
xfs_alloc_fix_freelist+0x540/0x694
xfs_free_extent_fix_freelist+0x84/0xe0
__xfs_free_extent+0x74/0x1ec
xfs_extent_free_finish_item+0xcc/0x214
xfs_defer_finish_one+0x194/0x388
xfs_defer_finish_noroll+0x1b4/0x5c8
xfs_defer_finish+0x2c/0xc4
xfs_bunmapi_range+0xa4/0x100
xfs_itruncate_extents_flags+0x1b8/0x2f4
xfs_inactive_truncate+0xe0/0x124
xfs_inactive+0x30c/0x3e0
xfs_inodegc_worker+0x140/0x234
process_scheduled_works+0x240/0x57c
worker_thread+0x198/0x468
kthread+0x138/0x140
start_kernel_thread+0x14/0x18
>> NIP [c00800001b174768] xfs_extent_busy_trim+0xc0/0x2a0 [xfs]
>> LR [c00800001b1746f4] xfs_extent_busy_trim+0x4c/0x2a0 [xfs]
>> Call Trace:
>> 0xc0000000a8268340 (unreliable)
>> xfs_alloc_compute_aligned+0x5c/0x150 [xfs]
>> xfs_alloc_ag_vextent_size+0x1dc/0x8c0 [xfs]
>> xfs_alloc_ag_vextent+0x17c/0x1c0 [xfs]
>> xfs_alloc_fix_freelist+0x274/0x4b0 [xfs]
>> xfs_free_extent_fix_freelist+0x84/0xe0 [xfs]
>> __xfs_free_extent+0xa0/0x240 [xfs]
>> xfs_trans_free_extent+0x6c/0x140 [xfs]
>> xfs_defer_finish_noroll+0x2b0/0x650 [xfs]
>> xfs_inactive_truncate+0xe8/0x140 [xfs]
>> xfs_fs_destroy_inode+0xdc/0x320 [xfs]
>> destroy_inode+0x6c/0xc0
>
> .... doesn't exist anymore.
>
> xfs_inactive_truncate() is now done from a
> background inodegc thread, not directly in destroy_inode().
>
> I also suspect that any sort of cond_resched() should be in the top
> layer loop in xfs_bunmapi_range(), not hidden deep in the defer
> code. The problem is the number of extents being processed without
> yielding, not the time spent processing each individual deferred
> work chain to free the extent. Hence the explicit rescheduling
> should be at the top level loop where it can be easily explained and
> understand, not hidden deep inside the defer chain mechanism....
Yes, sure. I will submit a v2 with this diff then (after I verify the
fix once on the setup, just for my sanity)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 656c95a22f2e..44d5381bc66f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6354,6 +6354,7 @@ xfs_bunmapi_range(
error = xfs_defer_finish(tpp);
if (error)
goto out;
+ cond_resched();
}
out:
return error;
-ritesh
prev parent reply other threads:[~2024-04-22 15:10 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-21 7:49 [PATCH 0/1] xfs: soft lockups for unmapping large no. of extents Ritesh Harjani (IBM)
2024-04-21 7:49 ` [PATCH 1/1] xfs: Add cond_resched to xfs_defer_finish_noroll Ritesh Harjani (IBM)
2024-04-21 23:38 ` Dave Chinner
2024-04-22 15:10 ` Ritesh Harjani [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ttjto2lw.fsf@gmail.com \
--to=ritesh.list@gmail.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.