From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@infradead.org>
Cc: "Darrick J . Wong" <djwong@kernel.org>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>
Subject: Re: [PATCHv2 1/1] xfs: Add cond_resched in xfs_bunmapi_range loop
Date: Mon, 29 Apr 2024 14:14:46 +0530
Message-ID: <87sez4y2v5.fsf@gmail.com>
In-Reply-To: <f7d3db235a2c7e16681a323a99bb0ce50a92296a.1714033516.git.ritesh.list@gmail.com>
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:
> An async dio write to a sparse file can generate a lot of extents,
> and when we unlink this file (using rm), the kernel can stay busy unmapping
> and freeing those extents as part of transaction processing.
> Add cond_resched() in xfs_bunmapi_range() to avoid soft lockup
> messages like the one below. Here is a call trace of such a soft lockup.
>
> watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
> CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
> Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
> NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
> LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
> Call Trace:
> xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
> xfs_alloc_compute_aligned+0x5c/0x144
> xfs_alloc_ag_vextent_size+0x238/0x8d4
> xfs_alloc_fix_freelist+0x540/0x694
> xfs_free_extent_fix_freelist+0x84/0xe0
> __xfs_free_extent+0x74/0x1ec
> xfs_extent_free_finish_item+0xcc/0x214
> xfs_defer_finish_one+0x194/0x388
> xfs_defer_finish_noroll+0x1b4/0x5c8
> xfs_defer_finish+0x2c/0xc4
> xfs_bunmapi_range+0xa4/0x100
> xfs_itruncate_extents_flags+0x1b8/0x2f4
> xfs_inactive_truncate+0xe0/0x124
> xfs_inactive+0x30c/0x3e0
> xfs_inodegc_worker+0x140/0x234
> process_scheduled_works+0x240/0x57c
> worker_thread+0x198/0x468
> kthread+0x138/0x140
> start_kernel_thread+0x14/0x18
>
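For context, the kind of workload that hits this unmap path looks roughly
like the sketch below: fill a sparse file with many small, discontiguous
extents via direct I/O, then unlink it so inodegc ends up unmapping and
freeing them all. This is a simplified, synchronous sketch, not the exact
reproducer; the path, extent count and 4k write size are all made up:

    /* Hypothetical reproducer sketch (simplified): create a file with
     * ~1M discontiguous extents, then unlink it.  The long unmap then
     * runs from the background inodegc worker. */
    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            const char *path = "/mnt/scratch/many-extents";
            size_t bs = 4096;       /* assumes 4k is dio-aligned here */
            long i, nr = 1L << 20;  /* ~1M extents */
            void *buf;
            int fd = open(path, O_CREAT | O_WRONLY | O_DIRECT, 0644);

            if (fd < 0 || posix_memalign(&buf, bs, bs))
                    return 1;
            memset(buf, 0xab, bs);

            /* Write every other block so adjacent extents cannot merge. */
            for (i = 0; i < nr; i++)
                    if (pwrite(fd, buf, bs, (off_t)i * 2 * bs) != (ssize_t)bs)
                            return 1;
            close(fd);
            unlink(path);   /* queues the inode for inodegc -> long unmap */
            return 0;
    }
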
My v1 patch had the cond_resched() in xfs_defer_finish_noroll(), since I
suspected it is a common point where we loop for many other operations,
and Dave initially suggested the same [1]. But I was not fully convinced,
since the only problematic path I had seen so far was unmapping extents.
So this patch keeps the cond_resched() in the xfs_bunmapi_range() loop.
[1]: https://lore.kernel.org/all/ZZ8OaNnp6b%2FPJzsb@dread.disaster.area/
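For reference, v1 placed the cond_resched() at the top of the main loop in
xfs_defer_finish_noroll(), roughly like this (a simplified sketch, not the
exact v1 hunk):

    /* xfs_defer_finish_noroll() drains the chain of deferred work
     * items, rolling the transaction between them.  A long chain
     * (e.g. freeing millions of extents) can otherwise run with no
     * voluntary reschedule point on non-preempt kernels. */
    while (!list_empty(&dop_pending) || !list_empty(&(*tp)->t_dfops)) {
            cond_resched();
            /* ... create intents, roll the transaction, finish one
             * work item per iteration ... */
    }
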
However, I was also able to reproduce a similar soft lockup in the reflink
remapping path, both on Power (with 64k blocksize) and on x86 (with
preempt=none and KASAN enabled). I noticed it while doing regression
testing of some of the iomap changes with KASAN enabled. The issue shows
up with generic/175 on both Power and x86.
Do you think we should keep the cond_resched() inside the
xfs_defer_finish_noroll() loop, as in v1 [2]? If yes, I can rebase v1 onto
the latest upstream tree and update the commit message with both call
stacks.
[2]: https://lore.kernel.org/all/0bfaf740a2d10cc846616ae05963491316850c52.1713674899.git.ritesh.list@gmail.com/
<call stack on Power>
======================
run fstests generic/175 at 2024-02-02 04:40:21
<...>
watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0-150600.5-default #1
NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
Call Trace:
0xc000000014107c00 (unreliable)
__xfs_btree_updkeys+0x8c/0x2c0 [xfs]
xfs_btree_update_keys+0x150/0x170 [xfs]
xfs_btree_lshift+0x534/0x660 [xfs]
xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
xfs_btree_insrec+0x4e4/0x630 [xfs]
xfs_btree_insert+0x104/0x2d0 [xfs]
xfs_rmap_insert+0xc4/0x260 [xfs]
xfs_rmap_map_shared+0x228/0x630 [xfs]
xfs_rmap_finish_one+0x2d4/0x350 [xfs]
xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
__xfs_trans_commit+0x1f4/0x400 [xfs]
xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
xfs_reflink_remap_blocks+0x154/0x320 [xfs]
xfs_file_remap_range+0x138/0x3a0 [xfs]
do_clone_file_range+0x11c/0x2f0
vfs_clone_file_range+0x60/0x1c0
ioctl_file_clone+0x78/0x140
sys_ioctl+0x934/0x1270
system_call_exception+0x158/0x320
system_call_vectored_common+0x15c/0x2ec
<call stack on x86 with KASAN>
===============================
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [xfs_io:3438095]
CPU: 6 PID: 3438095 Comm: xfs_io Not tainted 6.9.0-rc5-xfstests-perf-00008-g4e2752e99f55 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.o4
RIP: 0010:_raw_spin_unlock_irqrestore+0x3c/0x60
Code: 10 48 89 fb 48 83 c7 18 e8 31 7c 10 fd 48 89 df e8 79 f5 10 fd f7 c5 00 02 00 00 74 06 e8 0c 6
Call Trace:
<IRQ>
? watchdog_timer_fn+0x2dc/0x3a0
? __pfx_watchdog_timer_fn+0x10/0x10
? __hrtimer_run_queues+0x4a1/0x870
? __pfx___hrtimer_run_queues+0x10/0x10
? kvm_clock_get_cycles+0x18/0x30
? ktime_get_update_offsets_now+0xc6/0x2f0
? hrtimer_interrupt+0x2b8/0x7a0
? __sysvec_apic_timer_interrupt+0xca/0x390
? sysvec_apic_timer_interrupt+0x65/0x80
</IRQ>
<TASK>
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? _raw_spin_unlock_irqrestore+0x34/0x60
? _raw_spin_unlock_irqrestore+0x3c/0x60
? _raw_spin_unlock_irqrestore+0x34/0x60
get_partial_node.part.0+0x1af/0x340
___slab_alloc+0xc07/0x1250
? do_vfs_ioctl+0xe5c/0x1660
? __x64_sys_ioctl+0xd5/0x1b0
? __alloc_object+0x39/0x660
? __pfx___might_resched+0x10/0x10
? __alloc_object+0x39/0x660
? kmem_cache_alloc+0x3cd/0x410
? should_failslab+0xe/0x20
kmem_cache_alloc+0x3cd/0x410
__alloc_object+0x39/0x660
__create_object+0x22/0x90
kmem_cache_alloc+0x324/0x410
xfs_bui_init+0x1b/0x150
xfs_bmap_update_create_intent+0x48/0x110
? __pfx_xfs_bmap_update_create_intent+0x10/0x10
xfs_defer_create_intent+0xcc/0x1b0
xfs_defer_create_intents+0x8f/0x230
xfs_defer_finish_noroll+0x1c0/0x1160
? xfs_inode_item_precommit+0x2c1/0x880
? __create_object+0x5e/0x90
? __pfx_xfs_defer_finish_noroll+0x10/0x10
? xfs_trans_run_precommits+0x126/0x200
__xfs_trans_commit+0x767/0xbe0
? inode_maybe_inc_iversion+0xe2/0x150
? __pfx___xfs_trans_commit+0x10/0x10
xfs_reflink_remap_extent+0x654/0xd40
? __pfx_xfs_reflink_remap_extent+0x10/0x10
? __pfx_down_read_nested+0x10/0x10
xfs_reflink_remap_blocks+0x21a/0x850
? __pfx_xfs_reflink_remap_blocks+0x10/0x10
? _raw_spin_unlock+0x23/0x40
? xfs_reflink_remap_prep+0x47d/0x900
xfs_file_remap_range+0x296/0xb40
? __pfx_xfs_file_remap_range+0x10/0x10
? __pfx_lock_acquire+0x10/0x10
? __pfx___might_resched+0x10/0x10
vfs_clone_file_range+0x260/0xc20
ioctl_file_clone+0x49/0xb0
do_vfs_ioctl+0xe5c/0x1660
? __pfx_do_vfs_ioctl+0x10/0x10
? trace_irq_enable.constprop.0+0xd2/0x110
? kasan_quarantine_put+0x7e/0x1d0
? do_sys_openat2+0x120/0x170
? lock_acquire+0x43b/0x4f0
? __pfx_lock_release+0x10/0x10
? __pfx_do_sys_openat2+0x10/0x10
? __do_sys_newfstatat+0x94/0xe0
? __fget_files+0x1ce/0x330
__x64_sys_ioctl+0xd5/0x1b0
do_syscall_64+0x6a/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ffff7d1a94f
</TASK>
-ritesh
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 656c95a22f2e..44d5381bc66f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -6354,6 +6354,7 @@ xfs_bunmapi_range(
> error = xfs_defer_finish(tpp);
> if (error)
> goto out;
> + cond_resched();
> }
> out:
> return error;
> --
> 2.44.0