* [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags
@ 2024-01-10  7:13 Jian Wen
  2024-01-10 17:45 ` Darrick J. Wong
  2024-01-10 21:38 ` Dave Chinner
  (2 replies; 9+ messages in thread)

From: Jian Wen @ 2024-01-10 7:13 UTC
To: linux-xfs; +Cc: djwong, hch, dchinner

From: Jian Wen <wenjianhn@gmail.com>

Deleting a file with lots of extents may cause a soft lockup if the
preemption model is none (CONFIG_PREEMPT_NONE=y, or preempt=none is set
on the kernel cmdline). The Alibaba cloud kernel and the Oracle UEK
container kernel are affected by the issue, since they select
CONFIG_PREEMPT_NONE=y.

Explicitly call cond_resched() in xfs_itruncate_extents_flags() to avoid
the softlockup warning below.

watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:13:139]
CPU: 0 PID: 139 Comm: kworker/0:13 Not tainted 6.7.0-rc8-g610a9b8f49fb #23
Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker
Call Trace:
 _raw_spin_lock+0x30/0x80
 ? xfs_extent_busy_trim+0x38/0x200
 xfs_extent_busy_trim+0x38/0x200
 xfs_alloc_compute_aligned+0x38/0xd0
 xfs_alloc_ag_vextent_size+0x1f1/0x870
 xfs_alloc_fix_freelist+0x58a/0x770
 xfs_free_extent_fix_freelist+0x60/0xa0
 __xfs_free_extent+0x66/0x1d0
 xfs_trans_free_extent+0xac/0x290
 xfs_extent_free_finish_item+0xf/0x40
 xfs_defer_finish_noroll+0x1db/0x7f0
 xfs_defer_finish+0x10/0xa0
 xfs_itruncate_extents_flags+0x169/0x4c0
 xfs_inactive_truncate+0xba/0x140
 xfs_inactive+0x239/0x2a0
 xfs_inodegc_worker+0xa3/0x210
 ? process_scheduled_works+0x273/0x550
 process_scheduled_works+0x2da/0x550
 worker_thread+0x180/0x350

Most Linux distributions default to voluntary preemption, where
might_sleep() yields the CPU if needed, so they are not affected.
kworker/0:24+xf  298 [000]  7294.810021: probe:__cond_resched:
        __cond_resched+0x5 ([kernel.kallsyms])
        __kmem_cache_alloc_node+0x17c ([kernel.kallsyms])
        __kmalloc+0x4d ([kernel.kallsyms])
        kmem_alloc+0x7a ([kernel.kallsyms])
        xfs_extent_busy_insert_list+0x2e ([kernel.kallsyms])
        __xfs_free_extent+0xd3 ([kernel.kallsyms])
        xfs_trans_free_extent+0x93 ([kernel.kallsyms])
        xfs_extent_free_finish_item+0xf ([kernel.kallsyms])

kworker/0:24+xf  298 [000]  7294.810045: probe:__cond_resched:
        __cond_resched+0x5 ([kernel.kallsyms])
        down+0x11 ([kernel.kallsyms])
        xfs_buf_lock+0x2c ([kernel.kallsyms])
        xfs_buf_find_lock+0x62 ([kernel.kallsyms])
        xfs_buf_get_map+0x1fd ([kernel.kallsyms])
        xfs_buf_read_map+0x51 ([kernel.kallsyms])
        xfs_trans_read_buf_map+0x1c5 ([kernel.kallsyms])
        xfs_btree_read_buf_block.constprop.0+0xa1 ([kernel.kallsyms])
        xfs_btree_lookup_get_block+0x97 ([kernel.kallsyms])
        xfs_btree_lookup+0xc5 ([kernel.kallsyms])
        xfs_alloc_lookup_eq+0x18 ([kernel.kallsyms])
        xfs_free_ag_extent+0x63f ([kernel.kallsyms])
        __xfs_free_extent+0xa7 ([kernel.kallsyms])
        xfs_trans_free_extent+0x93 ([kernel.kallsyms])
        xfs_extent_free_finish_item+0xf ([kernel.kallsyms])

Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
---
 fs/xfs/xfs_inode.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c0f1c89786c2..194381e10472 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -4,6 +4,7 @@
  * All Rights Reserved.
  */
 #include <linux/iversion.h>
+#include <linux/sched.h>

 #include "xfs.h"
 #include "xfs_fs.h"
@@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
 		error = xfs_defer_finish(&tp);
 		if (error)
 			goto out;
+
+		cond_resched();
 	}

 	if (whichfork == XFS_DATA_FORK) {
--
2.25.1
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Darrick J. Wong @ 2024-01-10 17:45 UTC
To: Jian Wen; +Cc: linux-xfs, hch, dchinner

On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> From: Jian Wen <wenjianhn@gmail.com>
>
> Deleting a file with lots of extents may cause a soft lockup if the
> preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> in the kernel cmdline). Alibaba cloud kernel and Oracle UEK container
> kernel are affected by the issue, since they select CONFIG_PREEMPT_NONE=y.
>
> Explicitly call cond_resched in xfs_itruncate_extents_flags avoid
> the below softlockup warning.
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:13:139]

Wowwee, how many extents does this file have, anyway?

[quoted call trace, perf output, and patch header trimmed]

> @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> 		error = xfs_defer_finish(&tp);
> 		if (error)
> 			goto out;
> +
> +		cond_resched();

Is there a particular reason why truncation uses a giant chained
transaction for all extents (with xfs_defer_finish) instead of, say,
one chain per extent?

(If you've actually studied this and know why then I guess cond_resched
is a reasonable bandaid, but I don't want to play cond_resched
whackamole here.)

--D
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Jian Wen @ 2024-01-11 12:40 UTC
To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner

On Thu, Jan 11, 2024 at 1:45 AM Darrick J. Wong <djwong@kernel.org> wrote:
> On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > Deleting a file with lots of extents may cause a soft lockup [...]
>
> Wowwee, how many extents does this file have, anyway?

I deleted a file with 346915 extents in my test environment.

I should have mentioned that lockdep was enabled: CONFIG_LOCKDEP=y.
Without CONFIG_LOCKDEP the scheduling latency was several seconds.
P99 latency of one of our services increased from a few ms to 280 ms
because of the issue.

Below are the steps to make a file badly fragmented.

root@xfsunlink:~/vda# uname -r
6.7.0-rc8-g610a9b8f49fb
root@xfsunlink:~/vda# img=test.raw ; \
    qemu-img create -f raw -o extent_size_hint=0 ${img} 10G && \
    qemu-img bench -f raw -t none -n -w ${img} -c 1000000 -S 8192 -o 0 && \
    filefrag ${img}
Formatting 'test.raw', fmt=raw size=10737418240 extent_size_hint=0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel
(starting at offset 0, step size 8192)
Run completed in 42.506 seconds.
test.raw: 346915 extents found

[quoted call trace, perf output, and patch trimmed]

> > @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> > 		error = xfs_defer_finish(&tp);
> > 		if (error)
> > 			goto out;
> > +
> > +		cond_resched();
>
> Is there a particular reason why truncation uses a giant chained
> transaction for all extents (with xfs_defer_finish) instead of, say,
> one chain per extent?

I have no clue yet.

> (If you've actually studied this and know why then I guess cond_resched
> is a reasonable bandaid, but I don't want to play cond_resched
> whackamole here.)
>
> --D
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Dave Chinner @ 2024-01-11 20:16 UTC
To: Darrick J. Wong; +Cc: Jian Wen, linux-xfs, hch, dchinner

On Wed, Jan 10, 2024 at 09:45:01AM -0800, Darrick J. Wong wrote:
> On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > Deleting a file with lots of extents may cause a soft lockup [...]
>
> Wowwee, how many extents does this file have, anyway?

IIRC, extent removal runs at about 50,000 extents/s on a typical modern
CPU on a production kernel build when everything is cached. 23s then
equates to a bit above a million extents....

[quoted patch trimmed]

> Is there a particular reason why truncation uses a giant chained
> transaction for all extents (with xfs_defer_finish) instead of, say,
> one chain per extent?

If I understand the question correctly: extent removal for a given range
being truncated has to be seen externally as an atomic change, so the
ILOCK has to be held across the entire removal operation. This means it
has to be executed as a single rolling transaction, as transaction
reservation cannot be done with the ILOCK held, and the atomicity of
extent removal prevents the ILOCK from being dropped.

This atomicity only really matters for active operations like truncate,
hole punching, etc., when other operations may also be trying to act on
the extent list. However, in inodegc we are essentially guaranteed that
nobody else will need to access the extent list whilst we are truncating
the extents away, so it could be done as a bunch of fine grained
transactions.

However, this doesn't solve the problem of the CPU never being yielded
if everything is cached and no locks or transaction reservations are
ever contended. It will still hold the CPU for the same length of time.
Hence I'm not sure changing the way the transactions are run will
really change anything material w.r.t. the issue at hand.

-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Dave Chinner @ 2024-01-10 21:38 UTC
To: Jian Wen; +Cc: linux-xfs, djwong, hch, dchinner

On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> Deleting a file with lots of extents may cause a soft lockup if the
> preemption model is none (CONFIG_PREEMPT_NONE=y or preempt=none is set
> in the kernel cmdline). Alibaba cloud kernel and Oracle UEK container
> kernel are affected by the issue, since they select CONFIG_PREEMPT_NONE=y.

Time for them to move to CONFIG_PREEMPT_DYNAMIC?

Also, there has been recent action towards removing
CONFIG_PREEMPT_NONE/VOLUNTARY and cond_resched() altogether, because the
lazy preemption model now present in the RTPREEMPT patchset solves the
performance issues with full preemption that PREEMPT_NONE works
around...

https://lwn.net/Articles/944686/
https://lwn.net/Articles/945422/

Further, Thomas Gleixner has stated in those discussions that:

  "Though definitely I'm putting a permanent NAK in place for any
  attempts to duct tape the preempt=NONE model any further by
  sprinkling more cond*() and whatever warts around."

  https://lwn.net/ml/linux-kernel/87jzshhexi.ffs@tglx/

> Explicitly call cond_resched in xfs_itruncate_extents_flags avoid
> the below softlockup warning.

IOWs, this is no longer considered an acceptable solution by core
kernel maintainers.

Regardless of these policy issues, the code change:

> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index c0f1c89786c2..194381e10472 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -4,6 +4,7 @@
>   * All Rights Reserved.
>   */
>  #include <linux/iversion.h>
> +#include <linux/sched.h>

Global includes like this go in fs/xfs/xfs_linux.h, but I don't think
that's even necessary, because we have cond_resched() calls elsewhere in
XFS with the same include list as xfs_inode.c...

>  #include "xfs.h"
>  #include "xfs_fs.h"
> @@ -1383,6 +1384,8 @@ xfs_itruncate_extents_flags(
> 		error = xfs_defer_finish(&tp);
> 		if (error)
> 			goto out;
> +
> +		cond_resched();
> 	}

Shouldn't this go in xfs_defer_finish() so that we capture all the
cases where we loop indefinitely over a range continually rolling a
permanent transaction via xfs_defer_finish()?

-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Jian Wen @ 2024-01-11 12:52 UTC
To: Dave Chinner; +Cc: linux-xfs, djwong, hch, dchinner

On Thu, Jan 11, 2024 at 5:38 AM Dave Chinner <david@fromorbit.com> wrote:
> On Wed, Jan 10, 2024 at 03:13:47PM +0800, Jian Wen wrote:
> > Deleting a file with lots of extents may cause a soft lockup [...]
>
> Time for them to move to CONFIG_PREEMPT_DYNAMIC?

I had asked one of them to support CONFIG_PREEMPT_DYNAMIC before
sending the patch.

> Also, there has been recent action towards removing
> CONFIG_PREEMPT_NONE/VOLUNTARY and cond_resched() altogether [...]
>
> IOWs, this is no longer considered an acceptable solution by core
> kernel maintainers.

Understood. I will only build a hotfix for our production kernel then.

> Global includes like this go in fs/xfs/xfs_linux.h, but I don't think
> that's even necessary, because we have cond_resched() calls elsewhere
> in XFS with the same include list as xfs_inode.c...
>
> Shouldn't this go in xfs_defer_finish() so that we capture all the
> cases where we loop indefinitely over a range continually rolling a
> permanent transaction via xfs_defer_finish()?

It seems xfs_collapse_file_space and xfs_insert_file_space also need
to yield the CPU. I don't have use cases for them yet.

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Dave Chinner @ 2024-01-11 20:27 UTC
To: Jian Wen; +Cc: linux-xfs, djwong, hch, dchinner, Thomas Gleixner, linux-kernel

[cc Thomas, lkml]

On Thu, Jan 11, 2024 at 08:52:22PM +0800, Jian Wen wrote:
> On Thu, Jan 11, 2024 at 5:38 AM Dave Chinner <david@fromorbit.com> wrote:
> > Time for them to move to CONFIG_PREEMPT_DYNAMIC?
>
> I had asked one of them to support CONFIG_PREEMPT_DYNAMIC before
> sending the patch.

OK.

> > IOWs, this is no longer considered an acceptable solution by core
> > kernel maintainers.
>
> Understood. I will only build a hotfix for our production kernel then.

Yeah, that may be your best short term fix. We'll need to clarify what
the current policy is on adding cond_resched points before we go any
further in this direction.

Thomas, any update on what is happening with cond_resched() - is there
an ETA on it going away/being unnecessary?

> > Shouldn't this go in xfs_defer_finish() so that we capture all the
> > cases where we loop indefinitely over a range continually rolling a
> > permanent transaction via xfs_defer_finish()?
>
> It seems xfs_collapse_file_space and xfs_insert_file_space also need
> to yield the CPU. I don't have use cases for them yet.

Yup, they do, but they also call xfs_defer_finish(), so having the
cond_resched() in that function will capture them as well.

Also, the current upstream tree has moved this code from
xfs_itruncate_extents_flags() to xfs_bunmapi_range(), so the
cond_resched() has to be moved anyway. We may as well put it in
xfs_defer_finish() if we end up doing this.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Thomas Gleixner @ 2024-01-12 13:01 UTC
To: Dave Chinner, Jian Wen; +Cc: linux-xfs, djwong, hch, dchinner, linux-kernel, Ankur Arora

On Fri, Jan 12 2024 at 07:27, Dave Chinner wrote:

Cc: Ankur

> On Thu, Jan 11, 2024 at 08:52:22PM +0800, Jian Wen wrote:
> > Understood. I will only build a hotfix for our production kernel then.
>
> Yeah, that may be your best short term fix. We'll need to clarify
> what the current policy is on adding cond_resched points before we
> go any further in this direction.

Well, right now, until the scheduler situation is sorted, there is no
other solution than to add the cond_resched() muck.

> Thomas, any update on what is happening with cond_resched() - is
> there an ETA on it going away/being unnecessary?

Ankur is working on that...
* Re: [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags

From: Ankur Arora @ 2024-01-23 7:01 UTC
To: tglx; +Cc: ankur.a.arora, david, dchinner, djwong, hch, linux-kernel, linux-xfs, wenjian1, wenjianhn

[ Missed this email until now. ]

Thomas Gleixner writes:
> On Fri, Jan 12 2024 at 07:27, Dave Chinner wrote:
>
> Well, right now, until the scheduler situation is sorted, there is no
> other solution than to add the cond_resched() muck.
>
> > Thomas, any update on what is happening with cond_resched() - is
> > there an ETA on it going away/being unnecessary?
>
> Ankur is working on that...

Yeah, running through a final round of tests before sending out the
series.

Dave, on the status of cond_resched(): the work on this adds a new
scheduling model (as Thomas implemented in his PoC) under which
cond_resched() would basically be a stub.

However, given that other preemption models continue to use
cond_resched(), we would need to live with cond_resched() for a while
-- at least while this model proves it works well enough under a wide
enough variety of loads.

Thanks
Ankur
Thread overview: 9+ messages
 2024-01-10  7:13 [PATCH] xfs: explicitly call cond_resched in xfs_itruncate_extents_flags Jian Wen
 2024-01-10 17:45 ` Darrick J. Wong
 2024-01-11 12:40   ` Jian Wen
 2024-01-11 20:16   ` Dave Chinner
 2024-01-10 21:38 ` Dave Chinner
 2024-01-11 12:52   ` Jian Wen
 2024-01-11 20:27     ` Dave Chinner
 2024-01-12 13:01       ` Thomas Gleixner
 2024-01-23  7:01         ` Ankur Arora