Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
       [not found] ` <20240807-b4-slab-kfree_rcu-destroy-v2-6-ea79102f428c@suse.cz>
@ 2025-02-21 16:30   ` Keith Busch
  2025-02-21 16:51     ` Mateusz Guzik
  2025-02-21 17:28     ` Vlastimil Babka
  0 siblings, 2 replies; 23+ messages in thread
From: Keith Busch @ 2025-02-21 16:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
	Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Uladzislau Rezki (Sony),
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	linux-kernel, rcu, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik, linux-nvme,
	leitao

On Wed, Aug 07, 2024 at 12:31:19PM +0200, Vlastimil Babka wrote:
> We would like to replace call_rcu() users with kfree_rcu() where the
> existing callback is just a kmem_cache_free(). However this causes
> issues when the cache can be destroyed (such as due to module unload).
> 
> Currently such modules should be issuing rcu_barrier() before
> kmem_cache_destroy() to have their call_rcu() callbacks processed first.
> This barrier is however not sufficient for kfree_rcu() in flight due
> to the batching introduced by a35d16905efc ("rcu: Add basic support for
> kfree_rcu() batching").
> 
> This is not a problem for kmalloc caches which are never destroyed, but
> since removing SLOB, kfree_rcu() is allowed also for any other cache,
> that might be destroyed.
> 
> In order not to complicate the API, put the responsibility for handling
> outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the newly
> introduced kvfree_rcu_barrier() to wait before destroying the cache.
> This is similar to how we issue rcu_barrier() for SLAB_TYPESAFE_BY_RCU
> caches, but has to be done earlier, as the latter only needs to wait for
> the empty slab pages to finish freeing, and not objects from the slab.
> 
> Users of call_rcu() with arbitrary callbacks should still issue
> rcu_barrier() before destroying the cache and unloading the module, as
> kvfree_rcu_barrier() is not a superset of rcu_barrier() and the
> callbacks may be invoking module code or performing other actions that
> are necessary for a successful unload.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/slab_common.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index c40227d5fa07..1a2873293f5d 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -508,6 +508,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
>  	if (unlikely(!s) || !kasan_check_byte(s))
>  		return;
>  
> +	/* in-flight kfree_rcu()'s may include objects from our cache */
> +	kvfree_rcu_barrier();
> +
>  	cpus_read_lock();
>  	mutex_lock(&slab_mutex);

This patch appears to be triggering a new warning in certain conditions
when tearing down an nvme namespace's block device. Stack trace is at
the end.

The warning indicates that this shouldn't be called from a
WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
and tearing down block devices, so this is a memory reclaim use AIUI.
I'm a bit confused why we can't tear down a disk from within a memory
reclaim workqueue. Is the recommended solution to simply remove the WQ
flag when creating the workqueue?

  ------------[ cut here ]------------
  workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
  WARNING: CPU: 21 PID: 330 at kernel/workqueue.c:3719 check_flush_dependency+0x112/0x120
  Modules linked in: intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) skx_edac_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) iTCO_wdt(E) xhci_pci(E) mlx5_ib(E) ipmi_si(E) iTCO_vendor_support(E) i2c_i801(E) ipmi_devintf(E) evdev(E) kvm(E) xhci_hcd(E) ib_uverbs(E) acpi_cpufreq(E) wmi(E) i2c_smbus(E) ipmi_msghandler(E) button(E) efivarfs(E) autofs4(E)
  CPU: 21 UID: 0 PID: 330 Comm: kworker/u144:6 Tainted: G            E      6.13.2-0_g925d379822da #1
  Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM20 02/01/2023
  Workqueue: nvme-wq nvme_scan_work
  RIP: 0010:check_flush_dependency+0x112/0x120
  Code: 05 9a 40 14 02 01 48 81 c6 c0 00 00 00 48 8b 50 18 48 81 c7 c0 00 00 00 48 89 f9 48 c7 c7 90 64 5a 82 49 89 d8 e8 7e 4f 88 ff <0f> 0b eb 8c cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 41 57 41
  RSP: 0018:ffffc90000df7bd8 EFLAGS: 00010082
  RAX: 000000000000006a RBX: ffffffff81622390 RCX: 0000000000000027
  RDX: 00000000fffeffff RSI: 000000000057ffa8 RDI: ffff88907f960c88
  RBP: 0000000000000000 R08: ffffffff83068e50 R09: 000000000002fffd
  R10: 0000000000000004 R11: 0000000000000000 R12: ffff8881001a4400
  R13: 0000000000000000 R14: ffff88907f420fb8 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88907f940000(0000) knlGS:0000000000000000
  CR2: 00007f60c3001000 CR3: 000000107d010005 CR4: 00000000007726f0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __warn+0xa4/0x140
   ? check_flush_dependency+0x112/0x120
   ? report_bug+0xe1/0x140
   ? check_flush_dependency+0x112/0x120
   ? handle_bug+0x5e/0x90
   ? exc_invalid_op+0x16/0x40
   ? asm_exc_invalid_op+0x16/0x20
   ? timer_recalc_next_expiry+0x190/0x190
   ? check_flush_dependency+0x112/0x120
   ? check_flush_dependency+0x112/0x120
   __flush_work.llvm.1643880146586177030+0x174/0x2c0
   flush_rcu_work+0x28/0x30
   kvfree_rcu_barrier+0x12f/0x160
   kmem_cache_destroy+0x18/0x120
   bioset_exit+0x10c/0x150
   disk_release.llvm.6740012984264378178+0x61/0xd0
   device_release+0x4f/0x90
   kobject_put+0x95/0x180
   nvme_put_ns+0x23/0xc0
   nvme_remove_invalid_namespaces+0xb3/0xd0
   nvme_scan_work+0x342/0x490
   process_scheduled_works+0x1a2/0x370
   worker_thread+0x2ff/0x390
   ? pwq_release_workfn+0x1e0/0x1e0
   kthread+0xb1/0xe0
   ? __kthread_parkme+0x70/0x70
   ret_from_fork+0x30/0x40
   ? __kthread_parkme+0x70/0x70
   ret_from_fork_asm+0x11/0x20
   </TASK>
  ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-21 16:30   ` [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy() Keith Busch
@ 2025-02-21 16:51     ` Mateusz Guzik
  2025-02-21 16:52       ` Mateusz Guzik
  2025-02-21 17:28     ` Vlastimil Babka
  1 sibling, 1 reply; 23+ messages in thread
From: Mateusz Guzik @ 2025-02-21 16:51 UTC (permalink / raw)
  To: Keith Busch
  Cc: Vlastimil Babka, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Uladzislau Rezki (Sony),
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	linux-kernel, rcu, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev, Jann Horn, linux-nvme, leitao

On Fri, Feb 21, 2025 at 5:30 PM Keith Busch <kbusch@kernel.org> wrote:
> This patch appears to be triggering a new warning in certain conditions
> when tearing down an nvme namespace's block device. Stack trace is at
> the end.
>
> The warning indicates that this shouldn't be called from a
> WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
> and tearing down block devices, so this is a memory reclaim use AIUI.
> I'm a bit confused why we can't tear down a disk from within a memory
> reclaim workqueue. Is the recommended solution to simply remove the WQ
> flag when creating the workqueue?
>

This ends up calling into bioset_exit -> bio_put_slab -> kmem_cache_destroy

Sizes of the bio- slabs are off the beaten path, so it may be they
make sense to exist.

With the assumption that caches should be there, this can instead
invoke kmem_cache_destroy from a queue where it is safe to do it. This
is not supposed to be a frequent operation.
-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-21 16:51     ` Mateusz Guzik
@ 2025-02-21 16:52       ` Mateusz Guzik
  0 siblings, 0 replies; 23+ messages in thread
From: Mateusz Guzik @ 2025-02-21 16:52 UTC (permalink / raw)
  To: Keith Busch
  Cc: Vlastimil Babka, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Uladzislau Rezki (Sony),
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	linux-kernel, rcu, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev, Jann Horn, linux-nvme, leitao

On Fri, Feb 21, 2025 at 5:51 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Fri, Feb 21, 2025 at 5:30 PM Keith Busch <kbusch@kernel.org> wrote:
> > This patch appears to be triggering a new warning in certain conditions
> > when tearing down an nvme namespace's block device. Stack trace is at
> > the end.
> >
> > The warning indicates that this shouldn't be called from a
> > WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
> > and tearing down block devices, so this is a memory reclaim use AIUI.
> > I'm a bit confused why we can't tear down a disk from within a memory
> > reclaim workqueue. Is the recommended solution to simply remove the WQ
> > flag when creating the workqueue?
> >
>
> This ends up calling into bioset_exit -> bio_put_slab -> kmem_cache_destroy
>
> Sizes of the bio- slabs are off the beaten path, so it may be they
> make sense to exist.
>
> With the assumption that caches should be there, this can instead
> invoke kmem_cache_destroy from a queue where it is safe to do it. This
> is not supposed to be a frequent operation.

Erm, I mean defer the operation to a queue which can safely do it.

-- 
Mateusz Guzik <mjguzik gmail.com>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-21 16:30   ` [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy() Keith Busch
  2025-02-21 16:51     ` Mateusz Guzik
@ 2025-02-21 17:28     ` Vlastimil Babka
  2025-02-24 11:44       ` Uladzislau Rezki
  1 sibling, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2025-02-21 17:28 UTC (permalink / raw)
  To: Keith Busch
  Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
	Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Uladzislau Rezki (Sony),
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	linux-kernel, rcu, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik, linux-nvme,
	leitao

On 2/21/25 17:30, Keith Busch wrote:
> On Wed, Aug 07, 2024 at 12:31:19PM +0200, Vlastimil Babka wrote:
>> We would like to replace call_rcu() users with kfree_rcu() where the
>> existing callback is just a kmem_cache_free(). However this causes
>> issues when the cache can be destroyed (such as due to module unload).
>> 
>> Currently such modules should be issuing rcu_barrier() before
>> kmem_cache_destroy() to have their call_rcu() callbacks processed first.
>> This barrier is however not sufficient for kfree_rcu() in flight due
>> to the batching introduced by a35d16905efc ("rcu: Add basic support for
>> kfree_rcu() batching").
>> 
>> This is not a problem for kmalloc caches which are never destroyed, but
>> since removing SLOB, kfree_rcu() is allowed also for any other cache,
>> that might be destroyed.
>> 
>> In order not to complicate the API, put the responsibility for handling
>> outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the newly
>> introduced kvfree_rcu_barrier() to wait before destroying the cache.
>> This is similar to how we issue rcu_barrier() for SLAB_TYPESAFE_BY_RCU
>> caches, but has to be done earlier, as the latter only needs to wait for
>> the empty slab pages to finish freeing, and not objects from the slab.
>> 
>> Users of call_rcu() with arbitrary callbacks should still issue
>> rcu_barrier() before destroying the cache and unloading the module, as
>> kvfree_rcu_barrier() is not a superset of rcu_barrier() and the
>> callbacks may be invoking module code or performing other actions that
>> are necessary for a successful unload.
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>  mm/slab_common.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index c40227d5fa07..1a2873293f5d 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -508,6 +508,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
>>  	if (unlikely(!s) || !kasan_check_byte(s))
>>  		return;
>>  
>> +	/* in-flight kfree_rcu()'s may include objects from our cache */
>> +	kvfree_rcu_barrier();
>> +
>>  	cpus_read_lock();
>>  	mutex_lock(&slab_mutex);
> 
> This patch appears to be triggering a new warning in certain conditions
> when tearing down an nvme namespace's block device. Stack trace is at
> the end.
> 
> The warning indicates that this shouldn't be called from a
> WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
> and tearing down block devices, so this is a memory reclaim use AIUI.
> I'm a bit confused why we can't tear down a disk from within a memory
> reclaim workqueue. Is the recommended solution to simply remove the WQ
> flag when creating the workqueue?

I think it's reasonable to expect a memory reclaim related action would
destroy a kmem cache. Mateusz's suggestion would work around the issue, but
then we could get another surprising warning elsewhere. Also making the
kmem_cache destroys async can be tricky when a recreation happens
immediately under the same name (implications with sysfs/debugfs etc). We
managed to make the destroying synchronous as part of this series and it
would be great to keep it that way.

>   ------------[ cut here ]------------
>   workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work

Maybe instead kfree_rcu_work should be using a WQ_MEM_RECLAIM workqueue? It
is after all freeing memory. Ulad, what do you think?

>   WARNING: CPU: 21 PID: 330 at kernel/workqueue.c:3719 check_flush_dependency+0x112/0x120
>   Modules linked in: intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) skx_edac_common(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) iTCO_wdt(E) xhci_pci(E) mlx5_ib(E) ipmi_si(E) iTCO_vendor_support(E) i2c_i801(E) ipmi_devintf(E) evdev(E) kvm(E) xhci_hcd(E) ib_uverbs(E) acpi_cpufreq(E) wmi(E) i2c_smbus(E) ipmi_msghandler(E) button(E) efivarfs(E) autofs4(E)
>   CPU: 21 UID: 0 PID: 330 Comm: kworker/u144:6 Tainted: G            E      6.13.2-0_g925d379822da #1
>   Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM20 02/01/2023
>   Workqueue: nvme-wq nvme_scan_work
>   RIP: 0010:check_flush_dependency+0x112/0x120
>   Code: 05 9a 40 14 02 01 48 81 c6 c0 00 00 00 48 8b 50 18 48 81 c7 c0 00 00 00 48 89 f9 48 c7 c7 90 64 5a 82 49 89 d8 e8 7e 4f 88 ff <0f> 0b eb 8c cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 41 57 41
>   RSP: 0018:ffffc90000df7bd8 EFLAGS: 00010082
>   RAX: 000000000000006a RBX: ffffffff81622390 RCX: 0000000000000027
>   RDX: 00000000fffeffff RSI: 000000000057ffa8 RDI: ffff88907f960c88
>   RBP: 0000000000000000 R08: ffffffff83068e50 R09: 000000000002fffd
>   R10: 0000000000000004 R11: 0000000000000000 R12: ffff8881001a4400
>   R13: 0000000000000000 R14: ffff88907f420fb8 R15: 0000000000000000
>   FS:  0000000000000000(0000) GS:ffff88907f940000(0000) knlGS:0000000000000000
>   CR2: 00007f60c3001000 CR3: 000000107d010005 CR4: 00000000007726f0
>   PKRU: 55555554
>   Call Trace:
>    <TASK>
>    ? __warn+0xa4/0x140
>    ? check_flush_dependency+0x112/0x120
>    ? report_bug+0xe1/0x140
>    ? check_flush_dependency+0x112/0x120
>    ? handle_bug+0x5e/0x90
>    ? exc_invalid_op+0x16/0x40
>    ? asm_exc_invalid_op+0x16/0x20
>    ? timer_recalc_next_expiry+0x190/0x190
>    ? check_flush_dependency+0x112/0x120
>    ? check_flush_dependency+0x112/0x120
>    __flush_work.llvm.1643880146586177030+0x174/0x2c0
>    flush_rcu_work+0x28/0x30
>    kvfree_rcu_barrier+0x12f/0x160
>    kmem_cache_destroy+0x18/0x120
>    bioset_exit+0x10c/0x150
>    disk_release.llvm.6740012984264378178+0x61/0xd0
>    device_release+0x4f/0x90
>    kobject_put+0x95/0x180
>    nvme_put_ns+0x23/0xc0
>    nvme_remove_invalid_namespaces+0xb3/0xd0
>    nvme_scan_work+0x342/0x490
>    process_scheduled_works+0x1a2/0x370
>    worker_thread+0x2ff/0x390
>    ? pwq_release_workfn+0x1e0/0x1e0
>    kthread+0xb1/0xe0
>    ? __kthread_parkme+0x70/0x70
>    ret_from_fork+0x30/0x40
>    ? __kthread_parkme+0x70/0x70
>    ret_from_fork_asm+0x11/0x20
>    </TASK>
>   ---[ end trace 0000000000000000 ]---



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-21 17:28     ` Vlastimil Babka
@ 2025-02-24 11:44       ` Uladzislau Rezki
  2025-02-24 15:37         ` Keith Busch
  2025-02-25  9:57         ` Vlastimil Babka
  0 siblings, 2 replies; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-24 11:44 UTC (permalink / raw)
  To: Vlastimil Babka, Keith Busch
  Cc: Keith Busch, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Uladzislau Rezki (Sony),
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	linux-kernel, rcu, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik, linux-nvme,
	leitao

On Fri, Feb 21, 2025 at 06:28:49PM +0100, Vlastimil Babka wrote:
> On 2/21/25 17:30, Keith Busch wrote:
> > On Wed, Aug 07, 2024 at 12:31:19PM +0200, Vlastimil Babka wrote:
> >> We would like to replace call_rcu() users with kfree_rcu() where the
> >> existing callback is just a kmem_cache_free(). However this causes
> >> issues when the cache can be destroyed (such as due to module unload).
> >> 
> >> Currently such modules should be issuing rcu_barrier() before
> >> kmem_cache_destroy() to have their call_rcu() callbacks processed first.
> >> This barrier is however not sufficient for kfree_rcu() in flight due
> >> to the batching introduced by a35d16905efc ("rcu: Add basic support for
> >> kfree_rcu() batching").
> >> 
> >> This is not a problem for kmalloc caches which are never destroyed, but
> >> since removing SLOB, kfree_rcu() is allowed also for any other cache,
> >> that might be destroyed.
> >> 
> >> In order not to complicate the API, put the responsibility for handling
> >> outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the newly
> >> introduced kvfree_rcu_barrier() to wait before destroying the cache.
> >> This is similar to how we issue rcu_barrier() for SLAB_TYPESAFE_BY_RCU
> >> caches, but has to be done earlier, as the latter only needs to wait for
> >> the empty slab pages to finish freeing, and not objects from the slab.
> >> 
> >> Users of call_rcu() with arbitrary callbacks should still issue
> >> rcu_barrier() before destroying the cache and unloading the module, as
> >> kvfree_rcu_barrier() is not a superset of rcu_barrier() and the
> >> callbacks may be invoking module code or performing other actions that
> >> are necessary for a successful unload.
> >> 
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >>  mm/slab_common.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index c40227d5fa07..1a2873293f5d 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -508,6 +508,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >>  	if (unlikely(!s) || !kasan_check_byte(s))
> >>  		return;
> >>  
> >> +	/* in-flight kfree_rcu()'s may include objects from our cache */
> >> +	kvfree_rcu_barrier();
> >> +
> >>  	cpus_read_lock();
> >>  	mutex_lock(&slab_mutex);
> > 
> > This patch appears to be triggering a new warning in certain conditions
> > when tearing down an nvme namespace's block device. Stack trace is at
> > the end.
> > 
> > The warning indicates that this shouldn't be called from a
> > WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
> > and tearing down block devices, so this is a memory reclaim use AIUI.
> > I'm a bit confused why we can't tear down a disk from within a memory
> > reclaim workqueue. Is the recommended solution to simply remove the WQ
> > flag when creating the workqueue?
> 
> I think it's reasonable to expect a memory reclaim related action would
> destroy a kmem cache. Mateusz's suggestion would work around the issue, but
> then we could get another surprising warning elsewhere. Also making the
> kmem_cache destroys async can be tricky when a recreation happens
> immediately under the same name (implications with sysfs/debugfs etc). We
> managed to make the destroying synchronous as part of this series and it
> would be great to keep it that way.
> 
> >   ------------[ cut here ]------------
> >   workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> 
> Maybe instead kfree_rcu_work should be using a WQ_MEM_RECLAIM workqueue? It
> is after all freeing memory. Ulad, what do you think?
> 
We reclaim memory, therefore WQ_MEM_RECLAIM seems what we need.
AFAIR, there is an extra rescue worker, which can really help
under a low memory condition in a way that we do a progress.

Do we have a reproducer of mentioned splat?

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-24 11:44       ` Uladzislau Rezki
@ 2025-02-24 15:37         ` Keith Busch
  2025-02-25  9:57         ` Vlastimil Babka
  1 sibling, 0 replies; 23+ messages in thread
From: Keith Busch @ 2025-02-24 15:37 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Vlastimil Babka, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On Mon, Feb 24, 2025 at 12:44:46PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 21, 2025 at 06:28:49PM +0100, Vlastimil Babka wrote:
> > > 
> > > The warning indicates that this shouldn't be called from a
> > > WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
> > > and tearing down block devices, so this is a memory reclaim use AIUI.
> > > I'm a bit confused why we can't tear down a disk from within a memory
> > > reclaim workqueue. Is the recommended solution to simply remove the WQ
> > > flag when creating the workqueue?
> > 
> > I think it's reasonable to expect a memory reclaim related action would
> > destroy a kmem cache. Mateusz's suggestion would work around the issue, but
> > then we could get another surprising warning elsewhere. Also making the
> > kmem_cache destroys async can be tricky when a recreation happens
> > immediately under the same name (implications with sysfs/debugfs etc). We
> > managed to make the destroying synchronous as part of this series and it
> > would be great to keep it that way.
> > 
> > >   ------------[ cut here ]------------
> > >   workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> > 
> > Maybe instead kfree_rcu_work should be using a WQ_MEM_RECLAIM workqueue? It
> > is after all freeing memory. Ulad, what do you think?
> > 
> We reclaim memory, therefore WQ_MEM_RECLAIM seems what we need.
> AFAIR, there is an extra rescue worker, which can really help
> under a low memory condition in a way that we do a progress.
> 
> Do we have a reproducer of mentioned splat?

We're observing this happen in production, and I'm trying to get more
details on what is going on there. The stack trace says that the nvme
controller deleted a namespace, and it happens to also be the last disk
that drops the slab's final ref, which deletes the kmem_cache. I think
this must be part of some automated reimaging process, as the disk is
immediately recreated followed by a kexec.

Trying to manually recreate this hasn't been successful so far because
it's never the last disk on my test machines, so I'm always seeing a
non-zero ref when deleting namespaces from this nvme workqueue.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-24 11:44       ` Uladzislau Rezki
  2025-02-24 15:37         ` Keith Busch
@ 2025-02-25  9:57         ` Vlastimil Babka
  2025-02-25 13:39           ` Uladzislau Rezki
  2025-02-25 16:03           ` Keith Busch
  1 sibling, 2 replies; 23+ messages in thread
From: Vlastimil Babka @ 2025-02-25  9:57 UTC (permalink / raw)
  To: Uladzislau Rezki, Keith Busch
  Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
	Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On 2/24/25 12:44, Uladzislau Rezki wrote:
> On Fri, Feb 21, 2025 at 06:28:49PM +0100, Vlastimil Babka wrote:
>> On 2/21/25 17:30, Keith Busch wrote:
>> > On Wed, Aug 07, 2024 at 12:31:19PM +0200, Vlastimil Babka wrote:
>> >> We would like to replace call_rcu() users with kfree_rcu() where the
>> >> existing callback is just a kmem_cache_free(). However this causes
>> >> issues when the cache can be destroyed (such as due to module unload).
>> >> 
>> >> Currently such modules should be issuing rcu_barrier() before
>> >> kmem_cache_destroy() to have their call_rcu() callbacks processed first.
>> >> This barrier is however not sufficient for kfree_rcu() in flight due
>> >> to the batching introduced by a35d16905efc ("rcu: Add basic support for
>> >> kfree_rcu() batching").
>> >> 
>> >> This is not a problem for kmalloc caches which are never destroyed, but
>> >> since removing SLOB, kfree_rcu() is allowed also for any other cache,
>> >> that might be destroyed.
>> >> 
>> >> In order not to complicate the API, put the responsibility for handling
>> >> outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the newly
>> >> introduced kvfree_rcu_barrier() to wait before destroying the cache.
>> >> This is similar to how we issue rcu_barrier() for SLAB_TYPESAFE_BY_RCU
>> >> caches, but has to be done earlier, as the latter only needs to wait for
>> >> the empty slab pages to finish freeing, and not objects from the slab.
>> >> 
>> >> Users of call_rcu() with arbitrary callbacks should still issue
>> >> rcu_barrier() before destroying the cache and unloading the module, as
>> >> kvfree_rcu_barrier() is not a superset of rcu_barrier() and the
>> >> callbacks may be invoking module code or performing other actions that
>> >> are necessary for a successful unload.
>> >> 
>> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> >> ---
>> >>  mm/slab_common.c | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >> 
>> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> >> index c40227d5fa07..1a2873293f5d 100644
>> >> --- a/mm/slab_common.c
>> >> +++ b/mm/slab_common.c
>> >> @@ -508,6 +508,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> >>  	if (unlikely(!s) || !kasan_check_byte(s))
>> >>  		return;
>> >>  
>> >> +	/* in-flight kfree_rcu()'s may include objects from our cache */
>> >> +	kvfree_rcu_barrier();
>> >> +
>> >>  	cpus_read_lock();
>> >>  	mutex_lock(&slab_mutex);
>> > 
>> > This patch appears to be triggering a new warning in certain conditions
>> > when tearing down an nvme namespace's block device. Stack trace is at
>> > the end.
>> > 
>> > The warning indicates that this shouldn't be called from a
>> > WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
>> > and tearing down block devices, so this is a memory reclaim use AIUI.
>> > I'm a bit confused why we can't tear down a disk from within a memory
>> > reclaim workqueue. Is the recommended solution to simply remove the WQ
>> > flag when creating the workqueue?
>> 
>> I think it's reasonable to expect a memory reclaim related action would
>> destroy a kmem cache. Mateusz's suggestion would work around the issue, but
>> then we could get another surprising warning elsewhere. Also making the
>> kmem_cache destroys async can be tricky when a recreation happens
>> immediately under the same name (implications with sysfs/debugfs etc). We
>> managed to make the destroying synchronous as part of this series and it
>> would be great to keep it that way.
>> 
>> >   ------------[ cut here ]------------
>> >   workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
>> 
>> Maybe instead kfree_rcu_work should be using a WQ_MEM_RECLAIM workqueue? It
>> is after all freeing memory. Ulad, what do you think?
>> 
> We reclaim memory, therefore WQ_MEM_RECLAIM seems what we need.
> AFAIR, there is an extra rescue worker, which can really help
> under a low memory condition in a way that we do a progress.
> 
> Do we have a reproducer of mentioned splat?

I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
it's too simple, or racy, and thus we are not flushing any of the queues from
kvfree_rcu_barrier()?

----8<----
From 1e19ea78e7fe254034970f75e3b7cb705be50163 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 25 Feb 2025 10:51:28 +0100
Subject: [PATCH] add test for kmem_cache_destroy in a workqueue

---
 lib/slub_kunit.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index f11691315c2f..5fe9775fba05 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/rcupdate.h>
+#include <linux/delay.h>
 #include "../mm/slab.h"
 
 static struct kunit_resource resource;
@@ -181,6 +182,52 @@ static void test_kfree_rcu(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, 0, slab_errors);
 }
 
+struct cache_destroy_work {
+        struct work_struct work;
+        struct kmem_cache *s;
+};
+
+static void cache_destroy_workfn(struct work_struct *w)
+{
+	struct cache_destroy_work *cdw;
+
+	cdw = container_of(w, struct cache_destroy_work, work);
+
+	kmem_cache_destroy(cdw->s);
+}
+
+static void test_kfree_rcu_wq_destroy(struct kunit *test)
+{
+	struct test_kfree_rcu_struct *p;
+	struct cache_destroy_work cdw;
+	struct workqueue_struct *wq;
+	struct kmem_cache *s;
+
+	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
+		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
+
+	INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
+	wq = alloc_workqueue("test_kfree_rcu_destroy_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+	if (!wq)
+		kunit_skip(test, "failed to alloc wq");
+
+	s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
+				   sizeof(struct test_kfree_rcu_struct),
+				   SLAB_NO_MERGE);
+	p = kmem_cache_alloc(s, GFP_KERNEL);
+
+	kfree_rcu(p, rcu);
+
+	cdw.s = s;
+	queue_work(wq, &cdw.work);
+	msleep(1000);
+	flush_work(&cdw.work);
+
+	destroy_workqueue(wq);
+
+	KUNIT_EXPECT_EQ(test, 0, slab_errors);
+}
+
 static void test_leak_destroy(struct kunit *test)
 {
 	struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
@@ -254,6 +301,7 @@ static struct kunit_case test_cases[] = {
 	KUNIT_CASE(test_clobber_redzone_free),
 	KUNIT_CASE(test_kmalloc_redzone_access),
 	KUNIT_CASE(test_kfree_rcu),
+	KUNIT_CASE(test_kfree_rcu_wq_destroy),
 	KUNIT_CASE(test_leak_destroy),
 	KUNIT_CASE(test_krealloc_redzone_zeroing),
 	{}
-- 
2.48.1




^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25  9:57         ` Vlastimil Babka
@ 2025-02-25 13:39           ` Uladzislau Rezki
  2025-02-25 14:12             ` Vlastimil Babka
  2025-02-25 16:03           ` Keith Busch
  1 sibling, 1 reply; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-25 13:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Keith Busch, Paul E. McKenney, Joel Fernandes,
	Josh Triplett, Boqun Feng, Christoph Lameter, David Rientjes,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Julia Lawall, Jakub Kicinski, Jason A. Donenfeld, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel, rcu,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev,
	Jann Horn, Mateusz Guzik, linux-nvme, leitao

On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
> On 2/24/25 12:44, Uladzislau Rezki wrote:
> > On Fri, Feb 21, 2025 at 06:28:49PM +0100, Vlastimil Babka wrote:
> >> On 2/21/25 17:30, Keith Busch wrote:
> >> > On Wed, Aug 07, 2024 at 12:31:19PM +0200, Vlastimil Babka wrote:
> >> >> We would like to replace call_rcu() users with kfree_rcu() where the
> >> >> existing callback is just a kmem_cache_free(). However this causes
> >> >> issues when the cache can be destroyed (such as due to module unload).
> >> >> 
> >> >> Currently such modules should be issuing rcu_barrier() before
> >> >> kmem_cache_destroy() to have their call_rcu() callbacks processed first.
> >> >> This barrier is however not sufficient for kfree_rcu() in flight due
> >> >> to the batching introduced by a35d16905efc ("rcu: Add basic support for
> >> >> kfree_rcu() batching").
> >> >> 
> >> >> This is not a problem for kmalloc caches which are never destroyed, but
> >> >> since removing SLOB, kfree_rcu() is allowed also for any other cache,
> >> >> that might be destroyed.
> >> >> 
> >> >> In order not to complicate the API, put the responsibility for handling
> >> >> outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the newly
> >> >> introduced kvfree_rcu_barrier() to wait before destroying the cache.
> >> >> This is similar to how we issue rcu_barrier() for SLAB_TYPESAFE_BY_RCU
> >> >> caches, but has to be done earlier, as the latter only needs to wait for
> >> >> the empty slab pages to finish freeing, and not objects from the slab.
> >> >> 
> >> >> Users of call_rcu() with arbitrary callbacks should still issue
> >> >> rcu_barrier() before destroying the cache and unloading the module, as
> >> >> kvfree_rcu_barrier() is not a superset of rcu_barrier() and the
> >> >> callbacks may be invoking module code or performing other actions that
> >> >> are necessary for a successful unload.
> >> >> 
> >> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> >> ---
> >> >>  mm/slab_common.c | 3 +++
> >> >>  1 file changed, 3 insertions(+)
> >> >> 
> >> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> >> index c40227d5fa07..1a2873293f5d 100644
> >> >> --- a/mm/slab_common.c
> >> >> +++ b/mm/slab_common.c
> >> >> @@ -508,6 +508,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >> >>  	if (unlikely(!s) || !kasan_check_byte(s))
> >> >>  		return;
> >> >>  
> >> >> +	/* in-flight kfree_rcu()'s may include objects from our cache */
> >> >> +	kvfree_rcu_barrier();
> >> >> +
> >> >>  	cpus_read_lock();
> >> >>  	mutex_lock(&slab_mutex);
> >> > 
> >> > This patch appears to be triggering a new warning in certain conditions
> >> > when tearing down an nvme namespace's block device. Stack trace is at
> >> > the end.
> >> > 
> >> > The warning indicates that this shouldn't be called from a
> >> > WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
> >> > and tearing down block devices, so this is a memory reclaim use AIUI.
> >> > I'm a bit confused why we can't tear down a disk from within a memory
> >> > reclaim workqueue. Is the recommended solution to simply remove the WQ
> >> > flag when creating the workqueue?
> >> 
> >> I think it's reasonable to expect a memory reclaim related action would
> >> destroy a kmem cache. Mateusz's suggestion would work around the issue, but
> >> then we could get another surprising warning elsewhere. Also making the
> >> kmem_cache destroys async can be tricky when a recreation happens
> >> immediately under the same name (implications with sysfs/debugfs etc). We
> >> managed to make the destroying synchronous as part of this series and it
> >> would be great to keep it that way.
> >> 
> >> >   ------------[ cut here ]------------
> >> >   workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> >> 
> >> Maybe instead kfree_rcu_work should be using a WQ_MEM_RECLAIM workqueue? It
> >> is after all freeing memory. Ulad, what do you think?
> >> 
> > We reclaim memory, therefore WQ_MEM_RECLAIM seems what we need.
> > AFAIR, there is an extra rescue worker, which can really help
> > under a low memory condition in a way that we do a progress.
> > 
> > Do we have a reproducer of mentioned splat?
> 
> I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
> it's too simple, or racy, and thus we are not flushing any of the queues from
> kvfree_rcu_barrier()?
> 
See some comments below. I will try to reproduce it today. But from the
first glance it should trigger it.

> ----8<----
> From 1e19ea78e7fe254034970f75e3b7cb705be50163 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Tue, 25 Feb 2025 10:51:28 +0100
> Subject: [PATCH] add test for kmem_cache_destroy in a workqueue
> 
> ---
>  lib/slub_kunit.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
> index f11691315c2f..5fe9775fba05 100644
> --- a/lib/slub_kunit.c
> +++ b/lib/slub_kunit.c
> @@ -6,6 +6,7 @@
>  #include <linux/module.h>
>  #include <linux/kernel.h>
>  #include <linux/rcupdate.h>
> +#include <linux/delay.h>
>  #include "../mm/slab.h"
>  
>  static struct kunit_resource resource;
> @@ -181,6 +182,52 @@ static void test_kfree_rcu(struct kunit *test)
>  	KUNIT_EXPECT_EQ(test, 0, slab_errors);
>  }
>  
> +struct cache_destroy_work {
> +        struct work_struct work;
> +        struct kmem_cache *s;
> +};
> +
> +static void cache_destroy_workfn(struct work_struct *w)
> +{
> +	struct cache_destroy_work *cdw;
> +
> +	cdw = container_of(w, struct cache_destroy_work, work);
> +
> +	kmem_cache_destroy(cdw->s);
> +}
> +
> +static void test_kfree_rcu_wq_destroy(struct kunit *test)
> +{
> +	struct test_kfree_rcu_struct *p;
> +	struct cache_destroy_work cdw;
> +	struct workqueue_struct *wq;
> +	struct kmem_cache *s;
> +
> +	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
> +		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
> +
> +	INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
> +	wq = alloc_workqueue("test_kfree_rcu_destroy_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
>
Maybe it is worth to add WQ_HIGHPRI also to be ahead?

> +	if (!wq)
> +		kunit_skip(test, "failed to alloc wq");
> +
> +	s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
> +				   sizeof(struct test_kfree_rcu_struct),
> +				   SLAB_NO_MERGE);
> +	p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> +	kfree_rcu(p, rcu);
> +
> +	cdw.s = s;
> +	queue_work(wq, &cdw.work);
> +	msleep(1000);
I am not sure it is needed. From the other hand it does nothing if
i do not miss anything.

> +	flush_work(&cdw.work);
> +
> +	destroy_workqueue(wq);
> +
> +	KUNIT_EXPECT_EQ(test, 0, slab_errors);
> +}
> +
>  static void test_leak_destroy(struct kunit *test)
>  {
>  	struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
> @@ -254,6 +301,7 @@ static struct kunit_case test_cases[] = {
>  	KUNIT_CASE(test_clobber_redzone_free),
>  	KUNIT_CASE(test_kmalloc_redzone_access),
>  	KUNIT_CASE(test_kfree_rcu),
> +	KUNIT_CASE(test_kfree_rcu_wq_destroy),
>  	KUNIT_CASE(test_leak_destroy),
>  	KUNIT_CASE(test_krealloc_redzone_zeroing),
>  	{}
> -- 
> 2.48.1
> 
> 

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 13:39           ` Uladzislau Rezki
@ 2025-02-25 14:12             ` Vlastimil Babka
  0 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2025-02-25 14:12 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Keith Busch, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On 2/25/25 14:39, Uladzislau Rezki wrote:
> On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
>> On 2/24/25 12:44, Uladzislau Rezki wrote:
>> > On Fri, Feb 21, 2025 at 06:28:49PM +0100, Vlastimil Babka wrote:
>> >> On 2/21/25 17:30, Keith Busch wrote:
>> >> >   ------------[ cut here ]------------
>> >> >   workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
>> >> 
>> >> Maybe instead kfree_rcu_work should be using a WQ_MEM_RECLAIM workqueue? It
>> >> is after all freeing memory. Ulad, what do you think?
>> >> 
>> > We reclaim memory, therefore WQ_MEM_RECLAIM seems what we need.
>> > AFAIR, there is an extra rescue worker, which can really help
>> > under a low memory condition in a way that we do a progress.
>> > 
>> > Do we have a reproducer of mentioned splat?
>> 
>> I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
>> it's too simple, or racy, and thus we are not flushing any of the queues from
>> kvfree_rcu_barrier()?
>> 
> See some comments below. I will try to reproduce it today. But from the
> first glance it should trigger it.
> 
>> ----8<----
>> From 1e19ea78e7fe254034970f75e3b7cb705be50163 Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Date: Tue, 25 Feb 2025 10:51:28 +0100
>> Subject: [PATCH] add test for kmem_cache_destroy in a workqueue
>> 
>> ---
>>  lib/slub_kunit.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 48 insertions(+)
>> 
>> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
>> index f11691315c2f..5fe9775fba05 100644
>> --- a/lib/slub_kunit.c
>> +++ b/lib/slub_kunit.c
>> @@ -6,6 +6,7 @@
>>  #include <linux/module.h>
>>  #include <linux/kernel.h>
>>  #include <linux/rcupdate.h>
>> +#include <linux/delay.h>
>>  #include "../mm/slab.h"
>>  
>>  static struct kunit_resource resource;
>> @@ -181,6 +182,52 @@ static void test_kfree_rcu(struct kunit *test)
>>  	KUNIT_EXPECT_EQ(test, 0, slab_errors);
>>  }
>>  
>> +struct cache_destroy_work {
>> +        struct work_struct work;
>> +        struct kmem_cache *s;
>> +};
>> +
>> +static void cache_destroy_workfn(struct work_struct *w)
>> +{
>> +	struct cache_destroy_work *cdw;
>> +
>> +	cdw = container_of(w, struct cache_destroy_work, work);
>> +
>> +	kmem_cache_destroy(cdw->s);
>> +}
>> +
>> +static void test_kfree_rcu_wq_destroy(struct kunit *test)
>> +{
>> +	struct test_kfree_rcu_struct *p;
>> +	struct cache_destroy_work cdw;
>> +	struct workqueue_struct *wq;
>> +	struct kmem_cache *s;
>> +
>> +	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
>> +		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
>> +
>> +	INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
>> +	wq = alloc_workqueue("test_kfree_rcu_destroy_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
>>
> Maybe it is worth to add WQ_HIGHPRI also to be ahead?

I looked at what nvme_wq uses:

unsigned int wq_flags = WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS;

HIGHPRI wasn't there, and sysfs didn't seem important.


>> +	if (!wq)
>> +		kunit_skip(test, "failed to alloc wq");
>> +
>> +	s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
>> +				   sizeof(struct test_kfree_rcu_struct),
>> +				   SLAB_NO_MERGE);
>> +	p = kmem_cache_alloc(s, GFP_KERNEL);
>> +
>> +	kfree_rcu(p, rcu);
>> +
>> +	cdw.s = s;
>> +	queue_work(wq, &cdw.work);
>> +	msleep(1000);
> I am not sure it is needed. From the other hand it does nothing if
> i do not miss anything.

I've tried to add that in case it makes any difference (letting the
processing be done on its own instead of flushing immediately), but the
results was the same either way, no warning. AFAICS it also doesn't depend
on some debug CONFIG_ I could be missing, but maybe I'm wrong.

Hope you have more success :) Thanks.

>> +	flush_work(&cdw.work);
>> +
>> +	destroy_workqueue(wq);
>> +
>> +	KUNIT_EXPECT_EQ(test, 0, slab_errors);
>> +}
>> +
>>  static void test_leak_destroy(struct kunit *test)
>>  {
>>  	struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
>> @@ -254,6 +301,7 @@ static struct kunit_case test_cases[] = {
>>  	KUNIT_CASE(test_clobber_redzone_free),
>>  	KUNIT_CASE(test_kmalloc_redzone_access),
>>  	KUNIT_CASE(test_kfree_rcu),
>> +	KUNIT_CASE(test_kfree_rcu_wq_destroy),
>>  	KUNIT_CASE(test_leak_destroy),
>>  	KUNIT_CASE(test_krealloc_redzone_zeroing),
>>  	{}
>> -- 
>> 2.48.1
>> 
>> 
> 
> --
> Uladzislau Rezki



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25  9:57         ` Vlastimil Babka
  2025-02-25 13:39           ` Uladzislau Rezki
@ 2025-02-25 16:03           ` Keith Busch
  2025-02-25 17:05             ` Keith Busch
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Busch @ 2025-02-25 16:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
> I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
> it's too simple, or racy, and thus we are not flushing any of the queues from
> kvfree_rcu_barrier()?

Thanks, your test readily triggers it for me, but only if I load
rcutorture at the same time.

[  142.223220] CPU: 11 UID: 0 PID: 186 Comm: kworker/u64:11 Tainted: G            E    N 6.13.0-04839-g5e7b40f0ddce-dirty #831
[  142.223222] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
[  142.223223] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  142.223224] Workqueue: test_kfree_rcu_destroy_wq cache_destroy_workfn [slub_kunit]
[  142.223230] Call Trace:
[  142.223231]  <TASK>
[  142.223233]  dump_stack_lvl+0x64/0x90
[  142.223239]  print_circular_bug+0x2c5/0x400
[  142.223243]  check_noncircular+0x103/0x120
[  142.223246]  ? save_trace+0x3e/0x360
[  142.223249]  __lock_acquire+0x1481/0x24b0
[  142.223252]  lock_acquire+0xaa/0x2a0
[  142.223253]  ? console_lock_spinning_enable+0x3e/0x60
[  142.223255]  ? lock_release+0xb3/0x250
[  142.223257]  console_lock_spinning_enable+0x5a/0x60
[  142.223258]  ? console_lock_spinning_enable+0x3e/0x60
[  142.223260]  console_flush_all+0x2b1/0x490
[  142.223262]  ? console_flush_all+0x29/0x490
[  142.223264]  console_unlock+0x49/0xf0
[  142.223266]  vprintk_emit+0x12b/0x300
[  142.223269]  ? kfree_rcu_shrink_scan+0x120/0x120
[  142.223270]  _printk+0x48/0x50
[  142.223272]  ? kfree_rcu_shrink_scan+0x120/0x120
[  142.223273]  __warn_printk+0xb4/0xe0
[  142.223276]  ? 0xffffffffa05d6000
[  142.223278]  ? kfree_rcu_shrink_scan+0x120/0x120
[  142.223279]  check_flush_dependency.part.0+0xad/0x100
[  142.223282]  __flush_work+0x38a/0x4a0
[  142.223284]  ? find_held_lock+0x2b/0x80
[  142.223287]  ? flush_rcu_work+0x26/0x40
[  142.223289]  ? lock_release+0xb3/0x250
[  142.223290]  ? __mutex_unlock_slowpath+0x2c/0x270
[  142.223292]  flush_rcu_work+0x30/0x40
[  142.223294]  kvfree_rcu_barrier+0xe9/0x130
[  142.223296]  kmem_cache_destroy+0x2b/0x1f0
[  142.223297]  cache_destroy_workfn+0x20/0x40 [slub_kunit]
[  142.223299]  process_one_work+0x1cd/0x560
[  142.223302]  worker_thread+0x183/0x310
[  142.223304]  ? rescuer_thread+0x330/0x330
[  142.223306]  kthread+0xd8/0x1d0
[  142.223308]  ? ret_from_fork+0x17/0x50
[  142.223310]  ? lock_release+0xb3/0x250
[  142.223311]  ? kthreads_online_cpu+0xf0/0xf0
[  142.223313]  ret_from_fork+0x2d/0x50
[  142.223315]  ? kthreads_online_cpu+0xf0/0xf0
[  142.223317]  ret_from_fork_asm+0x11/0x20
[  142.223321]  </TASK>
[  142.371052] workqueue: WQ_MEM_RECLAIM test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
[  142.371072] WARNING: CPU: 11 PID: 186 at kernel/workqueue.c:3715 check_flush_dependency.part.0+0xad/0x100
[  142.375748] Modules linked in: slub_kunit(E) rcutorture(E) torture(E) kunit(E) iTCO_wdt(E) iTCO_vendor_support(E) intel_uncore_frequency_common(E) skx_edac_common(E) nfit(E) libnvdimm(E) kvm_intel(E) kvm(E) evdev(E) bochs(E) serio_raw(E) drm_kms_helper(E) i2c_i801(E) e1000e(E) i2c_smbus(E) intel_agp(E) intel_gtt(E) lpc_ich(E) agpgart(E) mfd_core(E) drm_shm]


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 16:03           ` Keith Busch
@ 2025-02-25 17:05             ` Keith Busch
  2025-02-25 17:41               ` Uladzislau Rezki
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2025-02-25 17:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On Tue, Feb 25, 2025 at 09:03:38AM -0700, Keith Busch wrote:
> On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
> > I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
> > it's too simple, or racy, and thus we are not flushing any of the queues from
> > kvfree_rcu_barrier()?
>
> Thanks, your test readily triggers it for me, but only if I load
> rcutorture at the same time.

Oops, I sent the wrong kernel messages. This is the relevant part:

[  142.371052] workqueue: WQ_MEM_RECLAIM
test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is
flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
[  142.371072] WARNING: CPU: 11 PID: 186 at kernel/workqueue.c:3715
check_flush_dependency.part.0+0xad/0x100
[  142.375748] Modules linked in: slub_kunit(E) rcutorture(E)
torture(E) kunit(E) iTCO_wdt(E) iTCO_vendor_support(E)
intel_uncore_frequency_common(E) skx_edac_common(E) nfit(E)
libnvdimm(E) kvm_intel(E) kvm(E) evdev(E) bochs(E) serio_raw(E)
drm_kms_helper(E) i2c_i801(E) e1000e(E) i2c_smbus(E) intel_agp(E)
intel_gtt(E) lpc_ich(E) agpgart(E) mfd_core(E) drm_shm]
[  142.384553] CPU: 11 UID: 0 PID: 186 Comm: kworker/u64:11 Tainted: G
           E    N 6.13.0-04839-g5e7b40f0ddce-dirty #831
[  142.386755] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
[  142.387849] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  142.390236] Workqueue: test_kfree_rcu_destroy_wq
cache_destroy_workfn [slub_kunit]
[  142.391863] RIP: 0010:check_flush_dependency.part.0+0xad/0x100
[  142.393183] Code: 75 dc 48 8b 55 18 49 8d 8d 78 01 00 00 4d 89 f0
48 81 c6 78 01 00 00 48 c7 c7 00 e1 9a 82 c6 05 4f 39 c5 02 01 e8 53
bd fd ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e c3 80 3d 39 39 c5 02 00 75 83
41 8b
[  142.396981] RSP: 0018:ffffc900007cfc90 EFLAGS: 00010092
[  142.398124] RAX: 000000000000008f RBX: ffff88803e9b10a0 RCX: 0000000000000027
[  142.399605] RDX: ffff88803eba0d08 RSI: 0000000000000001 RDI: ffff88803eba0d00
[  142.401092] RBP: ffff888007d9a480 R08: ffffffff83b8c808 R09: 0000000000000003
[  142.402548] R10: ffffffff8348c820 R11: ffffffff83a11d58 R12: ffff888007150000
[  142.404098] R13: ffff888005961400 R14: ffffffff813221a0 R15: ffff888005961400
[  142.405561] FS:  0000000000000000(0000) GS:ffff88803eb80000(0000)
knlGS:0000000000000000
[  142.407297] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  142.408658] CR2: 00007f826bd1a000 CR3: 00000000069db002 CR4: 0000000000772ef0
[  142.410259] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  142.411871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  142.413341] PKRU: 55555554
[  142.414038] Call Trace:
[  142.414658]  <TASK>
[  142.415249]  ? __warn+0x8d/0x180
[  142.416035]  ? check_flush_dependency.part.0+0xad/0x100
[  142.417182]  ? report_bug+0x160/0x170
[  142.418041]  ? handle_bug+0x4f/0x90
[  142.418861]  ? exc_invalid_op+0x14/0x70
[  142.419853]  ? asm_exc_invalid_op+0x16/0x20
[  142.420877]  ? kfree_rcu_shrink_scan+0x120/0x120
[  142.422029]  ? check_flush_dependency.part.0+0xad/0x100
[  142.423244]  __flush_work+0x38a/0x4a0
[  142.424157]  ? find_held_lock+0x2b/0x80
[  142.425070]  ? flush_rcu_work+0x26/0x40
[  142.425953]  ? lock_release+0xb3/0x250
[  142.426785]  ? __mutex_unlock_slowpath+0x2c/0x270
[  142.427906]  flush_rcu_work+0x30/0x40
[  142.428756]  kvfree_rcu_barrier+0xe9/0x130
[  142.429649]  kmem_cache_destroy+0x2b/0x1f0
[  142.430578]  cache_destroy_workfn+0x20/0x40 [slub_kunit]
[  142.431729]  process_one_work+0x1cd/0x560
[  142.432620]  worker_thread+0x183/0x310
[  142.433487]  ? rescuer_thread+0x330/0x330
[  142.434428]  kthread+0xd8/0x1d0
[  142.435248]  ? ret_from_fork+0x17/0x50
[  142.436165]  ? lock_release+0xb3/0x250
[  142.437106]  ? kthreads_online_cpu+0xf0/0xf0
[  142.438133]  ret_from_fork+0x2d/0x50
[  142.439045]  ? kthreads_online_cpu+0xf0/0xf0
[  142.440428]  ret_from_fork_asm+0x11/0x20
[  142.441476]  </TASK>
[  142.442152] irq event stamp: 22858
[  142.443002] hardirqs last  enabled at (22857): [<ffffffff82044ef4>]
_raw_spin_unlock_irq+0x24/0x30
[  142.445032] hardirqs last disabled at (22858): [<ffffffff82044ce3>]
_raw_spin_lock_irq+0x43/0x50
[  142.451450] softirqs last  enabled at (22714): [<ffffffff810bfdbc>]
__irq_exit_rcu+0xac/0xd0
[  142.453345] softirqs last disabled at (22709): [<ffffffff810bfdbc>]
__irq_exit_rcu+0xac/0xd0
[  142.455305] ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 17:05             ` Keith Busch
@ 2025-02-25 17:41               ` Uladzislau Rezki
  2025-02-25 18:11                 ` Vlastimil Babka
  2025-02-25 18:21                 ` Uladzislau Rezki
  0 siblings, 2 replies; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-25 17:41 UTC (permalink / raw)
  To: Keith Busch, Vlastimil Babka
  Cc: Vlastimil Babka, Uladzislau Rezki, Paul E. McKenney,
	Joel Fernandes, Josh Triplett, Boqun Feng, Christoph Lameter,
	David Rientjes, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	linux-kernel, rcu, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik, linux-nvme,
	leitao

On Tue, Feb 25, 2025 at 10:05:37AM -0700, Keith Busch wrote:
> On Tue, Feb 25, 2025 at 09:03:38AM -0700, Keith Busch wrote:
> > On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
> > > I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
> > > it's too simple, or racy, and thus we are not flushing any of the queues from
> > > kvfree_rcu_barrier()?
> >
> > Thanks, your test readily triggers it for me, but only if I load
> > rcutorture at the same time.
> 
> Oops, I sent the wrong kernel messages. This is the relevant part:
> 
> [  142.371052] workqueue: WQ_MEM_RECLAIM
> test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is
> flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> [  142.371072] WARNING: CPU: 11 PID: 186 at kernel/workqueue.c:3715
> check_flush_dependency.part.0+0xad/0x100
> [  142.375748] Modules linked in: slub_kunit(E) rcutorture(E)
> torture(E) kunit(E) iTCO_wdt(E) iTCO_vendor_support(E)
> intel_uncore_frequency_common(E) skx_edac_common(E) nfit(E)
> libnvdimm(E) kvm_intel(E) kvm(E) evdev(E) bochs(E) serio_raw(E)
> drm_kms_helper(E) i2c_i801(E) e1000e(E) i2c_smbus(E) intel_agp(E)
> intel_gtt(E) lpc_ich(E) agpgart(E) mfd_core(E) drm_shm]
> [  142.384553] CPU: 11 UID: 0 PID: 186 Comm: kworker/u64:11 Tainted: G
>            E    N 6.13.0-04839-g5e7b40f0ddce-dirty #831
> [  142.386755] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
> [  142.387849] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [  142.390236] Workqueue: test_kfree_rcu_destroy_wq
> cache_destroy_workfn [slub_kunit]
> [  142.391863] RIP: 0010:check_flush_dependency.part.0+0xad/0x100
> [  142.393183] Code: 75 dc 48 8b 55 18 49 8d 8d 78 01 00 00 4d 89 f0
> 48 81 c6 78 01 00 00 48 c7 c7 00 e1 9a 82 c6 05 4f 39 c5 02 01 e8 53
> bd fd ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e c3 80 3d 39 39 c5 02 00 75 83
> 41 8b
> [  142.396981] RSP: 0018:ffffc900007cfc90 EFLAGS: 00010092
> [  142.398124] RAX: 000000000000008f RBX: ffff88803e9b10a0 RCX: 0000000000000027
> [  142.399605] RDX: ffff88803eba0d08 RSI: 0000000000000001 RDI: ffff88803eba0d00
> [  142.401092] RBP: ffff888007d9a480 R08: ffffffff83b8c808 R09: 0000000000000003
> [  142.402548] R10: ffffffff8348c820 R11: ffffffff83a11d58 R12: ffff888007150000
> [  142.404098] R13: ffff888005961400 R14: ffffffff813221a0 R15: ffff888005961400
> [  142.405561] FS:  0000000000000000(0000) GS:ffff88803eb80000(0000)
> knlGS:0000000000000000
> [  142.407297] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  142.408658] CR2: 00007f826bd1a000 CR3: 00000000069db002 CR4: 0000000000772ef0
> [  142.410259] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  142.411871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  142.413341] PKRU: 55555554
> [  142.414038] Call Trace:
> [  142.414658]  <TASK>
> [  142.415249]  ? __warn+0x8d/0x180
> [  142.416035]  ? check_flush_dependency.part.0+0xad/0x100
> [  142.417182]  ? report_bug+0x160/0x170
> [  142.418041]  ? handle_bug+0x4f/0x90
> [  142.418861]  ? exc_invalid_op+0x14/0x70
> [  142.419853]  ? asm_exc_invalid_op+0x16/0x20
> [  142.420877]  ? kfree_rcu_shrink_scan+0x120/0x120
> [  142.422029]  ? check_flush_dependency.part.0+0xad/0x100
> [  142.423244]  __flush_work+0x38a/0x4a0
> [  142.424157]  ? find_held_lock+0x2b/0x80
> [  142.425070]  ? flush_rcu_work+0x26/0x40
> [  142.425953]  ? lock_release+0xb3/0x250
> [  142.426785]  ? __mutex_unlock_slowpath+0x2c/0x270
> [  142.427906]  flush_rcu_work+0x30/0x40
> [  142.428756]  kvfree_rcu_barrier+0xe9/0x130
> [  142.429649]  kmem_cache_destroy+0x2b/0x1f0
> [  142.430578]  cache_destroy_workfn+0x20/0x40 [slub_kunit]
> [  142.431729]  process_one_work+0x1cd/0x560
> [  142.432620]  worker_thread+0x183/0x310
> [  142.433487]  ? rescuer_thread+0x330/0x330
> [  142.434428]  kthread+0xd8/0x1d0
> [  142.435248]  ? ret_from_fork+0x17/0x50
> [  142.436165]  ? lock_release+0xb3/0x250
> [  142.437106]  ? kthreads_online_cpu+0xf0/0xf0
> [  142.438133]  ret_from_fork+0x2d/0x50
> [  142.439045]  ? kthreads_online_cpu+0xf0/0xf0
> [  142.440428]  ret_from_fork_asm+0x11/0x20
> [  142.441476]  </TASK>
> [  142.442152] irq event stamp: 22858
> [  142.443002] hardirqs last  enabled at (22857): [<ffffffff82044ef4>]
> _raw_spin_unlock_irq+0x24/0x30
> [  142.445032] hardirqs last disabled at (22858): [<ffffffff82044ce3>]
> _raw_spin_lock_irq+0x43/0x50
> [  142.451450] softirqs last  enabled at (22714): [<ffffffff810bfdbc>]
> __irq_exit_rcu+0xac/0xd0
> [  142.453345] softirqs last disabled at (22709): [<ffffffff810bfdbc>]
> __irq_exit_rcu+0xac/0xd0
> [  142.455305] ---[ end trace 0000000000000000 ]---
Thanks!

I can trigger this also:

<snip>
[   21.712856] KTAP version 1
[   21.712862] 1..1
[   21.714486]     KTAP version 1
[   21.714490]     # Subtest: slub_test
[   21.714492]     # module: slub_kunit
[   21.714495]     1..10
[   21.750359]     ok 1 test_clobber_zone
[   21.750955]     ok 2 test_next_pointer
[   21.751532]     ok 3 test_first_word
[   21.751991]     ok 4 test_clobber_50th_byte
[   21.752493]     ok 5 test_clobber_redzone_free
[   21.753004] stackdepot: allocating hash table of 1048576 entries via kvcalloc
[   21.756176]     ok 6 test_kmalloc_redzone_access
[   21.806549]     ok 7 test_kfree_rcu
[   22.058010] ------------[ cut here ]------------
[   22.058015] workqueue: WQ_MEM_RECLAIM test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
[   22.058039] WARNING: CPU: 19 PID: 474 at kernel/workqueue.c:3715 check_flush_dependency.part.0+0xbe/0x130
[   22.058047] Modules linked in: slub_kunit(E) kunit(E) binfmt_misc(E) bochs(E) drm_client_lib(E) drm_shmem_helper(E) ppdev(E) drm_kms_helper(E) snd_pcm(E) sg(E) snd_timer(E) evdev(E) snd(E) joydev(E) parport_pc(E) parport(E) soundcore(E) serio_raw(E) button(E) pcspkr(E) drm(E) fuse(E) dm_mod(E) efi_pstore(E) configfs(E) loop(E) qemu_fw_cfg(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) cdrom(E) ata_generic(E) ata_piix(E) libata(E) scsi_mod(E) i2c_piix4(E) psmouse(E) e1000(E) i2c_smbus(E) scsi_common(E) floppy(E)
[   22.058091] CPU: 19 UID: 0 PID: 474 Comm: kworker/u257:0 Kdump: loaded Tainted: G            E    N 6.14.0-rc1+ #286
[   22.058096] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
[   22.058097] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   22.058099] Workqueue: test_kfree_rcu_destroy_wq cache_destroy_workfn [slub_kunit]
[   22.058103] RIP: 0010:check_flush_dependency.part.0+0xbe/0x130
[   22.058106] Code: 75 d0 48 8b 55 18 49 8d 8d c0 00 00 00 4d 89 f0 48 81 c6 c0 00 00 00 48 c7 c7 b0 7d c8 bd c6 05 6c 78 53 01 01 e8 a2 ae fd ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc f6 c4 08 74 94 31 ed
[   22.058108] RSP: 0018:ffff95e5c123fd50 EFLAGS: 00010086
[   22.058111] RAX: 0000000000000000 RBX: ffff89a4ff22d5a0 RCX: 0000000000000000
[   22.058113] RDX: 0000000000000003 RSI: ffffffffbdce1697 RDI: 00000000ffffffff
[   22.058114] RBP: ffff89961043a780 R08: 0000000000000000 R09: 0000000000000003
[   22.058116] R10: ffff95e5c123fbe8 R11: ffff89a53fefefa8 R12: ffff89960cb6b080
[   22.058117] R13: ffff899600051400 R14: ffffffffbcf2ba80 R15: ffff89960005a800
[   22.058120] FS:  0000000000000000(0000) GS:ffff89a4ff2c0000(0000) knlGS:0000000000000000
[   22.058122] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.058124] CR2: 000055bf2cbc6038 CR3: 000000010dc1e000 CR4: 00000000000006f0
[   22.058128] Call Trace:
[   22.058130]  <TASK>
[   22.058133]  ? __warn+0x85/0x130
[   22.058137]  ? check_flush_dependency.part.0+0xbe/0x130
[   22.058139]  ? report_bug+0x18d/0x1c0
[   22.058142]  ? prb_read_valid+0x17/0x20
[   22.058147]  ? handle_bug+0x58/0x90
[   22.058151]  ? exc_invalid_op+0x13/0x60
[   22.058154]  ? asm_exc_invalid_op+0x16/0x20
[   22.058158]  ? __pfx_kfree_rcu_work+0x10/0x10
[   22.058162]  ? check_flush_dependency.part.0+0xbe/0x130
[   22.058165]  __flush_work+0xd6/0x320
[   22.058168]  flush_rcu_work+0x39/0x50
[   22.058171]  kvfree_rcu_barrier+0xe9/0x130
[   22.058174]  kmem_cache_destroy+0x18/0x140
[   22.058177]  process_one_work+0x184/0x3a0
[   22.058180]  worker_thread+0x24d/0x360
[   22.058183]  ? __pfx_worker_thread+0x10/0x10
[   22.058185]  kthread+0xfc/0x230
[   22.058189]  ? finish_task_switch.isra.0+0x85/0x2a0
[   22.058192]  ? __pfx_kthread+0x10/0x10
[   22.058195]  ret_from_fork+0x30/0x50
[   22.058199]  ? __pfx_kthread+0x10/0x10
[   22.058202]  ret_from_fork_asm+0x1a/0x30
[   22.058206]  </TASK>
[   22.058207] ---[ end trace 0000000000000000 ]---
[   23.123507]     ok 8 test_kfree_rcu_wq_destroy
[   23.151033]     ok 9 test_leak_destroy
[   23.151612]     ok 10 test_krealloc_redzone_zeroing
[   23.151617] # slub_test: pass:10 fail:0 skip:0 total:10
[   23.151619] # Totals: pass:10 fail:0 skip:0 total:10
[   23.151620] ok 1 slub_test
urezki@pc638:~$
<snip>

but i had to adapt slightly the Vlastimil's test:

diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index f11691315c2f..222f6d204b0d 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/rcupdate.h>
+#include <linux/delay.h>
 #include "../mm/slab.h"

 static struct kunit_resource resource;
@@ -181,6 +182,63 @@ static void test_kfree_rcu(struct kunit *test)
        KUNIT_EXPECT_EQ(test, 0, slab_errors);
 }

+struct cache_destroy_work {
+        struct work_struct work;
+        struct kmem_cache *s;
+};
+
+static void cache_destroy_workfn(struct work_struct *w)
+{
+       struct cache_destroy_work *cdw;
+
+       cdw = container_of(w, struct cache_destroy_work, work);
+       kmem_cache_destroy(cdw->s);
+}
+
+#define KMEM_CACHE_DESTROY_NR 10
+
+static void test_kfree_rcu_wq_destroy(struct kunit *test)
+{
+       struct test_kfree_rcu_struct *p;
+       struct cache_destroy_work cdw;
+       struct workqueue_struct *wq;
+       struct kmem_cache *s;
+       unsigned int rnd;
+       int i;
+
+       if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
+               kunit_skip(test, "can't do kfree_rcu() when test is built-in");
+
+       INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
+       wq = alloc_workqueue("test_kfree_rcu_destroy_wq",
+                       WQ_HIGHPRI | WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+
+       if (!wq)
+               kunit_skip(test, "failed to alloc wq");
+
+       for (i = 0; i < KMEM_CACHE_DESTROY_NR; i++) {
+               s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
+                               sizeof(struct test_kfree_rcu_struct),
+                               SLAB_NO_MERGE);
+
+               if (!s)
+                       kunit_skip(test, "failed to create cache");
+
+               rnd = get_random_u8() % 255;
+               p = kmem_cache_alloc(s, GFP_KERNEL);
+               kfree_rcu(p, rcu);
+
+               cdw.s = s;
+
+               msleep(rnd);
+               queue_work(wq, &cdw.work);
+               flush_work(&cdw.work);
+       }
+
+       destroy_workqueue(wq);
+       KUNIT_EXPECT_EQ(test, 0, slab_errors);
+}
+
 static void test_leak_destroy(struct kunit *test)
 {
        struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
@@ -254,6 +312,7 @@ static struct kunit_case test_cases[] = {
        KUNIT_CASE(test_clobber_redzone_free),
        KUNIT_CASE(test_kmalloc_redzone_access),
        KUNIT_CASE(test_kfree_rcu),
+       KUNIT_CASE(test_kfree_rcu_wq_destroy),
        KUNIT_CASE(test_leak_destroy),
        KUNIT_CASE(test_krealloc_redzone_zeroing),
        {}

--
Uladzislau Rezki


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 17:41               ` Uladzislau Rezki
@ 2025-02-25 18:11                 ` Vlastimil Babka
  2025-02-25 18:21                   ` Uladzislau Rezki
  2025-02-25 18:21                 ` Uladzislau Rezki
  1 sibling, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2025-02-25 18:11 UTC (permalink / raw)
  To: Uladzislau Rezki, Keith Busch
  Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
	Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On 2/25/25 18:41, Uladzislau Rezki wrote:
> 
> but i had to adapt slightly the Vlastimil's test:

Great, works for me too, thanks!



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 17:41               ` Uladzislau Rezki
  2025-02-25 18:11                 ` Vlastimil Babka
@ 2025-02-25 18:21                 ` Uladzislau Rezki
  2025-02-26 10:59                   ` Vlastimil Babka
  2025-02-26 15:51                   ` Keith Busch
  1 sibling, 2 replies; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-25 18:21 UTC (permalink / raw)
  To: Keith Busch, Vlastimil Babka
  Cc: Keith Busch, Vlastimil Babka, Paul E. McKenney, Joel Fernandes,
	Josh Triplett, Boqun Feng, Christoph Lameter, David Rientjes,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Julia Lawall, Jakub Kicinski, Jason A. Donenfeld, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel, rcu,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev,
	Jann Horn, Mateusz Guzik, linux-nvme, leitao

On Tue, Feb 25, 2025 at 06:41:19PM +0100, Uladzislau Rezki wrote:
> On Tue, Feb 25, 2025 at 10:05:37AM -0700, Keith Busch wrote:
> > On Tue, Feb 25, 2025 at 09:03:38AM -0700, Keith Busch wrote:
> > > On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
> > > > I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
> > > > it's too simple, or racy, and thus we are not flushing any of the queues from
> > > > kvfree_rcu_barrier()?
> > >
> > > Thanks, your test readily triggers it for me, but only if I load
> > > rcutorture at the same time.
> > 
> > Oops, I sent the wrong kernel messages. This is the relevant part:
> > 
> > [  142.371052] workqueue: WQ_MEM_RECLAIM
> > test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is
> > flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> > [  142.371072] WARNING: CPU: 11 PID: 186 at kernel/workqueue.c:3715
> > check_flush_dependency.part.0+0xad/0x100
> > [  142.375748] Modules linked in: slub_kunit(E) rcutorture(E)
> > torture(E) kunit(E) iTCO_wdt(E) iTCO_vendor_support(E)
> > intel_uncore_frequency_common(E) skx_edac_common(E) nfit(E)
> > libnvdimm(E) kvm_intel(E) kvm(E) evdev(E) bochs(E) serio_raw(E)
> > drm_kms_helper(E) i2c_i801(E) e1000e(E) i2c_smbus(E) intel_agp(E)
> > intel_gtt(E) lpc_ich(E) agpgart(E) mfd_core(E) drm_shm]
> > [  142.384553] CPU: 11 UID: 0 PID: 186 Comm: kworker/u64:11 Tainted: G
> >            E    N 6.13.0-04839-g5e7b40f0ddce-dirty #831
> > [  142.386755] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
> > [  142.387849] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > [  142.390236] Workqueue: test_kfree_rcu_destroy_wq
> > cache_destroy_workfn [slub_kunit]
> > [  142.391863] RIP: 0010:check_flush_dependency.part.0+0xad/0x100
> > [  142.393183] Code: 75 dc 48 8b 55 18 49 8d 8d 78 01 00 00 4d 89 f0
> > 48 81 c6 78 01 00 00 48 c7 c7 00 e1 9a 82 c6 05 4f 39 c5 02 01 e8 53
> > bd fd ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e c3 80 3d 39 39 c5 02 00 75 83
> > 41 8b
> > [  142.396981] RSP: 0018:ffffc900007cfc90 EFLAGS: 00010092
> > [  142.398124] RAX: 000000000000008f RBX: ffff88803e9b10a0 RCX: 0000000000000027
> > [  142.399605] RDX: ffff88803eba0d08 RSI: 0000000000000001 RDI: ffff88803eba0d00
> > [  142.401092] RBP: ffff888007d9a480 R08: ffffffff83b8c808 R09: 0000000000000003
> > [  142.402548] R10: ffffffff8348c820 R11: ffffffff83a11d58 R12: ffff888007150000
> > [  142.404098] R13: ffff888005961400 R14: ffffffff813221a0 R15: ffff888005961400
> > [  142.405561] FS:  0000000000000000(0000) GS:ffff88803eb80000(0000)
> > knlGS:0000000000000000
> > [  142.407297] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  142.408658] CR2: 00007f826bd1a000 CR3: 00000000069db002 CR4: 0000000000772ef0
> > [  142.410259] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  142.411871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  142.413341] PKRU: 55555554
> > [  142.414038] Call Trace:
> > [  142.414658]  <TASK>
> > [  142.415249]  ? __warn+0x8d/0x180
> > [  142.416035]  ? check_flush_dependency.part.0+0xad/0x100
> > [  142.417182]  ? report_bug+0x160/0x170
> > [  142.418041]  ? handle_bug+0x4f/0x90
> > [  142.418861]  ? exc_invalid_op+0x14/0x70
> > [  142.419853]  ? asm_exc_invalid_op+0x16/0x20
> > [  142.420877]  ? kfree_rcu_shrink_scan+0x120/0x120
> > [  142.422029]  ? check_flush_dependency.part.0+0xad/0x100
> > [  142.423244]  __flush_work+0x38a/0x4a0
> > [  142.424157]  ? find_held_lock+0x2b/0x80
> > [  142.425070]  ? flush_rcu_work+0x26/0x40
> > [  142.425953]  ? lock_release+0xb3/0x250
> > [  142.426785]  ? __mutex_unlock_slowpath+0x2c/0x270
> > [  142.427906]  flush_rcu_work+0x30/0x40
> > [  142.428756]  kvfree_rcu_barrier+0xe9/0x130
> > [  142.429649]  kmem_cache_destroy+0x2b/0x1f0
> > [  142.430578]  cache_destroy_workfn+0x20/0x40 [slub_kunit]
> > [  142.431729]  process_one_work+0x1cd/0x560
> > [  142.432620]  worker_thread+0x183/0x310
> > [  142.433487]  ? rescuer_thread+0x330/0x330
> > [  142.434428]  kthread+0xd8/0x1d0
> > [  142.435248]  ? ret_from_fork+0x17/0x50
> > [  142.436165]  ? lock_release+0xb3/0x250
> > [  142.437106]  ? kthreads_online_cpu+0xf0/0xf0
> > [  142.438133]  ret_from_fork+0x2d/0x50
> > [  142.439045]  ? kthreads_online_cpu+0xf0/0xf0
> > [  142.440428]  ret_from_fork_asm+0x11/0x20
> > [  142.441476]  </TASK>
> > [  142.442152] irq event stamp: 22858
> > [  142.443002] hardirqs last  enabled at (22857): [<ffffffff82044ef4>]
> > _raw_spin_unlock_irq+0x24/0x30
> > [  142.445032] hardirqs last disabled at (22858): [<ffffffff82044ce3>]
> > _raw_spin_lock_irq+0x43/0x50
> > [  142.451450] softirqs last  enabled at (22714): [<ffffffff810bfdbc>]
> > __irq_exit_rcu+0xac/0xd0
> > [  142.453345] softirqs last disabled at (22709): [<ffffffff810bfdbc>]
> > __irq_exit_rcu+0xac/0xd0
> > [  142.455305] ---[ end trace 0000000000000000 ]---
> Thanks!
> 
> I can trigger this also:
> 
> <snip>
> [   21.712856] KTAP version 1
> [   21.712862] 1..1
> [   21.714486]     KTAP version 1
> [   21.714490]     # Subtest: slub_test
> [   21.714492]     # module: slub_kunit
> [   21.714495]     1..10
> [   21.750359]     ok 1 test_clobber_zone
> [   21.750955]     ok 2 test_next_pointer
> [   21.751532]     ok 3 test_first_word
> [   21.751991]     ok 4 test_clobber_50th_byte
> [   21.752493]     ok 5 test_clobber_redzone_free
> [   21.753004] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> [   21.756176]     ok 6 test_kmalloc_redzone_access
> [   21.806549]     ok 7 test_kfree_rcu
> [   22.058010] ------------[ cut here ]------------
> [   22.058015] workqueue: WQ_MEM_RECLAIM test_kfree_rcu_destroy_wq:cache_destroy_workfn [slub_kunit] is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> [   22.058039] WARNING: CPU: 19 PID: 474 at kernel/workqueue.c:3715 check_flush_dependency.part.0+0xbe/0x130
> [   22.058047] Modules linked in: slub_kunit(E) kunit(E) binfmt_misc(E) bochs(E) drm_client_lib(E) drm_shmem_helper(E) ppdev(E) drm_kms_helper(E) snd_pcm(E) sg(E) snd_timer(E) evdev(E) snd(E) joydev(E) parport_pc(E) parport(E) soundcore(E) serio_raw(E) button(E) pcspkr(E) drm(E) fuse(E) dm_mod(E) efi_pstore(E) configfs(E) loop(E) qemu_fw_cfg(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) cdrom(E) ata_generic(E) ata_piix(E) libata(E) scsi_mod(E) i2c_piix4(E) psmouse(E) e1000(E) i2c_smbus(E) scsi_common(E) floppy(E)
> [   22.058091] CPU: 19 UID: 0 PID: 474 Comm: kworker/u257:0 Kdump: loaded Tainted: G            E    N 6.14.0-rc1+ #286
> [   22.058096] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
> [   22.058097] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [   22.058099] Workqueue: test_kfree_rcu_destroy_wq cache_destroy_workfn [slub_kunit]
> [   22.058103] RIP: 0010:check_flush_dependency.part.0+0xbe/0x130
> [   22.058106] Code: 75 d0 48 8b 55 18 49 8d 8d c0 00 00 00 4d 89 f0 48 81 c6 c0 00 00 00 48 c7 c7 b0 7d c8 bd c6 05 6c 78 53 01 01 e8 a2 ae fd ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc f6 c4 08 74 94 31 ed
> [   22.058108] RSP: 0018:ffff95e5c123fd50 EFLAGS: 00010086
> [   22.058111] RAX: 0000000000000000 RBX: ffff89a4ff22d5a0 RCX: 0000000000000000
> [   22.058113] RDX: 0000000000000003 RSI: ffffffffbdce1697 RDI: 00000000ffffffff
> [   22.058114] RBP: ffff89961043a780 R08: 0000000000000000 R09: 0000000000000003
> [   22.058116] R10: ffff95e5c123fbe8 R11: ffff89a53fefefa8 R12: ffff89960cb6b080
> [   22.058117] R13: ffff899600051400 R14: ffffffffbcf2ba80 R15: ffff89960005a800
> [   22.058120] FS:  0000000000000000(0000) GS:ffff89a4ff2c0000(0000) knlGS:0000000000000000
> [   22.058122] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   22.058124] CR2: 000055bf2cbc6038 CR3: 000000010dc1e000 CR4: 00000000000006f0
> [   22.058128] Call Trace:
> [   22.058130]  <TASK>
> [   22.058133]  ? __warn+0x85/0x130
> [   22.058137]  ? check_flush_dependency.part.0+0xbe/0x130
> [   22.058139]  ? report_bug+0x18d/0x1c0
> [   22.058142]  ? prb_read_valid+0x17/0x20
> [   22.058147]  ? handle_bug+0x58/0x90
> [   22.058151]  ? exc_invalid_op+0x13/0x60
> [   22.058154]  ? asm_exc_invalid_op+0x16/0x20
> [   22.058158]  ? __pfx_kfree_rcu_work+0x10/0x10
> [   22.058162]  ? check_flush_dependency.part.0+0xbe/0x130
> [   22.058165]  __flush_work+0xd6/0x320
> [   22.058168]  flush_rcu_work+0x39/0x50
> [   22.058171]  kvfree_rcu_barrier+0xe9/0x130
> [   22.058174]  kmem_cache_destroy+0x18/0x140
> [   22.058177]  process_one_work+0x184/0x3a0
> [   22.058180]  worker_thread+0x24d/0x360
> [   22.058183]  ? __pfx_worker_thread+0x10/0x10
> [   22.058185]  kthread+0xfc/0x230
> [   22.058189]  ? finish_task_switch.isra.0+0x85/0x2a0
> [   22.058192]  ? __pfx_kthread+0x10/0x10
> [   22.058195]  ret_from_fork+0x30/0x50
> [   22.058199]  ? __pfx_kthread+0x10/0x10
> [   22.058202]  ret_from_fork_asm+0x1a/0x30
> [   22.058206]  </TASK>
> [   22.058207] ---[ end trace 0000000000000000 ]---
> [   23.123507]     ok 8 test_kfree_rcu_wq_destroy
> [   23.151033]     ok 9 test_leak_destroy
> [   23.151612]     ok 10 test_krealloc_redzone_zeroing
> [   23.151617] # slub_test: pass:10 fail:0 skip:0 total:10
> [   23.151619] # Totals: pass:10 fail:0 skip:0 total:10
> [   23.151620] ok 1 slub_test
> urezki@pc638:~$
> <snip>
> 
> but i had to adapt slightly the Vlastimil's test:
> 
> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
> index f11691315c2f..222f6d204b0d 100644
> --- a/lib/slub_kunit.c
> +++ b/lib/slub_kunit.c
> @@ -6,6 +6,7 @@
>  #include <linux/module.h>
>  #include <linux/kernel.h>
>  #include <linux/rcupdate.h>
> +#include <linux/delay.h>
>  #include "../mm/slab.h"
> 
>  static struct kunit_resource resource;
> @@ -181,6 +182,63 @@ static void test_kfree_rcu(struct kunit *test)
>         KUNIT_EXPECT_EQ(test, 0, slab_errors);
>  }
> 
> +struct cache_destroy_work {
> +        struct work_struct work;
> +        struct kmem_cache *s;
> +};
> +
> +static void cache_destroy_workfn(struct work_struct *w)
> +{
> +       struct cache_destroy_work *cdw;
> +
> +       cdw = container_of(w, struct cache_destroy_work, work);
> +       kmem_cache_destroy(cdw->s);
> +}
> +
> +#define KMEM_CACHE_DESTROY_NR 10
> +
> +static void test_kfree_rcu_wq_destroy(struct kunit *test)
> +{
> +       struct test_kfree_rcu_struct *p;
> +       struct cache_destroy_work cdw;
> +       struct workqueue_struct *wq;
> +       struct kmem_cache *s;
> +       unsigned int rnd;
> +       int i;
> +
> +       if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
> +               kunit_skip(test, "can't do kfree_rcu() when test is built-in");
> +
> +       INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
> +       wq = alloc_workqueue("test_kfree_rcu_destroy_wq",
> +                       WQ_HIGHPRI | WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
> +
> +       if (!wq)
> +               kunit_skip(test, "failed to alloc wq");
> +
> +       for (i = 0; i < KMEM_CACHE_DESTROY_NR; i++) {
> +               s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
> +                               sizeof(struct test_kfree_rcu_struct),
> +                               SLAB_NO_MERGE);
> +
> +               if (!s)
> +                       kunit_skip(test, "failed to create cache");
> +
> +               rnd = get_random_u8() % 255;
> +               p = kmem_cache_alloc(s, GFP_KERNEL);
> +               kfree_rcu(p, rcu);
> +
> +               cdw.s = s;
> +
> +               msleep(rnd);
> +               queue_work(wq, &cdw.work);
> +               flush_work(&cdw.work);
> +       }
> +
> +       destroy_workqueue(wq);
> +       KUNIT_EXPECT_EQ(test, 0, slab_errors);
> +}
> +
>  static void test_leak_destroy(struct kunit *test)
>  {
>         struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
> @@ -254,6 +312,7 @@ static struct kunit_case test_cases[] = {
>         KUNIT_CASE(test_clobber_redzone_free),
>         KUNIT_CASE(test_kmalloc_redzone_access),
>         KUNIT_CASE(test_kfree_rcu),
> +       KUNIT_CASE(test_kfree_rcu_wq_destroy),
>         KUNIT_CASE(test_leak_destroy),
>         KUNIT_CASE(test_krealloc_redzone_zeroing),
>         {}
> 
> --
> Uladzislau Rezki
>
WQ_MEM_RECLAIM-patch fixes this for me:

<snip>
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 4030907b6b7d..1b5ed5512782 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1304,6 +1304,8 @@ module_param(rcu_min_cached_objs, int, 0444);
 static int rcu_delay_page_cache_fill_msec = 5000;
 module_param(rcu_delay_page_cache_fill_msec, int, 0444);

+static struct workqueue_struct *rcu_reclaim_wq;
+
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (5 * HZ)
 #define KFREE_N_BATCHES 2
@@ -1632,10 +1634,10 @@ __schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
        if (delayed_work_pending(&krcp->monitor_work)) {
                delay_left = krcp->monitor_work.timer.expires - jiffies;
                if (delay < delay_left)
-                       mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+                       mod_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
                return;
        }
-       queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+       queue_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
 }

 static void
@@ -1733,7 +1735,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
                        // "free channels", the batch can handle. Break
                        // the loop since it is done with this CPU thus
                        // queuing an RCU work is _always_ success here.
-                       queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
+                       queued = queue_rcu_work(rcu_reclaim_wq, &krwp->rcu_work);
                        WARN_ON_ONCE(!queued);
                        break;
                }
@@ -1883,7 +1885,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
        if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
                        !atomic_xchg(&krcp->work_in_progress, 1)) {
                if (atomic_read(&krcp->backoff_page_cache_fill)) {
-                       queue_delayed_work(system_unbound_wq,
+                       queue_delayed_work(rcu_reclaim_wq,
                                &krcp->page_cache_work,
                                        msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
                } else {
@@ -2120,6 +2122,10 @@ void __init kvfree_rcu_init(void)
        int i, j;
        struct shrinker *kfree_rcu_shrinker;

+       rcu_reclaim_wq = alloc_workqueue("rcu_reclaim",
+               WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+       WARN_ON(!rcu_reclaim_wq);
+
        /* Clamp it to [0:100] seconds interval. */
        if (rcu_delay_page_cache_fill_msec < 0 ||
                rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
<snip>

it passes:

<snip>
[   15.972416] KTAP version 1
[   15.972421] 1..1
[   15.973467]     KTAP version 1
[   15.973470]     # Subtest: slub_test
[   15.973472]     # module: slub_kunit
[   15.973474]     1..10
[   15.974483]     ok 1 test_clobber_zone
[   15.974927]     ok 2 test_next_pointer
[   15.975308]     ok 3 test_first_word
[   15.975672]     ok 4 test_clobber_50th_byte
[   15.976035]     ok 5 test_clobber_redzone_free
[   15.976128] stackdepot: allocating hash table of 1048576 entries via kvcalloc
[   15.979505]     ok 6 test_kmalloc_redzone_access
[   16.014408]     ok 7 test_kfree_rcu
[   17.726602]     ok 8 test_kfree_rcu_wq_destroy
[   17.750323]     ok 9 test_leak_destroy
[   17.750883]     ok 10 test_krealloc_redzone_zeroing
[   17.750887] # slub_test: pass:10 fail:0 skip:0 total:10
[   17.750890] # Totals: pass:10 fail:0 skip:0 total:10
[   17.750891] ok 1 slub_test
<snip>

--
Uladzislau Rezki


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 18:11                 ` Vlastimil Babka
@ 2025-02-25 18:21                   ` Uladzislau Rezki
  0 siblings, 0 replies; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-25 18:21 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Keith Busch, Paul E. McKenney, Joel Fernandes,
	Josh Triplett, Boqun Feng, Christoph Lameter, David Rientjes,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Julia Lawall, Jakub Kicinski, Jason A. Donenfeld, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel, rcu,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev,
	Jann Horn, Mateusz Guzik, linux-nvme, leitao

On Tue, Feb 25, 2025 at 07:11:05PM +0100, Vlastimil Babka wrote:
> On 2/25/25 18:41, Uladzislau Rezki wrote:
> > 
> > but i had to adapt slightly the Vlastimil's test:
> 
> Great, works for me too, thanks!
> 
Sounds good :)

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 18:21                 ` Uladzislau Rezki
@ 2025-02-26 10:59                   ` Vlastimil Babka
  2025-02-26 14:31                     ` Uladzislau Rezki
  2025-02-26 15:51                   ` Keith Busch
  1 sibling, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2025-02-26 10:59 UTC (permalink / raw)
  To: Uladzislau Rezki, Keith Busch
  Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
	Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On 2/25/25 7:21 PM, Uladzislau Rezki wrote:
>>
> WQ_MEM_RECLAIM-patch fixes this for me:

Sounds good, can you send a formal patch then?
Some nits below:

> <snip>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 4030907b6b7d..1b5ed5512782 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1304,6 +1304,8 @@ module_param(rcu_min_cached_objs, int, 0444);
>  static int rcu_delay_page_cache_fill_msec = 5000;
>  module_param(rcu_delay_page_cache_fill_msec, int, 0444);
> 
> +static struct workqueue_struct *rcu_reclaim_wq;
> +
>  /* Maximum number of jiffies to wait before draining a batch. */
>  #define KFREE_DRAIN_JIFFIES (5 * HZ)
>  #define KFREE_N_BATCHES 2
> @@ -1632,10 +1634,10 @@ __schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>         if (delayed_work_pending(&krcp->monitor_work)) {
>                 delay_left = krcp->monitor_work.timer.expires - jiffies;
>                 if (delay < delay_left)
> -                       mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
> +                       mod_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
>                 return;
>         }
> -       queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
> +       queue_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
>  }
> 
>  static void
> @@ -1733,7 +1735,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
>                         // "free channels", the batch can handle. Break
>                         // the loop since it is done with this CPU thus
>                         // queuing an RCU work is _always_ success here.
> -                       queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
> +                       queued = queue_rcu_work(rcu_reclaim_wq, &krwp->rcu_work);
>                         WARN_ON_ONCE(!queued);
>                         break;
>                 }
> @@ -1883,7 +1885,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>         if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>                         !atomic_xchg(&krcp->work_in_progress, 1)) {
>                 if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -                       queue_delayed_work(system_unbound_wq,
> +                       queue_delayed_work(rcu_reclaim_wq,
>                                 &krcp->page_cache_work,
>                                         msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>                 } else {
> @@ -2120,6 +2122,10 @@ void __init kvfree_rcu_init(void)
>         int i, j;
>         struct shrinker *kfree_rcu_shrinker;
> 
> +       rcu_reclaim_wq = alloc_workqueue("rcu_reclaim",

Should we name it "kvfree_rcu_reclaim"? rcu_reclaim sounds too generic
as if it's part of rcu itself?

> +               WQ_UNBOUND | WQ_MEM_RECLAIM, 0);

Do we want WQ_SYSFS? Or maybe only when someone asks, with a use case?

Thanks,
Vlastimil

> +       WARN_ON(!rcu_reclaim_wq);
> +
>         /* Clamp it to [0:100] seconds interval. */
>         if (rcu_delay_page_cache_fill_msec < 0 ||
>                 rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
> <snip>
> 
> it passes:
> 
> <snip>
> [   15.972416] KTAP version 1
> [   15.972421] 1..1
> [   15.973467]     KTAP version 1
> [   15.973470]     # Subtest: slub_test
> [   15.973472]     # module: slub_kunit
> [   15.973474]     1..10
> [   15.974483]     ok 1 test_clobber_zone
> [   15.974927]     ok 2 test_next_pointer
> [   15.975308]     ok 3 test_first_word
> [   15.975672]     ok 4 test_clobber_50th_byte
> [   15.976035]     ok 5 test_clobber_redzone_free
> [   15.976128] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> [   15.979505]     ok 6 test_kmalloc_redzone_access
> [   16.014408]     ok 7 test_kfree_rcu
> [   17.726602]     ok 8 test_kfree_rcu_wq_destroy
> [   17.750323]     ok 9 test_leak_destroy
> [   17.750883]     ok 10 test_krealloc_redzone_zeroing
> [   17.750887] # slub_test: pass:10 fail:0 skip:0 total:10
> [   17.750890] # Totals: pass:10 fail:0 skip:0 total:10
> [   17.750891] ok 1 slub_test
> <snip>
> 
> --
> Uladzislau Rezki



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-26 10:59                   ` Vlastimil Babka
@ 2025-02-26 14:31                     ` Uladzislau Rezki
  2025-02-26 14:36                       ` Vlastimil Babka
  0 siblings, 1 reply; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-26 14:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Keith Busch, Paul E. McKenney, Joel Fernandes,
	Josh Triplett, Boqun Feng, Christoph Lameter, David Rientjes,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Julia Lawall, Jakub Kicinski, Jason A. Donenfeld, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel, rcu,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev,
	Jann Horn, Mateusz Guzik, linux-nvme, leitao

On Wed, Feb 26, 2025 at 11:59:53AM +0100, Vlastimil Babka wrote:
> On 2/25/25 7:21 PM, Uladzislau Rezki wrote:
> >>
> > WQ_MEM_RECLAIM-patch fixes this for me:
> 
> Sounds good, can you send a formal patch then?
>
Do you mean both? Test case and fix? I can :)

> Some nits below:
> 
> > <snip>
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 4030907b6b7d..1b5ed5512782 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1304,6 +1304,8 @@ module_param(rcu_min_cached_objs, int, 0444);
> >  static int rcu_delay_page_cache_fill_msec = 5000;
> >  module_param(rcu_delay_page_cache_fill_msec, int, 0444);
> > 
> > +static struct workqueue_struct *rcu_reclaim_wq;
> > +
> >  /* Maximum number of jiffies to wait before draining a batch. */
> >  #define KFREE_DRAIN_JIFFIES (5 * HZ)
> >  #define KFREE_N_BATCHES 2
> > @@ -1632,10 +1634,10 @@ __schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
> >         if (delayed_work_pending(&krcp->monitor_work)) {
> >                 delay_left = krcp->monitor_work.timer.expires - jiffies;
> >                 if (delay < delay_left)
> > -                       mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
> > +                       mod_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
> >                 return;
> >         }
> > -       queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
> > +       queue_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
> >  }
> > 
> >  static void
> > @@ -1733,7 +1735,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
> >                         // "free channels", the batch can handle. Break
> >                         // the loop since it is done with this CPU thus
> >                         // queuing an RCU work is _always_ success here.
> > -                       queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
> > +                       queued = queue_rcu_work(rcu_reclaim_wq, &krwp->rcu_work);
> >                         WARN_ON_ONCE(!queued);
> >                         break;
> >                 }
> > @@ -1883,7 +1885,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
> >         if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
> >                         !atomic_xchg(&krcp->work_in_progress, 1)) {
> >                 if (atomic_read(&krcp->backoff_page_cache_fill)) {
> > -                       queue_delayed_work(system_unbound_wq,
> > +                       queue_delayed_work(rcu_reclaim_wq,
> >                                 &krcp->page_cache_work,
> >                                         msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
> >                 } else {
> > @@ -2120,6 +2122,10 @@ void __init kvfree_rcu_init(void)
> >         int i, j;
> >         struct shrinker *kfree_rcu_shrinker;
> > 
> > +       rcu_reclaim_wq = alloc_workqueue("rcu_reclaim",
> 
> Should we name it "kvfree_rcu_reclaim"? rcu_reclaim sounds too generic
> as if it's part of rcu itself?
> 
> > +               WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
> 
> Do we want WQ_SYSFS? Or maybe only when someone asks, with a use case?
> 
If someone asks, IMO.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-26 14:31                     ` Uladzislau Rezki
@ 2025-02-26 14:36                       ` Vlastimil Babka
  2025-02-26 15:42                         ` Uladzislau Rezki
  0 siblings, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2025-02-26 14:36 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Keith Busch, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On 2/26/25 3:31 PM, Uladzislau Rezki wrote:
> On Wed, Feb 26, 2025 at 11:59:53AM +0100, Vlastimil Babka wrote:
>> On 2/25/25 7:21 PM, Uladzislau Rezki wrote:
>>>>
>>> WQ_MEM_RECLAIM-patch fixes this for me:
>>
>> Sounds good, can you send a formal patch then?
>>
> Do you mean both? Test case and fix? I can :)

Sure, but only the fix is for stable. Thanks!



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-26 14:36                       ` Vlastimil Babka
@ 2025-02-26 15:42                         ` Uladzislau Rezki
  2025-02-26 15:46                           ` Vlastimil Babka
  0 siblings, 1 reply; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-26 15:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Keith Busch, Paul E. McKenney, Joel Fernandes,
	Josh Triplett, Boqun Feng, Christoph Lameter, David Rientjes,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Julia Lawall, Jakub Kicinski, Jason A. Donenfeld, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel, rcu,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev,
	Jann Horn, Mateusz Guzik, linux-nvme, leitao

On Wed, Feb 26, 2025 at 03:36:39PM +0100, Vlastimil Babka wrote:
> On 2/26/25 3:31 PM, Uladzislau Rezki wrote:
> > On Wed, Feb 26, 2025 at 11:59:53AM +0100, Vlastimil Babka wrote:
> >> On 2/25/25 7:21 PM, Uladzislau Rezki wrote:
> >>>>
> >>> WQ_MEM_RECLAIM-patch fixes this for me:
> >>
> >> Sounds good, can you send a formal patch then?
> >>
> > Do you mean both? Test case and fix? I can :)
> 
> Sure, but only the fix is for stable. Thanks!
> 
It is taken by Gregg if there is a Fixes tag in the commit.
What do you mean: the fix is for stable? The current Linus
tree is not suffering from this?

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-26 15:42                         ` Uladzislau Rezki
@ 2025-02-26 15:46                           ` Vlastimil Babka
  2025-02-26 15:57                             ` Uladzislau Rezki
  0 siblings, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2025-02-26 15:46 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Keith Busch, Paul E. McKenney, Joel Fernandes, Josh Triplett,
	Boqun Feng, Christoph Lameter, David Rientjes, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
	Jakub Kicinski, Jason A. Donenfeld, Andrew Morton, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, linux-kernel, rcu, Alexander Potapenko,
	Marco Elver, Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik,
	linux-nvme, leitao

On 2/26/25 4:42 PM, Uladzislau Rezki wrote:
> On Wed, Feb 26, 2025 at 03:36:39PM +0100, Vlastimil Babka wrote:
>> On 2/26/25 3:31 PM, Uladzislau Rezki wrote:
>>> On Wed, Feb 26, 2025 at 11:59:53AM +0100, Vlastimil Babka wrote:
>>>> On 2/25/25 7:21 PM, Uladzislau Rezki wrote:
>>>>>>
>>>>> WQ_MEM_RECLAIM-patch fixes this for me:
>>>>
>>>> Sounds good, can you send a formal patch then?
>>>>
>>> Do you mean both? Test case and fix? I can :)
>>
>> Sure, but only the fix is for stable. Thanks!
>>
> It is taken by Gregg if there is a Fixes tag in the commit.
> What do you mean: the fix is for stable? The current Linus
> tree is not suffering from this?

I just meant the fix should be a Cc: stable, and the testcase not.
mm/ has an exception from "anything with Fixes: can be taken to stable"

> 
> --
> Uladzislau Rezki



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-25 18:21                 ` Uladzislau Rezki
  2025-02-26 10:59                   ` Vlastimil Babka
@ 2025-02-26 15:51                   ` Keith Busch
  2025-02-26 15:58                     ` Uladzislau Rezki
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Busch @ 2025-02-26 15:51 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Keith Busch, Vlastimil Babka, Paul E. McKenney, Joel Fernandes,
	Josh Triplett, Boqun Feng, Christoph Lameter, David Rientjes,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Julia Lawall, Jakub Kicinski, Jason A. Donenfeld, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel, rcu,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev,
	Jann Horn, Mateusz Guzik, linux-nvme, leitao

On Tue, Feb 25, 2025 at 07:21:19PM +0100, Uladzislau Rezki wrote:
> WQ_MEM_RECLAIM-patch fixes this for me:

This is successful with the new kuint test for me as well. I can't
readily test this in production where I first learned of this issue (at
least not in the near term), but for what it's worth, this looks like a
good change to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>
 
> <snip>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 4030907b6b7d..1b5ed5512782 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1304,6 +1304,8 @@ module_param(rcu_min_cached_objs, int, 0444);
>  static int rcu_delay_page_cache_fill_msec = 5000;
>  module_param(rcu_delay_page_cache_fill_msec, int, 0444);
> 
> +static struct workqueue_struct *rcu_reclaim_wq;
> +
>  /* Maximum number of jiffies to wait before draining a batch. */
>  #define KFREE_DRAIN_JIFFIES (5 * HZ)
>  #define KFREE_N_BATCHES 2
> @@ -1632,10 +1634,10 @@ __schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
>         if (delayed_work_pending(&krcp->monitor_work)) {
>                 delay_left = krcp->monitor_work.timer.expires - jiffies;
>                 if (delay < delay_left)
> -                       mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
> +                       mod_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
>                 return;
>         }
> -       queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
> +       queue_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
>  }
> 
>  static void
> @@ -1733,7 +1735,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
>                         // "free channels", the batch can handle. Break
>                         // the loop since it is done with this CPU thus
>                         // queuing an RCU work is _always_ success here.
> -                       queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
> +                       queued = queue_rcu_work(rcu_reclaim_wq, &krwp->rcu_work);
>                         WARN_ON_ONCE(!queued);
>                         break;
>                 }
> @@ -1883,7 +1885,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
>         if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
>                         !atomic_xchg(&krcp->work_in_progress, 1)) {
>                 if (atomic_read(&krcp->backoff_page_cache_fill)) {
> -                       queue_delayed_work(system_unbound_wq,
> +                       queue_delayed_work(rcu_reclaim_wq,
>                                 &krcp->page_cache_work,
>                                         msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
>                 } else {
> @@ -2120,6 +2122,10 @@ void __init kvfree_rcu_init(void)
>         int i, j;
>         struct shrinker *kfree_rcu_shrinker;
> 
> +       rcu_reclaim_wq = alloc_workqueue("rcu_reclaim",
> +               WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
> +       WARN_ON(!rcu_reclaim_wq);
> +
>         /* Clamp it to [0:100] seconds interval. */
>         if (rcu_delay_page_cache_fill_msec < 0 ||
>                 rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
> <snip>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-26 15:46                           ` Vlastimil Babka
@ 2025-02-26 15:57                             ` Uladzislau Rezki
  0 siblings, 0 replies; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-26 15:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, Keith Busch, Paul E. McKenney, Joel Fernandes,
	Josh Triplett, Boqun Feng, Christoph Lameter, David Rientjes,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Julia Lawall, Jakub Kicinski, Jason A. Donenfeld, Andrew Morton,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, linux-kernel, rcu,
	Alexander Potapenko, Marco Elver, Dmitry Vyukov, kasan-dev,
	Jann Horn, Mateusz Guzik, linux-nvme, leitao

On Wed, Feb 26, 2025 at 04:46:38PM +0100, Vlastimil Babka wrote:
> On 2/26/25 4:42 PM, Uladzislau Rezki wrote:
> > On Wed, Feb 26, 2025 at 03:36:39PM +0100, Vlastimil Babka wrote:
> >> On 2/26/25 3:31 PM, Uladzislau Rezki wrote:
> >>> On Wed, Feb 26, 2025 at 11:59:53AM +0100, Vlastimil Babka wrote:
> >>>> On 2/25/25 7:21 PM, Uladzislau Rezki wrote:
> >>>>>>
> >>>>> WQ_MEM_RECLAIM-patch fixes this for me:
> >>>>
> >>>> Sounds good, can you send a formal patch then?
> >>>>
> >>> Do you mean both? Test case and fix? I can :)
> >>
> >> Sure, but only the fix is for stable. Thanks!
> >>
> > It is taken by Gregg if there is a Fixes tag in the commit.
> > What do you mean: the fix is for stable? The current Linus
> > tree is not suffering from this?
> 
> I just meant the fix should be a Cc: stable, and the testcase not.
> mm/ has an exception from "anything with Fixes: can be taken to stable"
> 
Got it.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
  2025-02-26 15:51                   ` Keith Busch
@ 2025-02-26 15:58                     ` Uladzislau Rezki
  0 siblings, 0 replies; 23+ messages in thread
From: Uladzislau Rezki @ 2025-02-26 15:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: Uladzislau Rezki, Keith Busch, Vlastimil Babka, Paul E. McKenney,
	Joel Fernandes, Josh Triplett, Boqun Feng, Christoph Lameter,
	David Rientjes, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
	Zqiang, Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
	Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
	linux-kernel, rcu, Alexander Potapenko, Marco Elver,
	Dmitry Vyukov, kasan-dev, Jann Horn, Mateusz Guzik, linux-nvme,
	leitao

On Wed, Feb 26, 2025 at 08:51:37AM -0700, Keith Busch wrote:
> On Tue, Feb 25, 2025 at 07:21:19PM +0100, Uladzislau Rezki wrote:
> > WQ_MEM_RECLAIM-patch fixes this for me:
> 
> This is successful with the new kuint test for me as well. I can't
> readily test this in production where I first learned of this issue (at
> least not in the near term), but for what it's worth, this looks like a
> good change to me.
> 
> Reviewed-by: Keith Busch <kbusch@kernel.org>
>  
Thank you for checking. I will apply this.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-03-03 17:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20240807-b4-slab-kfree_rcu-destroy-v2-0-ea79102f428c@suse.cz>
     [not found] ` <20240807-b4-slab-kfree_rcu-destroy-v2-6-ea79102f428c@suse.cz>
2025-02-21 16:30   ` [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy() Keith Busch
2025-02-21 16:51     ` Mateusz Guzik
2025-02-21 16:52       ` Mateusz Guzik
2025-02-21 17:28     ` Vlastimil Babka
2025-02-24 11:44       ` Uladzislau Rezki
2025-02-24 15:37         ` Keith Busch
2025-02-25  9:57         ` Vlastimil Babka
2025-02-25 13:39           ` Uladzislau Rezki
2025-02-25 14:12             ` Vlastimil Babka
2025-02-25 16:03           ` Keith Busch
2025-02-25 17:05             ` Keith Busch
2025-02-25 17:41               ` Uladzislau Rezki
2025-02-25 18:11                 ` Vlastimil Babka
2025-02-25 18:21                   ` Uladzislau Rezki
2025-02-25 18:21                 ` Uladzislau Rezki
2025-02-26 10:59                   ` Vlastimil Babka
2025-02-26 14:31                     ` Uladzislau Rezki
2025-02-26 14:36                       ` Vlastimil Babka
2025-02-26 15:42                         ` Uladzislau Rezki
2025-02-26 15:46                           ` Vlastimil Babka
2025-02-26 15:57                             ` Uladzislau Rezki
2025-02-26 15:51                   ` Keith Busch
2025-02-26 15:58                     ` Uladzislau Rezki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox