* [PATCH v1 1/2] kunit, slub: Add test_kfree_rcu_wq_destroy use case
From: Uladzislau Rezki (Sony) @ 2025-02-28 12:13 UTC
To: linux-mm, Andrew Morton, Vlastimil Babka
Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
    Oleksiy Avramchenko, Keith Busch

Add a test_kfree_rcu_wq_destroy test to verify kmem_cache_destroy()
from a workqueue context. The problem is that, before destroying any
cache, kvfree_rcu_barrier() is invoked to guarantee that in-flight
freed objects are flushed.

The _barrier() function queues and flushes its own internal workers,
which might conflict with the type of workqueue a kmem_cache is
destroyed from.

One example is when a WQ_MEM_RECLAIM workqueue flushes !WQ_MEM_RECLAIM
work, which leads to a kernel splat. See check_flush_dependency() in
kernel/workqueue.c.

If this test does not emit any kernel warning, it passes.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Co-developed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 lib/slub_kunit.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index f11691315c2f..d47c472b0520 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -6,6 +6,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/rcupdate.h>
+#include <linux/delay.h>
 #include "../mm/slab.h"
 
 static struct kunit_resource resource;
@@ -181,6 +182,63 @@ static void test_kfree_rcu(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, 0, slab_errors);
 }
 
+struct cache_destroy_work {
+	struct work_struct work;
+	struct kmem_cache *s;
+};
+
+static void cache_destroy_workfn(struct work_struct *w)
+{
+	struct cache_destroy_work *cdw;
+
+	cdw = container_of(w, struct cache_destroy_work, work);
+	kmem_cache_destroy(cdw->s);
+}
+
+#define KMEM_CACHE_DESTROY_NR 10
+
+static void test_kfree_rcu_wq_destroy(struct kunit *test)
+{
+	struct test_kfree_rcu_struct *p;
+	struct cache_destroy_work cdw;
+	struct workqueue_struct *wq;
+	struct kmem_cache *s;
+	unsigned int delay;
+	int i;
+
+	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
+		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
+
+	INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
+	wq = alloc_workqueue("test_kfree_rcu_destroy_wq",
+			WQ_HIGHPRI | WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+
+	if (!wq)
+		kunit_skip(test, "failed to alloc wq");
+
+	for (i = 0; i < KMEM_CACHE_DESTROY_NR; i++) {
+		s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
+				sizeof(struct test_kfree_rcu_struct),
+				SLAB_NO_MERGE);
+
+		if (!s)
+			kunit_skip(test, "failed to create cache");
+
+		delay = get_random_u8();
+		p = kmem_cache_alloc(s, GFP_KERNEL);
+		kfree_rcu(p, rcu);
+
+		cdw.s = s;
+
+		msleep(delay);
+		queue_work(wq, &cdw.work);
+		flush_work(&cdw.work);
+	}
+
+	destroy_workqueue(wq);
+	KUNIT_EXPECT_EQ(test, 0, slab_errors);
+}
+
 static void test_leak_destroy(struct kunit *test)
 {
 	struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
@@ -254,6 +312,7 @@ static struct kunit_case test_cases[] = {
 	KUNIT_CASE(test_clobber_redzone_free),
 	KUNIT_CASE(test_kmalloc_redzone_access),
 	KUNIT_CASE(test_kfree_rcu),
+	KUNIT_CASE(test_kfree_rcu_wq_destroy),
 	KUNIT_CASE(test_leak_destroy),
 	KUNIT_CASE(test_krealloc_redzone_zeroing),
 	{}
-- 
2.39.5
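
For context, the rule this test provokes can be sketched roughly as
below. This is a simplified paraphrase of the check in
kernel/workqueue.c, not the upstream function body, and it assumes the
workqueue internals (current_wq_worker(), worker->current_pwq) keep
their current shape:

<snip>
/*
 * Paraphrased sketch of the flush-dependency rule: a worker running
 * on a WQ_MEM_RECLAIM workqueue must not wait on work queued on a
 * !WQ_MEM_RECLAIM workqueue, because only reclaim-safe queues have a
 * rescuer thread that guarantees forward progress under memory
 * pressure.
 */
static void check_flush_dependency_sketch(struct workqueue_struct *target_wq)
{
	struct worker *worker = current_wq_worker();

	/* Flushing work on a reclaim-safe target is always fine. */
	if (target_wq->flags & WQ_MEM_RECLAIM)
		return;

	/* A reclaim-safe worker waiting on a non-reclaim queue splats. */
	WARN_ONCE(worker && (worker->current_pwq->wq->flags & WQ_MEM_RECLAIM),
		  "workqueue: WQ_MEM_RECLAIM %s is flushing !WQ_MEM_RECLAIM %s\n",
		  worker->current_pwq->wq->name, target_wq->name);
}
<snip>

This is exactly the shape of the warning quoted in patch 2/2, where the
WQ_MEM_RECLAIM nvme-wq ends up flushing kfree_rcu work that was queued
on events_unbound.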
* [PATCH v1 2/2] mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq
From: Uladzislau Rezki (Sony) @ 2025-02-28 12:13 UTC
To: linux-mm, Andrew Morton, Vlastimil Babka
Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
    Oleksiy Avramchenko, stable, Greg Kroah-Hartman, Keith Busch

Currently the kvfree_rcu() APIs use a system workqueue,
"system_unbound_wq", to drive the RCU machinery that reclaims memory.

Recently, it has been noted that the following kernel warning can
be observed:

<snip>
 workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
 WARNING: CPU: 21 PID: 330 at kernel/workqueue.c:3719 check_flush_dependency+0x112/0x120
 Modules linked in: intel_uncore_frequency(E) intel_uncore_frequency_common(E) skx_edac(E) ...
 CPU: 21 UID: 0 PID: 330 Comm: kworker/u144:6 Tainted: G E 6.13.2-0_g925d379822da #1
 Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM20 02/01/2023
 Workqueue: nvme-wq nvme_scan_work
 RIP: 0010:check_flush_dependency+0x112/0x120
 Code: 05 9a 40 14 02 01 48 81 c6 c0 00 00 00 48 8b 50 18 48 81 c7 c0 00 00 00 48 89 f9 48 ...
 RSP: 0018:ffffc90000df7bd8 EFLAGS: 00010082
 RAX: 000000000000006a RBX: ffffffff81622390 RCX: 0000000000000027
 RDX: 00000000fffeffff RSI: 000000000057ffa8 RDI: ffff88907f960c88
 RBP: 0000000000000000 R08: ffffffff83068e50 R09: 000000000002fffd
 R10: 0000000000000004 R11: 0000000000000000 R12: ffff8881001a4400
 R13: 0000000000000000 R14: ffff88907f420fb8 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88907f940000(0000) knlGS:0000000000000000
 CR2: 00007f60c3001000 CR3: 000000107d010005 CR4: 00000000007726f0
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? __warn+0xa4/0x140
  ? check_flush_dependency+0x112/0x120
  ? report_bug+0xe1/0x140
  ? check_flush_dependency+0x112/0x120
  ? handle_bug+0x5e/0x90
  ? exc_invalid_op+0x16/0x40
  ? asm_exc_invalid_op+0x16/0x20
  ? timer_recalc_next_expiry+0x190/0x190
  ? check_flush_dependency+0x112/0x120
  ? check_flush_dependency+0x112/0x120
  __flush_work.llvm.1643880146586177030+0x174/0x2c0
  flush_rcu_work+0x28/0x30
  kvfree_rcu_barrier+0x12f/0x160
  kmem_cache_destroy+0x18/0x120
  bioset_exit+0x10c/0x150
  disk_release.llvm.6740012984264378178+0x61/0xd0
  device_release+0x4f/0x90
  kobject_put+0x95/0x180
  nvme_put_ns+0x23/0xc0
  nvme_remove_invalid_namespaces+0xb3/0xd0
  nvme_scan_work+0x342/0x490
  process_scheduled_works+0x1a2/0x370
  worker_thread+0x2ff/0x390
  ? pwq_release_workfn+0x1e0/0x1e0
  kthread+0xb1/0xe0
  ? __kthread_parkme+0x70/0x70
  ret_from_fork+0x30/0x40
  ? __kthread_parkme+0x70/0x70
  ret_from_fork_asm+0x11/0x20
  </TASK>
 ---[ end trace 0000000000000000 ]---
<snip>

To address this, switch to an independent WQ_MEM_RECLAIM workqueue,
so the rules are not violated from the workqueue framework's point
of view.

Apart from that, since kvfree_rcu() does reclaim memory, it is worth
going with a WQ_MEM_RECLAIM workqueue, because it is designed for
this purpose.

Cc: <stable@vger.kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Keith Busch <kbusch@kernel.org>
Closes: https://www.spinics.net/lists/kernel/msg5563270.html
Fixes: 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()")
Reported-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/slab_common.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 4030907b6b7d..4c9f0a87f733 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1304,6 +1304,8 @@ module_param(rcu_min_cached_objs, int, 0444);
 static int rcu_delay_page_cache_fill_msec = 5000;
 module_param(rcu_delay_page_cache_fill_msec, int, 0444);
 
+static struct workqueue_struct *rcu_reclaim_wq;
+
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (5 * HZ)
 #define KFREE_N_BATCHES 2
@@ -1632,10 +1634,10 @@ __schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
 	if (delayed_work_pending(&krcp->monitor_work)) {
 		delay_left = krcp->monitor_work.timer.expires - jiffies;
 		if (delay < delay_left)
-			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+			mod_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
 		return;
 	}
-	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+	queue_delayed_work(rcu_reclaim_wq, &krcp->monitor_work, delay);
 }
 
 static void
@@ -1733,7 +1735,7 @@ kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
 			// "free channels", the batch can handle. Break
 			// the loop since it is done with this CPU thus
 			// queuing an RCU work is _always_ success here.
-			queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
+			queued = queue_rcu_work(rcu_reclaim_wq, &krwp->rcu_work);
 			WARN_ON_ONCE(!queued);
 			break;
 		}
@@ -1883,7 +1885,7 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
 	    !atomic_xchg(&krcp->work_in_progress, 1)) {
 		if (atomic_read(&krcp->backoff_page_cache_fill)) {
-			queue_delayed_work(system_unbound_wq,
+			queue_delayed_work(rcu_reclaim_wq,
 				&krcp->page_cache_work,
 					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
 		} else {
@@ -2120,6 +2122,10 @@ void __init kvfree_rcu_init(void)
 	int i, j;
 	struct shrinker *kfree_rcu_shrinker;
 
+	rcu_reclaim_wq = alloc_workqueue("kvfree_rcu_reclaim",
+			WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+	WARN_ON(!rcu_reclaim_wq);
+
 	/* Clamp it to [0:100] seconds interval. */
 	if (rcu_delay_page_cache_fill_msec < 0 ||
 		rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
-- 
2.39.5
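
As a minimal standalone illustration of the pattern the patch adopts
(allocate one dedicated unbound, reclaim-safe workqueue at init time
and route all reclaim-related work through it), consider the sketch
below. The names are made up for illustration; only alloc_workqueue()
and its flags mirror the patch:

<snip>
#include <linux/workqueue.h>

/* Hypothetical queue, mirroring rcu_reclaim_wq from the patch. */
static struct workqueue_struct *example_reclaim_wq;

static int __init example_reclaim_init(void)
{
	/*
	 * WQ_MEM_RECLAIM gives the queue a rescuer thread, so queued
	 * work can make forward progress under memory pressure, and a
	 * WQ_MEM_RECLAIM worker may legally flush work queued here.
	 */
	example_reclaim_wq = alloc_workqueue("example_reclaim",
					     WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
	return example_reclaim_wq ? 0 : -ENOMEM;
}
<snip>

Note that WQ_UNBOUND is kept, so the execution behavior stays close to
the previously used system_unbound_wq; only the reclaim guarantee is
new.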
* Re: [PATCH v1 2/2] mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq
From: Vlastimil Babka @ 2025-02-28 14:42 UTC
To: Uladzislau Rezki (Sony), linux-mm, Andrew Morton
Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko,
    stable, Greg Kroah-Hartman, Keith Busch

On 2/28/25 13:13, Uladzislau Rezki (Sony) wrote:
> Currently the kvfree_rcu() APIs use a system workqueue,
> "system_unbound_wq", to drive the RCU machinery that reclaims memory.
>
> Recently, it has been noted that the following kernel warning can
> be observed:
>
> <snip>
> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> WARNING: CPU: 21 PID: 330 at kernel/workqueue.c:3719 check_flush_dependency+0x112/0x120
> [...]
> ---[ end trace 0000000000000000 ]---
> <snip>
>
> To address this, switch to an independent WQ_MEM_RECLAIM workqueue,
> so the rules are not violated from the workqueue framework's point
> of view.
>
> Apart from that, since kvfree_rcu() does reclaim memory, it is worth
> going with a WQ_MEM_RECLAIM workqueue, because it is designed for
> this purpose.
>
> Cc: <stable@vger.kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

stable is sufficient, no need for greg himself too

> Cc: Keith Busch <kbusch@kernel.org>
> Closes: https://www.spinics.net/lists/kernel/msg5563270.html

lore pls :)

> Fixes: 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()")
> Reported-by: Keith Busch <kbusch@kernel.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

fixed locally and pushed to slab/for-next-fixes
thanks!
* Re: [PATCH v1 2/2] mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq
From: Uladzislau Rezki @ 2025-02-28 16:25 UTC
To: Vlastimil Babka
Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, RCU, LKML,
    Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
    Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko, stable,
    Greg Kroah-Hartman, Keith Busch

On Fri, Feb 28, 2025 at 03:42:02PM +0100, Vlastimil Babka wrote:
> On 2/28/25 13:13, Uladzislau Rezki (Sony) wrote:
> > [...]
> > Cc: <stable@vger.kernel.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> stable is sufficient, no need for greg himself too
>
> > Cc: Keith Busch <kbusch@kernel.org>
> > Closes: https://www.spinics.net/lists/kernel/msg5563270.html
>
> lore pls :)
>
Thanks, got it. I tried but did not find the link :)

> > Fixes: 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()")
> > Reported-by: Keith Busch <kbusch@kernel.org>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> fixed locally and pushed to slab/for-next-fixes
> thanks!
>
Thanks!

--
Uladzislau Rezki
* Re: [PATCH v1 2/2] mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq
From: Joel Fernandes @ 2025-03-03 16:08 UTC
To: Uladzislau Rezki (Sony)
Cc: linux-mm, Andrew Morton, Vlastimil Babka, RCU, LKML,
    Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
    Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko, stable,
    Greg Kroah-Hartman, Keith Busch

On Fri, Feb 28, 2025 at 01:13:56PM +0100, Uladzislau Rezki (Sony) wrote:
> Currently the kvfree_rcu() APIs use a system workqueue,
> "system_unbound_wq", to drive the RCU machinery that reclaims memory.
> [...]
> To address this, switch to an independent WQ_MEM_RECLAIM workqueue,
> so the rules are not violated from the workqueue framework's point
> of view.
>
> Apart from that, since kvfree_rcu() does reclaim memory, it is worth
> going with a WQ_MEM_RECLAIM workqueue, because it is designed for
> this purpose.
>
> Cc: <stable@vger.kernel.org>
> [...]
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

BTW, there is a path in RCU-tasks that involves queuing work on
system_wq, which is !WQ_MEM_RECLAIM. While I don't anticipate an issue
such as the one fixed by this patch, I am wondering if we should move
these to their own WQ_MEM_RECLAIM queues for added robustness, since
otherwise that will result in callback-invocation (and thus memory
freeing) delays. Paul?

kernel/rcu/tasks.h:	queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
kernel/rcu/tasks.h:	queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);

For this patch:
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

thanks,

 - Joel

> ---
>  mm/slab_common.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> [...]
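
For illustration only, Joel's suggestion might look roughly like the
sketch below. This is hypothetical and not a posted patch; the queue
name and the init hook are invented for the example:

<snip>
/* Hypothetical dedicated reclaim-safe queue for RCU-tasks work. */
static struct workqueue_struct *rcu_tasks_reclaim_wq;

static int __init rcu_tasks_reclaim_wq_init(void)
{
	rcu_tasks_reclaim_wq = alloc_workqueue("rcu_tasks_reclaim",
					       WQ_MEM_RECLAIM, 0);
	WARN_ON(!rcu_tasks_reclaim_wq);
	return 0;
}
core_initcall(rcu_tasks_reclaim_wq_init);

/*
 * The two call sites in kernel/rcu/tasks.h would then queue on the
 * dedicated queue rather than system_wq:
 *
 *	queue_work_on(cpuwq, rcu_tasks_reclaim_wq, &rtpcp_next->rtp_work);
 */
<snip>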
* Re: [PATCH v1 2/2] mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq
From: Paul E. McKenney @ 2025-03-04 14:55 UTC
To: Joel Fernandes
Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, Vlastimil Babka,
    RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko,
    stable, Greg Kroah-Hartman, Keith Busch

On Mon, Mar 03, 2025 at 11:08:24AM -0500, Joel Fernandes wrote:
> On Fri, Feb 28, 2025 at 01:13:56PM +0100, Uladzislau Rezki (Sony) wrote:
> > [...]
>
> BTW, there is a path in RCU-tasks that involves queuing work on
> system_wq, which is !WQ_MEM_RECLAIM. While I don't anticipate an issue
> such as the one fixed by this patch, I am wondering if we should move
> these to their own WQ_MEM_RECLAIM queues for added robustness, since
> otherwise that will result in callback-invocation (and thus memory
> freeing) delays. Paul?

For RCU Tasks, the memory traffic has been much lower. But maybe someday
someone will drop a million trampolines all at once. But let's see that
problem before we fix some random problem that we believe will happen,
but which proves to be only slightly related to the problem that actually
does happen. ;-)

							Thanx, Paul

> kernel/rcu/tasks.h:	queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
> kernel/rcu/tasks.h:	queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work);
>
> For this patch:
> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> [...]
* Re: [PATCH v1 2/2] mm/slab/kvfree_rcu: Switch to WQ_MEM_RECLAIM wq
From: Joel Fernandes @ 2025-03-06 18:26 UTC
To: paulmck
Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, Vlastimil Babka,
    RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko,
    stable, Greg Kroah-Hartman, Keith Busch

On 3/4/2025 9:55 AM, Paul E. McKenney wrote:
> On Mon, Mar 03, 2025 at 11:08:24AM -0500, Joel Fernandes wrote:
>> [...]
>> BTW, there is a path in RCU-tasks that involves queuing work on
>> system_wq, which is !WQ_MEM_RECLAIM. [...]
>
> For RCU Tasks, the memory traffic has been much lower. But maybe someday
> someone will drop a million trampolines all at once. But let's see that
> problem before we fix some random problem that we believe will happen,
> but which proves to be only slightly related to the problem that actually
> does happen. ;-)

Fair enough. ;-)

thanks,

 - Joel
* Re: [PATCH v1 1/2] kunit, slub: Add test_kfree_rcu_wq_destroy use case
From: Vlastimil Babka @ 2025-02-28 15:49 UTC
To: Uladzislau Rezki (Sony), linux-mm, Andrew Morton
Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko,
    Keith Busch

On 2/28/25 13:13, Uladzislau Rezki (Sony) wrote:
> Add a test_kfree_rcu_wq_destroy test to verify kmem_cache_destroy()
> from a workqueue context. The problem is that, before destroying any
> cache, kvfree_rcu_barrier() is invoked to guarantee that in-flight
> freed objects are flushed.
>
> The _barrier() function queues and flushes its own internal workers,
> which might conflict with the type of workqueue a kmem_cache is
> destroyed from.
>
> One example is when a WQ_MEM_RECLAIM workqueue flushes !WQ_MEM_RECLAIM
> work, which leads to a kernel splat. See check_flush_dependency() in
> kernel/workqueue.c.
>
> If this test does not emit any kernel warning, it passes.

Well, the workqueue warning doesn't seem to make the test fail. But
someone will notice the warning, so that should be enough. We can't
instrument warnings in other subsystems' code for the slub kunit
context anyway. It would have to be a generic kunit hook for all
warnings.

> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Co-developed-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Pushed to slab/for-next, thanks.
* Re: [PATCH v1 1/2] kunit, slub: Add test_kfree_rcu_wq_destroy use case
From: Uladzislau Rezki @ 2025-02-28 16:27 UTC
To: Vlastimil Babka
Cc: Uladzislau Rezki (Sony), linux-mm, Andrew Morton, RCU, LKML,
    Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
    Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko, Keith Busch

On Fri, Feb 28, 2025 at 04:49:24PM +0100, Vlastimil Babka wrote:
> On 2/28/25 13:13, Uladzislau Rezki (Sony) wrote:
> > [...]
> > If this test does not emit any kernel warning, it passes.
>
> Well, the workqueue warning doesn't seem to make the test fail. But
> someone will notice the warning, so that should be enough. We can't
> instrument warnings in other subsystems' code for the slub kunit
> context anyway. It would have to be a generic kunit hook for all
> warnings.
>
I agree.

> > Reviewed-by: Keith Busch <kbusch@kernel.org>
> > Co-developed-by: Vlastimil Babka <vbabka@suse.cz>
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Pushed to slab/for-next, thanks.
>
Thanks!

--
Uladzislau Rezki