From: Uladzislau Rezki <urezki@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Keith Busch <kbusch@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Julia Lawall <Julia.Lawall@inria.fr>,
Jakub Kicinski <kuba@kernel.org>,
"Jason A. Donenfeld" <Jason@zx2c4.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
rcu@vger.kernel.org, Alexander Potapenko <glider@google.com>,
Marco Elver <elver@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
kasan-dev@googlegroups.com, Jann Horn <jannh@google.com>,
Mateusz Guzik <mjguzik@gmail.com>,
linux-nvme@lists.infradead.org, leitao@debian.org
Subject: Re: [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()
Date: Tue, 25 Feb 2025 14:39:16 +0100 [thread overview]
Message-ID: <Z73IBMdk5fnmYnN1@pc636> (raw)
In-Reply-To: <ef97428b-f6e7-481e-b47e-375cc76653ad@suse.cz>
On Tue, Feb 25, 2025 at 10:57:38AM +0100, Vlastimil Babka wrote:
> On 2/24/25 12:44, Uladzislau Rezki wrote:
> > On Fri, Feb 21, 2025 at 06:28:49PM +0100, Vlastimil Babka wrote:
> >> On 2/21/25 17:30, Keith Busch wrote:
> >> > On Wed, Aug 07, 2024 at 12:31:19PM +0200, Vlastimil Babka wrote:
> >> >> We would like to replace call_rcu() users with kfree_rcu() where the
> >> >> existing callback is just a kmem_cache_free(). However this causes
> >> >> issues when the cache can be destroyed (such as due to module unload).
> >> >>
> >> >> Currently such modules should be issuing rcu_barrier() before
> >> >> kmem_cache_destroy() to have their call_rcu() callbacks processed first.
> >> >> This barrier is however not sufficient for kfree_rcu() in flight due
> >> >> to the batching introduced by a35d16905efc ("rcu: Add basic support for
> >> >> kfree_rcu() batching").
> >> >>
> >> >> This is not a problem for kmalloc caches which are never destroyed, but
> >> >> since removing SLOB, kfree_rcu() is allowed also for any other cache,
> >> >> that might be destroyed.
> >> >>
> >> >> In order not to complicate the API, put the responsibility for handling
> >> >> outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the newly
> >> >> introduced kvfree_rcu_barrier() to wait before destroying the cache.
> >> >> This is similar to how we issue rcu_barrier() for SLAB_TYPESAFE_BY_RCU
> >> >> caches, but has to be done earlier, as the latter only needs to wait for
> >> >> the empty slab pages to finish freeing, and not objects from the slab.
> >> >>
> >> >> Users of call_rcu() with arbitrary callbacks should still issue
> >> >> rcu_barrier() before destroying the cache and unloading the module, as
> >> >> kvfree_rcu_barrier() is not a superset of rcu_barrier() and the
> >> >> callbacks may be invoking module code or performing other actions that
> >> >> are necessary for a successful unload.
> >> >>
> >> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> >> ---
> >> >> mm/slab_common.c | 3 +++
> >> >> 1 file changed, 3 insertions(+)
> >> >>
> >> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> >> index c40227d5fa07..1a2873293f5d 100644
> >> >> --- a/mm/slab_common.c
> >> >> +++ b/mm/slab_common.c
> >> >> @@ -508,6 +508,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >> >> if (unlikely(!s) || !kasan_check_byte(s))
> >> >> return;
> >> >>
> >> >> + /* in-flight kfree_rcu()'s may include objects from our cache */
> >> >> + kvfree_rcu_barrier();
> >> >> +
> >> >> cpus_read_lock();
> >> >> mutex_lock(&slab_mutex);
> >> >
> >> > This patch appears to be triggering a new warning in certain conditions
> >> > when tearing down an nvme namespace's block device. Stack trace is at
> >> > the end.
> >> >
> >> > The warning indicates that this shouldn't be called from a
> >> > WQ_MEM_RECLAIM workqueue. This workqueue is responsible for bringing up
> >> > and tearing down block devices, so this is a memory reclaim use AIUI.
> >> > I'm a bit confused why we can't tear down a disk from within a memory
> >> > reclaim workqueue. Is the recommended solution to simply remove the WQ
> >> > flag when creating the workqueue?
> >>
> >> I think it's reasonable to expect a memory reclaim related action would
> >> destroy a kmem cache. Mateusz's suggestion would work around the issue, but
> >> then we could get another surprising warning elsewhere. Also making the
> >> kmem_cache destroys async can be tricky when a recreation happens
> >> immediately under the same name (implications with sysfs/debugfs etc). We
> >> managed to make the destroying synchronous as part of this series and it
> >> would be great to keep it that way.
> >>
> >> > ------------[ cut here ]------------
> >> > workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_scan_work is flushing !WQ_MEM_RECLAIM events_unbound:kfree_rcu_work
> >>
> >> Maybe instead kfree_rcu_work should be using a WQ_MEM_RECLAIM workqueue? It
> >> is after all freeing memory. Ulad, what do you think?
> >>
> > We reclaim memory, therefore WQ_MEM_RECLAIM seems what we need.
> > AFAIR, there is an extra rescue worker, which can really help
> > under a low memory condition in a way that we do a progress.
> >
> > Do we have a reproducer of mentioned splat?
>
> I tried to create a kunit test for it, but it doesn't trigger anything. Maybe
> it's too simple, or racy, and thus we are not flushing any of the queues from
> kvfree_rcu_barrier()?
>
See some comments below. I will try to reproduce it today. But from the
first glance it should trigger it.
> ----8<----
> From 1e19ea78e7fe254034970f75e3b7cb705be50163 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Tue, 25 Feb 2025 10:51:28 +0100
> Subject: [PATCH] add test for kmem_cache_destroy in a workqueue
>
> ---
> lib/slub_kunit.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 48 insertions(+)
>
> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
> index f11691315c2f..5fe9775fba05 100644
> --- a/lib/slub_kunit.c
> +++ b/lib/slub_kunit.c
> @@ -6,6 +6,7 @@
> #include <linux/module.h>
> #include <linux/kernel.h>
> #include <linux/rcupdate.h>
> +#include <linux/delay.h>
> #include "../mm/slab.h"
>
> static struct kunit_resource resource;
> @@ -181,6 +182,52 @@ static void test_kfree_rcu(struct kunit *test)
> KUNIT_EXPECT_EQ(test, 0, slab_errors);
> }
>
> +struct cache_destroy_work {
> + struct work_struct work;
> + struct kmem_cache *s;
> +};
> +
> +static void cache_destroy_workfn(struct work_struct *w)
> +{
> + struct cache_destroy_work *cdw;
> +
> + cdw = container_of(w, struct cache_destroy_work, work);
> +
> + kmem_cache_destroy(cdw->s);
> +}
> +
> +static void test_kfree_rcu_wq_destroy(struct kunit *test)
> +{
> + struct test_kfree_rcu_struct *p;
> + struct cache_destroy_work cdw;
> + struct workqueue_struct *wq;
> + struct kmem_cache *s;
> +
> + if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
> + kunit_skip(test, "can't do kfree_rcu() when test is built-in");
> +
> + INIT_WORK_ONSTACK(&cdw.work, cache_destroy_workfn);
> + wq = alloc_workqueue("test_kfree_rcu_destroy_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
>
Maybe it is worth to add WQ_HIGHPRI also to be ahead?
> + if (!wq)
> + kunit_skip(test, "failed to alloc wq");
> +
> + s = test_kmem_cache_create("TestSlub_kfree_rcu_wq_destroy",
> + sizeof(struct test_kfree_rcu_struct),
> + SLAB_NO_MERGE);
> + p = kmem_cache_alloc(s, GFP_KERNEL);
> +
> + kfree_rcu(p, rcu);
> +
> + cdw.s = s;
> + queue_work(wq, &cdw.work);
> + msleep(1000);
I am not sure it is needed. From the other hand it does nothing if
i do not miss anything.
> + flush_work(&cdw.work);
> +
> + destroy_workqueue(wq);
> +
> + KUNIT_EXPECT_EQ(test, 0, slab_errors);
> +}
> +
> static void test_leak_destroy(struct kunit *test)
> {
> struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
> @@ -254,6 +301,7 @@ static struct kunit_case test_cases[] = {
> KUNIT_CASE(test_clobber_redzone_free),
> KUNIT_CASE(test_kmalloc_redzone_access),
> KUNIT_CASE(test_kfree_rcu),
> + KUNIT_CASE(test_kfree_rcu_wq_destroy),
> KUNIT_CASE(test_leak_destroy),
> KUNIT_CASE(test_krealloc_redzone_zeroing),
> {}
> --
> 2.48.1
>
>
--
Uladzislau Rezki
next prev parent reply other threads:[~2025-02-25 17:03 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-07 10:31 [PATCH v2 0/7] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
2024-08-07 10:31 ` [PATCH v2 1/7] mm, slab: dissolve shutdown_cache() into its caller Vlastimil Babka
2024-08-07 19:11 ` Jann Horn
2024-08-07 10:31 ` [PATCH v2 2/7] mm, slab: unlink slabinfo, sysfs and debugfs immediately Vlastimil Babka
2024-08-07 19:11 ` Jann Horn
2024-08-07 10:31 ` [PATCH v2 3/7] mm, slab: move kfence_shutdown_cache() outside slab_mutex Vlastimil Babka
2024-08-07 19:11 ` Jann Horn
2024-08-07 10:31 ` [PATCH v2 4/7] mm, slab: reintroduce rcu_barrier() into kmem_cache_destroy() Vlastimil Babka
2024-08-07 19:11 ` Jann Horn
2024-08-07 10:31 ` [PATCH v2 5/7] rcu/kvfree: Add kvfree_rcu_barrier() API Vlastimil Babka
2024-08-09 16:26 ` Uladzislau Rezki
2024-08-09 17:00 ` Vlastimil Babka
2024-08-20 16:02 ` Uladzislau Rezki
2024-08-07 10:31 ` [PATCH v2 6/7] mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy() Vlastimil Babka
2025-02-21 16:30 ` Keith Busch
2025-02-21 16:51 ` Mateusz Guzik
2025-02-21 16:52 ` Mateusz Guzik
2025-02-21 17:28 ` Vlastimil Babka
2025-02-24 11:44 ` Uladzislau Rezki
2025-02-24 15:37 ` Keith Busch
2025-02-25 9:57 ` Vlastimil Babka
2025-02-25 13:39 ` Uladzislau Rezki [this message]
2025-02-25 14:12 ` Vlastimil Babka
2025-02-25 16:03 ` Keith Busch
2025-02-25 17:05 ` Keith Busch
2025-02-25 17:41 ` Uladzislau Rezki
2025-02-25 18:11 ` Vlastimil Babka
2025-02-25 18:21 ` Uladzislau Rezki
2025-02-25 18:21 ` Uladzislau Rezki
2025-02-26 10:59 ` Vlastimil Babka
2025-02-26 14:31 ` Uladzislau Rezki
2025-02-26 14:36 ` Vlastimil Babka
2025-02-26 15:42 ` Uladzislau Rezki
2025-02-26 15:46 ` Vlastimil Babka
2025-02-26 15:57 ` Uladzislau Rezki
2025-02-26 15:51 ` Keith Busch
2025-02-26 15:58 ` Uladzislau Rezki
2024-08-07 10:31 ` [PATCH v2 7/7] kunit, slub: add test_kfree_rcu() and test_leak_destroy() Vlastimil Babka
2024-08-09 16:23 ` Uladzislau Rezki
2024-09-14 13:22 ` Hyeonggon Yoo
2024-09-14 18:39 ` Vlastimil Babka
2024-09-20 13:35 ` Guenter Roeck
2024-09-21 20:40 ` Vlastimil Babka
2024-09-21 21:08 ` Guenter Roeck
2024-09-21 21:25 ` Vlastimil Babka
2024-09-22 6:16 ` Hyeonggon Yoo
2024-09-22 14:13 ` Guenter Roeck
2024-09-25 12:56 ` Hyeonggon Yoo
2024-09-26 12:54 ` Vlastimil Babka
2024-09-30 8:47 ` Vlastimil Babka
2024-08-09 15:02 ` [-next conflict imminent] Re: [PATCH v2 0/7] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
2024-08-09 15:12 ` Jann Horn
2024-08-09 15:14 ` Vlastimil Babka
2024-08-10 0:11 ` Andrew Morton
2024-08-10 20:25 ` Vlastimil Babka
2024-08-10 20:30 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z73IBMdk5fnmYnN1@pc636 \
--to=urezki@gmail.com \
--cc=42.hyeyoo@gmail.com \
--cc=Jason@zx2c4.com \
--cc=Julia.Lawall@inria.fr \
--cc=akpm@linux-foundation.org \
--cc=boqun.feng@gmail.com \
--cc=cl@linux.com \
--cc=dvyukov@google.com \
--cc=elver@google.com \
--cc=glider@google.com \
--cc=jannh@google.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=kasan-dev@googlegroups.com \
--cc=kbusch@kernel.org \
--cc=kuba@kernel.org \
--cc=leitao@debian.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nvme@lists.infradead.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=mjguzik@gmail.com \
--cc=paulmck@kernel.org \
--cc=qiang.zhang1211@gmail.com \
--cc=rcu@vger.kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=rostedt@goodmis.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.