From: Uladzislau Rezki <urezki@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
linux-mm@kvack.org, Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
rcu@vger.kernel.org
Subject: Re: [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu
Date: Fri, 24 Jan 2025 15:19:32 +0100 [thread overview]
Message-ID: <Z5OhdK0OqIgT7qXU@pc636> (raw)
In-Reply-To: <e8bcc4bd-3e3b-4a96-8183-4af1fbb14211@suse.cz>
On Fri, Jan 24, 2025 at 03:16:05PM +0100, Vlastimil Babka wrote:
> On 1/24/25 13:47, Uladzislau Rezki wrote:
> > On Thu, Jan 23, 2025 at 11:37:20AM +0100, Vlastimil Babka wrote:
> >> RCU has been special-casing callback function pointers that are integers
> >> lower than 4096 as offsets of rcu_head for kvfree() instead. The tree
> >> RCU implementation no longer does that as the batched kvfree_rcu() is
> >> not a simple call_rcu(). The tiny RCU still does, and the plan is also
> >> to make tree RCU use call_rcu() for SLUB_TINY configurations.
> >>
> >> Instead of teaching tree RCU again to special case the offsets, let's
> >> remove the special casing completely. Since there's no SLOB anymore, it
> >> is possible to create a callback function that can take a pointer to a
> >> middle of slab object with unknown offset and determine the object's
> >> pointer before freeing it, so implement that as kvfree_rcu_cb().
> >>
> >> Large kmalloc and vmalloc allocations are handled simply by aligning
> >> down to page size. For that we retain the requirement that the offset is
> >> smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely
> >> and instead just opencode the condition in the BUILD_BUG_ON() check.
> >>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >> include/linux/rcupdate.h | 24 +++++++++---------------
> >> kernel/rcu/tiny.c | 13 -------------
> >> mm/slab.h | 2 ++
> >> mm/slab_common.c | 5 +----
> >> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> >> 5 files changed, 54 insertions(+), 32 deletions(-)
> >>
> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >> index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644
> >> --- a/include/linux/rcupdate.h
> >> +++ b/include/linux/rcupdate.h
> >> @@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> >> #define RCU_POINTER_INITIALIZER(p, v) \
> >> .p = RCU_INITIALIZER(v)
> >>
> >> -/*
> >> - * Does the specified offset indicate that the corresponding rcu_head
> >> - * structure can be handled by kvfree_rcu()?
> >> - */
> >> -#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
> >> -
> >> /**
> >> * kfree_rcu() - kfree an object after a grace period.
> >> * @ptr: pointer to kfree for double-argument invocations.
> >> @@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> >> * when they are used in a kernel module, that module must invoke the
> >> * high-latency rcu_barrier() function at module-unload time.
> >> *
> >> - * The kfree_rcu() function handles this issue. Rather than encoding a
> >> - * function address in the embedded rcu_head structure, kfree_rcu() instead
> >> - * encodes the offset of the rcu_head structure within the base structure.
> >> - * Because the functions are not allowed in the low-order 4096 bytes of
> >> - * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
> >> + * The kfree_rcu() function handles this issue. In order to have a universal
> >> + * callback function handling different offsets of rcu_head, the callback needs
> >> + * to determine the starting address of the freed object, which can be a large
> >> + * kmalloc of vmalloc allocation. To allow simply aligning the pointer down to
> >> + * page boundary for those, only offsets up to 4095 bytes can be accommodated.
> >> * If the offset is larger than 4095 bytes, a compile-time error will
> >> * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
> >> * either fall back to use of call_rcu() or rearrange the structure to
> >> @@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr);
> >> do { \
> >> typeof (ptr) ___p = (ptr); \
> >> \
> >> - if (___p) { \
> >> - BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
> >> - kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> >> - } \
> >> + if (___p) { \
> >> + BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \
> >> + kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> >> + } \
> >>
> > Why removing the macro? At least __is_kvfree_rcu_offset() makes it
> > clear what and why + it has a nice comment what it is used for. 4096
> > looks like hard-coded value.
>
> Hmm but it's ultimately shorter. We could add a short comment to
> kvfree_rcu_arg_2() referring to the longer explanation in the kfree_rcu()
> comment?
>
Sounds good to place or keep the comment.
> > Or you do not want that someone else started to use that macro in
> > another places?
>
> And also that, since this has to be exposed in a "public" header.
>
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> >> void **p, int objects, struct slabobj_ext *obj_exts);
> >> #endif
> >>
> >> +void kvfree_rcu_cb(struct rcu_head *head);
> >> +
> >> size_t __ksize(const void *objp);
> >>
> >> static inline size_t slab_ksize(const struct kmem_cache *s)
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head)
> >> rcu_lock_acquire(&rcu_callback_map);
> >> trace_rcu_invoke_kvfree_callback("slab", head, offset);
> >>
> >> - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> >> - kvfree(ptr);
> >> -
> > This is not correct unless i miss something. Why do you remove this?
>
> Oops, meant to remove just the warn check, and unconditionally call
> kvfree(), thanks for noticing :)
>
> The warning could really only trigger due to a use-after-free corruption of
> the ptr pointer, because the BUILD_BUG_ON() would catch misuse with a too
> large offset, so we don't need to keep it.
>
> >> +void kvfree_rcu_cb(struct rcu_head *head)
> >> +{
> >> + void *obj = head;
> >> + struct folio *folio;
> >> + struct slab *slab;
> >> + struct kmem_cache *s;
> >> + void *slab_addr;
> >> +
> >> + if (unlikely(is_vmalloc_addr(obj))) {
> >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> >> + vfree(obj);
> >> + return;
> >> + }
> >> +
> >> + folio = virt_to_folio(obj);
> >> + if (unlikely(!folio_test_slab(folio))) {
> >> + /*
> >> + * rcu_head offset can be only less than page size so no need to
> >> + * consider folio order
> >> + */
> >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> >> + free_large_kmalloc(folio, obj);
> >> + return;
> >> + }
> >> +
> >> + slab = folio_slab(folio);
> >> + s = slab->slab_cache;
> >> + slab_addr = folio_address(folio);
> >> +
> >> + if (is_kfence_address(obj)) {
> >> + obj = kfence_object_start(obj);
> >> + } else {
> >> + unsigned int idx = __obj_to_index(s, slab_addr, obj);
> >> +
> >> + obj = slab_addr + s->size * idx;
> >> + obj = fixup_red_left(s, obj);
> >> + }
> >> +
> >> + slab_free(s, slab, obj, _RET_IP_);
> >> +}
> >> +
> > Tiny computer case. Just small nit, maybe remove unlikely() but you decide :)
>
> Force of habbit :)
>
:)
--
Uladzislau Rezki
next prev parent reply other threads:[~2025-01-24 14:19 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of " Vlastimil Babka
2025-01-24 12:56 ` Uladzislau Rezki
2025-01-23 10:37 ` [PATCH RFC 2/4] rcu: remove trace_rcu_kvfree_callback Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu Vlastimil Babka
2025-01-24 12:47 ` Uladzislau Rezki
2025-01-24 14:16 ` Vlastimil Babka
2025-01-24 14:19 ` Uladzislau Rezki [this message]
2025-01-23 10:37 ` [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY Vlastimil Babka
2025-01-24 2:43 ` kernel test robot
2025-01-24 12:11 ` Uladzislau Rezki
2025-01-23 19:37 ` [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Joel Fernandes
2025-01-23 20:26 ` Paul E. McKenney
2025-01-24 14:25 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z5OhdK0OqIgT7qXU@pc636 \
--to=urezki@gmail.com \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=boqun.feng@gmail.com \
--cc=cl@linux.com \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-mm@kvack.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=qiang.zhang1211@gmail.com \
--cc=rcu@vger.kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=rostedt@goodmis.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.