From: John Fastabend <john.fastabend@gmail.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Zijian Zhang <zijianzhang@bytedance.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
zhoufeng.zf@bytedance.com, jakub@cloudflare.com,
Cong Wang <cong.wang@bytedance.com>
Subject: Re: [Patch bpf-next v3 2/4] skmsg: implement slab allocator cache for sk_msg
Date: Thu, 29 May 2025 23:30:11 -0700 [thread overview]
Message-ID: <20250530063011.cqubbcehgnraasmx@gmail.com> (raw)
In-Reply-To: <aDipm6P+RWGD8j4M@pop-os.localdomain>
On 2025-05-29 11:38:19, Cong Wang wrote:
> On Wed, May 28, 2025 at 05:49:22PM -0700, Zijian Zhang wrote:
> > On 5/28/25 5:04 PM, John Fastabend wrote:
> > > On 2025-05-19 13:36:26, Cong Wang wrote:
> > > > From: Zijian Zhang <zijianzhang@bytedance.com>
> > > >
> > > > Optimizing redirect ingress performance requires frequent allocation and
> > > > deallocation of sk_msg structures. Introduce a dedicated kmem_cache for
> > > > sk_msg to reduce memory allocation overhead and improve performance.
> > > >
> > > > Reviewed-by: Cong Wang <cong.wang@bytedance.com>
> > > > Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> > > > ---
> > > > include/linux/skmsg.h | 21 ++++++++++++---------
> > > > net/core/skmsg.c | 28 +++++++++++++++++++++-------
> > > > net/ipv4/tcp_bpf.c | 5 ++---
> > > > 3 files changed, 35 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> > > > index d6f0a8cd73c4..bf28ce9b5fdb 100644
> > > > --- a/include/linux/skmsg.h
> > > > +++ b/include/linux/skmsg.h
> > > > @@ -121,6 +121,7 @@ struct sk_psock {
> > > > struct rcu_work rwork;
> > > > };
> > > > +struct sk_msg *sk_msg_alloc(gfp_t gfp);
> > > > int sk_msg_expand(struct sock *sk, struct sk_msg *msg, int len,
> > > > int elem_first_coalesce);
> > > > int sk_msg_clone(struct sock *sk, struct sk_msg *dst, struct sk_msg *src,
> > > > @@ -143,6 +144,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
> > > > int len, int flags);
> > > > bool sk_msg_is_readable(struct sock *sk);
> > > > +extern struct kmem_cache *sk_msg_cachep;
> > > > +
> > > > static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes)
> > > > {
> > > > WARN_ON(i == msg->sg.end && bytes);
> > > > @@ -319,6 +322,13 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb)
> > > > kfree_skb(skb);
> > > > }
> > > > +static inline void kfree_sk_msg(struct sk_msg *msg)
> > > > +{
> > > > + if (msg->skb)
> > > > + consume_skb(msg->skb);
> > > > + kmem_cache_free(sk_msg_cachep, msg);
> > > > +}
> > > > +
> > > > static inline bool sk_psock_queue_msg(struct sk_psock *psock,
> > > > struct sk_msg *msg)
> > > > {
> > > > @@ -330,7 +340,7 @@ static inline bool sk_psock_queue_msg(struct sk_psock *psock,
> > > > ret = true;
> > > > } else {
> > > > sk_msg_free(psock->sk, msg);
> > > > - kfree(msg);
> > > > + kfree_sk_msg(msg);
> > >
> > > Isn't this a potential use after free on msg->skb? The sk_msg_free() a
> > > line above will consume_skb() if it exists and its not nil set so we would
> > > consume_skb() again?
> > >
> >
> > Thanks to sk_msg_free, after consuming the skb, it invokes sk_msg_init
> > to make msg->skb NULL to prevent further double free.
> >
> > To avoid the confusion, we can replace kfree_sk_msg here with
> > kmem_cache_free.
> >
>
> Right, the re-initialization in sk_msg_free() is indeed confusing, maybe
> it is time to clean up its logic? For example, separate sk_msg_init()
> out from sk_msg_free().
>
> I can add a separate patch for this in next update, if people prefer.
>
> Thanks!
OK so its not a problem we have the init there. So ACK for this patch.
Perhaps a follow up to clean up the different types of 'frees' would be
useful. Move into sk_msg_free+kfree_sk_msg into a single call. But,
I'm not completely convinced its worth the churn.
Acked-by: John Fastabend <john.fastabend@gmail.com>
next prev parent reply other threads:[~2025-05-30 6:30 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-19 20:36 [Patch bpf-next v3 0/4] tcp_bpf: improve ingress redirection performance with message corking Cong Wang
2025-05-19 20:36 ` [Patch bpf-next v3 1/4] skmsg: rename sk_msg_alloc() to sk_msg_expand() Cong Wang
2025-05-28 23:51 ` John Fastabend
2025-05-19 20:36 ` [Patch bpf-next v3 2/4] skmsg: implement slab allocator cache for sk_msg Cong Wang
2025-05-29 0:04 ` John Fastabend
2025-05-29 0:49 ` Zijian Zhang
2025-05-29 18:38 ` Cong Wang
2025-05-30 6:30 ` John Fastabend [this message]
2025-05-19 20:36 ` [Patch bpf-next v3 3/4] skmsg: save some space in struct sk_psock Cong Wang
2025-05-30 17:15 ` John Fastabend
2025-05-19 20:36 ` [Patch bpf-next v3 4/4] tcp_bpf: improve ingress redirection performance with message corking Cong Wang
2025-05-30 20:07 ` John Fastabend
2025-05-30 20:37 ` Zijian Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250530063011.cqubbcehgnraasmx@gmail.com \
--to=john.fastabend@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=cong.wang@bytedance.com \
--cc=jakub@cloudflare.com \
--cc=netdev@vger.kernel.org \
--cc=xiyou.wangcong@gmail.com \
--cc=zhoufeng.zf@bytedance.com \
--cc=zijianzhang@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.