From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Kuniyuki Iwashima" <kuniyu@google.com>
Cc: "John Fastabend" <john.fastabend@gmail.com>,
"Jakub Sitnicki" <jakub@cloudflare.com>,
"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
"Kuniyuki Iwashima" <kuni1840@gmail.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org,
syzbot+5b3b7e51dda1be027b7a@syzkaller.appspotmail.com
Subject: Re: [PATCH v4 bpf/net 6/6] sockmap: Fix broken memory accounting for UDP.
Date: Thu, 05 Mar 2026 10:45:18 +0000 [thread overview]
Message-ID: <3e278eea8830a7b68054ae0b5a99e1bc7ae5fbf3@linux.dev> (raw)
In-Reply-To: <CAAVpQUBrVzwDN2xc68owHJZP7VoAZQWhwo_xdo5noGZ27qmjpA@mail.gmail.com>
March 5, 2026 at 17:27, "Kuniyuki Iwashima" <kuniyu@google.com mailto:kuniyu@google.com?to=%22Kuniyuki%20Iwashima%22%20%3Ckuniyu%40google.com%3E > wrote:
>
> On Thu, Mar 5, 2026 at 12:30 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> >
> > March 5, 2026 at 15:48, "Kuniyuki Iwashima" <kuniyu@google.com mailto:kuniyu@google.com?to=%22Kuniyuki%20Iwashima%22%20%3Ckuniyu%40google.com%3E > wrote:
> >
> > On Wed, Mar 4, 2026 at 10:37 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >
> > >
> > > On Sat, Feb 21, 2026 at 11:30:53PM +0800, Kuniyuki Iwashima wrote:
> > > [...]
> > > </TASK>
> > >
> > > Fixes: d7f571188ecf ("udp: Implement ->read_sock() for sockmap")
> > > Reported-by: syzbot+5b3b7e51dda1be027b7a@syzkaller.appspotmail.com
> > > Closes: https://lore.kernel.org/netdev/698f84c6.a70a0220.2c38d7.00cb.GAE@google.com/
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > > ---
> > > v4: Fix deadlock when requeued
> > > v2: Fix build failure when CONFIG_INET=n
> > > ---
> > > include/net/udp.h | 9 +++++++++
> > > net/core/skmsg.c | 29 +++++++++++++++++++++++++----
> > > net/ipv4/udp.c | 9 +++++++++
> > > 3 files changed, 43 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > index 700dbedcb15f..ae38a4da9388 100644
> > > --- a/include/net/udp.h
> > > +++ b/include/net/udp.h
> > > @@ -455,6 +455,15 @@ struct sock *__udp6_lib_lookup(const struct net *net,
> > > struct sk_buff *skb);
> > > struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
> > > __be16 sport, __be16 dport);
> > > +
> > > +#ifdef CONFIG_INET
> > > +void udp_sock_rfree(struct sk_buff *skb);
> > > +#else
> > > +static inline void udp_sock_rfree(struct sk_buff *skb)
> > > +{
> > > +}
> > > +#endif
> > > +
> > > int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
> > >
> > > /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
> > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > index 96f43e0dbb17..dd9134a45663 100644
> > > --- a/net/core/skmsg.c
> > > +++ b/net/core/skmsg.c
> > > @@ -7,6 +7,7 @@
> > >
> > > #include <net/sock.h>
> > > #include <net/tcp.h>
> > > +#include <net/udp.h>
> > > #include <net/tls.h>
> > > #include <trace/events/sock.h>
> > >
> > > @@ -576,6 +577,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> > > u32 off, u32 len, bool take_ref)
> > > {
> > > struct sock *sk = psock->sk;
> > > + bool is_udp = sk_is_udp(sk);
> > > struct sk_msg *msg;
> > > int err = -EAGAIN;
> > >
> > > @@ -583,13 +585,20 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> > > if (!msg)
> > > goto out;
> > >
> > > - if (skb->sk != sk && take_ref) {
> > > + if (is_udp) {
> > > + if (unlikely(skb->destructor == udp_sock_rfree))
> > > + goto enqueue;
> > > +
> > > + spin_lock_bh(&sk->sk_receive_queue.lock);
> > > + }
> > > +
> > > + if (is_udp || (skb->sk != sk && take_ref)) {
> > > if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> > > - goto free;
> > > + goto unlock;
> > >
> > > if (!sk_rmem_schedule(sk, skb, skb->truesize))
> > > - goto free;
> > > - } else {
> > > + goto unlock;
> > > + } else if (skb->sk == sk || !take_ref) {
> > > /* This is used in tcp_bpf_recvmsg_parser() to determine whether the
> > > * data originates from the socket's own protocol stack. No need to
> > > * refcount sk because msg's lifetime is bound to sk via the ingress_msg.
> > > @@ -604,11 +613,23 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> > > * into user buffers.
> > > */
> > > skb_set_owner_r(skb, sk);
> > > +
> > > + if (is_udp) {
> > > + spin_unlock_bh(&sk->sk_receive_queue.lock);
> > > +
> > > + skb->destructor = udp_sock_rfree;
> > > + }
> > > +
> > > +enqueue:
> > > err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, take_ref);
> > > if (err < 0)
> > > goto free;
> > > out:
> > > return err;
> > > +
> > > +unlock:
> > > + if (is_udp)
> > > + spin_unlock_bh(&sk->sk_receive_queue.lock);
> > > free:
> > > kfree(msg);
> > > goto out;
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 422c96fea249..831d26748a90 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -2039,6 +2039,15 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
> > > }
> > > EXPORT_SYMBOL(__skb_recv_udp);
> > >
> > > +void udp_sock_rfree(struct sk_buff *skb)
> > > +{
> > > + struct sock *sk = skb->sk;
> > > +
> > > + spin_lock_bh(&sk->sk_receive_queue.lock);
> > > + sock_rfree(skb);
> > > + spin_unlock_bh(&sk->sk_receive_queue.lock);
> > > +}
> > > +
> > > int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
> > > {
> > > struct sk_buff *skb;
> > > --
> > > 2.53.0.371.g1d285c8824-goog
> > >
> > > There are too many protocol-specific checks in the generic skmsg code
> > > here. This should be abstracted out.
> > >
> > > Also TCP also has a similar issue with sk_forward_alloc concurrency
> > > between the backlog charge and recvmsg uncharge paths so the abstraction
> > > is necessary.
> > >
> > > I've put together a simple diff based on your patch for reference.
> > > I haven't tested it thoroughly, but it at least handles TCP and UDP
> > > separately through a callback, keeping the generic code clean.
> > >
> > I can follow up on TCP, but TCP needs another fix, which
> > cannot be factored out.
> >
> > Some wrong assumptions in your patch:
> >
> > 1) tcp_psock_ingress_charge() uses psock->ingress_lock,
> > but it does not protect any tcp_sock fields, especially
> > sk->sk_forwad_alloc.
> >
> > 2) TCP memory accounting happens under bh_lock_sock()
> > _or_ lock_sock(). sendmsg()/recvmsg() works under
> > lock_sock() and TCP fast path puts skb to backlog if
> > lock_sock() is held by userspace. This means even
> > a simple bh_lock_sock() in tcp_psock_ingress_charge()
> > race with memory accounting happening under lock_sock().
> >
> > Since sk_psock_skb_ingress() is called from both GFP_KERNEL
> > and GFP_ATOMIC context, the simplest fix would be to put
> > lock_sock() in sk_psock_backlog() for TCP, which is ugly though.
> >
> > OK, thanks.
> >
> > Now my only concern is keeping UDP-specific logic
> > out of skmsg.c.
> >
> We already have TCP-only msg->sk logic there that
> you added. It just does not look like TCP code.
The msg->sk field marks ingress-self vs ingress-other, which is
protocol-agnostic metadata. TCP happens to consume it for seq
tracking, but the assignment itself has no TCP-specific check
(no is_tcp_sk() or similar).
> If really needed, we can do such cleanup in bpf-next
> for TCP and UDP altogether (probably after another
> TCP fix ?).
>
> >
> > We could use a function pointer so that UDP
> > implements its own ingress charge in udp_bpf.c, while other
> > protocols just use a default no-op or even NULL.
> >
> If TCP requires locking outside of the scope, there's
> no point adding one more extra layer/complexity just
> for UDP.
>
> Even if we go that way, most likely we cannot get rid
> of protocol-specific code (UDP/TCP-only hook) and
> end up with the indirect call helpers, which will be
> expensive than simple if logic.
>
An alternative would be to add such sk_psock_udp_skb_ingress() in
udp_bpf.c, without adding any indirection layer.
This way we don't need to design a new abstraction before the
approach is settled. Just a UDP-specific ingress helper, which
keeps things clean enough for now, and avoid scattering is_udp
checks all over sk_psock_skb_ingress, just need one if (is_udp).
Thanks.
next prev parent reply other threads:[~2026-03-05 10:45 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-21 23:30 [PATCH v4 bpf/net 0/6] sockmap: Fix UAF and broken memory accounting for UDP Kuniyuki Iwashima
2026-02-21 23:30 ` [PATCH v4 bpf/net 1/6] sockmap: Annotate sk->sk_data_ready() " Kuniyuki Iwashima
2026-03-05 11:05 ` Jakub Sitnicki
2026-03-05 11:27 ` Jiayuan Chen
2026-02-21 23:30 ` [PATCH v4 bpf/net 2/6] sockmap: Annotate sk->sk_write_space() " Kuniyuki Iwashima
2026-03-05 1:48 ` Jiayuan Chen
2026-03-05 3:43 ` Kuniyuki Iwashima
2026-03-07 0:03 ` Martin KaFai Lau
2026-03-07 2:51 ` Kuniyuki Iwashima
2026-03-05 11:35 ` Jiayuan Chen
2026-03-05 11:51 ` Jakub Sitnicki
2026-02-21 23:30 ` [PATCH v4 bpf/net 3/6] sockmap: Fix use-after-free in udp_bpf_recvmsg() Kuniyuki Iwashima
2026-03-05 2:30 ` Jiayuan Chen
2026-03-05 3:41 ` Kuniyuki Iwashima
2026-03-05 11:36 ` Jiayuan Chen
2026-03-05 11:39 ` Jakub Sitnicki
2026-03-05 17:46 ` Kuniyuki Iwashima
2026-02-21 23:30 ` [PATCH v4 bpf/net 4/6] sockmap: Inline sk_psock_create_ingress_msg() Kuniyuki Iwashima
2026-03-05 11:44 ` Jakub Sitnicki
2026-02-21 23:30 ` [PATCH v4 bpf/net 5/6] sockmap: Consolidate sk_psock_skb_ingress_self() Kuniyuki Iwashima
2026-02-21 23:30 ` [PATCH v4 bpf/net 6/6] sockmap: Fix broken memory accounting for UDP Kuniyuki Iwashima
2026-03-04 20:04 ` Martin KaFai Lau
2026-03-04 20:14 ` Kuniyuki Iwashima
2026-03-05 6:37 ` Jiayuan Chen
2026-03-05 7:48 ` Kuniyuki Iwashima
2026-03-05 8:30 ` Jiayuan Chen
2026-03-05 9:27 ` Kuniyuki Iwashima
2026-03-05 10:45 ` Jiayuan Chen [this message]
2026-03-05 11:04 ` Jiayuan Chen
2026-03-05 17:42 ` Kuniyuki Iwashima
2026-03-06 7:44 ` Jiayuan Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3e278eea8830a7b68054ae0b5a99e1bc7ae5fbf3@linux.dev \
--to=jiayuan.chen@linux.dev \
--cc=bpf@vger.kernel.org \
--cc=jakub@cloudflare.com \
--cc=john.fastabend@gmail.com \
--cc=kuni1840@gmail.com \
--cc=kuniyu@google.com \
--cc=netdev@vger.kernel.org \
--cc=syzbot+5b3b7e51dda1be027b7a@syzkaller.appspotmail.com \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.