public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Kuniyuki Iwashima" <kuniyu@google.com>, jakub@cloudflare.com
Cc: "John Fastabend" <john.fastabend@gmail.com>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Kuniyuki Iwashima" <kuni1840@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	syzbot+5b3b7e51dda1be027b7a@syzkaller.appspotmail.com
Subject: Re: [PATCH v4 bpf/net 6/6] sockmap: Fix broken memory accounting for UDP.
Date: Fri, 06 Mar 2026 07:44:23 +0000	[thread overview]
Message-ID: <a61769c3eb9255309548c2624f054b67f2995ab1@linux.dev> (raw)
In-Reply-To: <CAAVpQUD+GiToMjWU387zgMT17emmXH_guWuebbvjPuppF5pbKw@mail.gmail.com>

March 6, 2026 at 01:42, "Kuniyuki Iwashima" <kuniyu@google.com mailto:kuniyu@google.com?to=%22Kuniyuki%20Iwashima%22%20%3Ckuniyu%40google.com%3E > wrote:


> 
> On Thu, Mar 5, 2026 at 3:04 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> 
> > 
> > March 5, 2026 at 18:45, "Jiayuan Chen" <jiayuan.chen@linux.dev mailto:jiayuan.chen@linux.dev?to=%22Jiayuan%20Chen%22%20%3Cjiayuan.chen%40linux.dev%3E > wrote:
> > 
> >  March 5, 2026 at 17:27, "Kuniyuki Iwashima" <kuniyu@google.com mailto:kuniyu@google.com?to=%22Kuniyuki%20Iwashima%22%20%3Ckuniyu%40google.com%3E > wrote:
> > 
> >  >
> >  > On Thu, Mar 5, 2026 at 12:30 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >  >
> >  >
> >  > March 5, 2026 at 15:48, "Kuniyuki Iwashima" <kuniyu@google.com mailto:kuniyu@google.com?to=%22Kuniyuki%20Iwashima%22%20%3Ckuniyu%40google.com%3E > wrote:
> >  >
> >  > On Wed, Mar 4, 2026 at 10:37 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> >  >
> >  > >
> >  > > On Sat, Feb 21, 2026 at 11:30:53PM +0800, Kuniyuki Iwashima wrote:
> >  > > [...]
> >  > > </TASK>
> >  > >
> >  > > Fixes: d7f571188ecf ("udp: Implement ->read_sock() for sockmap")
> >  > > Reported-by: syzbot+5b3b7e51dda1be027b7a@syzkaller.appspotmail.com
> >  > > Closes: https://lore.kernel.org/netdev/698f84c6.a70a0220.2c38d7.00cb.GAE@google.com/
> >  > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> >  > > ---
> >  > > v4: Fix deadlock when requeued
> >  > > v2: Fix build failure when CONFIG_INET=n
> >  > > ---
> >  > > include/net/udp.h | 9 +++++++++
> >  > > net/core/skmsg.c | 29 +++++++++++++++++++++++++----
> >  > > net/ipv4/udp.c | 9 +++++++++
> >  > > 3 files changed, 43 insertions(+), 4 deletions(-)
> >  > >
> >  > > diff --git a/include/net/udp.h b/include/net/udp.h
> >  > > index 700dbedcb15f..ae38a4da9388 100644
> >  > > --- a/include/net/udp.h
> >  > > +++ b/include/net/udp.h
> >  > > @@ -455,6 +455,15 @@ struct sock *__udp6_lib_lookup(const struct net *net,
> >  > > struct sk_buff *skb);
> >  > > struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
> >  > > __be16 sport, __be16 dport);
> >  > > +
> >  > > +#ifdef CONFIG_INET
> >  > > +void udp_sock_rfree(struct sk_buff *skb);
> >  > > +#else
> >  > > +static inline void udp_sock_rfree(struct sk_buff *skb)
> >  > > +{
> >  > > +}
> >  > > +#endif
> >  > > +
> >  > > int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
> >  > >
> >  > > /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
> >  > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> >  > > index 96f43e0dbb17..dd9134a45663 100644
> >  > > --- a/net/core/skmsg.c
> >  > > +++ b/net/core/skmsg.c
> >  > > @@ -7,6 +7,7 @@
> >  > >
> >  > > #include <net/sock.h>
> >  > > #include <net/tcp.h>
> >  > > +#include <net/udp.h>
> >  > > #include <net/tls.h>
> >  > > #include <trace/events/sock.h>
> >  > >
> >  > > @@ -576,6 +577,7 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> >  > > u32 off, u32 len, bool take_ref)
> >  > > {
> >  > > struct sock *sk = psock->sk;
> >  > > + bool is_udp = sk_is_udp(sk);
> >  > > struct sk_msg *msg;
> >  > > int err = -EAGAIN;
> >  > >
> >  > > @@ -583,13 +585,20 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> >  > > if (!msg)
> >  > > goto out;
> >  > >
> >  > > - if (skb->sk != sk && take_ref) {
> >  > > + if (is_udp) {
> >  > > + if (unlikely(skb->destructor == udp_sock_rfree))
> >  > > + goto enqueue;
> >  > > +
> >  > > + spin_lock_bh(&sk->sk_receive_queue.lock);
> >  > > + }
> >  > > +
> >  > > + if (is_udp || (skb->sk != sk && take_ref)) {
> >  > > if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> >  > > - goto free;
> >  > > + goto unlock;
> >  > >
> >  > > if (!sk_rmem_schedule(sk, skb, skb->truesize))
> >  > > - goto free;
> >  > > - } else {
> >  > > + goto unlock;
> >  > > + } else if (skb->sk == sk || !take_ref) {
> >  > > /* This is used in tcp_bpf_recvmsg_parser() to determine whether the
> >  > > * data originates from the socket's own protocol stack. No need to
> >  > > * refcount sk because msg's lifetime is bound to sk via the ingress_msg.
> >  > > @@ -604,11 +613,23 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
> >  > > * into user buffers.
> >  > > */
> >  > > skb_set_owner_r(skb, sk);
> >  > > +
> >  > > + if (is_udp) {
> >  > > + spin_unlock_bh(&sk->sk_receive_queue.lock);
> >  > > +
> >  > > + skb->destructor = udp_sock_rfree;
> >  > > + }
> >  > > +
> >  > > +enqueue:
> >  > > err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg, take_ref);
> >  > > if (err < 0)
> >  > > goto free;
> >  > > out:
> >  > > return err;
> >  > > +
> >  > > +unlock:
> >  > > + if (is_udp)
> >  > > + spin_unlock_bh(&sk->sk_receive_queue.lock);
> >  > > free:
> >  > > kfree(msg);
> >  > > goto out;
> >  > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> >  > > index 422c96fea249..831d26748a90 100644
> >  > > --- a/net/ipv4/udp.c
> >  > > +++ b/net/ipv4/udp.c
> >  > > @@ -2039,6 +2039,15 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
> >  > > }
> >  > > EXPORT_SYMBOL(__skb_recv_udp);
> >  > >
> >  > > +void udp_sock_rfree(struct sk_buff *skb)
> >  > > +{
> >  > > + struct sock *sk = skb->sk;
> >  > > +
> >  > > + spin_lock_bh(&sk->sk_receive_queue.lock);
> >  > > + sock_rfree(skb);
> >  > > + spin_unlock_bh(&sk->sk_receive_queue.lock);
> >  > > +}
> >  > > +
> >  > > int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
> >  > > {
> >  > > struct sk_buff *skb;
> >  > > --
> >  > > 2.53.0.371.g1d285c8824-goog
> >  > >
> >  > > There are too many protocol-specific checks in the generic skmsg code
> >  > > here. This should be abstracted out.
> >  > >
> >  > > Also TCP also has a similar issue with sk_forward_alloc concurrency
> >  > > between the backlog charge and recvmsg uncharge paths so the abstraction
> >  > > is necessary.
> >  > >
> >  > > I've put together a simple diff based on your patch for reference.
> >  > > I haven't tested it thoroughly, but it at least handles TCP and UDP
> >  > > separately through a callback, keeping the generic code clean.
> >  > >
> >  > I can follow up on TCP, but TCP needs another fix, which
> >  > cannot be factored out.
> >  >
> >  > Some wrong assumptions in your patch:
> >  >
> >  > 1) tcp_psock_ingress_charge() uses psock->ingress_lock,
> >  > but it does not protect any tcp_sock fields, especially
> >  > sk->sk_forwad_alloc.
> >  >
> >  > 2) TCP memory accounting happens under bh_lock_sock()
> >  > _or_ lock_sock(). sendmsg()/recvmsg() works under
> >  > lock_sock() and TCP fast path puts skb to backlog if
> >  > lock_sock() is held by userspace. This means even
> >  > a simple bh_lock_sock() in tcp_psock_ingress_charge()
> >  > race with memory accounting happening under lock_sock().
> >  >
> >  > Since sk_psock_skb_ingress() is called from both GFP_KERNEL
> >  > and GFP_ATOMIC context, the simplest fix would be to put
> >  > lock_sock() in sk_psock_backlog() for TCP, which is ugly though.
> >  >
> >  > OK, thanks.
> >  >
> >  > Now my only concern is keeping UDP-specific logic
> >  > out of skmsg.c.
> >  >
> >  > We already have TCP-only msg->sk logic there that
> >  > you added. It just does not look like TCP code.
> >  >
> >  The msg->sk field marks ingress-self vs ingress-other, which is
> >  protocol-agnostic metadata. TCP happens to consume it for seq
> >  tracking, but the assignment itself has no TCP-specific check
> >  (no is_tcp_sk() or similar).
> > 
> >  >
> >  > If really needed, we can do such cleanup in bpf-next
> >  > for TCP and UDP altogether (probably after another
> >  > TCP fix ?).
> >  >
> >  >
> >  > We could use a function pointer so that UDP
> >  > implements its own ingress charge in udp_bpf.c, while other
> >  > protocols just use a default no-op or even NULL.
> >  >
> >  > If TCP requires locking outside of the scope, there's
> >  > no point adding one more extra layer/complexity just
> >  > for UDP.
> >  >
> >  > Even if we go that way, most likely we cannot get rid
> >  > of protocol-specific code (UDP/TCP-only hook) and
> >  > end up with the indirect call helpers, which will be
> >  > expensive than simple if logic.
> >  >
> >  An alternative would be to add such sk_psock_udp_skb_ingress() in
> >  udp_bpf.c, without adding any indirection layer.
> > 
> >  This way we don't need to design a new abstraction before the
> >  approach is settled. Just a UDP-specific ingress helper, which
> >  keeps things clean enough for now, and avoid scattering is_udp
> >  checks all over sk_psock_skb_ingress, just need one if (is_udp).
> > 
> >  Thanks.
> > 
> >  After thinking a while, since the UDP-specific behaviour only appears
> >  in the redirect-to-self path,
> > 
> Are you saying psock->sk == skb->sk is always true and
> skb_set_owner_r() in the else case is not called for UDP ?
>


Before we figure out how to properly handle TCP, I'm wondering if we can take a
simpler approach for UDP like this (diff below).
This keeps the original code cleaner — two sk_is_udp()
dispatch points in sk_psock_skb_ingress() and sk_psock_verdict_apply(),
both routing to a dedicated sk_psock_skb_ingress_udp() that handles
all UDP-specific locking and accounting in one place.

As for whether sk_psock_skb_ingress_udp() should live in udp_bpf.c
instead of skmsg.c, I'm open to either way.


diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index ddde93dd8bc6..70876ed51b91 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -8,6 +8,7 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <net/tls.h>
+#include <net/udp.h>
 #include <trace/events/sock.h>

 static bool sk_msg_try_coalesce_ok(struct sk_msg *msg, int elem_first_coalesce)
@@ -587,6 +588,48 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb,
 				     u32 off, u32 len, bool take_ref);

+/* UDP needs sk_receive_queue.lock to protect sk_forward_alloc from concurrent
+ * modification by udp_rmem_release(). This handles both self-redirect and
+ * cross-socket redirect for UDP.
+ */
+static int sk_psock_skb_ingress_udp(struct sk_psock *psock, struct sk_buff *skb,
+				    u32 off, u32 len, bool take_ref)
+{
+	struct sock *sk = psock->sk;
+	struct sk_msg *msg;
+	int err;
+
+	if (skb->destructor == udp_sock_rfree) {
+		msg = alloc_sk_msg(GFP_ATOMIC);
+		if (unlikely(!msg))
+			return -EAGAIN;
+		goto enqueue;
+	}
+
+	msg = alloc_sk_msg(take_ref ? GFP_KERNEL : GFP_ATOMIC);
+	if (unlikely(!msg))
+		return -EAGAIN;
+
+	spin_lock_bh(&sk->sk_receive_queue.lock);
+	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
+	    !sk_rmem_schedule(sk, skb, skb->truesize)) {
+		spin_unlock_bh(&sk->sk_receive_queue.lock);
+		kfree(msg);
+		return -EAGAIN;
+	}
+	skb_set_owner_r(skb, sk);
+	spin_unlock_bh(&sk->sk_receive_queue.lock);
+
+	skb->destructor = udp_sock_rfree;
+
+enqueue:
+	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg,
+					   take_ref);
+	if (err < 0)
+		kfree(msg);
+	return err;
+}
+
 static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
 				u32 off, u32 len)
 {
@@ -594,6 +637,9 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
 	struct sk_msg *msg;
 	int err;

+	if (sk_is_udp(sk))
+		return sk_psock_skb_ingress_udp(psock, skb, off, len, true);
+
 	/* If we are receiving on the same sock skb->sk is already assigned,
 	 * skip memory accounting and owner transition seeing it already set
 	 * correctly.
@@ -1059,7 +1105,12 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
 				off = stm->offset;
 				len = stm->full_len;
 			}
-			err = sk_psock_skb_ingress_self(psock, skb, off, len, false);
+			if (sk_is_udp(sk_other))
+				err = sk_psock_skb_ingress_udp(psock, skb,
+							       off, len, false);
+			else
+				err = sk_psock_skb_ingress_self(psock, skb,
+								off, len, false);
 		}
 		if (err < 0) {
 			spin_lock_bh(&psock->ingress_lock);

      reply	other threads:[~2026-03-06  7:44 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-21 23:30 [PATCH v4 bpf/net 0/6] sockmap: Fix UAF and broken memory accounting for UDP Kuniyuki Iwashima
2026-02-21 23:30 ` [PATCH v4 bpf/net 1/6] sockmap: Annotate sk->sk_data_ready() " Kuniyuki Iwashima
2026-03-05 11:05   ` Jakub Sitnicki
2026-03-05 11:27   ` Jiayuan Chen
2026-02-21 23:30 ` [PATCH v4 bpf/net 2/6] sockmap: Annotate sk->sk_write_space() " Kuniyuki Iwashima
2026-03-05  1:48   ` Jiayuan Chen
2026-03-05  3:43     ` Kuniyuki Iwashima
2026-03-07  0:03       ` Martin KaFai Lau
2026-03-07  2:51         ` Kuniyuki Iwashima
2026-03-05 11:35   ` Jiayuan Chen
2026-03-05 11:51   ` Jakub Sitnicki
2026-02-21 23:30 ` [PATCH v4 bpf/net 3/6] sockmap: Fix use-after-free in udp_bpf_recvmsg() Kuniyuki Iwashima
2026-03-05  2:30   ` Jiayuan Chen
2026-03-05  3:41     ` Kuniyuki Iwashima
2026-03-05 11:36   ` Jiayuan Chen
2026-03-05 11:39   ` Jakub Sitnicki
2026-03-05 17:46     ` Kuniyuki Iwashima
2026-02-21 23:30 ` [PATCH v4 bpf/net 4/6] sockmap: Inline sk_psock_create_ingress_msg() Kuniyuki Iwashima
2026-03-05 11:44   ` Jakub Sitnicki
2026-02-21 23:30 ` [PATCH v4 bpf/net 5/6] sockmap: Consolidate sk_psock_skb_ingress_self() Kuniyuki Iwashima
2026-02-21 23:30 ` [PATCH v4 bpf/net 6/6] sockmap: Fix broken memory accounting for UDP Kuniyuki Iwashima
2026-03-04 20:04   ` Martin KaFai Lau
2026-03-04 20:14     ` Kuniyuki Iwashima
2026-03-05  6:37   ` Jiayuan Chen
2026-03-05  7:48     ` Kuniyuki Iwashima
2026-03-05  8:30       ` Jiayuan Chen
2026-03-05  9:27         ` Kuniyuki Iwashima
2026-03-05 10:45           ` Jiayuan Chen
2026-03-05 11:04             ` Jiayuan Chen
2026-03-05 17:42               ` Kuniyuki Iwashima
2026-03-06  7:44                 ` Jiayuan Chen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a61769c3eb9255309548c2624f054b67f2995ab1@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=bpf@vger.kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuni1840@gmail.com \
    --cc=kuniyu@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=syzbot+5b3b7e51dda1be027b7a@syzkaller.appspotmail.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox