public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Martin KaFai Lau <martin.lau@linux.dev>,
	Eric Dumazet <edumazet@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>
Cc: bpf@vger.kernel.org, Kuniyuki Iwashima <kuniyu@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	Jiri Olsa <jolsa@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>, Shuah Khan <shuah@kernel.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf v6 1/2] bpf: tcp: Reject non-TCP skb in bpf_sk_assign_tcp_reqsk()
Date: Tue, 7 Apr 2026 13:22:22 +0800	[thread overview]
Message-ID: <4820d99b-f2c4-4b24-b90f-87e6444284c4@linux.dev> (raw)
In-Reply-To: <20264619471.he8Q.martin.lau@linux.dev>


On 4/7/26 3:53 AM, Martin KaFai Lau wrote:
> On Fri, Apr 03, 2026 at 09:58:27AM +0800, Jiayuan Chen wrote:
>> bpf_sk_assign_tcp_reqsk() only validates skb->protocol (L3) but does not
>> check the L4 protocol in the IP header. A BPF program can call this kfunc
>> on a UDP skb with a valid TCP listener socket, which will succeed and
>> attach a TCP reqsk to the UDP skb.
>>
>> When the UDP skb enters the UDP receive path, skb_steal_sock() returns
>> the TCP listener from the reqsk. The UDP code then passes this TCP socket
>> to udp_unicast_rcv_skb() -> __udp_enqueue_schedule_skb(), which casts
>> it to udp_sock and accesses UDP-specific fields at invalid offsets,
>> causing a null pointer dereference and kernel panic:
>>
>>    BUG: KASAN: null-ptr-deref in __udp_enqueue_schedule_skb+0x19d/0x1df0
>>    Read of size 4 at addr 0000000000000008 by task test_progs/537
>>
>>    CPU: 1 UID: 0 PID: 537 Comm: test_progs Not tainted 7.0.0-rc4+ #46 PREEMPT
>>    Call Trace:
>>     <IRQ>
>>     dump_stack_lvl (lib/dump_stack.c:123)
>>     print_report (mm/kasan/report.c:487)
>>     kasan_report (mm/kasan/report.c:597)
>>     __kasan_check_read (mm/kasan/shadow.c:32)
>>     __udp_enqueue_schedule_skb (net/ipv4/udp.c:1719)
>>     udp_queue_rcv_one_skb (net/ipv4/udp.c:2370 net/ipv4/udp.c:2500)
>>     udp_queue_rcv_skb (net/ipv4/udp.c:2532)
>>     udp_unicast_rcv_skb (net/ipv4/udp.c:2684)
>>     __udp4_lib_rcv (net/ipv4/udp.c:2742)
>>     udp_rcv (net/ipv4/udp.c:2937)
>>     ip_protocol_deliver_rcu (net/ipv4/ip_input.c:209)
>>     ip_local_deliver_finish (./include/linux/rcupdate.h:879 net/ipv4/ip_input.c:242)
>>     ip_local_deliver (net/ipv4/ip_input.c:265)
>>     __netif_receive_skb_one_core (net/core/dev.c:6164 (discriminator 4))
>>     __netif_receive_skb (net/core/dev.c:6280)
>>
>> Fix this by checking the IP header's protocol field in
>> bpf_sk_assign_tcp_reqsk() and rejecting non-TCP skbs with -EINVAL.
>>
>> Note that for IPv6, the nexthdr check does not walk extension headers.
>> This is uncommon for TCP SYN packets in practice, and keeping it simple
>> was agreed upon by Kuniyuki Iwashima.
> sashiko has flagged a similar issue with larger scope.
> Please take a look. Thanks.
>
> https://sashiko.dev/#/patchset/20260403015851.148209-1-jiayuan.chen%40linux.dev


Thanks a lot Martin, sashiko actually dug into a deeper issue here.

Eric and Kuniyuki,

I think the AI review has a point. Since BPF can modify skb fields, the
following sequence still bypasses the protocol check in 
bpf_sk_assign_tcp_reqsk():

    // for a UDP skb
    iph->protocol = TCP
    bpf_sk_assign_tcp_reqsk()
    iph->protocol = UDP

On top of that, bpf_sk_assign() already has the same problem — it doesn't
validate L4 protocol at all.

So I think we should add a check matching skb against sk in 
skb_steal_sock() instead of adding check in bpf helper.
(Also, we probably need some MIB counters for this.)



Below is my diff:

//The core reason for passing protocol as a parameter rather than 
reading it from
//the skb is that IPv6 extension headers make it non-trivial to get the 
actual nexthdr.
//It's much simpler to just let the caller tell us what protocol it expects.

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index c16de5b7963fd..c1032bb9ea864 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -104,12 +104,13 @@ static inline
  struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, 
int doff,
                               const struct in6_addr *saddr, const 
__be16 sport,
                               const struct in6_addr *daddr, const 
__be16 dport,
-                             bool *refcounted, inet6_ehashfn_t *ehashfn)
+                             bool *refcounted, inet6_ehashfn_t *ehashfn,
+                             int protocol)
  {
         struct sock *sk, *reuse_sk;
         bool prefetched;

-       sk = skb_steal_sock(skb, refcounted, &prefetched);
+       sk = skb_steal_sock(skb, refcounted, &prefetched, protocol);
         if (!sk)
                 return NULL;

@@ -151,7 +152,7 @@ static inline struct sock *__inet6_lookup_skb(struct 
sk_buff *skb, int doff,
         struct sock *sk;

         sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, 
&ip6h->daddr, dport,
-                             refcounted, inet6_ehashfn);
+                             refcounted, inet6_ehashfn, IPPROTO_TCP);
         if (IS_ERR(sk))
                 return NULL;
         if (sk)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 5a979dcab5383..0ad73135ece7d 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -433,12 +433,13 @@ static inline
  struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int 
doff,
                              const __be32 saddr, const __be16 sport,
                              const __be32 daddr, const __be16 dport,
-                            bool *refcounted, inet_ehashfn_t *ehashfn)
+                            bool *refcounted, inet_ehashfn_t *ehashfn,
+                            int protocol)
  {
         struct sock *sk, *reuse_sk;
         bool prefetched;

-       sk = skb_steal_sock(skb, refcounted, &prefetched);
+       sk = skb_steal_sock(skb, refcounted, &prefetched, protocol);
         if (!sk)
                 return NULL;

@@ -469,7 +470,7 @@ struct sock *inet_steal_sock(struct net *net, struct 
sk_buff *skb, int doff,
         return reuse_sk;
  }

-static inline struct sock *__inet_lookup_skb(struct sk_buff *skb,
+static inline struct sock *__tcp_lookup_skb(struct sk_buff *skb,
                                              int doff,
                                              const __be16 sport,
                                              const __be16 dport,
@@ -481,7 +482,7 @@ static inline struct sock *__inet_lookup_skb(struct 
sk_buff *skb,
         struct sock *sk;

         sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, 
iph->daddr, dport,
-                            refcounted, inet_ehashfn);
+                            refcounted, inet_ehashfn, IPPROTO_TCP);
         if (IS_ERR(sk))
                 return NULL;
         if (sk)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 5a9c826a7092d..80bd209b2323b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -91,7 +91,8 @@ static inline struct sock *req_to_sk(struct 
request_sock *req)
   * @prefetched: is set to true if the socket was assigned from bpf
   */
  static inline struct sock *skb_steal_sock(struct sk_buff *skb,
-                                         bool *refcounted, bool 
*prefetched)
+                                         bool *refcounted, bool 
*prefetched,
+                                         int protocol)
  {
         struct sock *sk = skb->sk;

@@ -103,6 +104,24 @@ static inline struct sock *skb_steal_sock(struct 
sk_buff *skb,

         *prefetched = skb_sk_is_prefetched(skb);
         if (*prefetched) {
+               /* Validate that the stolen socket matches the expected L4
+                * protocol.  BPF (bpf_sk_assign) can mistakenly or 
maliciously
+                * assign a socket of the wrong type.  If the protocols 
don't
+                * match, clean up the assignment and return NULL so the 
caller
+                * falls through to the normal hash table lookup.
+                *
+                * For full sockets, check sk_protocol directly.
+                * For request sockets (only created by 
bpf_sk_assign_tcp_reqsk),
+                * sk_protocol is not in sock_common and cannot be accessed.
+                * Request sockets are always TCP, so assume IPPROTO_TCP.
+                */
+               if ((sk_fullsock(sk) ? sk->sk_protocol : IPPROTO_TCP) != 
protocol) {
+                       skb_orphan(skb);
+                       *prefetched = false;
+                       *refcounted = false;
+                       return NULL;
+               }
+
  #if IS_ENABLED(CONFIG_SYN_COOKIES)
                 if (sk->sk_state == TCP_NEW_SYN_RECV && 
inet_reqsk(sk)->syncookie) {
                         struct request_sock *req = inet_reqsk(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c7b2463c2e254..442a9379aefe6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2188,7 +2188,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
         th = (const struct tcphdr *)skb->data;
         iph = ip_hdr(skb);
  lookup:
-       sk = __inet_lookup_skb(skb, __tcp_hdrlen(th), th->source,
+       sk = __tcp_lookup_skb(skb, __tcp_hdrlen(th), th->source,
                                th->dest, sdif, &refcounted);
         if (!sk)
                 goto no_tcp_socket;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b60fad393e182..61f7fbf5ffd9c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2728,7 +2728,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct 
udp_table *udptable,
                 goto csum_error;

         sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr, 
uh->source, daddr, uh->dest,
-                            &refcounted, udp_ehashfn);
+                            &refcounted, udp_ehashfn, IPPROTO_UDP);
         if (IS_ERR(sk))
                 goto no_sk;

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 010b909275dd0..9c3c25c938ec8 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1115,7 +1115,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct 
udp_table *udptable,

         /* Check if the socket is already available, e.g. due to early 
demux */
         sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr, 
uh->source, daddr, uh->dest,
-                             &refcounted, udp6_ehashfn);
+                             &refcounted, udp6_ehashfn, IPPROTO_UDP);
         if (IS_ERR(sk))
                 goto no_sk;




---
This fix touches the net core path (skb_steal_sock / inet_steal_sock),
so it probably makes more sense to review it on the net side rather than 
the bpf side.


  reply	other threads:[~2026-04-07  5:22 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-03  1:58 [PATCH bpf v6 0/2] bpf: tcp: Fix null-ptr-deref in arbitrary SYN Cookie Jiayuan Chen
2026-04-03  1:58 ` [PATCH bpf v6 1/2] bpf: tcp: Reject non-TCP skb in bpf_sk_assign_tcp_reqsk() Jiayuan Chen
2026-04-03  2:26   ` Eric Dumazet
2026-04-06 19:53   ` Martin KaFai Lau
2026-04-07  5:22     ` Jiayuan Chen [this message]
2026-04-03  1:58 ` [PATCH bpf v6 2/2] selftests/bpf: Add protocol check test for bpf_sk_assign_tcp_reqsk() Jiayuan Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4820d99b-f2c4-4b24-b90f-87e6444284c4@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox