All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Martin KaFai Lau <martin.lau@linux.dev>,
	Eric Dumazet <edumazet@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>
Cc: bpf@vger.kernel.org, Kuniyuki Iwashima <kuniyu@google.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	Jiri Olsa <jolsa@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>, Shuah Khan <shuah@kernel.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf v6 1/2] bpf: tcp: Reject non-TCP skb in bpf_sk_assign_tcp_reqsk()
Date: Tue, 7 Apr 2026 13:22:22 +0800	[thread overview]
Message-ID: <4820d99b-f2c4-4b24-b90f-87e6444284c4@linux.dev> (raw)
In-Reply-To: <20264619471.he8Q.martin.lau@linux.dev>


On 4/7/26 3:53 AM, Martin KaFai Lau wrote:
> On Fri, Apr 03, 2026 at 09:58:27AM +0800, Jiayuan Chen wrote:
>> bpf_sk_assign_tcp_reqsk() only validates skb->protocol (L3) but does not
>> check the L4 protocol in the IP header. A BPF program can call this kfunc
>> on a UDP skb with a valid TCP listener socket, which will succeed and
>> attach a TCP reqsk to the UDP skb.
>>
>> When the UDP skb enters the UDP receive path, skb_steal_sock() returns
>> the TCP listener from the reqsk. The UDP code then passes this TCP socket
>> to udp_unicast_rcv_skb() -> __udp_enqueue_schedule_skb(), which casts
>> it to udp_sock and accesses UDP-specific fields at invalid offsets,
>> causing a null pointer dereference and kernel panic:
>>
>>    BUG: KASAN: null-ptr-deref in __udp_enqueue_schedule_skb+0x19d/0x1df0
>>    Read of size 4 at addr 0000000000000008 by task test_progs/537
>>
>>    CPU: 1 UID: 0 PID: 537 Comm: test_progs Not tainted 7.0.0-rc4+ #46 PREEMPT
>>    Call Trace:
>>     <IRQ>
>>     dump_stack_lvl (lib/dump_stack.c:123)
>>     print_report (mm/kasan/report.c:487)
>>     kasan_report (mm/kasan/report.c:597)
>>     __kasan_check_read (mm/kasan/shadow.c:32)
>>     __udp_enqueue_schedule_skb (net/ipv4/udp.c:1719)
>>     udp_queue_rcv_one_skb (net/ipv4/udp.c:2370 net/ipv4/udp.c:2500)
>>     udp_queue_rcv_skb (net/ipv4/udp.c:2532)
>>     udp_unicast_rcv_skb (net/ipv4/udp.c:2684)
>>     __udp4_lib_rcv (net/ipv4/udp.c:2742)
>>     udp_rcv (net/ipv4/udp.c:2937)
>>     ip_protocol_deliver_rcu (net/ipv4/ip_input.c:209)
>>     ip_local_deliver_finish (./include/linux/rcupdate.h:879 net/ipv4/ip_input.c:242)
>>     ip_local_deliver (net/ipv4/ip_input.c:265)
>>     __netif_receive_skb_one_core (net/core/dev.c:6164 (discriminator 4))
>>     __netif_receive_skb (net/core/dev.c:6280)
>>
>> Fix this by checking the IP header's protocol field in
>> bpf_sk_assign_tcp_reqsk() and rejecting non-TCP skbs with -EINVAL.
>>
>> Note that for IPv6, the nexthdr check does not walk extension headers.
>> This is uncommon for TCP SYN packets in practice, and keeping it simple
>> was agreed upon by Kuniyuki Iwashima.
> sashiko has flagged a similar issue with larger scope.
> Please take a look. Thanks.
>
> https://sashiko.dev/#/patchset/20260403015851.148209-1-jiayuan.chen%40linux.dev


Thanks a lot Martin, sashiko actually dug into a deeper issue here.

Eric and Kuniyuki,

I think the AI review has a point. Since BPF can modify skb fields, the
following sequence still bypasses the protocol check in 
bpf_sk_assign_tcp_reqsk():

    // for a UDP skb
    iph->protocol = TCP
    bpf_sk_assign_tcp_reqsk()
    iph->protocol = UDP

On top of that, bpf_sk_assign() already has the same problem — it doesn't
validate L4 protocol at all.

So I think we should add a check matching skb against sk in 
skb_steal_sock() instead of adding check in bpf helper.
(Also, we probably need some MIB counters for this.)



Below is my diff:

//The core reason for passing protocol as a parameter rather than 
reading it from
//the skb is that IPv6 extension headers make it non-trivial to get the 
actual nexthdr.
//It's much simpler to just let the caller tell us what protocol it expects.

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index c16de5b7963fd..c1032bb9ea864 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -104,12 +104,13 @@ static inline
  struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, 
int doff,
                               const struct in6_addr *saddr, const 
__be16 sport,
                               const struct in6_addr *daddr, const 
__be16 dport,
-                             bool *refcounted, inet6_ehashfn_t *ehashfn)
+                             bool *refcounted, inet6_ehashfn_t *ehashfn,
+                             int protocol)
  {
         struct sock *sk, *reuse_sk;
         bool prefetched;

-       sk = skb_steal_sock(skb, refcounted, &prefetched);
+       sk = skb_steal_sock(skb, refcounted, &prefetched, protocol);
         if (!sk)
                 return NULL;

@@ -151,7 +152,7 @@ static inline struct sock *__inet6_lookup_skb(struct 
sk_buff *skb, int doff,
         struct sock *sk;

         sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, 
&ip6h->daddr, dport,
-                             refcounted, inet6_ehashfn);
+                             refcounted, inet6_ehashfn, IPPROTO_TCP);
         if (IS_ERR(sk))
                 return NULL;
         if (sk)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 5a979dcab5383..0ad73135ece7d 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -433,12 +433,13 @@ static inline
  struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int 
doff,
                              const __be32 saddr, const __be16 sport,
                              const __be32 daddr, const __be16 dport,
-                            bool *refcounted, inet_ehashfn_t *ehashfn)
+                            bool *refcounted, inet_ehashfn_t *ehashfn,
+                            int protocol)
  {
         struct sock *sk, *reuse_sk;
         bool prefetched;

-       sk = skb_steal_sock(skb, refcounted, &prefetched);
+       sk = skb_steal_sock(skb, refcounted, &prefetched, protocol);
         if (!sk)
                 return NULL;

@@ -469,7 +470,7 @@ struct sock *inet_steal_sock(struct net *net, struct 
sk_buff *skb, int doff,
         return reuse_sk;
  }

-static inline struct sock *__inet_lookup_skb(struct sk_buff *skb,
+static inline struct sock *__tcp_lookup_skb(struct sk_buff *skb,
                                              int doff,
                                              const __be16 sport,
                                              const __be16 dport,
@@ -481,7 +482,7 @@ static inline struct sock *__inet_lookup_skb(struct 
sk_buff *skb,
         struct sock *sk;

         sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, 
iph->daddr, dport,
-                            refcounted, inet_ehashfn);
+                            refcounted, inet_ehashfn, IPPROTO_TCP);
         if (IS_ERR(sk))
                 return NULL;
         if (sk)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 5a9c826a7092d..80bd209b2323b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -91,7 +91,8 @@ static inline struct sock *req_to_sk(struct 
request_sock *req)
   * @prefetched: is set to true if the socket was assigned from bpf
   */
  static inline struct sock *skb_steal_sock(struct sk_buff *skb,
-                                         bool *refcounted, bool 
*prefetched)
+                                         bool *refcounted, bool 
*prefetched,
+                                         int protocol)
  {
         struct sock *sk = skb->sk;

@@ -103,6 +104,24 @@ static inline struct sock *skb_steal_sock(struct 
sk_buff *skb,

         *prefetched = skb_sk_is_prefetched(skb);
         if (*prefetched) {
+               /* Validate that the stolen socket matches the expected L4
+                * protocol.  BPF (bpf_sk_assign) can mistakenly or 
maliciously
+                * assign a socket of the wrong type.  If the protocols 
don't
+                * match, clean up the assignment and return NULL so the 
caller
+                * falls through to the normal hash table lookup.
+                *
+                * For full sockets, check sk_protocol directly.
+                * For request sockets (only created by 
bpf_sk_assign_tcp_reqsk),
+                * sk_protocol is not in sock_common and cannot be accessed.
+                * Request sockets are always TCP, so assume IPPROTO_TCP.
+                */
+               if ((sk_fullsock(sk) ? sk->sk_protocol : IPPROTO_TCP) != 
protocol) {
+                       skb_orphan(skb);
+                       *prefetched = false;
+                       *refcounted = false;
+                       return NULL;
+               }
+
  #if IS_ENABLED(CONFIG_SYN_COOKIES)
                 if (sk->sk_state == TCP_NEW_SYN_RECV && 
inet_reqsk(sk)->syncookie) {
                         struct request_sock *req = inet_reqsk(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c7b2463c2e254..442a9379aefe6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2188,7 +2188,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
         th = (const struct tcphdr *)skb->data;
         iph = ip_hdr(skb);
  lookup:
-       sk = __inet_lookup_skb(skb, __tcp_hdrlen(th), th->source,
+       sk = __tcp_lookup_skb(skb, __tcp_hdrlen(th), th->source,
                                th->dest, sdif, &refcounted);
         if (!sk)
                 goto no_tcp_socket;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b60fad393e182..61f7fbf5ffd9c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2728,7 +2728,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct 
udp_table *udptable,
                 goto csum_error;

         sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr, 
uh->source, daddr, uh->dest,
-                            &refcounted, udp_ehashfn);
+                            &refcounted, udp_ehashfn, IPPROTO_UDP);
         if (IS_ERR(sk))
                 goto no_sk;

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 010b909275dd0..9c3c25c938ec8 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1115,7 +1115,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct 
udp_table *udptable,

         /* Check if the socket is already available, e.g. due to early 
demux */
         sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr, 
uh->source, daddr, uh->dest,
-                             &refcounted, udp6_ehashfn);
+                             &refcounted, udp6_ehashfn, IPPROTO_UDP);
         if (IS_ERR(sk))
                 goto no_sk;




---
This fix touches the net core path (skb_steal_sock / inet_steal_sock),
so it probably makes more sense to review it on the net side rather than 
the bpf side.


  reply	other threads:[~2026-04-07  5:22 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-03  1:58 [PATCH bpf v6 0/2] bpf: tcp: Fix null-ptr-deref in arbitrary SYN Cookie Jiayuan Chen
2026-04-03  1:58 ` [PATCH bpf v6 1/2] bpf: tcp: Reject non-TCP skb in bpf_sk_assign_tcp_reqsk() Jiayuan Chen
2026-04-03  2:26   ` Eric Dumazet
2026-04-06 19:53   ` Martin KaFai Lau
2026-04-07  5:22     ` Jiayuan Chen [this message]
2026-04-08  4:25       ` Kuniyuki Iwashima
2026-04-08 19:22         ` Martin KaFai Lau
2026-04-03  1:58 ` [PATCH bpf v6 2/2] selftests/bpf: Add protocol check test for bpf_sk_assign_tcp_reqsk() Jiayuan Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4820d99b-f2c4-4b24-b90f-87e6444284c4@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.