From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Martin KaFai Lau <martin.lau@linux.dev>,
Eric Dumazet <edumazet@google.com>,
Kuniyuki Iwashima <kuniyu@google.com>
Cc: bpf@vger.kernel.org, Kuniyuki Iwashima <kuniyu@google.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
Jiri Olsa <jolsa@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>, Shuah Khan <shuah@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf v6 1/2] bpf: tcp: Reject non-TCP skb in bpf_sk_assign_tcp_reqsk()
Date: Tue, 7 Apr 2026 13:22:22 +0800 [thread overview]
Message-ID: <4820d99b-f2c4-4b24-b90f-87e6444284c4@linux.dev> (raw)
In-Reply-To: <20264619471.he8Q.martin.lau@linux.dev>
On 4/7/26 3:53 AM, Martin KaFai Lau wrote:
> On Fri, Apr 03, 2026 at 09:58:27AM +0800, Jiayuan Chen wrote:
>> bpf_sk_assign_tcp_reqsk() only validates skb->protocol (L3) but does not
>> check the L4 protocol in the IP header. A BPF program can call this kfunc
>> on a UDP skb with a valid TCP listener socket, which will succeed and
>> attach a TCP reqsk to the UDP skb.
>>
>> When the UDP skb enters the UDP receive path, skb_steal_sock() returns
>> the TCP listener from the reqsk. The UDP code then passes this TCP socket
>> to udp_unicast_rcv_skb() -> __udp_enqueue_schedule_skb(), which casts
>> it to udp_sock and accesses UDP-specific fields at invalid offsets,
>> causing a null pointer dereference and kernel panic:
>>
>> BUG: KASAN: null-ptr-deref in __udp_enqueue_schedule_skb+0x19d/0x1df0
>> Read of size 4 at addr 0000000000000008 by task test_progs/537
>>
>> CPU: 1 UID: 0 PID: 537 Comm: test_progs Not tainted 7.0.0-rc4+ #46 PREEMPT
>> Call Trace:
>> <IRQ>
>> dump_stack_lvl (lib/dump_stack.c:123)
>> print_report (mm/kasan/report.c:487)
>> kasan_report (mm/kasan/report.c:597)
>> __kasan_check_read (mm/kasan/shadow.c:32)
>> __udp_enqueue_schedule_skb (net/ipv4/udp.c:1719)
>> udp_queue_rcv_one_skb (net/ipv4/udp.c:2370 net/ipv4/udp.c:2500)
>> udp_queue_rcv_skb (net/ipv4/udp.c:2532)
>> udp_unicast_rcv_skb (net/ipv4/udp.c:2684)
>> __udp4_lib_rcv (net/ipv4/udp.c:2742)
>> udp_rcv (net/ipv4/udp.c:2937)
>> ip_protocol_deliver_rcu (net/ipv4/ip_input.c:209)
>> ip_local_deliver_finish (./include/linux/rcupdate.h:879 net/ipv4/ip_input.c:242)
>> ip_local_deliver (net/ipv4/ip_input.c:265)
>> __netif_receive_skb_one_core (net/core/dev.c:6164 (discriminator 4))
>> __netif_receive_skb (net/core/dev.c:6280)
>>
>> Fix this by checking the IP header's protocol field in
>> bpf_sk_assign_tcp_reqsk() and rejecting non-TCP skbs with -EINVAL.
>>
>> Note that for IPv6, the nexthdr check does not walk extension headers.
>> This is uncommon for TCP SYN packets in practice, and keeping it simple
>> was agreed upon by Kuniyuki Iwashima.
> sashiko has flagged a similar issue with larger scope.
> Please take a look. Thanks.
>
> https://sashiko.dev/#/patchset/20260403015851.148209-1-jiayuan.chen%40linux.dev
Thanks a lot Martin, sashiko actually dug into a deeper issue here.
Eric and Kuniyuki,
I think the AI review has a point. Since BPF can modify skb fields, the
following sequence still bypasses the protocol check in
bpf_sk_assign_tcp_reqsk():
// for a UDP skb
iph->protocol = TCP
bpf_sk_assign_tcp_reqsk()
iph->protocol = UDP
On top of that, bpf_sk_assign() already has the same problem — it doesn't
validate L4 protocol at all.
So I think we should add a check matching skb against sk in
skb_steal_sock() instead of adding check in bpf helper.
(Also, we probably need some MIB counters for this.)
Below is my diff:
//The core reason for passing protocol as a parameter rather than
reading it from
//the skb is that IPv6 extension headers make it non-trivial to get the
actual nexthdr.
//It's much simpler to just let the caller tell us what protocol it expects.
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index c16de5b7963fd..c1032bb9ea864 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -104,12 +104,13 @@ static inline
struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb,
int doff,
const struct in6_addr *saddr, const
__be16 sport,
const struct in6_addr *daddr, const
__be16 dport,
- bool *refcounted, inet6_ehashfn_t *ehashfn)
+ bool *refcounted, inet6_ehashfn_t *ehashfn,
+ int protocol)
{
struct sock *sk, *reuse_sk;
bool prefetched;
- sk = skb_steal_sock(skb, refcounted, &prefetched);
+ sk = skb_steal_sock(skb, refcounted, &prefetched, protocol);
if (!sk)
return NULL;
@@ -151,7 +152,7 @@ static inline struct sock *__inet6_lookup_skb(struct
sk_buff *skb, int doff,
struct sock *sk;
sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport,
&ip6h->daddr, dport,
- refcounted, inet6_ehashfn);
+ refcounted, inet6_ehashfn, IPPROTO_TCP);
if (IS_ERR(sk))
return NULL;
if (sk)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 5a979dcab5383..0ad73135ece7d 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -433,12 +433,13 @@ static inline
struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int
doff,
const __be32 saddr, const __be16 sport,
const __be32 daddr, const __be16 dport,
- bool *refcounted, inet_ehashfn_t *ehashfn)
+ bool *refcounted, inet_ehashfn_t *ehashfn,
+ int protocol)
{
struct sock *sk, *reuse_sk;
bool prefetched;
- sk = skb_steal_sock(skb, refcounted, &prefetched);
+ sk = skb_steal_sock(skb, refcounted, &prefetched, protocol);
if (!sk)
return NULL;
@@ -469,7 +470,7 @@ struct sock *inet_steal_sock(struct net *net, struct
sk_buff *skb, int doff,
return reuse_sk;
}
-static inline struct sock *__inet_lookup_skb(struct sk_buff *skb,
+static inline struct sock *__tcp_lookup_skb(struct sk_buff *skb,
int doff,
const __be16 sport,
const __be16 dport,
@@ -481,7 +482,7 @@ static inline struct sock *__inet_lookup_skb(struct
sk_buff *skb,
struct sock *sk;
sk = inet_steal_sock(net, skb, doff, iph->saddr, sport,
iph->daddr, dport,
- refcounted, inet_ehashfn);
+ refcounted, inet_ehashfn, IPPROTO_TCP);
if (IS_ERR(sk))
return NULL;
if (sk)
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 5a9c826a7092d..80bd209b2323b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -91,7 +91,8 @@ static inline struct sock *req_to_sk(struct
request_sock *req)
* @prefetched: is set to true if the socket was assigned from bpf
*/
static inline struct sock *skb_steal_sock(struct sk_buff *skb,
- bool *refcounted, bool
*prefetched)
+ bool *refcounted, bool
*prefetched,
+ int protocol)
{
struct sock *sk = skb->sk;
@@ -103,6 +104,24 @@ static inline struct sock *skb_steal_sock(struct
sk_buff *skb,
*prefetched = skb_sk_is_prefetched(skb);
if (*prefetched) {
+ /* Validate that the stolen socket matches the expected L4
+ * protocol. BPF (bpf_sk_assign) can mistakenly or
maliciously
+ * assign a socket of the wrong type. If the protocols
don't
+ * match, clean up the assignment and return NULL so the
caller
+ * falls through to the normal hash table lookup.
+ *
+ * For full sockets, check sk_protocol directly.
+ * For request sockets (only created by
bpf_sk_assign_tcp_reqsk),
+ * sk_protocol is not in sock_common and cannot be accessed.
+ * Request sockets are always TCP, so assume IPPROTO_TCP.
+ */
+ if ((sk_fullsock(sk) ? sk->sk_protocol : IPPROTO_TCP) !=
protocol) {
+ skb_orphan(skb);
+ *prefetched = false;
+ *refcounted = false;
+ return NULL;
+ }
+
#if IS_ENABLED(CONFIG_SYN_COOKIES)
if (sk->sk_state == TCP_NEW_SYN_RECV &&
inet_reqsk(sk)->syncookie) {
struct request_sock *req = inet_reqsk(sk);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c7b2463c2e254..442a9379aefe6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2188,7 +2188,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
th = (const struct tcphdr *)skb->data;
iph = ip_hdr(skb);
lookup:
- sk = __inet_lookup_skb(skb, __tcp_hdrlen(th), th->source,
+ sk = __tcp_lookup_skb(skb, __tcp_hdrlen(th), th->source,
th->dest, sdif, &refcounted);
if (!sk)
goto no_tcp_socket;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index b60fad393e182..61f7fbf5ffd9c 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2728,7 +2728,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct
udp_table *udptable,
goto csum_error;
sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr,
uh->source, daddr, uh->dest,
- &refcounted, udp_ehashfn);
+ &refcounted, udp_ehashfn, IPPROTO_UDP);
if (IS_ERR(sk))
goto no_sk;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 010b909275dd0..9c3c25c938ec8 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1115,7 +1115,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct
udp_table *udptable,
/* Check if the socket is already available, e.g. due to early
demux */
sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr,
uh->source, daddr, uh->dest,
- &refcounted, udp6_ehashfn);
+ &refcounted, udp6_ehashfn, IPPROTO_UDP);
if (IS_ERR(sk))
goto no_sk;
---
This fix touches the net core path (skb_steal_sock / inet_steal_sock),
so it probably makes more sense to review it on the net side rather than
the bpf side.
next prev parent reply other threads:[~2026-04-07 5:22 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-03 1:58 [PATCH bpf v6 0/2] bpf: tcp: Fix null-ptr-deref in arbitrary SYN Cookie Jiayuan Chen
2026-04-03 1:58 ` [PATCH bpf v6 1/2] bpf: tcp: Reject non-TCP skb in bpf_sk_assign_tcp_reqsk() Jiayuan Chen
2026-04-03 2:26 ` Eric Dumazet
2026-04-06 19:53 ` Martin KaFai Lau
2026-04-07 5:22 ` Jiayuan Chen [this message]
2026-04-03 1:58 ` [PATCH bpf v6 2/2] selftests/bpf: Add protocol check test for bpf_sk_assign_tcp_reqsk() Jiayuan Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4820d99b-f2c4-4b24-b90f-87e6444284c4@linux.dev \
--to=jiayuan.chen@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox