Netdev List
 help / color / mirror / Atom feed
* [PATCH bpf] bpf, sockmap: fix BUG_ON in skb_to_sgvec() on a resized ingress skb
@ 2026-06-13  8:24 Sechang Lim
  2026-06-18 19:00 ` John Fastabend
  0 siblings, 1 reply; 2+ messages in thread
From: Sechang Lim @ 2026-06-13  8:24 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Liu Jian, Daniel Borkmann, Cong Wang, netdev, bpf,
	linux-kernel

sk_psock_skb_ingress_enqueue() maps a received message into a scatterlist
with skb_to_sgvec(skb, sg, off, len). On the SK_SKB strparser path off and
len come from the message's strp_msg (stm->offset and stm->full_len), set
by the stream parser. strparser does not trim the skb, so normally
skb->len - off >= full_len and len is within the skb.

An SK_SKB verdict (or parser) program may call bpf_skb_change_tail() and
shrink the skb after full_len was recorded. len then covers more bytes than
the skb holds, __skb_to_sgvec() walks past the data and trips BUG_ON(len):

  kernel BUG at net/core/skbuff.c:5286!
  RIP: 0010:__skb_to_sgvec+0x78c/0x790
  Call Trace:
   <IRQ>
   skb_to_sgvec+0x32/0x90
   sk_psock_skb_ingress_enqueue+0x42/0x370
   sk_psock_skb_ingress_self+0x1a8/0x200
   sk_psock_verdict_apply+0x33c/0x360
   sk_psock_strp_read+0x24a/0x370
   __strp_recv+0x66d/0xda0
   __tcp_read_sock+0x13d/0x590
   tcp_bpf_strp_read_sock+0x195/0x320
   strp_data_ready+0x267/0x340
   sk_psock_strp_data_ready+0x1ce/0x350
   tcp_data_queue+0x1364/0x2fd0
   </IRQ>

Clamp len to skb->len - off, and drop the message if off is already past
the skb. sk_psock_skb_ingress_enqueue() is the only skb_to_sgvec() caller
and both ingress paths (verdict SK_PASS and the backlog worker) reach it.
The clamp is a no-op unless the skb was shrunk.

Fixes: 7303524e04af ("skmsg: Lose offset info in sk_psock_skb_ingress")
Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
---
 net/core/skmsg.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index e1850caf1a71..2961178ebd1e 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -550,6 +550,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 {
 	int num_sge, copied;
 
+	if (off >= skb->len)
+		return -EINVAL;
+	len = min_t(u32, len, skb->len - off);
+
 	/* skb_to_sgvec will fail when the total number of fragments in
 	 * frag_list and frags exceeds MAX_MSG_FRAGS. For example, the
 	 * caller may aggregate multiple skbs.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH bpf] bpf, sockmap: fix BUG_ON in skb_to_sgvec() on a resized ingress skb
  2026-06-13  8:24 [PATCH bpf] bpf, sockmap: fix BUG_ON in skb_to_sgvec() on a resized ingress skb Sechang Lim
@ 2026-06-18 19:00 ` John Fastabend
  0 siblings, 0 replies; 2+ messages in thread
From: John Fastabend @ 2026-06-18 19:00 UTC (permalink / raw)
  To: Sechang Lim
  Cc: Jakub Sitnicki, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Liu Jian, Daniel Borkmann, Cong Wang,
	netdev, bpf, linux-kernel

On Sat, Jun 13, 2026 at 08:24:31AM +0000, Sechang Lim wrote:
>sk_psock_skb_ingress_enqueue() maps a received message into a scatterlist
>with skb_to_sgvec(skb, sg, off, len). On the SK_SKB strparser path off and
>len come from the message's strp_msg (stm->offset and stm->full_len), set
>by the stream parser. strparser does not trim the skb, so normally
>skb->len - off >= full_len and len is within the skb.
>
>An SK_SKB verdict (or parser) program may call bpf_skb_change_tail() and
>shrink the skb after full_len was recorded. len then covers more bytes than
>the skb holds, __skb_to_sgvec() walks past the data and trips BUG_ON(len):

FWIW this only happens if the strparser program is also attached. If 
there is no strparser program stm->offset = 0 and stm->full_len will be 
whatever the verdict program set. So there we would get

   len = skb->len; // then if it shrinks to skb->len - X its ok.
   off = 0;


>
>  kernel BUG at net/core/skbuff.c:5286!
>  RIP: 0010:__skb_to_sgvec+0x78c/0x790
>  Call Trace:
>   <IRQ>
>   skb_to_sgvec+0x32/0x90
>   sk_psock_skb_ingress_enqueue+0x42/0x370
>   sk_psock_skb_ingress_self+0x1a8/0x200
>   sk_psock_verdict_apply+0x33c/0x360
>   sk_psock_strp_read+0x24a/0x370
>   __strp_recv+0x66d/0xda0
>   __tcp_read_sock+0x13d/0x590
>   tcp_bpf_strp_read_sock+0x195/0x320
>   strp_data_ready+0x267/0x340
>   sk_psock_strp_data_ready+0x1ce/0x350
>   tcp_data_queue+0x1364/0x2fd0
>   </IRQ>
>
>Clamp len to skb->len - off, and drop the message if off is already past
>the skb. sk_psock_skb_ingress_enqueue() is the only skb_to_sgvec() caller
>and both ingress paths (verdict SK_PASS and the backlog worker) reach it.
>The clamp is a no-op unless the skb was shrunk.
>
>Fixes: 7303524e04af ("skmsg: Lose offset info in sk_psock_skb_ingress")
>Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
>---
> net/core/skmsg.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>index e1850caf1a71..2961178ebd1e 100644
>--- a/net/core/skmsg.c
>+++ b/net/core/skmsg.c
>@@ -550,6 +550,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> {
> 	int num_sge, copied;
>
>+	if (off >= skb->len)
>+		return -EINVAL;
>+	len = min_t(u32, len, skb->len - off);
>+

This is blocking the BUG but will break the socket. We should fix
at the cause. Something like this untested... although I've never
used the strparser program in any of our cases.

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2521b643fa05..95347f9d140c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -542,6 +542,20 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
         return alloc_sk_msg(GFP_KERNEL);
  }

+static bool sk_psock_skb_strp_range(struct sk_buff *skb, u32 *off, u32 *len)
+{
+       struct strp_msg *stm = strp_msg(skb);
+
+       *off = stm->offset;
+       if (unlikely(*off >= skb->len)) {
+               *len = 0;
+               return false;
+       }
+
+       *len = min_t(u32, stm->full_len, skb->len - *off);
+       return *len;
+}
+
  static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
                                         u32 off, u32 len,
                                         struct sk_psock *psock,
@@ -696,12 +710,8 @@ static void sk_psock_backlog(struct work_struct *work)
         while ((skb = skb_peek(&psock->ingress_skb))) {
                 len = skb->len;
                 off = 0;
-               if (skb_bpf_strparser(skb)) {
-                       struct strp_msg *stm = strp_msg(skb);
-
-                       off = stm->offset;
-                       len = stm->full_len;
-               }
+               if (skb_bpf_strparser(skb))
+                       sk_psock_skb_strp_range(skb, &off, &len);

                 /* Resume processing from previous partial state */
                 if (unlikely(state->len)) {
@@ -709,6 +719,9 @@ static void sk_psock_backlog(struct work_struct *work)
                         off = state->off;
                 }

+               if (unlikely(!len))
+                       goto out_free_skb;
+
                 ingress = skb_bpf_ingress(skb);
                 skb_bpf_redirect_clear(skb);
                 do {
@@ -737,7 +750,8 @@ static void sk_psock_backlog(struct work_struct *work)
                         len -= ret;
                 } while (len);

-               /* The entire skb sent, clear state */
+out_free_skb:
+               /* The skb has been handled, clear state. */
                 sk_psock_skb_state(psock, state, 0, 0);
                 skb = skb_dequeue(&psock->ingress_skb);
                 kfree_skb(skb);
@@ -1020,10 +1034,10 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
                         len = skb->len;
                         off = 0;
                         if (skb_bpf_strparser(skb)) {
-                               struct strp_msg *stm = strp_msg(skb);
-
-                               off = stm->offset;
-                               len = stm->full_len;
+                               if (unlikely(!sk_psock_skb_strp_range(skb, &off, &len))) {
+                                       err = 0;
+                                       goto out_free;
+                               }
                         }
                         err = sk_psock_skb_ingress_self(psock, skb, off, len, false);
		}

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-06-18 19:00 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-13  8:24 [PATCH bpf] bpf, sockmap: fix BUG_ON in skb_to_sgvec() on a resized ingress skb Sechang Lim
2026-06-18 19:00 ` John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox