* Re: [PATCH bpf] bpf, sockmap: fix BUG_ON in skb_to_sgvec() on a resized ingress skb
2026-06-13 8:24 [PATCH bpf] bpf, sockmap: fix BUG_ON in skb_to_sgvec() on a resized ingress skb Sechang Lim
@ 2026-06-18 19:00 ` John Fastabend
0 siblings, 0 replies; 2+ messages in thread
From: John Fastabend @ 2026-06-18 19:00 UTC (permalink / raw)
To: Sechang Lim
Cc: Jakub Sitnicki, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Liu Jian, Daniel Borkmann, Cong Wang,
netdev, bpf, linux-kernel
On Sat, Jun 13, 2026 at 08:24:31AM +0000, Sechang Lim wrote:
>sk_psock_skb_ingress_enqueue() maps a received message into a scatterlist
>with skb_to_sgvec(skb, sg, off, len). On the SK_SKB strparser path off and
>len come from the message's strp_msg (stm->offset and stm->full_len), set
>by the stream parser. strparser does not trim the skb, so normally
>skb->len - off >= full_len and len is within the skb.
>
>An SK_SKB verdict (or parser) program may call bpf_skb_change_tail() and
>shrink the skb after full_len was recorded. len then covers more bytes than
>the skb holds, __skb_to_sgvec() walks past the data and trips BUG_ON(len):
FWIW this only happens if the strparser program is also attached. If
there is no strparser program stm->offset = 0 and stm->full_len will be
whatever the verdict program set. So there we would get
len = skb->len; // then if it shrinks to skb->len - X its ok.
off = 0;
>
> kernel BUG at net/core/skbuff.c:5286!
> RIP: 0010:__skb_to_sgvec+0x78c/0x790
> Call Trace:
> <IRQ>
> skb_to_sgvec+0x32/0x90
> sk_psock_skb_ingress_enqueue+0x42/0x370
> sk_psock_skb_ingress_self+0x1a8/0x200
> sk_psock_verdict_apply+0x33c/0x360
> sk_psock_strp_read+0x24a/0x370
> __strp_recv+0x66d/0xda0
> __tcp_read_sock+0x13d/0x590
> tcp_bpf_strp_read_sock+0x195/0x320
> strp_data_ready+0x267/0x340
> sk_psock_strp_data_ready+0x1ce/0x350
> tcp_data_queue+0x1364/0x2fd0
> </IRQ>
>
>Clamp len to skb->len - off, and drop the message if off is already past
>the skb. sk_psock_skb_ingress_enqueue() is the only skb_to_sgvec() caller
>and both ingress paths (verdict SK_PASS and the backlog worker) reach it.
>The clamp is a no-op unless the skb was shrunk.
>
>Fixes: 7303524e04af ("skmsg: Lose offset info in sk_psock_skb_ingress")
>Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
>---
> net/core/skmsg.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>index e1850caf1a71..2961178ebd1e 100644
>--- a/net/core/skmsg.c
>+++ b/net/core/skmsg.c
>@@ -550,6 +550,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> {
> int num_sge, copied;
>
>+ if (off >= skb->len)
>+ return -EINVAL;
>+ len = min_t(u32, len, skb->len - off);
>+
This is blocking the BUG but will break the socket. We should fix
at the cause. Something like this untested... although I've never
used the strparser program in any of our cases.
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2521b643fa05..95347f9d140c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -542,6 +542,20 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
return alloc_sk_msg(GFP_KERNEL);
}
+static bool sk_psock_skb_strp_range(struct sk_buff *skb, u32 *off, u32 *len)
+{
+ struct strp_msg *stm = strp_msg(skb);
+
+ *off = stm->offset;
+ if (unlikely(*off >= skb->len)) {
+ *len = 0;
+ return false;
+ }
+
+ *len = min_t(u32, stm->full_len, skb->len - *off);
+ return *len;
+}
+
static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
u32 off, u32 len,
struct sk_psock *psock,
@@ -696,12 +710,8 @@ static void sk_psock_backlog(struct work_struct *work)
while ((skb = skb_peek(&psock->ingress_skb))) {
len = skb->len;
off = 0;
- if (skb_bpf_strparser(skb)) {
- struct strp_msg *stm = strp_msg(skb);
-
- off = stm->offset;
- len = stm->full_len;
- }
+ if (skb_bpf_strparser(skb))
+ sk_psock_skb_strp_range(skb, &off, &len);
/* Resume processing from previous partial state */
if (unlikely(state->len)) {
@@ -709,6 +719,9 @@ static void sk_psock_backlog(struct work_struct *work)
off = state->off;
}
+ if (unlikely(!len))
+ goto out_free_skb;
+
ingress = skb_bpf_ingress(skb);
skb_bpf_redirect_clear(skb);
do {
@@ -737,7 +750,8 @@ static void sk_psock_backlog(struct work_struct *work)
len -= ret;
} while (len);
- /* The entire skb sent, clear state */
+out_free_skb:
+ /* The skb has been handled, clear state. */
sk_psock_skb_state(psock, state, 0, 0);
skb = skb_dequeue(&psock->ingress_skb);
kfree_skb(skb);
@@ -1020,10 +1034,10 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
len = skb->len;
off = 0;
if (skb_bpf_strparser(skb)) {
- struct strp_msg *stm = strp_msg(skb);
-
- off = stm->offset;
- len = stm->full_len;
+ if (unlikely(!sk_psock_skb_strp_range(skb, &off, &len))) {
+ err = 0;
+ goto out_free;
+ }
}
err = sk_psock_skb_ingress_self(psock, skb, off, len, false);
}
^ permalink raw reply related [flat|nested] 2+ messages in thread