linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: fix udp gso skb_segment after pull from frag_list
@ 2025-05-22  3:18 Shiming Cheng
  2025-05-22 16:49 ` Jakub Kicinski
  2025-05-23 14:04 ` Willem de Bruijn
  0 siblings, 2 replies; 4+ messages in thread
From: Shiming Cheng @ 2025-05-22  3:18 UTC (permalink / raw)
  To: willemdebruijin.kernel, edumazet, davem, kuba, pabeni,
	matthias.bgg
  Cc: linux-kernel, netdev, shiming.cheng, lena.wang

    Detect invalid geometry due to pull from frag_list, and pass to
    regular skb_segment. if only part of the fraglist payload is pulled
    into head_skb, When splitting packets in the skb_segment function,
    it will always cause exception as below.

    Valid SKB_GSO_FRAGLIST skbs
    - consist of two or more segments
    - the head_skb holds the protocol headers plus first gso_size
    - one or more frag_list skbs hold exactly one segment
    - all but the last must be gso_size

    Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
    modify fraglist skbs, breaking these invariants.

    In extreme cases they pull one part of data into skb linear. For UDP,
    this  causes three payloads with lengths of (11,11,10) bytes were
    pulled tail to become (12,10,10) bytes.

    When splitting packets in the skb_segment function, the first two
    packets of (11,11) bytes are split using skb_copy_bits. But when
    the last packet of 10 bytes is split, because hsize becomes nagative,
    it enters the skb_clone process instead of continuing to use
    skb_copy_bits. In fact, the data for skb_clone has already been
    copied into the second packet.

    when hsize < 0,  the payload of the fraglist has already been copied
    (with skb_copy_bits), so there is no need to enter skb_clone to
    process this packet. Instead, continue using skb_copy_bits to process
    the next packet.

    el1h_64_sync_handler+0x3c/0x90
    el1h_64_sync+0x68/0x6c
    skb_segment+0xcd0/0xd14
    __udp_gso_segment+0x334/0x5f4
    udp4_ufo_fragment+0x118/0x15c
    inet_gso_segment+0x164/0x338
    skb_mac_gso_segment+0xc4/0x13c
    __skb_gso_segment+0xc4/0x124
    validate_xmit_skb+0x9c/0x2c0
    validate_xmit_skb_list+0x4c/0x80
    sch_direct_xmit+0x70/0x404
    __dev_queue_xmit+0x64c/0xe5c
    neigh_resolve_output+0x178/0x1c4
    ip_finish_output2+0x37c/0x47c
    __ip_finish_output+0x194/0x240
    ip_finish_output+0x20/0xf4
    ip_output+0x100/0x1a0
    NF_HOOK+0xc4/0x16c
    ip_forward+0x314/0x32c
    ip_rcv+0x90/0x118
    __netif_receive_skb+0x74/0x124
    process_backlog+0xe8/0x1a4
    __napi_poll+0x5c/0x1f8
    net_rx_action+0x154/0x314
    handle_softirqs+0x154/0x4b8
    __do_softirq+0x14/0x20

    [  118.376811] [C201134] dpmaif_rxq0_pus: [name:bug&]kernel BUG at net/core/skbuff.c:4278!
    [  118.376829] [C201134] dpmaif_rxq0_pus: [name:traps&]Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
    [  118.376858] [C201134] dpmaif_rxq0_pus: [name:mediatek_cpufreq_hw&]cpufreq stop DVFS log done
    [  118.470774] [C201134] dpmaif_rxq0_pus: [name:mrdump&]Kernel Offset: 0x178cc00000 from 0xffffffc008000000
    [  118.470810] [C201134] dpmaif_rxq0_pus: [name:mrdump&]PHYS_OFFSET: 0x40000000
    [  118.470827] [C201134] dpmaif_rxq0_pus: [name:mrdump&]pstate: 60400005 (nZCv daif +PAN -UAO)
    [  118.470848] [C201134] dpmaif_rxq0_pus: [name:mrdump&]pc : [0xffffffd79598aefc] skb_segment+0xcd0/0xd14
    [  118.470900] [C201134] dpmaif_rxq0_pus: [name:mrdump&]lr : [0xffffffd79598a5e8] skb_segment+0x3bc/0xd14
    [  118.470928] [C201134] dpmaif_rxq0_pus: [name:mrdump&]sp : ffffffc008013770
    [  118.470941] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x29: ffffffc008013810 x28: 0000000000000040
    [  118.470961] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x27: 000000000000002a x26: faffff81338f5500
    [  118.470976] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x25: f9ffff800c87e000 x24: 0000000000000000
    [  118.470991] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x23: 000000000000004b x22: f4ffff81338f4c00
    [  118.471005] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x21: 000000000000000b x20: 0000000000000000
    [  118.471019] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x19: fdffff8077db5dc8 x18: 0000000000000000
    [  118.471033] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x17: 00000000ad6b63b6 x16: 00000000ad6b63b6
    [  118.471047] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x15: ffffffd795aa59d4 x14: ffffffd795aa7bc4
    [  118.471061] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x13: f4ffff806d40bc00 x12: 0000000100000000
    [  118.471075] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x11: 0054000800000000 x10: 0000000000000040
    [  118.471089] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x9 : 0000000000000040 x8 : 0000000000000055
    [  118.471104] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x7 : ffffffd7959b0868 x6 : ffffffd7959aeebc
    [  118.471118] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x5 : f8ffff8132ac5720 x4 : ffffffc0080134a8
    [  118.471131] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x3 : 0000000000000a20 x2 : 0000000000000001
    [  118.471145] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x1 : 000000000000000a x0 : faffff81338f5500

    BUG_ON:
         pos += skb_headlen(list_skb);
         while (pos < offset + len) {
            BUG_ON(i >= nfrags);
            size = skb_frag_size(frag);

Fixes: dbd50f238dec ("net: move the hsize check to the else block in skb_segment")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6841e61a6bd0..f9888f8dc3fa 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4808,7 +4808,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 
 		hsize = skb_headlen(head_skb) - offset;
 
-		if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
+		if (hsize == 0 && i >= nfrags && skb_headlen(list_skb) &&
 		    (skb_headlen(list_skb) == len || sg)) {
 			BUG_ON(skb_headlen(list_skb) > len);
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: fix udp gso skb_segment after pull from frag_list
  2025-05-22  3:18 [PATCH] net: fix udp gso skb_segment after pull from frag_list Shiming Cheng
@ 2025-05-22 16:49 ` Jakub Kicinski
  2025-05-23 13:52   ` Willem de Bruijn
  2025-05-23 14:04 ` Willem de Bruijn
  1 sibling, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2025-05-22 16:49 UTC (permalink / raw)
  To: Shiming Cheng
  Cc: willemdebruijin.kernel, edumazet, davem, pabeni, matthias.bgg,
	linux-kernel, netdev, lena.wang

On Thu, 22 May 2025 11:18:04 +0800 Shiming Cheng wrote:
>     Detect invalid geometry due to pull from frag_list, and pass to
>     regular skb_segment. if only part of the fraglist payload is pulled
>     into head_skb, When splitting packets in the skb_segment function,
>     it will always cause exception as below.
> 
>     Valid SKB_GSO_FRAGLIST skbs
>     - consist of two or more segments
>     - the head_skb holds the protocol headers plus first gso_size
>     - one or more frag_list skbs hold exactly one segment
>     - all but the last must be gso_size
> 
>     Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
>     modify fraglist skbs, breaking these invariants.
> 
>     In extreme cases they pull one part of data into skb linear. For UDP,
>     this  causes three payloads with lengths of (11,11,10) bytes were
>     pulled tail to become (12,10,10) bytes.
> 
>     When splitting packets in the skb_segment function, the first two
>     packets of (11,11) bytes are split using skb_copy_bits. But when
>     the last packet of 10 bytes is split, because hsize becomes nagative,
>     it enters the skb_clone process instead of continuing to use
>     skb_copy_bits. In fact, the data for skb_clone has already been
>     copied into the second packet.
> 
>     when hsize < 0,  the payload of the fraglist has already been copied
>     (with skb_copy_bits), so there is no need to enter skb_clone to
>     process this packet. Instead, continue using skb_copy_bits to process
>     the next packet.

nit: please un-indent the text, you can keep the stack trace indented
but the commit message explanation should not be. And if you can
run the stack trace thru decode_stacktrace to add source code
references.

This patch seems to be causing regressions for SCTP, see:
https://lore.kernel.org/all/aC82JEehNShMjW8-@strlen.de/
If you send a v2 please make sure you add test cases to
net/core/net_test.c for both the geometry you're trying to fix,
and the geometry you got wrong in v1.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: fix udp gso skb_segment after pull from frag_list
  2025-05-22 16:49 ` Jakub Kicinski
@ 2025-05-23 13:52   ` Willem de Bruijn
  0 siblings, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2025-05-23 13:52 UTC (permalink / raw)
  To: Jakub Kicinski, Shiming Cheng
  Cc: willemdebruijin.kernel, edumazet, davem, pabeni, matthias.bgg,
	linux-kernel, netdev, lena.wang

Jakub Kicinski wrote:
> On Thu, 22 May 2025 11:18:04 +0800 Shiming Cheng wrote:
> >     Detect invalid geometry due to pull from frag_list, and pass to
> >     regular skb_segment. if only part of the fraglist payload is pulled
> >     into head_skb, When splitting packets in the skb_segment function,
> >     it will always cause exception as below.
> > 
> >     Valid SKB_GSO_FRAGLIST skbs
> >     - consist of two or more segments
> >     - the head_skb holds the protocol headers plus first gso_size
> >     - one or more frag_list skbs hold exactly one segment
> >     - all but the last must be gso_size
> > 
> >     Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
> >     modify fraglist skbs, breaking these invariants.
> > 
> >     In extreme cases they pull one part of data into skb linear. For UDP,
> >     this  causes three payloads with lengths of (11,11,10) bytes were
> >     pulled tail to become (12,10,10) bytes.
> > 
> >     When splitting packets in the skb_segment function, the first two
> >     packets of (11,11) bytes are split using skb_copy_bits. But when
> >     the last packet of 10 bytes is split, because hsize becomes nagative,
> >     it enters the skb_clone process instead of continuing to use
> >     skb_copy_bits. In fact, the data for skb_clone has already been
> >     copied into the second packet.
> > 
> >     when hsize < 0,  the payload of the fraglist has already been copied
> >     (with skb_copy_bits), so there is no need to enter skb_clone to
> >     process this packet. Instead, continue using skb_copy_bits to process
> >     the next packet.
> 
> nit: please un-indent the text, you can keep the stack trace indented
> but the commit message explanation should not be. And if you can
> run the stack trace thru decode_stacktrace to add source code
> references.

Most of this quotes my earlier patch, but without saying so.

https://lore.kernel.org/netdev/20241001171752.107580-1-willemdebruijn.kernel@gmail.com/

Btw, my email address was misspelled.

> 
> This patch seems to be causing regressions for SCTP, see:
> https://lore.kernel.org/all/aC82JEehNShMjW8-@strlen.de/
> If you send a v2 please make sure you add test cases to
> net/core/net_test.c for both the geometry you're trying to fix,
> and the geometry you got wrong in v1.



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: fix udp gso skb_segment after pull from frag_list
  2025-05-22  3:18 [PATCH] net: fix udp gso skb_segment after pull from frag_list Shiming Cheng
  2025-05-22 16:49 ` Jakub Kicinski
@ 2025-05-23 14:04 ` Willem de Bruijn
  1 sibling, 0 replies; 4+ messages in thread
From: Willem de Bruijn @ 2025-05-23 14:04 UTC (permalink / raw)
  To: Shiming Cheng, willemdebruijin.kernel, edumazet, davem, kuba,
	pabeni, matthias.bgg
  Cc: linux-kernel, netdev, shiming.cheng, lena.wang

Shiming Cheng wrote:
>     Detect invalid geometry due to pull from frag_list, and pass to
>     regular skb_segment. if only part of the fraglist payload is pulled

This does not match the patch.

This is quoted from another patch that moves skb handling from
skb_segment_list to skb_segment.

This patch does not do that.

Btw, don't forget to target the right list. [PATCH net]

>     into head_skb, When splitting packets in the skb_segment function,
>     it will always cause exception as below.
> 
>     Valid SKB_GSO_FRAGLIST skbs
>     - consist of two or more segments
>     - the head_skb holds the protocol headers plus first gso_size
>     - one or more frag_list skbs hold exactly one segment
>     - all but the last must be gso_size
> 
>     Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can
>     modify fraglist skbs, breaking these invariants.
> 
>     In extreme cases they pull one part of data into skb linear. For UDP,
>     this  causes three payloads with lengths of (11,11,10) bytes were
>     pulled tail to become (12,10,10) bytes.
> 
>     When splitting packets in the skb_segment function, the first two
>     packets of (11,11) bytes are split using skb_copy_bits. But when
>     the last packet of 10 bytes is split, because hsize becomes nagative,
>     it enters the skb_clone process instead of continuing to use
>     skb_copy_bits. In fact, the data for skb_clone has already been
>     copied into the second packet.
> 
>     when hsize < 0,  the payload of the fraglist has already been copied
>     (with skb_copy_bits), so there is no need to enter skb_clone to
>     process this packet. Instead, continue using skb_copy_bits to process
>     the next packet.

I don't fully follow this. And as always with skb_segment am concerned
that changing this condition may break other valid uses of this code.

If this is a fraglist gso packet (SKB_GSO_FRAGLIST), I have a mind to
just flag those that fail to meet the four invariants at the top, and
in that case take a slow path that linearizes the skb and passes that
to skb_segment.

That should take care of a whole host of such bad fraglist gso skbs,
without possibly breaking existing gso skb handling.

>     el1h_64_sync_handler+0x3c/0x90
>     el1h_64_sync+0x68/0x6c
>     skb_segment+0xcd0/0xd14
>     __udp_gso_segment+0x334/0x5f4
>     udp4_ufo_fragment+0x118/0x15c
>     inet_gso_segment+0x164/0x338
>     skb_mac_gso_segment+0xc4/0x13c
>     __skb_gso_segment+0xc4/0x124
>     validate_xmit_skb+0x9c/0x2c0
>     validate_xmit_skb_list+0x4c/0x80
>     sch_direct_xmit+0x70/0x404
>     __dev_queue_xmit+0x64c/0xe5c
>     neigh_resolve_output+0x178/0x1c4
>     ip_finish_output2+0x37c/0x47c
>     __ip_finish_output+0x194/0x240
>     ip_finish_output+0x20/0xf4
>     ip_output+0x100/0x1a0
>     NF_HOOK+0xc4/0x16c
>     ip_forward+0x314/0x32c
>     ip_rcv+0x90/0x118
>     __netif_receive_skb+0x74/0x124
>     process_backlog+0xe8/0x1a4
>     __napi_poll+0x5c/0x1f8
>     net_rx_action+0x154/0x314
>     handle_softirqs+0x154/0x4b8
>     __do_softirq+0x14/0x20
> 
>     [  118.376811] [C201134] dpmaif_rxq0_pus: [name:bug&]kernel BUG at net/core/skbuff.c:4278!
>     [  118.376829] [C201134] dpmaif_rxq0_pus: [name:traps&]Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>     [  118.376858] [C201134] dpmaif_rxq0_pus: [name:mediatek_cpufreq_hw&]cpufreq stop DVFS log done
>     [  118.470774] [C201134] dpmaif_rxq0_pus: [name:mrdump&]Kernel Offset: 0x178cc00000 from 0xffffffc008000000
>     [  118.470810] [C201134] dpmaif_rxq0_pus: [name:mrdump&]PHYS_OFFSET: 0x40000000
>     [  118.470827] [C201134] dpmaif_rxq0_pus: [name:mrdump&]pstate: 60400005 (nZCv daif +PAN -UAO)
>     [  118.470848] [C201134] dpmaif_rxq0_pus: [name:mrdump&]pc : [0xffffffd79598aefc] skb_segment+0xcd0/0xd14
>     [  118.470900] [C201134] dpmaif_rxq0_pus: [name:mrdump&]lr : [0xffffffd79598a5e8] skb_segment+0x3bc/0xd14
>     [  118.470928] [C201134] dpmaif_rxq0_pus: [name:mrdump&]sp : ffffffc008013770
>     [  118.470941] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x29: ffffffc008013810 x28: 0000000000000040
>     [  118.470961] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x27: 000000000000002a x26: faffff81338f5500
>     [  118.470976] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x25: f9ffff800c87e000 x24: 0000000000000000
>     [  118.470991] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x23: 000000000000004b x22: f4ffff81338f4c00
>     [  118.471005] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x21: 000000000000000b x20: 0000000000000000
>     [  118.471019] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x19: fdffff8077db5dc8 x18: 0000000000000000
>     [  118.471033] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x17: 00000000ad6b63b6 x16: 00000000ad6b63b6
>     [  118.471047] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x15: ffffffd795aa59d4 x14: ffffffd795aa7bc4
>     [  118.471061] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x13: f4ffff806d40bc00 x12: 0000000100000000
>     [  118.471075] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x11: 0054000800000000 x10: 0000000000000040
>     [  118.471089] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x9 : 0000000000000040 x8 : 0000000000000055
>     [  118.471104] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x7 : ffffffd7959b0868 x6 : ffffffd7959aeebc
>     [  118.471118] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x5 : f8ffff8132ac5720 x4 : ffffffc0080134a8
>     [  118.471131] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x3 : 0000000000000a20 x2 : 0000000000000001
>     [  118.471145] [C201134] dpmaif_rxq0_pus: [name:mrdump&]x1 : 000000000000000a x0 : faffff81338f5500
> 
>     BUG_ON:
>          pos += skb_headlen(list_skb);
>          while (pos < offset + len) {
>             BUG_ON(i >= nfrags);
>             size = skb_frag_size(frag);
> 
> Fixes: dbd50f238dec ("net: move the hsize check to the else block in skb_segment")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> ---
>  net/core/skbuff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6841e61a6bd0..f9888f8dc3fa 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4808,7 +4808,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  
>  		hsize = skb_headlen(head_skb) - offset;
>  
> -		if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
> +		if (hsize == 0 && i >= nfrags && skb_headlen(list_skb) &&
>  		    (skb_headlen(list_skb) == len || sg)) {
>  			BUG_ON(skb_headlen(list_skb) > len);
>  
> -- 
> 2.45.2
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-05-23 14:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-22  3:18 [PATCH] net: fix udp gso skb_segment after pull from frag_list Shiming Cheng
2025-05-22 16:49 ` Jakub Kicinski
2025-05-23 13:52   ` Willem de Bruijn
2025-05-23 14:04 ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).