* [PATCH net v2] net: skbuff: fix truesize and head state corruption in skb_segment_list
@ 2025-12-31 2:54 mheib
2025-12-31 16:58 ` Willem de Bruijn
0 siblings, 1 reply; 4+ messages in thread
From: mheib @ 2025-12-31 2:54 UTC (permalink / raw)
To: netdev
Cc: davem, edumazet, kuba, pabeni, horms, kernelxing, kuniyu,
willemdebruijn.kernel, atenart, aleksander.lobakin, Mohammad Heib
From: Mohammad Heib <mheib@redhat.com>
When skb_segment_list is called during packet forwarding through
a bridge or VXLAN, it assumes that every fragment in a frag_list
carries its own socket ownership and head state. While this is true for
GSO packets created by the transmit path (via __ip_append_data), it is
not true for packets built by the GRO receive path.
In the GRO path, fragments are "orphans" (skb->sk == NULL) and were
never charged to a socket. However, the current logic in
skb_segment_list unconditionally adds every fragment's truesize to
delta_truesize and subsequently subtracts this from the parent SKB.
This results in a memory accounting leak. Since GRO fragments were never
charged to the socket in the first place, the "refund" results in the
parent SKB returning less memory than originally charged when it is
finally freed. This leads to a permanent leak in sk_wmem_alloc, which
prevents the socket from being destroyed, resulting in a persistent memory
leak of the socket object and its related metadata.
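
The arithmetic behind this leak can be sketched in a small userspace model. This is only an illustration under stated assumptions, not kernel code: `model_sock`, `model_skb`, and every `model_*` helper are hypothetical names, `wmem_alloc` stands in for sk_wmem_alloc, and the byte counts are made up. It shows a parent charged at its full truesize, fragments that were never charged, and a refund that comes up short once the fragments' truesize has been subtracted from the parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the accounting arithmetic are modelled.
 */
struct model_sock {
	unsigned int wmem_alloc;	/* models sk_wmem_alloc */
};

struct model_skb {
	unsigned int truesize;
	struct model_sock *sk;		/* NULL for an orphan fragment */
};

/* Charge the skb's current truesize to the socket. */
static void model_set_owner(struct model_skb *skb, struct model_sock *sk)
{
	skb->sk = sk;
	sk->wmem_alloc += skb->truesize;
}

/* Refund whatever truesize the skb carries at free time. */
static void model_free(struct model_skb *skb)
{
	if (skb->sk)
		skb->sk->wmem_alloc -= skb->truesize;
	skb->sk = NULL;
}

/* Mirrors the unconditional accumulation described above: every
 * fragment's truesize is subtracted from the parent, whether or not
 * that fragment was ever charged to a socket.
 */
static void model_segment_list(struct model_skb *parent,
			       const struct model_skb *frags, size_t n)
{
	unsigned int delta_truesize = 0;
	size_t i;

	for (i = 0; i < n; i++)
		delta_truesize += frags[i].truesize;

	assert(delta_truesize <= parent->truesize);
	parent->truesize -= delta_truesize;
}

/* Returns the bytes still charged to the socket after the parent is freed. */
static unsigned int model_leak_after_segment(void)
{
	struct model_sock sk = { .wmem_alloc = 0 };
	struct model_skb parent = { .truesize = 4096, .sk = NULL };
	const struct model_skb frags[2] = {
		{ .truesize = 1024, .sk = NULL },	/* orphan, never charged */
		{ .truesize = 1024, .sk = NULL },	/* orphan, never charged */
	};

	model_set_owner(&parent, &sk);		/* socket charged 4096 */
	model_segment_list(&parent, frags, 2);	/* parent truesize is now 2048 */
	model_free(&parent);			/* socket refunded only 2048 */

	return sk.wmem_alloc;
}
```

Calling `model_leak_after_segment()` returns 2048 in this model: the sum of the two fragment sizes stays charged to the socket, which is the residue that keeps the socket from being destroyed.
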
The leak can be observed via KMEMLEAK when tearing down the networking
environment:
unreferenced object 0xffff8881e6eb9100 (size 2048):
comm "ping", pid 6720, jiffies 4295492526
backtrace:
kmem_cache_alloc_noprof+0x5c6/0x800
sk_prot_alloc+0x5b/0x220
sk_alloc+0x35/0xa00
inet6_create.part.0+0x303/0x10d0
__sock_create+0x248/0x640
__sys_socket+0x11b/0x1d0
This patch modifies skb_segment_list to only perform head state release
and truesize subtraction if the fragment explicitly owns a socket
reference. For GRO-forwarded packets where fragments are not owners,
the parent maintains the full truesize and acts as the single anchor for
the memory refund upon destruction.
Fixes: ed4cccef64c1 ("gro: fix ownership transfer")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
---
net/core/skbuff.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a00808f7be6a..63d3d76162ef 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4656,7 +4656,14 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 		list_skb = list_skb->next;

 		err = 0;
-		delta_truesize += nskb->truesize;
+
+		/* Only track truesize delta and release head state for fragments
+		 * that own a socket. GRO-forwarded fragments (sk == NULL) rely on
+		 * the parent SKB for memory accounting.
+		 */
+		if (nskb->sk)
+			delta_truesize += nskb->truesize;
+
 		if (skb_shared(nskb)) {
 			tmp = skb_clone(nskb, GFP_ATOMIC);
 			if (tmp) {
@@ -4684,7 +4691,12 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,

 		skb_push(nskb, -skb_network_offset(nskb) + offset);

-		skb_release_head_state(nskb);
+		/* For GRO-forwarded packets, fragments have no head state
+		 * (no sk/destructor) to release. Skip this.
+		 */
+		if (nskb->sk)
+			skb_release_head_state(nskb);
+
 		len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
 		__copy_skb_header(nskb, skb);

--
2.52.0

* Re: [PATCH net v2] net: skbuff: fix truesize and head state corruption in skb_segment_list
From: Willem de Bruijn @ 2025-12-31 16:58 UTC
To: mheib, netdev
Cc: davem, edumazet, kuba, pabeni, horms, kernelxing, kuniyu,
willemdebruijn.kernel, atenart, aleksander.lobakin, Mohammad Heib

mheib@ wrote:
> From: Mohammad Heib <mheib@redhat.com>
>
> [...]
>
> @@ -4656,7 +4656,14 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>  		list_skb = list_skb->next;
>
>  		err = 0;
> -		delta_truesize += nskb->truesize;
> +
> +		/* Only track truesize delta and release head state for fragments
> +		 * that own a socket. GRO-forwarded fragments (sk == NULL) rely on
> +		 * the parent SKB for memory accounting.
> +		 */
> +		if (nskb->sk)
> +			delta_truesize += nskb->truesize;
> +

Similar to the previous point: if all GSO packets with SKB_GSO_FRAGLIST
are generated by skb_gro_receive_list, and that function always sets
skb->sk = NULL, is there even a need for this branch (and comment)?

> [...]

* Re: [PATCH net v2] net: skbuff: fix truesize and head state corruption in skb_segment_list
From: mohammad heib @ 2026-01-01 22:24 UTC
To: Willem de Bruijn, netdev
Cc: davem, edumazet, kuba, pabeni, horms, kernelxing, kuniyu,
atenart, aleksander.lobakin

Hi Willem,

You're right. I did a deeper dive into the callers and where the
SKB_GSO_FRAGLIST bit actually originates.

It turns out it is exclusively set in the GRO complete paths (tcp4,
udp4, tcp6, udp6). Since these packets are built by the GRO engine for
forwarding, the fragments are guaranteed to be orphans without socket
ownership or head state.

I will simply remove the truesize accumulation, as it is inapplicable
to this GSO type.

One thing that’s confusing me is whether I should remove the call to
skb_release_head_state(). This function updates reference counts for
some fields in the skb, even when no socket is attached to it.

So I’m wondering: should I remove this call, or keep it as is?
What do you think?

* Re: [PATCH net v2] net: skbuff: fix truesize and head state corruption in skb_segment_list
From: Willem de Bruijn @ 2026-01-01 23:58 UTC
To: mohammad heib, Willem de Bruijn, netdev
Cc: davem, edumazet, kuba, pabeni, horms, kernelxing, kuniyu,
atenart, aleksander.lobakin, fw, steffen.klassert

mohammad heib wrote:
> Hi Willem,
>
> You're right. I did a deeper dive into the callers and where the
> SKB_GSO_FRAGLIST bit actually originates.
>
> It turns out it is exclusively set in the GRO complete paths (tcp4,
> udp4, tcp6, udp6). Since these packets are built by the GRO engine for
> forwarding, the fragments are guaranteed to be orphans without socket
> ownership or head state.

When revising, please include my short note on relevant prior patches.

Also, let's self-document these known invariants with

	/* Only skb_gro_receive_list generated skbs arrive here */
	DEBUG_NET_WARN_ON_ONCE(!(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST));
	DEBUG_NET_WARN_ON_ONCE(nskb->sk);

> I will simply remove the truesize accumulation, as it is inapplicable
> to this GSO type.
>
> One thing that’s confusing me is whether I should remove the call to
> skb_release_head_state().
> This function updates reference counts for some fields in the skb, even
> when no socket is attached to it.
>
> So I’m wondering: should I remove this call, or keep it as is?
> What do you think?

This was apparently introduced in commit cf673ed0e057 ("net: fix
fraglist segmentation reference count leak") to handle a special case
for skbs with skb extensions. From that commit:

"where the frag skbs will get the header copied from the head skb in
skb_segment_list() by calling __copy_skb_header(), which could
overwrite the frag skbs' extensions by __skb_ext_copy() and cause a
leak."

So I read that as (1) this is still needed for some skbs (though the
only thing that is really needed is the skb_ext_reset(skb) call) and
(2) it is evidently safe to call unconditionally in skb_segment_list.

Same as previous, please include this context in the commit message.
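
A note on the gso_type assertion suggested in the reply above: in C, unary `!` binds more tightly than binary `&`, so where the parentheses go decides what is actually tested. The sketch below is a userspace illustration only. `MODEL_GSO_FRAGLIST` is a hypothetical flag bit chosen for the example, and both helper functions are made-up names. The first helper spells out how `!x & FLAG` parses, and the second tests whether the flag is absent.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bit; an assumption for this sketch, not taken from
 * the thread.
 */
#define MODEL_GSO_FRAGLIST (1u << 18)

/* How `!gso_type & FLAG` parses: the negation is applied first and
 * yields 0 or 1, which is then masked with the flag. For any flag other
 * than bit 0 the result is always zero, so a warning built on this
 * expression can never fire.
 */
static bool model_warn_as_parsed(unsigned int gso_type)
{
	return ((!gso_type) & MODEL_GSO_FRAGLIST) != 0;
}

/* Parenthesized form: true exactly when the flag is absent. */
static bool model_warn_flag_missing(unsigned int gso_type)
{
	return !(gso_type & MODEL_GSO_FRAGLIST);
}
```

With a `gso_type` of 0, `model_warn_flag_missing()` reports the missing flag while `model_warn_as_parsed()` stays silent, which is why the parenthesized form is the one that documents the invariant.
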