All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: skbuff: fix truesize and head state corruption in skb_segment_list
@ 2025-12-30  9:11 mheib
  2025-12-30 16:31 ` Willem de Bruijn
  0 siblings, 1 reply; 3+ messages in thread
From: mheib @ 2025-12-30  9:11 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, kernelxing, kuniyu,
	Mohammad Heib

From: Mohammad Heib <mheib@redhat.com>

When skb_segment_list is called during packet forwarding through
a bridge or VXLAN, it assumes that every fragment in a frag_list
carries its own socket ownership and head state. While this is true for
GSO packets created by the transmit path (via __ip_append_data), it is
not true for packets built by the GRO receive path.

In the GRO path, fragments are "orphans" (skb->sk == NULL) and were
never charged to a socket. However, the current logic in
skb_segment_list unconditionally adds every fragment's truesize to
delta_truesize and subsequently subtracts this from the parent SKB.

This results a memory accounting leak, Since GRO fragments were never
charged to the socket in the first place, the "refund" results in the
parent SKB returning less memory than originally charged when it is
finally freed. This leads to a permanent leak in sk_wmem_alloc, which
prevents the socket from being destroyed, resulting in a persistent memory
leak of the socket object and its related metadata.

The leak can be observed via KMEMLEAK when tearing down the networking
environment:

unreferenced object 0xffff8881e6eb9100 (size 2048):
  comm "ping", pid 6720, jiffies 4295492526
  backtrace:
    kmem_cache_alloc_noprof+0x5c6/0x800
    sk_prot_alloc+0x5b/0x220
    sk_alloc+0x35/0xa00
    inet6_create.part.0+0x303/0x10d0
    __sock_create+0x248/0x640
    __sys_socket+0x11b/0x1d0

This patch modifies skb_segment_list to only perform head state release
and truesize subtraction if the fragment explicitly owns a socket
reference. For GRO-forwarded packets where fragments are not owners,
the parent maintains the full truesize and acts as the single anchor for
the memory refund upon destruction.

Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
---
 net/core/skbuff.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a00808f7be6a..aee9be42409b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4641,6 +4641,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 	struct sk_buff *tail = NULL;
 	struct sk_buff *nskb, *tmp;
 	int len_diff, err;
+	bool is_flist = !!(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST);
 
 	skb_push(skb, -skb_network_offset(skb) + offset);
 
@@ -4656,7 +4657,15 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 		list_skb = list_skb->next;
 
 		err = 0;
-		delta_truesize += nskb->truesize;
+
+		/* Only track truesize delta if the fragment is being orphaned.
+		 * In the GRO path, fragments don't have a socket owner (sk=NULL),
+		 * so the parent must maintain the total truesize to prevent
+		 * memory accounting leaks.
+		 */
+		if (!is_flist || nskb->sk)
+			delta_truesize += nskb->truesize;
+
 		if (skb_shared(nskb)) {
 			tmp = skb_clone(nskb, GFP_ATOMIC);
 			if (tmp) {
@@ -4684,7 +4693,12 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
 
 		skb_push(nskb, -skb_network_offset(nskb) + offset);
 
-		skb_release_head_state(nskb);
+		/* For GRO-forwarded packets, fragments have no head state
+		 * (no sk/destructor) to release. Skip this.
+		 */
+		if (!is_flist || nskb->sk)
+			skb_release_head_state(nskb);
+
 		len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
 		__copy_skb_header(nskb, skb);
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net] net: skbuff: fix truesize and head state corruption in skb_segment_list
  2025-12-30  9:11 [PATCH net] net: skbuff: fix truesize and head state corruption in skb_segment_list mheib
@ 2025-12-30 16:31 ` Willem de Bruijn
  2025-12-31  3:01   ` mohammad heib
  0 siblings, 1 reply; 3+ messages in thread
From: Willem de Bruijn @ 2025-12-30 16:31 UTC (permalink / raw)
  To: mheib, netdev
  Cc: davem, edumazet, kuba, pabeni, horms, kernelxing, kuniyu,
	Mohammad Heib, steffen.klassert, atenart

mheib@ wrote:
> From: Mohammad Heib <mheib@redhat.com>
> 
> When skb_segment_list is called during packet forwarding through
> a bridge or VXLAN, it assumes that every fragment in a frag_list
> carries its own socket ownership and head state. While this is true for
> GSO packets created by the transmit path (via __ip_append_data), it is
> not true for packets built by the GRO receive path.

We have to separate packets that use frag_list, a broader category,
from those that use fraglist gso chaining. This code path is only
exercised by the latter.

> In the GRO path, fragments are "orphans" (skb->sk == NULL) and were
> never charged to a socket. However, the current logic in
> skb_segment_list unconditionally adds every fragment's truesize to
> delta_truesize and subsequently subtracts this from the parent SKB.

This was not present in the original fraglist chaining patch cited in
the Fixes tag. It was added in commit ed4cccef64c1 ("gro: fix
ownership transfer"). Which was a follow-on to commit 5e10da5385d2
("skbuff: allow 'slow_gro' for skb carring sock reference") removing
the skb->destructor reference.
 
> This results a memory accounting leak, Since GRO fragments were never
> charged to the socket in the first place, the "refund" results in the
> parent SKB returning less memory than originally charged when it is
> finally freed. This leads to a permanent leak in sk_wmem_alloc, which
> prevents the socket from being destroyed, resulting in a persistent memory
> leak of the socket object and its related metadata.
> 
> The leak can be observed via KMEMLEAK when tearing down the networking
> environment:
> 
> unreferenced object 0xffff8881e6eb9100 (size 2048):
>   comm "ping", pid 6720, jiffies 4295492526
>   backtrace:
>     kmem_cache_alloc_noprof+0x5c6/0x800
>     sk_prot_alloc+0x5b/0x220
>     sk_alloc+0x35/0xa00
>     inet6_create.part.0+0x303/0x10d0
>     __sock_create+0x248/0x640
>     __sys_socket+0x11b/0x1d0
> 
> This patch modifies skb_segment_list to only perform head state release
> and truesize subtraction if the fragment explicitly owns a socket
> reference. For GRO-forwarded packets where fragments are not owners,
> the parent maintains the full truesize and acts as the single anchor for
> the memory refund upon destruction.

Thanks for the report and fix. It can probably be simplified a bit
based on knowledge that only fraglist chaining skbs reach this path.
And the Fixes tag should reflect the patch that changed this
accounting in the GRO patch. Matching that in the GSO path makes
sense.

> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Mohammad Heib <mheib@redhat.com>
> ---
>  net/core/skbuff.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a00808f7be6a..aee9be42409b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4641,6 +4641,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>  	struct sk_buff *tail = NULL;
>  	struct sk_buff *nskb, *tmp;
>  	int len_diff, err;
> +	bool is_flist = !!(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST);

This is guaranteed when entering skb_segment_list.

>  
>  	skb_push(skb, -skb_network_offset(skb) + offset);
>  
> @@ -4656,7 +4657,15 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>  		list_skb = list_skb->next;
>  
>  		err = 0;
> -		delta_truesize += nskb->truesize;
> +
> +		/* Only track truesize delta if the fragment is being orphaned.
> +		 * In the GRO path, fragments don't have a socket owner (sk=NULL),
> +		 * so the parent must maintain the total truesize to prevent
> +		 * memory accounting leaks.
> +		 */
> +		if (!is_flist || nskb->sk)
> +			delta_truesize += nskb->truesize;
> +
>  		if (skb_shared(nskb)) {
>  			tmp = skb_clone(nskb, GFP_ATOMIC);
>  			if (tmp) {
> @@ -4684,7 +4693,12 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>  
>  		skb_push(nskb, -skb_network_offset(nskb) + offset);
>  
> -		skb_release_head_state(nskb);
> +		/* For GRO-forwarded packets, fragments have no head state
> +		 * (no sk/destructor) to release. Skip this.
> +		 */
> +		if (!is_flist || nskb->sk)
> +			skb_release_head_state(nskb);
> +
>  		len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
>  		__copy_skb_header(nskb, skb);
>  
> -- 
> 2.52.0
> 



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net] net: skbuff: fix truesize and head state corruption in skb_segment_list
  2025-12-30 16:31 ` Willem de Bruijn
@ 2025-12-31  3:01   ` mohammad heib
  0 siblings, 0 replies; 3+ messages in thread
From: mohammad heib @ 2025-12-31  3:01 UTC (permalink / raw)
  To: Willem de Bruijn, netdev
  Cc: davem, edumazet, kuba, pabeni, horms, kernelxing, kuniyu,
	steffen.klassert, atenart

Hi Willem,

Thank you for the review and for tracing the history of the accounting 
logic.


On 12/30/25 6:31 PM, Willem de Bruijn wrote:
> mheib@ wrote:
>> From: Mohammad Heib <mheib@redhat.com>
>>
>> When skb_segment_list is called during packet forwarding through
>> a bridge or VXLAN, it assumes that every fragment in a frag_list
>> carries its own socket ownership and head state. While this is true for
>> GSO packets created by the transmit path (via __ip_append_data), it is
>> not true for packets built by the GRO receive path.
> 
> We have to separate packets that use frag_list, a broader category,
> from those that use fraglist gso chaining. This code path is only
> exercised by the latter.
That makes perfect sense. I will remove the redundant is_flist check 
since this path is only reached for fraglist GSO, and I'll update the 
Fixes tag to ed4cccef64c1 in v2.
> 
>> In the GRO path, fragments are "orphans" (skb->sk == NULL) and were
>> never charged to a socket. However, the current logic in
>> skb_segment_list unconditionally adds every fragment's truesize to
>> delta_truesize and subsequently subtracts this from the parent SKB.
> 
> This was not present in the original fraglist chaining patch cited in
> the Fixes tag. It was added in commit ed4cccef64c1 ("gro: fix
> ownership transfer"). Which was a follow-on to commit 5e10da5385d2
> ("skbuff: allow 'slow_gro' for skb carring sock reference") removing
> the skb->destructor reference.
>   
>> This results a memory accounting leak, Since GRO fragments were never
>> charged to the socket in the first place, the "refund" results in the
>> parent SKB returning less memory than originally charged when it is
>> finally freed. This leads to a permanent leak in sk_wmem_alloc, which
>> prevents the socket from being destroyed, resulting in a persistent memory
>> leak of the socket object and its related metadata.
>>
>> The leak can be observed via KMEMLEAK when tearing down the networking
>> environment:
>>
>> unreferenced object 0xffff8881e6eb9100 (size 2048):
>>    comm "ping", pid 6720, jiffies 4295492526
>>    backtrace:
>>      kmem_cache_alloc_noprof+0x5c6/0x800
>>      sk_prot_alloc+0x5b/0x220
>>      sk_alloc+0x35/0xa00
>>      inet6_create.part.0+0x303/0x10d0
>>      __sock_create+0x248/0x640
>>      __sys_socket+0x11b/0x1d0
>>
>> This patch modifies skb_segment_list to only perform head state release
>> and truesize subtraction if the fragment explicitly owns a socket
>> reference. For GRO-forwarded packets where fragments are not owners,
>> the parent maintains the full truesize and acts as the single anchor for
>> the memory refund upon destruction.
> 
> Thanks for the report and fix. It can probably be simplified a bit
> based on knowledge that only fraglist chaining skbs reach this path.
> And the Fixes tag should reflect the patch that changed this
> accounting in the GRO patch. Matching that in the GSO path makes
> sense.
> 
>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
>> Signed-off-by: Mohammad Heib <mheib@redhat.com>
>> ---
>>   net/core/skbuff.c | 18 ++++++++++++++++--
>>   1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a00808f7be6a..aee9be42409b 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -4641,6 +4641,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>>   	struct sk_buff *tail = NULL;
>>   	struct sk_buff *nskb, *tmp;
>>   	int len_diff, err;
>> +	bool is_flist = !!(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST);
> 
> This is guaranteed when entering skb_segment_list.
> 
>>   
>>   	skb_push(skb, -skb_network_offset(skb) + offset);
>>   
>> @@ -4656,7 +4657,15 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>>   		list_skb = list_skb->next;
>>   
>>   		err = 0;
>> -		delta_truesize += nskb->truesize;
>> +
>> +		/* Only track truesize delta if the fragment is being orphaned.
>> +		 * In the GRO path, fragments don't have a socket owner (sk=NULL),
>> +		 * so the parent must maintain the total truesize to prevent
>> +		 * memory accounting leaks.
>> +		 */
>> +		if (!is_flist || nskb->sk)
>> +			delta_truesize += nskb->truesize;
>> +
I considered combining the two if (nskb->sk) blocks into one to simplify 
the code further, but I decided against it to avoid reordering the 
skb_clone and skb_push logic, which felt too risky for this path.
>>   		if (skb_shared(nskb)) {
>>   			tmp = skb_clone(nskb, GFP_ATOMIC);
>>   			if (tmp) {
>> @@ -4684,7 +4693,12 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>>   
>>   		skb_push(nskb, -skb_network_offset(nskb) + offset);
>>   
>> -		skb_release_head_state(nskb);
>> +		/* For GRO-forwarded packets, fragments have no head state
>> +		 * (no sk/destructor) to release. Skip this.
>> +		 */
>> +		if (!is_flist || nskb->sk)
>> +			skb_release_head_state(nskb);
>> +
>>   		len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
>>   		__copy_skb_header(nskb, skb);
>>   
>> -- 
>> 2.52.0
>>
> 
> 
Thanks


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-12-31  3:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-30  9:11 [PATCH net] net: skbuff: fix truesize and head state corruption in skb_segment_list mheib
2025-12-30 16:31 ` Willem de Bruijn
2025-12-31  3:01   ` mohammad heib

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.