netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* skb_attempt_defer_free and reference counting
@ 2025-10-31 11:04 Hudson, Nick
  2025-10-31 11:43 ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Hudson, Nick @ 2025-10-31 11:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 662 bytes --]

Hi,

I’ve been looking at using skb_attempt_defer_free and had a question about the skb reference counting.

The existing reference release for any skb handed to skb_attempt_defer_free is done in skb_defer_free_flush (via napi_consume_skb). However, it seems to me that calling skb_attempt_defer_free on the same skb to drop the multiple references is problematic as, if the defer_list isn’t serviced between the calls, the list gets corrupted. That is, the skb can’t appear on the list twice.

Would it be possible to move the reference count drop into skb_attempt_defer_free and only add the skb to the list on last reference drop?

Thanks,
Nick

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3067 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: skb_attempt_defer_free and reference counting
  2025-10-31 11:04 skb_attempt_defer_free and reference counting Hudson, Nick
@ 2025-10-31 11:43 ` Eric Dumazet
  2025-10-31 14:02   ` Hudson, Nick
  2025-11-04 14:43   ` skb_attempt_defer_free and reference counting Sabrina Dubroca
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2025-10-31 11:43 UTC (permalink / raw)
  To: Hudson, Nick; +Cc: netdev@vger.kernel.org

On Fri, Oct 31, 2025 at 4:04 AM Hudson, Nick <nhudson@akamai.com> wrote:
>
> Hi,
>
> I’ve been looking at using skb_attempt_defer_free and had a question about the skb reference counting.
>
> The existing reference release for any skb handed to skb_attempt_defer_free is done in skb_defer_free_flush (via napi_consume_skb). However, it seems to me that calling skb_attempt_defer_free on the same skb to drop the multiple references is problematic as, if the defer_list isn’t serviced between the calls, the list gets corrupted. That is, the skb can’t appear on the list twice.
>
> Would it be possible to move the reference count drop into skb_attempt_defer_free and only add the skb to the list on last reference drop?

We do not plan using this helper for arbitrary skbs, but ones fully
owned by TCP and UDP receive paths.

skb_share_check() must have been called before reaching them.

In any case using skb->next could be problematic with shared skb.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: skb_attempt_defer_free and reference counting
  2025-10-31 11:43 ` Eric Dumazet
@ 2025-10-31 14:02   ` Hudson, Nick
  2025-11-04 12:44     ` Other use cases for skb_attempt_defer_free [was Re: skb_attempt_defer_free and reference counting] Hudson, Nick
  2025-11-04 14:43   ` skb_attempt_defer_free and reference counting Sabrina Dubroca
  1 sibling, 1 reply; 6+ messages in thread
From: Hudson, Nick @ 2025-10-31 14:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2476 bytes --]



> On 31 Oct 2025, at 11:43, Eric Dumazet <edumazet@google.com> wrote:
> 
> !-------------------------------------------------------------------|
>  This Message Is From an External Sender
>  This message came from outside your organization.
> |-------------------------------------------------------------------!
> 
> On Fri, Oct 31, 2025 at 4:04 AM Hudson, Nick <nhudson@akamai.com> wrote:
>> 
>> Hi,
>> 
>> I’ve been looking at using skb_attempt_defer_free and had a question about the skb reference counting.
>> 
>> The existing reference release for any skb handed to skb_attempt_defer_free is done in skb_defer_free_flush (via napi_consume_skb). However, it seems to me that calling skb_attempt_defer_free on the same skb to drop the multiple references is problematic as, if the defer_list isn’t serviced between the calls, the list gets corrupted. That is, the skb can’t appear on the list twice.
>> 
>> Would it be possible to move the reference count drop into skb_attempt_defer_free and only add the skb to the list on last reference drop?
> 
> We do not plan using this helper for arbitrary skbs, but ones fully
> owned by TCP and UDP receive paths.

Interesting. 

This patch has shown to give a performance benefit and I’m curious if it problematic in any way.

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fae1a0ab36bd..59ffac9afdad 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2251,7 +2251,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
                if (unlikely(ret < 0))
                        kfree_skb(skb);
                else
-                       consume_skb(skb);
+                       skb_attempt_defer_free(skb);
        }

        return ret;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f220306731da..525b2a2698c6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -7167,6 +7167,7 @@ nodefer:  kfree_skb_napi_cache(skb);
        if (unlikely(kick))
                kick_defer_list_purge(sd, cpu);
 }
+EXPORT_SYMBOL(skb_attempt_defer_free);

 static void skb_splice_csum_page(struct sk_buff *skb, struct page *page,
  


> 
> skb_share_check() must have been called before reaching them.
> 
> In any case using skb->next could be problematic with shared skb.

OK, so the assumption is skb->users is already 1. Perhaps there is an optimisation in skb_defer_free_flush if that is the case?



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3067 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Other use cases for skb_attempt_defer_free [was Re: skb_attempt_defer_free and reference counting]
  2025-10-31 14:02   ` Hudson, Nick
@ 2025-11-04 12:44     ` Hudson, Nick
  0 siblings, 0 replies; 6+ messages in thread
From: Hudson, Nick @ 2025-11-04 12:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org


[-- Attachment #1.1: Type: text/plain, Size: 2786 bytes --]

Hi Eric,

Any thoughts / insights on the patch below?

It really improves performance of a vhost-net / tap setup for Rx packets.

Thanks,
Nick
 

> On 31 Oct 2025, at 14:02, Hudson, Nick <nhudson@akamai.com> wrote:
> 
> 
> 
>> On 31 Oct 2025, at 11:43, Eric Dumazet <edumazet@google.com> wrote:
>> 
>> !-------------------------------------------------------------------|
>> This Message Is From an External Sender
>> This message came from outside your organization.
>> |-------------------------------------------------------------------!
>> 
>> On Fri, Oct 31, 2025 at 4:04 AM Hudson, Nick <nhudson@akamai.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I’ve been looking at using skb_attempt_defer_free and had a question about the skb reference counting.
>>> 
>>> The existing reference release for any skb handed to skb_attempt_defer_free is done in skb_defer_free_flush (via napi_consume_skb). However, it seems to me that calling skb_attempt_defer_free on the same skb to drop the multiple references is problematic as, if the defer_list isn’t serviced between the calls, the list gets corrupted. That is, the skb can’t appear on the list twice.
>>> 
>>> Would it be possible to move the reference count drop into skb_attempt_defer_free and only add the skb to the list on last reference drop?
>> 
>> We do not plan using this helper for arbitrary skbs, but ones fully
>> owned by TCP and UDP receive paths.
> 
> Interesting. 
> 
> This patch has shown to give a performance benefit and I’m curious if it problematic in any way.
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fae1a0ab36bd..59ffac9afdad 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2251,7 +2251,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
>                if (unlikely(ret < 0))
>                        kfree_skb(skb);
>                else
> -                       consume_skb(skb);
> +                       skb_attempt_defer_free(skb);
>        }
> 
>        return ret;
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f220306731da..525b2a2698c6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -7167,6 +7167,7 @@ nodefer:  kfree_skb_napi_cache(skb);
>        if (unlikely(kick))
>                kick_defer_list_purge(sd, cpu);
> }
> +EXPORT_SYMBOL(skb_attempt_defer_free);
> 
> static void skb_splice_csum_page(struct sk_buff *skb, struct page *page,
> 
> 
> 
>> 
>> skb_share_check() must have been called before reaching them.
>> 
>> In any case using skb->next could be problematic with shared skb.
> 
> OK, so the assumption is skb->users is already 1. Perhaps there is an optimisation in skb_defer_free_flush if that is the case?


[-- Attachment #1.2: Type: text/html, Size: 25303 bytes --]

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3067 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: skb_attempt_defer_free and reference counting
  2025-10-31 11:43 ` Eric Dumazet
  2025-10-31 14:02   ` Hudson, Nick
@ 2025-11-04 14:43   ` Sabrina Dubroca
  2025-11-04 15:04     ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: Sabrina Dubroca @ 2025-11-04 14:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hudson, Nick, netdev@vger.kernel.org

2025-10-31, 04:43:19 -0700, Eric Dumazet wrote:
> On Fri, Oct 31, 2025 at 4:04 AM Hudson, Nick <nhudson@akamai.com> wrote:
> >
> > Hi,
> >
> > I’ve been looking at using skb_attempt_defer_free and had a question about the skb reference counting.
> >
> > The existing reference release for any skb handed to skb_attempt_defer_free is done in skb_defer_free_flush (via napi_consume_skb). However, it seems to me that calling skb_attempt_defer_free on the same skb to drop the multiple references is problematic as, if the defer_list isn’t serviced between the calls, the list gets corrupted. That is, the skb can’t appear on the list twice.
> >
> > Would it be possible to move the reference count drop into skb_attempt_defer_free and only add the skb to the list on last reference drop?
> 
> We do not plan using this helper for arbitrary skbs, but ones fully
> owned by TCP and UDP receive paths.
> 
> skb_share_check() must have been called before reaching them.

Do you think it's worth adding another DEBUG_NET_WARN_ON_ONCE check to
skb_attempt_defer_free(), to validate (and in a way, document) that
assumption?

-- 
Sabrina

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: skb_attempt_defer_free and reference counting
  2025-11-04 14:43   ` skb_attempt_defer_free and reference counting Sabrina Dubroca
@ 2025-11-04 15:04     ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2025-11-04 15:04 UTC (permalink / raw)
  To: Sabrina Dubroca; +Cc: Hudson, Nick, netdev@vger.kernel.org

On Tue, Nov 4, 2025 at 6:43 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> 2025-10-31, 04:43:19 -0700, Eric Dumazet wrote:
> > On Fri, Oct 31, 2025 at 4:04 AM Hudson, Nick <nhudson@akamai.com> wrote:
> > >
> > > Hi,
> > >
> > > I’ve been looking at using skb_attempt_defer_free and had a question about the skb reference counting.
> > >
> > > The existing reference release for any skb handed to skb_attempt_defer_free is done in skb_defer_free_flush (via napi_consume_skb). However, it seems to me that calling skb_attempt_defer_free on the same skb to drop the multiple references is problematic as, if the defer_list isn’t serviced between the calls, the list gets corrupted. That is, the skb can’t appear on the list twice.
> > >
> > > Would it be possible to move the reference count drop into skb_attempt_defer_free and only add the skb to the list on last reference drop?
> >
> > We do not plan using this helper for arbitrary skbs, but ones fully
> > owned by TCP and UDP receive paths.
> >
> > skb_share_check() must have been called before reaching them.
>
> Do you think it's worth adding another DEBUG_NET_WARN_ON_ONCE check to
> skb_attempt_defer_free(), to validate (and in a way, document) that
> assumption?

Let's see first if Nick Hudson proposes a working patch, with some numbers...

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-11-04 15:05 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-31 11:04 skb_attempt_defer_free and reference counting Hudson, Nick
2025-10-31 11:43 ` Eric Dumazet
2025-10-31 14:02   ` Hudson, Nick
2025-11-04 12:44     ` Other use cases for skb_attempt_defer_free [was Re: skb_attempt_defer_free and reference counting] Hudson, Nick
2025-11-04 14:43   ` skb_attempt_defer_free and reference counting Sabrina Dubroca
2025-11-04 15:04     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).