From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
xen-devel <xen-devel@lists.xensource.com>,
Ian Campbell <Ian.Campbell@citrix.com>
Subject: Re: SKB paged fragment lifecycle on receive
Date: Fri, 24 Jun 2011 11:21:15 -0700 [thread overview]
Message-ID: <4E04D59B.8060301@goop.org> (raw)
In-Reply-To: <1308938183.2532.8.camel@edumazet-laptop>
On 06/24/2011 10:56 AM, Eric Dumazet wrote:
> Le vendredi 24 juin 2011 à 10:29 -0700, Jeremy Fitzhardinge a écrit :
>> On 06/24/2011 08:43 AM, Ian Campbell wrote:
>>> We've previously looked into solutions using the skb destructor callback
>>> but that falls over if the skb is cloned since you also need to know
>>> when the clone is destroyed. Jeremy Fitzhardinge and I subsequently
>>> looked at the possibility of a no-clone skb flag (i.e. always forcing a
>>> copy instead of a clone) but IIRC honouring it universally turned into a
>>> very twisty maze with a number of nasty corner cases etc. It also seemed
>>> that the proportion of SKBs which get cloned at least once appeared as
>>> if it could be quite high which would presumably make the performance
>>> impact unacceptable when using the flag. Another issue with using the
>>> skb destructor is that functions such as __pskb_pull_tail will eat (and
>>> free) pages from the start of the frag array such that by the time the
>>> skb destructor is called they are no longer there.
>>>
>>> AIUI Rusty Russell had previously looked into a per-page destructor in
>>> the shinfo but found that it couldn't be made to work (I don't remember
>>> why, or if I even knew at the time). Could that be an approach worth
>>> reinvestigating?
>>>
>>> I can't really think of any other solution which doesn't involve some
>>> sort of driver callback at the time a page is free()d.
> This reminds me the packet mmap (tx path) games we play with pages.
>
> net/packet/af_packet.c : tpacket_destruct_skb(), poking
> TP_STATUS_AVAILABLE back to user to tell him he can reuse space...
Yes. Its similar in the sense that its a tx from a page which isn't
being handed over entirely to the network stack, but has some other
longer-term lifetime.
>> One simple approach would be to simply make sure that we retain a page
>> reference on any granted pages so that the network stack's put pages
>> will never result in them being released back to the kernel. We can
>> also install an skb destructor. If it sees a page being released with a
>> refcount of 1, then we know its our own reference and can free the page
>> immediately. If the refcount is > 1 then we can add it to a queue of
>> pending pages, which can be periodically polled to free pages whose
>> other references have been dropped.
>>
>> However, the question is how large will this queue get? If it remains
>> small then this scheme could be entirely practical. But if almost every
>> page ends up having transient stray references, it could become very
>> awkward.
>>
>> So it comes down to "how useful is an skb destructor callback as a
>> heuristic for page free"?
>>
> Dangerous I would say. You could have a skb1 page transferred to another
> skb2, and call skb1 destructor way before page being released.
Under what circumstances would that happen?
> TCP stack could do that in tcp_collapse() [ it currently doesnt play
> with pages ]
Do you mean "dangerous" in the sense that many pages could end up being
tied up in the pending-release queue? We'd always check the page
refcount, so it should never release pages prematurely.
Thanks,
J
next prev parent reply other threads:[~2011-06-24 18:21 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-24 15:43 SKB paged fragment lifecycle on receive Ian Campbell
2011-06-24 15:43 ` Ian Campbell
2011-06-24 17:29 ` Jeremy Fitzhardinge
2011-06-24 17:56 ` Eric Dumazet
2011-06-24 18:21 ` Jeremy Fitzhardinge [this message]
2011-06-24 19:46 ` David Miller
2011-06-24 20:11 ` Jeremy Fitzhardinge
2011-06-24 20:27 ` David Miller
2011-06-25 11:58 ` Ian Campbell
2011-06-27 20:51 ` Jeremy Fitzhardinge
2011-06-28 10:25 ` Ian Campbell
2011-06-27 14:42 ` Ian Campbell
2011-06-27 22:49 ` David Miller
2011-06-28 10:24 ` Ian Campbell
2011-06-24 22:44 ` Ian Campbell
2011-06-24 22:48 ` Jeremy Fitzhardinge
2011-06-26 10:25 ` Michael S. Tsirkin
2011-06-27 9:41 ` Ian Campbell
2011-06-27 10:21 ` Michael S. Tsirkin
2011-06-27 10:54 ` Ian Campbell
2011-06-27 11:19 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E04D59B.8060301@goop.org \
--to=jeremy@goop.org \
--cc=Ian.Campbell@citrix.com \
--cc=eric.dumazet@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=rusty@rustcorp.com.au \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.