From: Dimitris Michailidis <dm@chelsio.com>
To: David Miller <davem@davemloft.net>
Cc: rusty@rustcorp.com.au, netdev@vger.kernel.org,
virtualization@lists.linux-foundation.org
Subject: Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
Date: Thu, 21 May 2009 10:24:31 -0700
Message-ID: <4A158E4F.9070005@chelsio.com>
In-Reply-To: <20090521.001503.90069315.davem@davemloft.net>
David Miller wrote:
> From: Rusty Russell <rusty@rustcorp.com.au>
> Date: Thu, 21 May 2009 16:27:05 +0930
>
>> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
>>> What you're doing by orphan'ing is creating a situation where a single
>>> UDP socket can loop doing sends and monopolize the TX queue of a
>>> device. The only control we have over a sender for fairness in
>>> datagram protocols is that send buffer allocation.
>> Urgh, that hadn't even occurred to me. Good point.
>
> Now this all is predicated on this actually mattering. :-)
>
> You could argue that the scheduler as well as the size of the
> TX queue should be limiting and enforcing fairness.
>
> Someone really needs to test this. Just skb_orphan() every packet
> at the beginning of dev_hard_start_xmit(), then run some test
> program with two clients looping out UDP packets to see if one
> can monopolize the device and get a significantly larger amount
> of TX resources than the other. Repeat for 3, 4, 5, etc. clients.

The cxgb3 driver has called skb_orphan() in its transmit routine forever (also
because it lacks Tx interrupts), and I am not aware of any adverse effects
from doing so. It does skip skb_orphan() when skb_shared() is true, but
probably nobody sends shared skbs with destructors.

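As a rough illustration of that transmit-path step, here is a small userspace
model. It is not the cxgb3 code. The model_* names and both structs are made up
for this sketch; only skb_orphan() and skb_shared() are real kernel helpers.
The model assumes the destructor does nothing more than return the packet's
bytes to the owning socket's send-buffer count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock: only the send-buffer charge. */
struct model_sock {
    size_t wmem_alloc;                       /* bytes charged to the send buffer */
};

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
    struct model_sock *sk;                   /* owning socket, or NULL */
    void (*destructor)(struct model_skb *);  /* uncharges the owner */
    size_t truesize;                         /* bytes charged for this packet */
    unsigned int users;                      /* reference count */
};

/* Destructor: give the packet's bytes back to the owning socket. */
void model_sock_wfree(struct model_skb *skb)
{
    skb->sk->wmem_alloc -= skb->truesize;
}

/* Charge the packet to a socket and arm the destructor. */
void model_set_owner_w(struct model_skb *skb, struct model_sock *sk)
{
    skb->sk = sk;
    skb->destructor = model_sock_wfree;
    sk->wmem_alloc += skb->truesize;
}

/* Shared means someone else also holds a reference. */
bool model_skb_shared(const struct model_skb *skb)
{
    return skb->users != 1;
}

/* Run the destructor now and detach the packet from its socket. */
void model_skb_orphan(struct model_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}

/* Transmit-path step: orphan unless shared. Returns true if orphaned. */
bool model_xmit_orphan(struct model_skb *skb)
{
    assert(skb->users >= 1);
    if (model_skb_shared(skb))
        return false;
    model_skb_orphan(skb);
    return true;
}
```

In this model, calling model_xmit_orphan() on an unshared packet drops the
socket's charge to zero before the device has finished with the buffer, which
is exactly why the sender is no longer throttled by its send buffer.
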
The only application I know of that has trouble with lazy skb freeing is
pktgen because it treats freeing as an indication that the packet has been
transmitted so it's thrown off if packets sit there for a while. (Also
freeing just indicates that the DMA is done, not that the packet has been
sent, and modern devices have quite a bit of buffering.)
>
>> I haven't thought this through properly, but how about a hack where
>> we don't orphan packets if the ring is over half full?
>
> That would also work. And for the NIU case this would be great
> because I DO have a marker bit for triggering interrupts in the TX
> descriptors. There's just no "all empty" interrupt on TX (who
> designs these things? :( ).
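
The "over half full" check suggested above could look something like the
sketch below. This is only a guess at the shape of such a hack, not code from
virtio_net, NIU, or cxgb3; struct model_tx_ring and its fields are hypothetical
names, and a real driver would read the counts from its own descriptor ring.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX ring bookkeeping; field names are illustrative only. */
struct model_tx_ring {
    unsigned int size;    /* total number of descriptors */
    unsigned int in_use;  /* descriptors queued but not yet reclaimed */
};

/* True when strictly more than half of the descriptors are in use. */
bool model_ring_over_half_full(const struct model_tx_ring *ring)
{
    assert(ring->in_use <= ring->size);
    /* in_use > free is the same as 2 * in_use > size, without overflow. */
    return ring->in_use > ring->size - ring->in_use;
}

/*
 * Orphan only while the ring is lightly loaded. Once it is over half
 * full, keep the socket charge so the send buffer still throttles the
 * sender.
 */
bool model_should_orphan(const struct model_tx_ring *ring)
{
    return !model_ring_over_half_full(ring);
}
```

With a 256-entry ring this orphans packets while up to 128 descriptors are
outstanding and stops orphaning from the 129th onward.
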
>
>> Then I guess we could overload the watchdog as a more general
>> timer-after-no-xmit?
>
> Yes, but it means that teardown of a socket can be delayed up to
> the amount of that timer. Factor in all of this crazy
> round_jiffies() stuff people do these days and it could cause
> pauses for real use cases and drive users batty.
>
> Probably the most profitable avenue is to see if this is a real issue
> after all (see above). If we can get away with having the socket
> buffer represent socket --> device space only, that's the most ideal
> solution. It will probably also improve performance a lot across the
> board, especially on NUMA/SMP boxes as our TX complete events tend to
> be in different places than the SKB producer.
There's a comment in the cxgb3 driver where it calls skb_orphan that
explains the rationale and includes some of what you're saying.