From: David Miller <davem@davemloft.net>
To: rusty@rustcorp.com.au
Cc: netdev@vger.kernel.org, virtualization@lists.linux-foundation.org
Subject: Re: [RFC] virtio: orphan skbs if we're relying on timer to free them
Date: Thu, 21 May 2009 00:15:03 -0700 (PDT)
Message-ID: <20090521.001503.90069315.davem@davemloft.net>
In-Reply-To: <200905211627.05814.rusty@rustcorp.com.au>
From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 21 May 2009 16:27:05 +0930
> On Tue, 19 May 2009 12:10:13 pm David Miller wrote:
>> What you're doing by orphan'ing is creating a situation where a single
>> UDP socket can loop doing sends and monopolize the TX queue of a
>> device. The only control we have over a sender for fairness in
>> datagram protocols is that send buffer allocation.
>
> Urgh, that hadn't even occurred to me. Good point.
Now this all is predicated on this actually mattering. :-)
You could argue that the scheduler as well as the size of the
TX queue should be limiting and enforcing fairness.
Someone really needs to test this. Just skb_orphan() every packet
at the beginning of dev_hard_start_xmit(), then run some test
program with two clients looping out UDP packets to see if one
can monopolize the device and get a significantly larger amount
of TX resources than the other. Repeat for 3, 4, 5, etc. clients.
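
The user-space half of that experiment could look roughly like the sketch below. It is only an illustration: the helper names (send_loop, run_clients, fairness_ratio), the 1400-byte payload and the 16-client cap are assumptions, not anything from this thread, and the kernel side (the skb_orphan() call in dev_hard_start_xmit()) still has to be patched separately. Each forked client sends UDP datagrams as fast as it can for a fixed time and reports its count back over a pipe; the ratio of the busiest to the quietest sender gives a rough fairness figure.

```c
/* Sketch: N forked senders loop out UDP datagrams at one destination for a
 * fixed time; the parent collects how many each one managed to send. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CLIENTS 16	/* arbitrary cap for this sketch */

/* Send fixed-size datagrams until `seconds` elapse; return how many went out. */
unsigned long send_loop(const struct sockaddr_in *dst, int seconds)
{
	char payload[1400];	/* assumed size, below a typical Ethernet MTU */
	unsigned long sent = 0;
	time_t end = time(NULL) + seconds;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 0;
	memset(payload, 0xa5, sizeof(payload));
	while (time(NULL) < end)
		if (sendto(fd, payload, sizeof(payload), 0,
			   (const struct sockaddr *)dst, sizeof(*dst)) > 0)
			sent++;
	close(fd);
	return sent;
}

/* Fork `n` senders and read each one's count from its pipe.
 * Returns 0 on success, -1 on bad arguments or setup failure. */
int run_clients(int n, const char *ip, int port, int seconds,
		unsigned long *counts)
{
	struct sockaddr_in dst;
	int pipes[MAX_CLIENTS][2];
	int i;

	if (n < 1 || n > MAX_CLIENTS)
		return -1;
	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons((unsigned short)port);
	if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
		return -1;

	for (i = 0; i < n; i++) {
		pid_t pid;

		if (pipe(pipes[i]) < 0)
			return -1;
		pid = fork();
		if (pid < 0)
			return -1;
		if (pid == 0) {
			unsigned long sent = send_loop(&dst, seconds);

			if (write(pipes[i][1], &sent, sizeof(sent)) !=
			    (ssize_t)sizeof(sent))
				_exit(1);
			_exit(0);
		}
		close(pipes[i][1]);
	}
	for (i = 0; i < n; i++) {
		if (read(pipes[i][0], &counts[i], sizeof(counts[i])) !=
		    (ssize_t)sizeof(counts[i]))
			counts[i] = 0;
		close(pipes[i][0]);
	}
	while (wait(NULL) > 0)
		;
	return 0;
}

/* Busiest sender divided by quietest; 1.0 means perfectly even.
 * Returns 0.0 when some sender sent nothing (ratio undefined). */
double fairness_ratio(const unsigned long *counts, int n)
{
	unsigned long lo = counts[0], hi = counts[0];
	int i;

	for (i = 1; i < n; i++) {
		if (counts[i] < lo)
			lo = counts[i];
		if (counts[i] > hi)
			hi = counts[i];
	}
	return lo ? (double)hi / (double)lo : 0.0;
}
```

A driver main() would call run_clients() with 2, 3, 4, 5, etc. clients aimed at a host reached through the device under test, once on a stock kernel and once with the orphaning change, and compare fairness_ratio() between the two runs. Counting successful sendto() calls only measures what each socket was allowed to queue, so counting on the receiving side would be a useful cross-check.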
> I haven't thought this through properly, but how about a hack where
> we don't orphan packets if the ring is over half full?
That would also work. And for the NIU case this would be great
because I DO have a marker bit for triggering interrupts in the TX
descriptors. There's just no "all empty" interrupt on TX (who
designs these things? :( ).
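
As a rough illustration of the half-full rule being discussed, the decision can be modelled in plain C. The struct and function names here are hypothetical stand-ins for a driver's own TX ring bookkeeping, not code from virtio or NIU; in a real driver the "true" case is where skb_orphan() would be called before queueing the packet.

```c
#include <stdbool.h>

/* Hypothetical model of a driver's TX ring state. */
struct tx_ring_model {
	unsigned int size;	/* total descriptors in the ring */
	unsigned int in_flight;	/* queued but not yet reclaimed */
};

/* Orphan early only while the ring is at most half full. Past that point
 * the skb stays charged to its socket, so the send buffer limit still
 * throttles a sender that is filling the ring. */
bool should_orphan(const struct tx_ring_model *ring)
{
	return ring->in_flight * 2 <= ring->size;
}
```

With a 256-entry ring this orphans packets up to and including 128 in flight and stops orphaning from 129 onward.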
> Then I guess we could overload the watchdog as a more general
> timer-after-no-xmit?
Yes, but it means that teardown of a socket can be delayed up to
the amount of that timer. Factor in all of this crazy
round_jiffies() stuff people do these days and it could cause
pauses for real use cases and drive users batty.
Probably the most profitable avenue is to see if this is a real issue
after all (see above). If we can get away with having the socket
buffer represent socket --> device space only, that's the most ideal
solution. It will probably also improve performance a lot across the
board, especially on NUMA/SMP boxes as our TX complete events tend to
be in different places than the SKB producer.