From: David Miller <davem@davemloft.net>
To: cl@linux-foundation.org
Cc: rdreier@cisco.com, netdev@vger.kernel.org, yosefe@Voltaire.COM
Subject: Re: IPoIB: Fix multicast packet drops before join is complete
Date: Fri, 05 Jun 2009 18:16:13 -0700 (PDT)
Message-ID: <20090605.181613.147549237.davem@davemloft.net>
In-Reply-To: <alpine.DEB.1.10.0906050959380.23895@gentwo.org>
From: Christoph Lameter <cl@linux-foundation.org>
Date: Fri, 5 Jun 2009 10:18:15 -0400 (EDT)
> ARP is tied to managing small chunks of information about the network
> infrastructure. Buffering the first few and throwing the rest away is
> appropriate there for what the ARP protocol intends to do.
>
> UDP multicasting can be used for streaming information. And right now the
> IPoIB layer is dropping thousands of packets whenever there was a pause of
> a few minutes or when a new multicast group is used and there is some
> delay while the network needs to reestablish the multicast route.
>
> On IPoIB the app can send without being throttled to the speed supported
> by the hardware in these cases. The faster the cpu we get the more packets
> will be dropped in these bursts. The socket layer has an option to not
> wait using MSG_DONTWAIT but in these cases we are not honoring that the
> flag is not set. We simply drop the packets.
>
> If UDP multicasting is used for a purpose like ARP then this is
> appropriate but UDP multicasting has a variety of uses. If you want ARP
> style semantics then the buffer size can be limited in such a way that
> only 3 packets are buffered by setting SO_SNDBUF and using MSG_DONTWAIT.

I can guarantee you that this will break in the future, because
very soon we are going to skb_orphan() (disassociate the socket
from the SKB, and thus release its charge against the socket's
send buffer) right before we give packets to the device layer.
What we're doing now kills our TX performance.

And Rusty has done tests showing that, as far as fairness is
concerned, doing the early skb_orphan() does not create problems
where sockets can "take over" a device, keeping other sockets out.

But it will allow non-limiting schemes like yours to consume tons
of memory. Normally device TX queue lengths keep the system's
collection of active sockets in check, so there is a real limit
under normal operation. Your multicast-resolution list has no
limits.

It's fundamentally wrong. You need some limit there.
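
To make the point about a limit concrete, here is a minimal userspace sketch of a pending-packet list with a hard cap. It is not the IPoIB code: struct pkt, struct pending_queue, and the pq_* helpers are hypothetical names invented for illustration, and the cap of 3 only echoes the "3 packets" figure quoted above. The idea is simply that once the list is full, further packets are dropped and counted rather than held, so memory use stays bounded no matter how fast the sender is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet; this is NOT the kernel's struct sk_buff. */
struct pkt {
    struct pkt *next;
    int id;
};

/* FIFO of packets waiting for an (assumed) multicast join, capped at max_len. */
struct pending_queue {
    struct pkt *head;
    struct pkt *tail;
    size_t len;      /* packets currently held */
    size_t max_len;  /* hard upper bound on len */
    size_t dropped;  /* packets refused because the queue was full */
};

static void pq_init(struct pending_queue *q, size_t max_len)
{
    assert(q != NULL);
    q->head = NULL;
    q->tail = NULL;
    q->len = 0;
    q->max_len = max_len;
    q->dropped = 0;
}

/* Allocate a packet; returns NULL if malloc fails. */
static struct pkt *pkt_new(int id)
{
    struct pkt *p = malloc(sizeof(*p));

    if (p != NULL) {
        p->next = NULL;
        p->id = id;
    }
    return p;
}

/*
 * Takes ownership of p. Returns 0 if the packet was queued, or -1 if it
 * was dropped (and freed) because the queue already holds max_len packets.
 */
static int pq_enqueue(struct pending_queue *q, struct pkt *p)
{
    assert(q != NULL);
    if (p == NULL)
        return -1;
    if (q->len >= q->max_len) {
        q->dropped++;
        free(p);
        return -1;
    }
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->len++;
    return 0;
}

/* Removes and returns the oldest packet, or NULL if the queue is empty. */
static struct pkt *pq_dequeue(struct pending_queue *q)
{
    struct pkt *p;

    assert(q != NULL);
    p = q->head;
    if (p == NULL)
        return NULL;
    q->head = p->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->len--;
    p->next = NULL;
    return p;
}
```

With pq_init(&q, 3), a burst of five pq_enqueue() calls leaves three packets queued and two counted in q.dropped. A real driver would make the cap tunable and account the drops in its TX statistics; those details are outside this sketch.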