From: Bob Gilligan <gilligan@aristanetworks.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH 1/2] ipv4: Improve the scaling of the ARP cache for multicast destinations.
Date: Fri, 31 Aug 2012 12:21:28 -0700
Message-ID: <50410EB8.3040603@aristanetworks.com>
In-Reply-To: <20120830.210628.365120808137655227.davem@davemloft.net>
On 8/30/12 6:06 PM, David Miller wrote:
> From: Bob Gilligan <gilligan@aristanetworks.com>
> Date: Thu, 30 Aug 2012 17:55:04 -0700
>
>> The mapping from multicast IPv4 address to MAC address can just as
>> easily be done at the time a packet is to be sent. With this change,
>> we maintain one ARP cache entry for each interface that has at least
>> one multicast group member. All routes to IPv4 multicast destinations
>> via a particular interface use the same ARP cache entry. This entry
>> does not store the MAC address to use. Instead, packets for multicast
>> destinations go to a new output function that maps the destination
>> IPv4 multicast address into the MAC address and forms the MAC header.
>
> Doing an ARP MC mapping on every packet is much more expensive than
> doing a copy of the hard header cache.
>
> I do not believe the memory consumption issue you use to justify this
> change is a real issue.
>
> If you are talking to that many multicast groups actively, you do want
> that many neighbour cache entries. This is not different from talking
> to nearly every IP address on a local /8 subnet. You'll have a huge
> number of neighbour table entries in that case as well.
>
> If the actual steady state number of active groups being spoken
> to is smaller, you can tune the neighbour cache thresholds to collect
> old less used entries more quickly.
>
> And this today is trivial, since routes no longer hold a reference
> to neighbour entries. Therefore any neighbour entry whatsoever can
> be immediately reclaimed at any moment.

The scaling is N-squared: the number of neighbor cache entries
required for your multicast traffic is interfaces * groups. 100
interfaces and 100 groups could generate 10,000 entries. 1,000
interfaces and 1,000 groups could generate a million entries.

But the number of groups is hard to predict: it depends on the
applications in use and the multicast traffic they generate. So, it
is hard to come up with a "budget" for multicast entries in the
neighbor cache for a multicast router.

If you pick a gc_thresh3 that is less than your working set, you'll
end up thrashing the neighbor cache. And calls to neigh_forced_gc()
are expensive: it performs a linear search of the entire neighbor
cache. Also, the calls to neigh_forced_gc() due to a large number of
multicast entries will negatively impact the unicast entries sharing the
neighbor cache: it will free any unreferenced but resolved unicast
entries. Any subsequent packets for those destinations will trigger a
re-ARP. Unnecessary re-ARPing is generally undesirable in a router.

The user who wants to avoid these problems is left with the
alternative of setting gc_thresh3 to a very large number based on a
worst case estimate of the number of unicast plus multicast entries
required.
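
To make that worst-case estimate concrete, here is a rough sketch of
the arithmetic. The helper name and the unicast figure in the usage
note are made up for illustration; this is not kernel code, just the
interfaces * groups product plus a unicast allowance.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): worst-case neighbor cache
 * size if every multicast group is active on every interface, plus
 * an allowance for unicast entries sharing the same cache.
 */
static uint64_t worst_case_neigh_entries(uint64_t unicast,
                                         uint64_t interfaces,
                                         uint64_t groups)
{
    /* One multicast entry per (interface, group) pair. */
    return unicast + interfaces * groups;
}
```

With an assumed 500 unicast entries, 1,000 interfaces and 1,000 groups,
worst_case_neigh_entries(500, 1000, 1000) comes to 1,000,500, which is
the kind of number gc_thresh3 would have to be raised to.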

It seems simpler and more efficient to keep the multicast entries
out of the neighbor cache entirely.
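
For reference, the per-packet mapping under discussion is the standard
RFC 1112 one: the fixed prefix 01:00:5e, a zero bit, then the low 23
bits of the group address. The function below is a userspace sketch
with a hypothetical name, assuming the group address is passed in host
byte order; it is not the code from the patch.

```c
#include <stdint.h>

/*
 * Sketch of the RFC 1112 mapping from an IPv4 multicast group
 * (host byte order, an assumption of this example) to an Ethernet
 * multicast MAC address. Hypothetical name, not the patch's function.
 */
static void ipv4_mc_to_mac(uint32_t group, uint8_t mac[6])
{
    mac[0] = 0x01;                  /* fixed 01:00:5e prefix */
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (group >> 16) & 0x7f;  /* top bit forced to zero */
    mac[4] = (group >> 8) & 0xff;   /* remaining low 23 bits */
    mac[5] = group & 0xff;
}
```

For example, 224.0.0.1 (0xE0000001) maps to 01:00:5e:00:00:01. The
mapping is a handful of shifts and masks with no table lookup, which
is why it can be done at send time without per-group state.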
Bob.
>
> I'm not fond of these patches, and adding yet more special cases to
> the neighbour layer, and therefore will not apply them.
>