netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Lebrun <david.lebrun@uclouvain.be>
To: David Miller <davem@davemloft.net>
Cc: <netdev@vger.kernel.org>
Subject: Re: [RFC PATCH net-next] ipv6: implement consistent hashing for equal-cost multipath routing
Date: Mon, 28 Nov 2016 21:16:19 +0100	[thread overview]
Message-ID: <583C9093.7060404@uclouvain.be> (raw)
In-Reply-To: <20161128.112248.2198395724649783009.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 2123 bytes --]

On 11/28/2016 05:22 PM, David Miller wrote:
> Thanks for trying to solve this problem.
> 
> But we really don't want this to be Kconfig gated.  If we decide to
> support this it should be a run-time selectable option.  Every
> distribution on the planet is going to turn your Kconfig option on, so
> whatever you think we might be saving by putting this behind Kconfig
> checks won't be realized for large swaths of the userbase.
> 

OK for that

> I also question how necessary %100 consistent hashing of ECMP really
> is.
> 
> If we can do something at extremely low cost and arrive at a very low
> disruption rate, honestly that's probably good enough.
> 
> I assume you've looked over RFC2992.  A low disrutpion algorithm is
> described there and if we could just get rid of the divide it might
> be extremely desirable.

I wanted a solution that has very low disruption for both scale up and
scale down. When a next-hop is added, only 1/N flows should be disrupted
(i.e. in this case, reaffected to the new next-hop). When a next-hop is
removed, only the flows that were handled by this particular next-hop
should be disrupted, and those flows should be equally rebalanced across
the remaining next-hops. Although the solution presented in RFC2992 has
a "reasonably" low disruption (looks like it can get up to 1/2
disruption though), I'm not sure if the equal-rebalancing part holds.

The advantage of my solution over RFC2992 is lowest possible disruption
and equal rebalancing of affected flows. The disadvantage is the lookup
complexity of O(log n) vs O(1). Although from a theoretical viewpoint
O(1) is obviously better, would O(log n) have an effectively measurable
negative impact on scalability ? If we consider 32 next-hops for a route
and 100 pseudo-random numbers generated per next-hop, the lookup
algorithm would have to perform in the worst case log2 3200 = 11
comparisons to select a next-hop for that route.

For my part I prefer the minimal disruption over constant time lookup.
I'll try to come up with some measurements to see the actual impact.

David


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

  reply	other threads:[~2016-11-28 20:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-24 19:59 [RFC PATCH net-next] ipv6: implement consistent hashing for equal-cost multipath routing David Lebrun
2016-11-28 16:22 ` David Miller
2016-11-28 20:16   ` David Lebrun [this message]
2016-11-28 20:32     ` David Miller
2016-11-28 20:58       ` David Lebrun
2016-11-30 13:12       ` Hannes Frederic Sowa
2016-12-01 20:26         ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=583C9093.7060404@uclouvain.be \
    --to=david.lebrun@uclouvain.be \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).