linux-api.vger.kernel.org archive mirror
* [PATCH v2 net-next 0/3] ipv4: Hash-based multipath routing
@ 2015-08-28 20:00 pch
From: pch @ 2015-08-28 20:00 UTC
  To: netdev
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, linux-api, Roopa Prabhu,
	Scott Feldman, Eric W. Biederman, Nicolas Dichtel, Thomas Graf,
	Jiri Benc, Peter Nørlund

When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed
from being more or less destination-based to quasi-random per-packet
scheduling. This increases the risk of out-of-order packets and makes it
impossible to use multipath together with anycast services.

This patch series seeks to extend the multipath system to support both L3 and
L4-based multipath while still supporting per-packet multipath.

The multipath algorithm is set as a per-route attribute (RTA_MP_ALGO) with some
degree of binary compatibility with the old implementation (2.6.12 - 2.6.22),
but without source-level compatibility, since the values have different names:

RT_MP_ALG_L3_HASH:
L3 hash-based distribution. This was IP_MP_ALG_NONE, which, with the route
cache, behaved somewhat like L3-based distribution. This is the new default.

RT_MP_ALG_PER_PACKET:
Per-packet distribution. Was IP_MP_ALG_RR. Uses round-robin.

RT_MP_ALG_DRR, RT_MP_ALG_RANDOM, RT_MP_ALG_WRANDOM:
Unsupported values, but reserved because they existed in 2.6.12 - 2.6.22.

RT_MP_ALG_L4_HASH:
L4 hash-based distribution. This is new.
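
To illustrate the difference between the L3 and L4 modes, here is a minimal
userspace sketch of which fields feed the hash in each case. The names
(mp_mode, flow_key, mix32, mp_flow_hash) are hypothetical and exist only for
this example, and the FNV-1a style mixing is a simple stand-in, not the
Jenkins hash the series uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode names for this sketch; not the uapi constants. */
enum mp_mode { MP_L3_HASH, MP_L4_HASH };

struct flow_key {
	uint32_t saddr, daddr;	/* IPv4 addresses */
	uint16_t sport, dport;	/* L4 ports, ignored for L3 hashing */
	uint8_t  proto;		/* L4 protocol, ignored for L3 hashing */
};

/* FNV-1a style byte mixing; a stand-in for the real hash function. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xffu;
		h *= 16777619u;
	}
	return h;
}

/* Return a 31-bit hash over the fields relevant to the chosen mode. */
static uint32_t mp_flow_hash(enum mp_mode mode, const struct flow_key *k)
{
	uint32_t h = 2166136261u;

	assert(k != NULL);
	h = mix32(h, k->saddr);
	h = mix32(h, k->daddr);
	if (mode == MP_L4_HASH) {
		h = mix32(h, k->proto);
		h = mix32(h, ((uint32_t)k->sport << 16) | k->dport);
	}
	return h >> 1;	/* keep 31 bits */
}
```

With L3 hashing, two connections between the same pair of hosts always get the
same hash, so they share a path. With L4 hashing they can be spread over
different paths.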

The traditional modulo approach is replaced by a threshold-based approach,
described in RFC 2992. This reduces disruption in case of link failures or
route changes.
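
The threshold idea can be sketched in userspace C as below. This is an
illustration under stated assumptions, not the code from the patch: the
function names are hypothetical, weights are assumed to be non-zero and small
enough that the shifted sum fits in 64 bits, and plain truncating division is
used. Each nexthop owns a contiguous region of the 31-bit hash space
proportional to its weight, and upper_bound is inclusive, so the last bound is
0x7fffffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assign each nexthop an inclusive upper bound in the 31-bit hash space. */
static void mp_compute_bounds(const uint32_t *weight, size_t n,
			      uint32_t *upper_bound)
{
	uint64_t total = 0, cum = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		assert(weight[i] > 0);	/* assumption: no zero weights */
		total += weight[i];
	}

	for (size_t i = 0; i < n; i++) {
		cum += weight[i];
		upper_bound[i] = (uint32_t)(((cum << 31) / total) - 1);
	}
}

/* Pick the first nexthop whose region contains the hash. */
static size_t mp_select(const uint32_t *upper_bound, size_t n, uint32_t hash)
{
	for (size_t i = 0; i < n; i++)
		if (hash <= upper_bound[i])
			return i;
	return n - 1;	/* not reached for 31-bit hashes */
}
```

With hash % n, changing n remaps most flows. With regions, a change only moves
the flows whose hash falls in the part of the space where a boundary shifted.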

To better support anycast environments, where PMTU discovery usually breaks with
multipath, certain ICMP packets are hashed using the IP addresses within the
ICMP payload when using L3 hashing. This ensures that ICMP packets are routed
over the same path as the flow they belong to. It is not enabled with L4
hashing, since we can only consistently rely on L4 information when PMTU
discovery is used, and PMTU discovery may be used in one direction while not
being used in the other.
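
A rough userspace sketch of the ICMP inspection follows. It is an illustration
only: the function and struct names are hypothetical, the set of ICMP types
treated as errors (3, 5, 11, 12) is an assumption, and the validation is kept
minimal. The embedded header describes a packet that travelled in the opposite
direction, so its addresses are swapped before being used as the hash key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct l3_key {
	uint32_t saddr, daddr;	/* network byte order, as found in the packet */
};

/* Assumed set of ICMP errors that embed the offending IP header. */
static bool icmp_is_error(uint8_t type)
{
	return type == 3 || type == 5 || type == 11 || type == 12;
}

/* Fill in the addresses to hash on for a raw IPv4 packet. */
static bool mp_l3_key(const uint8_t *pkt, size_t len, struct l3_key *key)
{
	size_t ihl;
	const uint8_t *icmp, *inner;

	assert(pkt != NULL && key != NULL);
	if (len < 20 || (pkt[0] >> 4) != 4)
		return false;
	ihl = (size_t)(pkt[0] & 0x0f) * 4;
	if (ihl < 20 || len < ihl)
		return false;

	/* Default: hash on the outer addresses. */
	memcpy(&key->saddr, pkt + 12, 4);
	memcpy(&key->daddr, pkt + 16, 4);
	if (pkt[9] != 1)	/* not ICMP */
		return true;

	icmp = pkt + ihl;
	if (len - ihl < 8 + 20 || !icmp_is_error(icmp[0]))
		return true;

	inner = icmp + 8;	/* embedded IP header follows the ICMP header */
	if ((inner[0] >> 4) != 4)
		return true;

	/* Swap: the embedded packet went the other way. */
	memcpy(&key->saddr, inner + 16, 4);
	memcpy(&key->daddr, inner + 12, 4);
	return true;
}
```

For example, a "fragmentation needed" error sent by a router towards an anycast
address then hashes like the client's own packets towards that address, so it
reaches the same server.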

As a side effect, the multipath spinlock was removed and the code got faster.
I measured ip_mkroute_input (excl. __mkroute_input) on a Xeon X3350 (4 cores,
2.66GHz) with two paths:

Old per-packet: ~393.9 cycles(tsc)
New per-packet:  ~75.2 cycles(tsc)
New L3:         ~107.9 cycles(tsc)
New L4:         ~129.1 cycles(tsc)

The timings are approximately the same with a single core, except for the old
per-packet case, which gets faster (~199.8 cycles), most likely because there is
no contention on the spinlock.
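
For readers wondering how round-robin can work without a lock at all, one
common technique is a shared atomic counter, sketched below with C11 atomics.
This is a generic illustration and not necessarily how patch 1/3 does it. The
names rr_state and rr_select are hypothetical, and weights are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct rr_state {
	atomic_uint next;	/* shared packet counter */
};

/* Pick the next of n paths; safe to call from several threads. */
static size_t rr_select(struct rr_state *st, size_t n)
{
	unsigned int ticket;

	assert(st != NULL && n > 0);
	/* Each caller gets a unique ticket, so no lock is needed. */
	ticket = atomic_fetch_add_explicit(&st->next, 1u,
					   memory_order_relaxed);
	return ticket % n;
}
```

When the counter wraps, the sequence skips slightly if n is not a power of
two, which is harmless for load distribution.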

If this patch is accepted, a follow-up patch to iproute2 will also be
submitted.

Changes in v2:
- Replaced 8-bit xor hash with 31-bit jenkins hash
- Don't scale weights (since 31-bit)
- Avoided unnecessary renaming of variables
- Rely on DF-bit instead of fragment offset when checking for fragmentation
- upper_bound is now inclusive to avoid overflow
- Use a callback to postpone extracting flow information until necessary
- Skipped ICMP inspection entirely with L4 hashing
- Handle newly added sysctl ignore_routes_with_linkdown

Best Regards,
 Peter Nørlund

Peter Nørlund (3):
      ipv4: Lock-less per-packet multipath
      ipv4: L3 and L4 hash-based multipath routing
      ipv4: ICMP packet inspection for L3 multipath

 include/net/ip_fib.h           |  26 ++++++-
 include/net/route.h            |  12 ++-
 include/uapi/linux/rtnetlink.h |  14 +++-
 net/ipv4/Kconfig               |   1 +
 net/ipv4/fib_frontend.c        |   4 +
 net/ipv4/fib_semantics.c       | 168 ++++++++++++++++++++++++++---------------
 net/ipv4/icmp.c                |  34 ++++++++-
 net/ipv4/route.c               | 112 +++++++++++++++++++++++++--
 8 files changed, 298 insertions(+), 73 deletions(-)

Thread overview: 13+ messages
2015-08-28 20:00 [PATCH v2 net-next 0/3] ipv4: Hash-based multipath routing pch
2015-08-28 20:00   ` [PATCH v2 net-next 1/3] ipv4: Lock-less per-packet multipath pch
2015-08-28 20:00   ` [PATCH v2 net-next 2/3] ipv4: L3 and L4 hash-based multipath routing pch
2015-08-30 22:48     ` Tom Herbert
2015-08-29 20:14   ` [PATCH v2 net-next 0/3] ipv4: Hash-based " David Miller
2015-08-29 20:31       ` Peter Nørlund
2015-08-29 20:46         ` David Miller
2015-08-29 20:55             ` Scott Feldman
2015-08-29 20:59             ` Tom Herbert
2015-08-30 21:28               ` Peter Nørlund
2015-08-30 22:29                 ` Tom Herbert
2015-08-31  9:02                   ` Thomas Graf
2015-08-28 20:00 ` [PATCH v2 net-next 3/3] ipv4: ICMP packet inspection for L3 multipath pch
