netdev.vger.kernel.org archive mirror
* [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use
@ 2017-08-23  7:58 Jakub Sitnicki
  2017-08-23  7:58 ` [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash Jakub Sitnicki
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert

This patch set is another take at making Path MTU Discovery work when
server nodes are behind a router employing multipath routing in a
load-balance or anycast setup (that is, when not every end-node can be
reached by every path). The problem has been well described in RFC 7690
[1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
to be routed back to the server node that sent a reply that exceeds path
MTU.

The proposed solution is two-fold:

 (1) on the server side - reflect the Flow Label [2]. This can be done
     without modifying the application using a new per-netns sysctl knob
     that has been proposed independently of this patchset in the patch
     entitled "ipv6: Add sysctl for per namespace flow label
     reflection" [3].

 (2) on the ECMP router - make the ipv6 routing subsystem look into the
     ICMPv6 error packets and compute the flow-hash from their payload,
     i.e. the offending packet that triggered the error. This is the
     same behavior the ipv4 stack already has.

With both parts in place Path MTU Discovery can work past the ECMP
router when using IPv6.

[1] https://tools.ietf.org/html/rfc7690
[2] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
[3] http://patchwork.ozlabs.org/patch/804870/

v1 -> v2:
 - don't use "extern" in external function declaration in header file
 - style change, put as many arguments as possible on the first line of
   a function call, and align consecutive lines to the first argument
 - expand the cover letter based on the feedback

v2 -> v3:
 - switch to computing flow-hash using flow dissector to align with
   recent changes to multipath routing in ipv4 stack
 - add a sysctl knob for enabling flow label reflection per netns

---

Testing has covered multipath routing of ICMPv6 PTB errors in the forward
and local output paths in a simple use-case of an HTTP server sending a
reply which is over the path MTU size [4]. I have also checked that the
flows get evenly spread over multiple paths (i.e. that there are no
regressions) [5].

[4] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/pmtud
[5] https://github.com/jsitnicki/tools/tree/master/net/tests/ecmp/load-balance


Jakub Sitnicki (4):
  net: Extend struct flowi6 with multipath hash
  ipv6: Compute multipath hash for ICMP errors from offending packet
  ipv6: Fold rt6_info_hash_nhsfn() into its only caller
  ipv6: Use multipath hash from flow info if available

 include/net/flow.h      |  1 +
 include/net/ip6_route.h |  1 +
 net/ipv6/icmp.c         |  1 +
 net/ipv6/route.c        | 68 +++++++++++++++++++++++++++++++++++++++++--------
 4 files changed, 60 insertions(+), 11 deletions(-)

-- 
2.9.4
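
As a rough illustration of the idea in the cover letter, the userspace
sketch below models an ECMP router that picks a path from a hash over
the L3 flow (addresses plus Flow Label). Everything in it is
hypothetical: struct toy_flow, toy_fnv1a(), toy_flow_hash() and
toy_ecmp_select() are made-up names, FNV-1a is only a stand-in for the
real flow hash, and ordering the two addresses before hashing is an
assumption of the sketch so that both directions of a flow land on the
same path. It shows why a reply carrying a reflected Flow Label, and
hence an ICMPv6 error hashed on that embedded reply, can be steered to
the same path as the original flow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified L3 flow key; not a kernel structure. */
struct toy_flow {
	uint8_t  src[16];
	uint8_t  dst[16];
	uint32_t flow_label;
};

/* FNV-1a over a byte range; a stand-in for the real flow hash. */
static uint32_t toy_fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Direction-independent hash: always feed the lower address first. */
static uint32_t toy_flow_hash(const struct toy_flow *f)
{
	int swap = memcmp(f->src, f->dst, 16) > 0;
	uint32_t h = 2166136261u;

	h = toy_fnv1a(h, swap ? f->dst : f->src, 16);
	h = toy_fnv1a(h, swap ? f->src : f->dst, 16);
	h = toy_fnv1a(h, &f->flow_label, sizeof(f->flow_label));
	return h;
}

/* Pick one of n_paths equal-cost next hops for a flow. */
static unsigned int toy_ecmp_select(const struct toy_flow *f,
				    unsigned int n_paths)
{
	assert(n_paths > 0);
	return toy_flow_hash(f) % n_paths;
}
```

In this model, a client-to-server flow and the server's reply with the
same Flow Label always select the same path index. A reply with a
different Flow Label is not guaranteed to, which is the reason part (1)
of the proposal is needed.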


* [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash
  2017-08-23  7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
@ 2017-08-23  7:58 ` Jakub Sitnicki
  2017-08-23  7:58 ` [PATCH net-next v3 2/4] ipv6: Compute multipath hash for ICMP errors from offending packet Jakub Sitnicki
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert

Allow for functions that fill out the IPv6 flow info to also pass a hash
computed over the skb contents. The hash value will drive the multipath
routing decisions.

This is intended for special treatment of ICMPv6 errors, where we would
like to make a routing decision based on the flow identifying the
offending IPv6 datagram that triggered the error, rather than the flow
of the ICMP error itself.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 include/net/flow.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index f3dc61b..eb60cee3 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -149,6 +149,7 @@ struct flowi6 {
 #define fl6_ipsec_spi		uli.spi
 #define fl6_mh_type		uli.mht.type
 #define fl6_gre_key		uli.gre_key
+	__u32			mp_hash;
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 struct flowidn {
-- 
2.9.4


* [PATCH net-next v3 2/4] ipv6: Compute multipath hash for ICMP errors from offending packet
  2017-08-23  7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
  2017-08-23  7:58 ` [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash Jakub Sitnicki
@ 2017-08-23  7:58 ` Jakub Sitnicki
  2017-08-23  7:58 ` [PATCH net-next v3 3/4] ipv6: Fold rt6_info_hash_nhsfn() into its only caller Jakub Sitnicki
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert

When forwarding or sending out an ICMPv6 error, look at the embedded
packet that triggered the error and compute a flow hash over its
headers.

This lets us route the ICMP error together with the flow it belongs to
when multipath (ECMP) routing is in use, which in turn makes Path MTU
Discovery work in ECMP load-balanced or anycast setups (RFC 7690).

Granted, end-hosts behind the ECMP router (aka servers) need to reflect
the IPv6 Flow Label for PMTUD to work.

The code is organized in parallel with the ipv4 stack:

  ip_multipath_l3_keys -> ip6_multipath_l3_keys
  fib_multipath_hash   -> rt6_multipath_hash

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 include/net/ip6_route.h |  1 +
 net/ipv6/icmp.c         |  1 +
 net/ipv6/route.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 907d39a..882bc3c 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -115,6 +115,7 @@ static inline int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
 
 struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
 			    const struct in6_addr *saddr, int oif, int flags);
+u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb);
 
 struct dst_entry *icmp6_dst_alloc(struct net_device *dev, struct flowi6 *fl6);
 
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 4f82830..dd7608c 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -519,6 +519,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	fl6.fl6_icmp_type = type;
 	fl6.fl6_icmp_code = code;
 	fl6.flowi6_uid = sock_net_uid(net, NULL);
+	fl6.mp_hash = rt6_multipath_hash(&fl6, skb);
 	security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
 
 	sk = icmpv6_xmit_lock(net);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9b02064..6c4dd57 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1214,6 +1214,54 @@ struct dst_entry *ip6_route_input_lookup(struct net *net,
 }
 EXPORT_SYMBOL_GPL(ip6_route_input_lookup);
 
+static void ip6_multipath_l3_keys(const struct sk_buff *skb,
+				  struct flow_keys *keys)
+{
+	const struct ipv6hdr *outer_iph = ipv6_hdr(skb);
+	const struct ipv6hdr *key_iph = outer_iph;
+	const struct ipv6hdr *inner_iph;
+	const struct icmp6hdr *icmph;
+	struct ipv6hdr _inner_iph;
+
+	if (likely(outer_iph->nexthdr != IPPROTO_ICMPV6))
+		goto out;
+
+	icmph = icmp6_hdr(skb);
+	if (icmph->icmp6_type != ICMPV6_DEST_UNREACH &&
+	    icmph->icmp6_type != ICMPV6_PKT_TOOBIG &&
+	    icmph->icmp6_type != ICMPV6_TIME_EXCEED &&
+	    icmph->icmp6_type != ICMPV6_PARAMPROB)
+		goto out;
+
+	inner_iph = skb_header_pointer(skb,
+				       skb_transport_offset(skb) + sizeof(*icmph),
+				       sizeof(_inner_iph), &_inner_iph);
+	if (!inner_iph)
+		goto out;
+
+	key_iph = inner_iph;
+out:
+	memset(keys, 0, sizeof(*keys));
+	keys->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+	keys->addrs.v6addrs.src = key_iph->saddr;
+	keys->addrs.v6addrs.dst = key_iph->daddr;
+	keys->tags.flow_label = ip6_flowinfo(key_iph);
+	keys->basic.ip_proto = key_iph->nexthdr;
+}
+
+/* if skb is set it will be used and fl6 can be NULL */
+u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb)
+{
+	struct flow_keys hash_keys;
+
+	if (skb) {
+		ip6_multipath_l3_keys(skb, &hash_keys);
+		return flow_hash_from_keys(&hash_keys);
+	}
+
+	return get_hash_from_flowi6(fl6);
+}
+
 void ip6_route_input(struct sk_buff *skb)
 {
 	const struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -1232,6 +1280,8 @@ void ip6_route_input(struct sk_buff *skb)
 	tun_info = skb_tunnel_info(skb);
 	if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
 		fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
+	if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
+		fl6.mp_hash = rt6_multipath_hash(&fl6, skb);
 	skb_dst_drop(skb);
 	skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
 }
-- 
2.9.4
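
The header selection done by ip6_multipath_l3_keys() above can be
mimicked in userspace roughly as follows. This is only a sketch over a
raw byte buffer, not the kernel code: toy_key_header_offset() and the
TOY_* constants are hypothetical names, and it assumes the ICMPv6
header immediately follows the outer IPv6 header (no extension
headers), whereas the patch relies on skb_transport_offset(). It
returns the offset of the IPv6 header whose addresses and Flow Label
should feed the hash: the embedded one for ICMPv6 errors (types 1 to
4), the outer one otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IPV6_HDR_LEN   40u	/* fixed IPv6 header size */
#define TOY_ICMPV6_HDR_LEN  8u	/* type, code, checksum, 4 data bytes */
#define TOY_NEXTHDR_ICMPV6 58u	/* IPv6 Next Header value for ICMPv6 */

/* Return 0 to hash on the outer header, or the offset of the embedded
 * (offending) IPv6 header when pkt is an ICMPv6 error carrying one.
 */
static size_t toy_key_header_offset(const uint8_t *pkt, size_t len)
{
	size_t inner = TOY_IPV6_HDR_LEN + TOY_ICMPV6_HDR_LEN;
	uint8_t type;

	assert(pkt != NULL);

	if (len < inner)
		return 0;
	/* Next Header is byte 6 of the fixed IPv6 header. */
	if (pkt[6] != TOY_NEXTHDR_ICMPV6)
		return 0;
	/* Error types: 1 dest unreachable, 2 packet too big,
	 * 3 time exceeded, 4 parameter problem.
	 */
	type = pkt[TOY_IPV6_HDR_LEN];
	if (type < 1 || type > 4)
		return 0;
	/* Fall back to the outer header if the inner one is truncated. */
	if (len < inner + TOY_IPV6_HDR_LEN)
		return 0;
	return inner;
}
```

As in the patch, every failure case falls back to the outer header
rather than dropping the packet, so non-error ICMPv6 traffic such as
echo requests keeps being hashed as before.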


* [PATCH net-next v3 3/4] ipv6: Fold rt6_info_hash_nhsfn() into its only caller
  2017-08-23  7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
  2017-08-23  7:58 ` [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash Jakub Sitnicki
  2017-08-23  7:58 ` [PATCH net-next v3 2/4] ipv6: Compute multipath hash for ICMP errors from offending packet Jakub Sitnicki
@ 2017-08-23  7:58 ` Jakub Sitnicki
  2017-08-23  7:58 ` [PATCH net-next v3 4/4] ipv6: Use multipath hash from flow info if available Jakub Sitnicki
  2017-08-25  1:21 ` [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert

Commit 644d0e656958 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
sense to keep it around. Also remove the accompanying comment that has
become outdated.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 net/ipv6/route.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6c4dd57..246e7d7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -445,16 +445,6 @@ static bool rt6_check_expired(const struct rt6_info *rt)
 	return false;
 }
 
-/* Multipath route selection:
- *   Hash based function using packet header and flowlabel.
- * Adapted from fib_info_hashfn()
- */
-static int rt6_info_hash_nhsfn(unsigned int candidate_count,
-			       const struct flowi6 *fl6)
-{
-	return get_hash_from_flowi6(fl6) % candidate_count;
-}
-
 static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
 					     struct flowi6 *fl6, int oif,
 					     int strict)
@@ -462,7 +452,7 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
 	struct rt6_info *sibling, *next_sibling;
 	int route_choosen;
 
-	route_choosen = rt6_info_hash_nhsfn(match->rt6i_nsiblings + 1, fl6);
+	route_choosen = get_hash_from_flowi6(fl6) % (match->rt6i_nsiblings + 1);
 	/* Don't change the route, if route_choosen == 0
 	 * (siblings does not include ourself)
 	 */
-- 
2.9.4


* [PATCH net-next v3 4/4] ipv6: Use multipath hash from flow info if available
  2017-08-23  7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2017-08-23  7:58 ` [PATCH net-next v3 3/4] ipv6: Fold rt6_info_hash_nhsfn() into its only caller Jakub Sitnicki
@ 2017-08-23  7:58 ` Jakub Sitnicki
  2017-08-25  1:21 ` [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23  7:58 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
	Tom Herbert

Allow our callers to influence the choice of ECMP link by honoring the
hash passed together with the flow info. This allows for special
treatment of ICMP errors which we would like to route over the same path
as the IPv6 datagram that triggered the error.

Also go through rt6_multipath_hash(), in the usual case when we aren't
dealing with an ICMP error, so that there is one central place where
multipath hash is computed.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
 net/ipv6/route.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 246e7d7..4d02734 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -452,7 +452,13 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
 	struct rt6_info *sibling, *next_sibling;
 	int route_choosen;
 
-	route_choosen = get_hash_from_flowi6(fl6) % (match->rt6i_nsiblings + 1);
+	/* We might have already computed the hash for ICMPv6 errors. In such
+	 * case it will always be non-zero. Otherwise now is the time to do it.
+	 */
+	if (!fl6->mp_hash)
+		fl6->mp_hash = rt6_multipath_hash(fl6, NULL);
+
+	route_choosen = fl6->mp_hash % (match->rt6i_nsiblings + 1);
 	/* Don't change the route, if route_choosen == 0
 	 * (siblings does not include ourself)
 	 */
-- 
2.9.4


* Re: [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use
  2017-08-23  7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
                   ` (3 preceding siblings ...)
  2017-08-23  7:58 ` [PATCH net-next v3 4/4] ipv6: Use multipath hash from flow info if available Jakub Sitnicki
@ 2017-08-25  1:21 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2017-08-25  1:21 UTC (permalink / raw)
  To: jkbs; +Cc: netdev, hannes, nikolay, tom

From: Jakub Sitnicki <jkbs@redhat.com>
Date: Wed, 23 Aug 2017 09:58:27 +0200

> This patch set is another take at making Path MTU Discovery work when
> server nodes are behind a router employing multipath routing in a
> load-balance or anycast setup (that is, when not every end-node can be
> reached by every path). The problem has been well described in RFC 7690
> [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
> to be routed back to the server node that sent a reply that exceeds path
> MTU.
 ...

Ok, looks not too bad.

Applied, thanks.

