* [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash
2017-08-23 7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
@ 2017-08-23 7:58 ` Jakub Sitnicki
2017-08-23 7:58 ` [PATCH net-next v3 2/4] ipv6: Compute multipath hash for ICMP errors from offending packet Jakub Sitnicki
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23 7:58 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
Tom Herbert
Allow for functions that fill out the IPv6 flow info to also pass a hash
computed over the skb contents. The hash value will drive the multipath
routing decisions.
This is intended for special treatment of ICMPv6 errors, where we would
like to make a routing decision based on the flow identifying the
offending IPv6 datagram that triggered the error, rather than the flow
of the ICMP error itself.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
include/net/flow.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/net/flow.h b/include/net/flow.h
index f3dc61b..eb60cee3 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -149,6 +149,7 @@ struct flowi6 {
#define fl6_ipsec_spi uli.spi
#define fl6_mh_type uli.mht.type
#define fl6_gre_key uli.gre_key
+ __u32 mp_hash;
} __attribute__((__aligned__(BITS_PER_LONG/8)));
struct flowidn {
--
2.9.4
* [PATCH net-next v3 2/4] ipv6: Compute multipath hash for ICMP errors from offending packet
2017-08-23 7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
2017-08-23 7:58 ` [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash Jakub Sitnicki
@ 2017-08-23 7:58 ` Jakub Sitnicki
2017-08-23 7:58 ` [PATCH net-next v3 3/4] ipv6: Fold rt6_info_hash_nhsfn() into its only caller Jakub Sitnicki
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23 7:58 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
Tom Herbert
When forwarding or sending out an ICMPv6 error, look at the embedded
packet that triggered the error and compute a flow hash over its
headers.
This lets us route the ICMP error together with the flow it belongs to
when multipath (ECMP) routing is in use, which in turn makes Path MTU
Discovery work in ECMP load-balanced or anycast setups (RFC 7690).
Granted, end-hosts behind the ECMP router (aka servers) need to reflect
the IPv6 Flow Label for PMTUD to work.
The code is organized in parallel with the IPv4 stack:
ip_multipath_l3_keys -> ip6_multipath_l3_keys
fib_multipath_hash -> rt6_multipath_hash
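To illustrate the header selection described above, here is a minimal
userspace sketch, not the kernel code from this patch. It works on a raw
byte buffer instead of an skb and assumes the ICMPv6 header directly
follows the outer IPv6 header (no extension headers). The names
key_header_offset() and icmpv6_is_error() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN   40u /* fixed IPv6 header size */
#define ICMPV6_HDR_LEN  8u /* type, code, checksum, 4 type-specific bytes */
#define NEXTHDR_OFF     6u /* offset of Next Header in the IPv6 header */
#define PROTO_ICMPV6   58u

/* ICMPv6 error types: 1 = Destination Unreachable, 2 = Packet Too Big,
 * 3 = Time Exceeded, 4 = Parameter Problem.
 */
static int icmpv6_is_error(uint8_t type)
{
	return type >= 1 && type <= 4;
}

/* Hypothetical helper: return the offset of the IPv6 header whose
 * addresses and flow label should feed the multipath hash. That is the
 * embedded (offending) header for ICMPv6 errors and the outer header
 * (offset 0) for everything else.
 */
static size_t key_header_offset(const uint8_t *pkt, size_t len)
{
	assert(pkt != NULL);
	assert(len >= IPV6_HDR_LEN);

	if (pkt[NEXTHDR_OFF] != PROTO_ICMPV6)
		return 0;
	/* Need the ICMPv6 header plus a complete embedded IPv6 header. */
	if (len < IPV6_HDR_LEN + ICMPV6_HDR_LEN + IPV6_HDR_LEN)
		return 0;
	if (!icmpv6_is_error(pkt[IPV6_HDR_LEN]))
		return 0;

	return IPV6_HDR_LEN + ICMPV6_HDR_LEN;
}
```

As in the patch, any failure to locate an embedded header falls back to
hashing the outer header rather than dropping the packet.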
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
include/net/ip6_route.h | 1 +
net/ipv6/icmp.c | 1 +
net/ipv6/route.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 52 insertions(+)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 907d39a..882bc3c 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -115,6 +115,7 @@ static inline int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
const struct in6_addr *saddr, int oif, int flags);
+u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb);
struct dst_entry *icmp6_dst_alloc(struct net_device *dev, struct flowi6 *fl6);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 4f82830..dd7608c 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -519,6 +519,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
fl6.fl6_icmp_type = type;
fl6.fl6_icmp_code = code;
fl6.flowi6_uid = sock_net_uid(net, NULL);
+ fl6.mp_hash = rt6_multipath_hash(&fl6, skb);
security_skb_classify_flow(skb, flowi6_to_flowi(&fl6));
sk = icmpv6_xmit_lock(net);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9b02064..6c4dd57 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1214,6 +1214,54 @@ struct dst_entry *ip6_route_input_lookup(struct net *net,
}
EXPORT_SYMBOL_GPL(ip6_route_input_lookup);
+static void ip6_multipath_l3_keys(const struct sk_buff *skb,
+ struct flow_keys *keys)
+{
+ const struct ipv6hdr *outer_iph = ipv6_hdr(skb);
+ const struct ipv6hdr *key_iph = outer_iph;
+ const struct ipv6hdr *inner_iph;
+ const struct icmp6hdr *icmph;
+ struct ipv6hdr _inner_iph;
+
+ if (likely(outer_iph->nexthdr != IPPROTO_ICMPV6))
+ goto out;
+
+ icmph = icmp6_hdr(skb);
+ if (icmph->icmp6_type != ICMPV6_DEST_UNREACH &&
+ icmph->icmp6_type != ICMPV6_PKT_TOOBIG &&
+ icmph->icmp6_type != ICMPV6_TIME_EXCEED &&
+ icmph->icmp6_type != ICMPV6_PARAMPROB)
+ goto out;
+
+ inner_iph = skb_header_pointer(skb,
+ skb_transport_offset(skb) + sizeof(*icmph),
+ sizeof(_inner_iph), &_inner_iph);
+ if (!inner_iph)
+ goto out;
+
+ key_iph = inner_iph;
+out:
+ memset(keys, 0, sizeof(*keys));
+ keys->control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+ keys->addrs.v6addrs.src = key_iph->saddr;
+ keys->addrs.v6addrs.dst = key_iph->daddr;
+ keys->tags.flow_label = ip6_flowinfo(key_iph);
+ keys->basic.ip_proto = key_iph->nexthdr;
+}
+
+/* if skb is set it will be used and fl6 can be NULL */
+u32 rt6_multipath_hash(const struct flowi6 *fl6, const struct sk_buff *skb)
+{
+ struct flow_keys hash_keys;
+
+ if (skb) {
+ ip6_multipath_l3_keys(skb, &hash_keys);
+ return flow_hash_from_keys(&hash_keys);
+ }
+
+ return get_hash_from_flowi6(fl6);
+}
+
void ip6_route_input(struct sk_buff *skb)
{
const struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -1232,6 +1280,8 @@ void ip6_route_input(struct sk_buff *skb)
tun_info = skb_tunnel_info(skb);
if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;
+ if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
+ fl6.mp_hash = rt6_multipath_hash(&fl6, skb);
skb_dst_drop(skb);
skb_dst_set(skb, ip6_route_input_lookup(net, skb->dev, &fl6, flags));
}
--
2.9.4
* [PATCH net-next v3 3/4] ipv6: Fold rt6_info_hash_nhsfn() into its only caller
2017-08-23 7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
2017-08-23 7:58 ` [PATCH net-next v3 1/4] net: Extend struct flowi6 with multipath hash Jakub Sitnicki
2017-08-23 7:58 ` [PATCH net-next v3 2/4] ipv6: Compute multipath hash for ICMP errors from offending packet Jakub Sitnicki
@ 2017-08-23 7:58 ` Jakub Sitnicki
2017-08-23 7:58 ` [PATCH net-next v3 4/4] ipv6: Use multipath hash from flow info if available Jakub Sitnicki
2017-08-25 1:21 ` [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use David Miller
4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23 7:58 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
Tom Herbert
Commit 644d0e656958 ("ipv6 Use get_hash_from_flowi6 for rt6 hash") has
turned rt6_info_hash_nhsfn() into a one-liner, so it no longer makes
sense to keep it around. Also remove the accompanying comment that has
become outdated.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
net/ipv6/route.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6c4dd57..246e7d7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -445,16 +445,6 @@ static bool rt6_check_expired(const struct rt6_info *rt)
return false;
}
-/* Multipath route selection:
- * Hash based function using packet header and flowlabel.
- * Adapted from fib_info_hashfn()
- */
-static int rt6_info_hash_nhsfn(unsigned int candidate_count,
- const struct flowi6 *fl6)
-{
- return get_hash_from_flowi6(fl6) % candidate_count;
-}
-
static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
struct flowi6 *fl6, int oif,
int strict)
@@ -462,7 +452,7 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
struct rt6_info *sibling, *next_sibling;
int route_choosen;
- route_choosen = rt6_info_hash_nhsfn(match->rt6i_nsiblings + 1, fl6);
+ route_choosen = get_hash_from_flowi6(fl6) % (match->rt6i_nsiblings + 1);
/* Don't change the route, if route_choosen == 0
* (siblings does not include ourself)
*/
--
2.9.4
* [PATCH net-next v3 4/4] ipv6: Use multipath hash from flow info if available
2017-08-23 7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
` (2 preceding siblings ...)
2017-08-23 7:58 ` [PATCH net-next v3 3/4] ipv6: Fold rt6_info_hash_nhsfn() into its only caller Jakub Sitnicki
@ 2017-08-23 7:58 ` Jakub Sitnicki
2017-08-25 1:21 ` [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use David Miller
4 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2017-08-23 7:58 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Hannes Frederic Sowa, Nikolay Aleksandrov,
Tom Herbert
Allow our callers to influence the choice of ECMP link by honoring the
hash passed together with the flow info. This allows for special
treatment of ICMP errors which we would like to route over the same path
as the IPv6 datagram that triggered the error.
Also go through rt6_multipath_hash(), in the usual case when we aren't
dealing with an ICMP error, so that there is one central place where
multipath hash is computed.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
---
net/ipv6/route.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 246e7d7..4d02734 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -452,7 +452,13 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
struct rt6_info *sibling, *next_sibling;
int route_choosen;
- route_choosen = get_hash_from_flowi6(fl6) % (match->rt6i_nsiblings + 1);
+ /* We might have already computed the hash for ICMPv6 errors. In such
+ * case it will always be non-zero. Otherwise now is the time to do it.
+ */
+ if (!fl6->mp_hash)
+ fl6->mp_hash = rt6_multipath_hash(fl6, NULL);
+
+ route_choosen = fl6->mp_hash % (match->rt6i_nsiblings + 1);
/* Don't change the route, if route_choosen == 0
* (siblings does not include ourself)
*/
--
2.9.4
* Re: [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use
2017-08-23 7:58 [PATCH net-next v3 0/4] Route ICMPv6 errors with the flow when ECMP in use Jakub Sitnicki
` (3 preceding siblings ...)
2017-08-23 7:58 ` [PATCH net-next v3 4/4] ipv6: Use multipath hash from flow info if available Jakub Sitnicki
@ 2017-08-25 1:21 ` David Miller
4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2017-08-25 1:21 UTC (permalink / raw)
To: jkbs; +Cc: netdev, hannes, nikolay, tom
From: Jakub Sitnicki <jkbs@redhat.com>
Date: Wed, 23 Aug 2017 09:58:27 +0200
> This patch set is another take at making Path MTU Discovery work when
> server nodes are behind a router employing multipath routing in a
> load-balance or anycast setup (that is, when not every end-node can be
> reached by every path). The problem has been well described in RFC 7690
> [1], but in short - in such setups ICMPv6 PTB errors are not guaranteed
> to be routed back to the server node that sent a reply that exceeds path
> MTU.
...
Ok, looks not too bad.
Applied, thanks.