* IPv6 multipath routes
@ 2012-09-06 17:30 Vincent Bernat
2012-09-06 17:30 ` [PATCH] Fix "ip -6 route add ... nexthop" Vincent Bernat
2012-09-11 12:57 ` IPv6 multipath routes Ulrich Weber
0 siblings, 2 replies; 47+ messages in thread
From: Vincent Bernat @ 2012-09-06 17:30 UTC (permalink / raw)
To: netdev
Hi!
It appears that "ip -6 route add" expects IPv4 addresses with nexthop directives:
$ ip -6 route add to 2a01:c9c0:a1:982::/64 proto bird \
nexthop via fe80::ea39:35ff:febd:f9e dev bai2.2008 weight 1 \
nexthop via fe80::ea39:35ff:febd:fd6 dev bai1.2009 weight 1
Error: an IP address is expected rather than "fe80::ea39:35ff:febd:f9e"
The following patch fixes this parsing problem. However, the command still does not work; I now get:
RTNETLINK answers: No such device
I have little knowledge of netlink so it is likely that my patch is
buggy. However, I have found this problem by trying to debug what
appears to be a valid netlink message refused by the kernel with the
same error. Therefore, I suspect that there is also a bug in the kernel.
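To illustrate the parsing side of the problem, here is a minimal sketch of a family-aware gateway parser built on the standard `inet_pton()` call. It is not the iproute2 code: `struct gw_addr` and `parse_gateway()` are hypothetical names used only for this example, and picking the family by looking for a ':' is an assumption made to keep the sketch short.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for an address holder: family plus raw bytes. */
struct gw_addr {
	int family;              /* AF_INET or AF_INET6 */
	int bytelen;             /* 4 or 16 */
	unsigned char data[16];
};

/*
 * Parse a textual gateway address. With AF_UNSPEC the family is guessed
 * from the presence of ':' (assumption of this sketch). Returns 0 on
 * success and -1 if the text is not a valid address of that family.
 */
static int parse_gateway(const char *text, int family, struct gw_addr *out)
{
	assert(text != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (family == AF_UNSPEC)
		family = strchr(text, ':') ? AF_INET6 : AF_INET;
	if (family != AF_INET && family != AF_INET6)
		return -1;

	/* inet_pton() returns 1 only when the text parses for this family. */
	if (inet_pton(family, text, out->data) != 1)
		return -1;

	out->family = family;
	out->bytelen = (family == AF_INET6) ? 16 : 4;
	return 0;
}
```

A parser that only accepts 32-bit addresses corresponds to always calling this with `AF_INET`, which rejects the link-local gateway from the command above.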
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH] Fix "ip -6 route add ... nexthop"
2012-09-06 17:30 IPv6 multipath routes Vincent Bernat
@ 2012-09-06 17:30 ` Vincent Bernat
2012-09-06 17:53 ` Vincent Bernat
2012-09-11 12:57 ` IPv6 multipath routes Ulrich Weber
1 sibling, 1 reply; 47+ messages in thread
From: Vincent Bernat @ 2012-09-06 17:30 UTC (permalink / raw)
To: netdev; +Cc: Vincent Bernat
IPv6 multipath routes were not accepted by "ip route" because an IPv4
address was expected for each gateway. Use `get_addr()` instead of
`get_addr32()`.
Signed-off-by: Vincent Bernat <bernat@luffy.cx>
---
ip/iproute.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/ip/iproute.c b/ip/iproute.c
index 522dd28..c78d4f7 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -624,16 +624,20 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
}
-int parse_one_nh(struct rtattr *rta, struct rtnexthop *rtnh, int *argcp, char ***argvp)
+int parse_one_nh(struct rtmsg *r, struct rtattr *rta, struct rtnexthop *rtnh, int *argcp, char ***argvp)
{
int argc = *argcp;
char **argv = *argvp;
while (++argv, --argc > 0) {
if (strcmp(*argv, "via") == 0) {
+ inet_prefix addr;
NEXT_ARG();
- rta_addattr32(rta, 4096, RTA_GATEWAY, get_addr32(*argv));
- rtnh->rtnh_len += sizeof(struct rtattr) + 4;
+ get_addr(&addr, *argv, r->rtm_family);
+ if (r->rtm_family == AF_UNSPEC)
+ r->rtm_family = addr.family;
+ rta_addattr_l(rta, 4096, RTA_GATEWAY, &addr.data, addr.bytelen);
+ rtnh->rtnh_len += sizeof(struct rtattr) + addr.bytelen;
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
if ((rtnh->rtnh_ifindex = ll_name_to_index(*argv)) == 0) {
@@ -685,7 +689,7 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
memset(rtnh, 0, sizeof(*rtnh));
rtnh->rtnh_len = sizeof(*rtnh);
rta->rta_len += rtnh->rtnh_len;
- parse_one_nh(rta, rtnh, &argc, &argv);
+ parse_one_nh(r, rta, rtnh, &argc, &argv);
rtnh = RTNH_NEXT(rtnh);
}
--
1.7.10.4
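The length arithmetic in the patch above (`sizeof(struct rtattr) + addr.bytelen` instead of a fixed 4 bytes) can be sketched in isolation. The structures below are local mirrors of the layouts of `struct rtnexthop` and `struct rtattr`, renamed so the example builds without kernel headers; `append_nexthop()` is a hypothetical helper, not an iproute2 function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local mirrors of the rtnexthop / rtattr layouts (renamed on purpose). */
struct nh_hdr {
	unsigned short len;      /* like rtnh_len: header plus nested attributes */
	unsigned char flags;
	unsigned char hops;
	int ifindex;
};

struct attr_hdr {
	unsigned short len;      /* like rta_len: header plus payload */
	unsigned short type;
};

#define ATTR_GATEWAY 5       /* numeric value of RTA_GATEWAY */
#define ALIGN4(n) (((n) + 3) & ~(size_t)3)

/*
 * Append one nexthop whose gateway is gw_len bytes long (4 for IPv4,
 * 16 for IPv6) at buf + used. Returns the new used length, or 0 if the
 * entry does not fit in cap bytes.
 */
static size_t append_nexthop(unsigned char *buf, size_t cap, size_t used,
			     int ifindex, const void *gw, size_t gw_len)
{
	struct nh_hdr nh;
	struct attr_hdr at;
	size_t attr_len = sizeof(at) + gw_len;
	size_t total = sizeof(nh) + ALIGN4(attr_len);

	assert(gw_len == 4 || gw_len == 16);
	if (used + total > cap)
		return 0;

	memset(&nh, 0, sizeof(nh));
	nh.len = (unsigned short)total;
	nh.ifindex = ifindex;
	at.len = (unsigned short)attr_len;
	at.type = ATTR_GATEWAY;

	memset(buf + used, 0, total);
	memcpy(buf + used, &nh, sizeof(nh));
	memcpy(buf + used + sizeof(nh), &at, sizeof(at));
	memcpy(buf + used + sizeof(nh) + sizeof(at), gw, gw_len);
	return used + total;
}
```

With these layouts an IPv6 nexthop occupies 28 bytes (8 + 4 + 16) and an IPv4 one 16 bytes, which is why a hard-coded payload size of 4 cannot describe an IPv6 gateway.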
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH] Fix "ip -6 route add ... nexthop"
2012-09-06 17:30 ` [PATCH] Fix "ip -6 route add ... nexthop" Vincent Bernat
@ 2012-09-06 17:53 ` Vincent Bernat
2012-09-12 8:29 ` [RFC PATCH net-next 0/1] Add support of ECMPv6 Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Vincent Bernat @ 2012-09-06 17:53 UTC (permalink / raw)
To: netdev
❦ 6 September 2012 19:30 CEST, Vincent Bernat <bernat@luffy.cx>:
> IPv6 multipath routes were not accepted by "ip route" because an IPv4
> address was expected for each gateway. Use `get_addr()` instead of
> `get_addr32()`.
Well, looking at the kernel, I have just discovered that there is no
support for IPv6 multipath. This explains a lot of things for me and
this patch is therefore useless (but could still be applied for future
purposes?).
--
panic("CPU too expensive - making holiday in the ANDES!");
2.2.16 /usr/src/linux/arch/mips/kernel/traps.c
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: IPv6 multipath routes
2012-09-06 17:30 IPv6 multipath routes Vincent Bernat
2012-09-06 17:30 ` [PATCH] Fix "ip -6 route add ... nexthop" Vincent Bernat
@ 2012-09-11 12:57 ` Ulrich Weber
1 sibling, 0 replies; 47+ messages in thread
From: Ulrich Weber @ 2012-09-11 12:57 UTC (permalink / raw)
To: Vincent Bernat; +Cc: netdev
Hi Vincent,
there is no multipath support for IPv6 in the kernel.
RTA_MULTIPATH attributes are not parsed for IPv6.
Cheers
Ulrich
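As a rough illustration of what parsing a multipath attribute involves, the sketch below walks a buffer of nexthop entries and counts those carrying a 16-byte (IPv6) gateway. It is a userspace toy, not kernel code: the structures are local mirrors of the `struct rtnexthop` and `struct rtattr` layouts, and `count_ipv6_nexthops()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local mirrors of the rtnexthop / rtattr layouts (renamed on purpose). */
struct nh_hdr {
	unsigned short len;
	unsigned char flags;
	unsigned char hops;
	int ifindex;
};

struct attr_hdr {
	unsigned short len;
	unsigned short type;
};

#define ATTR_GATEWAY 5       /* numeric value of RTA_GATEWAY */
#define ALIGN4(n) (((n) + 3) & ~(size_t)3)

/*
 * Walk a multipath payload and count nexthops with a 16-byte gateway.
 * Returns -1 if an entry or attribute length is inconsistent.
 */
static int count_ipv6_nexthops(const unsigned char *buf, size_t len)
{
	int count = 0;

	assert(buf != NULL);
	while (len >= sizeof(struct nh_hdr)) {
		struct nh_hdr nh;
		size_t off = sizeof(nh);
		size_t step;

		memcpy(&nh, buf, sizeof(nh));
		if (nh.len < sizeof(nh) || nh.len > len)
			return -1;

		/* Nested attributes follow the nexthop header. */
		while (off + sizeof(struct attr_hdr) <= nh.len) {
			struct attr_hdr at;

			memcpy(&at, buf + off, sizeof(at));
			if (at.len < sizeof(at) || off + at.len > nh.len)
				return -1;
			if (at.type == ATTR_GATEWAY && at.len - sizeof(at) == 16)
				count++;
			off += ALIGN4(at.len);
		}

		step = ALIGN4(nh.len);
		if (step > len)
			step = len;
		buf += step;
		len -= step;
	}
	return len == 0 ? count : -1;
}
```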
On 09/06/2012 07:30 PM, Vincent Bernat wrote:
> Hi!
>
> It appears that "ip -6 route add" expects IPv4 addresses with nexthop directives:
>
> $ ip -6 route add to 2a01:c9c0:a1:982::/64 proto bird \
> nexthop via fe80::ea39:35ff:febd:f9e dev bai2.2008 weight 1 \
> nexthop via fe80::ea39:35ff:febd:fd6 dev bai1.2009 weight 1
> Error: an IP address is expected rather than "fe80::ea39:35ff:febd:f9e"
>
> The following patch fixes this parsing problem. However, the command still does not work; I now get:
> RTNETLINK answers: No such device
>
> I have little knowledge of netlink so it is likely that my patch is
> buggy. However, I have found this problem by trying to debug what
> appears to be a valid netlink message refused by the kernel with the
> same error. Therefore, I suspect that there is also a bug in the kernel.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [RFC PATCH net-next 0/1] Add support of ECMPv6
2012-09-06 17:53 ` Vincent Bernat
@ 2012-09-12 8:29 ` Nicolas Dichtel
2012-09-12 8:29 ` [RFC PATCH net-next 1/1] ipv6: add support of ECMP Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-12 8:29 UTC (permalink / raw)
To: bernat, netdev, yoshfuji, davem
Here is a proposal to add support for ECMPv6. Vincent's earlier patch
against iproute2 can be used, but one additional small patch is needed
as well:
diff --git a/ip/iproute.c b/ip/iproute.c
index 2fe44b3..b71f150 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -693,8 +693,10 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
rtnh = RTNH_NEXT(rtnh);
}
- if (rta->rta_len > RTA_LENGTH(0))
+ if (rta->rta_len > RTA_LENGTH(0)) {
addattr_l(n, 1024, RTA_MULTIPATH, RTA_DATA(rta), RTA_PAYLOAD(rta));
+ n->nlmsg_flags &= ~NLM_F_EXCL;
+ }
return 0;
}
If the kernel patch is approved, I will formally submit the patch for
iproute2.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 weight 1 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0 weight 1
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [RFC PATCH net-next 1/1] ipv6: add support of ECMP
2012-09-12 8:29 ` [RFC PATCH net-next 0/1] Add support of ECMPv6 Nicolas Dichtel
@ 2012-09-12 8:29 ` Nicolas Dichtel
2012-09-12 8:48 ` YOSHIFUJI Hideaki
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-12 8:29 UTC (permalink / raw)
To: bernat, netdev, yoshfuji, davem; +Cc: Nicolas Dichtel
This patch adds the support of equal cost multipath for IPv6.
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/ip6_fib.h | 13 ++++
net/ipv6/Kconfig | 32 ++++++++
net/ipv6/ip6_fib.c | 73 ++++++++++++++++++
net/ipv6/route.c | 207 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 322 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index cd64cf3..8071c66 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,10 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+#ifdef CONFIG_IPV6_MULTIPATH
+ struct nlattr *fc_mp;
+ int fc_mp_len;
+#endif
struct nl_info fc_nlinfo;
};
@@ -98,6 +102,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ unsigned int rt6i_nsiblings;
+ struct list_head rt6i_siblings;
+#endif
atomic_t rt6i_ref;
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 4f7fe72..5980aec 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -266,4 +266,36 @@ config IPV6_PIMSM_V2
Support for IPv6 PIM multicast routing protocol PIM-SMv2.
If unsure, say N.
+config IPV6_MULTIPATH
+ bool "IPv6: equal cost multipath for IPv6 routing"
+ depends on IPV6
+ default y
+ ---help---
+ Enable this option to support ECMP for IPv6.
+ If unsure, say N.
+
+choice
+ prompt "IPv6: choose Multipath algorithm"
+ depends on IPV6_MULTIPATH
+ default IPV6_MULTIPATH_ROUTE
+ ---help---
+ Define the method to select route between each possible path.
+
+ config IPV6_MULTIPATH_ROUTE
+ bool "IPv6: MULTIPATH flow algorithm"
+ ---help---
+ Multipath routes are chosen according to hash of packet header to
+ ensure a flow keeps the same route.
+
+ config IPV6_MULTIPATH_RR
+ bool "IPv6: MULTIPATH round robin algorithm"
+ ---help---
+ Multipath routes are chosen according to Round Robin.
+
+ config IPV6_MULTIPATH_RANDOM
+ bool "IPv6: MULTIPATH random algorithm"
+ ---help---
+ Multipath routes are chosen in a random fashion.
+endchoice
+
endif # IPV6
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 13690d6..3541e44 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings)
+ rt->rt6i_nsiblings = 0;
+#endif
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +684,23 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid a long list, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ rt->rt6i_nsiblings++;
+#endif
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +713,43 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Link this route to others same route. */
+ if (rt->rt6i_nsiblings) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that have the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. We can check if all the counter are equal.
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings,
+ rt6i_siblings) {
+ sibling->rt6i_nsiblings++;
+ if (unlikely(sibling->rt6i_nsiblings !=
+ rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p (%d)\n",
+ sibling, sibling->rt6i_nsiblings);
+ }
+ rt6i_nsiblings++;
+ }
+ if (unlikely(rt6i_nsiblings != rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p. I have %d routes, but count %d siblings\n",
+ rt, rt6i_nsiblings, rt->rt6i_nsiblings);
+ }
+ }
+#endif
/*
* insert node
*/
@@ -1197,6 +1255,21 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Remove this entry from other siblings */
+ if (rt->rt6i_nsiblings) {
+ struct rt6_info *sibling, *next_sibling;
+
+ /* For each siblings, decrement the counter of siblings */
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ sibling->rt6i_nsiblings--;
+ }
+ rt->rt6i_nsiblings = 0;
+ list_del_init(&rt->rt6i_siblings);
+ }
+#endif
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399613b..563d671 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,9 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#ifdef CONFIG_IPV6_MULTIPATH
+#include <net/nexthop.h>
+#endif
#include <asm/uaccess.h>
@@ -288,6 +291,10 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
+#ifdef CONFIG_IPV6_MULTIPATH
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+ rt->rt6i_nsiblings = 0;
+#endif
}
return rt;
}
@@ -388,6 +395,122 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+#ifdef CONFIG_IPV6_MULTIPATH
+/*
+ * Multipath route selection.
+ */
+
+#ifdef CONFIG_IPV6_MULTIPATH_RANDOM
+/*
+ * Pseudo random candidate function
+ */
+static int rt6_info_hash_randomfn(unsigned int candidate_count)
+{
+ return random32() % candidate_count;
+}
+#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH_RR
+/*
+ * Fake Round Robin candidate function
+ * If we want real RR, we need to add a counter in each route
+ */
+static int rt6_info_hash_falserr(unsigned int candidate_count)
+{
+ static unsigned int seed;
+ seed++;
+ return seed % candidate_count;
+}
+#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH_ROUTE
+/*
+ * Pseudo random candidate using the src port, and other information
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Work only if this not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+
+ /* Perhaps, we need to tune, this function? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+#endif
+
+/*
+ * This function return an index used to select (at random, round robin, ...)
+ * a route between any siblings.
+ *
+ * Note: fl6 can be NULL
+ */
+static unsigned int rt6_info_hashfn(const struct rt6_info *rt,
+ const struct flowi6 *fl6)
+{
+ int candidate_count = rt->rt6i_nsiblings + 1;
+
+#if defined(CONFIG_IPV6_MULTIPATH_RR)
+ return rt6_info_hash_falserr(candidate_count);
+#elif defined(CONFIG_IPV6_MULTIPATH_RANDOM)
+ return rt6_info_hash_randomfn(candidate_count);
+#elif defined(CONFIG_IPV6_MULTIPATH_ROUTE)
+ if (fl6 == NULL)
+ return 0;
+ return rt6_info_hash_nhsfn(candidate_count, fl6);
+#else
+ return 0;
+#endif
+}
+
+static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hashfn(match, fl6);
+ /* Don't change the route, if route_choosen == 0
+ * (siblings does not include ourself)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -705,6 +828,10 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -866,7 +993,10 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2247,6 +2377,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+#ifdef CONFIG_IPV6_MULTIPATH
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+#endif
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2324,11 +2457,69 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+#endif
+
err = 0;
errout:
return err;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2338,7 +2529,12 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+#endif
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2350,7 +2546,12 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+#endif
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
--
1.7.12
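The add path in `ip6_route_multipath()` installs one route per nexthop and, if one add fails, goes back and deletes. A simplified sketch of that "all or nothing" idea follows. It deliberately differs from the patch: as I read the patch, the loop restarts from the first nexthop in delete mode and walks the whole list, whereas this variant only undoes the entries it actually added. All names here (`nh_op`, `fake_table`, `add_all_or_rollback()`) are hypothetical, and the toy table stands in for the real FIB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a route add or delete. */
typedef int (*nh_op)(int nexthop_id, void *ctx);

/* Toy route table used only to demonstrate the rollback. */
struct fake_table {
	int installed[8];
	size_t count;
	int fail_on;             /* nexthop id whose add fails */
};

static int fake_add(int id, void *ctx)
{
	struct fake_table *t = ctx;

	if (id == t->fail_on || t->count == 8)
		return -1;
	t->installed[t->count++] = id;
	return 0;
}

static int fake_del(int id, void *ctx)
{
	struct fake_table *t = ctx;
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->installed[i] == id) {
			/* Fill the hole with the last entry. */
			t->installed[i] = t->installed[--t->count];
			return 0;
		}
	}
	return -1;
}

/* Add every nexthop; on the first failure remove the ones already added. */
static int add_all_or_rollback(const int *nexthops, size_t n,
			       nh_op add, nh_op del, void *ctx)
{
	size_t i;
	int err = 0;

	assert(nexthops != NULL || n == 0);
	for (i = 0; i < n; i++) {
		err = add(nexthops[i], ctx);
		if (err)
			break;
	}
	if (!err)
		return 0;

	/* i is the index that failed; undo indexes i-1 down to 0. */
	while (i-- > 0)
		(void)del(nexthops[i], ctx);
	return err;
}
```

Returning the original add error, rather than the result of the last delete, keeps the cause of the failure visible to the caller; whether that is preferable for the netlink interface is a design question for the patch.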
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next 1/1] ipv6: add support of ECMP
2012-09-12 8:29 ` [RFC PATCH net-next 1/1] ipv6: add support of ECMP Nicolas Dichtel
@ 2012-09-12 8:48 ` YOSHIFUJI Hideaki
2012-09-12 9:42 ` YOSHIFUJI Hideaki
0 siblings, 1 reply; 47+ messages in thread
From: YOSHIFUJI Hideaki @ 2012-09-12 8:48 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: bernat, netdev, davem, YOSHIFUJI Hideaki
Hello.
Nicolas Dichtel wrote:
> This patch adds the support of equal cost multipath for IPv6.
>
> The patch is based on a previous work from
> Luc Saillard <luc.saillard@6wind.com>.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
:
> +config IPV6_MULTIPATH
> + bool "IPv6: equal cost multipath for IPv6 routing"
> + depends on IPV6
> + default y
> + ---help---
> + Enable this option to support ECMP for IPv6.
> + If unsure, say N.
> +
> +choice
> + prompt "IPv6: choose Multipath algorithm"
> + depends on IPV6_MULTIPATH
> + default IPV6_MULTIPATH_ROUTE
> + ---help---
> + Define the method to select route between each possible path.
> +
> + config IPV6_MULTIPATH_ROUTE
> + bool "IPv6: MULTIPATH flow algorithm"
> + ---help---
> + Multipath routes are chosen according to hash of packet header to
> + ensure a flow keeps the same route.
> +
> + config IPV6_MULTIPATH_RR
> + bool "IPv6: MULTIPATH round robin algorithm"
> + ---help---
> + Multipath routes are chosen according to Round Robin.
> +
> + config IPV6_MULTIPATH_RANDOM
> + bool "IPv6: MULTIPATH random algorithm"
> + ---help---
> + Multipath routes are chosen in a random fashion.
> +endchoice
We should use hash-based algorithm by default,
according to RFC4311. See also RFC6438.
Regards,
--yoshfuji
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next 1/1] ipv6: add support of ECMP
2012-09-12 8:48 ` YOSHIFUJI Hideaki
@ 2012-09-12 9:42 ` YOSHIFUJI Hideaki
2012-09-12 9:53 ` Nicolas Dichtel
2012-09-14 7:59 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Nicolas Dichtel
0 siblings, 2 replies; 47+ messages in thread
From: YOSHIFUJI Hideaki @ 2012-09-12 9:42 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: YOSHIFUJI Hideaki, bernat, netdev, davem
Hello.
YOSHIFUJI Hideaki wrote:
> Hello.
>
> Nicolas Dichtel wrote:
>> This patch adds the support of equal cost multipath for IPv6.
>>
>> The patch is based on a previous work from
>> Luc Saillard <luc.saillard@6wind.com>.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> :
>> +config IPV6_MULTIPATH
>> + bool "IPv6: equal cost multipath for IPv6 routing"
>> + depends on IPV6
>> + default y
>> + ---help---
>> + Enable this option to support ECMP for IPv6.
>> + If unsure, say N.
>> +
>> +choice
>> + prompt "IPv6: choose Multipath algorithm"
>> + depends on IPV6_MULTIPATH
>> + default IPV6_MULTIPATH_ROUTE
>> + ---help---
>> + Define the method to select route between each possible path.
>> +
>> + config IPV6_MULTIPATH_ROUTE
>> + bool "IPv6: MULTIPATH flow algorithm"
>> + ---help---
>> + Multipath routes are chosen according to hash of packet header to
>> + ensure a flow keeps the same route.
>> +
>> + config IPV6_MULTIPATH_RR
>> + bool "IPv6: MULTIPATH round robin algorithm"
>> + ---help---
>> + Multipath routes are chosen according to Round Robin.
>> +
>> + config IPV6_MULTIPATH_RANDOM
>> + bool "IPv6: MULTIPATH random algorithm"
>> + ---help---
>> + Multipath routes are chosen in a random fashion.
>> +endchoice
>
> We should use hash-based algorithm by default,
> according to RFC4311. See also RFC6438.
Sorry, I missed something and misunderstood.
I prefer "HASH" or "FLOW" instead of "ROUTE"
because it selects the route by "hash" or "flow"
(as the other options mean by "round-robin" (RR) or by "random" (RANDOM)).
And, please clearly specify that it is
the default and recommended algorithm.
(We may have references to RFCs.)
Default is "y" but description says "if unsure, say N."
This is not good.
Of course, we may want to take "flow label" into account
when calculating hash (RFC6438).
Regards,
-----
[*] IPv6: equal cost multipath for IPv6 routing
Enable this option to support ECMP for IPv6.
[*] IPv6: MULTIPATH hash-based algorithm
Multipath routes are chosen according to hash of packet
header information (source, destination, ...)
to ensure a flow keeps the same route.
This is the default and recommended.
[ ] IPv6: MULTIPATH round-robin algorithm
[ ] IPv6: MULTIPATH random algorithm
--yoshfuji
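Taking the flow label into account could look roughly like the sketch below. The flow label is the low 20 bits of the first 32-bit word of the IPv6 header; the sketch assumes that word has already been converted to host byte order, and `mix_flow_label()` is a hypothetical helper rather than anything from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_LABEL_MASK 0x000FFFFFu   /* low 20 bits, host byte order */

/*
 * Fold the 20-bit flow label into an existing hash value. The version
 * and traffic class bits of the header word are masked off first.
 */
static uint32_t mix_flow_label(uint32_t hash, uint32_t flowinfo_host)
{
	uint32_t label = flowinfo_host & FLOW_LABEL_MASK;

	return hash ^ label;
}

/* Pick a path index from the mixed hash. */
static unsigned int select_path(uint32_t hash, uint32_t flowinfo_host,
				unsigned int candidates)
{
	assert(candidates > 0);
	return mix_flow_label(hash, flowinfo_host) % candidates;
}
```

This helps when the ports are not visible to the router, for example with tunnelled traffic, since two flows with identical addresses can still differ in their labels.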
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next 1/1] ipv6: add support of ECMP
2012-09-12 9:42 ` YOSHIFUJI Hideaki
@ 2012-09-12 9:53 ` Nicolas Dichtel
2012-09-14 7:59 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Nicolas Dichtel
1 sibling, 0 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-12 9:53 UTC (permalink / raw)
To: YOSHIFUJI Hideaki; +Cc: bernat, netdev, davem
On 12/09/2012 11:42, YOSHIFUJI Hideaki wrote:
> Hello.
>
> YOSHIFUJI Hideaki wrote:
>> Hello.
>>
>> Nicolas Dichtel wrote:
>>> This patch adds the support of equal cost multipath for IPv6.
>>>
>>> The patch is based on a previous work from
>>> Luc Saillard <luc.saillard@6wind.com>.
>>>
>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> :
>>> +config IPV6_MULTIPATH
>>> + bool "IPv6: equal cost multipath for IPv6 routing"
>>> + depends on IPV6
>>> + default y
>>> + ---help---
>>> + Enable this option to support ECMP for IPv6.
>>> + If unsure, say N.
>>> +
>>> +choice
>>> + prompt "IPv6: choose Multipath algorithm"
>>> + depends on IPV6_MULTIPATH
>>> + default IPV6_MULTIPATH_ROUTE
>>> + ---help---
>>> + Define the method to select route between each possible path.
>>> +
>>> + config IPV6_MULTIPATH_ROUTE
>>> + bool "IPv6: MULTIPATH flow algorithm"
>>> + ---help---
>>> + Multipath routes are chosen according to hash of packet header to
>>> + ensure a flow keeps the same route.
>>> +
>>> + config IPV6_MULTIPATH_RR
>>> + bool "IPv6: MULTIPATH round robin algorithm"
>>> + ---help---
>>> + Multipath routes are chosen according to Round Robin.
>>> +
>>> + config IPV6_MULTIPATH_RANDOM
>>> + bool "IPv6: MULTIPATH random algorithm"
>>> + ---help---
>>> + Multipath routes are chosen in a random fashion.
>>> +endchoice
>>
>> We should use hash-based algorithm by default,
>> according to RFC4311. See also RFC6438.
>
> Sorry, I missed something and misunderstood.
>
>
> I prefer "HASH" or "FLOW" instead of "ROUTE"
> because it selects the route by "hash" or "flow"
> (as the other options mean by "round-robin" (RR) or by "random" (RANDOM)).
Ok.
>
> And, please clearly specify that it is
> the default and recommended algorithm.
> (We may have references to RFCs.)
Ok.
>
> Default is "y" but description says "if unsure, say N."
> This is not good.
Yes, good catch.
>
>
> Of course, we may want to take "flow label" into account
> when calculating hash (RFC6438).
OK, I will add it. I will wait for other comments.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [RFC PATCH net-next v2 0/1] Add support of ECMPv6
2012-09-12 9:42 ` YOSHIFUJI Hideaki
2012-09-12 9:53 ` Nicolas Dichtel
@ 2012-09-14 7:59 ` Nicolas Dichtel
2012-09-14 7:59 ` [RFC PATCH net-next v2 1/1] ipv6: add support of ECMP Nicolas Dichtel
` (2 more replies)
1 sibling, 3 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-14 7:59 UTC (permalink / raw)
To: yoshfuji; +Cc: bernat, netdev, davem
Here is a proposal to add support for ECMPv6. Vincent's earlier patch
against iproute2 can be used, but one additional small patch is needed
as well; see http://patchwork.ozlabs.org/patch/183277/
If the kernel patch is approved, I will formally submit the patch for
iproute2.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 weight 1 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0 weight 1
v2: rename CONFIG_IPV6_MULTIPATH_ROUTE to CONFIG_IPV6_MULTIPATH_HASH
use flowlabel in the hash function
add reference to RFC
fix a small indentation issue
remove "If unsure, say N." from the help of CONFIG_IPV6_MULTIPATH
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [RFC PATCH net-next v2 1/1] ipv6: add support of ECMP
2012-09-14 7:59 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Nicolas Dichtel
@ 2012-09-14 7:59 ` Nicolas Dichtel
2012-09-14 9:40 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Vincent Bernat
2012-09-19 9:18 ` [PATCH net-next v3 " Nicolas Dichtel
2 siblings, 0 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-14 7:59 UTC (permalink / raw)
To: yoshfuji; +Cc: bernat, netdev, davem, Nicolas Dichtel
This patch adds the support of equal cost multipath for IPv6.
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/ip6_fib.h | 13 ++++
net/ipv6/Kconfig | 33 ++++++++
net/ipv6/ip6_fib.c | 73 ++++++++++++++++++
net/ipv6/route.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 325 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index cd64cf3..37e502a 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,10 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+#ifdef CONFIG_IPV6_MULTIPATH
+ struct nlattr *fc_mp;
+ int fc_mp_len;
+#endif
struct nl_info fc_nlinfo;
};
@@ -98,6 +102,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ unsigned int rt6i_nsiblings;
+ struct list_head rt6i_siblings;
+#endif
atomic_t rt6i_ref;
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 4f7fe72..e0c92dc 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -266,4 +266,37 @@ config IPV6_PIMSM_V2
Support for IPv6 PIM multicast routing protocol PIM-SMv2.
If unsure, say N.
+config IPV6_MULTIPATH
+ bool "IPv6: equal cost multipath for IPv6 routing"
+ depends on IPV6
+ default y
+ ---help---
+ Enable this option to support ECMP for IPv6.
+
+choice
+ prompt "IPv6: choose Multipath algorithm"
+ depends on IPV6_MULTIPATH
+ default IPV6_MULTIPATH_HASH
+ ---help---
+ Define the method to select route between each possible path.
+ The recommended algorithm (by RFC4311) is the HASH method.
+
+ config IPV6_MULTIPATH_HASH
+ bool "IPv6: MULTIPATH hash/flow algorithm"
+ ---help---
+ Multipath routes are chosen according to hash of packet header to
+ ensure a flow keeps the same route.
+ This algorithm is recommended by RFC4311.
+
+ config IPV6_MULTIPATH_RR
+ bool "IPv6: MULTIPATH round robin algorithm"
+ ---help---
+ Multipath routes are chosen according to Round Robin.
+
+ config IPV6_MULTIPATH_RANDOM
+ bool "IPv6: MULTIPATH random algorithm"
+ ---help---
+ Multipath routes are chosen in a random fashion.
+endchoice
+
endif # IPV6
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 13690d6..3541e44 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings)
+ rt->rt6i_nsiblings = 0;
+#endif
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +684,23 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid a long list, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ rt->rt6i_nsiblings++;
+#endif
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +713,43 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Link this route to its sibling routes. */
+ if (rt->rt6i_nsiblings) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that has the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. This also lets us check that all the counters are equal.
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings,
+ rt6i_siblings) {
+ sibling->rt6i_nsiblings++;
+ if (unlikely(sibling->rt6i_nsiblings !=
+ rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p (%d)\n",
+ sibling, sibling->rt6i_nsiblings);
+ }
+ rt6i_nsiblings++;
+ }
+ if (unlikely(rt6i_nsiblings != rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p. I have %d routes, but count %d siblings\n",
+ rt, rt6i_nsiblings, rt->rt6i_nsiblings);
+ }
+ }
+#endif
/*
* insert node
*/
@@ -1197,6 +1255,21 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Remove this entry from other siblings */
+ if (rt->rt6i_nsiblings) {
+ struct rt6_info *sibling, *next_sibling;
+
+ /* For each sibling, decrement the counter of siblings */
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ sibling->rt6i_nsiblings--;
+ }
+ rt->rt6i_nsiblings = 0;
+ list_del_init(&rt->rt6i_siblings);
+ }
+#endif
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399613b..431f7ad 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,9 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#ifdef CONFIG_IPV6_MULTIPATH
+#include <net/nexthop.h>
+#endif
#include <asm/uaccess.h>
@@ -288,6 +291,10 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
+#ifdef CONFIG_IPV6_MULTIPATH
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+ rt->rt6i_nsiblings = 0;
+#endif
}
return rt;
}
@@ -388,6 +395,124 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+#ifdef CONFIG_IPV6_MULTIPATH
+/*
+ * Multipath route selection.
+ */
+
+#ifdef CONFIG_IPV6_MULTIPATH_RANDOM
+/*
+ * Pseudo random candidate function
+ */
+static int rt6_info_hash_randomfn(unsigned int candidate_count)
+{
+ return random32() % candidate_count;
+}
+#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH_RR
+/*
+ * Fake Round Robin candidate function
+ * If we want real RR, we need to add a counter in each route
+ */
+static int rt6_info_hash_falserr(unsigned int candidate_count)
+{
+ static unsigned int seed;
+ seed++;
+ return seed % candidate_count;
+}
+#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH_HASH
+/*
+ * Pseudo random candidate using the src port, and other information
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Works only if the packet is not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+ /* RFC6438 recommends using the flow label */
+ val ^= fl6->flowlabel;
+
+ /* Perhaps we need to tune this function? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+#endif
+
+/*
+ * This function returns an index used to select (at random, round robin, ...)
+ * a route among the siblings.
+ *
+ * Note: fl6 can be NULL
+ */
+static unsigned int rt6_info_hashfn(const struct rt6_info *rt,
+ const struct flowi6 *fl6)
+{
+ int candidate_count = rt->rt6i_nsiblings + 1;
+
+#if defined(CONFIG_IPV6_MULTIPATH_RR)
+ return rt6_info_hash_falserr(candidate_count);
+#elif defined(CONFIG_IPV6_MULTIPATH_RANDOM)
+ return rt6_info_hash_randomfn(candidate_count);
+#elif defined(CONFIG_IPV6_MULTIPATH_HASH)
+ if (fl6 == NULL)
+ return 0;
+ return rt6_info_hash_nhsfn(candidate_count, fl6);
+#else
+ return 0;
+#endif
+}
+
+static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hashfn(match, fl6);
+ /* Don't change the route if route_choosen == 0
+ * (the sibling list does not include ourselves)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -705,6 +830,10 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -866,7 +995,10 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2247,6 +2379,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+#ifdef CONFIG_IPV6_MULTIPATH
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+#endif
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2324,11 +2459,69 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+#endif
+
err = 0;
errout:
return err;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2338,7 +2531,12 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+#endif
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2350,7 +2548,12 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+#endif
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
--
1.7.12
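
For readers who want to try the hash/flow selection outside the kernel, here is a minimal userspace sketch of the XOR-fold used by rt6_info_hash_nhsfn() above. The struct flow_key type and the flow_hash_index() name are hypothetical stand-ins for struct flowi6 and the patch's function. Unlike the patch, the sketch always mixes in the ports, so a caller would set them to 0 for protocols without ports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct flowi6 fields that the patch hashes. */
struct flow_key {
    uint32_t saddr[4];   /* source address as four 32-bit words */
    uint32_t daddr[4];   /* destination address as four 32-bit words */
    uint32_t flowlabel;  /* IPv6 flow label */
    uint16_t sport;      /* transport source port (0 if not applicable) */
    uint16_t dport;      /* transport destination port (0 if not applicable) */
    uint8_t  proto;      /* next-header protocol number */
};

/* XOR-fold the key and reduce it to an index in [0, candidate_count). */
static unsigned int flow_hash_index(const struct flow_key *k,
                                    unsigned int candidate_count)
{
    unsigned int val = k->proto;
    int i;

    assert(candidate_count > 0);
    for (i = 0; i < 4; i++)
        val ^= k->saddr[i] ^ k->daddr[i];
    val ^= k->sport;
    val ^= k->dport;
    val ^= k->flowlabel;
    /* Same final mixing step as in the patch. */
    val = val ^ (val >> 7) ^ (val >> 12);
    return val % candidate_count;
}
```

Because the index depends only on the key, two packets of the same flow map to the same candidate as long as the number of candidates does not change.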
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next v2 0/1] Add support of ECMPv6
2012-09-14 7:59 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Nicolas Dichtel
2012-09-14 7:59 ` [RFC PATCH net-next v2 1/1] ipv6: add support of ECMP Nicolas Dichtel
@ 2012-09-14 9:40 ` Vincent Bernat
2012-09-14 13:35 ` Nicolas Dichtel
2012-09-19 9:18 ` [PATCH net-next v3 " Nicolas Dichtel
2 siblings, 1 reply; 47+ messages in thread
From: Vincent Bernat @ 2012-09-14 9:40 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: yoshfuji, netdev, davem
❦ 14 September 2012 09:59 CEST, Nicolas Dichtel <nicolas.dichtel@6wind.com>:
> Here is an example of a command to add an ECMP route:
> $ ip -6 route add 3ffe:304:124:2306::/64 \
> nexthop via fe80::230:1bff:feb4:e05c dev eth0 weight 1 \
> nexthop via fe80::230:1bff:feb4:dd4f dev eth0 weight 1
ECMP routes are displayed differently than for IPv4: we get two
distinct routes instead of a single ECMP route (with the nexthop
keyword).
With IPv4:
193.252.X.X/26 proto zebra metric 20
nexthop via 193.252.X.X dev bae1 weight 1
nexthop via 193.252.X.X dev bae2 weight 1
With IPv6:
2a01:c9c0:X:X::/64 via fe80::215:17ff:fe85:76b9 dev bae1 metric 11
2a01:c9c0:X:X::/64 via fe80::222:91ff:fe4e:b000 dev bae2 metric 11
If I capture the netlink message from the add command, put it in a file
and use "ip monitor file ...", I see this:
2a01:c9c0:X:X::/64
nexthop via fe80::215:17ff:fe85:76b9 dev if12 weight 1
nexthop via fe80::222:91ff:fe4e:b000 dev if11 weight 1
Therefore, the problem is not in iproute2, which knows how to display
those ECMP routes. I fear that this difference makes support in routing
daemons more difficult.
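
To illustrate the extra work on the daemon side, here is a rough sketch of how a consumer of the dump might regroup the per-next-hop entries. It assumes that entries with the same prefix and the same metric belong to one ECMP route, which is my reading of the sibling rule in the patch. The struct route_entry type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of one line of an "ip -6 route" dump. */
struct route_entry {
    const char *prefix;   /* destination prefix as text */
    const char *gateway;  /* next-hop address as text */
    const char *dev;      /* output device name */
    unsigned int metric;
};

/* Assumption: same prefix and same metric means same ECMP route. */
static int same_ecmp_group(const struct route_entry *a,
                           const struct route_entry *b)
{
    return a->metric == b->metric && strcmp(a->prefix, b->prefix) == 0;
}

/* Count the entries (including routes[idx] itself) in the group of routes[idx]. */
static size_t ecmp_group_size(const struct route_entry *routes, size_t n,
                              size_t idx)
{
    size_t i, count = 0;

    assert(idx < n);
    for (i = 0; i < n; i++)
        if (same_ecmp_group(&routes[i], &routes[idx]))
            count++;
    return count;
}
```

With IPv4 the kernel hands back the group already assembled, so no such pass is needed there.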
--
Make the coupling between modules visible.
- The Elements of Programming Style (Kernighan & Plauger)
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next v2 0/1] Add support of ECMPv6
2012-09-14 9:40 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Vincent Bernat
@ 2012-09-14 13:35 ` Nicolas Dichtel
2012-09-14 13:37 ` Nicolas Dichtel
2012-10-15 12:36 ` Vincent Bernat
0 siblings, 2 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-14 13:35 UTC (permalink / raw)
To: Vincent Bernat; +Cc: yoshfuji, netdev, davem
On 14/09/2012 11:40, Vincent Bernat wrote:
> ❦ 14 September 2012 09:59 CEST, Nicolas Dichtel <nicolas.dichtel@6wind.com>:
>
>> Here is an example of a command to add an ECMP route:
>> $ ip -6 route add 3ffe:304:124:2306::/64 \
>> nexthop via fe80::230:1bff:feb4:e05c dev eth0 weight 1 \
>> nexthop via fe80::230:1bff:feb4:dd4f dev eth0 weight 1
In fact, I use this command as a shortcut. You can also use:
$ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0
These commands generate two standard netlink messages, without 'nexthop'
lines.
>
> When displaying ECMP routes, the display is different than for IPv4: we
> get two distinct routes instead of an ECMP route (with nexthop
> keyword).
Sure, this implementation stores each 'nexthop' as a standard route in the
kernel table.
>
> With IPv4:
>
> 193.252.X.X/26 proto zebra metric 20
> nexthop via 193.252.X.X dev bae1 weight 1
> nexthop via 193.252.X.X dev bae2 weight 1
>
> With IPv6:
>
> 2a01:c9c0:X:X::/64 via fe80::215:17ff:fe85:76b9 dev bae1 metric 11
> 2a01:c9c0:X:X::/64 via fe80::222:91ff:fe4e:b000 dev bae2 metric 11
>
> If I capture the netlink message from the add command, put it in a file
> and use "ip monitor file ...", I see this:
>
> 2a01:c9c0:X:X::/64
> nexthop via fe80::215:17ff:fe85:76b9 dev if12 weight 1
> nexthop via fe80::222:91ff:fe4e:b000 dev if11 weight 1
>
> Therefore, the problem is not in iproute2 which knows how to display
> those ECMP routes. I fear that this difference make support in routing
> daemons more difficult.
Hmm, can you elaborate? Our routing daemon, quagga, manages it without any problem.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next v2 0/1] Add support of ECMPv6
2012-09-14 13:35 ` Nicolas Dichtel
@ 2012-09-14 13:37 ` Nicolas Dichtel
2012-10-15 12:36 ` Vincent Bernat
1 sibling, 0 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-14 13:37 UTC (permalink / raw)
To: Vincent Bernat; +Cc: yoshfuji, netdev, davem
On 14/09/2012 15:35, Nicolas Dichtel wrote:
> On 14/09/2012 11:40, Vincent Bernat wrote:
>> ❦ 14 September 2012 09:59 CEST, Nicolas Dichtel <nicolas.dichtel@6wind.com>:
>>
>>> Here is an example of a command to add an ECMP route:
>>> $ ip -6 route add 3ffe:304:124:2306::/64 \
>>> nexthop via fe80::230:1bff:feb4:e05c dev eth0 weight 1 \
>>> nexthop via fe80::230:1bff:feb4:dd4f dev eth0 weight 1
> In fact, I use this command as a shortcut. You can also use:
> $ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0
> $ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0
Note also that with these commands, there is no need to patch iproute2.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v3 0/1] Add support of ECMPv6
2012-09-14 7:59 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Nicolas Dichtel
2012-09-14 7:59 ` [RFC PATCH net-next v2 1/1] ipv6: add support of ECMP Nicolas Dichtel
2012-09-14 9:40 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Vincent Bernat
@ 2012-09-19 9:18 ` Nicolas Dichtel
2012-09-19 9:18 ` [PATCH net-next v3 1/1] ipv6: add support of ECMP Nicolas Dichtel
2 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-19 9:18 UTC (permalink / raw)
To: netdev, davem; +Cc: bernat, yoshfuji
Here is a proposal to add support for ECMPv6. The previous iproute2 patch
from Vincent can be used, but another small patch is also needed;
see http://patchwork.ozlabs.org/patch/183277/
If the kernel patch is approved, I can formally submit the patch for
iproute2.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0
Note that this command is a shortcut: the iproute2 patches are not
required to set ECMP routes. The following commands can be used too:
$ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0
Here is an example of a dump:
$ ip -6 route | grep 3ffe:304:124:2306::/64
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0 metric 1024
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0 metric 1024
v2: rename CONFIG_IPV6_MULTIPATH_ROUTE to CONFIG_IPV6_MULTIPATH_HASH
use flowlabel in the hash function
add reference to RFC
fix a small indentation issue
remove "If unsure, say N." from the help of CONFIG_IPV6_MULTIPATH
v3: rebase after updating net-next
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v3 1/1] ipv6: add support of ECMP
2012-09-19 9:18 ` [PATCH net-next v3 " Nicolas Dichtel
@ 2012-09-19 9:18 ` Nicolas Dichtel
2012-09-20 21:15 ` David Miller
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-19 9:18 UTC (permalink / raw)
To: netdev, davem; +Cc: bernat, yoshfuji, Nicolas Dichtel
This patch adds support for equal-cost multipath (ECMP) for IPv6.
The patch is based on previous work by
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/ip6_fib.h | 13 ++++
net/ipv6/Kconfig | 33 ++++++++
net/ipv6/ip6_fib.c | 73 ++++++++++++++++++
net/ipv6/route.c | 209 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 325 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index cd64cf3..37e502a 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,10 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+#ifdef CONFIG_IPV6_MULTIPATH
+ struct nlattr *fc_mp;
+ int fc_mp_len;
+#endif
struct nl_info fc_nlinfo;
};
@@ -98,6 +102,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight and
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ unsigned int rt6i_nsiblings;
+ struct list_head rt6i_siblings;
+#endif
atomic_t rt6i_ref;
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 4f7fe72..e0c92dc 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -266,4 +266,37 @@ config IPV6_PIMSM_V2
Support for IPv6 PIM multicast routing protocol PIM-SMv2.
If unsure, say N.
+config IPV6_MULTIPATH
+ bool "IPv6: equal cost multipath for IPv6 routing"
+ depends on IPV6
+ default y
+ ---help---
+ Enable this option to support ECMP for IPv6.
+
+choice
+ prompt "IPv6: choose Multipath algorithm"
+ depends on IPV6_MULTIPATH
+ default IPV6_MULTIPATH_HASH
+ ---help---
+ Define the method used to select a route among the possible paths.
+ The algorithm recommended by RFC4311 is the HASH method.
+
+ config IPV6_MULTIPATH_HASH
+ bool "IPv6: MULTIPATH hash/flow algorithm"
+ ---help---
+ Multipath routes are chosen according to a hash of the packet header
+ to ensure that a flow keeps the same route.
+ This algorithm is recommended by RFC4311.
+
+ config IPV6_MULTIPATH_RR
+ bool "IPv6: MULTIPATH round robin algorithm"
+ ---help---
+ Multipath routes are chosen according to Round Robin.
+
+ config IPV6_MULTIPATH_RANDOM
+ bool "IPv6: MULTIPATH random algorithm"
+ ---help---
+ Multipath routes are chosen in a random fashion.
+endchoice
+
endif # IPV6
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 13690d6..3541e44 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings)
+ rt->rt6i_nsiblings = 0;
+#endif
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +684,23 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid long lists, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ rt->rt6i_nsiblings++;
+#endif
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +713,43 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Link this route to its sibling routes. */
+ if (rt->rt6i_nsiblings) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that has the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. This also lets us check that all the counters are equal.
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings,
+ rt6i_siblings) {
+ sibling->rt6i_nsiblings++;
+ if (unlikely(sibling->rt6i_nsiblings !=
+ rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p (%d)\n",
+ sibling, sibling->rt6i_nsiblings);
+ }
+ rt6i_nsiblings++;
+ }
+ if (unlikely(rt6i_nsiblings != rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p. I have %d routes, but count %d siblings\n",
+ rt, rt6i_nsiblings, rt->rt6i_nsiblings);
+ }
+ }
+#endif
/*
* insert node
*/
@@ -1197,6 +1255,21 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Remove this entry from other siblings */
+ if (rt->rt6i_nsiblings) {
+ struct rt6_info *sibling, *next_sibling;
+
+ /* For each sibling, decrement the counter of siblings */
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ sibling->rt6i_nsiblings--;
+ }
+ rt->rt6i_nsiblings = 0;
+ list_del_init(&rt->rt6i_siblings);
+ }
+#endif
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 83dafa5..ac8b3a2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,9 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#ifdef CONFIG_IPV6_MULTIPATH
+#include <net/nexthop.h>
+#endif
#include <asm/uaccess.h>
@@ -288,6 +291,10 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
+#ifdef CONFIG_IPV6_MULTIPATH
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+ rt->rt6i_nsiblings = 0;
+#endif
}
return rt;
}
@@ -388,6 +395,124 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+#ifdef CONFIG_IPV6_MULTIPATH
+/*
+ * Multipath route selection.
+ */
+
+#ifdef CONFIG_IPV6_MULTIPATH_RANDOM
+/*
+ * Pseudo random candidate function
+ */
+static int rt6_info_hash_randomfn(unsigned int candidate_count)
+{
+ return random32() % candidate_count;
+}
+#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH_RR
+/*
+ * Fake Round Robin candidate function
+ * If we want real RR, we need to add a counter in each route
+ */
+static int rt6_info_hash_falserr(unsigned int candidate_count)
+{
+ static unsigned int seed;
+ seed++;
+ return seed % candidate_count;
+}
+#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH_HASH
+/*
+ * Pseudo random candidate using the src port, and other information
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Works only if the packet is not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+ /* RFC6438 recommends using the flow label */
+ val ^= fl6->flowlabel;
+
+ /* Perhaps we need to tune this function? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+#endif
+
+/*
+ * This function returns an index used to select (at random, round robin, ...)
+ * a route among the siblings.
+ *
+ * Note: fl6 can be NULL
+ */
+static unsigned int rt6_info_hashfn(const struct rt6_info *rt,
+ const struct flowi6 *fl6)
+{
+ int candidate_count = rt->rt6i_nsiblings + 1;
+
+#if defined(CONFIG_IPV6_MULTIPATH_RR)
+ return rt6_info_hash_falserr(candidate_count);
+#elif defined(CONFIG_IPV6_MULTIPATH_RANDOM)
+ return rt6_info_hash_randomfn(candidate_count);
+#elif defined(CONFIG_IPV6_MULTIPATH_HASH)
+ if (fl6 == NULL)
+ return 0;
+ return rt6_info_hash_nhsfn(candidate_count, fl6);
+#else
+ return 0;
+#endif
+}
+
+static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hashfn(match, fl6);
+ /* Don't change the route if route_choosen == 0
+ * (the sibling list does not include ourselves)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -705,6 +830,10 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -866,7 +995,10 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2247,6 +2379,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+#ifdef CONFIG_IPV6_MULTIPATH
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+#endif
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2324,11 +2459,69 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+#endif
+
err = 0;
errout:
return err;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2338,7 +2531,12 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+#endif
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2350,7 +2548,12 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+#endif
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
--
1.7.12
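
The add-then-roll-back loop in ip6_route_multipath() above can be easier to follow in isolation. The following is only a userspace sketch of that control flow: struct nh_table, nh_add(), nh_del() and multipath_apply() are hypothetical names, and the table is a plain array with an injectable failure rather than a real FIB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory table: installed[i] is nonzero when next hop i is present. */
struct nh_table {
    int installed[8];
    int fail_at;   /* index whose add is made to fail, or -1 for none */
};

static int nh_add(struct nh_table *t, size_t i)
{
    if ((int)i == t->fail_at)
        return -1;           /* injected failure */
    t->installed[i] = 1;
    return 0;
}

static int nh_del(struct nh_table *t, size_t i)
{
    if (!t->installed[i])
        return -2;           /* next hop already gone */
    t->installed[i] = 0;
    return 0;
}

/* Apply add (or delete) to n next hops; if an add fails, restart in delete mode. */
static int multipath_apply(struct nh_table *t, size_t n, int add)
{
    int err, last_err = 0;
    size_t i;

    assert(n <= 8);
restart:
    for (i = 0; i < n; i++) {
        err = add ? nh_add(t, i) : nh_del(t, i);
        if (err) {
            last_err = err;
            if (add) {
                /* Roll back: try to delete every next hop from the start. */
                add = 0;
                goto restart;
            }
            /* In delete mode, keep going so the remaining next hops are removed. */
        }
    }
    return last_err;
}
```

One thing the sketch makes visible: after a rollback, the value returned is the last error seen, which may come from the delete pass rather than from the add that failed. The loop in the patch appears to behave the same way.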
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v3 1/1] ipv6: add support of ECMP
2012-09-19 9:18 ` [PATCH net-next v3 1/1] ipv6: add support of ECMP Nicolas Dichtel
@ 2012-09-20 21:15 ` David Miller
2012-09-21 9:59 ` [PATCH net-next v4 0/1] Add support of ECMPv6 Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2012-09-20 21:15 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: netdev, bernat, yoshfuji
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 19 Sep 2012 11:18:24 +0200
> This patch adds the support of equal cost multipath for IPv6.
>
> The patch is based on a previous work from
> Luc Saillard <luc.saillard@6wind.com>.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Please make the algorithm selection run-time rather than compile
time.
For 99.999999999999999999% of users, making compile time changes
to get the semantics they want is not an option.
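
As a rough illustration of what run-time selection could look like, the sketch below replaces the #if/#elif chain with a switch on an integer selector. Everything here is hypothetical: the enum mp_algo values, mp_pick() and its parameters are not taken from the patch, and where the selector is stored (a sysctl, for instance) is left open.

```c
#include <assert.h>

/* Hypothetical selector values that a run-time knob would hold. */
enum mp_algo { MP_ALGO_HASH = 0, MP_ALGO_RR = 1, MP_ALGO_RANDOM = 2 };

/* Round robin over the candidates using a caller-owned counter. */
static unsigned int pick_rr(unsigned int *counter, unsigned int count)
{
    return (*counter)++ % count;
}

/* Choose an index in [0, count) according to the algorithm selected at run time. */
static unsigned int mp_pick(enum mp_algo algo, unsigned int flow_hash,
                            unsigned int random_value,
                            unsigned int *rr_counter, unsigned int count)
{
    assert(count > 0);
    switch (algo) {
    case MP_ALGO_RR:
        return pick_rr(rr_counter, count);
    case MP_ALGO_RANDOM:
        return random_value % count;
    case MP_ALGO_HASH:
    default:
        /* Unknown values fall back to the flow hash. */
        return flow_hash % count;
    }
}
```

All three algorithms are compiled in, and the cost of choosing between them is one branch per lookup.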
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v4 0/1] Add support of ECMPv6
2012-09-20 21:15 ` David Miller
@ 2012-09-21 9:59 ` Nicolas Dichtel
2012-09-21 9:59 ` [PATCH net-next v4 1/1] ipv6: add support of ECMP Nicolas Dichtel
2012-09-21 17:48 ` [PATCH net-next v4 0/1] Add support of ECMPv6 David Miller
0 siblings, 2 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-21 9:59 UTC (permalink / raw)
To: davem; +Cc: bernat, netdev, yoshfuji
Here is a proposal to add support for ECMPv6. The previous iproute2 patch
from Vincent can be used, but another small patch is also needed;
see http://patchwork.ozlabs.org/patch/183277/
If the kernel patch is approved, I can formally submit the patch for
iproute2.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0
Note that this command is a shortcut: the iproute2 patches are not
required to set ECMP routes. The following commands can be used too:
$ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0
Here is an example of a dump:
$ ip -6 route | grep 3ffe:304:124:2306::/64
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0 metric 1024
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0 metric 1024
v2: rename CONFIG_IPV6_MULTIPATH_ROUTE to CONFIG_IPV6_MULTIPATH_HASH
use flowlabel in the hash function
add reference to RFC
fix a small indentation issue
remove "If unsure, say N." from the help of CONFIG_IPV6_MULTIPATH
v3: rebase after updating net-next
v4: remove compilation options to choose multipath algorithm for next hop
selection. Now the choice can be done at run time via
/proc/sys/net/ipv6/route/multipath_algorithm
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v4 1/1] ipv6: add support of ECMP
2012-09-21 9:59 ` [PATCH net-next v4 0/1] Add support of ECMPv6 Nicolas Dichtel
@ 2012-09-21 9:59 ` Nicolas Dichtel
2012-09-21 17:48 ` [PATCH net-next v4 0/1] Add support of ECMPv6 David Miller
1 sibling, 0 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-21 9:59 UTC (permalink / raw)
To: davem; +Cc: bernat, netdev, yoshfuji, Nicolas Dichtel
This patch adds the support of equal cost multipath for IPv6.
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
Documentation/networking/ip-sysctl.txt | 8 ++
include/net/ip6_fib.h | 13 ++
include/net/netns/ipv6.h | 3 +
net/ipv6/Kconfig | 10 ++
net/ipv6/ip6_fib.c | 73 +++++++++++
net/ipv6/route.c | 222 ++++++++++++++++++++++++++++++++-
6 files changed, 325 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7fc107..018bf8b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1330,6 +1330,14 @@ ratelimit - INTEGER
otherwise the minimal space between responses in milliseconds.
Default: 1000
+route/*:
+multipath_algorithm - INTEGER
+ Define the method used to select a route among the possible paths.
+ 0 for hash/flow method (recommended by RFC4311)
+ 1 for round robin method
+ 2 for random method
+ Default: 0
+
IPv6 Update by:
Pekka Savola <pekkas@netcore.fi>
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index cd64cf3..37e502a 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,10 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+#ifdef CONFIG_IPV6_MULTIPATH
+ struct nlattr *fc_mp;
+ int fc_mp_len;
+#endif
struct nl_info fc_nlinfo;
};
@@ -98,6 +102,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ unsigned int rt6i_nsiblings;
+ struct list_head rt6i_siblings;
+#endif
atomic_t rt6i_ref;
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 214cb0a..820d4a6 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -26,6 +26,9 @@ struct netns_sysctl_ipv6 {
int ip6_rt_gc_elasticity;
int ip6_rt_mtu_expires;
int ip6_rt_min_advmss;
+#ifdef CONFIG_IPV6_MULTIPATH
+ int ip6_rt_multipath_algo;
+#endif
int icmpv6_time;
};
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 4f7fe72..c43fdf7 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -266,4 +266,14 @@ config IPV6_PIMSM_V2
Support for IPv6 PIM multicast routing protocol PIM-SMv2.
If unsure, say N.
+config IPV6_MULTIPATH
+ bool "IPv6: equal cost multipath for IPv6 routing"
+ depends on IPV6
+ default y
+ ---help---
+ Enable this option to support ECMP for IPv6.
+
+ Three algorithms for route selection are available: hash of packet
+ header (recommended by RFC4311), round robin and random.
+
endif # IPV6
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 13690d6..3541e44 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings)
+ rt->rt6i_nsiblings = 0;
+#endif
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +684,23 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid long lists, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ rt->rt6i_nsiblings++;
+#endif
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +713,43 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Link this route to others same route. */
+ if (rt->rt6i_nsiblings) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that have the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. We can check if all the counter are equal.
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings,
+ rt6i_siblings) {
+ sibling->rt6i_nsiblings++;
+ if (unlikely(sibling->rt6i_nsiblings !=
+ rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p (%d)\n",
+ sibling, sibling->rt6i_nsiblings);
+ }
+ rt6i_nsiblings++;
+ }
+ if (unlikely(rt6i_nsiblings != rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p. I have %d routes, but count %d siblings\n",
+ rt, rt6i_nsiblings, rt->rt6i_nsiblings);
+ }
+ }
+#endif
/*
* insert node
*/
@@ -1197,6 +1255,21 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Remove this entry from other siblings */
+ if (rt->rt6i_nsiblings) {
+ struct rt6_info *sibling, *next_sibling;
+
+ /* For each siblings, decrement the counter of siblings */
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ sibling->rt6i_nsiblings--;
+ }
+ rt->rt6i_nsiblings = 0;
+ list_del_init(&rt->rt6i_siblings);
+ }
+#endif
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0607ee3..bfad74f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,9 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#ifdef CONFIG_IPV6_MULTIPATH
+#include <net/nexthop.h>
+#endif
#include <asm/uaccess.h>
@@ -288,6 +291,10 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
+#ifdef CONFIG_IPV6_MULTIPATH
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+ rt->rt6i_nsiblings = 0;
+#endif
}
return rt;
}
@@ -384,6 +391,121 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+#ifdef CONFIG_IPV6_MULTIPATH
+/*
+ * Multipath route selection.
+ */
+
+/*
+ * Pseudo random candidate function
+ */
+static int rt6_info_hash_randomfn(unsigned int candidate_count)
+{
+ return random32() % candidate_count;
+}
+
+/*
+ * Fake Round Robin candidate function
+ * If we want real RR, we need to add a counter in each route
+ */
+static int rt6_info_hash_falserr(unsigned int candidate_count)
+{
+ static unsigned int seed;
+ seed++;
+ return seed % candidate_count;
+}
+
+/*
+ * Pseudo random candidate using the src port, and other information
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Work only if this not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+ /* RFC6438 recommends using the flow label */
+ val ^= fl6->flowlabel;
+
+ /* Perhaps this function needs tuning? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+
+/*
+ * This function return an index used to select (at random, round robin, ...)
+ * a route between any siblings.
+ *
+ * Note: fl6 can be NULL
+ */
+static unsigned int rt6_info_hashfn(struct net *net,
+ const struct rt6_info *rt,
+ const struct flowi6 *fl6)
+{
+ int candidate_count = rt->rt6i_nsiblings + 1;
+
+ switch (net->ipv6.sysctl.ip6_rt_multipath_algo) {
+ case 0:
+ if (fl6 == NULL)
+ return 0;
+ return rt6_info_hash_nhsfn(candidate_count, fl6);
+ case 1:
+ return rt6_info_hash_falserr(candidate_count);
+ case 2:
+ return rt6_info_hash_randomfn(candidate_count);
+ }
+
+ return 0;
+}
+
+static struct rt6_info *rt6_multipath_select(struct net *net,
+ struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hashfn(net, match, fl6);
+ /* Don't change the route, if route_choosen == 0
+ * (siblings does not include ourself)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -701,6 +823,10 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -862,7 +988,10 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2243,6 +2372,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+#ifdef CONFIG_IPV6_MULTIPATH
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+#endif
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2320,11 +2452,69 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+#endif
+
err = 0;
errout:
return err;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2334,7 +2524,12 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+#endif
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2346,7 +2541,12 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+#endif
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
@@ -2844,6 +3044,15 @@ ctl_table ipv6_route_table_template[] = {
.mode = 0644,
.proc_handler = proc_dointvec_ms_jiffies,
},
+#ifdef CONFIG_IPV6_MULTIPATH
+ {
+ .procname = "multipath_algorithm",
+ .data = &init_net.ipv6.sysctl.ip6_rt_multipath_algo,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#endif
{ }
};
@@ -2867,6 +3076,9 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
table[7].data = &net->ipv6.sysctl.ip6_rt_mtu_expires;
table[8].data = &net->ipv6.sysctl.ip6_rt_min_advmss;
table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
+#ifdef CONFIG_IPV6_MULTIPATH
+ table[10].data = &net->ipv6.sysctl.ip6_rt_multipath_algo;
+#endif
}
return table;
@@ -2926,7 +3138,9 @@ static int __net_init ip6_route_net_init(struct net *net)
net->ipv6.sysctl.ip6_rt_gc_elasticity = 9;
net->ipv6.sysctl.ip6_rt_mtu_expires = 10*60*HZ;
net->ipv6.sysctl.ip6_rt_min_advmss = IPV6_MIN_MTU - 20 - 40;
-
+#ifdef CONFIG_IPV6_MULTIPATH
+ net->ipv6.sysctl.ip6_rt_multipath_algo = 0;
+#endif
net->ipv6.ip6_rt_gc_expire = 30*HZ;
ret = 0;
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v4 0/1] Add support of ECMPv6
2012-09-21 9:59 ` [PATCH net-next v4 0/1] Add support of ECMPv6 Nicolas Dichtel
2012-09-21 9:59 ` [PATCH net-next v4 1/1] ipv6: add support of ECMP Nicolas Dichtel
@ 2012-09-21 17:48 ` David Miller
2012-09-24 12:28 ` Nicolas Dichtel
2012-10-01 13:56 ` [PATCH net-next v5 " Nicolas Dichtel
1 sibling, 2 replies; 47+ messages in thread
From: David Miller @ 2012-09-21 17:48 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: bernat, netdev, yoshfuji
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 21 Sep 2012 11:59:04 +0200
> v4: remove compilation options to choose multipath algorithm for next hop
> selection. Now the choice can be done at run time via
> /proc/sys/net/ipv6/route/multipath_algorithm
Please specify this in the routing configuration protocol, rather than
via some obscure procfs file.
Thanks.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v4 0/1] Add support of ECMPv6
2012-09-21 17:48 ` [PATCH net-next v4 0/1] Add support of ECMPv6 David Miller
@ 2012-09-24 12:28 ` Nicolas Dichtel
2012-10-01 13:56 ` [PATCH net-next v5 " Nicolas Dichtel
1 sibling, 0 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-09-24 12:28 UTC (permalink / raw)
To: David Miller; +Cc: bernat, netdev, yoshfuji
On 21/09/2012 19:48, David Miller wrote:
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Fri, 21 Sep 2012 11:59:04 +0200
>
>> v4: remove compilation options to choose multipath algorithm for next hop
>> selection. Now the choice can be done at run time via
>> /proc/sys/net/ipv6/route/multipath_algorithm
>
> Please specify this in the routing configuration protocol, rather than
> via some obscure procfs file.
Just to be sure I understand: the goal is to configure the algorithm when the
route is added, i.e. to resurrect RTA_MP_ALGO and have one algorithm per route?
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v5 0/1] Add support of ECMPv6
2012-09-21 17:48 ` [PATCH net-next v4 0/1] Add support of ECMPv6 David Miller
2012-09-24 12:28 ` Nicolas Dichtel
@ 2012-10-01 13:56 ` Nicolas Dichtel
2012-10-01 13:56 ` [PATCH net-next v5 1/1] ipv6: add support of ECMP Nicolas Dichtel
1 sibling, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-01 13:56 UTC (permalink / raw)
To: davem; +Cc: bernat, netdev, yoshfuji
Here is a proposal to add support for ECMPv6. Vincent's earlier patch
against iproute2 can be used, but another small patch is also needed;
see http://patchwork.ozlabs.org/patch/183277/
If the kernel patch is approved, I can formally submit the iproute2
patch.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0
Note that this command is only a shortcut: the iproute2 patches above are
not required to set up ECMP routes. The following commands can be used instead:
$ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f \
dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c \
dev eth0
Here is an example of a dump:
$ ip -6 route | grep 3ffe:304:124:2306::/64
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0 metric 1024
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0 metric 1024
v5: to minimize the patch and ease its integration, remove roundrobin and random
algorithms for route selection. It will be possible to add new algorithms
through rt6_info_hashfn() when the basic support of ECMP is integrated.
v4: remove compilation options to choose multipath algorithm for next hop
selection. Now the choice can be done at run time via
/proc/sys/net/ipv6/route/multipath_algorithm
v3: rebase after updating net-next
v2: rename CONFIG_IPV6_MULTIPATH_ROUTE to CONFIG_IPV6_MULTIPATH_HASH
use flowlabel in the hash function
add reference to RFC
fix a small indentation issue
remove "If unsure, say N." from the help of CONFIG_IPV6_MULTIPATH
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v5 1/1] ipv6: add support of ECMP
2012-10-01 13:56 ` [PATCH net-next v5 " Nicolas Dichtel
@ 2012-10-01 13:56 ` Nicolas Dichtel
2012-10-01 16:47 ` Joe Perches
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-01 13:56 UTC (permalink / raw)
To: davem; +Cc: bernat, netdev, yoshfuji, Nicolas Dichtel
This patch adds the support of equal cost multipath for IPv6.
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/ip6_fib.h | 13 ++++
net/ipv6/Kconfig | 10 +++
net/ipv6/ip6_fib.c | 73 +++++++++++++++++++++
net/ipv6/route.c | 177 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 270 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 8a2a203..ed3f9c5 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,10 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+#ifdef CONFIG_IPV6_MULTIPATH
+ struct nlattr *fc_mp;
+ int fc_mp_len;
+#endif
struct nl_info fc_nlinfo;
};
@@ -98,6 +102,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ unsigned int rt6i_nsiblings;
+ struct list_head rt6i_siblings;
+#endif
atomic_t rt6i_ref;
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 4f7fe72..fc2f3cb 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -266,4 +266,14 @@ config IPV6_PIMSM_V2
Support for IPv6 PIM multicast routing protocol PIM-SMv2.
If unsure, say N.
+config IPV6_MULTIPATH
+ bool "IPv6: equal cost multipath for IPv6 routing"
+ depends on IPV6
+ default y
+ ---help---
+ Enable this option to support ECMP for IPv6.
+
+ The algorithm used for route selection is based on a hash of the packet
+ header (recommended by RFC4311) and the flow label (RFC6438).
+
endif # IPV6
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 24995a9..754888c 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings)
+ rt->rt6i_nsiblings = 0;
+#endif
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +684,23 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid long lists, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ rt->rt6i_nsiblings++;
+#endif
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +713,43 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Link this route to others same route. */
+ if (rt->rt6i_nsiblings) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that have the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. We can check if all the counter are equal.
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings,
+ rt6i_siblings) {
+ sibling->rt6i_nsiblings++;
+ if (unlikely(sibling->rt6i_nsiblings !=
+ rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p (%d)\n",
+ sibling, sibling->rt6i_nsiblings);
+ }
+ rt6i_nsiblings++;
+ }
+ if (unlikely(rt6i_nsiblings != rt->rt6i_nsiblings)) {
+ pr_err("Wrong number of siblings for route %p. I have %d routes, but count %d siblings\n",
+ rt, rt6i_nsiblings, rt->rt6i_nsiblings);
+ }
+ }
+#endif
/*
* insert node
*/
@@ -1193,6 +1251,21 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Remove this entry from other siblings */
+ if (rt->rt6i_nsiblings) {
+ struct rt6_info *sibling, *next_sibling;
+
+ /* For each siblings, decrement the counter of siblings */
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ sibling->rt6i_nsiblings--;
+ }
+ rt->rt6i_nsiblings = 0;
+ list_del_init(&rt->rt6i_siblings);
+ }
+#endif
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d1ddbc6..0a8e16d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,9 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#ifdef CONFIG_IPV6_MULTIPATH
+#include <net/nexthop.h>
+#endif
#include <asm/uaccess.h>
@@ -289,6 +292,10 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
rt->rt6i_genid = rt_genid(net);
+#ifdef CONFIG_IPV6_MULTIPATH
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+ rt->rt6i_nsiblings = 0;
+#endif
}
return rt;
}
@@ -385,6 +392,92 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+#ifdef CONFIG_IPV6_MULTIPATH
+/*
+ * Multipath route selection.
+ */
+
+/*
+ * Hash based function using packet header and flowlabel.
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Work only if this not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+ /* RFC6438 recommends using the flow label */
+ val ^= fl6->flowlabel;
+
+ /* Perhaps this function needs tuning? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+
+/*
+ * This function returns an index used to select a route between any siblings.
+ *
+ * Note: fl6 can be NULL
+ */
+static unsigned int rt6_info_hashfn(struct net *net,
+ const struct rt6_info *rt,
+ const struct flowi6 *fl6)
+{
+ int candidate_count = rt->rt6i_nsiblings + 1;
+
+ if (fl6 == NULL)
+ return 0;
+ return rt6_info_hash_nhsfn(candidate_count, fl6);
+}
+
+static struct rt6_info *rt6_multipath_select(struct net *net,
+ struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hashfn(net, match, fl6);
+ /* Don't change the route, if route_choosen == 0
+ * (siblings does not include ourself)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -702,6 +795,10 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -863,7 +960,10 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (rt->rt6i_nsiblings && oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
+#endif
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2248,6 +2348,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+#ifdef CONFIG_IPV6_MULTIPATH
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+#endif
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2325,11 +2428,69 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+#endif
+
err = 0;
errout:
return err;
}
+#ifdef CONFIG_IPV6_MULTIPATH
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2339,7 +2500,12 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+#endif
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2351,7 +2517,12 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+#ifdef CONFIG_IPV6_MULTIPATH
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+#endif
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v5 1/1] ipv6: add support of ECMP
2012-10-01 13:56 ` [PATCH net-next v5 1/1] ipv6: add support of ECMP Nicolas Dichtel
@ 2012-10-01 16:47 ` Joe Perches
2012-10-02 16:02 ` [PATCH net-next v6 0/1] Add support of ECMPv6 Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Joe Perches @ 2012-10-01 16:47 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: davem, bernat, netdev, yoshfuji
On Mon, 2012-10-01 at 15:56 +0200, Nicolas Dichtel wrote:
> This patch adds the support of equal cost multipath for IPv6.
trivia:
> diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
[]
> @@ -47,6 +47,10 @@ struct fib6_config {
> unsigned long fc_expires;
> struct nlattr *fc_mx;
> int fc_mx_len;
> +#ifdef CONFIG_IPV6_MULTIPATH
> + struct nlattr *fc_mp;
> + int fc_mp_len;
> +#endif
These new entries should be in the reverse order (fc_mp_len before
fc_mp) to avoid a padding hole on 64-bit systems.
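The effect of the member order can be checked with a small userspace
sketch. The structs below are hypothetical, trimmed-down stand-ins for
struct fib6_config (only the members around the new fields are kept, and
struct toy_nl_info is invented for illustration); the 8-byte saving assumes
a typical LP64 ABI with 8-byte pointers and 4-byte ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nl_info: pointer-aligned tail member. */
struct toy_nl_info {
	void *nlh;
	void *nl_net;
	unsigned int portid;
};

/* Order as posted in v5: each int is followed by a pointer. */
struct cfg_as_posted {
	void *fc_mx;
	int fc_mx_len;		/* 4 bytes of padding follow on LP64 */
	void *fc_mp;
	int fc_mp_len;		/* 4 bytes of padding follow on LP64 */
	struct toy_nl_info fc_nlinfo;
};

/* Suggested order: the two ints sit next to each other. */
struct cfg_reordered {
	void *fc_mx;
	int fc_mx_len;
	int fc_mp_len;		/* fills what used to be padding */
	void *fc_mp;
	struct toy_nl_info fc_nlinfo;
};

/* Bytes saved by the reordering (0 on ABIs where nothing is padded). */
static size_t padding_saved(void)
{
	assert(sizeof(struct cfg_reordered) <= sizeof(struct cfg_as_posted));
	return sizeof(struct cfg_as_posted) - sizeof(struct cfg_reordered);
}
```

On 32-bit ABIs where pointers and ints are both 4 bytes, the two layouts
should have the same size, so the reordering costs nothing there.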
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> @@ -672,6 +672,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> iter->rt6i_idev == rt->rt6i_idev &&
> ipv6_addr_equal(&iter->rt6i_gateway,
> &rt->rt6i_gateway)) {
> +#ifdef CONFIG_IPV6_MULTIPATH
> + if (rt->rt6i_nsiblings)
> + rt->rt6i_nsiblings = 0;
> +#endif
There are a _lot_ of #ifdef CONFIG_IPV6_MULTIPATH blocks.
It might be better to add a few static inline functions
in a header file like:
#ifdef CONFIG_IPV6_MULTIPATH
static inline int ipv6_get_multipath_siblings(const struct rt6_info *rt)
{
return rt->rt6i_nsiblings;
}
#else
static inline int ipv6_get_multipath_siblings(const struct rt6_info *rt)
{
return 0;
}
#endif
and remove most of the #ifdef blocks.
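A rough sketch of how a call site could look once such helpers exist is
shown here. All toy_* names are hypothetical and struct toy_rt6_info is a
trimmed-down stand-in for struct rt6_info, so this only illustrates the
pattern and is not the code that was eventually posted.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for struct rt6_info. */
struct toy_rt6_info {
	unsigned int rt6i_metric;
#ifdef CONFIG_IPV6_MULTIPATH
	unsigned int rt6i_nsiblings;
#endif
};

#ifdef CONFIG_IPV6_MULTIPATH
static inline unsigned int toy_get_siblings(const struct toy_rt6_info *rt)
{
	return rt->rt6i_nsiblings;
}

static inline void toy_reset_siblings(struct toy_rt6_info *rt)
{
	rt->rt6i_nsiblings = 0;
}
#else
/* Without multipath support the helpers compile down to nothing. */
static inline unsigned int toy_get_siblings(const struct toy_rt6_info *rt)
{
	(void)rt;
	return 0;
}

static inline void toy_reset_siblings(struct toy_rt6_info *rt)
{
	(void)rt;
}
#endif

/*
 * Call site without any #ifdef: mirrors the
 * "rt->rt6i_nsiblings && oif == 0" test from the patch.
 */
static int toy_needs_multipath_select(const struct toy_rt6_info *rt, int oif)
{
	assert(rt);
	return toy_get_siblings(rt) && oif == 0;
}
```

The #ifdef is then confined to the header, and the lookup and insertion
paths read the same whether or not the option is enabled.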
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v6 0/1] Add support of ECMPv6
2012-10-01 16:47 ` Joe Perches
@ 2012-10-02 16:02 ` Nicolas Dichtel
2012-10-02 16:02 ` [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP) Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-02 16:02 UTC (permalink / raw)
To: joe; +Cc: bernat, netdev, yoshfuji, davem
Here is a proposal to add support for ECMPv6. Vincent's earlier patch
against iproute2 can be used, but another small patch is also needed;
see http://patchwork.ozlabs.org/patch/183277/
If the kernel patch is approved, I can formally submit the patch for
iproute2.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0
Note that this command is only a shortcut: the iproute2 patches mentioned
above are not required to set up ECMP routes. The following commands can be
used too:
$ ip -6 route add 3ffe:304:124:2306::/64 \
via fe80::230:1bff:feb4:dd4f dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 \
via fe80::230:1bff:feb4:e05c dev eth0
Here is an example of a dump:
$ ip -6 route | grep 3ffe:304:124:2306::/64
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0 metric 1024
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0 metric 1024
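For illustration only, here is a user-space sketch of how one of the sibling routes shown in such a dump could be picked per flow. It is modelled on rt6_info_hash_nhsfn() from the patch, but struct flow6_key, ecmp6_pick() and the TOY_* constants are names invented for this example, and the ICMPv6 type/code mixing done by the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Protocol numbers, renamed so the sketch needs no system headers. */
enum { TOY_IPPROTO_TCP = 6, TOY_IPPROTO_UDP = 17, TOY_IPPROTO_SCTP = 132 };

/* Hypothetical, simplified stand-in for the kernel's struct flowi6. */
struct flow6_key {
    uint32_t saddr[4];
    uint32_t daddr[4];
    uint32_t flowlabel;
    uint16_t sport;
    uint16_t dport;
    uint8_t proto;
};

/* XOR-fold the flow key and reduce it to an index in [0, candidates). */
unsigned int ecmp6_pick(const struct flow6_key *k, unsigned int candidates)
{
    uint32_t val = k->proto;
    int i;

    assert(candidates > 0);
    for (i = 0; i < 4; i++)
        val ^= k->daddr[i] ^ k->saddr[i];

    /* Ports only make sense for these transport protocols. */
    switch (k->proto) {
    case TOY_IPPROTO_TCP:
    case TOY_IPPROTO_UDP:
    case TOY_IPPROTO_SCTP:
        val ^= k->sport;
        val ^= k->dport;
        break;
    }
    val ^= k->flowlabel;

    /* Same final mixing step as in the patch. */
    val = val ^ (val >> 7) ^ (val >> 12);
    return val % candidates;
}
```

With two siblings, candidates is 2 (the matched route plus one sibling). All packets of a given flow map to the same index, so a flow stays on one next hop as long as the set of siblings does not change.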
v6: be more verbose in the commit log
    add some helpers in ip6_fib.h to avoid having too many #ifdef blocks in
    the code
    swap fc_mp_len and fc_mp in struct fib6_config to avoid a hole on 64-bit
    architectures
v5: to minimize the patch and ease its integration, remove the round-robin and
    random algorithms for route selection. It will be possible to add new
    algorithms through rt6_info_hashfn() once basic ECMP support is integrated.
v4: remove the compile-time options for choosing the multipath algorithm used
    for next hop selection. The choice can now be made at run time via
    /proc/sys/net/ipv6/route/multipath_algorithm
v3: rebase after updating net-next
v2: rename CONFIG_IPV6_MULTIPATH_ROUTE to CONFIG_IPV6_MULTIPATH_HASH
    use flowlabel in the hash function
    add reference to RFC
    fix a small indentation issue
    remove "If unsure, say N." from the help of CONFIG_IPV6_MULTIPATH
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-02 16:02 ` [PATCH net-next v6 0/1] Add support of ECMPv6 Nicolas Dichtel
@ 2012-10-02 16:02 ` Nicolas Dichtel
2012-10-02 16:06 ` Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-02 16:02 UTC (permalink / raw)
To: joe; +Cc: bernat, netdev, yoshfuji, davem, Nicolas Dichtel
Each nexthop is added as a single route in the routing table. All routes
that have the same metric/weight and destination but not the same gateway
are considered ECMP routes. They are linked together through a list called
rt6i_siblings.
ECMP routes can be added in one shot, with the RTA_MULTIPATH attribute, or one
after the other (in both cases, the flag NLM_F_EXCL should not be set).
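The one-shot case can be pictured with the following user-space sketch of the add-then-roll-back loop. It mirrors the control flow of ip6_route_multipath() in this patch, but the toy_* names and the array-based table are invented for the example and stand in for ip6_route_add()/ip6_route_del().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_NH 8

/* Toy routing table: present[i] != 0 means next hop i is installed. */
struct toy_table {
    int present[TOY_MAX_NH];
    int fail_on;    /* next hop id whose add is made to fail, or -1 */
};

int toy_add(struct toy_table *t, int id)
{
    assert(id >= 0 && id < TOY_MAX_NH);
    if (id == t->fail_on)
        return -EINVAL;
    if (t->present[id])
        return -EEXIST;
    t->present[id] = 1;
    return 0;
}

int toy_del(struct toy_table *t, int id)
{
    assert(id >= 0 && id < TOY_MAX_NH);
    if (!t->present[id])
        return -ESRCH;
    t->present[id] = 0;
    return 0;
}

/*
 * Add every next hop. On the first failed add, switch to delete mode and
 * walk the list again from the start, without stopping on delete errors.
 * Returns 0 on success, otherwise the last error seen.
 */
int toy_multipath_add(struct toy_table *t, const int *ids, size_t n)
{
    int adding = 1, last_err = 0;
    size_t i = 0;

    while (i < n) {
        int err = adding ? toy_add(t, ids[i]) : toy_del(t, ids[i]);

        if (err) {
            last_err = err;
            if (adding) {
                adding = 0;
                i = 0;      /* restart from the first next hop */
                continue;
            }
        }
        i++;
    }
    return last_err;
}
```

One consequence of this shape: after a rollback, the value returned is the last error seen, which can come from a failed delete rather than from the add that triggered the rollback.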
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/ip6_fib.h | 50 +++++++++++++++
net/ipv6/Kconfig | 10 +++
net/ipv6/ip6_fib.c | 71 +++++++++++++++++++++
net/ipv6/route.c | 169 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 297 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 8a2a203..2712572 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,8 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+ int fc_mp_len;
+ struct nlattr *fc_mp;
struct nl_info fc_nlinfo;
};
@@ -98,6 +100,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ unsigned int rt6i_nsiblings;
+ struct list_head rt6i_siblings;
+#endif
atomic_t rt6i_ref;
@@ -318,4 +329,43 @@ static inline void fib6_rules_cleanup(void)
return ;
}
#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH
+static inline unsigned int ipv6_multipath_get_nsiblings(const struct rt6_info *rt)
+{
+ return rt->rt6i_nsiblings;
+}
+static inline void ipv6_multipath_reset_nsiblings(struct rt6_info *rt)
+{
+ rt->rt6i_nsiblings = 0;
+}
+static inline void ipv6_multipath_inc_nsiblings(struct rt6_info *rt)
+{
+ rt->rt6i_nsiblings++;
+}
+static inline void ipv6_multipath_dec_nsiblings(struct rt6_info *rt)
+{
+ rt->rt6i_nsiblings--;
+}
+#else
+static inline unsigned int ipv6_multipath_get_nsiblings(const struct rt6_info *rt)
+{
+ return 0;
+}
+static inline void ipv6_multipath_reset_nsiblings(struct rt6_info *rt)
+{
+}
+static inline void ipv6_multipath_inc_nsiblings(struct rt6_info *rt)
+{
+}
+static inline void ipv6_multipath_dec_nsiblings(struct rt6_info *rt)
+{
+}
+static inline struct rt6_info *rt6_multipath_select(struct net *net,
+ struct rt6_info *rt,
+ struct flowi6 *fl6)
+{
+ return rt;
+}
+#endif
#endif
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 4f7fe72..fc2f3cb 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -266,4 +266,14 @@ config IPV6_PIMSM_V2
Support for IPv6 PIM multicast routing protocol PIM-SMv2.
If unsure, say N.
+config IPV6_MULTIPATH
+ bool "IPv6: equal cost multipath for IPv6 routing"
+ depends on IPV6
+ default y
+ ---help---
+ Enable this option to support ECMP for IPv6.
+
+ The algorithm used for route selection is based on a hash of packet
+ header (recommended by RFC4311) and flowlabel (RFC6438).
+
endif # IPV6
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 24995a9..ef4faf8 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+ if (ipv6_multipath_get_nsiblings(rt))
+ ipv6_multipath_reset_nsiblings(rt);
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +682,21 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid long lists, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ ipv6_multipath_inc_nsiblings(rt);
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +709,45 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Link this route to others same route. */
+ if (ipv6_multipath_get_nsiblings(rt)) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that have the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. We can check if all the counter are equal.
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings,
+ rt6i_siblings) {
+ ipv6_multipath_inc_nsiblings(sibling);
+ if (unlikely(ipv6_multipath_get_nsiblings(sibling) !=
+ ipv6_multipath_get_nsiblings(rt))) {
+ pr_err("Wrong number of siblings for route %p (%d)\n",
+ sibling, ipv6_multipath_get_nsiblings(sibling));
+ }
+ rt6i_nsiblings++;
+ }
+ if (unlikely(rt6i_nsiblings !=
+ ipv6_multipath_get_nsiblings(rt))) {
+ pr_err("Wrong number of siblings for route %p. I have %d routes, but count %d siblings\n",
+ rt, rt6i_nsiblings,
+ ipv6_multipath_get_nsiblings(rt));
+ }
+ }
+#endif
/*
* insert node
*/
@@ -1193,6 +1249,21 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Remove this entry from other siblings */
+ if (ipv6_multipath_get_nsiblings(rt)) {
+ struct rt6_info *sibling, *next_sibling;
+
+ /* For each siblings, decrement the counter of siblings */
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ ipv6_multipath_dec_nsiblings(sibling);
+ }
+ ipv6_multipath_reset_nsiblings(rt);
+ list_del_init(&rt->rt6i_siblings);
+ }
+#endif
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d1ddbc6..4c42b9e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,9 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#ifdef CONFIG_IPV6_MULTIPATH
+#include <net/nexthop.h>
+#endif
#include <asm/uaccess.h>
@@ -289,6 +292,10 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
rt->rt6i_genid = rt_genid(net);
+#ifdef CONFIG_IPV6_MULTIPATH
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+#endif
+ ipv6_multipath_reset_nsiblings(rt);
}
return rt;
}
@@ -385,6 +392,92 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+#ifdef CONFIG_IPV6_MULTIPATH
+/*
+ * Multipath route selection.
+ */
+
+/*
+ * Hash based function using packet header and flowlabel.
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Work only if this not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+ /* RFC6438 recommends using the flowlabel */
+ val ^= fl6->flowlabel;
+
+ /* Perhaps, we need to tune, this function? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+
+/*
+ * This function returns an index used to select a route between any siblings.
+ *
+ * Note: fl6 can be NULL
+ */
+static unsigned int rt6_info_hashfn(struct net *net,
+ const struct rt6_info *rt,
+ const struct flowi6 *fl6)
+{
+ int candidate_count = ipv6_multipath_get_nsiblings(rt) + 1;
+
+ if (fl6 == NULL)
+ return 0;
+ return rt6_info_hash_nhsfn(candidate_count, fl6);
+}
+
+static struct rt6_info *rt6_multipath_select(struct net *net,
+ struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hashfn(net, match, fl6);
+ /* Don't change the route, if route_choosen == 0
+ * (siblings does not include ourself)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -702,6 +795,8 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+ if (ipv6_multipath_get_nsiblings(rt) && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -863,7 +958,8 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+ if (ipv6_multipath_get_nsiblings(rt) && oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2248,6 +2344,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+#ifdef CONFIG_IPV6_MULTIPATH
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+#endif
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2325,11 +2424,69 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+
err = 0;
errout:
return err;
}
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+#ifdef CONFIG_IPV6_MULTIPATH
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+#else
+ return -ENOSYS;
+#endif /* CONFIG_IPV6_MULTIPATH */
+}
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2339,7 +2496,10 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2351,7 +2511,10 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-02 16:02 ` [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP) Nicolas Dichtel
@ 2012-10-02 16:06 ` Nicolas Dichtel
2012-10-02 16:14 ` Eric Dumazet
2012-10-02 18:43 ` [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP) David Miller
0 siblings, 2 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-02 16:06 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: joe, bernat, netdev, yoshfuji, davem
On 02/10/2012 18:02, Nicolas Dichtel wrote:
> Each nexthop is added as a single route in the routing table. All routes
> that have the same metric/weight and destination but not the same gateway
> are considered ECMP routes. They are linked together through a list called
> rt6i_siblings.
>
> ECMP routes can be added in one shot, with the RTA_MULTIPATH attribute, or one
> after the other (in both cases, the flag NLM_F_EXCL should not be set).
>
> The patch is based on a previous work from
> Luc Saillard <luc.saillard@6wind.com>.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
I forgot to run checkpatch.pl; some lines are over 80 columns. I will fix this
in v7, together with any other comments.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-02 16:06 ` Nicolas Dichtel
@ 2012-10-02 16:14 ` Eric Dumazet
2012-10-19 9:13 ` [PATCH net-next v7 0/1] Add support of ECMPv6 nicolas.dichtel
2012-10-02 18:43 ` [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP) David Miller
1 sibling, 1 reply; 47+ messages in thread
From: Eric Dumazet @ 2012-10-02 16:14 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: joe, bernat, netdev, yoshfuji, davem
On Tue, 2012-10-02 at 18:06 +0200, Nicolas Dichtel wrote:
> I forgot to run checkpatch.pl; some lines are over 80 columns. I will fix this
> in v7, together with any other comments.
Yep, please reorder:
@@ -98,6 +100,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ unsigned int rt6i_nsiblings;
+ struct list_head rt6i_siblings;
+#endif
atomic_t rt6i_ref;
@@ -318,4 +329,43 @@ static inline void fib6_rules_cleanup(void)
return ;
}
to:
@@ -98,6 +100,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ struct list_head rt6i_siblings;
+ unsigned int rt6i_nsiblings;
+#endif
atomic_t rt6i_ref;
@@ -318,4 +329,43 @@ static inline void fib6_rules_cleanup(void)
return ;
}
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-02 16:06 ` Nicolas Dichtel
2012-10-02 16:14 ` Eric Dumazet
@ 2012-10-02 18:43 ` David Miller
1 sibling, 0 replies; 47+ messages in thread
From: David Miller @ 2012-10-02 18:43 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: joe, bernat, netdev, yoshfuji
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Tue, 02 Oct 2012 18:06:28 +0200
> On 02/10/2012 18:02, Nicolas Dichtel wrote:
>> Each nexthop is added as a single route in the routing table. All
>> routes that have the same metric/weight and destination but not the
>> same gateway are considered ECMP routes. They are linked together
>> through a list called rt6i_siblings.
>>
>> ECMP routes can be added in one shot, with the RTA_MULTIPATH attribute,
>> or one after the other (in both cases, the flag NLM_F_EXCL should not
>> be set).
>>
>> The patch is based on a previous work from
>> Luc Saillard <luc.saillard@6wind.com>.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> I forgot to run checkpatch.pl; some lines are over 80 columns. I will
> fix this in v7, together with any other comments.
No rush as this is too late for this merge window anyways.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next v2 0/1] Add support of ECMPv6
2012-09-14 13:35 ` Nicolas Dichtel
2012-09-14 13:37 ` Nicolas Dichtel
@ 2012-10-15 12:36 ` Vincent Bernat
2012-10-15 19:54 ` Vincent Bernat
1 sibling, 1 reply; 47+ messages in thread
From: Vincent Bernat @ 2012-10-15 12:36 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: yoshfuji, netdev, davem
On 14.09.2012 15:35, Nicolas Dichtel wrote:
>> Therefore, the problem is not in iproute2, which knows how to display
>> those ECMP routes. I fear that this difference makes support in
>> routing daemons more difficult.
> Hmm, can you elaborate? Our routing daemon, quagga, manages it without
> any problem.
Hi!
Sorry for the late answer. I have been experimenting with your patch
and it seems that Quagga does not handle such routes. Do you have some
patchset on top of Quagga? I am looking at
28971c8cb1138700e87dc7da673e59b5596bb51b (which is fairly recent) and in
zebra/rt_netlink.c, IPv6 routes are handled as IPv4 routes: multiple
hops are added as attributes.
In Quagga, I do:
ipv6 route 2001:db8:97::/64 2001:db8:1::2
ipv6 route 2001:db8:97::/64 2001:db8:2::2
And I get:
r1(VTY)# show ipv6 route
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv6, I - IS-IS, B - BGP, A - Babel,
> - selected route, * - FIB route
C>* ::1/128 is directly connected, lo
O 2001:db8:1::/64 [110/1] is directly connected, eth0, 01:52:42
C>* 2001:db8:1::/64 is directly connected, eth0
O 2001:db8:2::/64 [110/1] is directly connected, eth1, 01:52:37
C>* 2001:db8:2::/64 is directly connected, eth1
S> 2001:db8:97::/64 [1/0] via 2001:db8:1::2, eth0
via 2001:db8:2::2, eth1
K>* 2001:db8:98::/64 via 2001:db8:2::2, eth1
C>* 2001:db8:99::/64 is directly connected, dummy0
C * fe80::/64 is directly connected, eth1
C * fe80::/64 is directly connected, eth0
C>* fe80::/64 is directly connected, dummy0
The route is not installed in the kernel (not "*"):
2012/10/15 14:22:01 ZEBRA: rib_process: 2001:db8:97::/64: Updating existing route, select 0x7fee39f0ad10, fib 0x7fee39f0ad10
2012/10/15 14:22:01 ZEBRA: netlink_route_multipath() (multihop): RTM_DELROUTE 2001:db8:97::/64, type IPv6 nexthop
2012/10/15 14:22:01 ZEBRA: netlink_route_multipath() (multihop): nexthop via 2001:db8:1::2 if 4
2012/10/15 14:22:01 ZEBRA: netlink_talk: netlink-cmd type RTM_DELROUTE(25), seq=27
2012/10/15 14:22:01 ZEBRA: netlink_route_multipath() (multihop): RTM_NEWROUTE 2001:db8:97::/64, type IPv6 nexthop
2012/10/15 14:22:01 ZEBRA: netlink_route_multipath() (multihop): nexthop via 2001:db8:1::2 if 4
2012/10/15 14:22:01 ZEBRA: netlink_route_multipath() (multihop): RTM_NEWROUTE 2001:db8:97::/64, type IPv6 nexthop
2012/10/15 14:22:01 ZEBRA: netlink_route_multipath() (multihop): nexthop via 2001:db8:2::2 if 5
2012/10/15 14:22:01 ZEBRA: netlink_talk: netlink-cmd type RTM_NEWROUTE(24), seq=28
2012/10/15 14:22:01 ZEBRA: netlink-cmd error: No such process, type=RTM_NEWROUTE(24), seq=28, pid=0
The problem is the same with BIRD. The difference with IPv4 makes it
difficult to factor the code between IPv4 and IPv6. What do you think?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RFC PATCH net-next v2 0/1] Add support of ECMPv6
2012-10-15 12:36 ` Vincent Bernat
@ 2012-10-15 19:54 ` Vincent Bernat
0 siblings, 0 replies; 47+ messages in thread
From: Vincent Bernat @ 2012-10-15 19:54 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: yoshfuji, netdev, davem
❦ 15 October 2012 14:36 CEST, Vincent Bernat <bernat@luffy.cx>:
> The problem is the same with BIRD. The difference with IPv4 makes it
> difficult to factor the code between IPv4 and IPv6. What do you think?
I am in the process of creating the appropriate patches for Quagga and
it is less difficult than I thought. The difference between IPv4 and
IPv6 multipath routes can be plugged right into the existing difference
between IPv4 single-hop and IPv4 multi-hop routes, which are also
handled differently.
I still think that the approach is confusing for userland, but it is
not really cumbersome from the implementor's point of view and it also
has some advantages.
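One possible reading of this, as a rough sketch and not the actual Quagga change: the daemon plans one multi-hop message for an IPv4 multipath route, and one single-hop message per next hop for IPv6. All names below (toy_family, toy_msg, plan_route_msgs) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum toy_family { TOY_INET, TOY_INET6 };

/* One planned netlink-style message: it carries next hops
 * [first_nh, first_nh + nh_count) of the route. */
struct toy_msg {
    size_t first_nh;
    size_t nh_count;
};

/*
 * Plan the messages needed to install a route with n next hops.
 * IPv4 (and any single-hop route) uses one message carrying all next hops;
 * IPv6 multipath uses one single-hop message per next hop.
 * Returns the number of entries written to out (at most max_out).
 */
size_t plan_route_msgs(enum toy_family fam, size_t n,
                       struct toy_msg *out, size_t max_out)
{
    size_t i, used = 0;

    assert(out != NULL);
    if (n == 0 || max_out == 0)
        return 0;

    if (fam == TOY_INET || n == 1) {
        out[0].first_nh = 0;
        out[0].nh_count = n;
        return 1;
    }

    for (i = 0; i < n && used < max_out; i++) {
        out[used].first_nh = i;
        out[used].nh_count = 1;
        used++;
    }
    return used;
}
```

The point of the sketch is only that the IPv6 case reuses the single-hop code path in a loop, so the family-specific difference stays confined to the planning step.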
--
Make it clear before you make it faster.
- The Elements of Programming Style (Kernighan & Plauger)
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v7 0/1] Add support of ECMPv6
2012-10-02 16:14 ` Eric Dumazet
@ 2012-10-19 9:13 ` nicolas.dichtel
2012-10-19 9:13 ` [PATCH net-next v7 1/1] ipv6: add support of equal cost multipath (ECMP) nicolas.dichtel
0 siblings, 1 reply; 47+ messages in thread
From: nicolas.dichtel @ 2012-10-19 9:13 UTC (permalink / raw)
To: eric.dumazet; +Cc: joe, bernat, netdev, yoshfuji, davem
Here is a proposal to add support for ECMPv6. Vincent's earlier patch
against iproute2 can be used, but another small patch is also needed;
see http://patchwork.ozlabs.org/patch/183277/
He has also started writing a patch against Quagga so that it can manage the
ECMPv6 routes implemented by this patch:
http://marc.info/?l=quagga-dev&m=135040310117116&w=2
If the kernel patch is approved, I can formally submit the patch for
iproute2.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0
Note that this command is only a shortcut: the iproute2 patches mentioned
above are not required to set up ECMP routes. The following commands can be
used too:
$ ip -6 route add 3ffe:304:124:2306::/64 \
via fe80::230:1bff:feb4:dd4f dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 \
via fe80::230:1bff:feb4:e05c dev eth0
Here is an example of a dump:
$ ip -6 route | grep 3ffe:304:124:2306::/64
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0 metric 1024
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0 metric 1024
v7: fix checkpatch.pl warnings
    swap rt6i_nsiblings and rt6i_siblings in struct rt6_info
v6: be more verbose in the commit log
    add some helpers in ip6_fib.h to avoid having too many #ifdef blocks in
    the code
    swap fc_mp_len and fc_mp in struct fib6_config to avoid a hole on 64-bit
    architectures
v5: to minimize the patch and ease its integration, remove the round-robin and
    random algorithms for route selection. It will be possible to add new
    algorithms through rt6_info_hashfn() once basic ECMP support is integrated.
v4: remove the compile-time options for choosing the multipath algorithm used
    for next hop selection. The choice can now be made at run time via
    /proc/sys/net/ipv6/route/multipath_algorithm
v3: rebase after updating net-next
v2: rename CONFIG_IPV6_MULTIPATH_ROUTE to CONFIG_IPV6_MULTIPATH_HASH
    use flowlabel in the hash function
    add reference to RFC
    fix a small indentation issue
    remove "If unsure, say N." from the help of CONFIG_IPV6_MULTIPATH
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v7 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-19 9:13 ` [PATCH net-next v7 0/1] Add support of ECMPv6 nicolas.dichtel
@ 2012-10-19 9:13 ` nicolas.dichtel
2012-10-22 0:41 ` David Miller
0 siblings, 1 reply; 47+ messages in thread
From: nicolas.dichtel @ 2012-10-19 9:13 UTC (permalink / raw)
To: eric.dumazet; +Cc: joe, bernat, netdev, yoshfuji, davem, Nicolas Dichtel
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Each nexthop is added as a single route in the routing table. All routes
that have the same metric/weight and destination but not the same gateway
are considered ECMP routes. They are linked together through a list called
rt6i_siblings.
ECMP routes can be added in one shot, with the RTA_MULTIPATH attribute, or one
after the other (in both cases, the flag NLM_F_EXCL should not be set).
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/ip6_fib.h | 52 ++++++++++++++++
net/ipv6/Kconfig | 10 +++
net/ipv6/ip6_fib.c | 72 ++++++++++++++++++++++
net/ipv6/route.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 298 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 8a2a203..7c666c8 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,8 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+ int fc_mp_len;
+ struct nlattr *fc_mp;
struct nl_info fc_nlinfo;
};
@@ -98,6 +100,15 @@ struct rt6_info {
struct fib6_node *rt6i_node;
struct in6_addr rt6i_gateway;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /*
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ struct list_head rt6i_siblings;
+ unsigned int rt6i_nsiblings;
+#endif
atomic_t rt6i_ref;
@@ -318,4 +329,45 @@ static inline void fib6_rules_cleanup(void)
return ;
}
#endif
+
+#ifdef CONFIG_IPV6_MULTIPATH
+static inline unsigned int
+ipv6_multipath_get_nsiblings(const struct rt6_info *rt)
+{
+ return rt->rt6i_nsiblings;
+}
+static inline void ipv6_multipath_reset_nsiblings(struct rt6_info *rt)
+{
+ rt->rt6i_nsiblings = 0;
+}
+static inline void ipv6_multipath_inc_nsiblings(struct rt6_info *rt)
+{
+ rt->rt6i_nsiblings++;
+}
+static inline void ipv6_multipath_dec_nsiblings(struct rt6_info *rt)
+{
+ rt->rt6i_nsiblings--;
+}
+#else
+static inline unsigned int
+ipv6_multipath_get_nsiblings(const struct rt6_info *rt)
+{
+ return 0;
+}
+static inline void ipv6_multipath_reset_nsiblings(struct rt6_info *rt)
+{
+}
+static inline void ipv6_multipath_inc_nsiblings(struct rt6_info *rt)
+{
+}
+static inline void ipv6_multipath_dec_nsiblings(struct rt6_info *rt)
+{
+}
+static inline struct rt6_info *rt6_multipath_select(struct net *net,
+ struct rt6_info *rt,
+ struct flowi6 *fl6)
+{
+ return rt;
+}
+#endif
#endif
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 4f7fe72..fc2f3cb 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -266,4 +266,14 @@ config IPV6_PIMSM_V2
Support for IPv6 PIM multicast routing protocol PIM-SMv2.
If unsure, say N.
+config IPV6_MULTIPATH
+ bool "IPv6: equal cost multipath for IPv6 routing"
+ depends on IPV6
+ default y
+ ---help---
+ Enable this option to support ECMP for IPv6.
+
+ The algorithm used for route selection is based on a hash of packet
+ header (recommended by RFC4311) and flowlabel (RFC6438).
+
endif # IPV6
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 24995a9..6b923d6 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+ if (ipv6_multipath_get_nsiblings(rt))
+ ipv6_multipath_reset_nsiblings(rt);
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +682,21 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid long lists, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ ipv6_multipath_inc_nsiblings(rt);
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +709,46 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Link this route to others same route. */
+ if (ipv6_multipath_get_nsiblings(rt)) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that have the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. We can check if all the counter are equal.
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings,
+ rt6i_siblings) {
+ ipv6_multipath_inc_nsiblings(sibling);
+ if (unlikely(ipv6_multipath_get_nsiblings(sibling) !=
+ ipv6_multipath_get_nsiblings(rt))) {
+ pr_err("Wrong number of siblings for route %p (%d)\n",
+ sibling,
+ ipv6_multipath_get_nsiblings(sibling));
+ }
+ rt6i_nsiblings++;
+ }
+ if (unlikely(rt6i_nsiblings !=
+ ipv6_multipath_get_nsiblings(rt))) {
+ pr_err("Wrong number of siblings for route %p. I have %d routes, but count %d siblings\n",
+ rt, rt6i_nsiblings,
+ ipv6_multipath_get_nsiblings(rt));
+ }
+ }
+#endif
/*
* insert node
*/
@@ -1193,6 +1250,21 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+#ifdef CONFIG_IPV6_MULTIPATH
+ /* Remove this entry from other siblings */
+ if (ipv6_multipath_get_nsiblings(rt)) {
+ struct rt6_info *sibling, *next_sibling;
+
+ /* For each siblings, decrement the counter of siblings */
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ ipv6_multipath_dec_nsiblings(sibling);
+ }
+ ipv6_multipath_reset_nsiblings(rt);
+ list_del_init(&rt->rt6i_siblings);
+ }
+#endif
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7c7e963..b339f5b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,9 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#ifdef CONFIG_IPV6_MULTIPATH
+#include <net/nexthop.h>
+#endif
#include <asm/uaccess.h>
@@ -289,6 +292,10 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
rt->rt6i_genid = rt_genid(net);
+#ifdef CONFIG_IPV6_MULTIPATH
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+#endif
+ ipv6_multipath_reset_nsiblings(rt);
}
return rt;
}
@@ -385,6 +392,90 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+#ifdef CONFIG_IPV6_MULTIPATH
+/*
+ * Multipath route selection.
+ */
+
+/* Hash based function using packet header and flowlabel.
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Work only if this not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+ /* RFC6438 recommends using the flowlabel */
+ val ^= fl6->flowlabel;
+
+ /* Perhaps, we need to tune, this function? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+
+/* This function returns an index used to select a route between any siblings.
+ *
+ * Note: fl6 can be NULL
+ */
+static unsigned int rt6_info_hashfn(struct net *net,
+ const struct rt6_info *rt,
+ const struct flowi6 *fl6)
+{
+ int candidate_count = ipv6_multipath_get_nsiblings(rt) + 1;
+
+ if (fl6 == NULL)
+ return 0;
+ return rt6_info_hash_nhsfn(candidate_count, fl6);
+}
+
+static struct rt6_info *rt6_multipath_select(struct net *net,
+ struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hashfn(net, match, fl6);
+ /* Don't change the route, if route_choosen == 0
+ * (siblings does not include ourself)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+#endif /* CONFIG_IPV6_MULTIPATH */
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -702,6 +793,8 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+ if (ipv6_multipath_get_nsiblings(rt) && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -863,7 +956,8 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+ if (ipv6_multipath_get_nsiblings(rt) && oif == 0)
+ rt = rt6_multipath_select(net, rt, fl6);
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2249,6 +2343,9 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+#ifdef CONFIG_IPV6_MULTIPATH
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
+#endif
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2326,11 +2423,69 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+
err = 0;
errout:
return err;
}
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+#ifdef CONFIG_IPV6_MULTIPATH
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+#else
+ return -ENOSYS;
+#endif /* CONFIG_IPV6_MULTIPATH */
+}
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2340,7 +2495,10 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2352,7 +2510,10 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v7 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-19 9:13 ` [PATCH net-next v7 1/1] ipv6: add support of equal cost multipath (ECMP) nicolas.dichtel
@ 2012-10-22 0:41 ` David Miller
2012-10-22 13:42 ` [PATCH net-next v8 0/1] Add support of ECMPv6 nicolas.dichtel
0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2012-10-22 0:41 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: eric.dumazet, joe, bernat, netdev, yoshfuji
Why do you go through all of the effort to create a nice
abstraction in a header file:
> +#ifdef CONFIG_IPV6_MULTIPATH
> +static inline unsigned int
> +ipv6_multipath_get_nsiblings(const struct rt6_info *rt)
> +{
> + return rt->rt6i_nsiblings;
> +}
...
Only to screw it up by still plopping ifdef crap into foo.c files?
> +#ifdef CONFIG_IPV6_MULTIPATH
> + INIT_LIST_HEAD(&rt->rt6i_siblings);
> +#endif
> + ipv6_multipath_reset_nsiblings(rt);
I really don't want to see these ifdefs.
And if they are unavoidable, remove this configure option
altogether and make the code unconditionally included.
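One common way to get there, shown as a user-space sketch rather than as the code that was eventually merged: the header provides both a real and a no-op variant of every helper, so that callers in .c files never need an #ifdef of their own. The struct `route_entry`, the macro `HAVE_MULTIPATH` and the `route_*` helper names are hypothetical stand-ins for rt6_info, CONFIG_IPV6_MULTIPATH and the ipv6_multipath_* helpers quoted above.

```c
/* Hypothetical stand-in for struct rt6_info. */
struct route_entry {
	unsigned int nsiblings;
};

#ifdef HAVE_MULTIPATH	/* hypothetical switch, plays the role of the config option */
static inline unsigned int route_get_nsiblings(const struct route_entry *rt)
{
	return rt->nsiblings;
}

static inline void route_init_siblings(struct route_entry *rt)
{
	rt->nsiblings = 0;
}
#else
/* No-op variants: same signatures, so callers compile unchanged. */
static inline unsigned int route_get_nsiblings(const struct route_entry *rt)
{
	(void)rt;
	return 0;
}

static inline void route_init_siblings(struct route_entry *rt)
{
	(void)rt;
}
#endif

/* Caller in a .c file: no #ifdef here, whichever variant was selected. */
static void route_alloc_init(struct route_entry *rt)
{
	route_init_siblings(rt);
}
```

With this layout the conditional lives in exactly one place, the header; if even that is not workable, dropping the option entirely is the other alternative named above.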
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v8 0/1] Add support of ECMPv6
2012-10-22 0:41 ` David Miller
@ 2012-10-22 13:42 ` nicolas.dichtel
2012-10-22 13:42 ` [PATCH net-next v8 1/1] ipv6: add support of equal cost multipath (ECMP) nicolas.dichtel
0 siblings, 1 reply; 47+ messages in thread
From: nicolas.dichtel @ 2012-10-22 13:42 UTC (permalink / raw)
To: davem; +Cc: joe, bernat, netdev, yoshfuji, eric.dumazet
Here is a proposal to add support for ECMPv6. The previous patch
from Vincent against iproute2 can be used, but one more small patch is needed
too, see http://patchwork.ozlabs.org/patch/183277/
He has also started writing a patch against quagga so that it can manage the
ECMPv6 routes implemented in this patch:
http://marc.info/?l=quagga-dev&m=135040310117116&w=2
If the kernel patch is approved, I can formally submit the patch for
iproute2.
Here is an example of a command to add an ECMP route:
$ ip -6 route add 3ffe:304:124:2306::/64 \
nexthop via fe80::230:1bff:feb4:e05c dev eth0 \
nexthop via fe80::230:1bff:feb4:dd4f dev eth0
Note that this command is only a shortcut; the iproute2 patches mentioned
above are not required to set up ECMP routes. The following commands can be
used too:
$ ip -6 route add 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0
$ ip -6 route append 3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0
Here is an example of a dump:
$ ip -6 route | grep 3ffe:304:124:2306::/64
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:dd4f dev eth0 metric 1024
3ffe:304:124:2306::/64 via fe80::230:1bff:feb4:e05c dev eth0 metric 1024
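The equivalence between the one-shot form and the add/append form can be pictured with the small user-space sketch below. It is only an illustration under stated assumptions: `struct table`, `add_route()`, `del_route()` and `add_multipath()` are hypothetical names, the table is a flat array instead of the fib6 tree, and the rollback is simplified to remove only the nexthops inserted by the current call.

```c
#include <string.h>

#define MAX_ROUTES 8

/* Hypothetical flat table: one slot per gateway of a single destination. */
struct table {
	const char *gw[MAX_ROUTES];
	int count;
};

/* Return the slot holding this gateway, or -1 if it is not present. */
static int find_route(const struct table *t, const char *gw)
{
	for (int i = 0; i < t->count; i++)
		if (strcmp(t->gw[i], gw) == 0)
			return i;
	return -1;
}

/* Add one nexthop as its own route; fails on a duplicate or a full table. */
static int add_route(struct table *t, const char *gw)
{
	if (t->count == MAX_ROUTES || find_route(t, gw) >= 0)
		return -1;
	t->gw[t->count++] = gw;
	return 0;
}

/* Delete one nexthop by moving the last entry into its slot. */
static int del_route(struct table *t, const char *gw)
{
	int i = find_route(t, gw);

	if (i < 0)
		return -1;
	t->gw[i] = t->gw[--t->count];
	return 0;
}

/* One-shot form: add every nexthop in turn, undo our own adds on failure. */
static int add_multipath(struct table *t, const char *const *gws, int n)
{
	for (int i = 0; i < n; i++) {
		int err = add_route(t, gws[i]);

		if (err) {
			while (i-- > 0)
				del_route(t, gws[i]);
			return err;
		}
	}
	return 0;
}
```

Calling add_multipath() with two gateways leaves the table in the same state as one add_route() followed by a second add_route(), which is the relationship between "nexthop ... nexthop ..." and "add" plus "append" described above.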
v8: Remove CONFIG_IPV6_MULTIPATH (keeping this option with only some helpers
in header files will be just unreadable).
Replace some invisible printk() by BUG_ON(): if counters don't match, list of
siblings is broken.
Remove rt6_info_hashfn() (useless in the current patch).
Remove argument net from rt6_multipath_select(), it was not used.
v7: fix checkpatch.pl warning
invert rt6i_nsiblings and rt6i_siblings in struct rt6_info
v6: be more verbose in commitlog
add some helpers in ip6_fib.h to avoid having too many ifdef blocks in the
code
invert fc_mp_len and fc_mp in struct fib6_config to avoid a hole on 64-bit
arches
v5: to minimize the patch and ease its integration, remove roundrobin and random
algorithms for route selection. It will be possible to add new algorithms
through rt6_info_hashfn() when the basic support of ECMP is integrated.
v4: remove compilation options to choose multipath algorithm for next hop
selection. Now the choice can be done at run time via
/proc/sys/net/ipv6/route/multipath_algorithm
v3: rebase after updating net-next
v2: rename CONFIG_IPV6_MULTIPATH_ROUTE to CONFIG_IPV6_MULTIPATH_HASH
use flowlabel in the hash function
add reference to RFC
fix a small indentation issue
remove "If unsure, say N." from the help of CONFIG_IPV6_MULTIPATH
Comments are welcome.
Regards,
Nicolas
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH net-next v8 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-22 13:42 ` [PATCH net-next v8 0/1] Add support of ECMPv6 nicolas.dichtel
@ 2012-10-22 13:42 ` nicolas.dichtel
2012-10-23 6:39 ` David Miller
0 siblings, 1 reply; 47+ messages in thread
From: nicolas.dichtel @ 2012-10-22 13:42 UTC (permalink / raw)
To: davem; +Cc: joe, bernat, netdev, yoshfuji, eric.dumazet, Nicolas Dichtel
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Each nexthop is added as a separate route in the routing table. All routes
that have the same metric/weight and destination but not the same gateway
are considered ECMP routes. They are linked together through a list called
rt6i_siblings.
ECMP routes can be added in one shot, with the RTA_MULTIPATH attribute, or one
after the other (in both cases, the flag NLM_F_EXCL should not be set).
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
include/net/ip6_fib.h | 10 ++++
net/ipv6/ip6_fib.c | 57 +++++++++++++++++++++
net/ipv6/route.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 200 insertions(+), 3 deletions(-)
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 8a2a203..20210d7 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -47,6 +47,8 @@ struct fib6_config {
unsigned long fc_expires;
struct nlattr *fc_mx;
int fc_mx_len;
+ int fc_mp_len;
+ struct nlattr *fc_mp;
struct nl_info fc_nlinfo;
};
@@ -99,6 +101,14 @@ struct rt6_info {
struct in6_addr rt6i_gateway;
+ /* Multipath routes:
+ * siblings is a list of rt6_info that have the same metric/weight,
+ * destination, but not the same gateway. nsiblings is just a cache
+ * to speed up lookup.
+ */
+ struct list_head rt6i_siblings;
+ unsigned int rt6i_nsiblings;
+
atomic_t rt6i_ref;
/* These are in a separate cache line. */
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 24995a9..710cafd 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -672,6 +672,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
iter->rt6i_idev == rt->rt6i_idev &&
ipv6_addr_equal(&iter->rt6i_gateway,
&rt->rt6i_gateway)) {
+ if (rt->rt6i_nsiblings)
+ rt->rt6i_nsiblings = 0;
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
@@ -680,6 +682,21 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
rt6_set_expires(iter, rt->dst.expires);
return -EEXIST;
}
+ /* If we have the same destination and the same metric,
+ * but not the same gateway, then the route we try to
+ * add is sibling to this route, increment our counter
+ * of siblings, and later we will add our route to the
+ * list.
+ * Only static routes (which don't have flag
+ * RTF_EXPIRES) are used for ECMPv6.
+ *
+ * To avoid long lists, we only add siblings if the
+ * route has a gateway.
+ */
+ if (rt->rt6i_flags & RTF_GATEWAY &&
+ !(rt->rt6i_flags & RTF_EXPIRES) &&
+ !(iter->rt6i_flags & RTF_EXPIRES))
+ rt->rt6i_nsiblings++;
}
if (iter->rt6i_metric > rt->rt6i_metric)
@@ -692,6 +709,35 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (ins == &fn->leaf)
fn->rr_ptr = NULL;
+ /* Link this route to others same route. */
+ if (rt->rt6i_nsiblings) {
+ unsigned int rt6i_nsiblings;
+ struct rt6_info *sibling, *temp_sibling;
+
+ /* Find the first route that have the same metric */
+ sibling = fn->leaf;
+ while (sibling) {
+ if (sibling->rt6i_metric == rt->rt6i_metric) {
+ list_add_tail(&rt->rt6i_siblings,
+ &sibling->rt6i_siblings);
+ break;
+ }
+ sibling = sibling->dst.rt6_next;
+ }
+ /* For each sibling in the list, increment the counter of
+ * siblings. BUG() if counters does not match, list of siblings
+ * is broken!
+ */
+ rt6i_nsiblings = 0;
+ list_for_each_entry_safe(sibling, temp_sibling,
+ &rt->rt6i_siblings, rt6i_siblings) {
+ sibling->rt6i_nsiblings++;
+ BUG_ON(sibling->rt6i_nsiblings != rt->rt6i_nsiblings);
+ rt6i_nsiblings++;
+ }
+ BUG_ON(rt6i_nsiblings != rt->rt6i_nsiblings);
+ }
+
/*
* insert node
*/
@@ -1193,6 +1239,17 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
if (fn->rr_ptr == rt)
fn->rr_ptr = NULL;
+ /* Remove this entry from other siblings */
+ if (rt->rt6i_nsiblings) {
+ struct rt6_info *sibling, *next_sibling;
+
+ list_for_each_entry_safe(sibling, next_sibling,
+ &rt->rt6i_siblings, rt6i_siblings)
+ sibling->rt6i_nsiblings--;
+ rt->rt6i_nsiblings = 0;
+ list_del_init(&rt->rt6i_siblings);
+ }
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7c7e963..126da56 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -57,6 +57,7 @@
#include <net/xfrm.h>
#include <net/netevent.h>
#include <net/netlink.h>
+#include <net/nexthop.h>
#include <asm/uaccess.h>
@@ -289,6 +290,8 @@ static inline struct rt6_info *ip6_dst_alloc(struct net *net,
memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
rt6_init_peer(rt, table ? &table->tb6_peers : net->ipv6.peers);
rt->rt6i_genid = rt_genid(net);
+ INIT_LIST_HEAD(&rt->rt6i_siblings);
+ rt->rt6i_nsiblings = 0;
}
return rt;
}
@@ -385,6 +388,69 @@ static bool rt6_need_strict(const struct in6_addr *daddr)
(IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL | IPV6_ADDR_LOOPBACK);
}
+/* Multipath route selection:
+ * Hash based function using packet header and flowlabel.
+ * Adapted from fib_info_hashfn()
+ */
+static int rt6_info_hash_nhsfn(unsigned int candidate_count,
+ const struct flowi6 *fl6)
+{
+ unsigned int val = fl6->flowi6_proto;
+
+ val ^= fl6->daddr.s6_addr32[0];
+ val ^= fl6->daddr.s6_addr32[1];
+ val ^= fl6->daddr.s6_addr32[2];
+ val ^= fl6->daddr.s6_addr32[3];
+
+ val ^= fl6->saddr.s6_addr32[0];
+ val ^= fl6->saddr.s6_addr32[1];
+ val ^= fl6->saddr.s6_addr32[2];
+ val ^= fl6->saddr.s6_addr32[3];
+
+ /* Work only if this not encapsulated */
+ switch (fl6->flowi6_proto) {
+ case IPPROTO_UDP:
+ case IPPROTO_TCP:
+ case IPPROTO_SCTP:
+ val ^= fl6->fl6_sport;
+ val ^= fl6->fl6_dport;
+ break;
+
+ case IPPROTO_ICMPV6:
+ val ^= fl6->fl6_icmp_type;
+ val ^= fl6->fl6_icmp_code;
+ break;
+ }
+ /* RFC6438 recommends using the flowlabel */
+ val ^= fl6->flowlabel;
+
+ /* Perhaps, we need to tune, this function? */
+ val = val ^ (val >> 7) ^ (val >> 12);
+ return val % candidate_count;
+}
+
+static struct rt6_info *rt6_multipath_select(struct rt6_info *match,
+ struct flowi6 *fl6)
+{
+ struct rt6_info *sibling, *next_sibling;
+ int route_choosen;
+
+ route_choosen = rt6_info_hash_nhsfn(match->rt6i_nsiblings + 1, fl6);
+ /* Don't change the route, if route_choosen == 0
+ * (siblings does not include ourself)
+ */
+ if (route_choosen)
+ list_for_each_entry_safe(sibling, next_sibling,
+ &match->rt6i_siblings, rt6i_siblings) {
+ route_choosen--;
+ if (route_choosen == 0) {
+ match = sibling;
+ break;
+ }
+ }
+ return match;
+}
+
/*
* Route lookup. Any table->tb6_lock is implied.
*/
@@ -702,6 +768,8 @@ static struct rt6_info *ip6_pol_route_lookup(struct net *net,
restart:
rt = fn->leaf;
rt = rt6_device_match(net, rt, &fl6->saddr, fl6->flowi6_oif, flags);
+ if (rt->rt6i_nsiblings && fl6->flowi6_oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
BACKTRACK(net, &fl6->saddr);
out:
dst_use(&rt->dst, jiffies);
@@ -863,7 +931,8 @@ restart_2:
restart:
rt = rt6_select(fn, oif, strict | reachable);
-
+ if (rt->rt6i_nsiblings && oif == 0)
+ rt = rt6_multipath_select(rt, fl6);
BACKTRACK(net, &fl6->saddr);
if (rt == net->ipv6.ip6_null_entry ||
rt->rt6i_flags & RTF_CACHE)
@@ -2249,6 +2318,7 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_IIF] = { .type = NLA_U32 },
[RTA_PRIORITY] = { .type = NLA_U32 },
[RTA_METRICS] = { .type = NLA_NESTED },
+ [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
};
static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2326,11 +2396,65 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
if (tb[RTA_TABLE])
cfg->fc_table = nla_get_u32(tb[RTA_TABLE]);
+ if (tb[RTA_MULTIPATH]) {
+ cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
+ cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
+ }
+
err = 0;
errout:
return err;
}
+static int ip6_route_multipath(struct fib6_config *cfg, int add)
+{
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 0, last_err = 0;
+
+beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+ remaining = cfg->fc_mp_len;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+ if (err) {
+ last_err = err;
+ /* If we are trying to remove a route, do not stop the
+ * loop when ip6_route_del() fails (because next hop is
+ * already gone), we should try to remove all next hops.
+ */
+ if (add) {
+ /* If add fails, we should try to delete all
+ * next hops that have been already added.
+ */
+ add = 0;
+ goto beginning;
+ }
+ }
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ return last_err;
+}
+
static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
{
struct fib6_config cfg;
@@ -2340,7 +2464,10 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_del(&cfg);
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 0);
+ else
+ return ip6_route_del(&cfg);
}
static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
@@ -2352,7 +2479,10 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
if (err < 0)
return err;
- return ip6_route_add(&cfg);
+ if (cfg.fc_mp)
+ return ip6_route_multipath(&cfg, 1);
+ else
+ return ip6_route_add(&cfg);
}
static inline size_t rt6_nlmsg_size(void)
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH net-next v8 1/1] ipv6: add support of equal cost multipath (ECMP)
2012-10-22 13:42 ` [PATCH net-next v8 1/1] ipv6: add support of equal cost multipath (ECMP) nicolas.dichtel
@ 2012-10-23 6:39 ` David Miller
2012-10-23 12:42 ` [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop" Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2012-10-23 6:39 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: joe, bernat, netdev, yoshfuji, eric.dumazet
From: nicolas.dichtel@6wind.com
Date: Mon, 22 Oct 2012 15:42:09 +0200
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>
> Each nexthop is added like a single route in the routing table. All routes
> that have the same metric/weight and destination but not the same gateway
> are considering as ECMP routes. They are linked together, through a list called
> rt6i_siblings.
>
> ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after
> the other (in both case, the flag NLM_F_EXCL should not be set).
>
> The patch is based on a previous work from
> Luc Saillard <luc.saillard@6wind.com>.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Applied, thanks.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop"
2012-10-23 6:39 ` David Miller
@ 2012-10-23 12:42 ` Nicolas Dichtel
2012-10-23 12:42 ` [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes Nicolas Dichtel
2012-10-25 16:08 ` [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop" Stephen Hemminger
0 siblings, 2 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-23 12:42 UTC (permalink / raw)
To: shemminger
Cc: netdev, joe, bernat, eric.dumazet, yoshfuji, davem,
Nicolas Dichtel
From: Vincent Bernat <bernat@luffy.cx>
IPv6 multipath routes were not accepted by "ip route" because an IPv4
address was expected for each gateway. Use `get_addr()` instead of
`get_addr32()`.
Signed-off-by: Vincent Bernat <bernat@luffy.cx>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
ip/iproute.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/ip/iproute.c b/ip/iproute.c
index 3e5f8d0..c60156f 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -625,16 +625,20 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
}
-int parse_one_nh(struct rtattr *rta, struct rtnexthop *rtnh, int *argcp, char ***argvp)
+int parse_one_nh(struct rtmsg *r, struct rtattr *rta, struct rtnexthop *rtnh, int *argcp, char ***argvp)
{
int argc = *argcp;
char **argv = *argvp;
while (++argv, --argc > 0) {
if (strcmp(*argv, "via") == 0) {
+ inet_prefix addr;
NEXT_ARG();
- rta_addattr32(rta, 4096, RTA_GATEWAY, get_addr32(*argv));
- rtnh->rtnh_len += sizeof(struct rtattr) + 4;
+ get_addr(&addr, *argv, r->rtm_family);
+ if (r->rtm_family == AF_UNSPEC)
+ r->rtm_family = addr.family;
+ rta_addattr_l(rta, 4096, RTA_GATEWAY, &addr.data, addr.bytelen);
+ rtnh->rtnh_len += sizeof(struct rtattr) + addr.bytelen;
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
if ((rtnh->rtnh_ifindex = ll_name_to_index(*argv)) == 0) {
@@ -686,7 +690,7 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
memset(rtnh, 0, sizeof(*rtnh));
rtnh->rtnh_len = sizeof(*rtnh);
rta->rta_len += rtnh->rtnh_len;
- parse_one_nh(rta, rtnh, &argc, &argv);
+ parse_one_nh(r, rta, rtnh, &argc, &argv);
rtnh = RTNH_NEXT(rtnh);
}
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes
2012-10-23 12:42 ` [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop" Nicolas Dichtel
@ 2012-10-23 12:42 ` Nicolas Dichtel
2012-10-25 16:06 ` Stephen Hemminger
2012-10-25 16:08 ` [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop" Stephen Hemminger
1 sibling, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-23 12:42 UTC (permalink / raw)
To: shemminger
Cc: netdev, joe, bernat, eric.dumazet, yoshfuji, davem,
Nicolas Dichtel
ECMPv6 routes are added one after the other by the kernel, so we should
avoid setting the NLM_F_EXCL flag.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
ip/iproute.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/ip/iproute.c b/ip/iproute.c
index c60156f..799a70e 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -694,8 +694,11 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
rtnh = RTNH_NEXT(rtnh);
}
- if (rta->rta_len > RTA_LENGTH(0))
+ if (rta->rta_len > RTA_LENGTH(0)) {
addattr_l(n, 1024, RTA_MULTIPATH, RTA_DATA(rta), RTA_PAYLOAD(rta));
+ if (r->rtm_family == AF_INET6)
+ n->nlmsg_flags &= ~NLM_F_EXCL;
+ }
return 0;
}
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes
2012-10-23 12:42 ` [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes Nicolas Dichtel
@ 2012-10-25 16:06 ` Stephen Hemminger
2012-10-25 16:20 ` Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Stephen Hemminger @ 2012-10-25 16:06 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: netdev, joe, bernat, eric.dumazet, yoshfuji, davem
On Tue, 23 Oct 2012 14:42:56 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> ECMPv6 routes are added each one after the other by the kernel, so we should
> avoid to set the flag NLM_F_EXCL.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> ip/iproute.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/ip/iproute.c b/ip/iproute.c
> index c60156f..799a70e 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
> @@ -694,8 +694,11 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
> rtnh = RTNH_NEXT(rtnh);
> }
>
> - if (rta->rta_len > RTA_LENGTH(0))
> + if (rta->rta_len > RTA_LENGTH(0)) {
> addattr_l(n, 1024, RTA_MULTIPATH, RTA_DATA(rta), RTA_PAYLOAD(rta));
> + if (r->rtm_family == AF_INET6)
> + n->nlmsg_flags &= ~NLM_F_EXCL;
> + }
> return 0;
> }
>
Shouldn't this be true for multipath IPv4 as well?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop"
2012-10-23 12:42 ` [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop" Nicolas Dichtel
2012-10-23 12:42 ` [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes Nicolas Dichtel
@ 2012-10-25 16:08 ` Stephen Hemminger
1 sibling, 0 replies; 47+ messages in thread
From: Stephen Hemminger @ 2012-10-25 16:08 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: netdev, joe, bernat, eric.dumazet, yoshfuji, davem
On Tue, 23 Oct 2012 14:42:55 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> From: Vincent Bernat <bernat@luffy.cx>
>
> IPv6 multipath routes were not accepted by "ip route" because an IPv4
> address was expected for each gateway. Use `get_addr()` instead of
> `get_addr32()`.
>
> Signed-off-by: Vincent Bernat <bernat@luffy.cx>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Looks good. Applied.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes
2012-10-25 16:06 ` Stephen Hemminger
@ 2012-10-25 16:20 ` Nicolas Dichtel
2012-10-25 16:25 ` Stephen Hemminger
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-25 16:20 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, joe, bernat, eric.dumazet, yoshfuji, davem
On 25/10/2012 18:06, Stephen Hemminger wrote:
> On Tue, 23 Oct 2012 14:42:56 +0200
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> ECMPv6 routes are added each one after the other by the kernel, so we should
>> avoid to set the flag NLM_F_EXCL.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
>> ip/iproute.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/ip/iproute.c b/ip/iproute.c
>> index c60156f..799a70e 100644
>> --- a/ip/iproute.c
>> +++ b/ip/iproute.c
>> @@ -694,8 +694,11 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
>> rtnh = RTNH_NEXT(rtnh);
>> }
>>
>> - if (rta->rta_len > RTA_LENGTH(0))
>> + if (rta->rta_len > RTA_LENGTH(0)) {
>> addattr_l(n, 1024, RTA_MULTIPATH, RTA_DATA(rta), RTA_PAYLOAD(rta));
>> + if (r->rtm_family == AF_INET6)
>> + n->nlmsg_flags &= ~NLM_F_EXCL;
>> + }
>> return 0;
>> }
>>
>
> Shouldn't this be true for multipath IPv4 as well?
>
In IPv4, the message is handled in one shot, because all nexthops are stored in
a single route. In IPv6, each nexthop is added as a separate route, and these
routes are then linked together.
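To make the IPv6 side concrete, here is a rough user-space sketch: every nexthop is its own route entry, entries with the same destination and metric are chained as siblings, and a flow hash picks one of them at lookup time. The names `struct route`, `flow_hash()` and `select_sibling()` are hypothetical, addresses are reduced to one 32-bit word each, and a simple circular singly linked list stands in for the kernel's list_head. It follows the idea of rt6_multipath_select() in the kernel patch, not its exact code.

```c
#include <stdint.h>

/* Hypothetical route entry: one per nexthop. */
struct route {
	const char *gateway;
	unsigned int nsiblings;	/* number of other routes in the ring */
	struct route *next;	/* circular list of siblings */
};

/* Fold the flow fields together and reduce them to an index in
 * [0, candidates). candidates must be at least 1 (nsiblings + 1).
 */
static unsigned int flow_hash(uint32_t src, uint32_t dst, uint32_t flowlabel,
			      unsigned int candidates)
{
	uint32_t val = src ^ dst ^ flowlabel;

	val = val ^ (val >> 7) ^ (val >> 12);
	return val % candidates;
}

/* Index 0 keeps the route found by the lookup; any other index walks
 * that many steps along the sibling ring.
 */
static struct route *select_sibling(struct route *match, unsigned int index)
{
	while (index-- > 0)
		match = match->next;
	return match;
}
```

A lookup would then call select_sibling(rt, flow_hash(src, dst, label, rt->nsiblings + 1)), so packets of one flow keep using the same nexthop while different flows spread over the siblings.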
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes
2012-10-25 16:20 ` Nicolas Dichtel
@ 2012-10-25 16:25 ` Stephen Hemminger
2012-10-25 16:48 ` Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Stephen Hemminger @ 2012-10-25 16:25 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: netdev, joe, bernat, eric.dumazet, yoshfuji, davem
On Thu, 25 Oct 2012 18:20:49 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> On 25/10/2012 18:06, Stephen Hemminger wrote:
> > On Tue, 23 Oct 2012 14:42:56 +0200
> > Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> >
> >> ECMPv6 routes are added each one after the other by the kernel, so we should
> >> avoid to set the flag NLM_F_EXCL.
> >>
> >> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> >> ---
> >> ip/iproute.c | 5 ++++-
> >> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/ip/iproute.c b/ip/iproute.c
> >> index c60156f..799a70e 100644
> >> --- a/ip/iproute.c
> >> +++ b/ip/iproute.c
> >> @@ -694,8 +694,11 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
> >> rtnh = RTNH_NEXT(rtnh);
> >> }
> >>
> >> - if (rta->rta_len > RTA_LENGTH(0))
> >> + if (rta->rta_len > RTA_LENGTH(0)) {
> >> addattr_l(n, 1024, RTA_MULTIPATH, RTA_DATA(rta), RTA_PAYLOAD(rta));
> >> + if (r->rtm_family == AF_INET6)
> >> + n->nlmsg_flags &= ~NLM_F_EXCL;
> >> + }
> >> return 0;
> >> }
> >>
> >
> > Shouldn't this be true for multipath IPv4 as well?
> >
> In IPv4, the message is handled in one shot, because all nexthops are added to
> the route at once. In IPv6, each nexthop is added as a separate route, and the
> routes are then linked together.
So it is a fundamental design flaw in how either v4 or v6 was implemented in
the kernel?
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes
2012-10-25 16:25 ` Stephen Hemminger
@ 2012-10-25 16:48 ` Nicolas Dichtel
2012-11-02 8:58 ` [RESEND PATCH net-next] ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-10-25 16:48 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, joe, bernat, eric.dumazet, yoshfuji, davem
On 25/10/2012 18:25, Stephen Hemminger wrote:
> On Thu, 25 Oct 2012 18:20:49 +0200
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> On 25/10/2012 18:06, Stephen Hemminger wrote:
>>> On Tue, 23 Oct 2012 14:42:56 +0200
>>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>>
>>>> ECMPv6 routes are added one after the other by the kernel, so we should
>>>> avoid setting the flag NLM_F_EXCL.
>>>>
>>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>>> ---
>>>> ip/iproute.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/ip/iproute.c b/ip/iproute.c
>>>> index c60156f..799a70e 100644
>>>> --- a/ip/iproute.c
>>>> +++ b/ip/iproute.c
>>>> @@ -694,8 +694,11 @@ int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r, int argc, char **argv)
>>>> rtnh = RTNH_NEXT(rtnh);
>>>> }
>>>>
>>>> - if (rta->rta_len > RTA_LENGTH(0))
>>>> + if (rta->rta_len > RTA_LENGTH(0)) {
>>>> addattr_l(n, 1024, RTA_MULTIPATH, RTA_DATA(rta), RTA_PAYLOAD(rta));
>>>> + if (r->rtm_family == AF_INET6)
>>>> + n->nlmsg_flags &= ~NLM_F_EXCL;
>>>> + }
>>>> return 0;
>>>> }
>>>>
>>>
>>> Shouldn't this be true for multipath IPv4 as well?
>>>
>> In IPv4, the message is handled in one shot, because all nexthops are added to
>> the route at once. In IPv6, each nexthop is added as a separate route, and the
>> routes are then linked together.
>
> So it is a fundamental design flaw in how either v4 or v6 was implemented in
> the kernel?
>
The way routes are managed is just different. Maybe a patch in the kernel is
more appropriate:
From b4979c97f33bc41a0fa095751bfcc05de074afec Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 25 Oct 2012 18:45:47 +0200
Subject: [PATCH] ipv6/multipath: remove flag NLM_F_EXCL after the first
nexthop
fib6_add_rt2node() will reject the nexthop if this flag is set, so
we perform the check only for the first nexthop.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
net/ipv6/route.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c42650c..9c7b5d8 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2449,6 +2449,12 @@ beginning:
goto beginning;
}
}
+ /* Because each nexthop is added as a separate route, we remove
+ * this flag after the first nexthop (if there is a collision,
+ * we have already failed to add the first nexthop:
+ * fib6_add_rt2node() has rejected it).
+ */
+ cfg->fc_nlinfo.nlh->nlmsg_flags &= ~NLM_F_EXCL;
rtnh = rtnh_next(rtnh, &remaining);
}
--
1.7.12
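
The effect of clearing the flag inside the loop can be sketched with a toy model. Everything named sketch_* is hypothetical and only illustrates the idea: the node is a stand-in for one prefix that merely counts its nexthops, and the flag is cleared on a local copy, whereas the patch above clears it in the netlink header referenced by cfg.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as NLM_F_EXCL in <linux/netlink.h>; redefined here so the
 * sketch builds without kernel headers. */
#define SKETCH_NLM_F_EXCL 0x200u

/* Hypothetical stand-in for one prefix: it only counts its nexthops. */
struct sketch_node {
	size_t nexthops;
};

/* Add one nexthop. With NLM_F_EXCL set, an already populated prefix is
 * rejected; -1 stands in for -EEXIST. */
static int sketch_add_nexthop(struct sketch_node *node, unsigned int flags)
{
	assert(node != NULL);
	if ((flags & SKETCH_NLM_F_EXCL) && node->nexthops > 0)
		return -1;
	node->nexthops++;
	return 0;
}

/* Multipath add in the style of the patch: the exclusivity check applies
 * to the first nexthop only. */
static int sketch_multipath_add(struct sketch_node *node, size_t n,
				unsigned int flags)
{
	for (size_t i = 0; i < n; i++) {
		if (sketch_add_nexthop(node, flags) != 0)
			return -1;
		/* The first nexthop went in, so a collision with an
		 * existing route has been ruled out; drop the flag for
		 * the remaining nexthops. */
		flags &= ~SKETCH_NLM_F_EXCL;
	}
	return 0;
}
```

In this model a request with the flag set succeeds on an empty prefix and still fails on a populated one, because the first nexthop is checked. Stripping the flag before the request is sent (flags of 0 here) would instead extend a populated prefix without any error.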
^ permalink raw reply related [flat|nested] 47+ messages in thread
* [RESEND PATCH net-next] ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop
2012-10-25 16:48 ` Nicolas Dichtel
@ 2012-11-02 8:58 ` Nicolas Dichtel
2012-11-03 1:38 ` David Miller
0 siblings, 1 reply; 47+ messages in thread
From: Nicolas Dichtel @ 2012-11-02 8:58 UTC (permalink / raw)
To: davem
Cc: shemminger, netdev, joe, bernat, eric.dumazet, yoshfuji,
Nicolas Dichtel
fib6_add_rt2node() will reject the nexthop if this flag is set, so
we perform the check only for the first nexthop.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
net/ipv6/route.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c42650c..9c7b5d8 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2449,6 +2449,12 @@ beginning:
goto beginning;
}
}
+ /* Because each nexthop is added as a separate route, we remove
+ * this flag after the first nexthop (if there is a collision,
+ * we have already failed to add the first nexthop:
+ * fib6_add_rt2node() has rejected it).
+ */
+ cfg->fc_nlinfo.nlh->nlmsg_flags &= ~NLM_F_EXCL;
rtnh = rtnh_next(rtnh, &remaining);
}
--
1.7.12
^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [RESEND PATCH net-next] ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop
2012-11-02 8:58 ` [RESEND PATCH net-next] ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop Nicolas Dichtel
@ 2012-11-03 1:38 ` David Miller
2012-11-05 8:30 ` Nicolas Dichtel
0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2012-11-03 1:38 UTC (permalink / raw)
To: nicolas.dichtel; +Cc: shemminger, netdev, joe, bernat, eric.dumazet, yoshfuji
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 2 Nov 2012 09:58:22 +0100
> fib6_add_rt2node() will reject the nexthop if this flag is set, so
> we perform the check only for the first nexthop.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
It seems a bit hackish, but I don't have any better ideas, so
applied, thanks.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [RESEND PATCH net-next] ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop
2012-11-03 1:38 ` David Miller
@ 2012-11-05 8:30 ` Nicolas Dichtel
0 siblings, 0 replies; 47+ messages in thread
From: Nicolas Dichtel @ 2012-11-05 8:30 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, netdev, joe, bernat, eric.dumazet, yoshfuji
On 03/11/2012 02:38, David Miller wrote:
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Fri, 2 Nov 2012 09:58:22 +0100
>
>> fib6_add_rt2node() will reject the nexthop if this flag is set, so
>> we perform the check only for the first nexthop.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>
> It seems a bit hackish, but I don't have any better ideas, so
> applied, thanks.
>
Yes, I agree. That's why I didn't include it in the initial patch ... but I also
didn't find a better way :/
^ permalink raw reply [flat|nested] 47+ messages in thread
Thread overview: 47+ messages
2012-09-06 17:30 IPv6 multipath routes Vincent Bernat
2012-09-06 17:30 ` [PATCH] Fix "ip -6 route add ... nexthop" Vincent Bernat
2012-09-06 17:53 ` Vincent Bernat
2012-09-12 8:29 ` [RFC PATCH net-next 0/1] Add support of ECMPv6 Nicolas Dichtel
2012-09-12 8:29 ` [RFC PATCH net-next 1/1] ipv6: add support of ECMP Nicolas Dichtel
2012-09-12 8:48 ` YOSHIFUJI Hideaki
2012-09-12 9:42 ` YOSHIFUJI Hideaki
2012-09-12 9:53 ` Nicolas Dichtel
2012-09-14 7:59 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Nicolas Dichtel
2012-09-14 7:59 ` [RFC PATCH net-next v2 1/1] ipv6: add support of ECMP Nicolas Dichtel
2012-09-14 9:40 ` [RFC PATCH net-next v2 0/1] Add support of ECMPv6 Vincent Bernat
2012-09-14 13:35 ` Nicolas Dichtel
2012-09-14 13:37 ` Nicolas Dichtel
2012-10-15 12:36 ` Vincent Bernat
2012-10-15 19:54 ` Vincent Bernat
2012-09-19 9:18 ` [PATCH net-next v3 " Nicolas Dichtel
2012-09-19 9:18 ` [PATCH net-next v3 1/1] ipv6: add support of ECMP Nicolas Dichtel
2012-09-20 21:15 ` David Miller
2012-09-21 9:59 ` [PATCH net-next v4 0/1] Add support of ECMPv6 Nicolas Dichtel
2012-09-21 9:59 ` [PATCH net-next v4 1/1] ipv6: add support of ECMP Nicolas Dichtel
2012-09-21 17:48 ` [PATCH net-next v4 0/1] Add support of ECMPv6 David Miller
2012-09-24 12:28 ` Nicolas Dichtel
2012-10-01 13:56 ` [PATCH net-next v5 " Nicolas Dichtel
2012-10-01 13:56 ` [PATCH net-next v5 1/1] ipv6: add support of ECMP Nicolas Dichtel
2012-10-01 16:47 ` Joe Perches
2012-10-02 16:02 ` [PATCH net-next v6 0/1] Add support of ECMPv6 Nicolas Dichtel
2012-10-02 16:02 ` [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP) Nicolas Dichtel
2012-10-02 16:06 ` Nicolas Dichtel
2012-10-02 16:14 ` Eric Dumazet
2012-10-19 9:13 ` [PATCH net-next v7 0/1] Add support of ECMPv6 nicolas.dichtel
2012-10-19 9:13 ` [PATCH net-next v7 1/1] ipv6: add support of equal cost multipath (ECMP) nicolas.dichtel
2012-10-22 0:41 ` David Miller
2012-10-22 13:42 ` [PATCH net-next v8 0/1] Add support of ECMPv6 nicolas.dichtel
2012-10-22 13:42 ` [PATCH net-next v8 1/1] ipv6: add support of equal cost multipath (ECMP) nicolas.dichtel
2012-10-23 6:39 ` David Miller
2012-10-23 12:42 ` [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop" Nicolas Dichtel
2012-10-23 12:42 ` [PATCH iproute2 2/2] ip: remove NLM_F_EXCL in case of ECMPv6 routes Nicolas Dichtel
2012-10-25 16:06 ` Stephen Hemminger
2012-10-25 16:20 ` Nicolas Dichtel
2012-10-25 16:25 ` Stephen Hemminger
2012-10-25 16:48 ` Nicolas Dichtel
2012-11-02 8:58 ` [RESEND PATCH net-next] ipv6/multipath: remove flag NLM_F_EXCL after the first nexthop Nicolas Dichtel
2012-11-03 1:38 ` David Miller
2012-11-05 8:30 ` Nicolas Dichtel
2012-10-25 16:08 ` [PATCH iproute2 1/2] ip: fix "ip -6 route add ... nexthop" Stephen Hemminger
2012-10-02 18:43 ` [PATCH net-next v6 1/1] ipv6: add support of equal cost multipath (ECMP) David Miller
2012-09-11 12:57 ` IPv6 multipath routes Ulrich Weber