netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes
@ 2017-02-02 20:37 David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 1/5] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: David Ahern @ 2017-02-02 20:37 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

This series closes a couple of gaps between IPv4 and IPv6 with respect
to multipath routes:

1. IPv4 allows all nexthops of multipath routes to be deleted using just
   the prefix and length; IPv6 only deletes the first nexthop for the
   route if only the prefix and length are given.

2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
   IPv6 returns a series of routes with the same prefix and length - one
   for each nexthop. This happens for both dumps and notifications.

IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
series of routes.

Patch 1 addresses the first item by allowing IPv6 multipath routes to be
deleted using just the prefix and length. Patch 2 addresses the second
allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH.

Patches 3 and 4 upate the RTM_{NEW,DEL}ROUTE notifications to generate
1 notification with RTA_MULTIPATH where applicable.

Patch 5 prints IPv6 addresses in compressed format when showing route
replace errors. This was noticed testing REPLACE failures.

The end result for multipath routes:
1. Dump
   - RTA_MULTIPATH used for multipath routes

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 metric 1024
	    nexthop via 2001:db8:1::2  dev eth1 weight 1
	    nexthop via 2001:db8:2::2  dev eth2 weight 1
    ...


2. Route Add
   - one notification with RTA_MULTIPATH attribute

    $ ip -6 ro add vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::2 nexthop via 2001:db8:2::2 

    $ ip mon route
    2001:db8:200::/120 table red metric 1024
	nexthop via 2001:db8:1::2  dev eth1 weight 1
	nexthop via 2001:db8:2::2  dev eth2 weight 1

2. Route Replace
   - one notification with RTA_MULTIPATH attribute

    $ ip -6 ro replace vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::16 nexthop via 2001:db8:2::16

    $ ip mon route
    Replaced 2001:db8:200::/120 table red metric 1024
	    nexthop via 2001:db8:1::16  dev eth1 weight 1
	    nexthop via 2001:db8:2::16  dev eth2 weight 1

   - on a failure after the insertion of the first nexthop (which means
     the original route has been replaced in the FIB), a notification is
     sent with the successful nexthops and then the nexthops are deleted
     with one notification per hop. This is consistent with how it works
     today except the successful additions are coalesced into 1
     notification.


3. Route Delete
   - delete of entire multipath route using prefix/length only 1
     notification is generated:
    $ ip -6 ro del vrf red 2001:db8:200::/120

    $ ip mon route
    Deleted 2001:db8:200::/120 table red metric 1024
	    nexthop via 2001:db8:1::16  dev eth1 weight 1
	    nexthop via 2001:db8:2::16  dev eth2 weight 1

   - if a delete request contains nexthops one notification is
     generated per nexthop deleted. This is unavoidable since IPv6
     alllows a single nexthop to be deleted within a multipath route


4. Route Appends
   - IPv6 allows nexthops to be appended to an existing route. In this
     case one notification is sent for the new route with the append
     flag set.

    $ ip -6 ro append vrf red 2001:db8:200::/120 nexthop via 2001:db8:2::20 nexthop via 2001:db8:1::20

    $ ip mon route
    Append 2001:db8:200::/120 table red metric 1024
	    nexthop via 2001:db8:1::2  dev eth1 weight 1
	    nexthop via 2001:db8:2::2  dev eth2 weight 1
	    nexthop via 2001:db8:2::20  dev eth2 weight 1
	    nexthop via 2001:db8:1::20  dev eth1 weight 1

  - on failure of an append, a notification is sent with the route
    containing all of the nexthops successfully added, and it is
    followed by delete notifications as the hops are removed
    returning the route to its prior state. This is consistent with
    how it works today except the successful additions are coalesced
    into 1 notification.

Addresses some of the inconsistencies also noted by Roopa at netdev0.1:
https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_slides.pdf

v4
- changed series to do encoding in 1 patch and updating notificatons
  in separate patches to make it easier to review and understand

- 1 notification for delete when using prefix/length; 1 notification for
  append

- handle delete of a single nexthop without RTA_MULTIPATH in delete request

- upated commit messages and cover letter


v3
- removed the need for a user API to opt-in to change. Requiring an
  API just shifts the difference from same API with different
  behavior to different API to achieve equivalent behavior
- route notifications changed to use RTA_MULTIPATH for add and replace
- upated commit messages and cover letter

v2
- fixed locking in patch 1 as noted by DaveM
- changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
  set in rtm_flags
- revamped explanation of patch 2 and cover letter

David Ahern (5):
  net: ipv6: Allow shorthand delete of all nexthops in multipath route
  net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH
    attribute
  net: ipv6: Change notifications for multipath add to RTA_MULTIPATH
  net: ipv6: Change notifications for multipath delete to RTA_MULTIPATH
  net: ipv6: Use compressed IPv6 addresses showing route replace error

 include/net/ip6_fib.h |   4 +-
 include/net/netlink.h |   1 +
 net/ipv6/ip6_fib.c    |  19 ++++-
 net/ipv6/route.c      | 228 +++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 227 insertions(+), 25 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net-next v4 1/5] net: ipv6: Allow shorthand delete of all nexthops in multipath route
  2017-02-02 20:37 [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Ahern
@ 2017-02-02 20:37 ` David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 2/5] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute David Ahern
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: David Ahern @ 2017-02-02 20:37 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

IPv4 allows multipath routes to be deleted using just the prefix and
length. For example:
    $ ip ro ls vrf red
    unreachable default metric 8192
    1.1.1.0/24
        nexthop via 10.100.1.254  dev eth1 weight 1
        nexthop via 10.11.200.2  dev eth11.200 weight 1
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

    $ ip ro del 1.1.1.0/24 vrf red

    $ ip ro ls vrf red
    unreachable default metric 8192
    10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
    10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

The same notation does not work with IPv6 because of how multipath routes
are implemented for IPv6. For IPv6 only the first nexthop of a multipath
route is deleted if the request contains only a prefix and length. This
leads to unnecessary complexity in userspace dealing with IPv6 multipath
routes.

This patch allows all nexthops to be deleted without specifying each one
in the delete request. Internally, this is done by walking the sibling
list of the route matching the specifications given (prefix, length,
metric, protocol, etc).

    $  ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 via 2001:db8:1::2 dev eth1 metric 1024  pref medium
    2001:db8:200::/120 via 2001:db8:2::2 dev eth2 metric 1024  pref medium
    ...

    $ ip -6 ro del vrf red 2001:db8:200::/120

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    ...

Because IPv6 allows individual nexthops to be deleted without deleting
the entire route, the ip6_route_multipath_del and non-multipath code
path (ip6_route_del) have to be discriminated so that all nexthops are
only deleted for the latter case. This is done by making the existing
fc_type in fib6_config a u16 and then adding a new u16 field with
fc_delete_all_nh as the first bit.

Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v4
- handle the case where a single nexthop is deleted without the
  RTA_MULTIPATH encoding:
     ip -6 ro del vrf red 2001:db8:200::/120 via 2001:db8:1::2

v3
- removed need for RTM_F_ALL_NEXTHOPS user api

v2
- fixed locking deleting route and its siblings as noted by DaveM

v2' (patch originally submitted standalone)
- switched examples to rfc 3849 documentation address per request
- changed delete loop to explicitly look at siblings list for
  first route matching specs given (metric, protocol, etc)

 include/net/ip6_fib.h |  4 +++-
 net/ipv6/route.c      | 38 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index a74e2aa40ef4..c979c878df1c 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -37,7 +37,9 @@ struct fib6_config {
 	int		fc_ifindex;
 	u32		fc_flags;
 	u32		fc_protocol;
-	u32		fc_type;	/* only 8 bits are used */
+	u16		fc_type;        /* only 8 bits are used */
+	u16		fc_delete_all_nh : 1,
+			__unused : 15;
 
 	struct in6_addr	fc_dst;
 	struct in6_addr	fc_src;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2563331b0532..466199cab8cc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2143,6 +2143,34 @@ int ip6_del_rt(struct rt6_info *rt)
 	return __ip6_del_rt(rt, &info);
 }
 
+static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
+{
+	struct nl_info *info = &cfg->fc_nlinfo;
+	struct fib6_table *table;
+	int err;
+
+	table = rt->rt6i_table;
+	write_lock_bh(&table->tb6_lock);
+
+	if (rt->rt6i_nsiblings && cfg->fc_delete_all_nh) {
+		struct rt6_info *sibling, *next_sibling;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings,
+					 rt6i_siblings) {
+			err = fib6_del(sibling, info);
+			if (err)
+				goto out;
+		}
+	}
+
+	err = fib6_del(rt, info);
+out:
+	write_unlock_bh(&table->tb6_lock);
+	ip6_rt_put(rt);
+	return err;
+}
+
 static int ip6_route_del(struct fib6_config *cfg)
 {
 	struct fib6_table *table;
@@ -2179,7 +2207,11 @@ static int ip6_route_del(struct fib6_config *cfg)
 			dst_hold(&rt->dst);
 			read_unlock_bh(&table->tb6_lock);
 
-			return __ip6_del_rt(rt, &cfg->fc_nlinfo);
+			/* if gateway was specified only delete the one hop */
+			if (cfg->fc_flags & RTF_GATEWAY)
+				return __ip6_del_rt(rt, &cfg->fc_nlinfo);
+
+			return __ip6_del_rt_siblings(rt, cfg);
 		}
 	}
 	read_unlock_bh(&table->tb6_lock);
@@ -3141,8 +3173,10 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 
 	if (cfg.fc_mp)
 		return ip6_route_multipath_del(&cfg);
-	else
+	else {
+		cfg.fc_delete_all_nh = 1;
 		return ip6_route_del(&cfg);
+	}
 }
 
 static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next v4 2/5] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute
  2017-02-02 20:37 [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 1/5] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
@ 2017-02-02 20:37 ` David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 3/5] net: ipv6: Change notifications for multipath add to RTA_MULTIPATH David Ahern
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: David Ahern @ 2017-02-02 20:37 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

IPv6 returns multipath routes as a series of individual routes making
their display and handling by userspace different and more complicated
than IPv4, putting the burden on the user to see that a route is part of
a multipath route and internally creating a multipath route if desired
(e.g., libnl does this as of commit 29b71371e764). This patch addresses
this difference, allowing multipath routes to be returned using the
RTA_MULTIPATH attribute.

The end result is that IPv6 multipath routes can be treated and displayed
in a format similar to IPv4:

    $ ip -6 ro ls vrf red
    2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
    2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
    2001:db8:200::/120 metric 1024
	    nexthop via 2001:db8:1::2  dev eth1 weight 1
	    nexthop via 2001:db8:2::2  dev eth2 weight 1

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v4
- reduced this patch to just RT_MULTIPATH encoding
- account for RT_MULTIPATH in rt6_nlmsg_size

v3
- dropped RTM_F_ALL_NEXTHOPS

v2
- changed user api to opt in to new behavior from attribute appended to
  the request to requiring an rtmsg struct with the RTM_F_ALL_NEXTHOPS
  set

 net/ipv6/ip6_fib.c |  10 +++++
 net/ipv6/route.c   | 112 +++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 105 insertions(+), 17 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index febde6c112bf..1bf5e22fb95d 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -318,6 +318,16 @@ static int fib6_dump_node(struct fib6_walker *w)
 			w->leaf = rt;
 			return 1;
 		}
+
+		/* Multipath routes are dumped in one route with the
+		 * RTA_MULTIPATH attribute. Jump 'rt' to point to the
+		 * last sibling of this route (no need to dump the
+		 * sibling routes again)
+		 */
+		if (rt->rt6i_nsiblings)
+			rt = list_last_entry(&rt->rt6i_siblings,
+					     struct rt6_info,
+					     rt6i_siblings);
 	}
 	w->leaf = NULL;
 	return 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 466199cab8cc..8d9970163e33 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3194,8 +3194,20 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return ip6_route_add(&cfg);
 }
 
-static inline size_t rt6_nlmsg_size(struct rt6_info *rt)
+static size_t rt6_nlmsg_size(struct rt6_info *rt)
 {
+	int nexthop_len = 0;
+
+	if (rt->rt6i_nsiblings) {
+		nexthop_len = nla_total_size(0)	 /* RTA_MULTIPATH */
+			    + NLA_ALIGN(sizeof(struct rtnexthop))
+			    + nla_total_size(16) /* RTA_GATEWAY */
+			    + nla_total_size(4)  /* RTA_OIF */
+			    + lwtunnel_get_encap_size(rt->dst.lwtstate);
+
+		nexthop_len *= rt->rt6i_nsiblings;
+	}
+
 	return NLMSG_ALIGN(sizeof(struct rtmsg))
 	       + nla_total_size(16) /* RTA_SRC */
 	       + nla_total_size(16) /* RTA_DST */
@@ -3209,7 +3221,62 @@ static inline size_t rt6_nlmsg_size(struct rt6_info *rt)
 	       + nla_total_size(sizeof(struct rta_cacheinfo))
 	       + nla_total_size(TCP_CA_NAME_MAX) /* RTAX_CC_ALGO */
 	       + nla_total_size(1) /* RTA_PREF */
-	       + lwtunnel_get_encap_size(rt->dst.lwtstate);
+	       + lwtunnel_get_encap_size(rt->dst.lwtstate)
+	       + nexthop_len;
+}
+
+static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
+			    unsigned int *flags)
+{
+	if (!netif_running(rt->dst.dev) || !netif_carrier_ok(rt->dst.dev)) {
+		*flags |= RTNH_F_LINKDOWN;
+		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+			*flags |= RTNH_F_DEAD;
+	}
+
+	if (rt->rt6i_flags & RTF_GATEWAY) {
+		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
+			goto nla_put_failure;
+	}
+
+	if (rt->dst.dev &&
+	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
+		goto nla_put_failure;
+
+	if (rt->dst.lwtstate &&
+	    lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt)
+{
+	struct rtnexthop *rtnh;
+	unsigned int flags = 0;
+
+	rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
+	if (!rtnh)
+		goto nla_put_failure;
+
+	rtnh->rtnh_hops = 0;
+	rtnh->rtnh_ifindex = rt->dst.dev ? rt->dst.dev->ifindex : 0;
+
+	if (rt6_nexthop_info(skb, rt, &flags) < 0)
+		goto nla_put_failure;
+
+	rtnh->rtnh_flags = flags;
+
+	/* length of rtnetlink header + attributes */
+	rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *)rtnh;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
 }
 
 static int rt6_fill_node(struct net *net,
@@ -3263,11 +3330,6 @@ static int rt6_fill_node(struct net *net,
 	else
 		rtm->rtm_type = RTN_UNICAST;
 	rtm->rtm_flags = 0;
-	if (!netif_running(rt->dst.dev) || !netif_carrier_ok(rt->dst.dev)) {
-		rtm->rtm_flags |= RTNH_F_LINKDOWN;
-		if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
-			rtm->rtm_flags |= RTNH_F_DEAD;
-	}
 	rtm->rtm_scope = RT_SCOPE_UNIVERSE;
 	rtm->rtm_protocol = rt->rt6i_protocol;
 	if (rt->rt6i_flags & RTF_DYNAMIC)
@@ -3331,17 +3393,35 @@ static int rt6_fill_node(struct net *net,
 	if (rtnetlink_put_metrics(skb, metrics) < 0)
 		goto nla_put_failure;
 
-	if (rt->rt6i_flags & RTF_GATEWAY) {
-		if (nla_put_in6_addr(skb, RTA_GATEWAY, &rt->rt6i_gateway) < 0)
-			goto nla_put_failure;
-	}
-
-	if (rt->dst.dev &&
-	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
-		goto nla_put_failure;
 	if (nla_put_u32(skb, RTA_PRIORITY, rt->rt6i_metric))
 		goto nla_put_failure;
 
+	/* For multipath routes, walk the siblings list and add
+	 * each as a nexthop within RTA_MULTIPATH.
+	 */
+	if (rt->rt6i_nsiblings) {
+		struct rt6_info *sibling, *next_sibling;
+		struct nlattr *mp;
+
+		mp = nla_nest_start(skb, RTA_MULTIPATH);
+		if (!mp)
+			goto nla_put_failure;
+
+		if (rt6_add_nexthop(skb, rt) < 0)
+			goto nla_put_failure;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings, rt6i_siblings) {
+			if (rt6_add_nexthop(skb, sibling) < 0)
+				goto nla_put_failure;
+		}
+
+		nla_nest_end(skb, mp);
+	} else {
+		if (rt6_nexthop_info(skb, rt, &rtm->rtm_flags) < 0)
+			goto nla_put_failure;
+	}
+
 	expires = (rt->rt6i_flags & RTF_EXPIRES) ? rt->dst.expires - jiffies : 0;
 
 	if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, rt->dst.error) < 0)
@@ -3350,8 +3430,6 @@ static int rt6_fill_node(struct net *net,
 	if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->rt6i_flags)))
 		goto nla_put_failure;
 
-	if (lwtunnel_fill_encap(skb, rt->dst.lwtstate) < 0)
-		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
 	return 0;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next v4 3/5] net: ipv6: Change notifications for multipath add to RTA_MULTIPATH
  2017-02-02 20:37 [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 1/5] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 2/5] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute David Ahern
@ 2017-02-02 20:37 ` David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 4/5] net: ipv6: Change notifications for multipath delete " David Ahern
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: David Ahern @ 2017-02-02 20:37 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

Change ip6_route_multipath_add to send one notifciation with the full
route encoded with RTA_MULTIPATH instead of a series of individual routes.
This is done by adding a skip_notify flag to the nl_info struct. The
flag is used to skip sending of the notification in the fib code that
actually inserts the route. Once the full route has been added, a
notification is generated with all nexthops.

ip6_route_multipath_add handles 3 use cases: new routes, route replace,
and route append. The multipath notification generated needs to be
consistent with the order of the nexthops and it should be consistent
with the order in a FIB dump which means the route with the first nexthop
needs to be used as the route reference. For the first 2 cases (new and
replace), a reference to the route used to send the notification is
obtained by saving the first route added. For the append case, the last
route added is used to loop back to its first sibling route which is
the first nexthop in the multipath route.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v4
- new standalone patch in v4 of set

 include/net/netlink.h |  1 +
 net/ipv6/ip6_fib.c    |  6 ++++--
 net/ipv6/route.c      | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index d3938f11ae52..b239fcd33d80 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -229,6 +229,7 @@ struct nl_info {
 	struct nlmsghdr		*nlh;
 	struct net		*nl_net;
 	u32			portid;
+	bool			skip_notify;
 };
 
 int netlink_rcv_skb(struct sk_buff *skb,
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 1bf5e22fb95d..99c68ce6ef78 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -881,7 +881,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		*ins = rt;
 		rt->rt6i_node = fn;
 		atomic_inc(&rt->rt6i_ref);
-		inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
+		if (!info->skip_notify)
+			inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
 		info->nl_net->ipv6.rt6_stats->fib_rt_entries++;
 
 		if (!(fn->fn_flags & RTN_RTINFO)) {
@@ -907,7 +908,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		rt->rt6i_node = fn;
 		rt->dst.rt6_next = iter->dst.rt6_next;
 		atomic_inc(&rt->rt6i_ref);
-		inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
+		if (!info->skip_notify)
+			inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
 		if (!(fn->fn_flags & RTN_RTINFO)) {
 			info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
 			fn->fn_flags |= RTN_RTINFO;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8d9970163e33..b2f5c6f18e1d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3022,13 +3022,37 @@ static int ip6_route_info_append(struct list_head *rt6_nh_list,
 	return 0;
 }
 
+static void ip6_route_mpath_notify(struct rt6_info *rt,
+				   struct rt6_info *rt_last,
+				   struct nl_info *info,
+				   __u16 nlflags)
+{
+	/* if this is an APPEND route, then rt points to the first route
+	 * inserted and rt_last points to last route inserted. Userspace
+	 * wants a consistent dump of the route which starts at the first
+	 * nexthop. Since sibling routes are always added at the end of
+	 * the list, find the first sibling of the last route appended
+	 */
+	if ((nlflags & NLM_F_APPEND) && rt_last && rt_last->rt6i_nsiblings) {
+		rt = list_first_entry(&rt_last->rt6i_siblings,
+				      struct rt6_info,
+				      rt6i_siblings);
+	}
+
+	if (rt)
+		inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
+}
+
 static int ip6_route_multipath_add(struct fib6_config *cfg)
 {
+	struct rt6_info *rt_notif = NULL, *rt_last = NULL;
+	struct nl_info *info = &cfg->fc_nlinfo;
 	struct fib6_config r_cfg;
 	struct rtnexthop *rtnh;
 	struct rt6_info *rt;
 	struct rt6_nh *err_nh;
 	struct rt6_nh *nh, *nh_safe;
+	__u16 nlflags;
 	int remaining;
 	int attrlen;
 	int err = 1;
@@ -3037,6 +3061,10 @@ static int ip6_route_multipath_add(struct fib6_config *cfg)
 		       (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE));
 	LIST_HEAD(rt6_nh_list);
 
+	nlflags = replace ? NLM_F_REPLACE : NLM_F_CREATE;
+	if (info->nlh && info->nlh->nlmsg_flags & NLM_F_APPEND)
+		nlflags |= NLM_F_APPEND;
+
 	remaining = cfg->fc_mp_len;
 	rtnh = (struct rtnexthop *)cfg->fc_mp;
 
@@ -3079,9 +3107,20 @@ static int ip6_route_multipath_add(struct fib6_config *cfg)
 		rtnh = rtnh_next(rtnh, &remaining);
 	}
 
+	/* for add and replace send one notification with all nexthops.
+	 * Skip the notification in fib6_add_rt2node and send one with
+	 * the full route when done
+	 */
+	info->skip_notify = 1;
+
 	err_nh = NULL;
 	list_for_each_entry(nh, &rt6_nh_list, next) {
-		err = __ip6_ins_rt(nh->rt6_info, &cfg->fc_nlinfo, &nh->mxc);
+		rt_last = nh->rt6_info;
+		err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc);
+		/* save reference to first route for notification */
+		if (!rt_notif && !err)
+			rt_notif = nh->rt6_info;
+
 		/* nh->rt6_info is used or freed at this point, reset to NULL*/
 		nh->rt6_info = NULL;
 		if (err) {
@@ -3103,9 +3142,18 @@ static int ip6_route_multipath_add(struct fib6_config *cfg)
 		nhn++;
 	}
 
+	/* success ... tell user about new route */
+	ip6_route_mpath_notify(rt_notif, rt_last, info, nlflags);
 	goto cleanup;
 
 add_errout:
+	/* send notification for routes that were added so that
+	 * the delete notifications sent by ip6_route_del are
+	 * coherent
+	 */
+	if (rt_notif)
+		ip6_route_mpath_notify(rt_notif, rt_last, info, nlflags);
+
 	/* Delete routes that were already added */
 	list_for_each_entry(nh, &rt6_nh_list, next) {
 		if (err_nh == nh)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next v4 4/5] net: ipv6: Change notifications for multipath delete to RTA_MULTIPATH
  2017-02-02 20:37 [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Ahern
                   ` (2 preceding siblings ...)
  2017-02-02 20:37 ` [PATCH net-next v4 3/5] net: ipv6: Change notifications for multipath add to RTA_MULTIPATH David Ahern
@ 2017-02-02 20:37 ` David Ahern
  2017-02-02 20:37 ` [PATCH net-next v4 5/5] net: ipv6: Use compressed IPv6 addresses showing route replace error David Ahern
  2017-02-05  0:58 ` [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Ahern @ 2017-02-02 20:37 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

If an entire multipath route is deleted using prefix and len (without any
nexthops), send a single RTM_DELROUTE notification with the full route
using RTA_MULTIPATH. This is done by generating the skb before the route
delete when all of the sibling routes are still present but sending it
after the route has been removed from the FIB. The skip_notify flag
is used to tell the lower fib code not to send notifications for the
individual nexthop routes.

If a route is deleted using RTA_MULTIPATH for any nexthops or a single
nexthop entry is deleted, then the nexthops are deleted one at a time with
notifications sent as each hop is deleted. This is necessary given that
IPv6 allows individual hops within a route to be deleted.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v4
- new standalone patch for v4

 net/ipv6/ip6_fib.c |  3 ++-
 net/ipv6/route.c   | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 99c68ce6ef78..e4266746e4a2 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1454,7 +1454,8 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
 
 	fib6_purge_rt(rt, fn, net);
 
-	inet6_rt_notify(RTM_DELROUTE, rt, info, 0);
+	if (!info->skip_notify)
+		inet6_rt_notify(RTM_DELROUTE, rt, info, 0);
 	rt6_release(rt);
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b2f5c6f18e1d..d229c17118e5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -98,6 +98,12 @@ static void		rt6_do_redirect(struct dst_entry *dst, struct sock *sk,
 					struct sk_buff *skb);
 static void		rt6_dst_from_metrics_check(struct rt6_info *rt);
 static int rt6_score_route(struct rt6_info *rt, int oif, int strict);
+static size_t rt6_nlmsg_size(struct rt6_info *rt);
+static int rt6_fill_node(struct net *net,
+			 struct sk_buff *skb, struct rt6_info *rt,
+			 struct in6_addr *dst, struct in6_addr *src,
+			 int iif, int type, u32 portid, u32 seq,
+			 unsigned int flags);
 
 #ifdef CONFIG_IPV6_ROUTE_INFO
 static struct rt6_info *rt6_add_route_info(struct net *net,
@@ -2146,6 +2152,7 @@ int ip6_del_rt(struct rt6_info *rt)
 static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
 {
 	struct nl_info *info = &cfg->fc_nlinfo;
+	struct sk_buff *skb = NULL;
 	struct fib6_table *table;
 	int err;
 
@@ -2155,6 +2162,20 @@ static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
 	if (rt->rt6i_nsiblings && cfg->fc_delete_all_nh) {
 		struct rt6_info *sibling, *next_sibling;
 
+		/* prefer to send a single notification with all hops */
+		skb = nlmsg_new(rt6_nlmsg_size(rt), gfp_any());
+		if (skb) {
+			u32 seq = info->nlh ? info->nlh->nlmsg_seq : 0;
+
+			if (rt6_fill_node(info->nl_net, skb, rt,
+					  NULL, NULL, 0, RTM_DELROUTE,
+					  info->portid, seq, 0) < 0) {
+				kfree_skb(skb);
+				skb = NULL;
+			} else
+				info->skip_notify = 1;
+		}
+
 		list_for_each_entry_safe(sibling, next_sibling,
 					 &rt->rt6i_siblings,
 					 rt6i_siblings) {
@@ -2168,6 +2189,11 @@ static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
 out:
 	write_unlock_bh(&table->tb6_lock);
 	ip6_rt_put(rt);
+
+	if (skb) {
+		rtnl_notify(skb, info->nl_net, info->portid, RTNLGRP_IPV6_ROUTE,
+			    info->nlh, gfp_any());
+	}
 	return err;
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next v4 5/5] net: ipv6: Use compressed IPv6 addresses showing route replace error
  2017-02-02 20:37 [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Ahern
                   ` (3 preceding siblings ...)
  2017-02-02 20:37 ` [PATCH net-next v4 4/5] net: ipv6: Change notifications for multipath delete " David Ahern
@ 2017-02-02 20:37 ` David Ahern
  2017-02-05  0:58 ` [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Ahern @ 2017-02-02 20:37 UTC (permalink / raw)
  To: netdev; +Cc: roopa, nicolas.dichtel, David Ahern

ip6_print_replace_route_err logs an error if a route replace fails with
IPv6 addresses in the full format. e.g,:

IPv6: IPV6: multipath route replace failed (check consistency of installed routes): 2001:0db8:0200:0000:0000:0000:0000:0000 nexthop 2001:0db8:0001:0000:0000:0000:0000:0016 ifi 0

Change the message to dump the addresses in the compressed format.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v4
- no change from v3

v3
- new patch for v3 of set

 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d229c17118e5..9689b111d642 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3009,7 +3009,7 @@ static void ip6_print_replace_route_err(struct list_head *rt6_nh_list)
 	struct rt6_nh *nh;
 
 	list_for_each_entry(nh, rt6_nh_list, next) {
-		pr_warn("IPV6: multipath route replace failed (check consistency of installed routes): %pI6 nexthop %pI6 ifi %d\n",
+		pr_warn("IPV6: multipath route replace failed (check consistency of installed routes): %pI6c nexthop %pI6c ifi %d\n",
 		        &nh->r_cfg.fc_dst, &nh->r_cfg.fc_gateway,
 		        nh->r_cfg.fc_ifindex);
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes
  2017-02-02 20:37 [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Ahern
                   ` (4 preceding siblings ...)
  2017-02-02 20:37 ` [PATCH net-next v4 5/5] net: ipv6: Use compressed IPv6 addresses showing route replace error David Ahern
@ 2017-02-05  0:58 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2017-02-05  0:58 UTC (permalink / raw)
  To: dsa; +Cc: netdev, roopa, nicolas.dichtel

From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu,  2 Feb 2017 12:37:07 -0800

> This series closes a couple of gaps between IPv4 and IPv6 with respect
> to multipath routes:
 ...

Series applied, thanks David.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-02-05  0:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-02 20:37 [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Ahern
2017-02-02 20:37 ` [PATCH net-next v4 1/5] net: ipv6: Allow shorthand delete of all nexthops in multipath route David Ahern
2017-02-02 20:37 ` [PATCH net-next v4 2/5] net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute David Ahern
2017-02-02 20:37 ` [PATCH net-next v4 3/5] net: ipv6: Change notifications for multipath add to RTA_MULTIPATH David Ahern
2017-02-02 20:37 ` [PATCH net-next v4 4/5] net: ipv6: Change notifications for multipath delete " David Ahern
2017-02-02 20:37 ` [PATCH net-next v4 5/5] net: ipv6: Use compressed IPv6 addresses showing route replace error David Ahern
2017-02-05  0:58 ` [PATCH net-next v4 0/5] net: ipv6: Improve user experience with multipath routes David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).