From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Roopa Prabhu <roopa@cumulusnetworks.com>,
Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
Nicolas Dichtel <nicolas.dichtel@6wind.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.2 08/30] ipv6: fix multipath route replace error recovery
Date: Thu, 1 Oct 2015 11:21:30 +0200 [thread overview]
Message-ID: <20151001092038.561995704@linuxfoundation.org> (raw)
In-Reply-To: <20151001092038.213304276@linuxfoundation.org>
4.2-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roopa Prabhu <roopa@cumulusnetworks.com>
[ Upstream commit 6b9ea5a64ed5eeb3f68f2e6fcce0ed1179801d1e ]
Problem:
The ecmp route replace support for ipv6 in the kernel, deletes the
existing ecmp route too early, ie when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, its too late
to recover the already deleted existing route leaving the fib
in an inconsistent state.
This patch reduces the possibility of this by doing the following:
a) Changes the existing multipath route add code to a two stage process:
build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that most errors are caught during building rt6_infos
and we fail early
c) Separates multipath add and del code. Because add needs the special
two stage mode in a) and delete essentially does not care.
d) In any event if the code fails during inserting a route again, a
warning is printed (This should be unlikely)
Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists
$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 dissappears from
* kernel */
After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv6/route.c | 201 +++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 175 insertions(+), 26 deletions(-)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1727,7 +1727,7 @@ static int ip6_convert_metrics(struct mx
return -EINVAL;
}
-int ip6_route_add(struct fib6_config *cfg)
+int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
{
int err;
struct net *net = cfg->fc_nlinfo.nl_net;
@@ -1735,7 +1735,6 @@ int ip6_route_add(struct fib6_config *cf
struct net_device *dev = NULL;
struct inet6_dev *idev = NULL;
struct fib6_table *table;
- struct mx6_config mxc = { .mx = NULL, };
int addr_type;
if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
@@ -1941,6 +1940,32 @@ install_route:
cfg->fc_nlinfo.nl_net = dev_net(dev);
+ *rt_ret = rt;
+
+ return 0;
+out:
+ if (dev)
+ dev_put(dev);
+ if (idev)
+ in6_dev_put(idev);
+ if (rt)
+ dst_free(&rt->dst);
+
+ *rt_ret = NULL;
+
+ return err;
+}
+
+int ip6_route_add(struct fib6_config *cfg)
+{
+ struct mx6_config mxc = { .mx = NULL, };
+ struct rt6_info *rt = NULL;
+ int err;
+
+ err = ip6_route_info_create(cfg, &rt);
+ if (err)
+ goto out;
+
err = ip6_convert_metrics(&mxc, cfg);
if (err)
goto out;
@@ -1948,14 +1973,12 @@ install_route:
err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, &mxc);
kfree(mxc.mx);
+
return err;
out:
- if (dev)
- dev_put(dev);
- if (idev)
- in6_dev_put(idev);
if (rt)
dst_free(&rt->dst);
+
return err;
}
@@ -2727,19 +2750,78 @@ errout:
return err;
}
-static int ip6_route_multipath(struct fib6_config *cfg, int add)
+struct rt6_nh {
+ struct rt6_info *rt6_info;
+ struct fib6_config r_cfg;
+ struct mx6_config mxc;
+ struct list_head next;
+};
+
+static void ip6_print_replace_route_err(struct list_head *rt6_nh_list)
+{
+ struct rt6_nh *nh;
+
+ list_for_each_entry(nh, rt6_nh_list, next) {
+ pr_warn("IPV6: multipath route replace failed (check consistency of installed routes): %pI6 nexthop %pI6 ifi %d\n",
+ &nh->r_cfg.fc_dst, &nh->r_cfg.fc_gateway,
+ nh->r_cfg.fc_ifindex);
+ }
+}
+
+static int ip6_route_info_append(struct list_head *rt6_nh_list,
+ struct rt6_info *rt, struct fib6_config *r_cfg)
+{
+ struct rt6_nh *nh;
+ struct rt6_info *rtnh;
+ int err = -EEXIST;
+
+ list_for_each_entry(nh, rt6_nh_list, next) {
+ /* check if rt6_info already exists */
+ rtnh = nh->rt6_info;
+
+ if (rtnh->dst.dev == rt->dst.dev &&
+ rtnh->rt6i_idev == rt->rt6i_idev &&
+ ipv6_addr_equal(&rtnh->rt6i_gateway,
+ &rt->rt6i_gateway))
+ return err;
+ }
+
+ nh = kzalloc(sizeof(*nh), GFP_KERNEL);
+ if (!nh)
+ return -ENOMEM;
+ nh->rt6_info = rt;
+ err = ip6_convert_metrics(&nh->mxc, r_cfg);
+ if (err) {
+ kfree(nh);
+ return err;
+ }
+ memcpy(&nh->r_cfg, r_cfg, sizeof(*r_cfg));
+ list_add_tail(&nh->next, rt6_nh_list);
+
+ return 0;
+}
+
+static int ip6_route_multipath_add(struct fib6_config *cfg)
{
struct fib6_config r_cfg;
struct rtnexthop *rtnh;
+ struct rt6_info *rt;
+ struct rt6_nh *err_nh;
+ struct rt6_nh *nh, *nh_safe;
int remaining;
int attrlen;
- int err = 0, last_err = 0;
+ int err = 1;
+ int nhn = 0;
+ int replace = (cfg->fc_nlinfo.nlh &&
+ (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE));
+ LIST_HEAD(rt6_nh_list);
remaining = cfg->fc_mp_len;
-beginning:
rtnh = (struct rtnexthop *)cfg->fc_mp;
- /* Parse a Multipath Entry */
+ /* Parse a Multipath Entry and build a list (rt6_nh_list) of
+ * rt6_info structs per nexthop
+ */
while (rtnh_ok(rtnh, remaining)) {
memcpy(&r_cfg, cfg, sizeof(*cfg));
if (rtnh->rtnh_ifindex)
@@ -2755,22 +2837,32 @@ beginning:
r_cfg.fc_flags |= RTF_GATEWAY;
}
}
- err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
+
+ err = ip6_route_info_create(&r_cfg, &rt);
+ if (err)
+ goto cleanup;
+
+ err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
if (err) {
- last_err = err;
- /* If we are trying to remove a route, do not stop the
- * loop when ip6_route_del() fails (because next hop is
- * already gone), we should try to remove all next hops.
- */
- if (add) {
- /* If add fails, we should try to delete all
- * next hops that have been already added.
- */
- add = 0;
- remaining = cfg->fc_mp_len - remaining;
- goto beginning;
- }
+ dst_free(&rt->dst);
+ goto cleanup;
+ }
+
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+ err_nh = NULL;
+ list_for_each_entry(nh, &rt6_nh_list, next) {
+ err = __ip6_ins_rt(nh->rt6_info, &cfg->fc_nlinfo, &nh->mxc);
+ /* nh->rt6_info is used or freed at this point, reset to NULL*/
+ nh->rt6_info = NULL;
+ if (err) {
+ if (replace && nhn)
+ ip6_print_replace_route_err(&rt6_nh_list);
+ err_nh = nh;
+ goto add_errout;
}
+
/* Because each route is added like a single route we remove
* these flags after the first nexthop: if there is a collision,
* we have already failed to add the first nexthop:
@@ -2780,6 +2872,63 @@ beginning:
*/
cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL |
NLM_F_REPLACE);
+ nhn++;
+ }
+
+ goto cleanup;
+
+add_errout:
+ /* Delete routes that were already added */
+ list_for_each_entry(nh, &rt6_nh_list, next) {
+ if (err_nh == nh)
+ break;
+ ip6_route_del(&nh->r_cfg);
+ }
+
+cleanup:
+ list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
+ if (nh->rt6_info)
+ dst_free(&nh->rt6_info->dst);
+ if (nh->mxc.mx)
+ kfree(nh->mxc.mx);
+ list_del(&nh->next);
+ kfree(nh);
+ }
+
+ return err;
+}
+
+static int ip6_route_multipath_del(struct fib6_config *cfg)
+{
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
+ int remaining;
+ int attrlen;
+ int err = 1, last_err = 0;
+
+ remaining = cfg->fc_mp_len;
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+
+ /* Parse a Multipath Entry */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+ attrlen = rtnh_attrlen(rtnh);
+ if (attrlen > 0) {
+ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+ if (nla) {
+ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+ err = ip6_route_del(&r_cfg);
+ if (err)
+ last_err = err;
+
rtnh = rtnh_next(rtnh, &remaining);
}
@@ -2796,7 +2945,7 @@ static int inet6_rtm_delroute(struct sk_
return err;
if (cfg.fc_mp)
- return ip6_route_multipath(&cfg, 0);
+ return ip6_route_multipath_del(&cfg);
else
return ip6_route_del(&cfg);
}
@@ -2811,7 +2960,7 @@ static int inet6_rtm_newroute(struct sk_
return err;
if (cfg.fc_mp)
- return ip6_route_multipath(&cfg, 1);
+ return ip6_route_multipath_add(&cfg);
else
return ip6_route_add(&cfg);
}
next prev parent reply other threads:[~2015-10-01 9:23 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-01 9:21 [PATCH 4.2 00/30] 4.2.3-stable review Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 01/30] phylib: fix device deletion order in mdiobus_unregister() Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 02/30] sock, diag: fix panic in sock_diag_put_filterinfo Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 03/30] ipv6: fix exthdrs offload registration in out_rt path Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 04/30] net: fec: clear receive interrupts before processing a packet Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 05/30] net: eth: altera: fix napi poll_list corruption Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 06/30] net/ipv6: Correct PIM6 mrt_lock handling Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 07/30] net: dsa: bcm_sf2: Fix ageing conditions and operation Greg Kroah-Hartman
2015-10-01 9:21 ` Greg Kroah-Hartman [this message]
2015-10-01 9:21 ` [PATCH 4.2 09/30] net: dsa: bcm_sf2: Fix 64-bits register writes Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 10/30] netlink, mmap: transform mmap skb into full skb on taps Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 11/30] sctp: fix race on protocol/netns initialization Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 13/30] net: mvneta: fix DMA buffer unmapping in mvneta_rx() Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 14/30] rtnetlink: catch -EOPNOTSUPP errors from ndo_bridge_getlink Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 15/30] net/mlx4_en: really allow to change RSS key Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 16/30] macvtap: fix TUNSETSNDBUF values > 64k Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 17/30] netlink: Fix autobind race condition that leads to zero port ID Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 18/30] netlink: Replace rhash_portid with bound Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 19/30] net: dsa: actually force the speed on the CPU port Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 20/30] openvswitch: Zero flows on allocation Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 21/30] tcp: add proper TS val into RST packets Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 22/30] Fix AF_PACKET ABI breakage in 4.2 Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 23/30] net: revert "net_sched: move tp->root allocation into fw_init()" Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 24/30] fib_rules: fix fib rule dumps across multiple skbs Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 25/30] ppp: fix lockdep splat in ppp_dev_uninit() Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 26/30] net: dsa: bcm_sf2: Do not override speed settings Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 27/30] net: phy: fixed_phy: handle link-down case Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 28/30] of_mdio: add new DT property managed to specify the PHY management type Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 29/30] mvneta: use inband status only when explicitly enabled Greg Kroah-Hartman
2015-10-01 9:21 ` [PATCH 4.2 30/30] net/mlx4_core: Capping number of requested MSIXs to MAX_MSIX Greg Kroah-Hartman
2015-10-02 1:27 ` [PATCH 4.2 00/30] 4.2.3-stable review Guenter Roeck
2015-10-03 11:37 ` Greg Kroah-Hartman
2015-10-03 14:45 ` Guenter Roeck
2015-10-18 0:45 ` Greg Kroah-Hartman
2015-10-02 5:14 ` Sudip Mukherjee
2015-10-03 11:36 ` Greg Kroah-Hartman
2015-10-02 15:41 ` Shuah Khan
2015-10-03 11:36 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151001092038.561995704@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=nicolas.dichtel@6wind.com \
--cc=nikolay@cumulusnetworks.com \
--cc=roopa@cumulusnetworks.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).