netdev.vger.kernel.org archive mirror
* [patch net-next 00/17] mlxsw: Support for IPv6 UC router
@ 2017-07-19  7:02 Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 01/17] net: core: Make the FIB notification chain generic Jiri Pirko
                   ` (16 more replies)
  0 siblings, 17 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Jiri Pirko <jiri@mellanox.com>

This set adds support for IPv6 unicast routes offload. The first four
patches make the FIB notification chain generic so that it could be used
by address families other than IPv4. This is done by having each address
family register its callbacks with the common code, so that its FIB tables
and rules could be dumped upon registration to the chain, while ensuring
the integrity of the dump. The exact mechanics are explained in detail in
the first patch.

The next seven patches build upon this work and add the necessary
callbacks in IPv6 code. This allows listeners of the chain to receive
notifications about the addition, deletion and replacement of IPv6
routes, as well as FIB rules notifications.

Unlike user space notifications for IPv6 multipath routes, the FIB
notification chain notifies these on a per-nexthop basis. This keeps the
common code lean, and grouping the nexthops into a single notification is
unnecessary here: in-kernel notifications are serialized by each table's
lock, whereas applications maintaining netlink caches may suffer from
concurrent dumps and deletions / additions of routes.

The last six patches enable the mlxsw driver to offload IPv6 unicast
routes to the Spectrum ASIC. Without resorting to ACLs, lookup is done
solely based on the destination IP, so the abort mechanism is invoked
upon the addition of source-specific routes.

Follow-up patch sets will increase the scale of gatewayed routes by
consolidating identical nexthop groups to one adjacency entry in the
device's adjacency table (as in IPv4), as well as add support for
NH_{ADD,DEL} events which enable support for the
'ignore_routes_with_linkdown' sysctl.

Ido Schimmel (17):
  net: core: Make the FIB notification chain generic
  mlxsw: spectrum_router: Ignore address families other than IPv4
  rocker: Ignore address families other than IPv4
  net: fib_rules: Implement notification logic in core
  ipv6: fib_rules: Check if rule is a default rule
  ipv6: fib: Add FIB notifiers callbacks
  ipv6: fib: Add in-kernel notifications for route add / delete
  ipv6: fib_rules: Dump rules during registration to FIB chain
  ipv6: fib: Dump tables during registration to FIB chain
  ipv6: fib: Add offload indication to routes
  ipv6: fib: Allow non-FIB users to take reference on route
  mlxsw: spectrum_router: Demultiplex FIB event based on family
  mlxsw: spectrum_router: Sanitize IPv6 FIB rules
  mlxsw: spectrum_router: Add support for IPv6 routes addition /
    deletion
  mlxsw: spectrum_router: Add support for route replace
  mlxsw: spectrum_router: Abort on source-specific routes
  mlxsw: spectrum_router: Don't ignore IPv6 notifications

 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 815 ++++++++++++++++++++-
 drivers/net/ethernet/rocker/rocker_main.c          |   5 +
 include/net/fib_notifier.h                         |  44 ++
 include/net/fib_rules.h                            |   9 +
 include/net/ip6_fib.h                              |  52 ++
 include/net/ip_fib.h                               |  54 +-
 include/net/net_namespace.h                        |   1 +
 include/net/netns/ipv4.h                           |   1 +
 include/net/netns/ipv6.h                           |   1 +
 include/uapi/linux/ipv6_route.h                    |   1 +
 net/core/Makefile                                  |   3 +-
 net/core/fib_notifier.c                            | 164 +++++
 net/core/fib_rules.c                               |  63 ++
 net/ipv4/fib_frontend.c                            |  17 +-
 net/ipv4/fib_notifier.c                            |  99 ++-
 net/ipv4/fib_rules.c                               |  44 +-
 net/ipv4/fib_semantics.c                           |   9 +-
 net/ipv4/fib_trie.c                                |   5 +-
 net/ipv6/Makefile                                  |   2 +-
 net/ipv6/fib6_notifier.c                           |  61 ++
 net/ipv6/fib6_rules.c                              |  31 +
 net/ipv6/ip6_fib.c                                 | 126 +++-
 net/ipv6/route.c                                   |  23 +-
 23 files changed, 1459 insertions(+), 171 deletions(-)
 create mode 100644 include/net/fib_notifier.h
 create mode 100644 net/core/fib_notifier.c
 create mode 100644 net/ipv6/fib6_notifier.c

-- 
2.9.3


* [patch net-next 01/17] net: core: Make the FIB notification chain generic
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19 14:11   ` David Ahern
  2017-07-19  7:02 ` [patch net-next 02/17] mlxsw: spectrum_router: Ignore address families other than IPv4 Jiri Pirko
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

The FIB notification chain is currently solely used by IPv4 code.
However, we're going to introduce IPv6 FIB offload support, which
requires these notifications as well.

As explained in commit c3852ef7f2f8 ("ipv4: fib: Replay events when
registering FIB notifier"), upon registration to the chain, the callee
receives a full dump of the FIB tables and rules by traversing all the
net namespaces. The integrity of the dump is ensured by a per-namespace
sequence counter that is incremented whenever a change to the tables or
rules occurs.
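
The retry logic described above can be sketched in user space. This is
only a single-threaded model under stated assumptions: every toy_* name
is hypothetical, the "concurrent writer" is simulated by a callback, and
the kernel code additionally reads the counters under RTNL and registers
the listener on the chain before re-reading them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_DUMP_MAX_RETRIES 5
#define TOY_MAX_ROUTES 8

/* Hypothetical stand-in for one namespace's FIB state. */
struct toy_fib {
	unsigned int seq;	/* bumped on every change */
	int routes[TOY_MAX_ROUTES];
	size_t nr_routes;
};

/* Listener keeping a private copy of the table. */
struct toy_listener {
	int copy[TOY_MAX_ROUTES];
	size_t nr_copy;
};

/* Every change goes through here, so the counter never misses one. */
static void toy_fib_add(struct toy_fib *fib, int route)
{
	assert(fib->nr_routes < TOY_MAX_ROUTES);
	fib->routes[fib->nr_routes++] = route;
	fib->seq++;
}

/* Plays the role of the 'cb' argument: drop a dump that went stale. */
static void toy_listener_flush(struct toy_listener *l)
{
	l->nr_copy = 0;
}

/* Simulated writers used to exercise the retry path. */
static void toy_race_once(struct toy_fib *fib, int attempt)
{
	if (attempt == 0)
		toy_fib_add(fib, 30);
}

static void toy_race_always(struct toy_fib *fib, int attempt)
{
	toy_fib_add(fib, 100 + attempt);
}

/*
 * Dump the table to the listener and retry if the counter moved.
 * 'racer' may be NULL; it is invoked once per attempt, after the dump
 * and before the counter is compared.
 */
static int toy_register(struct toy_fib *fib, struct toy_listener *l,
			void (*racer)(struct toy_fib *fib, int attempt))
{
	int attempt;

	for (attempt = 0; attempt < TOY_DUMP_MAX_RETRIES; attempt++) {
		unsigned int seq = fib->seq;	/* snapshot before dump */
		size_t n = fib->nr_routes;
		size_t i;

		for (i = 0; i < n; i++)
			l->copy[l->nr_copy++] = fib->routes[i];
		if (racer)
			racer(fib, attempt);

		if (seq == fib->seq)
			return 0;	/* unchanged: dump is consistent */
		toy_listener_flush(l);	/* stale: discard and retry */
	}

	return -EBUSY;
}
```

With no writer the first attempt succeeds; with a writer that interferes
once, the second attempt succeeds; with a writer that interferes on every
attempt, the model gives up with -EBUSY and leaves the listener empty.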

In order to allow more address families to use the chain, each family is
expected to register its fib_notifier_ops in its pernet init. These
operations allow the common code to read the family's sequence counter
as well as dump its tables and rules in the given net namespace.
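
As a rough illustration of the per-family registration, the following
user-space sketch keeps a small list of operations per namespace and
lets common code sum the counters and run the dumps. All toy_* names and
the TOY_AF_* values are made up for the example, and unlike the patch it
stores a pointer to the template instead of duplicating it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_AF_INET	 2	/* illustrative values only */
#define TOY_AF_INET6	 10
#define TOY_MAX_FAMILIES 4

struct toy_net;

/* What each address family hands to the common code. */
struct toy_ops {
	int family;
	unsigned int (*seq_read)(const struct toy_net *net);
	int (*dump)(const struct toy_net *net, int *nr_events);
};

struct toy_net {
	unsigned int v4_seq;
	unsigned int v6_seq;
	const struct toy_ops *ops[TOY_MAX_FAMILIES];
	size_t nr_ops;
};

/* Reject a second registration for the same family. */
static int toy_ops_register(struct toy_net *net, const struct toy_ops *ops)
{
	size_t i;

	assert(ops->seq_read && ops->dump);
	for (i = 0; i < net->nr_ops; i++)
		if (net->ops[i]->family == ops->family)
			return -EEXIST;
	if (net->nr_ops == TOY_MAX_FAMILIES)
		return -ENOMEM;
	net->ops[net->nr_ops++] = ops;
	return 0;
}

/* Common code: sum the counters of every registered family. */
static unsigned int toy_seq_sum(const struct toy_net *net)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < net->nr_ops; i++)
		sum += net->ops[i]->seq_read(net);
	return sum;
}

/* Common code: ask every family to dump, stopping at the first error. */
static int toy_net_dump(const struct toy_net *net, int *nr_events)
{
	size_t i;

	for (i = 0; i < net->nr_ops; i++) {
		int err = net->ops[i]->dump(net, nr_events);

		if (err)
			return err;
	}
	return 0;
}

static unsigned int toy_v4_seq_read(const struct toy_net *net)
{
	return net->v4_seq;
}

static unsigned int toy_v6_seq_read(const struct toy_net *net)
{
	return net->v6_seq;
}

/* A real dump would replay rules and routes; here it counts one event. */
static int toy_count_dump(const struct toy_net *net, int *nr_events)
{
	(void)net;
	(*nr_events)++;
	return 0;
}

static const struct toy_ops toy_v4_ops = {
	.family   = TOY_AF_INET,
	.seq_read = toy_v4_seq_read,
	.dump     = toy_count_dump,
};

static const struct toy_ops toy_v6_ops = {
	.family   = TOY_AF_INET6,
	.seq_read = toy_v6_seq_read,
	.dump     = toy_count_dump,
};
```

The common code never touches v4_seq or v6_seq directly; it only goes
through the registered callbacks, which is what lets a new family plug in
without changes to the core.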

Additionally, a 'family' parameter is added to sent notifications, so
that listeners could distinguish between the different families.
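
A listener would typically check the family before interpreting the rest
of the notification, since each family embeds the common info in its own
structure. The sketch below is a user-space model of that pattern; the
toy_* types and return codes are hypothetical and only AF_INET and
AF_INET6 come from a real header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

enum { TOY_NOTIFY_DONE = 0, TOY_NOTIFY_OK = 1 };

/* Common part carried by every notification. */
struct toy_info {
	int family;
};

/* IPv4 entry: the common part must be first so the cast below is valid. */
struct toy_entry_info {
	struct toy_info info;
	uint32_t dst;
	int dst_len;
};

/* IPv6 entry with a different layout behind the same common part. */
struct toy_entry6_info {
	struct toy_info info;
	unsigned char dst[16];
	int dst_len;
};

struct toy_driver {
	unsigned int handled;
	uint32_t last_dst;
};

/* Listener callback: ignore anything that is not IPv4. */
static int toy_fib_event(struct toy_driver *drv, void *ptr)
{
	struct toy_info *info = ptr;
	struct toy_entry_info *entry;

	assert(ptr != NULL);
	if (info->family != AF_INET)
		return TOY_NOTIFY_DONE;

	/* Safe only because the family check passed. */
	entry = (struct toy_entry_info *)info;
	drv->handled++;
	drv->last_dst = entry->dst;
	return TOY_NOTIFY_OK;
}

/* Family-specific senders stamp the family before notifying. */
static int toy_notify_v4(struct toy_driver *drv, uint32_t dst, int dst_len)
{
	struct toy_entry_info e = { .dst = dst, .dst_len = dst_len };

	e.info.family = AF_INET;
	return toy_fib_event(drv, &e);
}

static int toy_notify_v6(struct toy_driver *drv)
{
	struct toy_entry6_info e = { .info.family = AF_INET6 };

	return toy_fib_event(drv, &e);
}
```

Without the family check, the listener would reinterpret the IPv6
structure as an IPv4 one and read a meaningless destination.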

Implement the common code that allows listeners to register to the chain
and for address families to register their fib_notifier_ops. Subsequent
patches will implement these operations in IPv6.

In the future, ipmr and ip6mr will be extended to provide these
notifications as well.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |   1 +
 drivers/net/ethernet/rocker/rocker_main.c          |   1 +
 include/net/fib_notifier.h                         |  44 ++++++
 include/net/ip_fib.h                               |  30 +---
 include/net/net_namespace.h                        |   1 +
 include/net/netns/ipv4.h                           |   1 +
 net/core/Makefile                                  |   3 +-
 net/core/fib_notifier.c                            | 164 +++++++++++++++++++++
 net/ipv4/fib_frontend.c                            |  17 ++-
 net/ipv4/fib_notifier.c                            |  94 +++++-------
 net/ipv4/fib_rules.c                               |   5 +-
 net/ipv4/fib_semantics.c                           |   9 +-
 net/ipv4/fib_trie.c                                |   5 +-
 13 files changed, 282 insertions(+), 93 deletions(-)
 create mode 100644 include/net/fib_notifier.h
 create mode 100644 net/core/fib_notifier.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index e6d629f..6069681 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -52,6 +52,7 @@
 #include <net/addrconf.h>
 #include <net/ndisc.h>
 #include <net/ipv6.h>
+#include <net/fib_notifier.h>
 
 #include "spectrum.h"
 #include "core.h"
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index b1e5c07..ef38c1a 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -34,6 +34,7 @@
 #include <net/netevent.h>
 #include <net/arp.h>
 #include <net/fib_rules.h>
+#include <net/fib_notifier.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <generated/utsrelease.h>
 
diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h
new file mode 100644
index 0000000..2414752
--- /dev/null
+++ b/include/net/fib_notifier.h
@@ -0,0 +1,44 @@
+#ifndef __NET_FIB_NOTIFIER_H
+#define __NET_FIB_NOTIFIER_H
+
+#include <linux/types.h>
+#include <linux/notifier.h>
+#include <net/net_namespace.h>
+
+struct fib_notifier_info {
+	struct net *net;
+	int family;
+};
+
+enum fib_event_type {
+	FIB_EVENT_ENTRY_REPLACE,
+	FIB_EVENT_ENTRY_APPEND,
+	FIB_EVENT_ENTRY_ADD,
+	FIB_EVENT_ENTRY_DEL,
+	FIB_EVENT_RULE_ADD,
+	FIB_EVENT_RULE_DEL,
+	FIB_EVENT_NH_ADD,
+	FIB_EVENT_NH_DEL,
+};
+
+struct fib_notifier_ops {
+	int family;
+	struct list_head list;
+	unsigned int (*fib_seq_read)(struct net *net);
+	int (*fib_dump)(struct net *net, struct notifier_block *nb);
+	struct rcu_head rcu;
+};
+
+int call_fib_notifier(struct notifier_block *nb, struct net *net,
+		      enum fib_event_type event_type,
+		      struct fib_notifier_info *info);
+int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
+		       struct fib_notifier_info *info);
+int register_fib_notifier(struct notifier_block *nb,
+			  void (*cb)(struct notifier_block *nb));
+int unregister_fib_notifier(struct notifier_block *nb);
+struct fib_notifier_ops *
+fib_notifier_ops_register(const struct fib_notifier_ops *tmpl, struct net *net);
+void fib_notifier_ops_unregister(struct fib_notifier_ops *ops);
+
+#endif
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 41d580c..800a006 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -19,6 +19,7 @@
 #include <net/flow.h>
 #include <linux/seq_file.h>
 #include <linux/rcupdate.h>
+#include <net/fib_notifier.h>
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
@@ -201,10 +202,6 @@ static inline void fib_info_offload_dec(struct fib_info *fi)
 #define FIB_RES_PREFSRC(net, res)	((res).fi->fib_prefsrc ? : \
 					 FIB_RES_SADDR(net, res))
 
-struct fib_notifier_info {
-	struct net *net;
-};
-
 struct fib_entry_notifier_info {
 	struct fib_notifier_info info; /* must be first */
 	u32 dst;
@@ -225,25 +222,14 @@ struct fib_nh_notifier_info {
 	struct fib_nh *fib_nh;
 };
 
-enum fib_event_type {
-	FIB_EVENT_ENTRY_REPLACE,
-	FIB_EVENT_ENTRY_APPEND,
-	FIB_EVENT_ENTRY_ADD,
-	FIB_EVENT_ENTRY_DEL,
-	FIB_EVENT_RULE_ADD,
-	FIB_EVENT_RULE_DEL,
-	FIB_EVENT_NH_ADD,
-	FIB_EVENT_NH_DEL,
-};
-
-int register_fib_notifier(struct notifier_block *nb,
-			  void (*cb)(struct notifier_block *nb));
-int unregister_fib_notifier(struct notifier_block *nb);
-int call_fib_notifier(struct notifier_block *nb, struct net *net,
-		      enum fib_event_type event_type,
-		      struct fib_notifier_info *info);
-int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
+int call_fib4_notifier(struct notifier_block *nb, struct net *net,
+		       enum fib_event_type event_type,
 		       struct fib_notifier_info *info);
+int call_fib4_notifiers(struct net *net, enum fib_event_type event_type,
+			struct fib_notifier_info *info);
+
+int __net_init fib4_notifier_init(struct net *net);
+void __net_exit fib4_notifier_exit(struct net *net);
 
 void fib_notify(struct net *net, struct notifier_block *nb);
 #ifdef CONFIG_IP_MULTIPLE_TABLES
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 31a2b51..7700c2f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -88,6 +88,7 @@ struct net {
 	/* core fib_rules */
 	struct list_head	rules_ops;
 
+	struct list_head	fib_notifier_ops;  /* protected by net_mutex */
 
 	struct net_device       *loopback_dev;          /* The loopback */
 	struct netns_core	core;
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 9a14a08..20d061c 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -159,6 +159,7 @@ struct netns_ipv4 {
 	int sysctl_fib_multipath_hash_policy;
 #endif
 
+	struct fib_notifier_ops	*notifier_ops;
 	unsigned int	fib_seq;	/* protected by rtnl_mutex */
 
 	atomic_t	rt_genid;
diff --git a/net/core/Makefile b/net/core/Makefile
index d501c42..56d771a 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -9,7 +9,8 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
 obj-y		     += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
 			neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
-			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o
+			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
+			fib_notifier.o
 
 obj-y += net-sysfs.o
 obj-$(CONFIG_PROC_FS) += net-procfs.o
diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
new file mode 100644
index 0000000..292aab8
--- /dev/null
+++ b/net/core/fib_notifier.c
@@ -0,0 +1,164 @@
+#include <linux/rtnetlink.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <net/net_namespace.h>
+#include <net/fib_notifier.h>
+
+static ATOMIC_NOTIFIER_HEAD(fib_chain);
+
+int call_fib_notifier(struct notifier_block *nb, struct net *net,
+		      enum fib_event_type event_type,
+		      struct fib_notifier_info *info)
+{
+	info->net = net;
+	return nb->notifier_call(nb, event_type, info);
+}
+EXPORT_SYMBOL(call_fib_notifier);
+
+int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
+		       struct fib_notifier_info *info)
+{
+	info->net = net;
+	return atomic_notifier_call_chain(&fib_chain, event_type, info);
+}
+EXPORT_SYMBOL(call_fib_notifiers);
+
+static unsigned int fib_seq_sum(void)
+{
+	struct fib_notifier_ops *ops;
+	unsigned int fib_seq = 0;
+	struct net *net;
+
+	rtnl_lock();
+	for_each_net(net) {
+		list_for_each_entry(ops, &net->fib_notifier_ops, list)
+			fib_seq += ops->fib_seq_read(net);
+	}
+	rtnl_unlock();
+
+	return fib_seq;
+}
+
+static int fib_net_dump(struct net *net, struct notifier_block *nb)
+{
+	struct fib_notifier_ops *ops;
+
+	list_for_each_entry_rcu(ops, &net->fib_notifier_ops, list) {
+		int err = ops->fib_dump(net, nb);
+
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static bool fib_dump_is_consistent(struct notifier_block *nb,
+				   void (*cb)(struct notifier_block *nb),
+				   unsigned int fib_seq)
+{
+	atomic_notifier_chain_register(&fib_chain, nb);
+	if (fib_seq == fib_seq_sum())
+		return true;
+	atomic_notifier_chain_unregister(&fib_chain, nb);
+	if (cb)
+		cb(nb);
+	return false;
+}
+
+#define FIB_DUMP_MAX_RETRIES 5
+int register_fib_notifier(struct notifier_block *nb,
+			  void (*cb)(struct notifier_block *nb))
+{
+	int retries = 0;
+	int err;
+
+	do {
+		unsigned int fib_seq = fib_seq_sum();
+		struct net *net;
+
+		rcu_read_lock();
+		for_each_net_rcu(net) {
+			err = fib_net_dump(net, nb);
+			if (err)
+				goto err_fib_net_dump;
+		}
+		rcu_read_unlock();
+
+		if (fib_dump_is_consistent(nb, cb, fib_seq))
+			return 0;
+	} while (++retries < FIB_DUMP_MAX_RETRIES);
+
+	return -EBUSY;
+
+err_fib_net_dump:
+	rcu_read_unlock();
+	return err;
+}
+EXPORT_SYMBOL(register_fib_notifier);
+
+int unregister_fib_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_unregister(&fib_chain, nb);
+}
+EXPORT_SYMBOL(unregister_fib_notifier);
+
+static int __fib_notifier_ops_register(struct fib_notifier_ops *ops,
+				       struct net *net)
+{
+	struct fib_notifier_ops *o;
+
+	list_for_each_entry(o, &net->fib_notifier_ops, list)
+		if (ops->family == o->family)
+			return -EEXIST;
+	list_add_tail_rcu(&ops->list, &net->fib_notifier_ops);
+	return 0;
+}
+
+struct fib_notifier_ops *
+fib_notifier_ops_register(const struct fib_notifier_ops *tmpl, struct net *net)
+{
+	struct fib_notifier_ops *ops;
+	int err;
+
+	ops = kmemdup(tmpl, sizeof(*ops), GFP_KERNEL);
+	if (!ops)
+		return ERR_PTR(-ENOMEM);
+
+	err = __fib_notifier_ops_register(ops, net);
+	if (err)
+		goto err_register;
+
+	return ops;
+
+err_register:
+	kfree(ops);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL(fib_notifier_ops_register);
+
+void fib_notifier_ops_unregister(struct fib_notifier_ops *ops)
+{
+	list_del_rcu(&ops->list);
+	kfree_rcu(ops, rcu);
+}
+EXPORT_SYMBOL(fib_notifier_ops_unregister);
+
+static int __net_init fib_notifier_net_init(struct net *net)
+{
+	INIT_LIST_HEAD(&net->fib_notifier_ops);
+	return 0;
+}
+
+static struct pernet_operations fib_notifier_net_ops = {
+	.init = fib_notifier_net_init,
+};
+
+static int __init fib_notifier_init(void)
+{
+	return register_pernet_subsys(&fib_notifier_net_ops);
+}
+
+subsys_initcall(fib_notifier_init);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4e678fa..7ed3c0a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1247,22 +1247,28 @@ static int __net_init ip_fib_net_init(struct net *net)
 	int err;
 	size_t size = sizeof(struct hlist_head) * FIB_TABLE_HASHSZ;
 
-	net->ipv4.fib_seq = 0;
+	err = fib4_notifier_init(net);
+	if (err)
+		return err;
 
 	/* Avoid false sharing : Use at least a full cache line */
 	size = max_t(size_t, size, L1_CACHE_BYTES);
 
 	net->ipv4.fib_table_hash = kzalloc(size, GFP_KERNEL);
-	if (!net->ipv4.fib_table_hash)
-		return -ENOMEM;
+	if (!net->ipv4.fib_table_hash) {
+		err = -ENOMEM;
+		goto err_table_hash_alloc;
+	}
 
 	err = fib4_rules_init(net);
 	if (err < 0)
-		goto fail;
+		goto err_rules_init;
 	return 0;
 
-fail:
+err_rules_init:
 	kfree(net->ipv4.fib_table_hash);
+err_table_hash_alloc:
+	fib4_notifier_exit(net);
 	return err;
 }
 
@@ -1292,6 +1298,7 @@ static void ip_fib_net_exit(struct net *net)
 #endif
 	rtnl_unlock();
 	kfree(net->ipv4.fib_table_hash);
+	fib4_notifier_exit(net);
 }
 
 static int __net_init fib_net_init(struct net *net)
diff --git a/net/ipv4/fib_notifier.c b/net/ipv4/fib_notifier.c
index e0714d9..7cf1954 100644
--- a/net/ipv4/fib_notifier.c
+++ b/net/ipv4/fib_notifier.c
@@ -1,86 +1,66 @@
 #include <linux/rtnetlink.h>
 #include <linux/notifier.h>
-#include <linux/rcupdate.h>
+#include <linux/socket.h>
 #include <linux/kernel.h>
 #include <net/net_namespace.h>
+#include <net/fib_notifier.h>
 #include <net/netns/ipv4.h>
 #include <net/ip_fib.h>
 
-static ATOMIC_NOTIFIER_HEAD(fib_chain);
-
-int call_fib_notifier(struct notifier_block *nb, struct net *net,
-		      enum fib_event_type event_type,
-		      struct fib_notifier_info *info)
+int call_fib4_notifier(struct notifier_block *nb, struct net *net,
+		       enum fib_event_type event_type,
+		       struct fib_notifier_info *info)
 {
-	info->net = net;
-	return nb->notifier_call(nb, event_type, info);
+	info->family = AF_INET;
+	return call_fib_notifier(nb, net, event_type, info);
 }
 
-int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
-		       struct fib_notifier_info *info)
+int call_fib4_notifiers(struct net *net, enum fib_event_type event_type,
+			struct fib_notifier_info *info)
 {
+	ASSERT_RTNL();
+
+	info->family = AF_INET;
 	net->ipv4.fib_seq++;
-	info->net = net;
-	return atomic_notifier_call_chain(&fib_chain, event_type, info);
+	return call_fib_notifiers(net, event_type, info);
 }
 
-static unsigned int fib_seq_sum(void)
+static unsigned int fib4_seq_read(struct net *net)
 {
-	unsigned int fib_seq = 0;
-	struct net *net;
+	ASSERT_RTNL();
 
-	rtnl_lock();
-	for_each_net(net)
-		fib_seq += net->ipv4.fib_seq;
-	rtnl_unlock();
-
-	return fib_seq;
+	return net->ipv4.fib_seq;
 }
 
-static bool fib_dump_is_consistent(struct notifier_block *nb,
-				   void (*cb)(struct notifier_block *nb),
-				   unsigned int fib_seq)
+static int fib4_dump(struct net *net, struct notifier_block *nb)
 {
-	atomic_notifier_chain_register(&fib_chain, nb);
-	if (fib_seq == fib_seq_sum())
-		return true;
-	atomic_notifier_chain_unregister(&fib_chain, nb);
-	if (cb)
-		cb(nb);
-	return false;
+	fib_rules_notify(net, nb);
+	fib_notify(net, nb);
+
+	return 0;
 }
 
-#define FIB_DUMP_MAX_RETRIES 5
-int register_fib_notifier(struct notifier_block *nb,
-			  void (*cb)(struct notifier_block *nb))
-{
-	int retries = 0;
+static const struct fib_notifier_ops fib4_notifier_ops_template = {
+	.family		= AF_INET,
+	.fib_seq_read	= fib4_seq_read,
+	.fib_dump	= fib4_dump,
+};
 
-	do {
-		unsigned int fib_seq = fib_seq_sum();
-		struct net *net;
+int __net_init fib4_notifier_init(struct net *net)
+{
+	struct fib_notifier_ops *ops;
 
-		/* Mutex semantics guarantee that every change done to
-		 * FIB tries before we read the change sequence counter
-		 * is now visible to us.
-		 */
-		rcu_read_lock();
-		for_each_net_rcu(net) {
-			fib_rules_notify(net, nb);
-			fib_notify(net, nb);
-		}
-		rcu_read_unlock();
+	net->ipv4.fib_seq = 0;
 
-		if (fib_dump_is_consistent(nb, cb, fib_seq))
-			return 0;
-	} while (++retries < FIB_DUMP_MAX_RETRIES);
+	ops = fib_notifier_ops_register(&fib4_notifier_ops_template, net);
+	if (IS_ERR(ops))
+		return PTR_ERR(ops);
+	net->ipv4.notifier_ops = ops;
 
-	return -EBUSY;
+	return 0;
 }
-EXPORT_SYMBOL(register_fib_notifier);
 
-int unregister_fib_notifier(struct notifier_block *nb)
+void __net_exit fib4_notifier_exit(struct net *net)
 {
-	return atomic_notifier_chain_unregister(&fib_chain, nb);
+	fib_notifier_ops_unregister(net->ipv4.notifier_ops);
 }
-EXPORT_SYMBOL(unregister_fib_notifier);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 778ecf9..acdbf5a 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -32,6 +32,7 @@
 #include <net/tcp.h>
 #include <net/ip_fib.h>
 #include <net/fib_rules.h>
+#include <net/fib_notifier.h>
 
 struct fib4_rule {
 	struct fib_rule		common;
@@ -193,7 +194,7 @@ static int call_fib_rule_notifier(struct notifier_block *nb, struct net *net,
 		.rule = rule,
 	};
 
-	return call_fib_notifier(nb, net, event_type, &info.info);
+	return call_fib4_notifier(nb, net, event_type, &info.info);
 }
 
 static int call_fib_rule_notifiers(struct net *net,
@@ -204,7 +205,7 @@ static int call_fib_rule_notifiers(struct net *net,
 		.rule = rule,
 	};
 
-	return call_fib_notifiers(net, event_type, &info.info);
+	return call_fib4_notifiers(net, event_type, &info.info);
 }
 
 /* Called with rcu_read_lock() */
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 22210010..1cd7b5d 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -44,6 +44,7 @@
 #include <net/netlink.h>
 #include <net/nexthop.h>
 #include <net/lwtunnel.h>
+#include <net/fib_notifier.h>
 
 #include "fib_lookup.h"
 
@@ -1449,14 +1450,14 @@ static int call_fib_nh_notifiers(struct fib_nh *fib_nh,
 		if (IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
 		    fib_nh->nh_flags & RTNH_F_LINKDOWN)
 			break;
-		return call_fib_notifiers(dev_net(fib_nh->nh_dev), event_type,
-					  &info.info);
+		return call_fib4_notifiers(dev_net(fib_nh->nh_dev), event_type,
+					   &info.info);
 	case FIB_EVENT_NH_DEL:
 		if ((IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
 		     fib_nh->nh_flags & RTNH_F_LINKDOWN) ||
 		    (fib_nh->nh_flags & RTNH_F_DEAD))
-			return call_fib_notifiers(dev_net(fib_nh->nh_dev),
-						  event_type, &info.info);
+			return call_fib4_notifiers(dev_net(fib_nh->nh_dev),
+						   event_type, &info.info);
 	default:
 		break;
 	}
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 64668c6..1a6ffb0 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -81,6 +81,7 @@
 #include <net/tcp.h>
 #include <net/sock.h>
 #include <net/ip_fib.h>
+#include <net/fib_notifier.h>
 #include <trace/events/fib.h>
 #include "fib_lookup.h"
 
@@ -97,7 +98,7 @@ static int call_fib_entry_notifier(struct notifier_block *nb, struct net *net,
 		.type = type,
 		.tb_id = tb_id,
 	};
-	return call_fib_notifier(nb, net, event_type, &info.info);
+	return call_fib4_notifier(nb, net, event_type, &info.info);
 }
 
 static int call_fib_entry_notifiers(struct net *net,
@@ -113,7 +114,7 @@ static int call_fib_entry_notifiers(struct net *net,
 		.type = type,
 		.tb_id = tb_id,
 	};
-	return call_fib_notifiers(net, event_type, &info.info);
+	return call_fib4_notifiers(net, event_type, &info.info);
 }
 
 #define MAX_STAT_DEPTH 32
-- 
2.9.3


* [patch net-next 02/17] mlxsw: spectrum_router: Ignore address families other than IPv4
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 01/17] net: core: Make the FIB notification chain generic Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 03/17] rocker: " Jiri Pirko
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

We're about to add IPv6 notifications in the FIB notification chain, but
the driver currently doesn't support these, so ignore them.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 6069681..7965a53 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -43,6 +43,7 @@
 #include <linux/inetdevice.h>
 #include <linux/netdevice.h>
 #include <linux/if_bridge.h>
+#include <linux/socket.h>
 #include <net/netevent.h>
 #include <net/neighbour.h>
 #include <net/arp.h>
@@ -3034,7 +3035,7 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 	struct fib_notifier_info *info = ptr;
 	struct mlxsw_sp_router *router;
 
-	if (!net_eq(info->net, &init_net))
+	if (!net_eq(info->net, &init_net) || info->family != AF_INET)
 		return NOTIFY_DONE;
 
 	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
-- 
2.9.3


* [patch net-next 03/17] rocker: Ignore address families other than IPv4
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 01/17] net: core: Make the FIB notification chain generic Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 02/17] mlxsw: spectrum_router: Ignore address families other than IPv4 Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 04/17] net: fib_rules: Implement notification logic in core Jiri Pirko
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

As in the previous patch, ignore IPv6 notifications since the driver
doesn't support these.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index ef38c1a..fc8f8bd 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2192,6 +2192,10 @@ static int rocker_router_fib_event(struct notifier_block *nb,
 {
 	struct rocker *rocker = container_of(nb, struct rocker, fib_nb);
 	struct rocker_fib_event_work *fib_work;
+	struct fib_notifier_info *info = ptr;
+
+	if (info->family != AF_INET)
+		return NOTIFY_DONE;
 
 	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
 	if (WARN_ON(!fib_work))
-- 
2.9.3


* [patch net-next 04/17] net: fib_rules: Implement notification logic in core
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (2 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 03/17] rocker: " Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 05/17] ipv6: fib_rules: Check if rule is a default rule Jiri Pirko
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

Unlike the routing tables, the FIB rules share a common core, so instead
of replicating the same logic for each address family we can simply dump
the rules and send notifications from the core itself.

To protect the integrity of the dump, a rules-specific sequence counter
is added for each address family and incremented whenever a rule is
added or deleted (under RTNL).
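
The per-family rules counter can be modelled in user space roughly as
follows. This is a sketch under assumptions: the toy_* names are
hypothetical, locking is omitted, and returning 0 for a family with no
registered rules operations is a choice made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

#define TOY_NR_FAMILIES 2

/* One instance per address family, shared rule logic for all of them. */
struct toy_rules_ops {
	int family;
	unsigned int rules_seq;	/* bumped on every rule add / delete */
	int nr_rules;
};

struct toy_rules_net {
	struct toy_rules_ops ops[TOY_NR_FAMILIES];
};

static struct toy_rules_ops *toy_lookup_rules_ops(struct toy_rules_net *net,
						  int family)
{
	size_t i;

	for (i = 0; i < TOY_NR_FAMILIES; i++)
		if (net->ops[i].family == family)
			return &net->ops[i];
	return NULL;
}

/* Core add path: the counter changes together with the rule set. */
static int toy_rule_add(struct toy_rules_net *net, int family)
{
	struct toy_rules_ops *ops = toy_lookup_rules_ops(net, family);

	if (!ops)
		return -EAFNOSUPPORT;
	ops->nr_rules++;
	ops->rules_seq++;
	return 0;
}

/* Core delete path: a failed delete must not touch the counter. */
static int toy_rule_del(struct toy_rules_net *net, int family)
{
	struct toy_rules_ops *ops = toy_lookup_rules_ops(net, family);

	if (!ops)
		return -EAFNOSUPPORT;
	if (ops->nr_rules == 0)
		return -ENOENT;
	ops->nr_rules--;
	ops->rules_seq++;
	return 0;
}

/* What a family's sequence-read callback would consult for rules. */
static unsigned int toy_rules_seq_read(struct toy_rules_net *net, int family)
{
	struct toy_rules_ops *ops = toy_lookup_rules_ops(net, family);

	return ops ? ops->rules_seq : 0;
}
```

Because the counter is kept per family, a rule change in one family does
not disturb the value read for another.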

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/fib_rules.h |  9 +++++++
 include/net/ip_fib.h    | 24 +++++++++----------
 net/core/fib_rules.c    | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/fib_notifier.c |  9 +++++--
 net/ipv4/fib_rules.c    | 45 ++++++++---------------------------
 5 files changed, 101 insertions(+), 49 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index c487bfa..3d7f1ce 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -8,6 +8,7 @@
 #include <linux/refcount.h>
 #include <net/flow.h>
 #include <net/rtnetlink.h>
+#include <net/fib_notifier.h>
 
 struct fib_kuid_range {
 	kuid_t start;
@@ -57,6 +58,7 @@ struct fib_rules_ops {
 	int			addr_size;
 	int			unresolved_rules;
 	int			nr_goto_rules;
+	unsigned int		fib_rules_seq;
 
 	int			(*action)(struct fib_rule *,
 					  struct flowi *, int,
@@ -89,6 +91,11 @@ struct fib_rules_ops {
 	struct rcu_head		rcu;
 };
 
+struct fib_rule_notifier_info {
+	struct fib_notifier_info info; /* must be first */
+	struct fib_rule *rule;
+};
+
 #define FRA_GENERIC_POLICY \
 	[FRA_IIFNAME]	= { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, \
 	[FRA_OIFNAME]	= { .type = NLA_STRING, .len = IFNAMSIZ - 1 }, \
@@ -143,6 +150,8 @@ int fib_rules_lookup(struct fib_rules_ops *, struct flowi *, int flags,
 int fib_default_rule_add(struct fib_rules_ops *, u32 pref, u32 table,
 			 u32 flags);
 bool fib_rule_matchall(const struct fib_rule *rule);
+int fib_rules_dump(struct net *net, struct notifier_block *nb, int family);
+unsigned int fib_rules_seq_read(struct net *net, int family);
 
 int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh,
 		   struct netlink_ext_ack *extack);
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 800a006..593d8e2 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -212,11 +212,6 @@ struct fib_entry_notifier_info {
 	u32 tb_id;
 };
 
-struct fib_rule_notifier_info {
-	struct fib_notifier_info info; /* must be first */
-	struct fib_rule *rule;
-};
-
 struct fib_nh_notifier_info {
 	struct fib_notifier_info info; /* must be first */
 	struct fib_nh *fib_nh;
@@ -232,13 +227,6 @@ int __net_init fib4_notifier_init(struct net *net);
 void __net_exit fib4_notifier_exit(struct net *net);
 
 void fib_notify(struct net *net, struct notifier_block *nb);
-#ifdef CONFIG_IP_MULTIPLE_TABLES
-void fib_rules_notify(struct net *net, struct notifier_block *nb);
-#else
-static inline void fib_rules_notify(struct net *net, struct notifier_block *nb)
-{
-}
-#endif
 
 struct fib_table {
 	struct hlist_node	tb_hlist;
@@ -311,6 +299,16 @@ static inline bool fib4_rule_default(const struct fib_rule *rule)
 	return true;
 }
 
+static inline int fib4_rules_dump(struct net *net, struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline unsigned int fib4_rules_seq_read(struct net *net)
+{
+	return 0;
+}
+
 #else /* CONFIG_IP_MULTIPLE_TABLES */
 int __net_init fib4_rules_init(struct net *net);
 void __net_exit fib4_rules_exit(struct net *net);
@@ -356,6 +354,8 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 }
 
 bool fib4_rule_default(const struct fib_rule *rule);
+int fib4_rules_dump(struct net *net, struct notifier_block *nb);
+unsigned int fib4_rules_seq_read(struct net *net);
 
 #endif /* CONFIG_IP_MULTIPLE_TABLES */
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index a0093e1..6678813 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -299,6 +299,67 @@ int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl,
 }
 EXPORT_SYMBOL_GPL(fib_rules_lookup);
 
+static int call_fib_rule_notifier(struct notifier_block *nb, struct net *net,
+				  enum fib_event_type event_type,
+				  struct fib_rule *rule, int family)
+{
+	struct fib_rule_notifier_info info = {
+		.info.family = family,
+		.rule = rule,
+	};
+
+	return call_fib_notifier(nb, net, event_type, &info.info);
+}
+
+static int call_fib_rule_notifiers(struct net *net,
+				   enum fib_event_type event_type,
+				   struct fib_rule *rule,
+				   struct fib_rules_ops *ops)
+{
+	struct fib_rule_notifier_info info = {
+		.info.family = ops->family,
+		.rule = rule,
+	};
+
+	ops->fib_rules_seq++;
+	return call_fib_notifiers(net, event_type, &info.info);
+}
+
+/* Called with rcu_read_lock() */
+int fib_rules_dump(struct net *net, struct notifier_block *nb, int family)
+{
+	struct fib_rules_ops *ops;
+	struct fib_rule *rule;
+
+	ops = lookup_rules_ops(net, family);
+	if (!ops)
+		return -EAFNOSUPPORT;
+	list_for_each_entry_rcu(rule, &ops->rules_list, list)
+		call_fib_rule_notifier(nb, net, FIB_EVENT_RULE_ADD, rule,
+				       family);
+	rules_ops_put(ops);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fib_rules_dump);
+
+unsigned int fib_rules_seq_read(struct net *net, int family)
+{
+	unsigned int fib_rules_seq;
+	struct fib_rules_ops *ops;
+
+	ASSERT_RTNL();
+
+	ops = lookup_rules_ops(net, family);
+	if (!ops)
+		return 0;
+	fib_rules_seq = ops->fib_rules_seq;
+	rules_ops_put(ops);
+
+	return fib_rules_seq;
+}
+EXPORT_SYMBOL_GPL(fib_rules_seq_read);
+
 static int validate_rulemsg(struct fib_rule_hdr *frh, struct nlattr **tb,
 			    struct fib_rules_ops *ops)
 {
@@ -549,6 +610,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (rule->tun_id)
 		ip_tunnel_need_metadata();
 
+	call_fib_rule_notifiers(net, FIB_EVENT_RULE_ADD, rule, ops);
 	notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).portid);
 	flush_route_cache(ops);
 	rules_ops_put(ops);
@@ -688,6 +750,7 @@ int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr *nlh,
 			}
 		}
 
+		call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL, rule, ops);
 		notify_rule_change(RTM_DELRULE, rule, ops, nlh,
 				   NETLINK_CB(skb).portid);
 		fib_rule_put(rule);
diff --git a/net/ipv4/fib_notifier.c b/net/ipv4/fib_notifier.c
index 7cf1954..5d7afb1 100644
--- a/net/ipv4/fib_notifier.c
+++ b/net/ipv4/fib_notifier.c
@@ -29,12 +29,17 @@ static unsigned int fib4_seq_read(struct net *net)
 {
 	ASSERT_RTNL();
 
-	return net->ipv4.fib_seq;
+	return net->ipv4.fib_seq + fib4_rules_seq_read(net);
 }
 
 static int fib4_dump(struct net *net, struct notifier_block *nb)
 {
-	fib_rules_notify(net, nb);
+	int err;
+
+	err = fib4_rules_dump(net, nb);
+	if (err)
+		return err;
+
 	fib_notify(net, nb);
 
 	return 0;
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index acdbf5a..35d646a 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -32,7 +32,6 @@
 #include <net/tcp.h>
 #include <net/ip_fib.h>
 #include <net/fib_rules.h>
-#include <net/fib_notifier.h>
 
 struct fib4_rule {
 	struct fib_rule		common;
@@ -69,6 +68,16 @@ bool fib4_rule_default(const struct fib_rule *rule)
 }
 EXPORT_SYMBOL_GPL(fib4_rule_default);
 
+int fib4_rules_dump(struct net *net, struct notifier_block *nb)
+{
+	return fib_rules_dump(net, nb, AF_INET);
+}
+
+unsigned int fib4_rules_seq_read(struct net *net)
+{
+	return fib_rules_seq_read(net, AF_INET);
+}
+
 int __fib_lookup(struct net *net, struct flowi4 *flp,
 		 struct fib_result *res, unsigned int flags)
 {
@@ -186,38 +195,6 @@ static struct fib_table *fib_empty_table(struct net *net)
 	return NULL;
 }
 
-static int call_fib_rule_notifier(struct notifier_block *nb, struct net *net,
-				  enum fib_event_type event_type,
-				  struct fib_rule *rule)
-{
-	struct fib_rule_notifier_info info = {
-		.rule = rule,
-	};
-
-	return call_fib4_notifier(nb, net, event_type, &info.info);
-}
-
-static int call_fib_rule_notifiers(struct net *net,
-				   enum fib_event_type event_type,
-				   struct fib_rule *rule)
-{
-	struct fib_rule_notifier_info info = {
-		.rule = rule,
-	};
-
-	return call_fib4_notifiers(net, event_type, &info.info);
-}
-
-/* Called with rcu_read_lock() */
-void fib_rules_notify(struct net *net, struct notifier_block *nb)
-{
-	struct fib_rules_ops *ops = net->ipv4.rules_ops;
-	struct fib_rule *rule;
-
-	list_for_each_entry_rcu(rule, &ops->rules_list, list)
-		call_fib_rule_notifier(nb, net, FIB_EVENT_RULE_ADD, rule);
-}
-
 static const struct nla_policy fib4_rule_policy[FRA_MAX+1] = {
 	FRA_GENERIC_POLICY,
 	[FRA_FLOW]	= { .type = NLA_U32 },
@@ -274,7 +251,6 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 	rule4->tos = frh->tos;
 
 	net->ipv4.fib_has_custom_rules = true;
-	call_fib_rule_notifiers(net, FIB_EVENT_RULE_ADD, rule);
 
 	err = 0;
 errout:
@@ -296,7 +272,6 @@ static int fib4_rule_delete(struct fib_rule *rule)
 		net->ipv4.fib_num_tclassid_users--;
 #endif
 	net->ipv4.fib_has_custom_rules = true;
-	call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL, rule);
 errout:
 	return err;
 }
-- 
2.9.3


* [patch net-next 05/17] ipv6: fib_rules: Check if rule is a default rule
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (3 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 04/17] net: fib_rules: Implement notification logic in core Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 06/17] ipv6: fib: Add FIB notifiers callbacks Jiri Pirko
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

As explained in commit 3c71006d15fd ("ipv4: fib_rules: Check if rule is
a default rule"), drivers supporting IPv6 FIB offload need to be able to
sanitize the rules they don't support and potentially flush their
tables.

Add an IPv6 helper to check if a FIB rule is a default rule.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip6_fib.h |  5 +++++
 net/ipv6/fib6_rules.c | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1a88008..6000b0d 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -295,6 +295,7 @@ int ipv6_route_open(struct inode *inode, struct file *file);
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 int fib6_rules_init(void);
 void fib6_rules_cleanup(void);
+bool fib6_rule_default(const struct fib_rule *rule);
 #else
 static inline int               fib6_rules_init(void)
 {
@@ -304,5 +305,9 @@ static inline void              fib6_rules_cleanup(void)
 {
 	return ;
 }
+static inline bool fib6_rule_default(const struct fib_rule *rule)
+{
+	return true;
+}
 #endif
 #endif
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index ec849d8..ef1fcee 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -29,6 +29,26 @@ struct fib6_rule {
 	u8			tclass;
 };
 
+static bool fib6_rule_matchall(const struct fib_rule *rule)
+{
+	struct fib6_rule *r = container_of(rule, struct fib6_rule, common);
+
+	if (r->dst.plen || r->src.plen || r->tclass)
+		return false;
+	return fib_rule_matchall(rule);
+}
+
+bool fib6_rule_default(const struct fib_rule *rule)
+{
+	if (!fib6_rule_matchall(rule) || rule->action != FR_ACT_TO_TBL ||
+	    rule->l3mdev)
+		return false;
+	if (rule->table != RT6_TABLE_LOCAL && rule->table != RT6_TABLE_MAIN)
+		return false;
+	return true;
+}
+EXPORT_SYMBOL_GPL(fib6_rule_default);
+
 struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 				   int flags, pol_lookup_t lookup)
 {
-- 
2.9.3


* [patch net-next 06/17] ipv6: fib: Add FIB notifiers callbacks
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (4 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 05/17] ipv6: fib_rules: Check if rule is a default rule Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 07/17] ipv6: fib: Add in-kernel notifications for route add / delete Jiri Pirko
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

We're about to add IPv6 FIB offload support, so implement the necessary
callbacks in IPv6 code, which will later allow us to add route and rule
notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip6_fib.h    | 11 ++++++++++
 include/net/netns/ipv6.h |  1 +
 net/ipv6/Makefile        |  2 +-
 net/ipv6/fib6_notifier.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/ip6_fib.c       |  7 ++++++
 5 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv6/fib6_notifier.c

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 6000b0d..be8ddf3 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -16,10 +16,12 @@
 #include <linux/ipv6_route.h>
 #include <linux/rtnetlink.h>
 #include <linux/spinlock.h>
+#include <linux/notifier.h>
 #include <net/dst.h>
 #include <net/flow.h>
 #include <net/netlink.h>
 #include <net/inetpeer.h>
+#include <net/fib_notifier.h>
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 #define FIB6_TABLE_HASHSZ 256
@@ -292,6 +294,15 @@ int fib6_init(void);
 
 int ipv6_route_open(struct inode *inode, struct file *file);
 
+int call_fib6_notifier(struct notifier_block *nb, struct net *net,
+		       enum fib_event_type event_type,
+		       struct fib_notifier_info *info);
+int call_fib6_notifiers(struct net *net, enum fib_event_type event_type,
+			struct fib_notifier_info *info);
+
+int __net_init fib6_notifier_init(struct net *net);
+void __net_exit fib6_notifier_exit(struct net *net);
+
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 int fib6_rules_init(void);
 void fib6_rules_cleanup(void);
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index de7745e..abdf3b4 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -86,6 +86,7 @@ struct netns_ipv6 {
 	atomic_t		dev_addr_genid;
 	atomic_t		fib6_sernum;
 	struct seg6_pernet_data *seg6_data;
+	struct fib_notifier_ops	*notifier_ops;
 };
 
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 217e9ff..f8b24c2 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -9,7 +9,7 @@ ipv6-objs :=	af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
 		route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
 		raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
 		exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
-		udp_offload.o seg6.o
+		udp_offload.o seg6.o fib6_notifier.o
 
 ipv6-offload :=	ip6_offload.o tcpv6_offload.o exthdrs_offload.o
 
diff --git a/net/ipv6/fib6_notifier.c b/net/ipv6/fib6_notifier.c
new file mode 100644
index 0000000..c2bb1ab
--- /dev/null
+++ b/net/ipv6/fib6_notifier.c
@@ -0,0 +1,55 @@
+#include <linux/notifier.h>
+#include <linux/socket.h>
+#include <linux/kernel.h>
+#include <net/net_namespace.h>
+#include <net/fib_notifier.h>
+#include <net/netns/ipv6.h>
+#include <net/ip6_fib.h>
+
+int call_fib6_notifier(struct notifier_block *nb, struct net *net,
+		       enum fib_event_type event_type,
+		       struct fib_notifier_info *info)
+{
+	info->family = AF_INET6;
+	return call_fib_notifier(nb, net, event_type, info);
+}
+
+int call_fib6_notifiers(struct net *net, enum fib_event_type event_type,
+			struct fib_notifier_info *info)
+{
+	info->family = AF_INET6;
+	return call_fib_notifiers(net, event_type, info);
+}
+
+static unsigned int fib6_seq_read(struct net *net)
+{
+	return 0;
+}
+
+static int fib6_dump(struct net *net, struct notifier_block *nb)
+{
+	return 0;
+}
+
+static const struct fib_notifier_ops fib6_notifier_ops_template = {
+	.family		= AF_INET6,
+	.fib_seq_read	= fib6_seq_read,
+	.fib_dump	= fib6_dump,
+};
+
+int __net_init fib6_notifier_init(struct net *net)
+{
+	struct fib_notifier_ops *ops;
+
+	ops = fib_notifier_ops_register(&fib6_notifier_ops_template, net);
+	if (IS_ERR(ops))
+		return PTR_ERR(ops);
+	net->ipv6.notifier_ops = ops;
+
+	return 0;
+}
+
+void __net_exit fib6_notifier_exit(struct net *net)
+{
+	fib_notifier_ops_unregister(net->ipv6.notifier_ops);
+}
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index ebb299c..f93976e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1839,6 +1839,11 @@ static void fib6_gc_timer_cb(unsigned long arg)
 static int __net_init fib6_net_init(struct net *net)
 {
 	size_t size = sizeof(struct hlist_head) * FIB6_TABLE_HASHSZ;
+	int err;
+
+	err = fib6_notifier_init(net);
+	if (err)
+		return err;
 
 	spin_lock_init(&net->ipv6.fib6_gc_lock);
 	rwlock_init(&net->ipv6.fib6_walker_lock);
@@ -1891,6 +1896,7 @@ static int __net_init fib6_net_init(struct net *net)
 out_rt6_stats:
 	kfree(net->ipv6.rt6_stats);
 out_timer:
+	fib6_notifier_exit(net);
 	return -ENOMEM;
 }
 
@@ -1907,6 +1913,7 @@ static void fib6_net_exit(struct net *net)
 	kfree(net->ipv6.fib6_main_tbl);
 	kfree(net->ipv6.fib_table_hash);
 	kfree(net->ipv6.rt6_stats);
+	fib6_notifier_exit(net);
 }
 
 static struct pernet_operations fib6_net_ops = {
-- 
2.9.3


* [patch net-next 07/17] ipv6: fib: Add in-kernel notifications for route add / delete
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (5 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 06/17] ipv6: fib: Add FIB notifiers callbacks Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19 15:38   ` David Ahern
  2017-07-19  7:02 ` [patch net-next 08/17] ipv6: fib_rules: Dump rules during registration to FIB chain Jiri Pirko
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

As with IPv4, allow listeners of the FIB notification chain to receive
notifications whenever a route is added, replaced or deleted. This is
done by placing calls to the FIB notification chain in the two lowest
level functions that end up performing these operations - namely,
fib6_add_rt2node() and fib6_del_route().

Unlike IPv4, APPEND notifications aren't sent as the kernel doesn't
distinguish between "append" (NLM_F_CREATE|NLM_F_APPEND) and "prepend"
(NLM_F_CREATE). If NLM_F_EXCL isn't set, duplicate routes are always
added after the existing duplicate routes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip6_fib.h |  5 +++++
 net/ipv6/ip6_fib.c    | 17 +++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index be8ddf3..e2b292b 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -258,6 +258,11 @@ typedef struct rt6_info *(*pol_lookup_t)(struct net *,
 					 struct fib6_table *,
 					 struct flowi6 *, int);
 
+struct fib6_entry_notifier_info {
+	struct fib_notifier_info info; /* must be first */
+	struct rt6_info *rt;
+};
+
 /*
  *	exported functions
  */
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index f93976e..595a57c 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -33,6 +33,7 @@
 #include <net/ndisc.h>
 #include <net/addrconf.h>
 #include <net/lwtunnel.h>
+#include <net/fib_notifier.h>
 
 #include <net/ip6_fib.h>
 #include <net/ip6_route.h>
@@ -302,6 +303,17 @@ static void __net_init fib6_tables_init(struct net *net)
 
 #endif
 
+static int call_fib6_entry_notifiers(struct net *net,
+				     enum fib_event_type event_type,
+				     struct rt6_info *rt)
+{
+	struct fib6_entry_notifier_info info = {
+		.rt = rt,
+	};
+
+	return call_fib6_notifiers(net, event_type, &info.info);
+}
+
 static int fib6_dump_node(struct fib6_walker *w)
 {
 	int res;
@@ -879,6 +891,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		*ins = rt;
 		rt->rt6i_node = fn;
 		atomic_inc(&rt->rt6i_ref);
+		call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_ADD,
+					  rt);
 		if (!info->skip_notify)
 			inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
 		info->nl_net->ipv6.rt6_stats->fib_rt_entries++;
@@ -906,6 +920,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
 		rt->rt6i_node = fn;
 		rt->dst.rt6_next = iter->dst.rt6_next;
 		atomic_inc(&rt->rt6i_ref);
+		call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_REPLACE,
+					  rt);
 		if (!info->skip_notify)
 			inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
 		if (!(fn->fn_flags & RTN_RTINFO)) {
@@ -1459,6 +1475,7 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
 
 	fib6_purge_rt(rt, fn, net);
 
+	call_fib6_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, rt);
 	if (!info->skip_notify)
 		inet6_rt_notify(RTM_DELROUTE, rt, info, 0);
 	rt6_release(rt);
-- 
2.9.3


* [patch net-next 08/17] ipv6: fib_rules: Dump rules during registration to FIB chain
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (6 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 07/17] ipv6: fib: Add in-kernel notifications for route add / delete Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 09/17] ipv6: fib: Dump tables " Jiri Pirko
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

Allow users of the FIB notification chain to receive a complete view of
the IPv6 FIB rules upon registration to the chain.

The integrity of the dump is ensured by a per-family sequence counter
that is incremented (under RTNL) whenever a rule is added or deleted.

All the sequence counters are read (under RTNL) and summed, both before
and after the dump. If the two sums differ, the dump is either
restarted or the registration fails.
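
The read, dump, re-read and compare sequence described above can be
sketched in user space roughly as follows. This is only an illustrative
model, not the kernel implementation: struct rule_db, rule_db_add(),
rule_db_dump_consistent() and the hook type are hypothetical names made
up for the example, and the retry limit is an assumed parameter.

```c
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct rule_db {
	unsigned int seq;	/* bumped on every rule add or delete */
	int nr_rules;
};

static void rule_db_add(struct rule_db *db)
{
	db->nr_rules++;
	db->seq++;
}

/* Optional hook that lets a caller simulate a change during the dump. */
typedef void (*dump_hook_t)(struct rule_db *db, int attempt);

/* Returns the number of rules seen by an undisturbed dump, or -1 if all
 * max_retries attempts were disturbed by a concurrent change.
 */
static int rule_db_dump_consistent(struct rule_db *db, int max_retries,
				   dump_hook_t hook)
{
	int attempt;

	for (attempt = 0; attempt < max_retries; attempt++) {
		unsigned int seq_before = db->seq;
		int seen = db->nr_rules;	/* stands in for the dump */

		if (hook)
			hook(db, attempt);

		/* Counter unchanged: nothing was added or deleted meanwhile. */
		if (db->seq == seq_before)
			return seen;
	}
	return -1;
}

/* Example hook: disturb only the first attempt. */
static void disturb_first_attempt(struct rule_db *db, int attempt)
{
	if (attempt == 0)
		rule_db_add(db);
}
```

With the example hook, the first attempt is discarded because the
counter moved, and the second attempt returns the updated view.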

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip6_fib.h    | 10 ++++++++++
 net/ipv6/fib6_notifier.c |  4 ++--
 net/ipv6/fib6_rules.c    | 11 +++++++++++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index e2b292b..dbe5537 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -312,6 +312,8 @@ void __net_exit fib6_notifier_exit(struct net *net);
 int fib6_rules_init(void);
 void fib6_rules_cleanup(void);
 bool fib6_rule_default(const struct fib_rule *rule);
+int fib6_rules_dump(struct net *net, struct notifier_block *nb);
+unsigned int fib6_rules_seq_read(struct net *net);
 #else
 static inline int               fib6_rules_init(void)
 {
@@ -325,5 +327,13 @@ static inline bool fib6_rule_default(const struct fib_rule *rule)
 {
 	return true;
 }
+static inline int fib6_rules_dump(struct net *net, struct notifier_block *nb)
+{
+	return 0;
+}
+static inline unsigned int fib6_rules_seq_read(struct net *net)
+{
+	return 0;
+}
 #endif
 #endif
diff --git a/net/ipv6/fib6_notifier.c b/net/ipv6/fib6_notifier.c
index c2bb1ab..298efc6 100644
--- a/net/ipv6/fib6_notifier.c
+++ b/net/ipv6/fib6_notifier.c
@@ -23,12 +23,12 @@ int call_fib6_notifiers(struct net *net, enum fib_event_type event_type,
 
 static unsigned int fib6_seq_read(struct net *net)
 {
-	return 0;
+	return fib6_rules_seq_read(net);
 }
 
 static int fib6_dump(struct net *net, struct notifier_block *nb)
 {
-	return 0;
+	return fib6_rules_dump(net, nb);
 }
 
 static const struct fib_notifier_ops fib6_notifier_ops_template = {
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index ef1fcee..2f29e4e 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -14,6 +14,7 @@
  */
 
 #include <linux/netdevice.h>
+#include <linux/notifier.h>
 #include <linux/export.h>
 
 #include <net/fib_rules.h>
@@ -49,6 +50,16 @@ bool fib6_rule_default(const struct fib_rule *rule)
 }
 EXPORT_SYMBOL_GPL(fib6_rule_default);
 
+int fib6_rules_dump(struct net *net, struct notifier_block *nb)
+{
+	return fib_rules_dump(net, nb, AF_INET6);
+}
+
+unsigned int fib6_rules_seq_read(struct net *net)
+{
+	return fib_rules_seq_read(net, AF_INET6);
+}
+
 struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
 				   int flags, pol_lookup_t lookup)
 {
-- 
2.9.3


* [patch net-next 09/17] ipv6: fib: Dump tables during registration to FIB chain
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (7 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 08/17] ipv6: fib_rules: Dump rules during registration to FIB chain Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 10/17] ipv6: fib: Add offload indication to routes Jiri Pirko
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

Dump all the FIB tables in each net namespace upon registration to the
FIB notification chain so that the callee will have a complete view of
the tables.

The integrity of the dump is ensured by a per-table sequence counter
that is incremented (under write lock) whenever a route is added or
deleted from the table.

All the sequence counters are read (under each table's read lock) and
summed, both before and after the dump. If the two sums differ, the
dump is either restarted or the registration fails.

While it's possible for a table to be modified after its counter has
been read, this isn't really a problem. If the modification happened
before the counter was read the second time, the comparison at the end
will fail. If it happened afterwards, we're guaranteed to be notified
about the change, as the notification block is registered prior to the
second read.
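
As a rough user-space illustration of why summing the per-table
counters is enough to detect a change, consider the sketch below. The
names struct table_model, tables_seq_sum() and table_route_change() are
hypothetical and exist only for this example. Locking is left out
entirely.

```c
#include <stddef.h>

/* Hypothetical model of a routing table carrying a change counter. */
struct table_model {
	unsigned int fib_seq;	/* bumped on every route add or delete */
};

/* Sum the counters of all tables. Unsigned overflow wraps around, which
 * is harmless here because the two sums are only compared for equality.
 */
static unsigned int tables_seq_sum(const struct table_model *tbs, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += tbs[i].fib_seq;
	return sum;
}

/* Stands in for a route being added to or deleted from one table. */
static void table_route_change(struct table_model *tb)
{
	tb->fib_seq++;
}
```

Because each counter only ever increases, a change to any single table
between the two reads makes the second sum differ from the first.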

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip6_fib.h    |  4 +++
 net/ipv6/fib6_notifier.c | 10 ++++--
 net/ipv6/ip6_fib.c       | 92 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index dbe5537..0b30521 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -235,6 +235,7 @@ struct fib6_table {
 	struct fib6_node	tb6_root;
 	struct inet_peer_base	tb6_peers;
 	unsigned int		flags;
+	unsigned int		fib_seq;
 #define RT6_TABLE_HAS_DFLT_ROUTER	BIT(0)
 };
 
@@ -308,6 +309,9 @@ int call_fib6_notifiers(struct net *net, enum fib_event_type event_type,
 int __net_init fib6_notifier_init(struct net *net);
 void __net_exit fib6_notifier_exit(struct net *net);
 
+unsigned int fib6_tables_seq_read(struct net *net);
+int fib6_tables_dump(struct net *net, struct notifier_block *nb);
+
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 int fib6_rules_init(void);
 void fib6_rules_cleanup(void);
diff --git a/net/ipv6/fib6_notifier.c b/net/ipv6/fib6_notifier.c
index 298efc6..66a103e 100644
--- a/net/ipv6/fib6_notifier.c
+++ b/net/ipv6/fib6_notifier.c
@@ -23,12 +23,18 @@ int call_fib6_notifiers(struct net *net, enum fib_event_type event_type,
 
 static unsigned int fib6_seq_read(struct net *net)
 {
-	return fib6_rules_seq_read(net);
+	return fib6_tables_seq_read(net) + fib6_rules_seq_read(net);
 }
 
 static int fib6_dump(struct net *net, struct notifier_block *nb)
 {
-	return fib6_rules_dump(net, nb);
+	int err;
+
+	err = fib6_rules_dump(net, nb);
+	if (err)
+		return err;
+
+	return fib6_tables_dump(net, nb);
 }
 
 static const struct fib_notifier_ops fib6_notifier_ops_template = {
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 595a57c..719c1048 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -303,6 +303,37 @@ static void __net_init fib6_tables_init(struct net *net)
 
 #endif
 
+unsigned int fib6_tables_seq_read(struct net *net)
+{
+	unsigned int h, fib_seq = 0;
+
+	rcu_read_lock();
+	for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
+		struct hlist_head *head = &net->ipv6.fib_table_hash[h];
+		struct fib6_table *tb;
+
+		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
+			read_lock_bh(&tb->tb6_lock);
+			fib_seq += tb->fib_seq;
+			read_unlock_bh(&tb->tb6_lock);
+		}
+	}
+	rcu_read_unlock();
+
+	return fib_seq;
+}
+
+static int call_fib6_entry_notifier(struct notifier_block *nb, struct net *net,
+				    enum fib_event_type event_type,
+				    struct rt6_info *rt)
+{
+	struct fib6_entry_notifier_info info = {
+		.rt = rt,
+	};
+
+	return call_fib6_notifier(nb, net, event_type, &info.info);
+}
+
 static int call_fib6_entry_notifiers(struct net *net,
 				     enum fib_event_type event_type,
 				     struct rt6_info *rt)
@@ -311,9 +342,70 @@ static int call_fib6_entry_notifiers(struct net *net,
 		.rt = rt,
 	};
 
+	rt->rt6i_table->fib_seq++;
 	return call_fib6_notifiers(net, event_type, &info.info);
 }
 
+struct fib6_dump_arg {
+	struct net *net;
+	struct notifier_block *nb;
+};
+
+static void fib6_rt_dump(struct rt6_info *rt, struct fib6_dump_arg *arg)
+{
+	if (rt == arg->net->ipv6.ip6_null_entry)
+		return;
+	call_fib6_entry_notifier(arg->nb, arg->net, FIB_EVENT_ENTRY_ADD, rt);
+}
+
+static int fib6_node_dump(struct fib6_walker *w)
+{
+	struct rt6_info *rt;
+
+	for (rt = w->leaf; rt; rt = rt->dst.rt6_next)
+		fib6_rt_dump(rt, w->args);
+	w->leaf = NULL;
+	return 0;
+}
+
+static void fib6_table_dump(struct net *net, struct fib6_table *tb,
+			    struct fib6_walker *w)
+{
+	w->root = &tb->tb6_root;
+	read_lock_bh(&tb->tb6_lock);
+	fib6_walk(net, w);
+	read_unlock_bh(&tb->tb6_lock);
+}
+
+/* Called with rcu_read_lock() */
+int fib6_tables_dump(struct net *net, struct notifier_block *nb)
+{
+	struct fib6_dump_arg arg;
+	struct fib6_walker *w;
+	unsigned int h;
+
+	w = kzalloc(sizeof(*w), GFP_ATOMIC);
+	if (!w)
+		return -ENOMEM;
+
+	w->func = fib6_node_dump;
+	arg.net = net;
+	arg.nb = nb;
+	w->args = &arg;
+
+	for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
+		struct hlist_head *head = &net->ipv6.fib_table_hash[h];
+		struct fib6_table *tb;
+
+		hlist_for_each_entry_rcu(tb, head, tb6_hlist)
+			fib6_table_dump(net, tb, w);
+	}
+
+	kfree(w);
+
+	return 0;
+}
+
 static int fib6_dump_node(struct fib6_walker *w)
 {
 	int res;
-- 
2.9.3


* [patch net-next 10/17] ipv6: fib: Add offload indication to routes
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (8 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 09/17] ipv6: fib: Dump tables " Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19 15:27   ` David Ahern
  2017-07-19  7:02 ` [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route Jiri Pirko
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

Allow user space applications to see which routes are offloaded and
which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.

To be consistent with IPv4, a multipath route is marked as offloaded if
one of its nexthops is offloaded. Individual nexthops aren't marked with
the 'offload' flag.
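
The flag handling described above, where the offload bit is cleared on
each nexthop and set once on the route, might be modelled as in the
following sketch. MODEL_F_OFFLOAD and hoist_offload_flag() are
hypothetical names used for illustration only. The constant merely
stands in for RTNH_F_OFFLOAD.

```c
#include <stddef.h>

#define MODEL_F_OFFLOAD 0x8u	/* stand-in for RTNH_F_OFFLOAD */

/* Clear the offload bit on every nexthop and return the route-level
 * flags, which carry the bit if at least one nexthop had it set.
 */
static unsigned int hoist_offload_flag(unsigned int *nh_flags, size_t n)
{
	unsigned int route_flags = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (nh_flags[i] & MODEL_F_OFFLOAD) {
			nh_flags[i] &= ~MODEL_F_OFFLOAD;
			route_flags |= MODEL_F_OFFLOAD;
		}
	}
	return route_flags;
}
```

Other nexthop flag bits are left untouched. Only the offload bit moves
up to the route.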

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/uapi/linux/ipv6_route.h |  1 +
 net/ipv6/route.c                | 19 ++++++++++++++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/ipv6_route.h b/include/uapi/linux/ipv6_route.h
index d496c02..33e2a57 100644
--- a/include/uapi/linux/ipv6_route.h
+++ b/include/uapi/linux/ipv6_route.h
@@ -35,6 +35,7 @@
 #define RTF_PREF(pref)	((pref) << 27)
 #define RTF_PREF_MASK	0x18000000
 
+#define RTF_OFFLOAD	0x20000000	/* offloaded route		*/
 #define RTF_PCPU	0x40000000	/* read-only: can not be set by user */
 #define RTF_LOCAL	0x80000000
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d30c96..924e02d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1820,6 +1820,11 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg,
 		goto out;
 	}
 
+	if (cfg->fc_flags & RTF_OFFLOAD) {
+		NL_SET_ERR_MSG(extack, "Userspace can not set RTF_OFFLOAD");
+		goto out;
+	}
+
 	if (cfg->fc_dst_len > 128) {
 		NL_SET_ERR_MSG(extack, "Invalid prefix length");
 		goto out;
@@ -3327,6 +3332,9 @@ static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
 			goto nla_put_failure;
 	}
 
+	if (rt->rt6i_flags & RTF_OFFLOAD)
+		*flags |= RTNH_F_OFFLOAD;
+
 	/* not needed for multipath encoding b/c it has a rtnexthop struct */
 	if (!skip_oif && rt->dst.dev &&
 	    nla_put_u32(skb, RTA_OIF, rt->dst.dev->ifindex))
@@ -3343,7 +3351,8 @@ static int rt6_nexthop_info(struct sk_buff *skb, struct rt6_info *rt,
 }
 
 /* add multipath next hop */
-static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt)
+static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt,
+			   unsigned int *rtm_flags)
 {
 	struct rtnexthop *rtnh;
 	unsigned int flags = 0;
@@ -3359,6 +3368,10 @@ static int rt6_add_nexthop(struct sk_buff *skb, struct rt6_info *rt)
 		goto nla_put_failure;
 
 	rtnh->rtnh_flags = flags;
+	if (rtnh->rtnh_flags & RTNH_F_OFFLOAD) {
+		rtnh->rtnh_flags &= ~RTNH_F_OFFLOAD;
+		*rtm_flags |= RTNH_F_OFFLOAD;
+	}
 
 	/* length of rtnetlink header + attributes */
 	rtnh->rtnh_len = nlmsg_get_pos(skb) - (void *)rtnh;
@@ -3499,12 +3512,12 @@ static int rt6_fill_node(struct net *net,
 		if (!mp)
 			goto nla_put_failure;
 
-		if (rt6_add_nexthop(skb, rt) < 0)
+		if (rt6_add_nexthop(skb, rt, &rtm->rtm_flags) < 0)
 			goto nla_put_failure;
 
 		list_for_each_entry_safe(sibling, next_sibling,
 					 &rt->rt6i_siblings, rt6i_siblings) {
-			if (rt6_add_nexthop(skb, sibling) < 0)
+			if (rt6_add_nexthop(skb, sibling, &rtm->rtm_flags) < 0)
 				goto nla_put_failure;
 		}
 
-- 
2.9.3


* [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (9 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 10/17] ipv6: fib: Add offload indication to routes Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19 15:49   ` David Ahern
  2017-07-19  7:02 ` [patch net-next 12/17] mlxsw: spectrum_router: Demultiplex FIB event based on family Jiri Pirko
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

Listeners of the FIB notification chain are expected to be able to take
and release a reference on notified IPv6 routes. This is needed by
drivers that offload these routes to a capable device.

Since notifications are sent in an atomic context, these drivers need to
take a reference on the route, prepare a work item to offload the route
and release the reference at the end of the work.
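
The hold / queue / release pattern described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct route, route_hold(), route_put(), notify() and run_pending() are
hypothetical names, and a small array stands in for the kernel workqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted route object. */
struct route {
	int refs;	/* number of holders */
	int freed;	/* set when the last reference is dropped */
};

static void route_hold(struct route *rt)
{
	assert(rt->refs > 0);	/* only a live object can be pinned */
	rt->refs++;
}

static void route_put(struct route *rt)
{
	assert(rt->refs > 0);
	if (--rt->refs == 0)
		rt->freed = 1;	/* real code would free the object here */
}

/* Hypothetical work item: remembers which route to process later. */
struct work_item {
	struct route *rt;
};

#define MAX_PENDING 8
static struct work_item pending[MAX_PENDING];
static size_t npending;

/* "Atomic context": must not sleep, so only pin the route and queue work. */
static int notify(struct route *rt)
{
	if (npending == MAX_PENDING)
		return -1;
	route_hold(rt);
	pending[npending++].rt = rt;
	return 0;
}

/* "Process context": do the slow work, then drop the pinned reference. */
static size_t run_pending(void)
{
	size_t done = 0;

	while (npending > 0) {
		struct route *rt = pending[--npending].rt;

		/* ... the route would be programmed to the device here ... */
		route_put(rt);
		done++;
	}
	return done;
}
```

The point of the sketch is that the original owner may drop its reference
before the queued work runs without the object going away.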

Currently, rt6i_ref is used to indicate in how many FIB nodes a route
appears. Different code paths rely on rt6i_ref being 0 to indicate the
route is no longer used by the FIB.

For example, whenever a route is deleted or replaced, fib6_purge_rt() is
run to make sure the route is no longer present in intermediate nodes. A
BUG_ON() at the end of the function is executed in case the reference
count isn't 1, as it's only supposed to appear in the non-intermediate
node from which it's going to be deleted.

Instead of changing the semantics of rt6i_ref, a new reference count is
added, so that external users could also take a reference on routes
without modifying rt6i_ref.

To make sure external users don't release routes used by the FIB, the
reference count is set to 1 upon creation of a route and decremented by
the FIB upon rt6_release().

The reference count is atomic, as it's not protected by any locks and
placed in the 40 bytes hole after the existing rt6i_ref.

rt6_free_pcpu() is exported so that modules can invoke rt6_put(), which
is an inline helper that calls it. This is similar to commit
b423cb10807b ("ipv4: fib: Export free_fib_info()").
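
As a rough userspace model of the two counters (not the kernel code itself),
the interaction can be sketched with C11 atomics. All names below are
hypothetical; fib_ref models rt6i_ref, ext_ref models rt6i_extref, and the
"freed" flag stands in for the actual teardown done by rt6_put().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of a route with two independent counters. */
struct route_model {
	atomic_int fib_ref;	/* models rt6i_ref: FIB nodes holding the route */
	atomic_int ext_ref;	/* models rt6i_extref: the FIB plus external users */
	bool freed;
};

static void route_model_init(struct route_model *rt)
{
	atomic_init(&rt->fib_ref, 0);
	atomic_init(&rt->ext_ref, 1);	/* reference owned by the FIB */
	rt->freed = false;
}

/* Models rt6_get(): an external user pins the route. */
static void route_model_get(struct route_model *rt)
{
	atomic_fetch_add(&rt->ext_ref, 1);
}

/* Models rt6_put(): the last holder tears the route down. */
static void route_model_put(struct route_model *rt)
{
	if (atomic_fetch_sub(&rt->ext_ref, 1) == 1)
		rt->freed = true;
}

/* The FIB links the route into one more node. */
static void route_model_fib_link(struct route_model *rt)
{
	atomic_fetch_add(&rt->fib_ref, 1);
}

/* Models rt6_release(): when no FIB node uses the route any more,
 * the FIB drops the single external-style reference it owns.
 */
static void route_model_fib_release(struct route_model *rt)
{
	if (atomic_fetch_sub(&rt->fib_ref, 1) == 1)
		route_model_put(rt);
}
```

In this model fib_ref keeps its "0 means unused by the FIB" meaning, while
the route survives as long as any external user still holds ext_ref.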

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip6_fib.h | 17 +++++++++++++++++
 net/ipv6/ip6_fib.c    | 10 ++++------
 net/ipv6/route.c      |  4 ++++
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 0b30521..e8ecd08 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -119,6 +119,7 @@ struct rt6_info {
 	unsigned int			rt6i_nsiblings;
 
 	atomic_t			rt6i_ref;
+	refcount_t			rt6i_extref;
 
 	/* These are in a separate cache line. */
 	struct rt6key			rt6i_dst ____cacheline_aligned_in_smp;
@@ -187,6 +188,22 @@ static inline void ip6_rt_put(struct rt6_info *rt)
 	dst_release(&rt->dst);
 }
 
+void rt6_free_pcpu(struct rt6_info *non_pcpu_rt);
+
+static inline void rt6_get(struct rt6_info *rt)
+{
+	refcount_inc(&rt->rt6i_extref);
+}
+
+static inline void rt6_put(struct rt6_info *rt)
+{
+	if (refcount_dec_and_test(&rt->rt6i_extref)) {
+		rt6_free_pcpu(rt);
+		dst_dev_put(&rt->dst);
+		dst_release(&rt->dst);
+	}
+}
+
 enum fib6_walk_state {
 #ifdef CONFIG_IPV6_SUBTREES
 	FWS_S,
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 719c1048..99ca785 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -154,7 +154,7 @@ static void node_free(struct fib6_node *fn)
 	kmem_cache_free(fib6_node_kmem, fn);
 }
 
-static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
+void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 {
 	int cpu;
 
@@ -177,14 +177,12 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 	free_percpu(non_pcpu_rt->rt6i_pcpu);
 	non_pcpu_rt->rt6i_pcpu = NULL;
 }
+EXPORT_SYMBOL_GPL(rt6_free_pcpu);
 
 static void rt6_release(struct rt6_info *rt)
 {
-	if (atomic_dec_and_test(&rt->rt6i_ref)) {
-		rt6_free_pcpu(rt);
-		dst_dev_put(&rt->dst);
-		dst_release(&rt->dst);
-	}
+	if (atomic_dec_and_test(&rt->rt6i_ref))
+		rt6_put(rt);
 }
 
 static void fib6_link_table(struct net *net, struct fib6_table *tb)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 924e02d..cabe0c6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -345,6 +345,10 @@ static void rt6_info_init(struct rt6_info *rt)
 	memset(dst + 1, 0, sizeof(*rt) - sizeof(*dst));
 	INIT_LIST_HEAD(&rt->rt6i_siblings);
 	INIT_LIST_HEAD(&rt->rt6i_uncached);
+	/* Make sure route can't be released as long as it's used by
+	 * the FIB.
+	 */
+	refcount_set(&rt->rt6i_extref, 1);
 }
 
 /* allocate dst with ip6_dst_ops */
-- 
2.9.3


* [patch net-next 12/17] mlxsw: spectrum_router: Demultiplex FIB event based on family
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (10 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 13/17] mlxsw: spectrum_router: Sanitize IPv6 FIB rules Jiri Pirko
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

The FIB notification block currently only handles IPv4 events, but we
want to start handling IPv6 events soon, so lay the groundwork now.

Do that by preparing the work item and processing it according to the
notified address family.
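
A minimal userspace sketch of this kind of demultiplexing is shown below.
The types and function names are hypothetical and only mirror the shape of
the change; the default branch is an addition of the sketch and is not
something this patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6, AF_UNSPEC */

struct event_work;
typedef void (*work_fn)(struct event_work *);

/* Hypothetical work item carrying the handler chosen at notification time. */
struct event_work {
	work_fn fn;
	unsigned long event;
	int handled_by;	/* records which handler ran, for demonstration */
};

static void fib4_work(struct event_work *w)
{
	w->handled_by = AF_INET;
}

static void fib6_work(struct event_work *w)
{
	w->handled_by = AF_INET6;
}

/* Pick the per-family handler; returns 0 on success, -1 if unsupported. */
static int prepare_work(struct event_work *w, int family, unsigned long event)
{
	w->event = event;
	w->handled_by = 0;

	switch (family) {
	case AF_INET:
		w->fn = fib4_work;
		return 0;
	case AF_INET6:
		w->fn = fib6_work;
		return 0;
	default:
		w->fn = NULL;	/* sketch-only: reject unknown families */
		return -1;
	}
}
```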

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 65 +++++++++++++++-------
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 7965a53..d3b20bc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2982,7 +2982,7 @@ struct mlxsw_sp_fib_event_work {
 	unsigned long event;
 };
 
-static void mlxsw_sp_router_fib_event_work(struct work_struct *work)
+static void mlxsw_sp_router_fib4_event_work(struct work_struct *work)
 {
 	struct mlxsw_sp_fib_event_work *fib_work =
 		container_of(work, struct mlxsw_sp_fib_event_work, work);
@@ -3027,6 +3027,42 @@ static void mlxsw_sp_router_fib_event_work(struct work_struct *work)
 	kfree(fib_work);
 }
 
+static void mlxsw_sp_router_fib6_event_work(struct work_struct *work)
+{
+}
+
+static void mlxsw_sp_router_fib4_event(struct mlxsw_sp_fib_event_work *fib_work,
+				       struct fib_notifier_info *info)
+{
+	switch (fib_work->event) {
+	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
+	case FIB_EVENT_ENTRY_APPEND: /* fall through */
+	case FIB_EVENT_ENTRY_ADD: /* fall through */
+	case FIB_EVENT_ENTRY_DEL:
+		memcpy(&fib_work->fen_info, info, sizeof(fib_work->fen_info));
+		/* Take reference on fib_info to prevent it from being
+		 * freed while work is queued. Release it afterwards.
+		 */
+		fib_info_hold(fib_work->fen_info.fi);
+		break;
+	case FIB_EVENT_RULE_ADD: /* fall through */
+	case FIB_EVENT_RULE_DEL:
+		memcpy(&fib_work->fr_info, info, sizeof(fib_work->fr_info));
+		fib_rule_get(fib_work->fr_info.rule);
+		break;
+	case FIB_EVENT_NH_ADD: /* fall through */
+	case FIB_EVENT_NH_DEL:
+		memcpy(&fib_work->fnh_info, info, sizeof(fib_work->fnh_info));
+		fib_info_hold(fib_work->fnh_info.fib_nh->nh_parent);
+		break;
+	}
+}
+
+static void mlxsw_sp_router_fib6_event(struct mlxsw_sp_fib_event_work *fib_work,
+				       struct fib_notifier_info *info)
+{
+}
+
 /* Called with rcu_read_lock() */
 static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 				     unsigned long event, void *ptr)
@@ -3042,31 +3078,18 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 	if (WARN_ON(!fib_work))
 		return NOTIFY_BAD;
 
-	INIT_WORK(&fib_work->work, mlxsw_sp_router_fib_event_work);
 	router = container_of(nb, struct mlxsw_sp_router, fib_nb);
 	fib_work->mlxsw_sp = router->mlxsw_sp;
 	fib_work->event = event;
 
-	switch (event) {
-	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
-	case FIB_EVENT_ENTRY_APPEND: /* fall through */
-	case FIB_EVENT_ENTRY_ADD: /* fall through */
-	case FIB_EVENT_ENTRY_DEL:
-		memcpy(&fib_work->fen_info, ptr, sizeof(fib_work->fen_info));
-		/* Take referece on fib_info to prevent it from being
-		 * freed while work is queued. Release it afterwards.
-		 */
-		fib_info_hold(fib_work->fen_info.fi);
+	switch (info->family) {
+	case AF_INET:
+		INIT_WORK(&fib_work->work, mlxsw_sp_router_fib4_event_work);
+		mlxsw_sp_router_fib4_event(fib_work, info);
 		break;
-	case FIB_EVENT_RULE_ADD: /* fall through */
-	case FIB_EVENT_RULE_DEL:
-		memcpy(&fib_work->fr_info, ptr, sizeof(fib_work->fr_info));
-		fib_rule_get(fib_work->fr_info.rule);
-		break;
-	case FIB_EVENT_NH_ADD: /* fall through */
-	case FIB_EVENT_NH_DEL:
-		memcpy(&fib_work->fnh_info, ptr, sizeof(fib_work->fnh_info));
-		fib_info_hold(fib_work->fnh_info.fib_nh->nh_parent);
+	case AF_INET6:
+		INIT_WORK(&fib_work->work, mlxsw_sp_router_fib6_event_work);
+		mlxsw_sp_router_fib6_event(fib_work, info);
 		break;
 	}
 
-- 
2.9.3


* [patch net-next 13/17] mlxsw: spectrum_router: Sanitize IPv6 FIB rules
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (11 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 12/17] mlxsw: spectrum_router: Demultiplex FIB event based on family Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion Jiri Pirko
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

We only allow FIB offload in the presence of default rules or an l3mdev
rule. In a similar fashion to IPv4 FIB rules, sanitize IPv6 rules.
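
The check itself is small. A hedged userspace model, with hypothetical
struct and function names and boolean fields standing in for
fib6_rule_default() and rule->l3mdev, could look like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of a FIB rule to the two properties that matter. */
struct rule_model {
	bool is_default;	/* stands in for fib6_rule_default(rule) */
	bool l3mdev;		/* stands in for rule->l3mdev */
};

/* Hypothetical router state: once aborted, offload is given up. */
struct router_model {
	bool aborted;
};

/* Offload stays possible only for default rules or an l3mdev rule. */
static bool rule_allows_offload(const struct rule_model *rule)
{
	return rule->is_default || rule->l3mdev;
}

/* Any other rule, whether added or deleted, triggers the abort path. */
static void router_rule_event(struct router_model *router,
			      const struct rule_model *rule)
{
	if (!rule_allows_offload(rule))
		router->aborted = true;
}
```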

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 25 ++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index d3b20bc..cf06b7d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -48,6 +48,7 @@
 #include <net/neighbour.h>
 #include <net/arp.h>
 #include <net/ip_fib.h>
+#include <net/ip6_fib.h>
 #include <net/fib_rules.h>
 #include <net/l3mdev.h>
 #include <net/addrconf.h>
@@ -3029,6 +3030,23 @@ static void mlxsw_sp_router_fib4_event_work(struct work_struct *work)
 
 static void mlxsw_sp_router_fib6_event_work(struct work_struct *work)
 {
+	struct mlxsw_sp_fib_event_work *fib_work =
+		container_of(work, struct mlxsw_sp_fib_event_work, work);
+	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
+	struct fib_rule *rule;
+
+	rtnl_lock();
+	switch (fib_work->event) {
+	case FIB_EVENT_RULE_ADD: /* fall through */
+	case FIB_EVENT_RULE_DEL:
+		rule = fib_work->fr_info.rule;
+		if (!fib6_rule_default(rule) && !rule->l3mdev)
+			mlxsw_sp_router_fib_abort(mlxsw_sp);
+		fib_rule_put(rule);
+		break;
+	}
+	rtnl_unlock();
+	kfree(fib_work);
 }
 
 static void mlxsw_sp_router_fib4_event(struct mlxsw_sp_fib_event_work *fib_work,
@@ -3061,6 +3079,13 @@ static void mlxsw_sp_router_fib4_event(struct mlxsw_sp_fib_event_work *fib_work,
 static void mlxsw_sp_router_fib6_event(struct mlxsw_sp_fib_event_work *fib_work,
 				       struct fib_notifier_info *info)
 {
+	switch (fib_work->event) {
+	case FIB_EVENT_RULE_ADD: /* fall through */
+	case FIB_EVENT_RULE_DEL:
+		memcpy(&fib_work->fr_info, info, sizeof(fib_work->fr_info));
+		fib_rule_get(fib_work->fr_info.rule);
+		break;
+	}
 }
 
 /* Called with rcu_read_lock() */
-- 
2.9.3


* [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (12 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 13/17] mlxsw: spectrum_router: Sanitize IPv6 FIB rules Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19 16:14   ` David Ahern
  2017-07-19  7:02 ` [patch net-next 15/17] mlxsw: spectrum_router: Add support for route replace Jiri Pirko
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

Allow directly connected and remote unicast IPv6 routes to be programmed
to the device's tables.

As with IPv4, identical routes - sharing the same destination prefix -
are ordered in a FIB node according to their table ID and then the
metric. While the kernel doesn't share the same trie for the local and
main table, this does happen in the device, so ordering according to
table ID is needed.
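
To illustrate the ordering, here is a small userspace sketch that keeps
entries sorted by descending table ID and, within a table, by ascending
metric, which is how the list walk in this patch reads. The array-based
container and all names are hypothetical; the driver uses a linked list.
The caller is assumed to provide enough capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sort key of a FIB entry within one FIB node. */
struct entry_key {
	unsigned int table_id;
	unsigned int metric;
};

/* Find the index at which a new key has to be inserted. */
static size_t insert_pos(const struct entry_key *v, size_t n,
			 struct entry_key k)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (v[i].table_id > k.table_id)
			continue;	/* higher table IDs come first */
		if (v[i].table_id < k.table_id)
			break;		/* passed the new key's table */
		if (v[i].metric > k.metric)
			break;		/* first entry with a larger metric */
	}
	return i;
}

/* Insert a key while keeping the array ordered; v must have room. */
static void insert_sorted(struct entry_key *v, size_t *n, struct entry_key k)
{
	size_t pos = insert_pos(v, *n, k);

	memmove(&v[pos + 1], &v[pos], (*n - pos) * sizeof(*v));
	v[pos] = k;
	(*n)++;
}
```

With the local table (ID 255) and the main table (ID 254) sharing one
container, local entries end up ahead of main entries for the same prefix.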

Since individual nexthops can be added and deleted in IPv6, each FIB
entry stores a linked list of the rt6_info structs it represents. Upon
the addition or deletion of a nexthop, a new nexthop group is allocated
according to the new configuration and the old one is destroyed.
Identical groups aren't currently consolidated, but will be in a
follow-up patchset.
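
The "rebuild the group on every membership change" approach can be modelled
in userspace as follows. This is a sketch under simplifying assumptions:
integers stand in for rt6_info pointers, an array stands in for the linked
list, and every name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NH 16

/* Hypothetical nexthop group: an immutable snapshot of the members. */
struct nh_group {
	size_t count;
	int nexthops[];	/* flexible array, sized at allocation time */
};

/* Hypothetical FIB entry: member list plus the group built from it. */
struct fib_entry_model {
	int rts[MAX_NH];	/* stands in for the rt6_info list */
	size_t nrt;
	struct nh_group *grp;
};

static struct nh_group *group_create(const struct fib_entry_model *e)
{
	struct nh_group *g;

	g = malloc(sizeof(*g) + e->nrt * sizeof(g->nexthops[0]));
	if (!g)
		return NULL;
	g->count = e->nrt;
	memcpy(g->nexthops, e->rts, e->nrt * sizeof(g->nexthops[0]));
	return g;
}

/* Allocate a group for the new configuration, then destroy the old one. */
static int group_update(struct fib_entry_model *e)
{
	struct nh_group *new_grp = group_create(e);

	if (!new_grp)
		return -1;	/* old group stays in place */
	free(e->grp);
	e->grp = new_grp;
	return 0;
}

static int entry_nexthop_add(struct fib_entry_model *e, int rt)
{
	if (e->nrt == MAX_NH)
		return -1;
	e->rts[e->nrt++] = rt;
	if (group_update(e)) {
		e->nrt--;	/* roll back the membership change */
		return -1;
	}
	return 0;
}

static int entry_nexthop_del(struct fib_entry_model *e, int rt)
{
	size_t i;

	for (i = 0; i < e->nrt; i++)
		if (e->rts[i] == rt)
			break;
	if (i == e->nrt)
		return -1;	/* not a member */
	memmove(&e->rts[i], &e->rts[i + 1],
		(e->nrt - i - 1) * sizeof(e->rts[0]));
	e->nrt--;
	return group_update(e);
}
```

Rebuilding trades some allocation churn for simplicity: the group is never
edited in place, so a failed update leaves the previous group usable.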

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 685 ++++++++++++++++++++-
 1 file changed, 682 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index cf06b7d..33e5b16 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -44,6 +44,7 @@
 #include <linux/netdevice.h>
 #include <linux/if_bridge.h>
 #include <linux/socket.h>
+#include <linux/route.h>
 #include <net/netevent.h>
 #include <net/neighbour.h>
 #include <net/arp.h>
@@ -407,6 +408,17 @@ struct mlxsw_sp_fib4_entry {
 	u8 type;
 };
 
+struct mlxsw_sp_fib6_entry {
+	struct mlxsw_sp_fib_entry common;
+	struct list_head rt6_list;
+	unsigned int nrt6;
+};
+
+struct mlxsw_sp_rt6 {
+	struct list_head list;
+	struct rt6_info *rt;
+};
+
 enum mlxsw_sp_l3proto {
 	MLXSW_SP_L3_PROTO_IPV4,
 	MLXSW_SP_L3_PROTO_IPV6,
@@ -2094,6 +2106,40 @@ mlxsw_sp_fib_entry_should_offload(const struct mlxsw_sp_fib_entry *fib_entry)
 	}
 }
 
+static void
+mlxsw_sp_fib6_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+
+	fib6_entry = container_of(fib_entry, struct mlxsw_sp_fib6_entry,
+				  common);
+	list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
+		struct rt6_info *rt = mlxsw_sp_rt6->rt;
+
+		write_lock_bh(&rt->rt6i_table->tb6_lock);
+		rt->rt6i_flags |= RTF_OFFLOAD;
+		write_unlock_bh(&rt->rt6i_table->tb6_lock);
+	}
+}
+
+static void
+mlxsw_sp_fib6_entry_offload_unset(struct mlxsw_sp_fib_entry *fib_entry)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+
+	fib6_entry = container_of(fib_entry, struct mlxsw_sp_fib6_entry,
+				  common);
+	list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
+		struct rt6_info *rt = mlxsw_sp_rt6->rt;
+
+		write_lock_bh(&rt->rt6i_table->tb6_lock);
+		rt->rt6i_flags &= ~RTF_OFFLOAD;
+		write_unlock_bh(&rt->rt6i_table->tb6_lock);
+	}
+}
+
 static void mlxsw_sp_fib_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
 {
 	fib_entry->offloaded = true;
@@ -2103,7 +2149,8 @@ static void mlxsw_sp_fib_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
 		fib_info_offload_inc(fib_entry->nh_group->key.fi);
 		break;
 	case MLXSW_SP_L3_PROTO_IPV6:
-		WARN_ON_ONCE(1);
+		mlxsw_sp_fib6_entry_offload_set(fib_entry);
+		break;
 	}
 }
 
@@ -2115,7 +2162,8 @@ mlxsw_sp_fib_entry_offload_unset(struct mlxsw_sp_fib_entry *fib_entry)
 		fib_info_offload_dec(fib_entry->nh_group->key.fi);
 		break;
 	case MLXSW_SP_L3_PROTO_IPV6:
-		WARN_ON_ONCE(1);
+		mlxsw_sp_fib6_entry_offload_unset(fib_entry);
+		break;
 	}
 
 	fib_entry->offloaded = false;
@@ -2829,6 +2877,602 @@ static void mlxsw_sp_router_fib4_del(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
 }
 
+static bool mlxsw_sp_fib6_rt_should_ignore(const struct rt6_info *rt)
+{
+	/* Packets with link-local destination IP arriving to the router
+	 * are trapped to the CPU, so no need to program specific routes
+	 * for them.
+	 */
+	if (ipv6_addr_type(&rt->rt6i_dst.addr) & IPV6_ADDR_LINKLOCAL)
+		return true;
+
+	/* Multicast routes aren't supported, so ignore them. Neighbour
+	 * Discovery packets are specifically trapped.
+	 */
+	if (ipv6_addr_type(&rt->rt6i_dst.addr) & IPV6_ADDR_MULTICAST)
+		return true;
+
+	/* Cloned routes are irrelevant in the forwarding path. */
+	if (rt->rt6i_flags & RTF_CACHE)
+		return true;
+
+	return false;
+}
+
+static struct mlxsw_sp_rt6 *mlxsw_sp_rt6_create(struct rt6_info *rt)
+{
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+
+	mlxsw_sp_rt6 = kzalloc(sizeof(*mlxsw_sp_rt6), GFP_KERNEL);
+	if (!mlxsw_sp_rt6)
+		return ERR_PTR(-ENOMEM);
+
+	/* In case of route replace, replaced route is deleted with
+	 * no notification. Take reference to prevent accessing freed
+	 * memory.
+	 */
+	mlxsw_sp_rt6->rt = rt;
+	rt6_get(rt);
+
+	return mlxsw_sp_rt6;
+}
+
+static void mlxsw_sp_rt6_destroy(struct mlxsw_sp_rt6 *mlxsw_sp_rt6)
+{
+	rt6_put(mlxsw_sp_rt6->rt);
+	kfree(mlxsw_sp_rt6);
+}
+
+static bool mlxsw_sp_fib6_rt_can_mp(const struct rt6_info *rt)
+{
+	/* RTF_CACHE routes are ignored */
+	return (rt->rt6i_flags & (RTF_GATEWAY | RTF_ADDRCONF)) == RTF_GATEWAY;
+}
+
+static struct rt6_info *
+mlxsw_sp_fib6_entry_rt(const struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	return list_first_entry(&fib6_entry->rt6_list, struct mlxsw_sp_rt6,
+				list)->rt;
+}
+
+static struct mlxsw_sp_fib6_entry *
+mlxsw_sp_fib6_node_mp_entry_find(const struct mlxsw_sp_fib_node *fib_node,
+				 const struct rt6_info *nrt)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+
+	if (!mlxsw_sp_fib6_rt_can_mp(nrt))
+		return NULL;
+
+	list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
+		struct rt6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
+
+		/* RT6_TABLE_LOCAL and RT6_TABLE_MAIN share the same
+		 * virtual router.
+		 */
+		if (rt->rt6i_table->tb6_id > nrt->rt6i_table->tb6_id)
+			continue;
+		if (rt->rt6i_table->tb6_id != nrt->rt6i_table->tb6_id)
+			break;
+		if (rt->rt6i_metric < nrt->rt6i_metric)
+			continue;
+		if (rt->rt6i_metric == nrt->rt6i_metric &&
+		    mlxsw_sp_fib6_rt_can_mp(rt))
+			return fib6_entry;
+		if (rt->rt6i_metric > nrt->rt6i_metric)
+			break;
+	}
+
+	return NULL;
+}
+
+static struct mlxsw_sp_rt6 *
+mlxsw_sp_fib6_entry_rt_find(const struct mlxsw_sp_fib6_entry *fib6_entry,
+			    const struct rt6_info *rt)
+{
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+
+	list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
+		if (mlxsw_sp_rt6->rt == rt)
+			return mlxsw_sp_rt6;
+	}
+
+	return NULL;
+}
+
+static int mlxsw_sp_nexthop6_init(struct mlxsw_sp *mlxsw_sp,
+				  struct mlxsw_sp_nexthop_group *nh_grp,
+				  struct mlxsw_sp_nexthop *nh,
+				  const struct rt6_info *rt)
+{
+	struct net_device *dev = rt->dst.dev;
+	struct mlxsw_sp_rif *rif;
+	int err;
+
+	nh->nh_grp = nh_grp;
+	memcpy(&nh->gw_addr, &rt->rt6i_gateway, sizeof(nh->gw_addr));
+
+	if (!dev)
+		return 0;
+
+	rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
+	if (!rif)
+		return 0;
+	mlxsw_sp_nexthop_rif_init(nh, rif);
+
+	err = mlxsw_sp_nexthop_neigh_init(mlxsw_sp, nh);
+	if (err)
+		goto err_nexthop_neigh_init;
+
+	return 0;
+
+err_nexthop_neigh_init:
+	mlxsw_sp_nexthop_rif_fini(nh);
+	return err;
+}
+
+static void mlxsw_sp_nexthop6_fini(struct mlxsw_sp *mlxsw_sp,
+				   struct mlxsw_sp_nexthop *nh)
+{
+	mlxsw_sp_nexthop_neigh_fini(mlxsw_sp, nh);
+	mlxsw_sp_nexthop_rif_fini(nh);
+}
+
+static struct mlxsw_sp_nexthop_group *
+mlxsw_sp_nexthop6_group_create(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	struct mlxsw_sp_nexthop_group *nh_grp;
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+	struct mlxsw_sp_nexthop *nh;
+	size_t alloc_size;
+	int i = 0;
+	int err;
+
+	alloc_size = sizeof(*nh_grp) +
+		     fib6_entry->nrt6 * sizeof(struct mlxsw_sp_nexthop);
+	nh_grp = kzalloc(alloc_size, GFP_KERNEL);
+	if (!nh_grp)
+		return ERR_PTR(-ENOMEM);
+	INIT_LIST_HEAD(&nh_grp->fib_list);
+	nh_grp->neigh_tbl = &nd_tbl;
+	mlxsw_sp_rt6 = list_first_entry(&fib6_entry->rt6_list,
+					struct mlxsw_sp_rt6, list);
+	nh_grp->gateway = !!(mlxsw_sp_rt6->rt->rt6i_flags & RTF_GATEWAY);
+	nh_grp->count = fib6_entry->nrt6;
+	for (i = 0; i < nh_grp->count; i++) {
+		struct rt6_info *rt = mlxsw_sp_rt6->rt;
+
+		nh = &nh_grp->nexthops[i];
+		err = mlxsw_sp_nexthop6_init(mlxsw_sp, nh_grp, nh, rt);
+		if (err)
+			goto err_nexthop6_init;
+		mlxsw_sp_rt6 = list_next_entry(mlxsw_sp_rt6, list);
+	}
+	mlxsw_sp_nexthop_group_refresh(mlxsw_sp, nh_grp);
+	return nh_grp;
+
+err_nexthop6_init:
+	for (i--; i >= 0; i--) {
+		nh = &nh_grp->nexthops[i];
+		mlxsw_sp_nexthop6_fini(mlxsw_sp, nh);
+	}
+	kfree(nh_grp);
+	return ERR_PTR(err);
+}
+
+static void
+mlxsw_sp_nexthop6_group_destroy(struct mlxsw_sp *mlxsw_sp,
+				struct mlxsw_sp_nexthop_group *nh_grp)
+{
+	struct mlxsw_sp_nexthop *nh;
+	int i = nh_grp->count;
+
+	for (i--; i >= 0; i--) {
+		nh = &nh_grp->nexthops[i];
+		mlxsw_sp_nexthop6_fini(mlxsw_sp, nh);
+	}
+	mlxsw_sp_nexthop_group_refresh(mlxsw_sp, nh_grp);
+	WARN_ON(nh_grp->adj_index_valid);
+	kfree(nh_grp);
+}
+
+static int mlxsw_sp_nexthop6_group_get(struct mlxsw_sp *mlxsw_sp,
+				       struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	struct mlxsw_sp_nexthop_group *nh_grp;
+
+	/* For now, don't consolidate nexthop groups */
+	nh_grp = mlxsw_sp_nexthop6_group_create(mlxsw_sp, fib6_entry);
+	if (IS_ERR(nh_grp))
+		return PTR_ERR(nh_grp);
+
+	list_add_tail(&fib6_entry->common.nexthop_group_node,
+		      &nh_grp->fib_list);
+	fib6_entry->common.nh_group = nh_grp;
+
+	return 0;
+}
+
+static void mlxsw_sp_nexthop6_group_put(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_fib_entry *fib_entry)
+{
+	struct mlxsw_sp_nexthop_group *nh_grp = fib_entry->nh_group;
+
+	list_del(&fib_entry->nexthop_group_node);
+	if (!list_empty(&nh_grp->fib_list))
+		return;
+	mlxsw_sp_nexthop6_group_destroy(mlxsw_sp, nh_grp);
+}
+
+static int
+mlxsw_sp_nexthop6_group_update(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	struct mlxsw_sp_nexthop_group *old_nh_grp = fib6_entry->common.nh_group;
+	int err;
+
+	fib6_entry->common.nh_group = NULL;
+	list_del(&fib6_entry->common.nexthop_group_node);
+
+	err = mlxsw_sp_nexthop6_group_get(mlxsw_sp, fib6_entry);
+	if (err)
+		goto err_nexthop6_group_get;
+
+	/* In case this entry is offloaded, then the adjacency index
+	 * currently associated with it in the device's table is that
+	 * of the old group. Start using the new one instead.
+	 */
+	err = mlxsw_sp_fib_node_entry_add(mlxsw_sp, &fib6_entry->common);
+	if (err)
+		goto err_fib_node_entry_add;
+
+	if (list_empty(&old_nh_grp->fib_list))
+		mlxsw_sp_nexthop6_group_destroy(mlxsw_sp, old_nh_grp);
+
+	return 0;
+
+err_fib_node_entry_add:
+	mlxsw_sp_nexthop6_group_put(mlxsw_sp, &fib6_entry->common);
+err_nexthop6_group_get:
+	list_add_tail(&fib6_entry->common.nexthop_group_node,
+		      &old_nh_grp->fib_list);
+	fib6_entry->common.nh_group = old_nh_grp;
+	return err;
+}
+
+static int
+mlxsw_sp_fib6_entry_nexthop_add(struct mlxsw_sp *mlxsw_sp,
+				struct mlxsw_sp_fib6_entry *fib6_entry,
+				struct rt6_info *rt)
+{
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+	int err;
+
+	mlxsw_sp_rt6 = mlxsw_sp_rt6_create(rt);
+	if (IS_ERR(mlxsw_sp_rt6))
+		return PTR_ERR(mlxsw_sp_rt6);
+
+	list_add_tail(&mlxsw_sp_rt6->list, &fib6_entry->rt6_list);
+	fib6_entry->nrt6++;
+
+	err = mlxsw_sp_nexthop6_group_update(mlxsw_sp, fib6_entry);
+	if (err)
+		goto err_nexthop6_group_update;
+
+	return 0;
+
+err_nexthop6_group_update:
+	fib6_entry->nrt6--;
+	list_del(&mlxsw_sp_rt6->list);
+	mlxsw_sp_rt6_destroy(mlxsw_sp_rt6);
+	return err;
+}
+
+static void
+mlxsw_sp_fib6_entry_nexthop_del(struct mlxsw_sp *mlxsw_sp,
+				struct mlxsw_sp_fib6_entry *fib6_entry,
+				struct rt6_info *rt)
+{
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+
+	mlxsw_sp_rt6 = mlxsw_sp_fib6_entry_rt_find(fib6_entry, rt);
+	if (WARN_ON(!mlxsw_sp_rt6))
+		return;
+
+	fib6_entry->nrt6--;
+	list_del(&mlxsw_sp_rt6->list);
+	mlxsw_sp_nexthop6_group_update(mlxsw_sp, fib6_entry);
+	mlxsw_sp_rt6_destroy(mlxsw_sp_rt6);
+}
+
+static void mlxsw_sp_fib6_entry_type_set(struct mlxsw_sp_fib_entry *fib_entry,
+					 const struct rt6_info *rt)
+{
+	/* Packets hitting RTF_REJECT routes need to be discarded by the
+	 * stack. We can rely on their destination device not having a
+	 * RIF (it's the loopback device) and can thus use action type
+	 * local, which will cause them to be trapped with a lower
+	 * priority than packets that need to be locally received.
+	 */
+	if (rt->rt6i_flags & RTF_LOCAL)
+		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
+	else if (rt->rt6i_flags & RTF_REJECT)
+		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
+	else if (rt->rt6i_flags & RTF_GATEWAY)
+		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_REMOTE;
+	else
+		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
+}
+
+static void
+mlxsw_sp_fib6_entry_rt_destroy_all(struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6, *tmp;
+
+	list_for_each_entry_safe(mlxsw_sp_rt6, tmp, &fib6_entry->rt6_list,
+				 list) {
+		fib6_entry->nrt6--;
+		list_del(&mlxsw_sp_rt6->list);
+		mlxsw_sp_rt6_destroy(mlxsw_sp_rt6);
+	}
+}
+
+static struct mlxsw_sp_fib6_entry *
+mlxsw_sp_fib6_entry_create(struct mlxsw_sp *mlxsw_sp,
+			   struct mlxsw_sp_fib_node *fib_node,
+			   struct rt6_info *rt)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
+	int err;
+
+	fib6_entry = kzalloc(sizeof(*fib6_entry), GFP_KERNEL);
+	if (!fib6_entry)
+		return ERR_PTR(-ENOMEM);
+	fib_entry = &fib6_entry->common;
+
+	mlxsw_sp_rt6 = mlxsw_sp_rt6_create(rt);
+	if (IS_ERR(mlxsw_sp_rt6)) {
+		err = PTR_ERR(mlxsw_sp_rt6);
+		goto err_rt6_create;
+	}
+
+	mlxsw_sp_fib6_entry_type_set(fib_entry, mlxsw_sp_rt6->rt);
+
+	INIT_LIST_HEAD(&fib6_entry->rt6_list);
+	list_add_tail(&mlxsw_sp_rt6->list, &fib6_entry->rt6_list);
+	fib6_entry->nrt6 = 1;
+	err = mlxsw_sp_nexthop6_group_get(mlxsw_sp, fib6_entry);
+	if (err)
+		goto err_nexthop6_group_get;
+
+	fib_entry->fib_node = fib_node;
+
+	return fib6_entry;
+
+err_nexthop6_group_get:
+	list_del(&mlxsw_sp_rt6->list);
+	mlxsw_sp_rt6_destroy(mlxsw_sp_rt6);
+err_rt6_create:
+	kfree(fib6_entry);
+	return ERR_PTR(err);
+}
+
+static void mlxsw_sp_fib6_entry_destroy(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	mlxsw_sp_nexthop6_group_put(mlxsw_sp, &fib6_entry->common);
+	mlxsw_sp_fib6_entry_rt_destroy_all(fib6_entry);
+	WARN_ON(fib6_entry->nrt6);
+	kfree(fib6_entry);
+}
+
+static struct mlxsw_sp_fib6_entry *
+mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
+			      const struct rt6_info *nrt)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+
+	list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
+		struct rt6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
+
+		if (rt->rt6i_table->tb6_id > nrt->rt6i_table->tb6_id)
+			continue;
+		if (rt->rt6i_table->tb6_id != nrt->rt6i_table->tb6_id)
+			break;
+		if (rt->rt6i_metric > nrt->rt6i_metric)
+			return fib6_entry;
+	}
+
+	return NULL;
+}
+
+static int
+mlxsw_sp_fib6_node_list_insert(struct mlxsw_sp_fib6_entry *new6_entry)
+{
+	struct mlxsw_sp_fib_node *fib_node = new6_entry->common.fib_node;
+	struct rt6_info *nrt = mlxsw_sp_fib6_entry_rt(new6_entry);
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+
+	fib6_entry = mlxsw_sp_fib6_node_entry_find(fib_node, nrt);
+
+	if (fib6_entry) {
+		list_add_tail(&new6_entry->common.list,
+			      &fib6_entry->common.list);
+	} else {
+		struct mlxsw_sp_fib6_entry *last;
+
+		list_for_each_entry(last, &fib_node->entry_list, common.list) {
+			struct rt6_info *rt = mlxsw_sp_fib6_entry_rt(last);
+
+			if (nrt->rt6i_table->tb6_id > rt->rt6i_table->tb6_id)
+				break;
+			fib6_entry = last;
+		}
+
+		if (fib6_entry)
+			list_add(&new6_entry->common.list,
+				 &fib6_entry->common.list);
+		else
+			list_add(&new6_entry->common.list,
+				 &fib_node->entry_list);
+	}
+
+	return 0;
+}
+
+static void
+mlxsw_sp_fib6_node_list_remove(struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	list_del(&fib6_entry->common.list);
+}
+
+static int mlxsw_sp_fib6_node_entry_link(struct mlxsw_sp *mlxsw_sp,
+					 struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	int err;
+
+	err = mlxsw_sp_fib6_node_list_insert(fib6_entry);
+	if (err)
+		return err;
+
+	err = mlxsw_sp_fib_node_entry_add(mlxsw_sp, &fib6_entry->common);
+	if (err)
+		goto err_fib_node_entry_add;
+
+	return 0;
+
+err_fib_node_entry_add:
+	mlxsw_sp_fib6_node_list_remove(fib6_entry);
+	return err;
+}
+
+static void
+mlxsw_sp_fib6_node_entry_unlink(struct mlxsw_sp *mlxsw_sp,
+				struct mlxsw_sp_fib6_entry *fib6_entry)
+{
+	mlxsw_sp_fib_node_entry_del(mlxsw_sp, &fib6_entry->common);
+	mlxsw_sp_fib6_node_list_remove(fib6_entry);
+}
+
+static struct mlxsw_sp_fib6_entry *
+mlxsw_sp_fib6_entry_lookup(struct mlxsw_sp *mlxsw_sp,
+			   const struct rt6_info *rt)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+	struct mlxsw_sp_fib_node *fib_node;
+	struct mlxsw_sp_fib *fib;
+	struct mlxsw_sp_vr *vr;
+
+	vr = mlxsw_sp_vr_find(mlxsw_sp, rt->rt6i_table->tb6_id);
+	if (!vr)
+		return NULL;
+	fib = mlxsw_sp_vr_fib(vr, MLXSW_SP_L3_PROTO_IPV6);
+
+	fib_node = mlxsw_sp_fib_node_lookup(fib, &rt->rt6i_dst.addr,
+					    sizeof(rt->rt6i_dst.addr),
+					    rt->rt6i_dst.plen);
+	if (!fib_node)
+		return NULL;
+
+	list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
+		struct rt6_info *iter_rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
+
+		if (rt->rt6i_table->tb6_id == iter_rt->rt6i_table->tb6_id &&
+		    rt->rt6i_metric == iter_rt->rt6i_metric &&
+		    mlxsw_sp_fib6_entry_rt_find(fib6_entry, rt))
+			return fib6_entry;
+	}
+
+	return NULL;
+}
+
+static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
+				    struct rt6_info *rt)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+	struct mlxsw_sp_fib_node *fib_node;
+	int err;
+
+	if (mlxsw_sp->router->aborted)
+		return 0;
+
+	if (mlxsw_sp_fib6_rt_should_ignore(rt))
+		return 0;
+
+	fib_node = mlxsw_sp_fib_node_get(mlxsw_sp, rt->rt6i_table->tb6_id,
+					 &rt->rt6i_dst.addr,
+					 sizeof(rt->rt6i_dst.addr),
+					 rt->rt6i_dst.plen,
+					 MLXSW_SP_L3_PROTO_IPV6);
+	if (IS_ERR(fib_node))
+		return PTR_ERR(fib_node);
+
+	/* Before creating a new entry, try to append route to an existing
+	 * multipath entry.
+	 */
+	fib6_entry = mlxsw_sp_fib6_node_mp_entry_find(fib_node, rt);
+	if (fib6_entry) {
+		err = mlxsw_sp_fib6_entry_nexthop_add(mlxsw_sp, fib6_entry, rt);
+		if (err)
+			goto err_fib6_entry_nexthop_add;
+		return 0;
+	}
+
+	fib6_entry = mlxsw_sp_fib6_entry_create(mlxsw_sp, fib_node, rt);
+	if (IS_ERR(fib6_entry)) {
+		err = PTR_ERR(fib6_entry);
+		goto err_fib6_entry_create;
+	}
+
+	err = mlxsw_sp_fib6_node_entry_link(mlxsw_sp, fib6_entry);
+	if (err)
+		goto err_fib6_node_entry_link;
+
+	return 0;
+
+err_fib6_node_entry_link:
+	mlxsw_sp_fib6_entry_destroy(mlxsw_sp, fib6_entry);
+err_fib6_entry_create:
+err_fib6_entry_nexthop_add:
+	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
+	return err;
+}
+
+static void mlxsw_sp_router_fib6_del(struct mlxsw_sp *mlxsw_sp,
+				     struct rt6_info *rt)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry;
+	struct mlxsw_sp_fib_node *fib_node;
+
+	if (mlxsw_sp->router->aborted)
+		return;
+
+	if (mlxsw_sp_fib6_rt_should_ignore(rt))
+		return;
+
+	fib6_entry = mlxsw_sp_fib6_entry_lookup(mlxsw_sp, rt);
+	if (WARN_ON(!fib6_entry))
+		return;
+
+	/* If route is part of a multipath entry, but not the last one
+	 * removed, then only reduce its nexthop group.
+	 */
+	if (!list_is_singular(&fib6_entry->rt6_list)) {
+		mlxsw_sp_fib6_entry_nexthop_del(mlxsw_sp, fib6_entry, rt);
+		return;
+	}
+
+	fib_node = fib6_entry->common.fib_node;
+
+	mlxsw_sp_fib6_node_entry_unlink(mlxsw_sp, fib6_entry);
+	mlxsw_sp_fib6_entry_destroy(mlxsw_sp, fib6_entry);
+	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
+}
+
 static int __mlxsw_sp_router_set_abort_trap(struct mlxsw_sp *mlxsw_sp,
 					    enum mlxsw_reg_ralxx_protocol proto,
 					    u8 tree_id)
@@ -2909,6 +3553,23 @@ static void mlxsw_sp_fib4_node_flush(struct mlxsw_sp *mlxsw_sp,
 	}
 }
 
+static void mlxsw_sp_fib6_node_flush(struct mlxsw_sp *mlxsw_sp,
+				     struct mlxsw_sp_fib_node *fib_node)
+{
+	struct mlxsw_sp_fib6_entry *fib6_entry, *tmp;
+
+	list_for_each_entry_safe(fib6_entry, tmp, &fib_node->entry_list,
+				 common.list) {
+		bool do_break = &tmp->common.list == &fib_node->entry_list;
+
+		mlxsw_sp_fib6_node_entry_unlink(mlxsw_sp, fib6_entry);
+		mlxsw_sp_fib6_entry_destroy(mlxsw_sp, fib6_entry);
+		mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
+		if (do_break)
+			break;
+	}
+}
+
 static void mlxsw_sp_fib_node_flush(struct mlxsw_sp *mlxsw_sp,
 				    struct mlxsw_sp_fib_node *fib_node)
 {
@@ -2917,7 +3578,7 @@ static void mlxsw_sp_fib_node_flush(struct mlxsw_sp *mlxsw_sp,
 		mlxsw_sp_fib4_node_flush(mlxsw_sp, fib_node);
 		break;
 	case MLXSW_SP_L3_PROTO_IPV6:
-		WARN_ON_ONCE(1);
+		mlxsw_sp_fib6_node_flush(mlxsw_sp, fib_node);
 		break;
 	}
 }
@@ -2975,6 +3636,7 @@ static void mlxsw_sp_router_fib_abort(struct mlxsw_sp *mlxsw_sp)
 struct mlxsw_sp_fib_event_work {
 	struct work_struct work;
 	union {
+		struct fib6_entry_notifier_info fen6_info;
 		struct fib_entry_notifier_info fen_info;
 		struct fib_rule_notifier_info fr_info;
 		struct fib_nh_notifier_info fnh_info;
@@ -3034,9 +3696,21 @@ static void mlxsw_sp_router_fib6_event_work(struct work_struct *work)
 		container_of(work, struct mlxsw_sp_fib_event_work, work);
 	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
 	struct fib_rule *rule;
+	int err;
 
 	rtnl_lock();
 	switch (fib_work->event) {
+	case FIB_EVENT_ENTRY_ADD:
+		err = mlxsw_sp_router_fib6_add(mlxsw_sp,
+					       fib_work->fen6_info.rt);
+		if (err)
+			mlxsw_sp_router_fib_abort(mlxsw_sp);
+		rt6_put(fib_work->fen6_info.rt);
+		break;
+	case FIB_EVENT_ENTRY_DEL:
+		mlxsw_sp_router_fib6_del(mlxsw_sp, fib_work->fen6_info.rt);
+		rt6_put(fib_work->fen6_info.rt);
+		break;
 	case FIB_EVENT_RULE_ADD: /* fall through */
 	case FIB_EVENT_RULE_DEL:
 		rule = fib_work->fr_info.rule;
@@ -3080,6 +3754,11 @@ static void mlxsw_sp_router_fib6_event(struct mlxsw_sp_fib_event_work *fib_work,
 				       struct fib_notifier_info *info)
 {
 	switch (fib_work->event) {
+	case FIB_EVENT_ENTRY_ADD: /* fall through */
+	case FIB_EVENT_ENTRY_DEL:
+		memcpy(&fib_work->fen6_info, info, sizeof(fib_work->fen6_info));
+		rt6_get(fib_work->fen6_info.rt);
+		break;
 	case FIB_EVENT_RULE_ADD: /* fall through */
 	case FIB_EVENT_RULE_DEL:
 		memcpy(&fib_work->fr_info, info, sizeof(fib_work->fr_info));
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [patch net-next 15/17] mlxsw: spectrum_router: Add support for route replace
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (13 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 16/17] mlxsw: spectrum_router: Abort on source-specific routes Jiri Pirko
  2017-07-19  7:02 ` [patch net-next 17/17] mlxsw: spectrum_router: Don't ignore IPv6 notifications Jiri Pirko
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

When a replace event is received, the replaced route must already exist.
If the new route isn't capable of multipath, then the first matching
non-multipath capable route is replaced.

If the new route is capable of multipath and a matching multipath capable
route is found, then that route is replaced. Otherwise, the first matching
non-multipath capable route is replaced.

The new route is inserted before the replaced one. In case the replaced
route is currently offloaded, then it's overwritten in the device's table
by the new route and later deleted, thus not impacting routed traffic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
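A minimal userspace sketch of the selection rules described above, for
illustration only. It assumes a single table whose entries are sorted by
ascending metric; struct sketch_entry and sketch_entry_find() are
hypothetical names, not driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a FIB entry: only the two
 * properties the selection logic looks at.
 */
struct sketch_entry {
	unsigned int metric;
	bool can_mp;	/* route is multipath capable */
};

/* Return the index of the entry the new route should be inserted
 * before (and, on replace, the entry to be replaced), or -1 if there
 * is none. Entries are assumed to be sorted by ascending metric.
 */
static int sketch_entry_find(const struct sketch_entry *entries, size_t n,
			     unsigned int metric, bool can_mp, bool replace)
{
	int fallback = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct sketch_entry *e = &entries[i];

		if (replace && e->metric == metric) {
			/* Same multipath capability: replace this one. */
			if (e->can_mp == can_mp)
				return (int)i;
			/* Multipath capable new route: remember the first
			 * non-multipath match in case nothing better exists.
			 */
			if (can_mp && fallback < 0)
				fallback = (int)i;
		}
		if (e->metric > metric)
			return fallback >= 0 ? fallback : (int)i;
	}

	return fallback;
}
```

With replace set, a result of -1 corresponds to the case the patch
rejects with WARN_ON() and -EINVAL, since the replaced route must exist.
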
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 63 +++++++++++++++++-----
 1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 33e5b16..c56c700 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2938,11 +2938,11 @@ mlxsw_sp_fib6_entry_rt(const struct mlxsw_sp_fib6_entry *fib6_entry)
 
 static struct mlxsw_sp_fib6_entry *
 mlxsw_sp_fib6_node_mp_entry_find(const struct mlxsw_sp_fib_node *fib_node,
-				 const struct rt6_info *nrt)
+				 const struct rt6_info *nrt, bool replace)
 {
 	struct mlxsw_sp_fib6_entry *fib6_entry;
 
-	if (!mlxsw_sp_fib6_rt_can_mp(nrt))
+	if (!mlxsw_sp_fib6_rt_can_mp(nrt) || replace)
 		return NULL;
 
 	list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
@@ -3272,9 +3272,9 @@ static void mlxsw_sp_fib6_entry_destroy(struct mlxsw_sp *mlxsw_sp,
 
 static struct mlxsw_sp_fib6_entry *
 mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
-			      const struct rt6_info *nrt)
+			      const struct rt6_info *nrt, bool replace)
 {
-	struct mlxsw_sp_fib6_entry *fib6_entry;
+	struct mlxsw_sp_fib6_entry *fib6_entry, *fallback = NULL;
 
 	list_for_each_entry(fib6_entry, &fib_node->entry_list, common.list) {
 		struct rt6_info *rt = mlxsw_sp_fib6_entry_rt(fib6_entry);
@@ -3283,21 +3283,32 @@ mlxsw_sp_fib6_node_entry_find(const struct mlxsw_sp_fib_node *fib_node,
 			continue;
 		if (rt->rt6i_table->tb6_id != nrt->rt6i_table->tb6_id)
 			break;
+		if (replace && rt->rt6i_metric == nrt->rt6i_metric) {
+			if (mlxsw_sp_fib6_rt_can_mp(rt) ==
+			    mlxsw_sp_fib6_rt_can_mp(nrt))
+				return fib6_entry;
+			if (mlxsw_sp_fib6_rt_can_mp(nrt))
+				fallback = fallback ?: fib6_entry;
+		}
 		if (rt->rt6i_metric > nrt->rt6i_metric)
-			return fib6_entry;
+			return fallback ?: fib6_entry;
 	}
 
-	return NULL;
+	return fallback;
 }
 
 static int
-mlxsw_sp_fib6_node_list_insert(struct mlxsw_sp_fib6_entry *new6_entry)
+mlxsw_sp_fib6_node_list_insert(struct mlxsw_sp_fib6_entry *new6_entry,
+			       bool replace)
 {
 	struct mlxsw_sp_fib_node *fib_node = new6_entry->common.fib_node;
 	struct rt6_info *nrt = mlxsw_sp_fib6_entry_rt(new6_entry);
 	struct mlxsw_sp_fib6_entry *fib6_entry;
 
-	fib6_entry = mlxsw_sp_fib6_node_entry_find(fib_node, nrt);
+	fib6_entry = mlxsw_sp_fib6_node_entry_find(fib_node, nrt, replace);
+
+	if (replace && WARN_ON(!fib6_entry))
+		return -EINVAL;
 
 	if (fib6_entry) {
 		list_add_tail(&new6_entry->common.list,
@@ -3331,11 +3342,12 @@ mlxsw_sp_fib6_node_list_remove(struct mlxsw_sp_fib6_entry *fib6_entry)
 }
 
 static int mlxsw_sp_fib6_node_entry_link(struct mlxsw_sp *mlxsw_sp,
-					 struct mlxsw_sp_fib6_entry *fib6_entry)
+					 struct mlxsw_sp_fib6_entry *fib6_entry,
+					 bool replace)
 {
 	int err;
 
-	err = mlxsw_sp_fib6_node_list_insert(fib6_entry);
+	err = mlxsw_sp_fib6_node_list_insert(fib6_entry, replace);
 	if (err)
 		return err;
 
@@ -3390,8 +3402,25 @@ mlxsw_sp_fib6_entry_lookup(struct mlxsw_sp *mlxsw_sp,
 	return NULL;
 }
 
+static void mlxsw_sp_fib6_entry_replace(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_fib6_entry *fib6_entry,
+					bool replace)
+{
+	struct mlxsw_sp_fib_node *fib_node = fib6_entry->common.fib_node;
+	struct mlxsw_sp_fib6_entry *replaced;
+
+	if (!replace)
+		return;
+
+	replaced = list_next_entry(fib6_entry, common.list);
+
+	mlxsw_sp_fib6_node_entry_unlink(mlxsw_sp, replaced);
+	mlxsw_sp_fib6_entry_destroy(mlxsw_sp, replaced);
+	mlxsw_sp_fib_node_put(mlxsw_sp, fib_node);
+}
+
 static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
-				    struct rt6_info *rt)
+				    struct rt6_info *rt, bool replace)
 {
 	struct mlxsw_sp_fib6_entry *fib6_entry;
 	struct mlxsw_sp_fib_node *fib_node;
@@ -3414,7 +3443,7 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
 	/* Before creating a new entry, try to append route to an existing
 	 * multipath entry.
 	 */
-	fib6_entry = mlxsw_sp_fib6_node_mp_entry_find(fib_node, rt);
+	fib6_entry = mlxsw_sp_fib6_node_mp_entry_find(fib_node, rt, replace);
 	if (fib6_entry) {
 		err = mlxsw_sp_fib6_entry_nexthop_add(mlxsw_sp, fib6_entry, rt);
 		if (err)
@@ -3428,10 +3457,12 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
 		goto err_fib6_entry_create;
 	}
 
-	err = mlxsw_sp_fib6_node_entry_link(mlxsw_sp, fib6_entry);
+	err = mlxsw_sp_fib6_node_entry_link(mlxsw_sp, fib6_entry, replace);
 	if (err)
 		goto err_fib6_node_entry_link;
 
+	mlxsw_sp_fib6_entry_replace(mlxsw_sp, fib6_entry, replace);
+
 	return 0;
 
 err_fib6_node_entry_link:
@@ -3696,13 +3727,16 @@ static void mlxsw_sp_router_fib6_event_work(struct work_struct *work)
 		container_of(work, struct mlxsw_sp_fib_event_work, work);
 	struct mlxsw_sp *mlxsw_sp = fib_work->mlxsw_sp;
 	struct fib_rule *rule;
+	bool replace;
 	int err;
 
 	rtnl_lock();
 	switch (fib_work->event) {
+	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
 	case FIB_EVENT_ENTRY_ADD:
+		replace = fib_work->event == FIB_EVENT_ENTRY_REPLACE;
 		err = mlxsw_sp_router_fib6_add(mlxsw_sp,
-					       fib_work->fen6_info.rt);
+					       fib_work->fen6_info.rt, replace);
 		if (err)
 			mlxsw_sp_router_fib_abort(mlxsw_sp);
 		rt6_put(fib_work->fen6_info.rt);
@@ -3754,6 +3788,7 @@ static void mlxsw_sp_router_fib6_event(struct mlxsw_sp_fib_event_work *fib_work,
 				       struct fib_notifier_info *info)
 {
 	switch (fib_work->event) {
+	case FIB_EVENT_ENTRY_REPLACE: /* fall through */
 	case FIB_EVENT_ENTRY_ADD: /* fall through */
 	case FIB_EVENT_ENTRY_DEL:
 		memcpy(&fib_work->fen6_info, info, sizeof(fib_work->fen6_info));
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [patch net-next 16/17] mlxsw: spectrum_router: Abort on source-specific routes
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (14 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 15/17] mlxsw: spectrum_router: Add support for route replace Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  2017-07-19 16:16   ` David Ahern
  2017-07-19  7:02 ` [patch net-next 17/17] mlxsw: spectrum_router: Don't ignore IPv6 notifications Jiri Pirko
  16 siblings, 1 reply; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

Without resorting to ACLs, the device performs route lookup solely based
on the destination IP address.

In case source-specific routing is needed, an error is returned and the
abort mechanism is activated, thus allowing the kernel to take over
forwarding decisions.

Instead of aborting, we can trap specific destination prefixes where
source-specific routes are present, but this will result in a lot more
code that is unlikely to ever be used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
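A minimal userspace sketch of the control flow described above, for
illustration only. The types and function names are hypothetical
stand-ins; only the idea (a non-zero source prefix length yields an
error, and the error triggers the abort) is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the route and router state. */
struct sketch_route {
	unsigned char src_plen;	/* non-zero for source-specific routes */
};

struct sketch_router {
	bool aborted;
};

/* Models the add path: source-specific routes cannot be offloaded. */
static int sketch_fib6_add(const struct sketch_router *router,
			   const struct sketch_route *rt)
{
	if (router->aborted)
		return 0;

	if (rt->src_plen)
		return -EINVAL;

	/* Offloading of the route would happen here. */
	return 0;
}

/* Models the event handler: any error activates the abort mechanism,
 * after which forwarding decisions are left to the kernel.
 */
static void sketch_handle_add(struct sketch_router *router,
			      const struct sketch_route *rt)
{
	if (sketch_fib6_add(router, rt))
		router->aborted = true;
}
```
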
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index c56c700..33cb6b6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3429,6 +3429,9 @@ static int mlxsw_sp_router_fib6_add(struct mlxsw_sp *mlxsw_sp,
 	if (mlxsw_sp->router->aborted)
 		return 0;
 
+	if (rt->rt6i_src.plen)
+		return -EINVAL;
+
 	if (mlxsw_sp_fib6_rt_should_ignore(rt))
 		return 0;
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [patch net-next 17/17] mlxsw: spectrum_router: Don't ignore IPv6 notifications
  2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
                   ` (15 preceding siblings ...)
  2017-07-19  7:02 ` [patch net-next 16/17] mlxsw: spectrum_router: Abort on source-specific routes Jiri Pirko
@ 2017-07-19  7:02 ` Jiri Pirko
  16 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2017-07-19  7:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, mlxsw, dsahern, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

From: Ido Schimmel <idosch@mellanox.com>

We now have all the necessary IPv6 infrastructure in place, so stop
ignoring these notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 33cb6b6..dc9a032 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3813,7 +3813,7 @@ static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
 	struct fib_notifier_info *info = ptr;
 	struct mlxsw_sp_router *router;
 
-	if (!net_eq(info->net, &init_net) || info->family != AF_INET)
+	if (!net_eq(info->net, &init_net))
 		return NOTIFY_DONE;
 
 	fib_work = kzalloc(sizeof(*fib_work), GFP_ATOMIC);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [patch net-next 01/17] net: core: Make the FIB notification chain generic
  2017-07-19  7:02 ` [patch net-next 01/17] net: core: Make the FIB notification chain generic Jiri Pirko
@ 2017-07-19 14:11   ` David Ahern
  2017-07-19 14:35     ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 14:11 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, mlxsw, roopa, nikolay, kafai, hannes, yoshfuji,
	edumazet, yanhaishuang

On 7/19/17 1:02 AM, Jiri Pirko wrote:
> +struct fib_notifier_ops *
> +fib_notifier_ops_register(const struct fib_notifier_ops *tmpl, struct net *net)
> +{
> +	struct fib_notifier_ops *ops;
> +	int err;
> +
> +	ops = kmemdup(tmpl, sizeof(*ops), GFP_KERNEL);

why allocate memory to copy the ops?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 01/17] net: core: Make the FIB notification chain generic
  2017-07-19 14:11   ` David Ahern
@ 2017-07-19 14:35     ` Ido Schimmel
  0 siblings, 0 replies; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 14:35 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 08:11:56AM -0600, David Ahern wrote:
> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> > +struct fib_notifier_ops *
> > +fib_notifier_ops_register(const struct fib_notifier_ops *tmpl, struct net *net)
> > +{
> > +	struct fib_notifier_ops *ops;
> > +	int err;
> > +
> > +	ops = kmemdup(tmpl, sizeof(*ops), GFP_KERNEL);
> 
> why allocate memory to copy the ops?

It contains a list pointer that I use to list all the registered
families in each net namespace. Same pattern used in FIB rules.
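
A minimal userspace sketch of that pattern, for illustration only. All
names are hypothetical and malloc() stands in for kmemdup(). The point
is that the template is const and shared, while each namespace needs
its own copy to carry the list linkage.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the ops structure. */
struct sketch_ops {
	int family;
	struct sketch_ops *next;	/* per-namespace list linkage */
};

/* Hypothetical stand-in for a net namespace. */
struct sketch_net {
	struct sketch_ops *ops_list;
};

/* Copy the read-only template and link the copy into the namespace. */
static struct sketch_ops *
sketch_ops_register(const struct sketch_ops *tmpl, struct sketch_net *net)
{
	struct sketch_ops *ops = malloc(sizeof(*ops));

	if (!ops)
		return NULL;

	memcpy(ops, tmpl, sizeof(*ops));
	ops->next = net->ops_list;
	net->ops_list = ops;

	return ops;
}

/* Unlink the copy from the namespace and free it. */
static void sketch_ops_unregister(struct sketch_ops *ops,
				  struct sketch_net *net)
{
	struct sketch_ops **pp;

	for (pp = &net->ops_list; *pp; pp = &(*pp)->next) {
		if (*pp == ops) {
			*pp = ops->next;
			free(ops);
			return;
		}
	}
}
```

Registering the same template in two namespaces yields two independent
copies, so one template cannot be on both lists at once.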

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 10/17] ipv6: fib: Add offload indication to routes
  2017-07-19  7:02 ` [patch net-next 10/17] ipv6: fib: Add offload indication to routes Jiri Pirko
@ 2017-07-19 15:27   ` David Ahern
  2017-07-19 15:49     ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 15:27 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, mlxsw, roopa, nikolay, kafai, hannes, yoshfuji,
	edumazet, yanhaishuang

On 7/19/17 1:02 AM, Jiri Pirko wrote:
> Allow user space applications to see which routes are offloaded and
> which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.
> 
> To be consistent with IPv4, a multipath route is marked as offloaded if
> one of its nexthops is offloaded. Individual nexthops aren't marked with
> the 'offload' flag.

It is more user friendly to report the offload per nexthop especially
given the implications. There are already flags per nexthop and those
flags are pushed to userspace so not an API change at all.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 07/17] ipv6: fib: Add in-kernel notifications for route add / delete
  2017-07-19  7:02 ` [patch net-next 07/17] ipv6: fib: Add in-kernel notifications for route add / delete Jiri Pirko
@ 2017-07-19 15:38   ` David Ahern
  2017-07-19 15:53     ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 15:38 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, mlxsw, roopa, nikolay, kafai, hannes, yoshfuji,
	edumazet, yanhaishuang

On 7/19/17 1:02 AM, Jiri Pirko wrote:
> @@ -879,6 +891,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
>  		*ins = rt;
>  		rt->rt6i_node = fn;
>  		atomic_inc(&rt->rt6i_ref);
> +		call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_ADD,
> +					  rt);
>  		if (!info->skip_notify)
>  			inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
>  		info->nl_net->ipv6.rt6_stats->fib_rt_entries++;
> @@ -906,6 +920,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
>  		rt->rt6i_node = fn;
>  		rt->dst.rt6_next = iter->dst.rt6_next;
>  		atomic_inc(&rt->rt6i_ref);
> +		call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_REPLACE,
> +					  rt);
>  		if (!info->skip_notify)
>  			inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
>  		if (!(fn->fn_flags & RTN_RTINFO)) {
> @@ -1459,6 +1475,7 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
>  
>  	fib6_purge_rt(rt, fn, net);
>  
> +	call_fib6_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, rt);
>  	if (!info->skip_notify)
>  		inet6_rt_notify(RTM_DELROUTE, rt, info, 0);
>  	rt6_release(rt);
> 


Why aren't all of the notifier calls under the skip_notify? That flag is
used to make handling of ipv6 multipath routes on par with ipv4. See
commit 3b1137fe74829

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 10/17] ipv6: fib: Add offload indication to routes
  2017-07-19 15:27   ` David Ahern
@ 2017-07-19 15:49     ` Ido Schimmel
  2017-07-19 15:53       ` David Ahern
  0 siblings, 1 reply; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 15:49 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 09:27:30AM -0600, David Ahern wrote:
> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> > Allow user space applications to see which routes are offloaded and
> > which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.
> > 
> > To be consistent with IPv4, a multipath route is marked as offloaded if
> > one of its nexthops is offloaded. Individual nexthops aren't marked with
> > the 'offload' flag.
> 
> It is more user friendly to report the offload per nexthop especially
> given the implications. There are already flags per nexthop and those
> flags are pushed to userspace so not an API change at all.

I thought about it, but then just decided to be consistent with IPv4.

I can send a follow-up patchset that aligns both families to the
behavior you requested. Need to teach iproute2 to look for
RTNH_F_OFFLOAD in rtnh_flags as well.
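
A rough userspace sketch of what such a per-nexthop check could look
like, for illustration only. The structure and helper are hypothetical,
and SKETCH_F_OFFLOAD is a local stand-in for RTNH_F_OFFLOAD (its value
here is an assumption); this is not iproute2 code.

```c
#include <stddef.h>

/* Local stand-in for RTNH_F_OFFLOAD; the value is an assumption. */
#define SKETCH_F_OFFLOAD	0x8

/* Hypothetical stand-in for a nexthop carrying its own flags. */
struct sketch_nexthop {
	unsigned char flags;
};

/* Count the nexthops that carry the offload flag, instead of relying
 * on a single route-level indication.
 */
static size_t sketch_count_offloaded(const struct sketch_nexthop *nhs,
				     size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (nhs[i].flags & SKETCH_F_OFFLOAD)
			count++;
	}

	return count;
}
```

A count between zero and the number of nexthops would indicate a
partially offloaded multipath route, which a single route-level flag
cannot express.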

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route
  2017-07-19  7:02 ` [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route Jiri Pirko
@ 2017-07-19 15:49   ` David Ahern
  2017-07-19 16:17     ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 15:49 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, mlxsw, roopa, nikolay, kafai, hannes, yoshfuji,
	edumazet, yanhaishuang

On 7/19/17 1:02 AM, Jiri Pirko wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> Listeners of the FIB notification chain are expected to be able to take
> and release a reference on notified IPv6 routes. This is needed in the
> case of drivers capable of offloading these routes to a capable device.
> 
> Since notifications are sent in an atomic context, these drivers need to
> take a reference on the route, prepare a work item to offload the route
> and release the reference at the end of the work.
> 
> Currently, rt6i_ref is used to indicate in how many FIB nodes a route
> appears. Different code paths rely on rt6i_ref being 0 to indicate the
> route is no longer used by the FIB.
> 
> For example, whenever a route is deleted or replaced, fib6_purge_rt() is
> run to make sure the route is no longer present in intermediate nodes. A
> BUG_ON() at the end of the function is executed in case the reference
> count isn't 1, as it's only supposed to appear in the non-intermediate
> node from which it's going to be deleted.
> 
> Instead of changing the semantics of rt6i_ref, a new reference count is
> added, so that external users could also take a reference on routes
> without modifying rt6i_ref.
> 
> To make sure external users don't release routes used by the FIB, the
> reference count is set to 1 upon creation of a route and decremented by
> the FIB upon rt6_release().
> 
> The reference count is atomic, as it's not protected by any locks and
> placed in the 40 bytes hole after the existing rt6i_ref.

I'd rather not add another reference counter. Debugging reference leaks
is a huge PITA now; adding another counter just makes it worse.

Why can't the BUG_ON in fib6_purge_rt be removed since there are other
reference holders now?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 10/17] ipv6: fib: Add offload indication to routes
  2017-07-19 15:49     ` Ido Schimmel
@ 2017-07-19 15:53       ` David Ahern
  2017-07-19 16:19         ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 15:53 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On 7/19/17 9:49 AM, Ido Schimmel wrote:
> On Wed, Jul 19, 2017 at 09:27:30AM -0600, David Ahern wrote:
>> On 7/19/17 1:02 AM, Jiri Pirko wrote:
>>> Allow user space applications to see which routes are offloaded and
>>> which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.
>>>
>>> To be consistent with IPv4, a multipath route is marked as offloaded if
>>> one of its nexthops is offloaded. Individual nexthops aren't marked with
>>> the 'offload' flag.
>>
>> It is more user friendly to report the offload per nexthop especially
>> given the implications. There are already flags per nexthop and those
>> flags are pushed to userspace so not an API change at all.
> 
> I thought about it, but then just decided to be consistent with IPv4.

And the comment stems from just that. I was looking at IPv4 ECMP routes
a few days ago and the existence / lack of offload flag was not intuitive.

> 
> I can send a follow-up patchset that aligns both families to the
> behavior you requested. Need to teach iproute2 to look for
> RTNH_F_OFFLOAD in rtnh_flags as well.
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 07/17] ipv6: fib: Add in-kernel notifications for route add / delete
  2017-07-19 15:38   ` David Ahern
@ 2017-07-19 15:53     ` Ido Schimmel
  0 siblings, 0 replies; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 15:53 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 09:38:11AM -0600, David Ahern wrote:
> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> > @@ -879,6 +891,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> >  		*ins = rt;
> >  		rt->rt6i_node = fn;
> >  		atomic_inc(&rt->rt6i_ref);
> > +		call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_ADD,
> > +					  rt);
> >  		if (!info->skip_notify)
> >  			inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
> >  		info->nl_net->ipv6.rt6_stats->fib_rt_entries++;
> > @@ -906,6 +920,8 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
> >  		rt->rt6i_node = fn;
> >  		rt->dst.rt6_next = iter->dst.rt6_next;
> >  		atomic_inc(&rt->rt6i_ref);
> > +		call_fib6_entry_notifiers(info->nl_net, FIB_EVENT_ENTRY_REPLACE,
> > +					  rt);
> >  		if (!info->skip_notify)
> >  			inet6_rt_notify(RTM_NEWROUTE, rt, info, NLM_F_REPLACE);
> >  		if (!(fn->fn_flags & RTN_RTINFO)) {
> > @@ -1459,6 +1475,7 @@ static void fib6_del_route(struct fib6_node *fn, struct rt6_info **rtp,
> >  
> >  	fib6_purge_rt(rt, fn, net);
> >  
> > +	call_fib6_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, rt);
> >  	if (!info->skip_notify)
> >  		inet6_rt_notify(RTM_DELROUTE, rt, info, 0);
> >  	rt6_release(rt);
> > 
> 
> 
> Why aren't all of the notifier calls under the skip_notify? That flag is
> used to make handling of ipv6 multipath routes on par with ipv4. See
> commit 3b1137fe74829

From the cover letter:

"Unlike user space notifications for IPv6 multipath routes, the FIB
notification chain notifies these on a per-nexthop basis. This allows us
to keep the common code lean and is also unnecessary, as notifications
are serialized by each table's lock whereas applications maintaining
netlink caches may suffer from concurrent dumps and deletions /
additions of routes."

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion
  2017-07-19  7:02 ` [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion Jiri Pirko
@ 2017-07-19 16:14   ` David Ahern
  2017-07-19 16:30     ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 16:14 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, mlxsw, roopa, nikolay, kafai, hannes, yoshfuji,
	edumazet, yanhaishuang

On 7/19/17 1:02 AM, Jiri Pirko wrote:
> @@ -2094,6 +2106,40 @@ mlxsw_sp_fib_entry_should_offload(const struct mlxsw_sp_fib_entry *fib_entry)
>  	}
>  }
>  
> +static void
> +mlxsw_sp_fib6_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
> +{
> +	struct mlxsw_sp_fib6_entry *fib6_entry;
> +	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
> +
> +	fib6_entry = container_of(fib_entry, struct mlxsw_sp_fib6_entry,
> +				  common);
> +	list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
> +		struct rt6_info *rt = mlxsw_sp_rt6->rt;
> +
> +		write_lock_bh(&rt->rt6i_table->tb6_lock);
> +		rt->rt6i_flags |= RTF_OFFLOAD;
> +		write_unlock_bh(&rt->rt6i_table->tb6_lock);

Seems wrong. A device driver should not be taking FIB table locks.


> +	}
> +}
> +
> +static void
> +mlxsw_sp_fib6_entry_offload_unset(struct mlxsw_sp_fib_entry *fib_entry)
> +{
> +	struct mlxsw_sp_fib6_entry *fib6_entry;
> +	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
> +
> +	fib6_entry = container_of(fib_entry, struct mlxsw_sp_fib6_entry,
> +				  common);
> +	list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
> +		struct rt6_info *rt = mlxsw_sp_rt6->rt;
> +
> +		write_lock_bh(&rt->rt6i_table->tb6_lock);
> +		rt->rt6i_flags &= ~RTF_OFFLOAD;
> +		write_unlock_bh(&rt->rt6i_table->tb6_lock);

same here.




> +static int mlxsw_sp_nexthop6_init(struct mlxsw_sp *mlxsw_sp,
> +				  struct mlxsw_sp_nexthop_group *nh_grp,
> +				  struct mlxsw_sp_nexthop *nh,
> +				  const struct rt6_info *rt)
> +{
> +	struct net_device *dev = rt->dst.dev;
> +	struct mlxsw_sp_rif *rif;
> +	int err;
> +
> +	nh->nh_grp = nh_grp;
> +	memcpy(&nh->gw_addr, &rt->rt6i_gateway, sizeof(nh->gw_addr));
> +
> +	if (!dev)
> +		return 0;
> +
> +	rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
> +	if (!rif)
> +		return 0;

rif == 0 means the dst device is not related to a port owned by this
driver?


A lot to process so I am sure I missed the answer to these:

1. How do you handle host routes for local addresses? IPv6 inserts the
host and anycast routes with the device set to 'lo' (or VRF device)
instead of the device with the address. I have a patch to change this,
but it needs more testing.

2. How are routes with devices unrelated to ports owned by this driver
handled?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 16/17] mlxsw: spectrum_router: Abort on source-specific routes
  2017-07-19  7:02 ` [patch net-next 16/17] mlxsw: spectrum_router: Abort on source-specific routes Jiri Pirko
@ 2017-07-19 16:16   ` David Ahern
  2017-07-19 16:36     ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 16:16 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, idosch, mlxsw, roopa, nikolay, kafai, hannes, yoshfuji,
	edumazet, yanhaishuang

On 7/19/17 1:02 AM, Jiri Pirko wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> Without resorting to ACLs, the device performs route lookup solely based
> on the destination IP address.
> 
> In case source-specific routing is needed, an error is returned and the
> abort mechanism is activated, thus allowing the kernel to take over
> forwarding decisions.
> 
> Instead of aborting, we can trap specific destination prefixes where
> source-specific routes are present, but this will result in a lot more
> code that is unlikely to ever be used.


Do you have a document summarizing these for users?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route
  2017-07-19 15:49   ` David Ahern
@ 2017-07-19 16:17     ` Ido Schimmel
  2017-07-19 16:29       ` David Ahern
  0 siblings, 1 reply; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 16:17 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 09:49:37AM -0600, David Ahern wrote:
> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> > From: Ido Schimmel <idosch@mellanox.com>
> > 
> > Listeners of the FIB notification chain are expected to be able to take
> > and release a reference on notified IPv6 routes. This is needed in the
> > case of drivers capable of offloading these routes to a capable device.
> > 
> > Since notifications are sent in an atomic context, these drivers need to
> > take a reference on the route, prepare a work item to offload the route
> > and release the reference at the end of the work.
> > 
> > Currently, rt6i_ref is used to indicate in how many FIB nodes a route
> > appears. Different code paths rely on rt6i_ref being 0 to indicate the
> > route is no longer used by the FIB.
> > 
> > For example, whenever a route is deleted or replaced, fib6_purge_rt() is
> > run to make sure the route is no longer present in intermediate nodes. A
> > BUG_ON() at the end of the function is executed in case the reference
> > count isn't 1, as it's only supposed to appear in the non-intermediate
> > node from which it's going to be deleted.
> > 
> > Instead of changing the semantics of rt6i_ref, a new reference count is
> > added, so that external users could also take a reference on routes
> > without modifying rt6i_ref.
> > 
> > To make sure external users don't release routes used by the FIB, the
> > reference count is set to 1 upon creation of a route and decremented by
> > the FIB upon rt6_release().
> > 
> > The reference count is atomic, as it's not protected by any locks and
> > placed in the 40 bytes hole after the existing rt6i_ref.
> 
> I'd rather not add another reference counter. Debugging reference leaks
> is a huge PITA now; adding another counter just makes it worse.
> 
> Why can't the BUG_ON in fib6_purge_rt be removed since there are other
> reference holders now?

I did exactly that in the beginning, but it didn't sit right with me for
the exact reason you mentioned - it can be a PITA to debug.

If we use rt6i_ref for something other than FIB references, then it
breaks existing code that relies on rt6i_ref being 0 to indicate it's
no longer used by the FIB. A non-zero value can now mean "not used by
the FIB, but waiting for some module to drop the reference in its
workqueue".

The BUG_ON() mentioned in the commit message is just one example.
Another check was added by you in commit 8048ced9b.

So I think we both want the same thing, but I'm not sure how your
approach is safer.

Thanks
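
For illustration only, the two-counter scheme argued for above can be
sketched in user-space C. All names here (struct route, route_hold(),
route_put(), fib_release()) are hypothetical and are not the kernel's;
fib_ref plays the role of rt6i_ref and ext_ref the role of the new
counter, which is assumed to start at 1 on behalf of the FIB.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a route; NOT the kernel's struct rt6_info. */
struct route {
	int fib_ref;        /* number of FIB nodes pointing at the route */
	atomic_int ext_ref; /* lifetime count; the FIB owns one from creation */
	bool freed;         /* set instead of freeing, so the result is observable */
};

static void route_init(struct route *rt)
{
	rt->fib_ref = 0;
	atomic_init(&rt->ext_ref, 1);
	rt->freed = false;
}

/* External user (e.g. a driver) takes a reference from the notifier. */
static void route_hold(struct route *rt)
{
	atomic_fetch_add(&rt->ext_ref, 1);
}

/* Drop a lifetime reference; the last one out releases the route. */
static void route_put(struct route *rt)
{
	if (atomic_fetch_sub(&rt->ext_ref, 1) == 1)
		rt->freed = true;
}

/* FIB side: one node stops pointing at the route. When no node is left,
 * the FIB gives up the lifetime reference it has held since creation. */
static void fib_release(struct route *rt)
{
	if (--rt->fib_ref == 0)
		route_put(rt);
}

/* "Is the route still used by the FIB?" keeps its original meaning,
 * regardless of how many external holders exist. */
static bool route_in_fib(const struct route *rt)
{
	return rt->fib_ref != 0;
}
```

With this split, a listener that holds the route across a work item does
not disturb checks that rely on the FIB count reaching 0.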


* Re: [patch net-next 10/17] ipv6: fib: Add offload indication to routes
  2017-07-19 15:53       ` David Ahern
@ 2017-07-19 16:19         ` Ido Schimmel
  0 siblings, 0 replies; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 16:19 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 09:53:28AM -0600, David Ahern wrote:
> On 7/19/17 9:49 AM, Ido Schimmel wrote:
> > On Wed, Jul 19, 2017 at 09:27:30AM -0600, David Ahern wrote:
> >> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> >>> Allow user space applications to see which routes are offloaded and
> >>> which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.
> >>>
> >>> To be consistent with IPv4, a multipath route is marked as offloaded if
> >>> one of its nexthops is offloaded. Individual nexthops aren't marked with
> >>> the 'offload' flag.
> >>
> >> It is more user friendly to report the offload per nexthop especially
> >> given the implications. There are already flags per nexthop and those
> >> flags are pushed to userspace so not an API change at all.
> > 
> > I thought about it, but then just decided to be consistent with IPv4.
> 
> And the comment stems from just that. I was looking at IPv4 ECMP routes
> a few days ago and the existence / lack of offload flag was not intuitive.

Understood. I intend to change that.
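
As a rough sketch of the two reporting granularities discussed in this
sub-thread, the snippet below contrasts a route-level indication ("one
of its nexthops is offloaded") with a per-nexthop view. The struct, the
helper names and the NH_F_OFFLOAD bit are assumptions made for the
example; they only stand in for RTNH_F_OFFLOAD and the real route
structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; stands in for RTNH_F_OFFLOAD. */
#define NH_F_OFFLOAD 0x8u

/* Hypothetical nexthop carrying only its flags. */
struct nexthop {
	unsigned int flags;
};

/* Route-level indication: offloaded if at least one nexthop is. */
static bool route_offloaded(const struct nexthop *nhs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (nhs[i].flags & NH_F_OFFLOAD)
			return true;
	return false;
}

/* Per-nexthop view: how many nexthops are actually offloaded. */
static size_t nexthops_offloaded(const struct nexthop *nhs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (nhs[i].flags & NH_F_OFFLOAD)
			count++;
	return count;
}
```

A multipath route with one offloaded nexthop out of three is reported as
offloaded by the first helper, while the second shows that only one path
is in hardware, which is the distinction the per-nexthop flag would
expose to user space.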


* Re: [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route
  2017-07-19 16:17     ` Ido Schimmel
@ 2017-07-19 16:29       ` David Ahern
  0 siblings, 0 replies; 35+ messages in thread
From: David Ahern @ 2017-07-19 16:29 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On 7/19/17 10:17 AM, Ido Schimmel wrote:
> I did exactly that in the beginning, but it didn't sit right with me for
> the exact reason you mentioned - it can be a PITA to debug.
> 
> If we use rt6i_ref for something other than FIB references, then it
> breaks existing code that relies on rt6i_ref being 0 to indicate it's
> no longer used by the FIB. A non-zero value can now mean "not used by
> the FIB, but waiting for some module to drop the reference in its
> workqueue".
> 
> The BUG_ON() mentioned in the commit message is just one example.
> Another check was added by you in commit 8048ced9b.
> 
> So I think we both want the same thing, but I'm not sure how your
> approach is safer.

A single reference counter rt6i_ref is best.

There are 2 reads of that counter to determine if the rt is still in the
FIB. Both of those stem from side effects of using the 'lo' for the
device for host addresses. I think an explicit flag can be used for that
purpose instead of trying to deduce it from the reference counter. The
commit you referenced copied what is done in init_loopback for
consistency (both have same end goal).
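
A minimal user-space sketch of the alternative suggested here: a single
reference counter, with FIB membership tracked by an explicit flag
rather than deduced from the counter. The names (struct flagged_route,
fib_link(), fib_unlink()) are hypothetical, and in the kernel the flag
would presumably be protected by the table lock, which the sketch omits.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a route; NOT the kernel's struct rt6_info. */
struct flagged_route {
	atomic_int ref; /* single counter shared by the FIB and other holders */
	bool in_fib;    /* explicit "still in the FIB" flag (assumption) */
	bool freed;     /* set instead of freeing, so the result is observable */
};

static void flagged_route_init(struct flagged_route *rt)
{
	atomic_init(&rt->ref, 1); /* creator's reference */
	rt->in_fib = false;
	rt->freed = false;
}

static void flagged_route_hold(struct flagged_route *rt)
{
	atomic_fetch_add(&rt->ref, 1);
}

static void flagged_route_put(struct flagged_route *rt)
{
	if (atomic_fetch_sub(&rt->ref, 1) == 1)
		rt->freed = true;
}

/* The FIB takes over the creator's reference and marks membership. */
static void fib_link(struct flagged_route *rt)
{
	rt->in_fib = true;
}

/* The FIB clears the flag first, then drops its reference. */
static void fib_unlink(struct flagged_route *rt)
{
	rt->in_fib = false;
	flagged_route_put(rt);
}
```

Code that currently reads the counter to ask "is this route still in the
FIB?" would test in_fib instead, so a non-zero count held by a driver's
work item is no longer ambiguous.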


* Re: [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion
  2017-07-19 16:14   ` David Ahern
@ 2017-07-19 16:30     ` Ido Schimmel
  2017-07-19 16:36       ` David Ahern
  0 siblings, 1 reply; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 16:30 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 10:14:54AM -0600, David Ahern wrote:
> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> > @@ -2094,6 +2106,40 @@ mlxsw_sp_fib_entry_should_offload(const struct mlxsw_sp_fib_entry *fib_entry)
> >  	}
> >  }
> >  
> > +static void
> > +mlxsw_sp_fib6_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
> > +{
> > +	struct mlxsw_sp_fib6_entry *fib6_entry;
> > +	struct mlxsw_sp_rt6 *mlxsw_sp_rt6;
> > +
> > +	fib6_entry = container_of(fib_entry, struct mlxsw_sp_fib6_entry,
> > +				  common);
> > +	list_for_each_entry(mlxsw_sp_rt6, &fib6_entry->rt6_list, list) {
> > +		struct rt6_info *rt = mlxsw_sp_rt6->rt;
> > +
> > +		write_lock_bh(&rt->rt6i_table->tb6_lock);
> > +		rt->rt6i_flags |= RTF_OFFLOAD;
> > +		write_unlock_bh(&rt->rt6i_table->tb6_lock);
> 
> Seems wrong. A device driver should not be taking FIB table locks.

Will remove this in v2.

[...]

> > +static int mlxsw_sp_nexthop6_init(struct mlxsw_sp *mlxsw_sp,
> > +				  struct mlxsw_sp_nexthop_group *nh_grp,
> > +				  struct mlxsw_sp_nexthop *nh,
> > +				  const struct rt6_info *rt)
> > +{
> > +	struct net_device *dev = rt->dst.dev;
> > +	struct mlxsw_sp_rif *rif;
> > +	int err;
> > +
> > +	nh->nh_grp = nh_grp;
> > +	memcpy(&nh->gw_addr, &rt->rt6i_gateway, sizeof(nh->gw_addr));
> > +
> > +	if (!dev)
> > +		return 0;
> > +
> > +	rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, dev);
> > +	if (!rif)
> > +		return 0;
> 
> rif == 0 means the dst device is not related to a port owned by this
> driver?

Yes.

> 
> 
> A lot to process so I am sure I missed the answer to these:
> 
> 1. How do you handle host routes for local addresses? IPv6 inserts the
> host and anycast routes with the device set to 'lo' (or VRF device)
> instead of the device with the address. I have a patch to change this,
> but needs more testing

In mlxsw_sp_fib6_entry_type_set() we check for RTF_LOCAL and set the
FIB entry type to MLXSW_SP_FIB_ENTRY_TYPE_TRAP. Packets hitting these
routes will be trapped with IP2ME trap ID towards the CPU.

> 2. How are routes with devices unrelated to ports owned by this driver
> handled?

They are handled just like any other route, but they don't have a valid
RIF (for directly connected routes) or an adjacency group (for
gatewayed routes), so the check in mlxsw_sp_fib_entry_should_offload()
will return false and they will be programmed to the device with trap
action, but using a trap ID (RTR_INGRESS0) with a lower traffic class
than IP2ME, so packets that actually need to be locally received by the
CPU have a better QoS.
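
The two answers above boil down to a small decision, sketched below in
plain C. This is not the driver's code: the enum, the struct and
classify_route() are hypothetical, and only the trap names IP2ME and
RTR_INGRESS0 are taken from the explanation.

```c
#include <stdbool.h>

/* Hypothetical outcome of programming a route to the device. */
enum entry_action {
	ACTION_FORWARD,           /* offloaded, forwarded in hardware */
	ACTION_TRAP_IP2ME,        /* local route, trapped to the CPU */
	ACTION_TRAP_RTR_INGRESS0, /* not offloadable, lower traffic class */
};

/* Hypothetical summary of what the driver knows about a route. */
struct route_desc {
	bool is_local;      /* RTF_LOCAL is set */
	bool gatewayed;     /* route has a gateway */
	bool has_rif;       /* valid RIF (directly connected routes) */
	bool has_adj_group; /* valid adjacency group (gatewayed routes) */
};

static enum entry_action classify_route(const struct route_desc *r)
{
	/* Host routes for local addresses are always trapped with IP2ME. */
	if (r->is_local)
		return ACTION_TRAP_IP2ME;

	/* Offload only if the resource matching the route kind is valid. */
	if (r->gatewayed ? r->has_adj_group : r->has_rif)
		return ACTION_FORWARD;

	/* Still programmed, but with a trap on a lower traffic class. */
	return ACTION_TRAP_RTR_INGRESS0;
}
```

A directly connected route whose device has no RIF, for example one on a
device unrelated to the driver's ports, ends up in the last branch.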


* Re: [patch net-next 16/17] mlxsw: spectrum_router: Abort on source-specific routes
  2017-07-19 16:16   ` David Ahern
@ 2017-07-19 16:36     ` Ido Schimmel
  0 siblings, 0 replies; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 16:36 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 10:16:19AM -0600, David Ahern wrote:
> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> > From: Ido Schimmel <idosch@mellanox.com>
> > 
> > Without resorting to ACLs, the device performs route lookup solely based
> > on the destination IP address.
> > 
> > In case source-specific routing is needed, an error is returned and the
> > abort mechanism is activated, thus allowing the kernel to take over
> > forwarding decisions.
> > 
> > Instead of aborting, we can trap specific destination prefixes where
> > source-specific routes are present, but this will result in a lot more
> > code that is unlikely to ever be used.
> 
> Do you have a document summarizing these for users?

As you know, we've a Wiki we maintain for the features covered by mlxsw.
Once these patches are applied to net-next I intend to extend it with
IPv6 documentation and mention the above there.

I did a similar thing with inter-VRF routes:
https://github.com/Mellanox/mlxsw/wiki/Virtual-Routing-and-Forwarding-(VRF)#inter-vrf-routing


* Re: [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion
  2017-07-19 16:30     ` Ido Schimmel
@ 2017-07-19 16:36       ` David Ahern
  2017-07-19 16:43         ` Ido Schimmel
  0 siblings, 1 reply; 35+ messages in thread
From: David Ahern @ 2017-07-19 16:36 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On 7/19/17 10:30 AM, Ido Schimmel wrote:
>> rif == 0 means the dst device is not related to a port owned by this
>> driver?
> 
> Yes.
> 
>>
>>
>> A lot to process so I am sure I missed the answer to these:
>>
>> 1. How do you handle host routes for local addresses? IPv6 inserts the
>> host and anycast routes with the device set to 'lo' (or VRF device)
>> instead of the device with the address. I have a patch to change this,
>> but needs more testing
> 
> In mlxsw_sp_fib6_entry_type_set() we check for RTF_LOCAL and set the
> FIB entry type to MLXSW_SP_FIB_ENTRY_TYPE_TRAP. Packets hitting these
> routes will be trapped with IP2ME trap ID towards the CPU.

got it. thanks.

> 
>> 2. How are routes with devices unrelated to ports owned by this driver
>> handled?
> 
> They are handled just like any other route, but they don't have a valid
> RIF (for directly connected routes) or an adjacency group (for
> gatewayed routes), so the check in mlxsw_sp_fib_entry_should_offload()
> will return false and they will be programmed to the device with trap
> action, but using a trap ID (RTR_INGRESS0) with a lower traffic class
> than IP2ME, so packets that actually need to be locally received by the
> CPU have a better QoS.

so mlxsw keeps a copy of the complete FIB for IPv4 and IPv6, even routes
unrelated to its ports?


* Re: [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion
  2017-07-19 16:36       ` David Ahern
@ 2017-07-19 16:43         ` Ido Schimmel
  0 siblings, 0 replies; 35+ messages in thread
From: Ido Schimmel @ 2017-07-19 16:43 UTC (permalink / raw)
  To: David Ahern
  Cc: Jiri Pirko, netdev, davem, mlxsw, roopa, nikolay, kafai, hannes,
	yoshfuji, edumazet, yanhaishuang

On Wed, Jul 19, 2017 at 10:36:52AM -0600, David Ahern wrote:
> >> 2. How are routes with devices unrelated to ports owned by this driver
> >> handled?
> > 
> > They are handled just like any other route, but they don't have a valid
> > RIF (for directly connected routes) or an adjacency group (for
> > gatewayed routes), so the check in mlxsw_sp_fib_entry_should_offload()
> > will return false and they will be programmed to the device with trap
> > action, but using a trap ID (RTR_INGRESS0) with a lower traffic class
> > than IP2ME, so packets that actually need to be locally received by the
> > CPU have a better QoS.
> 
> so mlxsw keeps a copy of the complete FIB for IPv4 and IPv6, even routes
> unrelated to its ports?

If we don't reflect all the routes in the system to the ASIC, then we'll
have a broken routing table and a different behavior from what you would
get with plain NICs.


end of thread, other threads:[~2017-07-19 16:43 UTC | newest]

Thread overview: 35+ messages
2017-07-19  7:02 [patch net-next 00/17] mlxsw: Support for IPv6 UC router Jiri Pirko
2017-07-19  7:02 ` [patch net-next 01/17] net: core: Make the FIB notification chain generic Jiri Pirko
2017-07-19 14:11   ` David Ahern
2017-07-19 14:35     ` Ido Schimmel
2017-07-19  7:02 ` [patch net-next 02/17] mlxsw: spectrum_router: Ignore address families other than IPv4 Jiri Pirko
2017-07-19  7:02 ` [patch net-next 03/17] rocker: " Jiri Pirko
2017-07-19  7:02 ` [patch net-next 04/17] net: fib_rules: Implement notification logic in core Jiri Pirko
2017-07-19  7:02 ` [patch net-next 05/17] ipv6: fib_rules: Check if rule is a default rule Jiri Pirko
2017-07-19  7:02 ` [patch net-next 06/17] ipv6: fib: Add FIB notifiers callbacks Jiri Pirko
2017-07-19  7:02 ` [patch net-next 07/17] ipv6: fib: Add in-kernel notifications for route add / delete Jiri Pirko
2017-07-19 15:38   ` David Ahern
2017-07-19 15:53     ` Ido Schimmel
2017-07-19  7:02 ` [patch net-next 08/17] ipv6: fib_rules: Dump rules during registration to FIB chain Jiri Pirko
2017-07-19  7:02 ` [patch net-next 09/17] ipv6: fib: Dump tables " Jiri Pirko
2017-07-19  7:02 ` [patch net-next 10/17] ipv6: fib: Add offload indication to routes Jiri Pirko
2017-07-19 15:27   ` David Ahern
2017-07-19 15:49     ` Ido Schimmel
2017-07-19 15:53       ` David Ahern
2017-07-19 16:19         ` Ido Schimmel
2017-07-19  7:02 ` [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route Jiri Pirko
2017-07-19 15:49   ` David Ahern
2017-07-19 16:17     ` Ido Schimmel
2017-07-19 16:29       ` David Ahern
2017-07-19  7:02 ` [patch net-next 12/17] mlxsw: spectrum_router: Demultiplex FIB event based on family Jiri Pirko
2017-07-19  7:02 ` [patch net-next 13/17] mlxsw: spectrum_router: Sanitize IPv6 FIB rules Jiri Pirko
2017-07-19  7:02 ` [patch net-next 14/17] mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion Jiri Pirko
2017-07-19 16:14   ` David Ahern
2017-07-19 16:30     ` Ido Schimmel
2017-07-19 16:36       ` David Ahern
2017-07-19 16:43         ` Ido Schimmel
2017-07-19  7:02 ` [patch net-next 15/17] mlxsw: spectrum_router: Add support for route replace Jiri Pirko
2017-07-19  7:02 ` [patch net-next 16/17] mlxsw: spectrum_router: Abort on source-specific routes Jiri Pirko
2017-07-19 16:16   ` David Ahern
2017-07-19 16:36     ` Ido Schimmel
2017-07-19  7:02 ` [patch net-next 17/17] mlxsw: spectrum_router: Don't ignore IPv6 notifications Jiri Pirko
