Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH next v3 2/2] sctp: make use of SCTP_TRUNC4 macro
From: Marcelo Ricardo Leitner @ 2016-09-21 11:45 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, David Miller,
	David Laight
In-Reply-To: <cover.1474457954.git.marcelo.leitner@gmail.com>

And avoid the usage of '&~3'. This is the last place still not using
the macro.
Also break the line to make it easier to read.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
When I checked it the other day I thought I had this patch applied by
the moment but I hadn't.

v2: updated patch summary

 net/sctp/chunk.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 76eae828ec891076bc2360bf0ee89c3b650ef986..8afe2e90d003b9e1bbf233fd4e2fde0538ac65f6 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -195,9 +195,10 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 	/* This is the biggest possible DATA chunk that can fit into
 	 * the packet
 	 */
-	max_data = (asoc->pathmtu -
-		sctp_sk(asoc->base.sk)->pf->af->net_header_len -
-		sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk)) & ~3;
+	max_data = asoc->pathmtu -
+		   sctp_sk(asoc->base.sk)->pf->af->net_header_len -
+		   sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk);
+	max_data = SCTP_TRUNC4(max_data);
 
 	max = asoc->frag_point;
 	/* If the the peer requested that we authenticate DATA chunks
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 0/6] fib offload: switch to notifier
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john

From: Jiri Pirko <jiri@mellanox.com>

The goal of this patchset is to allow driver to propagate all prefixes
configured in kernel down HW. This is necessary for routing to work
as expected. If we don't do that HW might forward prefixes known to kernel
incorrectly. Take an example when default route is set in switch HW and there
is an IP address set on a management (non-switch) port.

Currently, only FIB entries related to the switch port netdev are
offloaded using switchdev ops. This model is not extendable so the
first patch introduces a replacement: notifier to propagate FIB entry
additions and removals to whoever is interested.

The second patch introduces couple of helpers to deal with RTNH_F_OFFLOAD
flags. Currently it is set in switchdev core. There the assumption is
that only one offload device exists. But for FIB notifier, we assume
multiple offload devices. So the patch introduces a per FIB entry
reference counter and helpers use it in order to achieve this:
   0 means RTNH_F_OFFLOAD is not set, no device offloads this entry
   n means RTNH_F_OFFLOAD is set and the entry is offloaded by n devices

Patches 3 and 4 convert mlxsw and rocker to adopt this new way, registering
one notifier block for each asic instance. Both of these patches also
implement internal "abort" mechanism.

Using switchdev ops, "abort" is called by switchdev core whenever there is
an error during FIB entry add offload. This leads to removal of all
offloaded entries on system by fib_trie code.

Now the new notifier assumes the driver takes care of the abort action.
Here's why:
1) The fact that one HW cannot offload an entry does not mean that the
   others can't do it. So let only one entity to abort and leave the rest
   to work happily.
2) The driver knows what to in order to properly abort. For example,
   currently abort is broken for mlxsw, as for Spectrum there is a need
   to set 0.0.0.0/0 trap in RALUE register.

The fifth patch removes the old, no longer used FIB offload infrastructure.

The last patch reflects the changes into switchdev documentation file.

Jiri Pirko (6):
  fib: introduce FIB notification infrastructure
  fib: introduce FIB info offload flag helpers
  mlxsw: spectrum_router: Use FIB notifications instead of switchdev
    calls
  rocker: use FIB notifications instead of switchdev calls
  switchdev: remove FIB offload infrastructure
  doc: update switchdev L3 section

 Documentation/networking/switchdev.txt             |  27 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   9 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 386 ++++++++++++---------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   9 -
 drivers/net/ethernet/rocker/rocker.h               |  15 +-
 drivers/net/ethernet/rocker/rocker_main.c          | 120 +++++--
 drivers/net/ethernet/rocker/rocker_ofdpa.c         | 115 ++++--
 include/net/ip_fib.h                               |  49 ++-
 include/net/switchdev.h                            |  40 ---
 net/ipv4/fib_frontend.c                            |  29 +-
 net/ipv4/fib_rules.c                               |  12 +-
 net/ipv4/fib_trie.c                                | 166 ++++-----
 net/switchdev/switchdev.c                          | 181 ----------
 13 files changed, 537 insertions(+), 621 deletions(-)

-- 
2.5.5

^ permalink raw reply

* [patch net-next 1/6] fib: introduce FIB notification infrastructure
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

This allows to pass information about added/deleted FIB entries/rules to
whoever is interested. This is done in a very similar way as devinet
notifies address additions/removals.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h    | 34 ++++++++++++++++++++++++---
 net/ipv4/fib_frontend.c | 16 ++++++-------
 net/ipv4/fib_rules.c    | 10 ++++++++
 net/ipv4/fib_trie.c     | 62 ++++++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 108 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 7d4a72e..116a9c0 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
+#include <linux/notifier.h>
 
 struct fib_config {
 	u8			fc_dst_len;
@@ -185,6 +186,33 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
 #define FIB_RES_PREFSRC(net, res)	((res).fi->fib_prefsrc ? : \
 					 FIB_RES_SADDR(net, res))
 
+struct fib_notifier_info {
+	struct net *net;
+};
+
+struct fib_entry_notifier_info {
+	struct fib_notifier_info info; /* must be first */
+	u32 dst;
+	int dst_len;
+	struct fib_info *fi;
+	u8 tos;
+	u8 type;
+	u32 tb_id;
+	u32 nlflags;
+};
+
+enum fib_event_type {
+	FIB_EVENT_ENTRY_ADD,
+	FIB_EVENT_ENTRY_DEL,
+	FIB_EVENT_RULE_ADD,
+	FIB_EVENT_RULE_DEL,
+};
+
+int register_fib_notifier(struct notifier_block *nb);
+int unregister_fib_notifier(struct notifier_block *nb);
+int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
+		       struct fib_notifier_info *info);
+
 struct fib_table {
 	struct hlist_node	tb_hlist;
 	u32			tb_id;
@@ -196,11 +224,11 @@ struct fib_table {
 
 int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
 		     struct fib_result *res, int fib_flags);
-int fib_table_insert(struct fib_table *, struct fib_config *);
-int fib_table_delete(struct fib_table *, struct fib_config *);
+int fib_table_insert(struct net *, struct fib_table *, struct fib_config *);
+int fib_table_delete(struct net *, struct fib_table *, struct fib_config *);
 int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
 		   struct netlink_callback *cb);
-int fib_table_flush(struct fib_table *table);
+int fib_table_flush(struct net *net, struct fib_table *table);
 struct fib_table *fib_trie_unmerge(struct fib_table *main_tb);
 void fib_table_flush_external(struct fib_table *table);
 void fib_free_table(struct fib_table *tb);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4e56a4c..86c43dc 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -182,7 +182,7 @@ static void fib_flush(struct net *net)
 		struct fib_table *tb;
 
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist)
-			flushed += fib_table_flush(tb);
+			flushed += fib_table_flush(net, tb);
 	}
 
 	if (flushed)
@@ -590,13 +590,13 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 			if (cmd == SIOCDELRT) {
 				tb = fib_get_table(net, cfg.fc_table);
 				if (tb)
-					err = fib_table_delete(tb, &cfg);
+					err = fib_table_delete(net, tb, &cfg);
 				else
 					err = -ESRCH;
 			} else {
 				tb = fib_new_table(net, cfg.fc_table);
 				if (tb)
-					err = fib_table_insert(tb, &cfg);
+					err = fib_table_insert(net, tb, &cfg);
 				else
 					err = -ENOBUFS;
 			}
@@ -719,7 +719,7 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto errout;
 	}
 
-	err = fib_table_delete(tb, &cfg);
+	err = fib_table_delete(net, tb, &cfg);
 errout:
 	return err;
 }
@@ -741,7 +741,7 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto errout;
 	}
 
-	err = fib_table_insert(tb, &cfg);
+	err = fib_table_insert(net, tb, &cfg);
 errout:
 	return err;
 }
@@ -828,9 +828,9 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
 		cfg.fc_scope = RT_SCOPE_HOST;
 
 	if (cmd == RTM_NEWROUTE)
-		fib_table_insert(tb, &cfg);
+		fib_table_insert(net, tb, &cfg);
 	else
-		fib_table_delete(tb, &cfg);
+		fib_table_delete(net, tb, &cfg);
 }
 
 void fib_add_ifaddr(struct in_ifaddr *ifa)
@@ -1254,7 +1254,7 @@ static void ip_fib_net_exit(struct net *net)
 
 		hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) {
 			hlist_del(&tb->tb_hlist);
-			fib_table_flush(tb);
+			fib_table_flush(net, tb);
 			fib_free_table(tb);
 		}
 	}
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 770bebe..ebadf6b 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -164,6 +164,14 @@ static struct fib_table *fib_empty_table(struct net *net)
 	return NULL;
 }
 
+static int call_fib_rule_notifiers(struct net *net,
+				   enum fib_event_type event_type)
+{
+	struct fib_notifier_info info;
+
+	return call_fib_notifiers(net, event_type, &info);
+}
+
 static const struct nla_policy fib4_rule_policy[FRA_MAX+1] = {
 	FRA_GENERIC_POLICY,
 	[FRA_FLOW]	= { .type = NLA_U32 },
@@ -221,6 +229,7 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 
 	net->ipv4.fib_has_custom_rules = true;
 	fib_flush_external(rule->fr_net);
+	call_fib_rule_notifiers(net, FIB_EVENT_RULE_ADD);
 
 	err = 0;
 errout:
@@ -243,6 +252,7 @@ static int fib4_rule_delete(struct fib_rule *rule)
 #endif
 	net->ipv4.fib_has_custom_rules = true;
 	fib_flush_external(rule->fr_net);
+	call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL);
 errout:
 	return err;
 }
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 241f27b..51a4537 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -73,6 +73,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 #include <linux/vmalloc.h>
+#include <linux/notifier.h>
 #include <net/net_namespace.h>
 #include <net/ip.h>
 #include <net/protocol.h>
@@ -84,6 +85,44 @@
 #include <trace/events/fib.h>
 #include "fib_lookup.h"
 
+static BLOCKING_NOTIFIER_HEAD(fib_chain);
+
+int register_fib_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&fib_chain, nb);
+}
+EXPORT_SYMBOL(register_fib_notifier);
+
+int unregister_fib_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&fib_chain, nb);
+}
+EXPORT_SYMBOL(unregister_fib_notifier);
+
+int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
+		       struct fib_notifier_info *info)
+{
+	info->net = net;
+	return blocking_notifier_call_chain(&fib_chain, event_type, info);
+}
+
+static int call_fib_entry_notifiers(struct net *net,
+				    enum fib_event_type event_type, u32 dst,
+				    int dst_len, struct fib_info *fi,
+				    u8 tos, u8 type, u32 tb_id, u32 nlflags)
+{
+	struct fib_entry_notifier_info info = {
+		.dst = dst,
+		.dst_len = dst_len,
+		.fi = fi,
+		.tos = tos,
+		.type = type,
+		.tb_id = tb_id,
+		.nlflags = nlflags,
+	};
+	return call_fib_notifiers(net, event_type, &info.info);
+}
+
 #define MAX_STAT_DEPTH 32
 
 #define KEYLENGTH	(8*sizeof(t_key))
@@ -1076,7 +1115,8 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
 }
 
 /* Caller must hold RTNL. */
-int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
+int fib_table_insert(struct net *net, struct fib_table *tb,
+		     struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct fib_alias *fa, *new_fa;
@@ -1193,6 +1233,11 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			fib_release_info(fi_drop);
 			if (state & FA_S_ACCESSED)
 				rt_cache_flush(cfg->fc_nlinfo.nl_net);
+
+			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD,
+						 key, plen, fi,
+						 new_fa->fa_tos, cfg->fc_type,
+						 tb->tb_id, cfg->fc_nlflags);
 			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
 				tb->tb_id, &cfg->fc_nlinfo, nlflags);
 
@@ -1245,6 +1290,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 		tb->tb_num_default++;
 
 	rt_cache_flush(cfg->fc_nlinfo.nl_net);
+	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD, key, plen, fi, tos,
+				 cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
 	rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
 		  &cfg->fc_nlinfo, nlflags);
 succeeded:
@@ -1490,7 +1537,8 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
 }
 
 /* Caller must hold RTNL. */
-int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
+int fib_table_delete(struct net *net, struct fib_table *tb,
+		     struct fib_config *cfg)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct fib_alias *fa, *fa_to_delete;
@@ -1546,6 +1594,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
 			       cfg->fc_type, tb->tb_id);
 
+	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
+				 fa_to_delete->fa_info, tos, cfg->fc_type,
+				 tb->tb_id, 0);
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);
 
@@ -1809,7 +1860,7 @@ void fib_table_flush_external(struct fib_table *tb)
 }
 
 /* Caller must hold RTNL. */
-int fib_table_flush(struct fib_table *tb)
+int fib_table_flush(struct net *net, struct fib_table *tb)
 {
 	struct trie *t = (struct trie *)tb->tb_data;
 	struct key_vector *pn = t->kv;
@@ -1861,6 +1912,11 @@ int fib_table_flush(struct fib_table *tb)
 			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
 					       fi, fa->fa_tos, fa->fa_type,
 					       tb->tb_id);
+			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
+						 n->key,
+						 KEYLENGTH - fa->fa_slen,
+						 fi, fa->fa_tos, fa->fa_type,
+						 tb->tb_id, 0);
 			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 2/6] fib: introduce FIB info offload flag helpers
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

These helpers are to be used in case someone offloads the FIB entry. The
result is that if the entry is offloaded to at least one device, the
offload flag is set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h      | 13 +++++++++++++
 net/switchdev/switchdev.c |  4 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 116a9c0..ffccf17 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -123,6 +123,7 @@ struct fib_info {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			fib_weight;
 #endif
+	unsigned int		fib_offload_cnt;
 	struct rcu_head		rcu;
 	struct fib_nh		fib_nh[0];
 #define fib_dev		fib_nh[0].nh_dev
@@ -174,6 +175,18 @@ struct fib_result_nl {
 
 __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
 
+static inline void fib_info_offload_inc(struct fib_info *fi)
+{
+	fi->fib_offload_cnt++;
+	fi->fib_flags |= RTNH_F_OFFLOAD;
+}
+
+static inline void fib_info_offload_dec(struct fib_info *fi)
+{
+	if (--fi->fib_offload_cnt == 0)
+		fi->fib_flags &= ~RTNH_F_OFFLOAD;
+}
+
 #define FIB_RES_SADDR(net, res)				\
 	((FIB_RES_NH(res).nh_saddr_genid ==		\
 	  atomic_read(&(net)->ipv4.dev_addr_genid)) ?	\
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 10b8193..abd8d2a 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -1216,7 +1216,7 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 	ipv4_fib.obj.orig_dev = dev;
 	err = switchdev_port_obj_add(dev, &ipv4_fib.obj);
 	if (!err)
-		fi->fib_flags |= RTNH_F_OFFLOAD;
+		fib_info_offload_inc(fi);
 
 	return err == -EOPNOTSUPP ? 0 : err;
 }
@@ -1260,7 +1260,7 @@ int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
 	ipv4_fib.obj.orig_dev = dev;
 	err = switchdev_port_obj_del(dev, &ipv4_fib.obj);
 	if (!err)
-		fi->fib_flags &= ~RTNH_F_OFFLOAD;
+		fib_info_offload_dec(fi);
 
 	return err == -EOPNOTSUPP ? 0 : err;
 }
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 3/6] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Until now, in order to offload a FIB entry to HW we use switchdev op.
However that has limits. Mainly in case we need to make the HW aware of
all route prefixes configured in kernel. HW needs to know those in order
to properly trap appropriate packets and pass the to kernel to do
the forwarding. Abort mechanism is now handled within the mlxsw driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   9 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 386 ++++++++++++---------
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   |   9 -
 3 files changed, 216 insertions(+), 188 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 73cae21..9b22863 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -45,7 +45,7 @@
 #include <linux/list.h>
 #include <linux/dcbnl.h>
 #include <linux/in6.h>
-#include <net/switchdev.h>
+#include <linux/notifier.h>
 
 #include "port.h"
 #include "core.h"
@@ -257,6 +257,7 @@ struct mlxsw_sp_router {
 #define MLXSW_SP_UNRESOLVED_NH_PROBE_INTERVAL 5000 /* ms */
 	struct list_head nexthop_group_list;
 	struct list_head nexthop_neighs_list;
+	bool aborted;
 };
 
 struct mlxsw_sp {
@@ -296,6 +297,7 @@ struct mlxsw_sp {
 		struct mlxsw_sp_span_entry *entries;
 		int entries_count;
 	} span;
+	struct notifier_block fib_nb;
 };
 
 static inline struct mlxsw_sp_upper *
@@ -584,11 +586,6 @@ static inline void mlxsw_sp_port_dcb_fini(struct mlxsw_sp_port *mlxsw_sp_port)
 
 int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp);
 void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp);
-int mlxsw_sp_router_fib4_add(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4,
-			     struct switchdev_trans *trans);
-int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4);
 int mlxsw_sp_router_neigh_construct(struct net_device *dev,
 				    struct neighbour *n);
 void mlxsw_sp_router_neigh_destroy(struct net_device *dev,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index cc653ac..2186e25 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -43,6 +43,7 @@
 #include <net/netevent.h>
 #include <net/neighbour.h>
 #include <net/arp.h>
+#include <net/ip_fib.h>
 
 #include "spectrum.h"
 #include "core.h"
@@ -122,17 +123,20 @@ struct mlxsw_sp_nexthop_group;
 
 struct mlxsw_sp_fib_entry {
 	struct rhash_head ht_node;
+	struct list_head list;
 	struct mlxsw_sp_fib_key key;
 	enum mlxsw_sp_fib_entry_type type;
 	unsigned int ref_count;
 	u16 rif; /* used for action local */
 	struct mlxsw_sp_vr *vr;
+	struct fib_info *fi;
 	struct list_head nexthop_group_node;
 	struct mlxsw_sp_nexthop_group *nh_group;
 };
 
 struct mlxsw_sp_fib {
 	struct rhashtable ht;
+	struct list_head entry_list;
 	unsigned long prefix_ref_count[MLXSW_SP_PREFIX_COUNT];
 	struct mlxsw_sp_prefix_usage prefix_usage;
 };
@@ -154,6 +158,7 @@ static int mlxsw_sp_fib_entry_insert(struct mlxsw_sp_fib *fib,
 				     mlxsw_sp_fib_ht_params);
 	if (err)
 		return err;
+	list_add_tail(&fib_entry->list, &fib->entry_list);
 	if (fib->prefix_ref_count[prefix_len]++ == 0)
 		mlxsw_sp_prefix_usage_set(&fib->prefix_usage, prefix_len);
 	return 0;
@@ -166,6 +171,7 @@ static void mlxsw_sp_fib_entry_remove(struct mlxsw_sp_fib *fib,
 
 	if (--fib->prefix_ref_count[prefix_len] == 0)
 		mlxsw_sp_prefix_usage_clear(&fib->prefix_usage, prefix_len);
+	list_del(&fib_entry->list);
 	rhashtable_remove_fast(&fib->ht, &fib_entry->ht_node,
 			       mlxsw_sp_fib_ht_params);
 }
@@ -216,6 +222,7 @@ static struct mlxsw_sp_fib *mlxsw_sp_fib_create(void)
 	err = rhashtable_init(&fib->ht, &mlxsw_sp_fib_ht_params);
 	if (err)
 		goto err_rhashtable_init;
+	INIT_LIST_HEAD(&fib->entry_list);
 	return fib;
 
 err_rhashtable_init:
@@ -1520,85 +1527,6 @@ static void mlxsw_sp_nexthop_group_put(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_nexthop_group_destroy(mlxsw_sp, nh_grp);
 }
 
-static int __mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
-{
-	struct mlxsw_resources *resources;
-	char rgcr_pl[MLXSW_REG_RGCR_LEN];
-	int err;
-
-	resources = mlxsw_core_resources_get(mlxsw_sp->core);
-	if (!resources->max_rif_valid)
-		return -EIO;
-
-	mlxsw_sp->rifs = kcalloc(resources->max_rif,
-				 sizeof(struct mlxsw_sp_rif *), GFP_KERNEL);
-	if (!mlxsw_sp->rifs)
-		return -ENOMEM;
-
-	mlxsw_reg_rgcr_pack(rgcr_pl, true);
-	mlxsw_reg_rgcr_max_router_interfaces_set(rgcr_pl, resources->max_rif);
-	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
-	if (err)
-		goto err_rgcr_fail;
-
-	return 0;
-
-err_rgcr_fail:
-	kfree(mlxsw_sp->rifs);
-	return err;
-}
-
-static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
-{
-	struct mlxsw_resources *resources;
-	char rgcr_pl[MLXSW_REG_RGCR_LEN];
-	int i;
-
-	mlxsw_reg_rgcr_pack(rgcr_pl, false);
-	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
-
-	resources = mlxsw_core_resources_get(mlxsw_sp->core);
-	for (i = 0; i < resources->max_rif; i++)
-		WARN_ON_ONCE(mlxsw_sp->rifs[i]);
-
-	kfree(mlxsw_sp->rifs);
-}
-
-int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
-{
-	int err;
-
-	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_neighs_list);
-	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_group_list);
-	err = __mlxsw_sp_router_init(mlxsw_sp);
-	if (err)
-		return err;
-
-	mlxsw_sp_lpm_init(mlxsw_sp);
-	err = mlxsw_sp_vrs_init(mlxsw_sp);
-	if (err)
-		goto err_vrs_init;
-
-	err =  mlxsw_sp_neigh_init(mlxsw_sp);
-	if (err)
-		goto err_neigh_init;
-
-	return 0;
-
-err_neigh_init:
-	mlxsw_sp_vrs_fini(mlxsw_sp);
-err_vrs_init:
-	__mlxsw_sp_router_fini(mlxsw_sp);
-	return err;
-}
-
-void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
-{
-	mlxsw_sp_neigh_fini(mlxsw_sp);
-	mlxsw_sp_vrs_fini(mlxsw_sp);
-	__mlxsw_sp_router_fini(mlxsw_sp);
-}
-
 static int mlxsw_sp_fib_entry_op4_remote(struct mlxsw_sp *mlxsw_sp,
 					 struct mlxsw_sp_fib_entry *fib_entry,
 					 enum mlxsw_reg_ralue_op op)
@@ -1706,45 +1634,34 @@ static int mlxsw_sp_fib_entry_del(struct mlxsw_sp *mlxsw_sp,
 				     MLXSW_REG_RALUE_OP_WRITE_DELETE);
 }
 
-struct mlxsw_sp_router_fib4_add_info {
-	struct switchdev_trans_item tritem;
-	struct mlxsw_sp *mlxsw_sp;
-	struct mlxsw_sp_fib_entry *fib_entry;
-};
-
-static void mlxsw_sp_router_fib4_add_info_destroy(void const *data)
-{
-	const struct mlxsw_sp_router_fib4_add_info *info = data;
-	struct mlxsw_sp_fib_entry *fib_entry = info->fib_entry;
-	struct mlxsw_sp *mlxsw_sp = info->mlxsw_sp;
-	struct mlxsw_sp_vr *vr = fib_entry->vr;
-
-	mlxsw_sp_fib_entry_destroy(fib_entry);
-	mlxsw_sp_vr_put(mlxsw_sp, vr);
-	kfree(info);
-}
-
 static int
 mlxsw_sp_router_fib4_entry_init(struct mlxsw_sp *mlxsw_sp,
-				const struct switchdev_obj_ipv4_fib *fib4,
+				const struct fib_entry_notifier_info *fen_info,
 				struct mlxsw_sp_fib_entry *fib_entry)
 {
-	struct fib_info *fi = fib4->fi;
+	struct fib_info *fi = fen_info->fi;
 
-	if (fib4->type == RTN_LOCAL || fib4->type == RTN_BROADCAST) {
+	if (fen_info->type == RTN_LOCAL || fen_info->type == RTN_BROADCAST) {
 		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
 		return 0;
 	}
-	if (fib4->type != RTN_UNICAST)
+	if (fen_info->type != RTN_UNICAST)
 		return -EINVAL;
 
 	if (fi->fib_scope != RT_SCOPE_UNIVERSE) {
 		struct mlxsw_sp_rif *r;
 
-		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
 		r = mlxsw_sp_rif_find_by_dev(mlxsw_sp, fi->fib_dev);
-		if (!r)
-			return -EINVAL;
+		if (!r) {
+			/* In case router interface is not found, that means
+			 * the fib entry was added to some device unrelated
+			 * to us. Set trap and pass the packets for
+			 * this prefix to kernel.
+			 */
+			fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
+			return 0;
+		}
+		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
 		fib_entry->rif = r->rif;
 		return 0;
 	}
@@ -1763,37 +1680,38 @@ mlxsw_sp_router_fib4_entry_fini(struct mlxsw_sp *mlxsw_sp,
 
 static struct mlxsw_sp_fib_entry *
 mlxsw_sp_fib_entry_get(struct mlxsw_sp *mlxsw_sp,
-		       const struct switchdev_obj_ipv4_fib *fib4)
+		       const struct fib_entry_notifier_info *fen_info)
 {
 	struct mlxsw_sp_fib_entry *fib_entry;
-	struct fib_info *fi = fib4->fi;
+	struct fib_info *fi = fen_info->fi;
 	struct mlxsw_sp_vr *vr;
 	int err;
 
-	vr = mlxsw_sp_vr_get(mlxsw_sp, fib4->dst_len, fib4->tb_id,
+	vr = mlxsw_sp_vr_get(mlxsw_sp, fen_info->dst_len, fen_info->tb_id,
 			     MLXSW_SP_L3_PROTO_IPV4);
 	if (IS_ERR(vr))
 		return ERR_CAST(vr);
 
-	fib_entry = mlxsw_sp_fib_entry_lookup(vr->fib, &fib4->dst,
-					      sizeof(fib4->dst),
-					      fib4->dst_len, fi->fib_dev);
+	fib_entry = mlxsw_sp_fib_entry_lookup(vr->fib, &fen_info->dst,
+					      sizeof(fen_info->dst),
+					      fen_info->dst_len, fi->fib_dev);
 	if (fib_entry) {
 		/* Already exists, just take a reference */
 		fib_entry->ref_count++;
 		return fib_entry;
 	}
-	fib_entry = mlxsw_sp_fib_entry_create(vr->fib, &fib4->dst,
-					      sizeof(fib4->dst),
-					      fib4->dst_len, fi->fib_dev);
+	fib_entry = mlxsw_sp_fib_entry_create(vr->fib, &fen_info->dst,
+					      sizeof(fen_info->dst),
+					      fen_info->dst_len, fi->fib_dev);
 	if (!fib_entry) {
 		err = -ENOMEM;
 		goto err_fib_entry_create;
 	}
 	fib_entry->vr = vr;
+	fib_entry->fi = fi;
 	fib_entry->ref_count = 1;
 
-	err = mlxsw_sp_router_fib4_entry_init(mlxsw_sp, fib4, fib_entry);
+	err = mlxsw_sp_router_fib4_entry_init(mlxsw_sp, fen_info, fib_entry);
 	if (err)
 		goto err_fib4_entry_init;
 
@@ -1809,17 +1727,19 @@ err_fib_entry_create:
 
 static struct mlxsw_sp_fib_entry *
 mlxsw_sp_fib_entry_find(struct mlxsw_sp *mlxsw_sp,
-			const struct switchdev_obj_ipv4_fib *fib4)
+			const struct fib_entry_notifier_info *fen_info)
 {
 	struct mlxsw_sp_vr *vr;
 
-	vr = mlxsw_sp_vr_find(mlxsw_sp, fib4->tb_id, MLXSW_SP_L3_PROTO_IPV4);
+	vr = mlxsw_sp_vr_find(mlxsw_sp, fen_info->tb_id,
+			      MLXSW_SP_L3_PROTO_IPV4);
 	if (!vr)
 		return NULL;
 
-	return mlxsw_sp_fib_entry_lookup(vr->fib, &fib4->dst,
-					 sizeof(fib4->dst), fib4->dst_len,
-					 fib4->fi->fib_dev);
+	return mlxsw_sp_fib_entry_lookup(vr->fib, &fen_info->dst,
+					 sizeof(fen_info->dst),
+					 fen_info->dst_len,
+					 fen_info->fi->fib_dev);
 }
 
 static void mlxsw_sp_fib_entry_put(struct mlxsw_sp *mlxsw_sp,
@@ -1834,62 +1754,48 @@ static void mlxsw_sp_fib_entry_put(struct mlxsw_sp *mlxsw_sp,
 	mlxsw_sp_vr_put(mlxsw_sp, vr);
 }
 
-static int
-mlxsw_sp_router_fib4_add_prepare(struct mlxsw_sp_port *mlxsw_sp_port,
-				 const struct switchdev_obj_ipv4_fib *fib4,
-				 struct switchdev_trans *trans)
+static void mlxsw_sp_fib_entry_put_all(struct mlxsw_sp *mlxsw_sp,
+				       struct mlxsw_sp_fib_entry *fib_entry)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	struct mlxsw_sp_router_fib4_add_info *info;
-	struct mlxsw_sp_fib_entry *fib_entry;
-	int err;
-
-	fib_entry = mlxsw_sp_fib_entry_get(mlxsw_sp, fib4);
-	if (IS_ERR(fib_entry))
-		return PTR_ERR(fib_entry);
-
-	info = kmalloc(sizeof(*info), GFP_KERNEL);
-	if (!info) {
-		err = -ENOMEM;
-		goto err_alloc_info;
-	}
-	info->mlxsw_sp = mlxsw_sp;
-	info->fib_entry = fib_entry;
-	switchdev_trans_item_enqueue(trans, info,
-				     mlxsw_sp_router_fib4_add_info_destroy,
-				     &info->tritem);
-	return 0;
+	unsigned int last_ref_count;
 
-err_alloc_info:
-	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
-	return err;
+	do {
+		last_ref_count = fib_entry->ref_count;
+		mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
+	} while (last_ref_count != 1);
 }
 
-static int
-mlxsw_sp_router_fib4_add_commit(struct mlxsw_sp_port *mlxsw_sp_port,
-				const struct switchdev_obj_ipv4_fib *fib4,
-				struct switchdev_trans *trans)
+static int mlxsw_sp_router_fib4_add(struct mlxsw_sp *mlxsw_sp,
+				    struct fib_entry_notifier_info *fen_info)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
-	struct mlxsw_sp_router_fib4_add_info *info;
 	struct mlxsw_sp_fib_entry *fib_entry;
 	struct mlxsw_sp_vr *vr;
 	int err;
 
-	info = switchdev_trans_item_dequeue(trans);
-	fib_entry = info->fib_entry;
-	kfree(info);
+	if (mlxsw_sp->router.aborted)
+		return 0;
+
+	fib_entry = mlxsw_sp_fib_entry_get(mlxsw_sp, fen_info);
+	if (IS_ERR(fib_entry)) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to get FIB4 entry being added.\n");
+		return PTR_ERR(fib_entry);
+	}
 
 	if (fib_entry->ref_count != 1)
-		return 0;
+		goto skip_add;
 
 	vr = fib_entry->vr;
 	err = mlxsw_sp_fib_entry_insert(vr->fib, fib_entry);
-	if (err)
+	if (err) {
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to insert FIB4 entry being added.\n");
 		goto err_fib_entry_insert;
-	err = mlxsw_sp_fib_entry_update(mlxsw_sp_port->mlxsw_sp, fib_entry);
+	}
+	err = mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
 	if (err)
 		goto err_fib_entry_add;
+
+skip_add:
+	fib_info_offload_inc(fen_info->fi);
 	return 0;
 
 err_fib_entry_add:
@@ -1899,29 +1805,22 @@ err_fib_entry_insert:
 	return err;
 }
 
-int mlxsw_sp_router_fib4_add(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4,
-			     struct switchdev_trans *trans)
-{
-	if (switchdev_trans_ph_prepare(trans))
-		return mlxsw_sp_router_fib4_add_prepare(mlxsw_sp_port,
-							fib4, trans);
-	return mlxsw_sp_router_fib4_add_commit(mlxsw_sp_port,
-					       fib4, trans);
-}
-
-int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
-			     const struct switchdev_obj_ipv4_fib *fib4)
+static int mlxsw_sp_router_fib4_del(struct mlxsw_sp *mlxsw_sp,
+				    struct fib_entry_notifier_info *fen_info)
 {
-	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 	struct mlxsw_sp_fib_entry *fib_entry;
 
-	fib_entry = mlxsw_sp_fib_entry_find(mlxsw_sp, fib4);
+	if (mlxsw_sp->router.aborted)
+		return 0;
+
+	fib_entry = mlxsw_sp_fib_entry_find(mlxsw_sp, fen_info);
 	if (!fib_entry) {
 		dev_warn(mlxsw_sp->bus_info->dev, "Failed to find FIB4 entry being removed.\n");
 		return -ENOENT;
 	}
 
+	fib_info_offload_dec(fen_info->fi);
+
 	if (fib_entry->ref_count == 1) {
 		mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
 		mlxsw_sp_fib_entry_remove(fib_entry->vr->fib, fib_entry);
@@ -1930,3 +1829,144 @@ int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
 	mlxsw_sp_fib_entry_put(mlxsw_sp, fib_entry);
 	return 0;
 }
+
+static void mlxsw_sp_router_fib4_abort(struct mlxsw_sp *mlxsw_sp)
+{
+	char ralue_pl[MLXSW_REG_RALUE_LEN];
+	struct mlxsw_resources *resources;
+	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_fib_entry *tmp;
+	struct mlxsw_sp_vr *vr;
+	int i;
+	int err;
+
+	resources = mlxsw_core_resources_get(mlxsw_sp->core);
+	for (i = 0; i < resources->max_virtual_routers; i++) {
+		vr = &mlxsw_sp->router.vrs[i];
+		if (!vr->used)
+			continue;
+
+		list_for_each_entry_safe(fib_entry, tmp,
+					 &vr->fib->entry_list, list) {
+			fib_info_offload_dec(fib_entry->fi);
+			mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
+			mlxsw_sp_fib_entry_remove(fib_entry->vr->fib,
+						  fib_entry);
+			mlxsw_sp_fib_entry_put_all(mlxsw_sp, fib_entry);
+		}
+	}
+	mlxsw_sp->router.aborted = true;
+
+	mlxsw_reg_ralue_pack4(ralue_pl, MLXSW_SP_L3_PROTO_IPV4,
+			      MLXSW_REG_RALUE_OP_WRITE_WRITE, 0, 0, 0);
+	mlxsw_reg_ralue_act_ip2me_pack(ralue_pl);
+	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
+	if (err)
+		dev_warn(mlxsw_sp->bus_info->dev, "Failed to set abort trap.\n");
+}
+
+static int __mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_resources *resources;
+	char rgcr_pl[MLXSW_REG_RGCR_LEN];
+	int err;
+
+	resources = mlxsw_core_resources_get(mlxsw_sp->core);
+	if (!resources->max_rif_valid)
+		return -EIO;
+
+	mlxsw_sp->rifs = kcalloc(resources->max_rif,
+				 sizeof(struct mlxsw_sp_rif *), GFP_KERNEL);
+	if (!mlxsw_sp->rifs)
+		return -ENOMEM;
+
+	mlxsw_reg_rgcr_pack(rgcr_pl, true);
+	mlxsw_reg_rgcr_max_router_interfaces_set(rgcr_pl, resources->max_rif);
+	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
+	if (err)
+		goto err_rgcr_fail;
+
+	return 0;
+
+err_rgcr_fail:
+	kfree(mlxsw_sp->rifs);
+	return err;
+}
+
+static void __mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_resources *resources;
+	char rgcr_pl[MLXSW_REG_RGCR_LEN];
+	int i;
+
+	mlxsw_reg_rgcr_pack(rgcr_pl, false);
+	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rgcr), rgcr_pl);
+
+	resources = mlxsw_core_resources_get(mlxsw_sp->core);
+	for (i = 0; i < resources->max_rif; i++)
+		WARN_ON_ONCE(mlxsw_sp->rifs[i]);
+
+	kfree(mlxsw_sp->rifs);
+}
+
+static int mlxsw_sp_router_fib_event(struct notifier_block *nb,
+				     unsigned long event, void *ptr)
+{
+	struct mlxsw_sp *mlxsw_sp = container_of(nb, struct mlxsw_sp, fib_nb);
+	struct fib_entry_notifier_info *fen_info = ptr;
+	int err;
+
+	switch (event) {
+	case FIB_EVENT_ENTRY_ADD:
+		err = mlxsw_sp_router_fib4_add(mlxsw_sp, fen_info);
+		if (err)
+			mlxsw_sp_router_fib4_abort(mlxsw_sp);
+		break;
+	case FIB_EVENT_ENTRY_DEL:
+		mlxsw_sp_router_fib4_del(mlxsw_sp, fen_info);
+		break;
+	case FIB_EVENT_RULE_ADD: /* fall through */
+	case FIB_EVENT_RULE_DEL:
+		mlxsw_sp_router_fib4_abort(mlxsw_sp);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp)
+{
+	int err;
+
+	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_neighs_list);
+	INIT_LIST_HEAD(&mlxsw_sp->router.nexthop_group_list);
+	err = __mlxsw_sp_router_init(mlxsw_sp);
+	if (err)
+		return err;
+
+	mlxsw_sp_lpm_init(mlxsw_sp);
+	err = mlxsw_sp_vrs_init(mlxsw_sp);
+	if (err)
+		goto err_vrs_init;
+
+	err =  mlxsw_sp_neigh_init(mlxsw_sp);
+	if (err)
+		goto err_neigh_init;
+
+	mlxsw_sp->fib_nb.notifier_call = mlxsw_sp_router_fib_event;
+	register_fib_notifier(&mlxsw_sp->fib_nb);
+	return 0;
+
+err_neigh_init:
+	mlxsw_sp_vrs_fini(mlxsw_sp);
+err_vrs_init:
+	__mlxsw_sp_router_fini(mlxsw_sp);
+	return err;
+}
+
+void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	unregister_fib_notifier(&mlxsw_sp->fib_nb);
+	mlxsw_sp_neigh_fini(mlxsw_sp);
+	mlxsw_sp_vrs_fini(mlxsw_sp);
+	__mlxsw_sp_router_fini(mlxsw_sp);
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 2b04b76..5e00c79 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1044,11 +1044,6 @@ static int mlxsw_sp_port_obj_add(struct net_device *dev,
 					      SWITCHDEV_OBJ_PORT_VLAN(obj),
 					      trans);
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = mlxsw_sp_router_fib4_add(mlxsw_sp_port,
-					       SWITCHDEV_OBJ_IPV4_FIB(obj),
-					       trans);
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = mlxsw_sp_port_fdb_static_add(mlxsw_sp_port,
 						   SWITCHDEV_OBJ_PORT_FDB(obj),
@@ -1181,10 +1176,6 @@ static int mlxsw_sp_port_obj_del(struct net_device *dev,
 		err = mlxsw_sp_port_vlans_del(mlxsw_sp_port,
 					      SWITCHDEV_OBJ_PORT_VLAN(obj));
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = mlxsw_sp_router_fib4_del(mlxsw_sp_port,
-					       SWITCHDEV_OBJ_IPV4_FIB(obj));
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = mlxsw_sp_port_fdb_static_del(mlxsw_sp_port,
 						   SWITCHDEV_OBJ_PORT_FDB(obj));
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 4/6] rocker: use FIB notifications instead of switchdev calls
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Until now, in order to offload a FIB entry to HW we use switchdev op.
Use the newly introduced FIB notifier infrasturucture to process
FIB entries to offload the in HW.
Abort mechanism is now handled within the rocker driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/rocker/rocker.h       |  15 ++--
 drivers/net/ethernet/rocker/rocker_main.c  | 120 +++++++++++++++++++++--------
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 115 ++++++++++++++++++++-------
 3 files changed, 185 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
index 1ab995f..2eb9b49 100644
--- a/drivers/net/ethernet/rocker/rocker.h
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -15,6 +15,7 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/netdevice.h>
+#include <linux/notifier.h>
 #include <net/neighbour.h>
 #include <net/switchdev.h>
 
@@ -52,6 +53,9 @@ struct rocker_port {
 	struct rocker_dma_ring_info rx_ring;
 };
 
+struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
+					       struct rocker *rocker);
+
 struct rocker_world_ops;
 
 struct rocker {
@@ -66,6 +70,7 @@ struct rocker {
 	spinlock_t cmd_ring_lock;		/* for cmd ring accesses */
 	struct rocker_dma_ring_info cmd_ring;
 	struct rocker_dma_ring_info event_ring;
+	struct notifier_block fib_nb;
 	struct rocker_world_ops *wops;
 	void *wpriv;
 };
@@ -117,11 +122,6 @@ struct rocker_world_ops {
 	int (*port_obj_vlan_dump)(const struct rocker_port *rocker_port,
 				  struct switchdev_obj_port_vlan *vlan,
 				  switchdev_obj_dump_cb_t *cb);
-	int (*port_obj_fib4_add)(struct rocker_port *rocker_port,
-				 const struct switchdev_obj_ipv4_fib *fib4,
-				 struct switchdev_trans *trans);
-	int (*port_obj_fib4_del)(struct rocker_port *rocker_port,
-				 const struct switchdev_obj_ipv4_fib *fib4);
 	int (*port_obj_fdb_add)(struct rocker_port *rocker_port,
 				const struct switchdev_obj_port_fdb *fdb,
 				struct switchdev_trans *trans);
@@ -141,6 +141,11 @@ struct rocker_world_ops {
 	int (*port_ev_mac_vlan_seen)(struct rocker_port *rocker_port,
 				     const unsigned char *addr,
 				     __be16 vlan_id);
+	int (*fib4_add)(struct rocker *rocker,
+			const struct fib_entry_notifier_info *fen_info);
+	int (*fib4_del)(struct rocker *rocker,
+			const struct fib_entry_notifier_info *fen_info);
+	void (*fib4_abort)(struct rocker *rocker);
 };
 
 extern struct rocker_world_ops rocker_ofdpa_ops;
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 1f0c086..5424fb3 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1625,29 +1625,6 @@ rocker_world_port_obj_vlan_dump(const struct rocker_port *rocker_port,
 }
 
 static int
-rocker_world_port_obj_fib4_add(struct rocker_port *rocker_port,
-			       const struct switchdev_obj_ipv4_fib *fib4,
-			       struct switchdev_trans *trans)
-{
-	struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
-	if (!wops->port_obj_fib4_add)
-		return -EOPNOTSUPP;
-	return wops->port_obj_fib4_add(rocker_port, fib4, trans);
-}
-
-static int
-rocker_world_port_obj_fib4_del(struct rocker_port *rocker_port,
-			       const struct switchdev_obj_ipv4_fib *fib4)
-{
-	struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
-	if (!wops->port_obj_fib4_del)
-		return -EOPNOTSUPP;
-	return wops->port_obj_fib4_del(rocker_port, fib4);
-}
-
-static int
 rocker_world_port_obj_fdb_add(struct rocker_port *rocker_port,
 			      const struct switchdev_obj_port_fdb *fdb,
 			      struct switchdev_trans *trans)
@@ -1733,6 +1710,34 @@ static int rocker_world_port_ev_mac_vlan_seen(struct rocker_port *rocker_port,
 	return wops->port_ev_mac_vlan_seen(rocker_port, addr, vlan_id);
 }
 
+static int rocker_world_fib4_add(struct rocker *rocker,
+				 const struct fib_entry_notifier_info *fen_info)
+{
+	struct rocker_world_ops *wops = rocker->wops;
+
+	if (!wops->fib4_add)
+		return 0;
+	return wops->fib4_add(rocker, fen_info);
+}
+
+static int rocker_world_fib4_del(struct rocker *rocker,
+				 const struct fib_entry_notifier_info *fen_info)
+{
+	struct rocker_world_ops *wops = rocker->wops;
+
+	if (!wops->fib4_del)
+		return 0;
+	return wops->fib4_del(rocker, fen_info);
+}
+
+static void rocker_world_fib4_abort(struct rocker *rocker)
+{
+	struct rocker_world_ops *wops = rocker->wops;
+
+	if (wops->fib4_abort)
+		wops->fib4_abort(rocker);
+}
+
 /*****************
  * Net device ops
  *****************/
@@ -2096,11 +2101,6 @@ static int rocker_port_obj_add(struct net_device *dev,
 						     SWITCHDEV_OBJ_PORT_VLAN(obj),
 						     trans);
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = rocker_world_port_obj_fib4_add(rocker_port,
-						     SWITCHDEV_OBJ_IPV4_FIB(obj),
-						     trans);
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = rocker_world_port_obj_fdb_add(rocker_port,
 						    SWITCHDEV_OBJ_PORT_FDB(obj),
@@ -2125,10 +2125,6 @@ static int rocker_port_obj_del(struct net_device *dev,
 		err = rocker_world_port_obj_vlan_del(rocker_port,
 						     SWITCHDEV_OBJ_PORT_VLAN(obj));
 		break;
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		err = rocker_world_port_obj_fib4_del(rocker_port,
-						     SWITCHDEV_OBJ_IPV4_FIB(obj));
-		break;
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		err = rocker_world_port_obj_fdb_del(rocker_port,
 						    SWITCHDEV_OBJ_PORT_FDB(obj));
@@ -2175,6 +2171,31 @@ static const struct switchdev_ops rocker_port_switchdev_ops = {
 	.switchdev_port_obj_dump	= rocker_port_obj_dump,
 };
 
+static int rocker_router_fib_event(struct notifier_block *nb,
+				   unsigned long event, void *ptr)
+{
+	struct rocker *rocker = container_of(nb, struct rocker, fib_nb);
+	struct fib_entry_notifier_info *fen_info = ptr;
+	int err;
+
+	switch (event) {
+	case FIB_EVENT_ENTRY_ADD:
+		err = rocker_world_fib4_add(rocker, fen_info);
+		if (err)
+			rocker_world_fib4_abort(rocker);
+		else
+		break;
+	case FIB_EVENT_ENTRY_DEL:
+		rocker_world_fib4_del(rocker, fen_info);
+		break;
+	case FIB_EVENT_RULE_ADD: /* fall through */
+	case FIB_EVENT_RULE_DEL:
+		rocker_world_fib4_abort(rocker);
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
 /********************
  * ethtool interface
  ********************/
@@ -2740,6 +2761,9 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_probe_ports;
 	}
 
+	rocker->fib_nb.notifier_call = rocker_router_fib_event;
+	register_fib_notifier(&rocker->fib_nb);
+
 	dev_info(&pdev->dev, "Rocker switch with id %*phN\n",
 		 (int)sizeof(rocker->hw.id), &rocker->hw.id);
 
@@ -2771,6 +2795,7 @@ static void rocker_remove(struct pci_dev *pdev)
 {
 	struct rocker *rocker = pci_get_drvdata(pdev);
 
+	unregister_fib_notifier(&rocker->fib_nb);
 	rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
 	rocker_remove_ports(rocker);
 	free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
@@ -2799,6 +2824,37 @@ static bool rocker_port_dev_check(const struct net_device *dev)
 	return dev->netdev_ops == &rocker_port_netdev_ops;
 }
 
+static bool rocker_port_dev_check_under(const struct net_device *dev,
+					struct rocker *rocker)
+{
+	struct rocker_port *rocker_port;
+
+	if (!rocker_port_dev_check(dev))
+		return false;
+
+	rocker_port = netdev_priv(dev);
+	if (rocker_port->rocker != rocker)
+		return false;
+
+	return true;
+}
+
+struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
+					       struct rocker *rocker)
+{
+	struct net_device *lower_dev;
+	struct list_head *iter;
+
+	if (rocker_port_dev_check_under(dev, rocker))
+		return netdev_priv(dev);
+
+	netdev_for_each_all_lower_dev(dev, lower_dev, iter) {
+		if (rocker_port_dev_check_under(lower_dev, rocker))
+			return netdev_priv(lower_dev);
+	}
+	return NULL;
+}
+
 static int rocker_netdevice_event(struct notifier_block *unused,
 				  unsigned long event, void *ptr)
 {
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index fcad907..431a608 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -99,6 +99,7 @@ struct ofdpa_flow_tbl_entry {
 	struct ofdpa_flow_tbl_key key;
 	size_t key_len;
 	u32 key_crc32; /* key */
+	struct fib_info *fi;
 };
 
 struct ofdpa_group_tbl_entry {
@@ -189,6 +190,7 @@ struct ofdpa {
 	spinlock_t neigh_tbl_lock;		/* for neigh tbl accesses */
 	u32 neigh_tbl_next_index;
 	unsigned long ageing_time;
+	bool fib_aborted;
 };
 
 struct ofdpa_port {
@@ -1043,7 +1045,8 @@ static int ofdpa_flow_tbl_ucast4_routing(struct ofdpa_port *ofdpa_port,
 					 __be16 eth_type, __be32 dst,
 					 __be32 dst_mask, u32 priority,
 					 enum rocker_of_dpa_table_id goto_tbl,
-					 u32 group_id, int flags)
+					 u32 group_id, struct fib_info *fi,
+					 int flags)
 {
 	struct ofdpa_flow_tbl_entry *entry;
 
@@ -1060,6 +1063,7 @@ static int ofdpa_flow_tbl_ucast4_routing(struct ofdpa_port *ofdpa_port,
 	entry->key.ucast_routing.group_id = group_id;
 	entry->key_len = offsetof(struct ofdpa_flow_tbl_key,
 				  ucast_routing.group_id);
+	entry->fi = fi;
 
 	return ofdpa_flow_tbl_do(ofdpa_port, trans, flags, entry);
 }
@@ -1425,7 +1429,7 @@ static int ofdpa_port_ipv4_neigh(struct ofdpa_port *ofdpa_port,
 						    eth_type, ip_addr,
 						    inet_make_mask(32),
 						    priority, goto_tbl,
-						    group_id, flags);
+						    group_id, NULL, flags);
 
 		if (err)
 			netdev_err(ofdpa_port->dev, "Error (%d) /32 unicast route %pI4 group 0x%08x\n",
@@ -2390,7 +2394,7 @@ found:
 
 static int ofdpa_port_fib_ipv4(struct ofdpa_port *ofdpa_port,
 			       struct switchdev_trans *trans, __be32 dst,
-			       int dst_len, const struct fib_info *fi,
+			       int dst_len, struct fib_info *fi,
 			       u32 tb_id, int flags)
 {
 	const struct fib_nh *nh;
@@ -2426,7 +2430,7 @@ static int ofdpa_port_fib_ipv4(struct ofdpa_port *ofdpa_port,
 
 	err = ofdpa_flow_tbl_ucast4_routing(ofdpa_port, trans, eth_type, dst,
 					    dst_mask, priority, goto_tbl,
-					    group_id, flags);
+					    group_id, fi, flags);
 	if (err)
 		netdev_err(ofdpa_port->dev, "Error (%d) IPv4 route %pI4\n",
 			   err, &dst);
@@ -2718,28 +2722,6 @@ static int ofdpa_port_obj_vlan_dump(const struct rocker_port *rocker_port,
 	return err;
 }
 
-static int ofdpa_port_obj_fib4_add(struct rocker_port *rocker_port,
-				   const struct switchdev_obj_ipv4_fib *fib4,
-				   struct switchdev_trans *trans)
-{
-	struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
-
-	return ofdpa_port_fib_ipv4(ofdpa_port, trans,
-				   htonl(fib4->dst), fib4->dst_len,
-				   fib4->fi, fib4->tb_id, 0);
-}
-
-static int ofdpa_port_obj_fib4_del(struct rocker_port *rocker_port,
-				   const struct switchdev_obj_ipv4_fib *fib4)
-{
-	struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
-
-	return ofdpa_port_fib_ipv4(ofdpa_port, NULL,
-				   htonl(fib4->dst), fib4->dst_len,
-				   fib4->fi, fib4->tb_id,
-				   OFDPA_OP_FLAG_REMOVE);
-}
-
 static int ofdpa_port_obj_fdb_add(struct rocker_port *rocker_port,
 				  const struct switchdev_obj_port_fdb *fdb,
 				  struct switchdev_trans *trans)
@@ -2922,6 +2904,82 @@ static int ofdpa_port_ev_mac_vlan_seen(struct rocker_port *rocker_port,
 	return ofdpa_port_fdb(ofdpa_port, NULL, addr, vlan_id, flags);
 }
 
+static struct ofdpa_port *ofdpa_port_dev_lower_find(struct net_device *dev,
+						    struct rocker *rocker)
+{
+	struct rocker_port *rocker_port;
+
+	rocker_port = rocker_port_dev_lower_find(dev, rocker);
+	return rocker_port ? rocker_port->wpriv : NULL;
+}
+
+static int ofdpa_fib4_add(struct rocker *rocker,
+			  const struct fib_entry_notifier_info *fen_info)
+{
+	struct ofdpa *ofdpa = rocker->wpriv;
+	struct ofdpa_port *ofdpa_port;
+	int err;
+
+	if (ofdpa->fib_aborted)
+		return 0;
+	ofdpa_port = ofdpa_port_dev_lower_find(fen_info->fi->fib_dev, rocker);
+	if (!ofdpa_port)
+		return 0;
+	err = ofdpa_port_fib_ipv4(ofdpa_port, NULL, htonl(fen_info->dst),
+				  fen_info->dst_len, fen_info->fi,
+				  fen_info->tb_id, 0);
+	if (err)
+		return err;
+	fib_info_offload_inc(fen_info->fi);
+	return 0;
+}
+
+static int ofdpa_fib4_del(struct rocker *rocker,
+			  const struct fib_entry_notifier_info *fen_info)
+{
+	struct ofdpa *ofdpa = rocker->wpriv;
+	struct ofdpa_port *ofdpa_port;
+
+	if (ofdpa->fib_aborted)
+		return 0;
+	ofdpa_port = ofdpa_port_dev_lower_find(fen_info->fi->fib_dev, rocker);
+	if (!ofdpa_port)
+		return 0;
+	fib_info_offload_dec(fen_info->fi);
+	return ofdpa_port_fib_ipv4(ofdpa_port, NULL, htonl(fen_info->dst),
+				   fen_info->dst_len, fen_info->fi,
+				   fen_info->tb_id, OFDPA_OP_FLAG_REMOVE);
+}
+
+static void ofdpa_fib4_abort(struct rocker *rocker)
+{
+	struct ofdpa *ofdpa = rocker->wpriv;
+	struct ofdpa_port *ofdpa_port;
+	struct ofdpa_flow_tbl_entry *flow_entry;
+	struct hlist_node *tmp;
+	unsigned long flags;
+	int bkt;
+
+	if (ofdpa->fib_aborted)
+		return;
+
+	spin_lock_irqsave(&ofdpa->flow_tbl_lock, flags);
+	hash_for_each_safe(ofdpa->flow_tbl, bkt, tmp, flow_entry, entry) {
+		if (flow_entry->key.tbl_id !=
+		    ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING)
+			continue;
+		ofdpa_port = ofdpa_port_dev_lower_find(flow_entry->fi->fib_dev,
+						       rocker);
+		if (!ofdpa_port)
+			continue;
+		fib_info_offload_dec(flow_entry->fi);
+		ofdpa_flow_tbl_del(ofdpa_port, NULL, OFDPA_OP_FLAG_REMOVE,
+				   flow_entry);
+	}
+	spin_unlock_irqrestore(&ofdpa->flow_tbl_lock, flags);
+	ofdpa->fib_aborted = true;
+}
+
 struct rocker_world_ops rocker_ofdpa_ops = {
 	.kind = "ofdpa",
 	.priv_size = sizeof(struct ofdpa),
@@ -2941,8 +2999,6 @@ struct rocker_world_ops rocker_ofdpa_ops = {
 	.port_obj_vlan_add = ofdpa_port_obj_vlan_add,
 	.port_obj_vlan_del = ofdpa_port_obj_vlan_del,
 	.port_obj_vlan_dump = ofdpa_port_obj_vlan_dump,
-	.port_obj_fib4_add = ofdpa_port_obj_fib4_add,
-	.port_obj_fib4_del = ofdpa_port_obj_fib4_del,
 	.port_obj_fdb_add = ofdpa_port_obj_fdb_add,
 	.port_obj_fdb_del = ofdpa_port_obj_fdb_del,
 	.port_obj_fdb_dump = ofdpa_port_obj_fdb_dump,
@@ -2951,4 +3007,7 @@ struct rocker_world_ops rocker_ofdpa_ops = {
 	.port_neigh_update = ofdpa_port_neigh_update,
 	.port_neigh_destroy = ofdpa_port_neigh_destroy,
 	.port_ev_mac_vlan_seen = ofdpa_port_ev_mac_vlan_seen,
+	.fib4_add = ofdpa_fib4_add,
+	.fib4_del = ofdpa_fib4_del,
+	.fib4_abort = ofdpa_fib4_abort,
 };
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 5/6] switchdev: remove FIB offload infrastructure
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Since this is now taken care of by FIB notifier, remove the code, with
all unused dependencies.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/ip_fib.h      |   2 -
 include/net/switchdev.h   |  40 ----------
 net/ipv4/fib_frontend.c   |  13 ----
 net/ipv4/fib_rules.c      |   2 -
 net/ipv4/fib_trie.c       | 104 +-------------------------
 net/switchdev/switchdev.c | 181 ----------------------------------------------
 6 files changed, 1 insertion(+), 341 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ffccf17..b9314b4 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -243,7 +243,6 @@ int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
 		   struct netlink_callback *cb);
 int fib_table_flush(struct net *net, struct fib_table *table);
 struct fib_table *fib_trie_unmerge(struct fib_table *main_tb);
-void fib_table_flush_external(struct fib_table *table);
 void fib_free_table(struct fib_table *tb);
 
 #ifndef CONFIG_IP_MULTIPLE_TABLES
@@ -356,7 +355,6 @@ static inline int fib_num_tclassid_users(struct net *net)
 }
 #endif
 int fib_unmerge(struct net *net);
-void fib_flush_external(struct net *net);
 
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 729fe15..eba80c4 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -68,7 +68,6 @@ struct switchdev_attr {
 enum switchdev_obj_id {
 	SWITCHDEV_OBJ_ID_UNDEFINED,
 	SWITCHDEV_OBJ_ID_PORT_VLAN,
-	SWITCHDEV_OBJ_ID_IPV4_FIB,
 	SWITCHDEV_OBJ_ID_PORT_FDB,
 	SWITCHDEV_OBJ_ID_PORT_MDB,
 };
@@ -92,21 +91,6 @@ struct switchdev_obj_port_vlan {
 #define SWITCHDEV_OBJ_PORT_VLAN(obj) \
 	container_of(obj, struct switchdev_obj_port_vlan, obj)
 
-/* SWITCHDEV_OBJ_ID_IPV4_FIB */
-struct switchdev_obj_ipv4_fib {
-	struct switchdev_obj obj;
-	u32 dst;
-	int dst_len;
-	struct fib_info *fi;
-	u8 tos;
-	u8 type;
-	u32 nlflags;
-	u32 tb_id;
-};
-
-#define SWITCHDEV_OBJ_IPV4_FIB(obj) \
-	container_of(obj, struct switchdev_obj_ipv4_fib, obj)
-
 /* SWITCHDEV_OBJ_ID_PORT_FDB */
 struct switchdev_obj_port_fdb {
 	struct switchdev_obj obj;
@@ -209,11 +193,6 @@ int switchdev_port_bridge_setlink(struct net_device *dev,
 				  struct nlmsghdr *nlh, u16 flags);
 int switchdev_port_bridge_dellink(struct net_device *dev,
 				  struct nlmsghdr *nlh, u16 flags);
-int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 nlflags, u32 tb_id);
-int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 tb_id);
-void switchdev_fib_ipv4_abort(struct fib_info *fi);
 int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 			   struct net_device *dev, const unsigned char *addr,
 			   u16 vid, u16 nlm_flags);
@@ -304,25 +283,6 @@ static inline int switchdev_port_bridge_dellink(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
-static inline int switchdev_fib_ipv4_add(u32 dst, int dst_len,
-					 struct fib_info *fi,
-					 u8 tos, u8 type,
-					 u32 nlflags, u32 tb_id)
-{
-	return 0;
-}
-
-static inline int switchdev_fib_ipv4_del(u32 dst, int dst_len,
-					 struct fib_info *fi,
-					 u8 tos, u8 type, u32 tb_id)
-{
-	return 0;
-}
-
-static inline void switchdev_fib_ipv4_abort(struct fib_info *fi)
-{
-}
-
 static inline int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 					 struct net_device *dev,
 					 const unsigned char *addr,
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 86c43dc..c3b8047 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -189,19 +189,6 @@ static void fib_flush(struct net *net)
 		rt_cache_flush(net);
 }
 
-void fib_flush_external(struct net *net)
-{
-	struct fib_table *tb;
-	struct hlist_head *head;
-	unsigned int h;
-
-	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
-		head = &net->ipv4.fib_table_hash[h];
-		hlist_for_each_entry(tb, head, tb_hlist)
-			fib_table_flush_external(tb);
-	}
-}
-
 /*
  * Find address type as if only "dev" was present in the system. If
  * on_dev is NULL then all interfaces are taken into consideration.
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index ebadf6b..2e50062 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -228,7 +228,6 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
 	rule4->tos = frh->tos;
 
 	net->ipv4.fib_has_custom_rules = true;
-	fib_flush_external(rule->fr_net);
 	call_fib_rule_notifiers(net, FIB_EVENT_RULE_ADD);
 
 	err = 0;
@@ -251,7 +250,6 @@ static int fib4_rule_delete(struct fib_rule *rule)
 		net->ipv4.fib_num_tclassid_users--;
 #endif
 	net->ipv4.fib_has_custom_rules = true;
-	fib_flush_external(rule->fr_net);
 	call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL);
 errout:
 	return err;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 51a4537..31cef36 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -81,7 +81,6 @@
 #include <net/tcp.h>
 #include <net/sock.h>
 #include <net/ip_fib.h>
-#include <net/switchdev.h>
 #include <trace/events/fib.h>
 #include "fib_lookup.h"
 
@@ -1215,17 +1214,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 			new_fa->tb_id = tb->tb_id;
 			new_fa->fa_default = -1;
 
-			err = switchdev_fib_ipv4_add(key, plen, fi,
-						     new_fa->fa_tos,
-						     cfg->fc_type,
-						     cfg->fc_nlflags,
-						     tb->tb_id);
-			if (err) {
-				switchdev_fib_ipv4_abort(fi);
-				kmem_cache_free(fn_alias_kmem, new_fa);
-				goto out;
-			}
-
 			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
 
 			alias_free_mem_rcu(fa);
@@ -1273,18 +1261,10 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 	new_fa->tb_id = tb->tb_id;
 	new_fa->fa_default = -1;
 
-	/* (Optionally) offload fib entry to switch hardware. */
-	err = switchdev_fib_ipv4_add(key, plen, fi, tos, cfg->fc_type,
-				     cfg->fc_nlflags, tb->tb_id);
-	if (err) {
-		switchdev_fib_ipv4_abort(fi);
-		goto out_free_new_fa;
-	}
-
 	/* Insert new entry to the list. */
 	err = fib_insert_alias(t, tp, l, new_fa, fa, key);
 	if (err)
-		goto out_sw_fib_del;
+		goto out_free_new_fa;
 
 	if (!plen)
 		tb->tb_num_default++;
@@ -1297,8 +1277,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
 succeeded:
 	return 0;
 
-out_sw_fib_del:
-	switchdev_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
 out_free_new_fa:
 	kmem_cache_free(fn_alias_kmem, new_fa);
 out:
@@ -1591,9 +1569,6 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
 	if (!fa_to_delete)
 		return -ESRCH;
 
-	switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
-			       cfg->fc_type, tb->tb_id);
-
 	call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
 				 fa_to_delete->fa_info, tos, cfg->fc_type,
 				 tb->tb_id, 0);
@@ -1785,80 +1760,6 @@ out:
 	return NULL;
 }
 
-/* Caller must hold RTNL */
-void fib_table_flush_external(struct fib_table *tb)
-{
-	struct trie *t = (struct trie *)tb->tb_data;
-	struct key_vector *pn = t->kv;
-	unsigned long cindex = 1;
-	struct hlist_node *tmp;
-	struct fib_alias *fa;
-
-	/* walk trie in reverse order */
-	for (;;) {
-		unsigned char slen = 0;
-		struct key_vector *n;
-
-		if (!(cindex--)) {
-			t_key pkey = pn->key;
-
-			/* cannot resize the trie vector */
-			if (IS_TRIE(pn))
-				break;
-
-			/* resize completed node */
-			pn = resize(t, pn);
-			cindex = get_index(pkey, pn);
-
-			continue;
-		}
-
-		/* grab the next available node */
-		n = get_child(pn, cindex);
-		if (!n)
-			continue;
-
-		if (IS_TNODE(n)) {
-			/* record pn and cindex for leaf walking */
-			pn = n;
-			cindex = 1ul << n->bits;
-
-			continue;
-		}
-
-		hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
-			struct fib_info *fi = fa->fa_info;
-
-			/* if alias was cloned to local then we just
-			 * need to remove the local copy from main
-			 */
-			if (tb->tb_id != fa->tb_id) {
-				hlist_del_rcu(&fa->fa_list);
-				alias_free_mem_rcu(fa);
-				continue;
-			}
-
-			/* record local slen */
-			slen = fa->fa_slen;
-
-			if (!fi || !(fi->fib_flags & RTNH_F_OFFLOAD))
-				continue;
-
-			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
-					       fi, fa->fa_tos, fa->fa_type,
-					       tb->tb_id);
-		}
-
-		/* update leaf slen */
-		n->slen = slen;
-
-		if (hlist_empty(&n->leaf)) {
-			put_child_root(pn, n->key, NULL);
-			node_free(n);
-		}
-	}
-}
-
 /* Caller must hold RTNL. */
 int fib_table_flush(struct net *net, struct fib_table *tb)
 {
@@ -1909,9 +1810,6 @@ int fib_table_flush(struct net *net, struct fib_table *tb)
 				continue;
 			}
 
-			switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
-					       fi, fa->fa_tos, fa->fa_type,
-					       tb->tb_id);
 			call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
 						 n->key,
 						 KEYLENGTH - fa->fa_slen,
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index abd8d2a..02beb35 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -21,7 +21,6 @@
 #include <linux/workqueue.h>
 #include <linux/if_vlan.h>
 #include <linux/rtnetlink.h>
-#include <net/ip_fib.h>
 #include <net/switchdev.h>
 
 /**
@@ -344,8 +343,6 @@ static size_t switchdev_obj_size(const struct switchdev_obj *obj)
 	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
 		return sizeof(struct switchdev_obj_port_vlan);
-	case SWITCHDEV_OBJ_ID_IPV4_FIB:
-		return sizeof(struct switchdev_obj_ipv4_fib);
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
 		return sizeof(struct switchdev_obj_port_fdb);
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
@@ -1108,184 +1105,6 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fdb_dump);
 
-static struct net_device *switchdev_get_lowest_dev(struct net_device *dev)
-{
-	const struct switchdev_ops *ops = dev->switchdev_ops;
-	struct net_device *lower_dev;
-	struct net_device *port_dev;
-	struct list_head *iter;
-
-	/* Recusively search down until we find a sw port dev.
-	 * (A sw port dev supports switchdev_port_attr_get).
-	 */
-
-	if (ops && ops->switchdev_port_attr_get)
-		return dev;
-
-	netdev_for_each_lower_dev(dev, lower_dev, iter) {
-		port_dev = switchdev_get_lowest_dev(lower_dev);
-		if (port_dev)
-			return port_dev;
-	}
-
-	return NULL;
-}
-
-static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
-{
-	struct switchdev_attr attr = {
-		.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
-	};
-	struct switchdev_attr prev_attr;
-	struct net_device *dev = NULL;
-	int nhsel;
-
-	ASSERT_RTNL();
-
-	/* For this route, all nexthop devs must be on the same switch. */
-
-	for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
-		const struct fib_nh *nh = &fi->fib_nh[nhsel];
-
-		if (!nh->nh_dev)
-			return NULL;
-
-		dev = switchdev_get_lowest_dev(nh->nh_dev);
-		if (!dev)
-			return NULL;
-
-		attr.orig_dev = dev;
-		if (switchdev_port_attr_get(dev, &attr))
-			return NULL;
-
-		if (nhsel > 0 &&
-		    !netdev_phys_item_id_same(&prev_attr.u.ppid, &attr.u.ppid))
-				return NULL;
-
-		prev_attr = attr;
-	}
-
-	return dev;
-}
-
-/**
- *	switchdev_fib_ipv4_add - Add/modify switch IPv4 route entry
- *
- *	@dst: route's IPv4 destination address
- *	@dst_len: destination address length (prefix length)
- *	@fi: route FIB info structure
- *	@tos: route TOS
- *	@type: route type
- *	@nlflags: netlink flags passed in (NLM_F_*)
- *	@tb_id: route table ID
- *
- *	Add/modify switch IPv4 route entry.
- */
-int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 nlflags, u32 tb_id)
-{
-	struct switchdev_obj_ipv4_fib ipv4_fib = {
-		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
-		.dst = dst,
-		.dst_len = dst_len,
-		.fi = fi,
-		.tos = tos,
-		.type = type,
-		.nlflags = nlflags,
-		.tb_id = tb_id,
-	};
-	struct net_device *dev;
-	int err = 0;
-
-	/* Don't offload route if using custom ip rules or if
-	 * IPv4 FIB offloading has been disabled completely.
-	 */
-
-#ifdef CONFIG_IP_MULTIPLE_TABLES
-	if (fi->fib_net->ipv4.fib_has_custom_rules)
-		return 0;
-#endif
-
-	if (fi->fib_net->ipv4.fib_offload_disabled)
-		return 0;
-
-	dev = switchdev_get_dev_by_nhs(fi);
-	if (!dev)
-		return 0;
-
-	ipv4_fib.obj.orig_dev = dev;
-	err = switchdev_port_obj_add(dev, &ipv4_fib.obj);
-	if (!err)
-		fib_info_offload_inc(fi);
-
-	return err == -EOPNOTSUPP ? 0 : err;
-}
-EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_add);
-
-/**
- *	switchdev_fib_ipv4_del - Delete IPv4 route entry from switch
- *
- *	@dst: route's IPv4 destination address
- *	@dst_len: destination address length (prefix length)
- *	@fi: route FIB info structure
- *	@tos: route TOS
- *	@type: route type
- *	@tb_id: route table ID
- *
- *	Delete IPv4 route entry from switch device.
- */
-int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
-			   u8 tos, u8 type, u32 tb_id)
-{
-	struct switchdev_obj_ipv4_fib ipv4_fib = {
-		.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
-		.dst = dst,
-		.dst_len = dst_len,
-		.fi = fi,
-		.tos = tos,
-		.type = type,
-		.nlflags = 0,
-		.tb_id = tb_id,
-	};
-	struct net_device *dev;
-	int err = 0;
-
-	if (!(fi->fib_flags & RTNH_F_OFFLOAD))
-		return 0;
-
-	dev = switchdev_get_dev_by_nhs(fi);
-	if (!dev)
-		return 0;
-
-	ipv4_fib.obj.orig_dev = dev;
-	err = switchdev_port_obj_del(dev, &ipv4_fib.obj);
-	if (!err)
-		fib_info_offload_dec(fi);
-
-	return err == -EOPNOTSUPP ? 0 : err;
-}
-EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_del);
-
-/**
- *	switchdev_fib_ipv4_abort - Abort an IPv4 FIB operation
- *
- *	@fi: route FIB info structure
- */
-void switchdev_fib_ipv4_abort(struct fib_info *fi)
-{
-	/* There was a problem installing this route to the offload
-	 * device.  For now, until we come up with more refined
-	 * policy handling, abruptly end IPv4 fib offloading for
-	 * for entire net by flushing offload device(s) of all
-	 * IPv4 routes, and mark IPv4 fib offloading broken from
-	 * this point forward.
-	 */
-
-	fib_flush_external(fi->fib_net);
-	fi->fib_net->ipv4.fib_offload_disabled = true;
-}
-EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_abort);
-
 bool switchdev_port_same_parent_id(struct net_device *a,
 				   struct net_device *b)
 {
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 6/6] doc: update switchdev L3 section
From: Jiri Pirko @ 2016-09-21 11:53 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, nogahf, ogerlitz, roopa, nikolay,
	linville, andy, f.fainelli, dsa, jhs, vivien.didelot, andrew,
	ivecera, kaber, john
In-Reply-To: <1474458794-5512-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

This is to reflect the change of FIB offload infrastructure from
switchdev objects to FIB notifier.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 Documentation/networking/switchdev.txt | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 44235e8..c956ab8 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -314,30 +314,29 @@ the kernel, with the device doing the FIB lookup and forwarding.  The device
 does a longest prefix match (LPM) on FIB entries matching route prefix and
 forwards the packet to the matching FIB entry's nexthop(s) egress ports.
 
-To program the device, the driver implements support for
-SWITCHDEV_OBJ_IPV[4|6]_FIB object using switchdev_port_obj_xxx ops.
-switchdev_port_obj_add is used for both adding a new FIB entry to the device,
-or modifying an existing entry on the device.
+To program the device, the driver has to register a FIB notifier handler
+using register_fib_notifier. There are following events available:
+FIB_EVENT_ENTRY_ADD: used for both adding a new FIB entry to the device,
+                     or modifying an existing entry on the device.
+FIB_EVENT_ENTRY_DEL: used for removing a FIB entry
+FIB_EVENT_RULE_ADD, FIB_EVENT_RULE_DEL: used to propagate FIB rule changes
 
-XXX: Currently, only SWITCHDEV_OBJ_ID_IPV4_FIB objects are supported.
+FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass:
 
-SWITCHDEV_OBJ_ID_IPV4_FIB object passes:
-
-	struct switchdev_obj_ipv4_fib {         /* IPV4_FIB */
+	struct fib_entry_notifier_info {
+		struct fib_notifier_info info; /* must be first */
 		u32 dst;
 		int dst_len;
 		struct fib_info *fi;
 		u8 tos;
 		u8 type;
-		u32 nlflags;
 		u32 tb_id;
-	} ipv4_fib;
+		u32 nlflags;
+	};
 
 to add/modify/delete IPv4 dst/dest_len prefix on table tb_id.  The *fi
 structure holds details on the route and route's nexthops.  *dev is one of the
-port netdevs mentioned in the routes next hop list.  If the output port netdevs
-referenced in the route's nexthop list don't all have the same switch ID, the
-driver is not called to add/modify/delete the FIB entry.
+port netdevs mentioned in the routes next hop list.
 
 Routes offloaded to the device are labeled with "offload" in the ip route
 listing:
@@ -355,6 +354,8 @@ listing:
 	12.0.0.4 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
 	192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.15
 
+The "offload" flag is set in case at least one device offloads the FIB entry.
+
 XXX: add/mod/del IPv6 FIB API
 
 Nexthop Resolution
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH v4 net-next 00/16] tcp: BBR congestion control algorithm
From: Neal Cardwell @ 2016-09-21 11:53 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev
In-Reply-To: <20160921.004459.574306460460243541.davem@davemloft.net>

On Wed, Sep 21, 2016 at 12:44 AM, David Miller <davem@davemloft.net> wrote:
> From: Neal Cardwell <ncardwell@google.com>
> Date: Mon, 19 Sep 2016 23:39:07 -0400
>
>> tcp: BBR congestion control algorithm
>
> Series applied, thanks Neal.

Thanks, David!

neal

^ permalink raw reply

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
From: Thomas Graf @ 2016-09-21 11:55 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer
In-Reply-To: <CALx6S35LmUtwRA85Kg6s7OR=e5Pj9ssqmWLsjtmXfZTXeG02zQ@mail.gmail.com>

On 09/20/16 at 04:59pm, Tom Herbert wrote:
> Well, need to measure to ascertain the cost. As for complexity, this
> actually reduces complexity needed for XDP in the drivers which is a
> good thing because that's where most of the support and development
> pain will be.

I'm not objecting to anything that simplifies the process of adding
XDP capability to drivers. You have my full support here.

> I am looking at using this for ILA router. The problem I am hitting is
> that not all packets that we need to translate go through the XDP
> path. Some would go through the kernel path, some through XDP path but

When you say kernel path, what do you mean specifically? One aspect of
XDP I love is that XDP can act as an acceleration option for existing
BPF programs attached to cls_bpf. Support for direct packet read and
write at clsact level have made it straight forward to write programs
which are compatible or at minimum share a lot of common code. They
can share data structures, lookup functionality, etc.

> We can optimize for allowing only one hook, or maybe limit to only
> allowing one hook to be set. In any case this obviously requires a lot
> of performance evaluation, I am hoping to feedback on the design
> first. My question about using a linear list for this was real, do you
> know a better method off hand to implement a call list?

My main concern is that we overload the XDP hook. Instead of making use
of the programmable glue, we put a linked list in front where everybody
can attach a program to.

A possible alternative:
 1. The XDP hook always has single consumer controlled by the user
    through Netlink, BPF is one of them. If a user wants to hardcode
    the ILA router to that hook, he can do that.

 2. BPF for XDP is extended to allow returning a verdict which results
    in something else to be invoked. If user wants to invoke the ILA
    router for just some packets, he can do that.

That said, I see so much value in a BPF implementation of ILA at XDP
level with all of the parsing logic and exact semantics remain
flexible without the cost of translating some configuration to a set
of actions.

^ permalink raw reply

* Re: [PATCH v4 net-next 00/16] tcp: BBR congestion control algorithm
From: Thomas Graf @ 2016-09-21 12:51 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Miller, Netdev
In-Reply-To: <CADVnQynOD2Hrq994BPzTb+AgtOU=tpyRM+odjSv5gNYTSvbhBQ@mail.gmail.com>

On 09/21/16 at 07:53am, Neal Cardwell wrote:
> On Wed, Sep 21, 2016 at 12:44 AM, David Miller <davem@davemloft.net> wrote:
> > From: Neal Cardwell <ncardwell@google.com>
> > Date: Mon, 19 Sep 2016 23:39:07 -0400
> >
> >> tcp: BBR congestion control algorithm
> >
> > Series applied, thanks Neal.
> 
> Thanks, David!

Let me just say that this is super awesome. Thank you to everybody who
was involved!

^ permalink raw reply

* [PATCH stable 4.4] tipc: move linearization of buffers to generic code
From: Juerg Haefliger @ 2016-09-21 13:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: jonas.arndt, Jon Paul Maloy, Juerg Haefliger

From: Jon Paul Maloy <jon.maloy@ericsson.com>

commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5 upstream.

In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily appliable to 'net' and 'stable'.

We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com>

---
This commit fixes an issue with nodes not joining a cluster over TIPC
after reboots. Upstream TIPC recommends to include this commit in the
stable kernel:
https://sourceforge.net/p/tipc/mailman/message/35368150/
---
 net/tipc/link.c       | 2 ++
 net/tipc/name_distr.c | 1 +
 net/tipc/udp_media.c  | 5 -----
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 91aea071ab27..72268eac4ec7 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1262,6 +1262,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
 		/* fall thru' */
 
 	case ACTIVATE_MSG:
+		skb_linearize(skb);
+		hdr = buf_msg(skb);
 
 		/* Complete own link name with peer's interface name */
 		if_name =  strrchr(l->name, ':') + 1;
diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
index c07612bab95c..f51c8bdbea1c 100644
--- a/net/tipc/name_distr.c
+++ b/net/tipc/name_distr.c
@@ -397,6 +397,7 @@ void tipc_named_rcv(struct net *net, struct sk_buff_head *inputq)
 
 	spin_lock_bh(&tn->nametbl_lock);
 	for (skb = skb_dequeue(inputq); skb; skb = skb_dequeue(inputq)) {
+		skb_linearize(skb);
 		msg = buf_msg(skb);
 		mtype = msg_type(msg);
 		item = (struct distr_item *)msg_data(msg);
diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index 70c03271b798..6af78c6276b4 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -48,7 +48,6 @@
 #include <linux/tipc_netlink.h>
 #include "core.h"
 #include "bearer.h"
-#include "msg.h"
 
 /* IANA assigned UDP port */
 #define UDP_PORT_DEFAULT	6118
@@ -224,10 +223,6 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct udp_bearer *ub;
 	struct tipc_bearer *b;
-	int usr = msg_user(buf_msg(skb));
-
-	if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR))
-		skb_linearize(skb);
 
 	ub = rcu_dereference_sk_user_data(sk);
 	if (!ub) {
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCHv7 net-next 00/15] BPF hardware offload (cls_bpf for now)
From: Daniel Borkmann @ 2016-09-21 13:35 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: ast
In-Reply-To: <1474454647-20137-1-git-send-email-jakub.kicinski@netronome.com>

On 09/21/2016 12:43 PM, Jakub Kicinski wrote:
[...]
> Rebased and improved.

Did a couple of tests with regards to the core bits with this set
applied on top of net-next and looks good to me now.

Thanks a lot, Jakub!

^ permalink raw reply

* [PATCH] net: fec: set mac address unconditionally
From: Gavin Schenk @ 2016-09-21 13:30 UTC (permalink / raw)
  To: fugang.duan; +Cc: netdev, kernel, Gavin Schenk

Fixes: 9638d19e4816 ("net: fec: add netif status check before set mac address")

If the mac address origin is not dt, you can only safe assign a
mac address after "link up" of the device. If the link is down the
clocks are disabled and because of issues assigning registers when
clocks are down the new mac address is discarded on some soc's. This fix
sets the mac address unconditionally in fec_restart(...) and ensures
consistens between fec registers and the network layer.

Signed-off-by: Gavin Schenk <g.schenk@eckelmann.de>
---
 drivers/net/ethernet/freescale/fec_main.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2a03857cca18..bdabea6cd981 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -903,13 +903,11 @@ fec_restart(struct net_device *ndev)
 	 * enet-mac reset will reset mac address registers too,
 	 * so need to reconfigure it.
 	 */
-	if (fep->quirks & FEC_QUIRK_ENET_MAC) {
-		memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
-		writel((__force u32)cpu_to_be32(temp_mac[0]),
-		       fep->hwp + FEC_ADDR_LOW);
-		writel((__force u32)cpu_to_be32(temp_mac[1]),
-		       fep->hwp + FEC_ADDR_HIGH);
-	}
+	memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
+	writel((__force u32)cpu_to_be32(temp_mac[0]),
+	       fep->hwp + FEC_ADDR_LOW);
+	writel((__force u32)cpu_to_be32(temp_mac[1]),
+	       fep->hwp + FEC_ADDR_HIGH);

 	/* Clear any outstanding interrupt. */
 	writel(0xffffffff, fep->hwp + FEC_IEVENT);
-- 
1.9.1

Eckelmann AG
Vorstand: Dipl.-Ing. Peter Frankenbach (Sprecher) Dipl.-Wi.-Ing. Philipp Eckelmann
Dr.-Ing. Frank-Thomas Mellert Dr.-Ing. Marco Münchhof Dr.-Ing. Frank Uhlemann
Vorsitzender des Aufsichtsrats: Hubertus G. Krossa
Sitz der Gesellschaft: Berliner Str. 161, 65205 Wiesbaden, Amtsgericht Wiesbaden HRB 12636
http://www.eckelmann.de 

^ permalink raw reply related

* Re: [PATCH net-next 3/3] net: ethernet: mediatek: add the dts property to set if TRGMII supported on GMAC0
From: Andrew Lunn @ 2016-09-21 14:17 UTC (permalink / raw)
  To: Sean Wang; +Cc: john, davem, nbd, netdev, linux-mediatek, keyhaede
In-Reply-To: <1474438590-6855-1-git-send-email-sean.wang@mediatek.com>

On Wed, Sep 21, 2016 at 02:16:30PM +0800, Sean Wang wrote:
> Date: Tue, 20 Sep 2016 21:37:58 +0200, Andrew Lunn <andrew@lunn.ch> wrote:
> >On Tue, Sep 20, 2016 at 03:59:20PM +0800, sean.wang@mediatek.com wrote:
> >> From: Sean Wang <sean.wang@mediatek.com>
> >>
> >> Add the dts property for the capability if TRGMII supported on GAMC0
> >>
> >> Signed-off-by: Sean Wang <sean.wang@mediatek.com>
> >> ---
> >>  Documentation/devicetree/bindings/net/mediatek-net.txt | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
> >> index 6103e55..32f79d8 100644
> >> --- a/Documentation/devicetree/bindings/net/mediatek-net.txt
> >> +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
> >> @@ -31,7 +31,10 @@ Optional properties:
> >>  Required properties:
> >>  - compatible: Should be "mediatek,eth-mac"
> >>  - reg: The number of the MAC
> >> -- phy-handle: see ethernet.txt file in the same directory.
> >> +- phy-handle: see ethernet.txt file in the same directory and
> >> +     the additional phy-mode "tgrmii" is provided in order to connect
> >> +     with the internal switch MT7530 which is only applicable when reg
> >> +     is equal to 0.
> >
> >Humm. How is the switch connected? Is it on the MDIO bus?
> 
> the switch is connected to MDIO bus
> 
> >If it is on the mdio bus, the binding is going to look something like:
> >
> >eth: ethernet@1b100000 {
> >        compatible = "mediatek,mt7623-eth";
> >        reg = <0 0x1b100000 0 0x20000>;
> >        clocks = <&topckgen CLK_TOP_ETHIF_SEL>,
> >                 <&ethsys CLK_ETHSYS_ESW>,
> >                 <&ethsys CLK_ETHSYS_GP2>,
> >                 <&ethsys CLK_ETHSYS_GP1>;
> >        clock-names = "ethif", "esw", "gp2", "gp1";
> >        interrupts = <GIC_SPI 200 IRQ_TYPE_LEVEL_LOW
> >                      GIC_SPI 199 IRQ_TYPE_LEVEL_LOW
> >                      GIC_SPI 198 IRQ_TYPE_LEVEL_LOW>;
> >        power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
> >        resets = <&ethsys MT2701_ETHSYS_ETH_RST>;
> >        reset-names = "eth";
> >        mediatek,ethsys = <&ethsys>;
> >        mediatek,pctl = <&syscfg_pctl_a>;
> >        #address-cells = <1>;
> >        #size-cells = <0>;
> >
> >        gmac1: mac@0 {
> >                compatible = "mediatek,eth-mac";
> >                reg = <0>;
> >        };
> >
> >        gmac2: mac@1 {
> >                compatible = "mediatek,eth-mac";
> >                reg = <1>;
> >        };
> >
> >        mdio-bus {
> >               reg = <1>;
> >               #address-cells = <1>;
> >               #size-cells = <0>;
> >
> >               switch0: switch0@0 {
> >                       compatible = "marvell,mv88e6085";
> >                       #address-cells = <1>;
> >                       #size-cells = <0>;
> >                       reg = <0>;
> >                       dsa,member = <0 0>;
> >
> >                       ports {
> >                               #address-cells = <1>;
> >                               #size-cells = <0>;
> >                               port@0 {
> >                                       reg = <0>;
> >                                       label = "lan0";
> >...
> >...
> >In this case the switch is an MDIO device, not an PHY. It will not
> >have an phy-mode. It cannot have a phy mode, it is not a PHY.
> >
> >Or am i missing something here?
> >
> >Thanks
> >
> 
> 1)
> 
> The switch driver is not supported for DSA so far yet 
> but DSA is good thing and I will try make it happen
> in the near future.

O.K. But if i understand correctly, the TRGMII is so you can use the
switch. So it needs to work when you have DSA.
 
> And another question about DSA, that is
> if I use DSA for switch, how to know the relationship
> between MAC and DSA ? such like I could know relationship 
> between MAC and PHY by phy-handle.

It will look like what i stated above. But i missed the cpu node in
the ports, which is what you are asking about. There will also be a
node like:

                            port@6 {
                                     reg = <6>;
                                     label = "cpu";
                                     ethernet = <&gmac1>;
                             };

And this is how you couple the MAC to DSA.

> The cause I ask is becasue I think it's good if the topology
> about MAC/PHYs/Switch is known just by dts files.
> 
> 2)
> 
> The phy-mode I mention is for fixed-link. For current MAC driver, 
> it just uses fixed phy to adapt into the part of switch, so the 
> device tree looks something like the below. 
> 
> &eth {
>         status = "okay";
>         gmac0: mac@0 {
>                 compatible = "mediatek,eth-mac";
>                 reg = <0>;
>                 phy-mode = "trgmii";
>                 fixed-link {
>                         speed = <1000>;
>                         full-duplex;
>                         pause;
>                 };
>         };
> 
>         gmac1: mac@1 {
>                 compatible = "mediatek,eth-mac";
>                 reg = <1>;
>                 phy-handle = <&phy5>;
>         };


static int mtk_phy_connect(struct mtk_mac *mac)
{
        struct mtk_eth *eth = mac->hw;
        struct device_node *np;
        u32 val;

        np = of_parse_phandle(mac->of_node, "phy-handle", 0);
        if (!np && of_phy_is_fixed_link(mac->of_node))
                if (!of_phy_register_fixed_link(mac->of_node))
                        np = of_node_get(mac->of_node);
	...
        ...
        mtk_phy_connect_node(eth, mac, np);


So in the case of a fixed-phy, you do look in the MAC node, and when
there is a phy-handle, you look in the PHY node.

So this does work....

   Andrew

^ permalink raw reply

* RE: [RFC v2 04/12] qedr: Add support for user context verbs
From: Amrani, Ram @ 2016-09-21 14:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20160920164556.GS26673-2ukJVAZIZ/Y@public.gmane.org>

> Are you sure that you want GPL-only license?

Thanks for pointing this out. This is something we will obviously fix.

 
> > +#define QEDR_ABI_VERSION		(6)
> 
> Is it possible to start new file with new driver from ABI version 1 and not 6?

This is something that we cannot do since we already have a few libraries out there and we wouldn't like a user to use mismatch a library and a driver...

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC v2 00/11] QLogic RDMA Driver (qedr) RFC
From: Amrani, Ram @ 2016-09-21 14:19 UTC (permalink / raw)
  To: Leon Romanovsky, Elior, Ariel
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20160920162306.GP26673-2ukJVAZIZ/Y@public.gmane.org>

> 
> The module parameters is no-go.
> 

We'll remove it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
From: Tom Herbert @ 2016-09-21 14:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer
In-Reply-To: <20160921115545.GA12789@pox.localdomain>

On Wed, Sep 21, 2016 at 4:55 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 09/20/16 at 04:59pm, Tom Herbert wrote:
>> Well, need to measure to ascertain the cost. As for complexity, this
>> actually reduces complexity needed for XDP in the drivers which is a
>> good thing because that's where most of the support and development
>> pain will be.
>
> I'm not objecting to anything that simplifies the process of adding
> XDP capability to drivers. You have my full support here.
>
>> I am looking at using this for ILA router. The problem I am hitting is
>> that not all packets that we need to translate go through the XDP
>> path. Some would go through the kernel path, some through XDP path but
>
> When you say kernel path, what do you mean specifically? One aspect of
> XDP I love is that XDP can act as an acceleration option for existing
> BPF programs attached to cls_bpf. Support for direct packet read and
> write at clsact level have made it straight forward to write programs
> which are compatible or at minimum share a lot of common code. They
> can share data structures, lookup functionality, etc.
>
>> We can optimize for allowing only one hook, or maybe limit to only
>> allowing one hook to be set. In any case this obviously requires a lot
>> of performance evaluation, I am hoping to feedback on the design
>> first. My question about using a linear list for this was real, do you
>> know a better method off hand to implement a call list?
>
> My main concern is that we overload the XDP hook. Instead of making use
> of the programmable glue, we put a linked list in front where everybody
> can attach a program to.
>
> A possible alternative:
>  1. The XDP hook always has single consumer controlled by the user
>     through Netlink, BPF is one of them. If a user wants to hardcode
>     the ILA router to that hook, he can do that.
>
>  2. BPF for XDP is extended to allow returning a verdict which results
>     in something else to be invoked. If user wants to invoke the ILA
>     router for just some packets, he can do that.
>
> That said, I see so much value in a BPF implementation of ILA at XDP
> level with all of the parsing logic and exact semantics remain
> flexible without the cost of translating some configuration to a set
> of actions.

Thomas,

As I mentioned though, not all packets we need to translate are going
to go through XDP. ILA already integrates with the routing table via
LWT and that solution works incredibly well especially when we are
sourcing connections (ILA is by far the fastest most efficient
technique of network virtualization). We already have a lookup table
and the translation process coded in the kernel and configuration is
already implemented by netlink and iproute. So I'm not yet sure I want
to redesign all of this around BPF, maybe that is the right answer
maybe it isn't; but, my point is I don't want to be _forced_ into a
certain design that because of constraints on one kernel interface. As
a kernel developer I want flexibility on how we design and implement
things!

I think there are two questions that this patch set poses for the
community wrt XDP:

#1: Should we allow alternate code to run in XDP other than BPF?
#2: If #1 is true what is the best way to implement that?

If the answer to #1 is "no" then the answer to #2 is irrelevant. So
with this RFC I'm hoping we can come the agreement on questions #1.

IMO, there are at least three factors that would motivate the answer
to #1 be yes

1) There is precedence in nfhooks in creating generic hooks in the
critical data path that can have other uses than those originally
intended by the authors. I would take it as general principle that
hooks in the data path should be as generic as possible to get the
most utility.
2) The changes to make XDP work in drivers are quite invasive and as
we've seen in at least one attempt to modify a driver to support XDP
it is easy to break other mechanisms. If we're going to be modifying a
lot of drivers we really want to get the most value for that work.
3) The XDP interface is well designed and really has no dependencies
on BPF. Writing code to directly parse packets is not rocket science.
So it seems reasonable to me that a subsystem may want to use XDP
interface to accelerate existing functionality without introducing
additional dependencies on BPF or other non-kernel mechanisms

In any case, I ask everyone to be open minded. If there is agreement
that BPF is sufficient for all XDP use cases then I wont' pursue these
patches. But given the ramifications of XDP, especially the
considering the changes required across drivers to support it, I
really would like to get input from the community on this.

Thanks,
Tom

^ permalink raw reply

* RE: [RFC v2 04/12] qedr: Add support for user context verbs
From: Amrani, Ram @ 2016-09-21 14:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem@davemloft.net, dledford@redhat.com, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20160920152729.GC32020@obsidianresearch.com>

> uapi headers need the __ varients and the #include <linux/types.h>
> 
> Follow what Leon has done.

OK

^ permalink raw reply

* RE: [RFC v2 06/12] qedr: Add support for QP verbs
From: Amrani, Ram @ 2016-09-21 14:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20160920152849.GD32020-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

> Ugh, each patch keeps adding to this?

The logic in the patch series is to have each patch contain only what is necessary for the specific functionality it adds.
This made it harder on us to prepare but, IMHO, easier for the reviewer.
If you'd like to have this file in one chunk, I can do this. What do you prefer?



> > +struct qedr_create_qp_ureq {
> > +       u32 qp_handle_hi;
> > +       u32 qp_handle_lo;
> > +
> > +       /* SQ */
> > +       /* user space virtual address of SQ buffer */
> > +       u64 sq_addr;
> > +
> > +       /* length of SQ buffer */
> > +       size_t sq_len;
> 
> Do not use size_t, that is just asking for 32/64 bit interop problems.

OK

> > +struct qedr_create_qp_uresp {
> > +       u32 qp_id;
> > +       int atomic_supported;
> 
> int again, etc. You get the idea.

Yeah, I did. Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [RFC v2 06/12] qedr: Add support for QP verbs
From: Amrani, Ram @ 2016-09-21 14:15 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: davem@davemloft.net, dledford@redhat.com, Elior, Ariel,
	Kalderon, Michal, Mintz, Yuval, Borundia, Rajesh,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20160920163048.GQ26673@leon.nu>

> >  include/uapi/rdma/providers/qedr-abi.h     |   35 +
> 
> Please be aware that the last version for ABI header files doesn't have "provider"
> name in directory path (include/uapi/rdma/qedr-abi.h)

OK. Note that I was using https://www.spinics.net/lists/linux-rdma/msg40414.html that dates to 7 days ago and it contains the "providers" folder.
Can you point to the most recent patch?

> > diff --git a/drivers/infiniband/hw/qedr/main.c
> > b/drivers/infiniband/hw/qedr/main.c
> > index c99dd6a..10ad9ed 100644
> > --- a/drivers/infiniband/hw/qedr/main.c
> > +++ b/drivers/infiniband/hw/qedr/main.c
> > @@ -52,7 +52,7 @@ uint debug;
> >  module_param(debug, uint, 0);
> >  MODULE_PARM_DESC(debug, "Default debug msglevel");
> >
> > -static LIST_HEAD(qedr_dev_list);
> 
> You are removing code which was added in previous patches, why do you do it?
> 

Sure. Will fix. Thanks.

> > diff --git a/drivers/infiniband/hw/qedr/qedr.h
> > b/drivers/infiniband/hw/qedr/qedr.h
> > index dd974d5..05017be 100644
> > --- a/drivers/infiniband/hw/qedr/qedr.h
> > +++ b/drivers/infiniband/hw/qedr/qedr.h
> > @@ -49,6 +49,9 @@
> >  enum DP_QEDR_MODULE {
> >  	QEDR_MSG_INIT = 0x10000,
> 
> We prefer shift notation (1 << 16) instead of 0x10000.
> 

OK, I'll convert accordingly.

^ permalink raw reply

* Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP
From: Thomas Graf @ 2016-09-21 14:48 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, Linux Kernel Network Developers, Kernel Team,
	Tariq Toukan, Brenden Blanco, Alexei Starovoitov, Eric Dumazet,
	Jesper Dangaard Brouer
In-Reply-To: <CALx6S35Xctd-aA8DF1_vypMDejiGXcud6=UO33dqgvO60W0DZQ@mail.gmail.com>

On 09/21/16 at 07:19am, Tom Herbert wrote:
> certain design that because of constraints on one kernel interface. As
> a kernel developer I want flexibility on how we design and implement
> things!

Perfectly valid argument. I reviewed your ILA changes and did not
object to them.

> I think there are two questions that this patch set poses for the
> community wrt XDP:
> 
> #1: Should we allow alternate code to run in XDP other than BPF?
> #2: If #1 is true what is the best way to implement that?
> 
> If the answer to #1 is "no" then the answer to #2 is irrelevant. So
> with this RFC I'm hoping we can come the agreement on questions #1.

I'm not opposed to running non-BPF code at XDP. I'm against adding
a linked list of hook consumers.

Would anyone require to run XDP-BPF in combination ILA? Or XDP-BPF
in combination with a potential XDP-nftables? We don't know yet I
guess.

Maybe exclusive access to the hook for one consumer as selected by
the user is good enough.

If that is not good enough: BPF (and potentially nftables in the
future) could provide means to perform a selection process where a
helper call can run another XDP prog or return a verdict to trigger
another XDP prog. Definitely more flexible and faster than a linear
list doing  if, else if, else if, else if, ...

^ permalink raw reply

* [PATCH -next] net: dsa: qca8k: fix non static symbol warning
From: Wei Yongjun @ 2016-09-21 15:04 UTC (permalink / raw)
  To: John Crispin; +Cc: Wei Yongjun, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Fixes the following sparse warning:

drivers/net/dsa/qca8k.c:259:22: warning:
 symbol 'qca8k_regmap_config' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/dsa/qca8k.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 7f3f178..280acb0 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -256,7 +256,7 @@ static struct regmap_access_table qca8k_readable_table = {
 	.n_yes_ranges = ARRAY_SIZE(qca8k_readable_ranges),
 };
 
-struct regmap_config qca8k_regmap_config = {
+static struct regmap_config qca8k_regmap_config = {
 	.reg_bits = 16,
 	.val_bits = 32,
 	.reg_stride = 4,

^ permalink raw reply related

* [PATCH -next] net: dsa: qca8k: use mdio_module_driver to simplify the code
From: Wei Yongjun @ 2016-09-21 15:05 UTC (permalink / raw)
  To: John Crispin; +Cc: Wei Yongjun, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

mdio_module_driver() makes the code simpler by eliminating
boilerplate code.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/dsa/qca8k.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 7f3f178..022a6bc2 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -1040,19 +1040,7 @@ static struct mdio_driver qca8kmdio_driver = {
 	},
 };
 
-static int __init
-qca8kmdio_driver_register(void)
-{
-	return mdio_driver_register(&qca8kmdio_driver);
-}
-module_init(qca8kmdio_driver_register);
-
-static void __exit
-qca8kmdio_driver_unregister(void)
-{
-	mdio_driver_unregister(&qca8kmdio_driver);
-}
-module_exit(qca8kmdio_driver_unregister);
+mdio_module_driver(qca8kmdio_driver);
 
 MODULE_AUTHOR("Mathieu Olivari, John Crispin <john@phrozen.org>");
 MODULE_DESCRIPTION("Driver for QCA8K ethernet switch family");

^ permalink raw reply related

* Re: [RFC PATCH] net: Require socket to allow XPS to set queue mapping
From: Willem de Bruijn @ 2016-09-21 15:07 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, Alexander Duyck, Network Development
In-Reply-To: <3a1d23f2-d2c6-8d28-96cc-1b25b61b948c@hpe.com>

On Thu, Aug 25, 2016 at 5:28 PM, Rick Jones <rick.jones2@hpe.com> wrote:
> On 08/25/2016 02:08 PM, Eric Dumazet wrote:
>>
>> When XPS was submitted, it was _not_ enabled by default and 'magic'
>>
>> Some NIC vendors decided it was a good thing, you should complain to
>> them ;)
>
>
> I kindasorta am with the emails I've been sending to netdev :)  And also
> hopefully precluding others going down that path.

I recently came across another effect of configuring devices at
ndo_open. The behavior of `ip link set dev ${dev} down && ip link set
dev ${dev} up` is no longer consistent across devices. Drivers that
call netif_set_xps_queue overwrite a user supplied xps configuration,
others leave it in place.

This is easily demonstrated by adding this snippet to the loopback driver:

+static int loopback_dev_open(struct net_device *dev)
+{
+       cpumask_t mask;
+
+       cpumask_clear(&mask);
+       cpumask_set_cpu(0, &mask);
+
+       return netif_set_xps_queue(dev, &mask, 0);
+}
+
 static void loopback_dev_free(struct net_device *dev)
 {
        free_percpu(dev->lstats);
@@ -158,6 +168,7 @@ static void loopback_dev_free(struct net_device *dev)

 static const struct net_device_ops loopback_ops = {
        .ndo_init      = loopback_dev_init,
+       .ndo_open       = loopback_dev_open,

and running

fp=/sys/class/net/lo/queues/tx-0/xps_cpus

cat ${fp}
echo 8 > ${fp}
cat ${fp}
ip link set dev lo down
ip link set dev lo up
cat ${fp}

On a vanilla kernel, the output is {0, 8, 8} : the user-supplied xps
setting persists
With the patch, it is {1, 8, 1} : the driver sets, and resets, xps

A third patch that adds netif_set_xps_queue to loopback_dev_init gives
{1, 8, 8}. That is arguably the preferred behavior (sidestepping the
issue of whether device driver initialization is a good thing in
itself): init with a sane default, but do not override user
preference.

The fm10k and i40e drivers makes the call conditional on
test_and_set_bit(__FM10K_TX_XPS_INIT_DONE), so probably already
implement those semantics. In which case other drivers should probably
be updated to do the same. If all drivers are essentially trying to
set the same basic load balancing policy, this could also be lifted
out of drivers into __dev_open for consistency and code deduplication.

I'm pointing this out less for changing this xps feature, than as a
subtle effect to be aware of with any subsequent device driver policy
patches.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox