Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next v2 16/18] bridge: switchdev: Allow clearing FDB entry offload indication
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Currently, an FDB entry only ceases being offloaded when it is deleted.
This changes with VxLAN encapsulation.

Devices capable of performing VxLAN encapsulation usually have only one
FDB table, unlike the software data path which has two - one in the
bridge driver and another in the VxLAN driver.

Therefore, bridge FDB entries pointing to a VxLAN device are only
offloaded if there is a corresponding entry in the VxLAN FDB.

Allow clearing the offload indication in case the corresponding entry
was deleted from the VxLAN FDB.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 9 +++++----
 drivers/net/ethernet/rocker/rocker_main.c                | 1 +
 include/net/switchdev.h                                  | 3 ++-
 net/bridge/br.c                                          | 4 ++--
 net/bridge/br_fdb.c                                      | 4 ++--
 net/bridge/br_private.h                                  | 2 +-
 net/bridge/br_switchdev.c                                | 9 ++++++---
 net/dsa/slave.c                                          | 1 +
 8 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index fa16ad2c6a50..a89075beef94 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2090,12 +2090,13 @@ void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
 static void
 mlxsw_sp_fdb_call_notifiers(enum switchdev_notifier_type type,
 			    const char *mac, u16 vid,
-			    struct net_device *dev)
+			    struct net_device *dev, bool offloaded)
 {
 	struct switchdev_notifier_fdb_info info;
 
 	info.addr = mac;
 	info.vid = vid;
+	info.offloaded = offloaded;
 	call_switchdev_notifiers(type, dev, &info.info);
 }
 
@@ -2147,7 +2148,7 @@ static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
 	if (!do_notification)
 		return;
 	type = adding ? SWITCHDEV_FDB_ADD_TO_BRIDGE : SWITCHDEV_FDB_DEL_TO_BRIDGE;
-	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev);
+	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev, adding);
 
 	return;
 
@@ -2207,7 +2208,7 @@ static void mlxsw_sp_fdb_notify_mac_lag_process(struct mlxsw_sp *mlxsw_sp,
 	if (!do_notification)
 		return;
 	type = adding ? SWITCHDEV_FDB_ADD_TO_BRIDGE : SWITCHDEV_FDB_DEL_TO_BRIDGE;
-	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev);
+	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev, adding);
 
 	return;
 
@@ -2312,7 +2313,7 @@ static void mlxsw_sp_switchdev_bridge_fdb_event_work(struct work_struct *work)
 			break;
 		mlxsw_sp_fdb_call_notifiers(SWITCHDEV_FDB_OFFLOADED,
 					    fdb_info->addr,
-					    fdb_info->vid, dev);
+					    fdb_info->vid, dev, true);
 		break;
 	case SWITCHDEV_FDB_DEL_TO_DEVICE:
 		fdb_info = &switchdev_work->fdb_info;
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index aeafdb9ac015..8721c0506af3 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2728,6 +2728,7 @@ rocker_fdb_offload_notify(struct rocker_port *rocker_port,
 
 	info.addr = recv_info->addr;
 	info.vid = recv_info->vid;
+	info.offloaded = true;
 	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
 				 rocker_port->dev, &info.info);
 }
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index b040f82351ba..881ecb1555bf 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -159,7 +159,8 @@ struct switchdev_notifier_fdb_info {
 	struct switchdev_notifier_info info; /* must be first */
 	const unsigned char *addr;
 	u16 vid;
-	bool added_by_user;
+	u8 added_by_user:1,
+	   offloaded:1;
 };
 
 static inline struct net_device *
diff --git a/net/bridge/br.c b/net/bridge/br.c
index e411e40333e2..360ad66c21e9 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -151,7 +151,7 @@ static int br_switchdev_event(struct notifier_block *unused,
 			break;
 		}
 		br_fdb_offloaded_set(br, p, fdb_info->addr,
-				     fdb_info->vid);
+				     fdb_info->vid, true);
 		break;
 	case SWITCHDEV_FDB_DEL_TO_BRIDGE:
 		fdb_info = ptr;
@@ -163,7 +163,7 @@ static int br_switchdev_event(struct notifier_block *unused,
 	case SWITCHDEV_FDB_OFFLOADED:
 		fdb_info = ptr;
 		br_fdb_offloaded_set(br, p, fdb_info->addr,
-				     fdb_info->vid);
+				     fdb_info->vid, fdb_info->offloaded);
 		break;
 	}
 
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 74331690a390..e56ba3912a90 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -1152,7 +1152,7 @@ int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
 }
 
 void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
-			  const unsigned char *addr, u16 vid)
+			  const unsigned char *addr, u16 vid, bool offloaded)
 {
 	struct net_bridge_fdb_entry *fdb;
 
@@ -1160,7 +1160,7 @@ void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
 
 	fdb = br_fdb_find(br, addr, vid);
 	if (fdb)
-		fdb->offloaded = 1;
+		fdb->offloaded = offloaded;
 
 	spin_unlock_bh(&br->hash_lock);
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 10ee39fdca5c..2920e06a5403 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -574,7 +574,7 @@ int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
 			      const unsigned char *addr, u16 vid,
 			      bool swdev_notify);
 void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
-			  const unsigned char *addr, u16 vid);
+			  const unsigned char *addr, u16 vid, bool offloaded);
 
 /* br_forward.c */
 enum br_pkt_type {
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index d77f807420c4..b993df770675 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -103,7 +103,7 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
 static void
 br_switchdev_fdb_call_notifiers(bool adding, const unsigned char *mac,
 				u16 vid, struct net_device *dev,
-				bool added_by_user)
+				bool added_by_user, bool offloaded)
 {
 	struct switchdev_notifier_fdb_info info;
 	unsigned long notifier_type;
@@ -111,6 +111,7 @@ br_switchdev_fdb_call_notifiers(bool adding, const unsigned char *mac,
 	info.addr = mac;
 	info.vid = vid;
 	info.added_by_user = added_by_user;
+	info.offloaded = offloaded;
 	notifier_type = adding ? SWITCHDEV_FDB_ADD_TO_DEVICE : SWITCHDEV_FDB_DEL_TO_DEVICE;
 	call_switchdev_notifiers(notifier_type, dev, &info.info);
 }
@@ -126,13 +127,15 @@ br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
 		br_switchdev_fdb_call_notifiers(false, fdb->key.addr.addr,
 						fdb->key.vlan_id,
 						fdb->dst->dev,
-						fdb->added_by_user);
+						fdb->added_by_user,
+						fdb->offloaded);
 		break;
 	case RTM_NEWNEIGH:
 		br_switchdev_fdb_call_notifiers(true, fdb->key.addr.addr,
 						fdb->key.vlan_id,
 						fdb->dst->dev,
-						fdb->added_by_user);
+						fdb->added_by_user,
+						fdb->offloaded);
 		break;
 	}
 }
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3f840b6eea69..5428ef529019 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1478,6 +1478,7 @@ static void dsa_slave_switchdev_event_work(struct work_struct *work)
 			netdev_dbg(dev, "fdb add failed err=%d\n", err);
 			break;
 		}
+		fdb_info->offloaded = true;
 		call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED, dev,
 					 &fdb_info->info);
 		break;
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 11/18] net: Add netif_is_vxlan()
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Add the ability to determine whether a netdev is a VxLAN netdev by
calling the above mentioned function that checks the netdev's
rtnl_link_ops.

This will allow modules to identify netdev events involving a VxLAN
netdev and act accordingly. For example, drivers capable of VxLAN
offload will need to configure the underlying device when a VxLAN netdev
is being enslaved to an offloaded bridge.

Convert nfp to use the newly introduced helper.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 3 ++-
 include/net/vxlan.h                                     | 7 +++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 30c926c4bc47..8e5bec04d1f9 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -4,6 +4,7 @@
 #include <linux/etherdevice.h>
 #include <linux/inetdevice.h>
 #include <net/netevent.h>
+#include <net/vxlan.h>
 #include <linux/idr.h>
 #include <net/dst_metadata.h>
 #include <net/arp.h>
@@ -187,7 +188,7 @@ static bool nfp_tun_is_netdev_to_offload(struct net_device *netdev)
 		return false;
 	if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
 		return true;
-	if (!strcmp(netdev->rtnl_link_ops->kind, "vxlan"))
+	if (netif_is_vxlan(netdev))
 		return true;
 
 	return false;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index dd3d72ce64b6..95227fa925e8 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -5,6 +5,7 @@
 #include <linux/if_vlan.h>
 #include <net/udp_tunnel.h>
 #include <net/dst_metadata.h>
+#include <net/rtnetlink.h>
 
 /* VXLAN protocol (RFC 7348) header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -402,4 +403,10 @@ static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
 
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
+static inline bool netif_is_vxlan(const struct net_device *dev)
+{
+	return dev->rtnl_link_ops &&
+	       !strcmp(dev->rtnl_link_ops->kind, "vxlan");
+}
+
 #endif
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 10/18] mlxsw: spectrum_router: Configure matching local routes for NVE decap
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

When a local route that matches the source IP of an offloaded NVE tunnel
is notified, the driver needs to program it to perform NVE decapsulation
instead of merely trapping packets to the CPU.

This patch complements "mlxsw: spectrum_router: Enable local routes
promotion to perform NVE decap" where existing local routes were
promoted to perform NVE decapsulation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 31b5491d6737..9e9bb57134f2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -4247,6 +4247,7 @@ mlxsw_sp_fib4_entry_type_set(struct mlxsw_sp *mlxsw_sp,
 			     struct mlxsw_sp_fib_entry *fib_entry)
 {
 	union mlxsw_sp_l3addr dip = { .addr4 = htonl(fen_info->dst) };
+	u32 tb_id = mlxsw_sp_fix_tb_id(fen_info->tb_id);
 	struct net_device *dev = fen_info->fi->fib_dev;
 	struct mlxsw_sp_ipip_entry *ipip_entry;
 	struct fib_info *fi = fen_info->fi;
@@ -4261,6 +4262,15 @@ mlxsw_sp_fib4_entry_type_set(struct mlxsw_sp *mlxsw_sp,
 							     fib_entry,
 							     ipip_entry);
 		}
+		if (mlxsw_sp_nve_ipv4_route_is_decap(mlxsw_sp, tb_id,
+						     dip.addr4)) {
+			u32 t_index;
+
+			t_index = mlxsw_sp_nve_decap_tunnel_index_get(mlxsw_sp);
+			fib_entry->decap.tunnel_index = t_index;
+			fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP;
+			return 0;
+		}
 		/* fall through */
 	case RTN_BROADCAST:
 		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 15/18] vxlan: Notify for each remote of a removed FDB entry
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

When notifications are sent about FDB activity, and an FDB entry with
several remotes is removed, the notification is sent only for the first
destination. That makes it impossible to distinguish between the case
where only this first remote is removed, and the one where the FDB entry
is removed as a whole.

Therefore send one notification for each remote of a removed FDB entry.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/vxlan.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e98fc54379f8..1d74f90d6f5d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -843,12 +843,15 @@ static void vxlan_fdb_free(struct rcu_head *head)
 static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
 			      bool do_notify)
 {
+	struct vxlan_rdst *rd;
+
 	netdev_dbg(vxlan->dev,
 		    "delete %pM\n", f->eth_addr);
 
 	--vxlan->addrcnt;
 	if (do_notify)
-		vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);
+		list_for_each_entry(rd, &f->remotes, list)
+			vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH);
 
 	hlist_del_rcu(&f->hlist);
 	call_rcu(&f->rcu, vxlan_fdb_free);
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 09/18] mlxsw: spectrum_fid: Clear NVE configuration when destroying 802.1D FIDs
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

802.1D FIDs are used to represent VLAN-unaware bridges and currently
this is the only type of FID that supports NVE configuration.

Since the NVE tunnel device does not take a reference on the FID, it is
possible for the FID to be destroyed when it still has NVE
configuration.

Therefore, when destroying the FID make sure to disable its NVE
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 0ba3d90d4632..a3db033d7399 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -538,6 +538,8 @@ static int mlxsw_sp_fid_8021d_configure(struct mlxsw_sp_fid *fid)
 
 static void mlxsw_sp_fid_8021d_deconfigure(struct mlxsw_sp_fid *fid)
 {
+	if (fid->vni_valid)
+		mlxsw_sp_nve_fid_disable(fid->fid_family->mlxsw_sp, fid);
 	mlxsw_sp_fid_op(fid->fid_family->mlxsw_sp, fid->fid_index, 0, false);
 }
 
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 14/18] vxlan: Support marking RDSTs as offloaded
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

Offloaded bridge FDB entries are marked with NTF_OFFLOADED. Implement a
similar mechanism for VXLAN, where a given remote destination can be
marked as offloaded.

To that end, introduce a new event, SWITCHDEV_VXLAN_FDB_OFFLOADED,
through which the marking is communicated to the vxlan driver. To
identify which RDST should be marked as offloaded, an
switchdev_notifier_vxlan_fdb_info is passed to the listeners. The
"offloaded" flag in that object determines whether the offloaded mark
should be set or cleared.

When sending offloaded FDB entries over netlink, mark them with
NTF_OFFLOADED.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/vxlan.c     | 59 ++++++++++++++++++++++++++++++++++++++++-
 include/net/switchdev.h |  1 +
 include/net/vxlan.h     |  2 ++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 410eee23c50c..e98fc54379f8 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -272,6 +272,8 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	ndm->ndm_state = fdb->state;
 	ndm->ndm_ifindex = vxlan->dev->ifindex;
 	ndm->ndm_flags = fdb->flags;
+	if (rdst->offloaded)
+		ndm->ndm_flags |= NTF_OFFLOADED;
 	ndm->ndm_type = RTN_UNICAST;
 
 	if (!net_eq(dev_net(vxlan->dev), vxlan->net) &&
@@ -373,6 +375,7 @@ static void vxlan_fdb_switchdev_call_notifiers(struct vxlan_dev *vxlan,
 		.remote_vni = rd->remote_vni,
 		.remote_ifindex = rd->remote_ifindex,
 		.vni = fdb->vni,
+		.offloaded = rd->offloaded,
 	};
 	memcpy(info.eth_addr, fdb->eth_addr, ETH_ALEN);
 
@@ -536,6 +539,7 @@ int vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
 	fdb_info->remote_vni = rdst->remote_vni;
 	fdb_info->remote_ifindex = rdst->remote_ifindex;
 	fdb_info->vni = vni;
+	fdb_info->offloaded = rdst->offloaded;
 	ether_addr_copy(fdb_info->eth_addr, mac);
 
 out:
@@ -589,6 +593,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 
 	rd->remote_ip = *ip;
 	rd->remote_port = port;
+	rd->offloaded = false;
 	rd->remote_vni = vni;
 	rd->remote_ifindex = ifindex;
 
@@ -3817,6 +3822,51 @@ static struct notifier_block vxlan_notifier_block __read_mostly = {
 	.notifier_call = vxlan_netdevice_event,
 };
 
+static void
+vxlan_fdb_offloaded_set(struct net_device *dev,
+			struct switchdev_notifier_vxlan_fdb_info *fdb_info)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_rdst *rdst;
+	struct vxlan_fdb *f;
+
+	spin_lock_bh(&vxlan->hash_lock);
+
+	f = vxlan_find_mac(vxlan, fdb_info->eth_addr, fdb_info->vni);
+	if (!f)
+		goto out;
+
+	rdst = vxlan_fdb_find_rdst(f, &fdb_info->remote_ip,
+				   fdb_info->remote_port,
+				   fdb_info->remote_vni,
+				   fdb_info->remote_ifindex);
+	if (!rdst)
+		goto out;
+
+	rdst->offloaded = fdb_info->offloaded;
+
+out:
+	spin_unlock_bh(&vxlan->hash_lock);
+}
+
+static int vxlan_switchdev_event(struct notifier_block *unused,
+				 unsigned long event, void *ptr)
+{
+	struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
+
+	switch (event) {
+	case SWITCHDEV_VXLAN_FDB_OFFLOADED:
+		vxlan_fdb_offloaded_set(dev, ptr);
+		break;
+	}
+
+	return 0;
+}
+
+static struct notifier_block vxlan_switchdev_notifier_block __read_mostly = {
+	.notifier_call = vxlan_switchdev_event,
+};
+
 static __net_init int vxlan_init_net(struct net *net)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -3890,11 +3940,17 @@ static int __init vxlan_init_module(void)
 	if (rc)
 		goto out2;
 
-	rc = rtnl_link_register(&vxlan_link_ops);
+	rc = register_switchdev_notifier(&vxlan_switchdev_notifier_block);
 	if (rc)
 		goto out3;
 
+	rc = rtnl_link_register(&vxlan_link_ops);
+	if (rc)
+		goto out4;
+
 	return 0;
+out4:
+	unregister_switchdev_notifier(&vxlan_switchdev_notifier_block);
 out3:
 	unregister_netdevice_notifier(&vxlan_notifier_block);
 out2:
@@ -3907,6 +3963,7 @@ late_initcall(vxlan_init_module);
 static void __exit vxlan_cleanup_module(void)
 {
 	rtnl_link_unregister(&vxlan_link_ops);
+	unregister_switchdev_notifier(&vxlan_switchdev_notifier_block);
 	unregister_netdevice_notifier(&vxlan_notifier_block);
 	unregister_pernet_subsys(&vxlan_net_ops);
 	/* rcu_barrier() is called by netns */
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 47199a11c586..b040f82351ba 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -148,6 +148,7 @@ enum switchdev_notifier_type {
 
 	SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE,
 	SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE,
+	SWITCHDEV_VXLAN_FDB_OFFLOADED,
 };
 
 struct switchdev_notifier_info {
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 1828d686ac4f..03431c148e16 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -192,6 +192,7 @@ union vxlan_addr {
 struct vxlan_rdst {
 	union vxlan_addr	 remote_ip;
 	__be16			 remote_port;
+	u8			 offloaded:1;
 	__be32			 remote_vni;
 	u32			 remote_ifindex;
 	struct list_head	 list;
@@ -418,6 +419,7 @@ struct switchdev_notifier_vxlan_fdb_info {
 	u32 remote_ifindex;
 	u8 eth_addr[ETH_ALEN];
 	__be32 vni;
+	bool offloaded;
 };
 
 #if IS_ENABLED(CONFIG_VXLAN)
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 08/18] mlxsw: spectrum_nve: Implement VxLAN operations
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

The common NVE core expects each encapsulation type to implement a
certain set of operations that are specific to this type and the
currently used ASIC. These operations include things such as the ability
to determine whether a certain NVE configuration can be offloaded and
ASIC-specific initialization for this type.

Implement these operations for VxLAN on the Spectrum ASIC. Spectrum-2
support will be added by a future patchset.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 .../mellanox/mlxsw/spectrum_nve_vxlan.c       | 190 +++++++++++++++++-
 1 file changed, 188 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index 4abc4d8a1e68..d21c7be5b1c9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -3,30 +3,216 @@
 
 #include <linux/netdevice.h>
 #include <linux/netlink.h>
+#include <linux/random.h>
+#include <net/vxlan.h>
 
+#include "reg.h"
 #include "spectrum_nve.h"
 
+/* Eth (18B) | IPv6 (40B) | UDP (8B) | VxLAN (8B) | Eth (14B) | IPv6 (40B)
+ *
+ * In the worst case - where we have a VLAN tag on the outer Ethernet
+ * header and IPv6 in overlay and underlay - we need to parse 128 bytes
+ */
+#define MLXSW_SP_NVE_VXLAN_PARSING_DEPTH 128
+#define MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH 96
+
+#define MLXSW_SP_NVE_VXLAN_SUPPORTED_FLAGS	VXLAN_F_UDP_ZERO_CSUM_TX
+
 static bool mlxsw_sp1_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
 					    const struct net_device *dev,
 					    struct netlink_ext_ack *extack)
 {
-	return false;
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_config *cfg = &vxlan->cfg;
+
+	if (cfg->saddr.sa.sa_family != AF_INET) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Only IPv4 underlay is supported");
+		return false;
+	}
+
+	if (vxlan_addr_multicast(&cfg->remote_ip)) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Multicast destination IP is not supported");
+		return false;
+	}
+
+	if (vxlan_addr_any(&cfg->saddr)) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Source address must be specified");
+		return false;
+	}
+
+	if (cfg->remote_ifindex) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Local interface is not supported");
+		return false;
+	}
+
+	if (cfg->port_min || cfg->port_max) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Only default UDP source port range is supported");
+		return false;
+	}
+
+	if (cfg->tos != 1) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: TOS must be configured to inherit");
+		return false;
+	}
+
+	if (cfg->flags & VXLAN_F_TTL_INHERIT) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: TTL must not be configured to inherit");
+		return false;
+	}
+
+	if (cfg->flags & VXLAN_F_LEARN) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Learning is not supported");
+		return false;
+	}
+
+	if (!(cfg->flags & VXLAN_F_UDP_ZERO_CSUM_TX)) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: UDP checksum is not supported");
+		return false;
+	}
+
+	if (cfg->flags & ~MLXSW_SP_NVE_VXLAN_SUPPORTED_FLAGS) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Unsupported flag");
+		return false;
+	}
+
+	if (cfg->ttl == 0) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: TTL must not be configured to 0");
+		return false;
+	}
+
+	if (cfg->label != 0) {
+		NL_SET_ERR_MSG_MOD(extack, "VxLAN: Flow label must be configured to 0");
+		return false;
+	}
+
+	return true;
 }
 
 static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve,
 				      const struct net_device *dev,
 				      struct mlxsw_sp_nve_config *config)
 {
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_config *cfg = &vxlan->cfg;
+
+	config->type = MLXSW_SP_NVE_TYPE_VXLAN;
+	config->ttl = cfg->ttl;
+	config->flowlabel = cfg->label;
+	config->learning_en = cfg->flags & VXLAN_F_LEARN ? 1 : 0;
+	config->ul_tb_id = RT_TABLE_MAIN;
+	config->ul_proto = MLXSW_SP_L3_PROTO_IPV4;
+	config->ul_sip.addr4 = cfg->saddr.sin.sin_addr.s_addr;
+	config->udp_dport = cfg->dst_port;
+}
+
+static int mlxsw_sp_nve_parsing_set(struct mlxsw_sp *mlxsw_sp,
+				    unsigned int parsing_depth,
+				    __be16 udp_dport)
+{
+	char mprs_pl[MLXSW_REG_MPRS_LEN];
+
+	mlxsw_reg_mprs_pack(mprs_pl, parsing_depth, be16_to_cpu(udp_dport));
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mprs), mprs_pl);
+}
+
+static int
+mlxsw_sp1_nve_vxlan_config_set(struct mlxsw_sp *mlxsw_sp,
+			       const struct mlxsw_sp_nve_config *config)
+{
+	char tngcr_pl[MLXSW_REG_TNGCR_LEN];
+	u16 ul_vr_id;
+	u8 udp_sport;
+	int err;
+
+	err = mlxsw_sp_router_tb_id_vr_id(mlxsw_sp, config->ul_tb_id,
+					  &ul_vr_id);
+	if (err)
+		return err;
+
+	mlxsw_reg_tngcr_pack(tngcr_pl, MLXSW_REG_TNGCR_TYPE_VXLAN, true,
+			     config->ttl);
+	/* VxLAN driver's default UDP source port range is 32768 (0x8000)
+	 * to 60999 (0xee47). Set the upper 8 bits of the UDP source port
+	 * to a random number between 0x80 and 0xee
+	 */
+	get_random_bytes(&udp_sport, sizeof(udp_sport));
+	udp_sport = (udp_sport % (0xee - 0x80 + 1)) + 0x80;
+	mlxsw_reg_tngcr_nve_udp_sport_prefix_set(tngcr_pl, udp_sport);
+	mlxsw_reg_tngcr_learn_enable_set(tngcr_pl, config->learning_en);
+	mlxsw_reg_tngcr_underlay_virtual_router_set(tngcr_pl, ul_vr_id);
+	mlxsw_reg_tngcr_usipv4_set(tngcr_pl, be32_to_cpu(config->ul_sip.addr4));
+
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tngcr), tngcr_pl);
+}
+
+static void mlxsw_sp1_nve_vxlan_config_clear(struct mlxsw_sp *mlxsw_sp)
+{
+	char tngcr_pl[MLXSW_REG_TNGCR_LEN];
+
+	mlxsw_reg_tngcr_pack(tngcr_pl, MLXSW_REG_TNGCR_TYPE_VXLAN, false, 0);
+
+	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tngcr), tngcr_pl);
+}
+
+static int mlxsw_sp1_nve_vxlan_rtdp_set(struct mlxsw_sp *mlxsw_sp,
+					unsigned int tunnel_index)
+{
+	char rtdp_pl[MLXSW_REG_RTDP_LEN];
+
+	mlxsw_reg_rtdp_pack(rtdp_pl, MLXSW_REG_RTDP_TYPE_NVE, tunnel_index);
+
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rtdp), rtdp_pl);
 }
 
 static int mlxsw_sp1_nve_vxlan_init(struct mlxsw_sp_nve *nve,
 				    const struct mlxsw_sp_nve_config *config)
 {
-	return -EOPNOTSUPP;
+	struct mlxsw_sp *mlxsw_sp = nve->mlxsw_sp;
+	int err;
+
+	err = mlxsw_sp_nve_parsing_set(mlxsw_sp,
+				       MLXSW_SP_NVE_VXLAN_PARSING_DEPTH,
+				       config->udp_dport);
+	if (err)
+		return err;
+
+	err = mlxsw_sp1_nve_vxlan_config_set(mlxsw_sp, config);
+	if (err)
+		goto err_config_set;
+
+	err = mlxsw_sp1_nve_vxlan_rtdp_set(mlxsw_sp, nve->tunnel_index);
+	if (err)
+		goto err_rtdp_set;
+
+	err = mlxsw_sp_router_nve_promote_decap(mlxsw_sp, config->ul_tb_id,
+						config->ul_proto,
+						&config->ul_sip,
+						nve->tunnel_index);
+	if (err)
+		goto err_promote_decap;
+
+	return 0;
+
+err_promote_decap:
+err_rtdp_set:
+	mlxsw_sp1_nve_vxlan_config_clear(mlxsw_sp);
+err_config_set:
+	mlxsw_sp_nve_parsing_set(mlxsw_sp, MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH,
+				 config->udp_dport);
+	return err;
 }
 
 static void mlxsw_sp1_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
 {
+	struct mlxsw_sp_nve_config *config = &nve->config;
+	struct mlxsw_sp *mlxsw_sp = nve->mlxsw_sp;
+
+	mlxsw_sp_router_nve_demote_decap(mlxsw_sp, config->ul_tb_id,
+					 config->ul_proto, &config->ul_sip);
+	mlxsw_sp1_nve_vxlan_config_clear(mlxsw_sp);
+	mlxsw_sp_nve_parsing_set(mlxsw_sp, MLXSW_SP_NVE_DEFAULT_PARSING_DEPTH,
+				 config->udp_dport);
 }
 
 const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops = {
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 13/18] vxlan: Add vxlan_fdb_find_uc() for FDB querying
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

A switchdev-capable driver that is aware of VXLAN may need to query
VXLAN FDB. In the particular case of mlxsw, this functionality is
limited to querying UC FDBs. Those being easier to deal with than the
general case of RDST chain traversal, introduce an interface to query
specifically UC FDBs: vxlan_fdb_find_uc().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/vxlan.c | 40 ++++++++++++++++++++++++++++++++++++++++
 include/net/vxlan.h | 12 ++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index de5caa2f6aac..410eee23c50c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -504,6 +504,46 @@ static struct vxlan_rdst *vxlan_fdb_find_rdst(struct vxlan_fdb *f,
 	return NULL;
 }
 
+int vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
+		      struct switchdev_notifier_vxlan_fdb_info *fdb_info)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	u8 eth_addr[ETH_ALEN + 2] = { 0 };
+	struct vxlan_rdst *rdst;
+	struct vxlan_fdb *f;
+	int rc = 0;
+
+	if (is_multicast_ether_addr(mac) ||
+	    is_zero_ether_addr(mac))
+		return -EINVAL;
+
+	ether_addr_copy(eth_addr, mac);
+
+	rcu_read_lock();
+
+	f = __vxlan_find_mac(vxlan, eth_addr, vni);
+	if (!f) {
+		rc = -ENOENT;
+		goto out;
+	}
+
+	rdst = first_remote_rcu(f);
+
+	memset(fdb_info, 0, sizeof(*fdb_info));
+	fdb_info->info.dev = dev;
+	fdb_info->remote_ip = rdst->remote_ip;
+	fdb_info->remote_port = rdst->remote_port;
+	fdb_info->remote_vni = rdst->remote_vni;
+	fdb_info->remote_ifindex = rdst->remote_ifindex;
+	fdb_info->vni = vni;
+	ether_addr_copy(fdb_info->eth_addr, mac);
+
+out:
+	rcu_read_unlock();
+	return rc;
+}
+EXPORT_SYMBOL_GPL(vxlan_fdb_find_uc);
+
 /* Replace destination of unicast mac */
 static int vxlan_fdb_replace(struct vxlan_fdb *f,
 			     union vxlan_addr *ip, __be16 port, __be32 vni,
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3f00877f5edf..1828d686ac4f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -420,4 +420,16 @@ struct switchdev_notifier_vxlan_fdb_info {
 	__be32 vni;
 };
 
+#if IS_ENABLED(CONFIG_VXLAN)
+int vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
+		      struct switchdev_notifier_vxlan_fdb_info *fdb_info);
+#else
+static inline int
+vxlan_fdb_find_uc(struct net_device *dev, const u8 *mac, __be32 vni,
+		  struct switchdev_notifier_vxlan_fdb_info *fdb_info)
+{
+	return -ENOENT;
+}
+#endif
+
 #endif
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 12/18] vxlan: Add switchdev notifications
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

When offloading VXLAN devices, drivers need to know about events in
VXLAN FDB database. Since VXLAN models a bridge, it is natural to
distribute the VXLAN FDB notifications using the pre-existing switchdev
notification mechanism.

To that end, introduce two new notification types:
SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE and SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE.
Introduce a new function, vxlan_fdb_switchdev_call_notifiers() to send
the new notifier types, and a struct switchdev_notifier_vxlan_fdb_info
to communicate the details of the FDB entry under consideration.

Invoke the new function from vxlan_fdb_notify().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/vxlan.c     | 46 +++++++++++++++++++++++++++++++++++++++--
 include/net/switchdev.h |  3 +++
 include/net/vxlan.h     | 11 ++++++++++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 0fc2b1d82d4c..de5caa2f6aac 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -327,8 +327,8 @@ static inline size_t vxlan_nlmsg_size(void)
 		+ nla_total_size(sizeof(struct nda_cacheinfo));
 }
 
-static void vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
-			     struct vxlan_rdst *rd, int type)
+static void __vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
+			       struct vxlan_rdst *rd, int type)
 {
 	struct net *net = dev_net(vxlan->dev);
 	struct sk_buff *skb;
@@ -353,6 +353,48 @@ static void vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
 		rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
+static void vxlan_fdb_switchdev_call_notifiers(struct vxlan_dev *vxlan,
+					       struct vxlan_fdb *fdb,
+					       struct vxlan_rdst *rd,
+					       bool adding)
+{
+	struct switchdev_notifier_vxlan_fdb_info info;
+	enum switchdev_notifier_type notifier_type;
+
+	if (WARN_ON(!rd))
+		return;
+
+	notifier_type = adding ? SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE
+			       : SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE;
+
+	info = (struct switchdev_notifier_vxlan_fdb_info){
+		.remote_ip = rd->remote_ip,
+		.remote_port = rd->remote_port,
+		.remote_vni = rd->remote_vni,
+		.remote_ifindex = rd->remote_ifindex,
+		.vni = fdb->vni,
+	};
+	memcpy(info.eth_addr, fdb->eth_addr, ETH_ALEN);
+
+	call_switchdev_notifiers(notifier_type, vxlan->dev,
+				 &info.info);
+}
+
+static void vxlan_fdb_notify(struct vxlan_dev *vxlan, struct vxlan_fdb *fdb,
+			     struct vxlan_rdst *rd, int type)
+{
+	switch (type) {
+	case RTM_NEWNEIGH:
+		vxlan_fdb_switchdev_call_notifiers(vxlan, fdb, rd, true);
+		break;
+	case RTM_DELNEIGH:
+		vxlan_fdb_switchdev_call_notifiers(vxlan, fdb, rd, false);
+		break;
+	}
+
+	__vxlan_fdb_notify(vxlan, fdb, rd, type);
+}
+
 static void vxlan_ip_miss(struct net_device *dev, union vxlan_addr *ipa)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index d574ce63bf22..47199a11c586 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -145,6 +145,9 @@ enum switchdev_notifier_type {
 	SWITCHDEV_FDB_ADD_TO_DEVICE,
 	SWITCHDEV_FDB_DEL_TO_DEVICE,
 	SWITCHDEV_FDB_OFFLOADED,
+
+	SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE,
+	SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE,
 };
 
 struct switchdev_notifier_info {
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 95227fa925e8..3f00877f5edf 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -6,6 +6,7 @@
 #include <net/udp_tunnel.h>
 #include <net/dst_metadata.h>
 #include <net/rtnetlink.h>
+#include <net/switchdev.h>
 
 /* VXLAN protocol (RFC 7348) header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -409,4 +410,14 @@ static inline bool netif_is_vxlan(const struct net_device *dev)
 	       !strcmp(dev->rtnl_link_ops->kind, "vxlan");
 }
 
+struct switchdev_notifier_vxlan_fdb_info {
+	struct switchdev_notifier_info info; /* must be first */
+	union vxlan_addr remote_ip;
+	__be16 remote_port;
+	__be32 remote_vni;
+	u32 remote_ifindex;
+	u8 eth_addr[ETH_ALEN];
+	__be32 vni;
+};
+
 #endif
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 07/18] mlxsw: spectrum_nve: Implement common NVE core
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

The Spectrum ASIC supports different types of NVE encapsulations (e.g.,
VxLAN, NVGRE) with more types to be supported by future ASICs.

Despite being different, all these encapsulations share some common
functionality such as the enablement of NVE encapsulation on a given
filtering identifier (FID) and the addition of remote VTEPs to the
linked-list of VTEPs that traffic should be flooded to.

Implement this common core and allow different ASICs to register
different operations for different encapsulation types.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile  |   3 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  21 +
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  41 +
 .../ethernet/mellanox/mlxsw/spectrum_nve.c    | 982 ++++++++++++++++++
 .../ethernet/mellanox/mlxsw/spectrum_nve.h    |  49 +
 .../mellanox/mlxsw/spectrum_nve_vxlan.c       |  63 ++
 6 files changed, 1158 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 68fa44a41485..1f77e97e2d7a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -27,7 +27,8 @@ mlxsw_spectrum-objs		:= spectrum.o spectrum_buffers.o \
 				   spectrum_acl_flex_keys.o \
 				   spectrum1_mr_tcam.o spectrum2_mr_tcam.o \
 				   spectrum_mr_tcam.o spectrum_mr.o \
-				   spectrum_qdisc.o spectrum_span.o
+				   spectrum_qdisc.o spectrum_span.o \
+				   spectrum_nve.o spectrum_nve_vxlan.o
 mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)	+= spectrum_dcb.o
 mlxsw_spectrum-$(CONFIG_NET_DEVLINK) += spectrum_dpipe.o
 obj-$(CONFIG_MLXSW_MINIMAL)	+= mlxsw_minimal.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index ed7e4c4e7403..68079b16adfa 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2994,6 +2994,13 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 		goto err_port_qdiscs_init;
 	}
 
+	err = mlxsw_sp_port_nve_init(mlxsw_sp_port);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to initialize NVE\n",
+			mlxsw_sp_port->local_port);
+		goto err_port_nve_init;
+	}
+
 	mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_get(mlxsw_sp_port, 1);
 	if (IS_ERR(mlxsw_sp_port_vlan)) {
 		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to create VID 1\n",
@@ -3022,6 +3029,8 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 	mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
 	mlxsw_sp_port_vlan_put(mlxsw_sp_port_vlan);
 err_port_vlan_get:
+	mlxsw_sp_port_nve_fini(mlxsw_sp_port);
+err_port_nve_init:
 	mlxsw_sp_tc_qdisc_fini(mlxsw_sp_port);
 err_port_qdiscs_init:
 	mlxsw_sp_port_fids_fini(mlxsw_sp_port);
@@ -3061,6 +3070,7 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	mlxsw_sp->ports[local_port] = NULL;
 	mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
 	mlxsw_sp_port_vlan_flush(mlxsw_sp_port);
+	mlxsw_sp_port_nve_fini(mlxsw_sp_port);
 	mlxsw_sp_tc_qdisc_fini(mlxsw_sp_port);
 	mlxsw_sp_port_fids_fini(mlxsw_sp_port);
 	mlxsw_sp_port_dcb_fini(mlxsw_sp_port);
@@ -3795,6 +3805,12 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 		goto err_afa_init;
 	}
 
+	err = mlxsw_sp_nve_init(mlxsw_sp);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize NVE\n");
+		goto err_nve_init;
+	}
+
 	err = mlxsw_sp_router_init(mlxsw_sp);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize router\n");
@@ -3841,6 +3857,8 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 err_netdev_notifier:
 	mlxsw_sp_router_fini(mlxsw_sp);
 err_router_init:
+	mlxsw_sp_nve_fini(mlxsw_sp);
+err_nve_init:
 	mlxsw_sp_afa_fini(mlxsw_sp);
 err_afa_init:
 	mlxsw_sp_counter_pool_fini(mlxsw_sp);
@@ -3873,6 +3891,7 @@ static int mlxsw_sp1_init(struct mlxsw_core *mlxsw_core,
 	mlxsw_sp->afk_ops = &mlxsw_sp1_afk_ops;
 	mlxsw_sp->mr_tcam_ops = &mlxsw_sp1_mr_tcam_ops;
 	mlxsw_sp->acl_tcam_ops = &mlxsw_sp1_acl_tcam_ops;
+	mlxsw_sp->nve_ops_arr = mlxsw_sp1_nve_ops_arr;
 
 	return mlxsw_sp_init(mlxsw_core, mlxsw_bus_info);
 }
@@ -3887,6 +3906,7 @@ static int mlxsw_sp2_init(struct mlxsw_core *mlxsw_core,
 	mlxsw_sp->afk_ops = &mlxsw_sp2_afk_ops;
 	mlxsw_sp->mr_tcam_ops = &mlxsw_sp2_mr_tcam_ops;
 	mlxsw_sp->acl_tcam_ops = &mlxsw_sp2_acl_tcam_ops;
+	mlxsw_sp->nve_ops_arr = mlxsw_sp2_nve_ops_arr;
 
 	return mlxsw_sp_init(mlxsw_core, mlxsw_bus_info);
 }
@@ -3900,6 +3920,7 @@ static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
 	mlxsw_sp_acl_fini(mlxsw_sp);
 	unregister_netdevice_notifier(&mlxsw_sp->netdevice_nb);
 	mlxsw_sp_router_fini(mlxsw_sp);
+	mlxsw_sp_nve_fini(mlxsw_sp);
 	mlxsw_sp_afa_fini(mlxsw_sp);
 	mlxsw_sp_counter_pool_fini(mlxsw_sp);
 	mlxsw_sp_switchdev_fini(mlxsw_sp);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index d0a73935e8bb..2d5eca78576a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -55,6 +55,8 @@ enum mlxsw_sp_resource_id {
 struct mlxsw_sp_port;
 struct mlxsw_sp_rif;
 struct mlxsw_sp_span_entry;
+enum mlxsw_sp_l3proto;
+union mlxsw_sp_l3addr;
 
 struct mlxsw_sp_upper {
 	struct net_device *dev;
@@ -113,9 +115,11 @@ struct mlxsw_sp_acl;
 struct mlxsw_sp_counter_pool;
 struct mlxsw_sp_fid_core;
 struct mlxsw_sp_kvdl;
+struct mlxsw_sp_nve;
 struct mlxsw_sp_kvdl_ops;
 struct mlxsw_sp_mr_tcam_ops;
 struct mlxsw_sp_acl_tcam_ops;
+struct mlxsw_sp_nve_ops;
 
 struct mlxsw_sp {
 	struct mlxsw_sp_port **ports;
@@ -132,6 +136,7 @@ struct mlxsw_sp {
 	struct mlxsw_sp_acl *acl;
 	struct mlxsw_sp_fid_core *fid_core;
 	struct mlxsw_sp_kvdl *kvdl;
+	struct mlxsw_sp_nve *nve;
 	struct notifier_block netdevice_nb;
 
 	struct mlxsw_sp_counter_pool *counter_pool;
@@ -146,6 +151,7 @@ struct mlxsw_sp {
 	const struct mlxsw_afk_ops *afk_ops;
 	const struct mlxsw_sp_mr_tcam_ops *mr_tcam_ops;
 	const struct mlxsw_sp_acl_tcam_ops *acl_tcam_ops;
+	const struct mlxsw_sp_nve_ops **nve_ops_arr;
 };
 
 static inline struct mlxsw_sp_upper *
@@ -763,4 +769,39 @@ extern const struct mlxsw_sp_mr_tcam_ops mlxsw_sp1_mr_tcam_ops;
 /* spectrum2_mr_tcam.c */
 extern const struct mlxsw_sp_mr_tcam_ops mlxsw_sp2_mr_tcam_ops;
 
+/* spectrum_nve.c */
+enum mlxsw_sp_nve_type {
+	MLXSW_SP_NVE_TYPE_VXLAN,
+};
+
+struct mlxsw_sp_nve_params {
+	enum mlxsw_sp_nve_type type;
+	__be32 vni;
+	const struct net_device *dev;
+};
+
+extern const struct mlxsw_sp_nve_ops *mlxsw_sp1_nve_ops_arr[];
+extern const struct mlxsw_sp_nve_ops *mlxsw_sp2_nve_ops_arr[];
+
+int mlxsw_sp_nve_flood_ip_add(struct mlxsw_sp *mlxsw_sp,
+			      struct mlxsw_sp_fid *fid,
+			      enum mlxsw_sp_l3proto proto,
+			      union mlxsw_sp_l3addr *addr);
+void mlxsw_sp_nve_flood_ip_del(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_fid *fid,
+			       enum mlxsw_sp_l3proto proto,
+			       union mlxsw_sp_l3addr *addr);
+u32 mlxsw_sp_nve_decap_tunnel_index_get(const struct mlxsw_sp *mlxsw_sp);
+bool mlxsw_sp_nve_ipv4_route_is_decap(const struct mlxsw_sp *mlxsw_sp,
+				      u32 tb_id, __be32 addr);
+int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid,
+			    struct mlxsw_sp_nve_params *params,
+			    struct netlink_ext_ack *extack);
+void mlxsw_sp_nve_fid_disable(struct mlxsw_sp *mlxsw_sp,
+			      struct mlxsw_sp_fid *fid);
+int mlxsw_sp_port_nve_init(struct mlxsw_sp_port *mlxsw_sp_port);
+void mlxsw_sp_port_nve_fini(struct mlxsw_sp_port *mlxsw_sp_port);
+int mlxsw_sp_nve_init(struct mlxsw_sp *mlxsw_sp);
+void mlxsw_sp_nve_fini(struct mlxsw_sp *mlxsw_sp);
+
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
new file mode 100644
index 000000000000..ad06d9969bc1
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -0,0 +1,982 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/* Copyright (c) 2018 Mellanox Technologies. All rights reserved */
+
+#include <linux/err.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/netlink.h>
+#include <linux/rtnetlink.h>
+#include <linux/slab.h>
+#include <net/inet_ecn.h>
+#include <net/ipv6.h>
+
+#include "reg.h"
+#include "spectrum.h"
+#include "spectrum_nve.h"
+
+const struct mlxsw_sp_nve_ops *mlxsw_sp1_nve_ops_arr[] = {
+	[MLXSW_SP_NVE_TYPE_VXLAN]	= &mlxsw_sp1_nve_vxlan_ops,
+};
+
+const struct mlxsw_sp_nve_ops *mlxsw_sp2_nve_ops_arr[] = {
+	[MLXSW_SP_NVE_TYPE_VXLAN]	= &mlxsw_sp2_nve_vxlan_ops,
+};
+
+struct mlxsw_sp_nve_mc_entry;
+struct mlxsw_sp_nve_mc_record;
+struct mlxsw_sp_nve_mc_list;
+
+struct mlxsw_sp_nve_mc_record_ops {
+	enum mlxsw_reg_tnumt_record_type type;
+	int (*entry_add)(struct mlxsw_sp_nve_mc_record *mc_record,
+			 struct mlxsw_sp_nve_mc_entry *mc_entry,
+			 const union mlxsw_sp_l3addr *addr);
+	void (*entry_del)(const struct mlxsw_sp_nve_mc_record *mc_record,
+			  const struct mlxsw_sp_nve_mc_entry *mc_entry);
+	void (*entry_set)(const struct mlxsw_sp_nve_mc_record *mc_record,
+			  const struct mlxsw_sp_nve_mc_entry *mc_entry,
+			  char *tnumt_pl, unsigned int entry_index);
+	bool (*entry_compare)(const struct mlxsw_sp_nve_mc_record *mc_record,
+			      const struct mlxsw_sp_nve_mc_entry *mc_entry,
+			      const union mlxsw_sp_l3addr *addr);
+};
+
+struct mlxsw_sp_nve_mc_list_key {
+	u16 fid_index;
+};
+
+struct mlxsw_sp_nve_mc_ipv6_entry {
+	struct in6_addr addr6;
+	u32 addr6_kvdl_index;
+};
+
+struct mlxsw_sp_nve_mc_entry {
+	union {
+		__be32 addr4;
+		struct mlxsw_sp_nve_mc_ipv6_entry ipv6_entry;
+	};
+	u8 valid:1;
+};
+
+struct mlxsw_sp_nve_mc_record {
+	struct list_head list;
+	enum mlxsw_sp_l3proto proto;
+	unsigned int num_entries;
+	struct mlxsw_sp *mlxsw_sp;
+	struct mlxsw_sp_nve_mc_list *mc_list;
+	const struct mlxsw_sp_nve_mc_record_ops *ops;
+	u32 kvdl_index;
+	struct mlxsw_sp_nve_mc_entry entries[0];
+};
+
+struct mlxsw_sp_nve_mc_list {
+	struct list_head records_list;
+	struct rhash_head ht_node;
+	struct mlxsw_sp_nve_mc_list_key key;
+};
+
+static const struct rhashtable_params mlxsw_sp_nve_mc_list_ht_params = {
+	.key_len = sizeof(struct mlxsw_sp_nve_mc_list_key),
+	.key_offset = offsetof(struct mlxsw_sp_nve_mc_list, key),
+	.head_offset = offsetof(struct mlxsw_sp_nve_mc_list, ht_node),
+};
+
+static int
+mlxsw_sp_nve_mc_record_ipv4_entry_add(struct mlxsw_sp_nve_mc_record *mc_record,
+				      struct mlxsw_sp_nve_mc_entry *mc_entry,
+				      const union mlxsw_sp_l3addr *addr)
+{
+	mc_entry->addr4 = addr->addr4;
+
+	return 0;
+}
+
+static void
+mlxsw_sp_nve_mc_record_ipv4_entry_del(const struct mlxsw_sp_nve_mc_record *mc_record,
+				      const struct mlxsw_sp_nve_mc_entry *mc_entry)
+{
+}
+
+static void
+mlxsw_sp_nve_mc_record_ipv4_entry_set(const struct mlxsw_sp_nve_mc_record *mc_record,
+				      const struct mlxsw_sp_nve_mc_entry *mc_entry,
+				      char *tnumt_pl, unsigned int entry_index)
+{
+	u32 udip = be32_to_cpu(mc_entry->addr4);
+
+	mlxsw_reg_tnumt_udip_set(tnumt_pl, entry_index, udip);
+}
+
+static bool
+mlxsw_sp_nve_mc_record_ipv4_entry_compare(const struct mlxsw_sp_nve_mc_record *mc_record,
+					  const struct mlxsw_sp_nve_mc_entry *mc_entry,
+					  const union mlxsw_sp_l3addr *addr)
+{
+	return mc_entry->addr4 == addr->addr4;
+}
+
+static const struct mlxsw_sp_nve_mc_record_ops
+mlxsw_sp_nve_mc_record_ipv4_ops = {
+	.type		= MLXSW_REG_TNUMT_RECORD_TYPE_IPV4,
+	.entry_add	= &mlxsw_sp_nve_mc_record_ipv4_entry_add,
+	.entry_del	= &mlxsw_sp_nve_mc_record_ipv4_entry_del,
+	.entry_set	= &mlxsw_sp_nve_mc_record_ipv4_entry_set,
+	.entry_compare	= &mlxsw_sp_nve_mc_record_ipv4_entry_compare,
+};
+
+static int
+mlxsw_sp_nve_mc_record_ipv6_entry_add(struct mlxsw_sp_nve_mc_record *mc_record,
+				      struct mlxsw_sp_nve_mc_entry *mc_entry,
+				      const union mlxsw_sp_l3addr *addr)
+{
+	WARN_ON(1);
+
+	return -EINVAL;
+}
+
+static void
+mlxsw_sp_nve_mc_record_ipv6_entry_del(const struct mlxsw_sp_nve_mc_record *mc_record,
+				      const struct mlxsw_sp_nve_mc_entry *mc_entry)
+{
+}
+
+static void
+mlxsw_sp_nve_mc_record_ipv6_entry_set(const struct mlxsw_sp_nve_mc_record *mc_record,
+				      const struct mlxsw_sp_nve_mc_entry *mc_entry,
+				      char *tnumt_pl, unsigned int entry_index)
+{
+	u32 udip_ptr = mc_entry->ipv6_entry.addr6_kvdl_index;
+
+	mlxsw_reg_tnumt_udip_ptr_set(tnumt_pl, entry_index, udip_ptr);
+}
+
+static bool
+mlxsw_sp_nve_mc_record_ipv6_entry_compare(const struct mlxsw_sp_nve_mc_record *mc_record,
+					  const struct mlxsw_sp_nve_mc_entry *mc_entry,
+					  const union mlxsw_sp_l3addr *addr)
+{
+	return ipv6_addr_equal(&mc_entry->ipv6_entry.addr6, &addr->addr6);
+}
+
+static const struct mlxsw_sp_nve_mc_record_ops
+mlxsw_sp_nve_mc_record_ipv6_ops = {
+	.type		= MLXSW_REG_TNUMT_RECORD_TYPE_IPV6,
+	.entry_add	= &mlxsw_sp_nve_mc_record_ipv6_entry_add,
+	.entry_del	= &mlxsw_sp_nve_mc_record_ipv6_entry_del,
+	.entry_set	= &mlxsw_sp_nve_mc_record_ipv6_entry_set,
+	.entry_compare	= &mlxsw_sp_nve_mc_record_ipv6_entry_compare,
+};
+
+static const struct mlxsw_sp_nve_mc_record_ops *
+mlxsw_sp_nve_mc_record_ops_arr[] = {
+	[MLXSW_SP_L3_PROTO_IPV4] = &mlxsw_sp_nve_mc_record_ipv4_ops,
+	[MLXSW_SP_L3_PROTO_IPV6] = &mlxsw_sp_nve_mc_record_ipv6_ops,
+};
+
+static struct mlxsw_sp_nve_mc_list *
+mlxsw_sp_nve_mc_list_find(struct mlxsw_sp *mlxsw_sp,
+			  const struct mlxsw_sp_nve_mc_list_key *key)
+{
+	struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+
+	return rhashtable_lookup_fast(&nve->mc_list_ht, key,
+				      mlxsw_sp_nve_mc_list_ht_params);
+}
+
+static struct mlxsw_sp_nve_mc_list *
+mlxsw_sp_nve_mc_list_create(struct mlxsw_sp *mlxsw_sp,
+			    const struct mlxsw_sp_nve_mc_list_key *key)
+{
+	struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+	struct mlxsw_sp_nve_mc_list *mc_list;
+	int err;
+
+	mc_list = kmalloc(sizeof(*mc_list), GFP_KERNEL);
+	if (!mc_list)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&mc_list->records_list);
+	mc_list->key = *key;
+
+	err = rhashtable_insert_fast(&nve->mc_list_ht, &mc_list->ht_node,
+				     mlxsw_sp_nve_mc_list_ht_params);
+	if (err)
+		goto err_rhashtable_insert;
+
+	return mc_list;
+
+err_rhashtable_insert:
+	kfree(mc_list);
+	return ERR_PTR(err);
+}
+
+static void mlxsw_sp_nve_mc_list_destroy(struct mlxsw_sp *mlxsw_sp,
+					 struct mlxsw_sp_nve_mc_list *mc_list)
+{
+	struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+
+	rhashtable_remove_fast(&nve->mc_list_ht, &mc_list->ht_node,
+			       mlxsw_sp_nve_mc_list_ht_params);
+	WARN_ON(!list_empty(&mc_list->records_list));
+	kfree(mc_list);
+}
+
+static struct mlxsw_sp_nve_mc_list *
+mlxsw_sp_nve_mc_list_get(struct mlxsw_sp *mlxsw_sp,
+			 const struct mlxsw_sp_nve_mc_list_key *key)
+{
+	struct mlxsw_sp_nve_mc_list *mc_list;
+
+	mc_list = mlxsw_sp_nve_mc_list_find(mlxsw_sp, key);
+	if (mc_list)
+		return mc_list;
+
+	return mlxsw_sp_nve_mc_list_create(mlxsw_sp, key);
+}
+
+static void
+mlxsw_sp_nve_mc_list_put(struct mlxsw_sp *mlxsw_sp,
+			 struct mlxsw_sp_nve_mc_list *mc_list)
+{
+	if (!list_empty(&mc_list->records_list))
+		return;
+	mlxsw_sp_nve_mc_list_destroy(mlxsw_sp, mc_list);
+}
+
+static struct mlxsw_sp_nve_mc_record *
+mlxsw_sp_nve_mc_record_create(struct mlxsw_sp *mlxsw_sp,
+			      struct mlxsw_sp_nve_mc_list *mc_list,
+			      enum mlxsw_sp_l3proto proto)
+{
+	unsigned int num_max_entries = mlxsw_sp->nve->num_max_mc_entries[proto];
+	struct mlxsw_sp_nve_mc_record *mc_record;
+	int err;
+
+	mc_record = kzalloc(sizeof(*mc_record) + num_max_entries *
+			    sizeof(struct mlxsw_sp_nve_mc_entry), GFP_KERNEL);
+	if (!mc_record)
+		return ERR_PTR(-ENOMEM);
+
+	err = mlxsw_sp_kvdl_alloc(mlxsw_sp, MLXSW_SP_KVDL_ENTRY_TYPE_TNUMT, 1,
+				  &mc_record->kvdl_index);
+	if (err)
+		goto err_kvdl_alloc;
+
+	mc_record->ops = mlxsw_sp_nve_mc_record_ops_arr[proto];
+	mc_record->mlxsw_sp = mlxsw_sp;
+	mc_record->mc_list = mc_list;
+	mc_record->proto = proto;
+	list_add_tail(&mc_record->list, &mc_list->records_list);
+
+	return mc_record;
+
+err_kvdl_alloc:
+	kfree(mc_record);
+	return ERR_PTR(err);
+}
+
+static void
+mlxsw_sp_nve_mc_record_destroy(struct mlxsw_sp_nve_mc_record *mc_record)
+{
+	struct mlxsw_sp *mlxsw_sp = mc_record->mlxsw_sp;
+
+	list_del(&mc_record->list);
+	mlxsw_sp_kvdl_free(mlxsw_sp, MLXSW_SP_KVDL_ENTRY_TYPE_TNUMT, 1,
+			   mc_record->kvdl_index);
+	WARN_ON(mc_record->num_entries);
+	kfree(mc_record);
+}
+
+static struct mlxsw_sp_nve_mc_record *
+mlxsw_sp_nve_mc_record_get(struct mlxsw_sp *mlxsw_sp,
+			   struct mlxsw_sp_nve_mc_list *mc_list,
+			   enum mlxsw_sp_l3proto proto)
+{
+	struct mlxsw_sp_nve_mc_record *mc_record;
+
+	list_for_each_entry_reverse(mc_record, &mc_list->records_list, list) {
+		unsigned int num_entries = mc_record->num_entries;
+		struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+
+		if (mc_record->proto == proto &&
+		    num_entries < nve->num_max_mc_entries[proto])
+			return mc_record;
+	}
+
+	return mlxsw_sp_nve_mc_record_create(mlxsw_sp, mc_list, proto);
+}
+
+static void
+mlxsw_sp_nve_mc_record_put(struct mlxsw_sp_nve_mc_record *mc_record)
+{
+	if (mc_record->num_entries != 0)
+		return;
+
+	mlxsw_sp_nve_mc_record_destroy(mc_record);
+}
+
+static struct mlxsw_sp_nve_mc_entry *
+mlxsw_sp_nve_mc_free_entry_find(struct mlxsw_sp_nve_mc_record *mc_record)
+{
+	struct mlxsw_sp_nve *nve = mc_record->mlxsw_sp->nve;
+	unsigned int num_max_entries;
+	int i;
+
+	num_max_entries = nve->num_max_mc_entries[mc_record->proto];
+	for (i = 0; i < num_max_entries; i++) {
+		if (mc_record->entries[i].valid)
+			continue;
+		return &mc_record->entries[i];
+	}
+
+	return NULL;
+}
+
+static int
+mlxsw_sp_nve_mc_record_refresh(struct mlxsw_sp_nve_mc_record *mc_record)
+{
+	enum mlxsw_reg_tnumt_record_type type = mc_record->ops->type;
+	struct mlxsw_sp_nve_mc_list *mc_list = mc_record->mc_list;
+	struct mlxsw_sp *mlxsw_sp = mc_record->mlxsw_sp;
+	char tnumt_pl[MLXSW_REG_TNUMT_LEN];
+	unsigned int num_max_entries;
+	unsigned int num_entries = 0;
+	u32 next_kvdl_index = 0;
+	bool next_valid = false;
+	int i;
+
+	if (!list_is_last(&mc_record->list, &mc_list->records_list)) {
+		struct mlxsw_sp_nve_mc_record *next_record;
+
+		next_record = list_next_entry(mc_record, list);
+		next_kvdl_index = next_record->kvdl_index;
+		next_valid = true;
+	}
+
+	mlxsw_reg_tnumt_pack(tnumt_pl, type, MLXSW_REG_TNUMT_TUNNEL_PORT_NVE,
+			     mc_record->kvdl_index, next_valid,
+			     next_kvdl_index, mc_record->num_entries);
+
+	num_max_entries = mlxsw_sp->nve->num_max_mc_entries[mc_record->proto];
+	for (i = 0; i < num_max_entries; i++) {
+		struct mlxsw_sp_nve_mc_entry *mc_entry;
+
+		mc_entry = &mc_record->entries[i];
+		if (!mc_entry->valid)
+			continue;
+		mc_record->ops->entry_set(mc_record, mc_entry, tnumt_pl,
+					  num_entries++);
+	}
+
+	WARN_ON(num_entries != mc_record->num_entries);
+
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tnumt), tnumt_pl);
+}
+
+static bool
+mlxsw_sp_nve_mc_record_is_first(struct mlxsw_sp_nve_mc_record *mc_record)
+{
+	struct mlxsw_sp_nve_mc_list *mc_list = mc_record->mc_list;
+	struct mlxsw_sp_nve_mc_record *first_record;
+
+	first_record = list_first_entry(&mc_list->records_list,
+					struct mlxsw_sp_nve_mc_record, list);
+
+	return mc_record == first_record;
+}
+
+static struct mlxsw_sp_nve_mc_entry *
+mlxsw_sp_nve_mc_entry_find(struct mlxsw_sp_nve_mc_record *mc_record,
+			   union mlxsw_sp_l3addr *addr)
+{
+	struct mlxsw_sp_nve *nve = mc_record->mlxsw_sp->nve;
+	unsigned int num_max_entries;
+	int i;
+
+	num_max_entries = nve->num_max_mc_entries[mc_record->proto];
+	for (i = 0; i < num_max_entries; i++) {
+		struct mlxsw_sp_nve_mc_entry *mc_entry;
+
+		mc_entry = &mc_record->entries[i];
+		if (!mc_entry->valid)
+			continue;
+		if (mc_record->ops->entry_compare(mc_record, mc_entry, addr))
+			return mc_entry;
+	}
+
+	return NULL;
+}
+
+static int
+mlxsw_sp_nve_mc_record_ip_add(struct mlxsw_sp_nve_mc_record *mc_record,
+			      union mlxsw_sp_l3addr *addr)
+{
+	struct mlxsw_sp_nve_mc_entry *mc_entry = NULL;
+	int err;
+
+	mc_entry = mlxsw_sp_nve_mc_free_entry_find(mc_record);
+	if (WARN_ON(!mc_entry))
+		return -EINVAL;
+
+	err = mc_record->ops->entry_add(mc_record, mc_entry, addr);
+	if (err)
+		return err;
+	mc_record->num_entries++;
+	mc_entry->valid = true;
+
+	err = mlxsw_sp_nve_mc_record_refresh(mc_record);
+	if (err)
+		goto err_record_refresh;
+
+	/* If this is a new record and not the first one, then we need to
+	 * update the next pointer of the previous entry
+	 */
+	if (mc_record->num_entries != 1 ||
+	    mlxsw_sp_nve_mc_record_is_first(mc_record))
+		return 0;
+
+	err = mlxsw_sp_nve_mc_record_refresh(list_prev_entry(mc_record, list));
+	if (err)
+		goto err_prev_record_refresh;
+
+	return 0;
+
+err_prev_record_refresh:
+err_record_refresh:
+	mc_entry->valid = false;
+	mc_record->num_entries--;
+	mc_record->ops->entry_del(mc_record, mc_entry);
+	return err;
+}
+
+static void
+mlxsw_sp_nve_mc_record_entry_del(struct mlxsw_sp_nve_mc_record *mc_record,
+				 struct mlxsw_sp_nve_mc_entry *mc_entry)
+{
+	struct mlxsw_sp_nve_mc_list *mc_list = mc_record->mc_list;
+
+	mc_entry->valid = false;
+	mc_record->num_entries--;
+
+	/* When the record continues to exist we only need to invalidate
+	 * the requested entry
+	 */
+	if (mc_record->num_entries != 0) {
+		mlxsw_sp_nve_mc_record_refresh(mc_record);
+		mc_record->ops->entry_del(mc_record, mc_entry);
+		return;
+	}
+
+	/* If the record needs to be deleted, but it is not the first,
+	 * then we need to make sure that the previous record no longer
+	 * points to it. Remove deleted record from the list to reflect
+	 * that and then re-add it at the end, so that it could be
+	 * properly removed by the record destruction code
+	 */
+	if (!mlxsw_sp_nve_mc_record_is_first(mc_record)) {
+		struct mlxsw_sp_nve_mc_record *prev_record;
+
+		prev_record = list_prev_entry(mc_record, list);
+		list_del(&mc_record->list);
+		mlxsw_sp_nve_mc_record_refresh(prev_record);
+		list_add_tail(&mc_record->list, &mc_list->records_list);
+		mc_record->ops->entry_del(mc_record, mc_entry);
+		return;
+	}
+
+	/* If the first record needs to be deleted, but the list is not
+	 * singular, then the second record needs to be written in the
+	 * first record's address, as this address is stored as a property
+	 * of the FID
+	 */
+	if (mlxsw_sp_nve_mc_record_is_first(mc_record) &&
+	    !list_is_singular(&mc_list->records_list)) {
+		struct mlxsw_sp_nve_mc_record *next_record;
+
+		next_record = list_next_entry(mc_record, list);
+		swap(mc_record->kvdl_index, next_record->kvdl_index);
+		mlxsw_sp_nve_mc_record_refresh(next_record);
+		mc_record->ops->entry_del(mc_record, mc_entry);
+		return;
+	}
+
+	/* This is the last case where the last remaining record needs to
+	 * be deleted. Simply delete the entry
+	 */
+	mc_record->ops->entry_del(mc_record, mc_entry);
+}
+
+static struct mlxsw_sp_nve_mc_record *
+mlxsw_sp_nve_mc_record_find(struct mlxsw_sp_nve_mc_list *mc_list,
+			    enum mlxsw_sp_l3proto proto,
+			    union mlxsw_sp_l3addr *addr,
+			    struct mlxsw_sp_nve_mc_entry **mc_entry)
+{
+	struct mlxsw_sp_nve_mc_record *mc_record;
+
+	list_for_each_entry(mc_record, &mc_list->records_list, list) {
+		if (mc_record->proto != proto)
+			continue;
+
+		*mc_entry = mlxsw_sp_nve_mc_entry_find(mc_record, addr);
+		if (*mc_entry)
+			return mc_record;
+	}
+
+	return NULL;
+}
+
+static int mlxsw_sp_nve_mc_list_ip_add(struct mlxsw_sp *mlxsw_sp,
+				       struct mlxsw_sp_nve_mc_list *mc_list,
+				       enum mlxsw_sp_l3proto proto,
+				       union mlxsw_sp_l3addr *addr)
+{
+	struct mlxsw_sp_nve_mc_record *mc_record;
+	int err;
+
+	mc_record = mlxsw_sp_nve_mc_record_get(mlxsw_sp, mc_list, proto);
+	if (IS_ERR(mc_record))
+		return PTR_ERR(mc_record);
+
+	err = mlxsw_sp_nve_mc_record_ip_add(mc_record, addr);
+	if (err)
+		goto err_ip_add;
+
+	return 0;
+
+err_ip_add:
+	mlxsw_sp_nve_mc_record_put(mc_record);
+	return err;
+}
+
+static void mlxsw_sp_nve_mc_list_ip_del(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_nve_mc_list *mc_list,
+					enum mlxsw_sp_l3proto proto,
+					union mlxsw_sp_l3addr *addr)
+{
+	struct mlxsw_sp_nve_mc_record *mc_record;
+	struct mlxsw_sp_nve_mc_entry *mc_entry;
+
+	mc_record = mlxsw_sp_nve_mc_record_find(mc_list, proto, addr,
+						&mc_entry);
+	if (WARN_ON(!mc_record))
+		return;
+
+	mlxsw_sp_nve_mc_record_entry_del(mc_record, mc_entry);
+	mlxsw_sp_nve_mc_record_put(mc_record);
+}
+
+static int
+mlxsw_sp_nve_fid_flood_index_set(struct mlxsw_sp_fid *fid,
+				 struct mlxsw_sp_nve_mc_list *mc_list)
+{
+	struct mlxsw_sp_nve_mc_record *mc_record;
+
+	/* The address of the first record in the list is a property of
+	 * the FID and we never change it. It only needs to be set when
+	 * a new list is created
+	 */
+	if (mlxsw_sp_fid_nve_flood_index_is_set(fid))
+		return 0;
+
+	mc_record = list_first_entry(&mc_list->records_list,
+				     struct mlxsw_sp_nve_mc_record, list);
+
+	return mlxsw_sp_fid_nve_flood_index_set(fid, mc_record->kvdl_index);
+}
+
+static void
+mlxsw_sp_nve_fid_flood_index_clear(struct mlxsw_sp_fid *fid,
+				   struct mlxsw_sp_nve_mc_list *mc_list)
+{
+	struct mlxsw_sp_nve_mc_record *mc_record;
+
+	/* The address of the first record needs to be invalidated only when
+	 * the last record is about to be removed
+	 */
+	if (!list_is_singular(&mc_list->records_list))
+		return;
+
+	mc_record = list_first_entry(&mc_list->records_list,
+				     struct mlxsw_sp_nve_mc_record, list);
+	if (mc_record->num_entries != 1)
+		return;
+
+	return mlxsw_sp_fid_nve_flood_index_clear(fid);
+}
+
+int mlxsw_sp_nve_flood_ip_add(struct mlxsw_sp *mlxsw_sp,
+			      struct mlxsw_sp_fid *fid,
+			      enum mlxsw_sp_l3proto proto,
+			      union mlxsw_sp_l3addr *addr)
+{
+	struct mlxsw_sp_nve_mc_list_key key = { 0 };
+	struct mlxsw_sp_nve_mc_list *mc_list;
+	int err;
+
+	key.fid_index = mlxsw_sp_fid_index(fid);
+	mc_list = mlxsw_sp_nve_mc_list_get(mlxsw_sp, &key);
+	if (IS_ERR(mc_list))
+		return PTR_ERR(mc_list);
+
+	err = mlxsw_sp_nve_mc_list_ip_add(mlxsw_sp, mc_list, proto, addr);
+	if (err)
+		goto err_add_ip;
+
+	err = mlxsw_sp_nve_fid_flood_index_set(fid, mc_list);
+	if (err)
+		goto err_fid_flood_index_set;
+
+	return 0;
+
+err_fid_flood_index_set:
+	mlxsw_sp_nve_mc_list_ip_del(mlxsw_sp, mc_list, proto, addr);
+err_add_ip:
+	mlxsw_sp_nve_mc_list_put(mlxsw_sp, mc_list);
+	return err;
+}
+
+void mlxsw_sp_nve_flood_ip_del(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_fid *fid,
+			       enum mlxsw_sp_l3proto proto,
+			       union mlxsw_sp_l3addr *addr)
+{
+	struct mlxsw_sp_nve_mc_list_key key = { 0 };
+	struct mlxsw_sp_nve_mc_list *mc_list;
+
+	key.fid_index = mlxsw_sp_fid_index(fid);
+	mc_list = mlxsw_sp_nve_mc_list_find(mlxsw_sp, &key);
+	if (WARN_ON(!mc_list))
+		return;
+
+	mlxsw_sp_nve_fid_flood_index_clear(fid, mc_list);
+	mlxsw_sp_nve_mc_list_ip_del(mlxsw_sp, mc_list, proto, addr);
+	mlxsw_sp_nve_mc_list_put(mlxsw_sp, mc_list);
+}
+
+static void
+mlxsw_sp_nve_mc_record_delete(struct mlxsw_sp_nve_mc_record *mc_record)
+{
+	struct mlxsw_sp_nve *nve = mc_record->mlxsw_sp->nve;
+	unsigned int num_max_entries;
+	int i;
+
+	num_max_entries = nve->num_max_mc_entries[mc_record->proto];
+	for (i = 0; i < num_max_entries; i++) {
+		struct mlxsw_sp_nve_mc_entry *mc_entry = &mc_record->entries[i];
+
+		if (!mc_entry->valid)
+			continue;
+		mlxsw_sp_nve_mc_record_entry_del(mc_record, mc_entry);
+	}
+
+	WARN_ON(mc_record->num_entries);
+	mlxsw_sp_nve_mc_record_put(mc_record);
+}
+
+static void mlxsw_sp_nve_flood_ip_flush(struct mlxsw_sp *mlxsw_sp,
+					struct mlxsw_sp_fid *fid)
+{
+	struct mlxsw_sp_nve_mc_record *mc_record, *tmp;
+	struct mlxsw_sp_nve_mc_list_key key = { 0 };
+	struct mlxsw_sp_nve_mc_list *mc_list;
+
+	if (!mlxsw_sp_fid_nve_flood_index_is_set(fid))
+		return;
+
+	mlxsw_sp_fid_nve_flood_index_clear(fid);
+
+	key.fid_index = mlxsw_sp_fid_index(fid);
+	mc_list = mlxsw_sp_nve_mc_list_find(mlxsw_sp, &key);
+	if (WARN_ON(!mc_list))
+		return;
+
+	list_for_each_entry_safe(mc_record, tmp, &mc_list->records_list, list)
+		mlxsw_sp_nve_mc_record_delete(mc_record);
+
+	WARN_ON(!list_empty(&mc_list->records_list));
+	mlxsw_sp_nve_mc_list_put(mlxsw_sp, mc_list);
+}
+
+u32 mlxsw_sp_nve_decap_tunnel_index_get(const struct mlxsw_sp *mlxsw_sp)
+{
+	WARN_ON(mlxsw_sp->nve->num_nve_tunnels == 0);
+
+	return mlxsw_sp->nve->tunnel_index;
+}
+
+bool mlxsw_sp_nve_ipv4_route_is_decap(const struct mlxsw_sp *mlxsw_sp,
+				      u32 tb_id, __be32 addr)
+{
+	struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+	struct mlxsw_sp_nve_config *config = &nve->config;
+
+	if (nve->num_nve_tunnels &&
+	    config->ul_proto == MLXSW_SP_L3_PROTO_IPV4 &&
+	    config->ul_sip.addr4 == addr && config->ul_tb_id == tb_id)
+		return true;
+
+	return false;
+}
+
+static int mlxsw_sp_nve_tunnel_init(struct mlxsw_sp *mlxsw_sp,
+				    struct mlxsw_sp_nve_config *config)
+{
+	struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+	const struct mlxsw_sp_nve_ops *ops;
+	int err;
+
+	if (nve->num_nve_tunnels++ != 0)
+		return 0;
+
+	err = mlxsw_sp_kvdl_alloc(mlxsw_sp, MLXSW_SP_KVDL_ENTRY_TYPE_ADJ, 1,
+				  &nve->tunnel_index);
+	if (err)
+		goto err_kvdl_alloc;
+
+	ops = nve->nve_ops_arr[config->type];
+	err = ops->init(nve, config);
+	if (err)
+		goto err_ops_init;
+
+	return 0;
+
+err_ops_init:
+	mlxsw_sp_kvdl_free(mlxsw_sp, MLXSW_SP_KVDL_ENTRY_TYPE_ADJ, 1,
+			   nve->tunnel_index);
+err_kvdl_alloc:
+	nve->num_nve_tunnels--;
+	return err;
+}
+
+static void mlxsw_sp_nve_tunnel_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+	const struct mlxsw_sp_nve_ops *ops;
+
+	ops = nve->nve_ops_arr[nve->config.type];
+
+	if (mlxsw_sp->nve->num_nve_tunnels == 1) {
+		ops->fini(nve);
+		mlxsw_sp_kvdl_free(mlxsw_sp, MLXSW_SP_KVDL_ENTRY_TYPE_ADJ, 1,
+				   nve->tunnel_index);
+	}
+	nve->num_nve_tunnels--;
+}
+
+static void mlxsw_sp_nve_fdb_flush_by_fid(struct mlxsw_sp *mlxsw_sp,
+					  u16 fid_index)
+{
+	char sfdf_pl[MLXSW_REG_SFDF_LEN];
+
+	mlxsw_reg_sfdf_pack(sfdf_pl, MLXSW_REG_SFDF_FLUSH_PER_NVE_AND_FID);
+	mlxsw_reg_sfdf_fid_set(sfdf_pl, fid_index);
+	mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfdf), sfdf_pl);
+}
+
+int mlxsw_sp_nve_fid_enable(struct mlxsw_sp *mlxsw_sp, struct mlxsw_sp_fid *fid,
+			    struct mlxsw_sp_nve_params *params,
+			    struct netlink_ext_ack *extack)
+{
+	struct mlxsw_sp_nve *nve = mlxsw_sp->nve;
+	const struct mlxsw_sp_nve_ops *ops;
+	struct mlxsw_sp_nve_config config;
+	int err;
+
+	ops = nve->nve_ops_arr[params->type];
+
+	if (!ops->can_offload(nve, params->dev, extack))
+		return -EOPNOTSUPP;
+
+	memset(&config, 0, sizeof(config));
+	ops->nve_config(nve, params->dev, &config);
+	if (nve->num_nve_tunnels &&
+	    memcmp(&config, &nve->config, sizeof(config))) {
+		NL_SET_ERR_MSG_MOD(extack, "Conflicting NVE tunnels configuration");
+		return -EOPNOTSUPP;
+	}
+
+	err = mlxsw_sp_nve_tunnel_init(mlxsw_sp, &config);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to initialize NVE tunnel");
+		return err;
+	}
+
+	err = mlxsw_sp_fid_vni_set(fid, params->vni);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to set VNI on FID");
+		goto err_fid_vni_set;
+	}
+
+	nve->config = config;
+
+	return 0;
+
+err_fid_vni_set:
+	mlxsw_sp_nve_tunnel_fini(mlxsw_sp);
+	return err;
+}
+
+void mlxsw_sp_nve_fid_disable(struct mlxsw_sp *mlxsw_sp,
+			      struct mlxsw_sp_fid *fid)
+{
+	u16 fid_index = mlxsw_sp_fid_index(fid);
+
+	mlxsw_sp_nve_flood_ip_flush(mlxsw_sp, fid);
+	mlxsw_sp_nve_fdb_flush_by_fid(mlxsw_sp, fid_index);
+	mlxsw_sp_fid_vni_clear(fid);
+	mlxsw_sp_nve_tunnel_fini(mlxsw_sp);
+}
+
+int mlxsw_sp_port_nve_init(struct mlxsw_sp_port *mlxsw_sp_port)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	char tnqdr_pl[MLXSW_REG_TNQDR_LEN];
+
+	mlxsw_reg_tnqdr_pack(tnqdr_pl, mlxsw_sp_port->local_port);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tnqdr), tnqdr_pl);
+}
+
+void mlxsw_sp_port_nve_fini(struct mlxsw_sp_port *mlxsw_sp_port)
+{
+}
+
+static int mlxsw_sp_nve_qos_init(struct mlxsw_sp *mlxsw_sp)
+{
+	char tnqcr_pl[MLXSW_REG_TNQCR_LEN];
+
+	mlxsw_reg_tnqcr_pack(tnqcr_pl);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tnqcr), tnqcr_pl);
+}
+
+static int mlxsw_sp_nve_ecn_encap_init(struct mlxsw_sp *mlxsw_sp)
+{
+	int i;
+
+	/* Iterate over inner ECN values */
+	for (i = INET_ECN_NOT_ECT; i <= INET_ECN_CE; i++) {
+		u8 outer_ecn = INET_ECN_encapsulate(0, i);
+		char tneem_pl[MLXSW_REG_TNEEM_LEN];
+		int err;
+
+		mlxsw_reg_tneem_pack(tneem_pl, i, outer_ecn);
+		err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tneem),
+				      tneem_pl);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int __mlxsw_sp_nve_ecn_decap_init(struct mlxsw_sp *mlxsw_sp,
+					 u8 inner_ecn, u8 outer_ecn)
+{
+	char tndem_pl[MLXSW_REG_TNDEM_LEN];
+	bool trap_en, set_ce = false;
+	u8 new_inner_ecn;
+
+	trap_en = !!__INET_ECN_decapsulate(outer_ecn, inner_ecn, &set_ce);
+	new_inner_ecn = set_ce ? INET_ECN_CE : inner_ecn;
+
+	mlxsw_reg_tndem_pack(tndem_pl, outer_ecn, inner_ecn, new_inner_ecn,
+			     trap_en, trap_en ? MLXSW_TRAP_ID_DECAP_ECN0 : 0);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(tndem), tndem_pl);
+}
+
+static int mlxsw_sp_nve_ecn_decap_init(struct mlxsw_sp *mlxsw_sp)
+{
+	int i;
+
+	/* Iterate over inner ECN values */
+	for (i = INET_ECN_NOT_ECT; i <= INET_ECN_CE; i++) {
+		int j;
+
+		/* Iterate over outer ECN values */
+		for (j = INET_ECN_NOT_ECT; j <= INET_ECN_CE; j++) {
+			int err;
+
+			err = __mlxsw_sp_nve_ecn_decap_init(mlxsw_sp, i, j);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
+static int mlxsw_sp_nve_ecn_init(struct mlxsw_sp *mlxsw_sp)
+{
+	int err;
+
+	err = mlxsw_sp_nve_ecn_encap_init(mlxsw_sp);
+	if (err)
+		return err;
+
+	return mlxsw_sp_nve_ecn_decap_init(mlxsw_sp);
+}
+
+static int mlxsw_sp_nve_resources_query(struct mlxsw_sp *mlxsw_sp)
+{
+	unsigned int max;
+
+	if (!MLXSW_CORE_RES_VALID(mlxsw_sp->core, MAX_NVE_MC_ENTRIES_IPV4) ||
+	    !MLXSW_CORE_RES_VALID(mlxsw_sp->core, MAX_NVE_MC_ENTRIES_IPV6))
+		return -EIO;
+	max = MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_NVE_MC_ENTRIES_IPV4);
+	mlxsw_sp->nve->num_max_mc_entries[MLXSW_SP_L3_PROTO_IPV4] = max;
+	max = MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_NVE_MC_ENTRIES_IPV6);
+	mlxsw_sp->nve->num_max_mc_entries[MLXSW_SP_L3_PROTO_IPV6] = max;
+
+	return 0;
+}
+
+int mlxsw_sp_nve_init(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_nve *nve;
+	int err;
+
+	nve = kzalloc(sizeof(*mlxsw_sp->nve), GFP_KERNEL);
+	if (!nve)
+		return -ENOMEM;
+	mlxsw_sp->nve = nve;
+	nve->mlxsw_sp = mlxsw_sp;
+	nve->nve_ops_arr = mlxsw_sp->nve_ops_arr;
+
+	err = rhashtable_init(&nve->mc_list_ht,
+			      &mlxsw_sp_nve_mc_list_ht_params);
+	if (err)
+		goto err_rhashtable_init;
+
+	err = mlxsw_sp_nve_qos_init(mlxsw_sp);
+	if (err)
+		goto err_nve_qos_init;
+
+	err = mlxsw_sp_nve_ecn_init(mlxsw_sp);
+	if (err)
+		goto err_nve_ecn_init;
+
+	err = mlxsw_sp_nve_resources_query(mlxsw_sp);
+	if (err)
+		goto err_nve_resources_query;
+
+	return 0;
+
+err_nve_resources_query:
+err_nve_ecn_init:
+err_nve_qos_init:
+	rhashtable_destroy(&nve->mc_list_ht);
+err_rhashtable_init:
+	mlxsw_sp->nve = NULL;
+	kfree(nve);
+	return err;
+}
+
+void mlxsw_sp_nve_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	WARN_ON(mlxsw_sp->nve->num_nve_tunnels);
+	rhashtable_destroy(&mlxsw_sp->nve->mc_list_ht);
+	mlxsw_sp->nve = NULL;
+	kfree(mlxsw_sp->nve);
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
new file mode 100644
index 000000000000..4cc3297e13d6
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0 */
+/* Copyright (c) 2018 Mellanox Technologies. All rights reserved */
+
+#ifndef _MLXSW_SPECTRUM_NVE_H
+#define _MLXSW_SPECTRUM_NVE_H
+
+#include <linux/netlink.h>
+#include <linux/rhashtable.h>
+
+#include "spectrum.h"
+
+struct mlxsw_sp_nve_config {
+	enum mlxsw_sp_nve_type type;
+	u8 ttl;
+	u8 learning_en:1;
+	__be16 udp_dport;
+	__be32 flowlabel;
+	u32 ul_tb_id;
+	enum mlxsw_sp_l3proto ul_proto;
+	union mlxsw_sp_l3addr ul_sip;
+};
+
+struct mlxsw_sp_nve {
+	struct mlxsw_sp_nve_config config;
+	struct rhashtable mc_list_ht;
+	struct mlxsw_sp *mlxsw_sp;
+	const struct mlxsw_sp_nve_ops **nve_ops_arr;
+	unsigned int num_nve_tunnels;	/* Protected by RTNL */
+	unsigned int num_max_mc_entries[MLXSW_SP_L3_PROTO_MAX];
+	u32 tunnel_index;
+};
+
+struct mlxsw_sp_nve_ops {
+	enum mlxsw_sp_nve_type type;
+	bool (*can_offload)(const struct mlxsw_sp_nve *nve,
+			    const struct net_device *dev,
+			    struct netlink_ext_ack *extack);
+	void (*nve_config)(const struct mlxsw_sp_nve *nve,
+			   const struct net_device *dev,
+			   struct mlxsw_sp_nve_config *config);
+	int (*init)(struct mlxsw_sp_nve *nve,
+		    const struct mlxsw_sp_nve_config *config);
+	void (*fini)(struct mlxsw_sp_nve *nve);
+};
+
+extern const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops;
+extern const struct mlxsw_sp_nve_ops mlxsw_sp2_nve_vxlan_ops;
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
new file mode 100644
index 000000000000..4abc4d8a1e68
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/* Copyright (c) 2018 Mellanox Technologies. All rights reserved */
+
+#include <linux/netdevice.h>
+#include <linux/netlink.h>
+
+#include "spectrum_nve.h"
+
+static bool mlxsw_sp1_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
+					    const struct net_device *dev,
+					    struct netlink_ext_ack *extack)
+{
+	return false;
+}
+
+static void mlxsw_sp_nve_vxlan_config(const struct mlxsw_sp_nve *nve,
+				      const struct net_device *dev,
+				      struct mlxsw_sp_nve_config *config)
+{
+}
+
+static int mlxsw_sp1_nve_vxlan_init(struct mlxsw_sp_nve *nve,
+				    const struct mlxsw_sp_nve_config *config)
+{
+	return -EOPNOTSUPP;
+}
+
+static void mlxsw_sp1_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
+{
+}
+
+const struct mlxsw_sp_nve_ops mlxsw_sp1_nve_vxlan_ops = {
+	.type		= MLXSW_SP_NVE_TYPE_VXLAN,
+	.can_offload	= mlxsw_sp1_nve_vxlan_can_offload,
+	.nve_config	= mlxsw_sp_nve_vxlan_config,
+	.init		= mlxsw_sp1_nve_vxlan_init,
+	.fini		= mlxsw_sp1_nve_vxlan_fini,
+};
+
+static bool mlxsw_sp2_nve_vxlan_can_offload(const struct mlxsw_sp_nve *nve,
+					    const struct net_device *dev,
+					    struct netlink_ext_ack *extack)
+{
+	return false;
+}
+
+static int mlxsw_sp2_nve_vxlan_init(struct mlxsw_sp_nve *nve,
+				    const struct mlxsw_sp_nve_config *config)
+{
+	return -EOPNOTSUPP;
+}
+
+static void mlxsw_sp2_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
+{
+}
+
+const struct mlxsw_sp_nve_ops mlxsw_sp2_nve_vxlan_ops = {
+	.type		= MLXSW_SP_NVE_TYPE_VXLAN,
+	.can_offload	= mlxsw_sp2_nve_vxlan_can_offload,
+	.nve_config	= mlxsw_sp_nve_vxlan_config,
+	.init		= mlxsw_sp2_nve_vxlan_init,
+	.fini		= mlxsw_sp2_nve_vxlan_fini,
+};
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 06/18] inet: Refactor INET_ECN_decapsulate()
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Drivers that support tunnel decapsulation (IPinIP or NVE) need to
configure the underlying device to conform to the behavior outlined in
RFC 6040 with respect to the ECN bits.

This behavior is implemented by INET_ECN_decapsulate() which requires an
skb to be passed where the ECN CE bit can be potentially set. Since
these drivers do not need to mark an skb, but only configure the device
to do so, factor out the business logic to __INET_ECN_decapsulate() and
potentially perform the marking in INET_ECN_decapsulate().

This allows drivers to invoke __INET_ECN_decapsulate() and configure the
device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 include/net/inet_ecn.h | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index 482a1b705362..c8e2bebd8d93 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -183,8 +183,7 @@ static inline int INET_ECN_set_ce(struct sk_buff *skb)
  *          1 if something is broken and should be logged (!!! above)
  *          2 if packet should be dropped
  */
-static inline int INET_ECN_decapsulate(struct sk_buff *skb,
-				       __u8 outer, __u8 inner)
+static inline int __INET_ECN_decapsulate(__u8 outer, __u8 inner, bool *set_ce)
 {
 	if (INET_ECN_is_not_ect(inner)) {
 		switch (outer & INET_ECN_MASK) {
@@ -198,10 +197,21 @@ static inline int INET_ECN_decapsulate(struct sk_buff *skb,
 		}
 	}
 
-	if (INET_ECN_is_ce(outer))
+	*set_ce = INET_ECN_is_ce(outer);
+	return 0;
+}
+
+static inline int INET_ECN_decapsulate(struct sk_buff *skb,
+				       __u8 outer, __u8 inner)
+{
+	bool set_ce = false;
+	int rc;
+
+	rc = __INET_ECN_decapsulate(outer, inner, &set_ce);
+	if (!rc && set_ce)
 		INET_ECN_set_ce(skb);
 
-	return 0;
+	return rc;
 }
 
 static inline int IP_ECN_decapsulate(const struct iphdr *oiph,
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 03/18] mlxsw: spectrum_router: Enable local routes promotion to perform NVE decap
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

When an NVE tunnel with an IP underlay (e.g., VxLAN) is configured the
local route to the tunnel's source IP needs to be promoted to perform
NVE decapsulation.

Expose an API in the unicast IP router to promote / demote local routes.

The case where a local route is configured after the creation of the NVE
tunnel will be handled in a subsequent patch in the set.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |   7 ++
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 115 +++++++++++++++++-
 2 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index f463be58c6dc..739a62d024c9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -431,6 +431,13 @@ struct mlxsw_sp_rif *mlxsw_sp_rif_find_by_dev(const struct mlxsw_sp *mlxsw_sp,
 					      const struct net_device *dev);
 u8 mlxsw_sp_router_port(const struct mlxsw_sp *mlxsw_sp);
 struct mlxsw_sp_fid *mlxsw_sp_rif_fid(const struct mlxsw_sp_rif *rif);
+int mlxsw_sp_router_nve_promote_decap(struct mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
+				      enum mlxsw_sp_l3proto ul_proto,
+				      const union mlxsw_sp_l3addr *ul_sip,
+				      u32 tunnel_index);
+void mlxsw_sp_router_nve_demote_decap(struct mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
+				      enum mlxsw_sp_l3proto ul_proto,
+				      const union mlxsw_sp_l3addr *ul_sip);
 
 /* spectrum_kvdl.c */
 enum mlxsw_sp_kvdl_entry_type {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 2ab9cf25a08a..ca4289d15561 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -366,6 +366,7 @@ enum mlxsw_sp_fib_entry_type {
 	 * encapsulating entries.)
 	 */
 	MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP,
+	MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP,
 };
 
 struct mlxsw_sp_nexthop_group;
@@ -1128,6 +1129,52 @@ mlxsw_sp_ipip_entry_promote_decap(struct mlxsw_sp *mlxsw_sp,
 		mlxsw_sp_ipip_entry_demote_decap(mlxsw_sp, ipip_entry);
 }
 
+static struct mlxsw_sp_fib_entry *
+mlxsw_sp_router_ip2me_fib_entry_find(struct mlxsw_sp *mlxsw_sp, u32 tb_id,
+				     enum mlxsw_sp_l3proto proto,
+				     const union mlxsw_sp_l3addr *addr,
+				     enum mlxsw_sp_fib_entry_type type)
+{
+	struct mlxsw_sp_fib_entry *fib_entry;
+	struct mlxsw_sp_fib_node *fib_node;
+	unsigned char addr_prefix_len;
+	struct mlxsw_sp_fib *fib;
+	struct mlxsw_sp_vr *vr;
+	const void *addrp;
+	size_t addr_len;
+	u32 addr4;
+
+	vr = mlxsw_sp_vr_find(mlxsw_sp, tb_id);
+	if (!vr)
+		return NULL;
+	fib = mlxsw_sp_vr_fib(vr, proto);
+
+	switch (proto) {
+	case MLXSW_SP_L3_PROTO_IPV4:
+		addr4 = be32_to_cpu(addr->addr4);
+		addrp = &addr4;
+		addr_len = 4;
+		addr_prefix_len = 32;
+		break;
+	case MLXSW_SP_L3_PROTO_IPV6: /* fall through */
+	default:
+		WARN_ON(1);
+		return NULL;
+	}
+
+	fib_node = mlxsw_sp_fib_node_lookup(fib, addrp, addr_len,
+					    addr_prefix_len);
+	if (!fib_node || list_empty(&fib_node->entry_list))
+		return NULL;
+
+	fib_entry = list_first_entry(&fib_node->entry_list,
+				     struct mlxsw_sp_fib_entry, list);
+	if (fib_entry->type != type)
+		return NULL;
+
+	return fib_entry;
+}
+
 /* Given an IPIP entry, find the corresponding decap route. */
 static struct mlxsw_sp_fib_entry *
 mlxsw_sp_ipip_entry_find_decap(struct mlxsw_sp *mlxsw_sp,
@@ -1765,6 +1812,56 @@ mlxsw_sp_netdevice_ipip_ul_event(struct mlxsw_sp *mlxsw_sp,
 	return 0;
 }
 
+int mlxsw_sp_router_nve_promote_decap(struct mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
+				      enum mlxsw_sp_l3proto ul_proto,
+				      const union mlxsw_sp_l3addr *ul_sip,
+				      u32 tunnel_index)
+{
+	enum mlxsw_sp_fib_entry_type type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
+	struct mlxsw_sp_fib_entry *fib_entry;
+	int err;
+
+	/* It is valid to create a tunnel with a local IP and only later
+	 * assign this IP address to a local interface
+	 */
+	fib_entry = mlxsw_sp_router_ip2me_fib_entry_find(mlxsw_sp, ul_tb_id,
+							 ul_proto, ul_sip,
+							 type);
+	if (!fib_entry)
+		return 0;
+
+	fib_entry->decap.tunnel_index = tunnel_index;
+	fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP;
+
+	err = mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
+	if (err)
+		goto err_fib_entry_update;
+
+	return 0;
+
+err_fib_entry_update:
+	fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
+	mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
+	return err;
+}
+
+void mlxsw_sp_router_nve_demote_decap(struct mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
+				      enum mlxsw_sp_l3proto ul_proto,
+				      const union mlxsw_sp_l3addr *ul_sip)
+{
+	enum mlxsw_sp_fib_entry_type type = MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP;
+	struct mlxsw_sp_fib_entry *fib_entry;
+
+	fib_entry = mlxsw_sp_router_ip2me_fib_entry_find(mlxsw_sp, ul_tb_id,
+							 ul_proto, ul_sip,
+							 type);
+	if (!fib_entry)
+		return;
+
+	fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
+	mlxsw_sp_fib_entry_update(mlxsw_sp, fib_entry);
+}
+
 struct mlxsw_sp_neigh_key {
 	struct neighbour *n;
 };
@@ -3815,6 +3912,7 @@ mlxsw_sp_fib_entry_should_offload(const struct mlxsw_sp_fib_entry *fib_entry)
 	case MLXSW_SP_FIB_ENTRY_TYPE_LOCAL:
 		return !!nh_group->nh_rif;
 	case MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP:
+	case MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP:
 		return true;
 	default:
 		return false;
@@ -3848,7 +3946,8 @@ mlxsw_sp_fib4_entry_offload_set(struct mlxsw_sp_fib_entry *fib_entry)
 	int i;
 
 	if (fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_LOCAL ||
-	    fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP) {
+	    fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP ||
+	    fib_entry->type == MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP) {
 		nh_grp->nexthops->key.fib_nh->nh_flags |= RTNH_F_OFFLOAD;
 		return;
 	}
@@ -4072,6 +4171,18 @@ mlxsw_sp_fib_entry_op_ipip_decap(struct mlxsw_sp *mlxsw_sp,
 				      fib_entry->decap.tunnel_index);
 }
 
+static int mlxsw_sp_fib_entry_op_nve_decap(struct mlxsw_sp *mlxsw_sp,
+					   struct mlxsw_sp_fib_entry *fib_entry,
+					   enum mlxsw_reg_ralue_op op)
+{
+	char ralue_pl[MLXSW_REG_RALUE_LEN];
+
+	mlxsw_sp_fib_entry_ralue_pack(ralue_pl, fib_entry, op);
+	mlxsw_reg_ralue_act_ip2me_tun_pack(ralue_pl,
+					   fib_entry->decap.tunnel_index);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
+}
+
 static int __mlxsw_sp_fib_entry_op(struct mlxsw_sp *mlxsw_sp,
 				   struct mlxsw_sp_fib_entry *fib_entry,
 				   enum mlxsw_reg_ralue_op op)
@@ -4086,6 +4197,8 @@ static int __mlxsw_sp_fib_entry_op(struct mlxsw_sp *mlxsw_sp,
 	case MLXSW_SP_FIB_ENTRY_TYPE_IPIP_DECAP:
 		return mlxsw_sp_fib_entry_op_ipip_decap(mlxsw_sp,
 							fib_entry, op);
+	case MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP:
+		return mlxsw_sp_fib_entry_op_nve_decap(mlxsw_sp, fib_entry, op);
 	}
 	return -EINVAL;
 }
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 05/18] vxlan: Export address checking functions
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Drivers that support VxLAN offload need to be able to sanitize the
configuration of the VxLAN device and accept / reject its offload.

For example, mlxsw requires that the local IP of the VxLAN device be set
and that packets be flooded to unicast IP(s) and not to a multicast
group.

Expose the functions that perform such checks.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/vxlan.c | 26 --------------------------
 include/net/vxlan.h | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 018406c4d944..0fc2b1d82d4c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -103,22 +103,6 @@ bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
 		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
 }
 
-static inline bool vxlan_addr_any(const union vxlan_addr *ipa)
-{
-	if (ipa->sa.sa_family == AF_INET6)
-		return ipv6_addr_any(&ipa->sin6.sin6_addr);
-	else
-		return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY);
-}
-
-static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
-{
-	if (ipa->sa.sa_family == AF_INET6)
-		return ipv6_addr_is_multicast(&ipa->sin6.sin6_addr);
-	else
-		return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr));
-}
-
 static int vxlan_nla_get_addr(union vxlan_addr *ip, struct nlattr *nla)
 {
 	if (nla_len(nla) >= sizeof(struct in6_addr)) {
@@ -151,16 +135,6 @@ bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
 	return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
 }
 
-static inline bool vxlan_addr_any(const union vxlan_addr *ipa)
-{
-	return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY);
-}
-
-static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
-{
-	return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr));
-}
-
 static int vxlan_nla_get_addr(union vxlan_addr *ip, struct nlattr *nla)
 {
 	if (nla_len(nla) >= sizeof(struct in6_addr)) {
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 7ef15179f263..dd3d72ce64b6 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -370,4 +370,36 @@ static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
 	return vs->sock->sk->sk_family;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+
+static inline bool vxlan_addr_any(const union vxlan_addr *ipa)
+{
+	if (ipa->sa.sa_family == AF_INET6)
+		return ipv6_addr_any(&ipa->sin6.sin6_addr);
+	else
+		return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY);
+}
+
+static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
+{
+	if (ipa->sa.sa_family == AF_INET6)
+		return ipv6_addr_is_multicast(&ipa->sin6.sin6_addr);
+	else
+		return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr));
+}
+
+#else /* !IS_ENABLED(CONFIG_IPV6) */
+
+static inline bool vxlan_addr_any(const union vxlan_addr *ipa)
+{
+	return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY);
+}
+
+static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
+{
+	return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr));
+}
+
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+
 #endif
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 04/18] mlxsw: spectrum_router: Allow querying VR ID based on table ID
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

In the device, different VRFs (routing tables) are represented using
different virtual routers (VRs) and thus the kernel's table IDs are
mapped to VR IDs.

Allow internal users of the IP router to query the VR ID based on a
kernel table ID.

This is needed - for example - when configuring the underlay VR where
VxLAN encapsulated packets will undergo an L3 lookup. In this case, the
kernel's table ID is derived from the VxLAN device's configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h      |  2 ++
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c   | 13 +++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 739a62d024c9..d0a73935e8bb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -438,6 +438,8 @@ int mlxsw_sp_router_nve_promote_decap(struct mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
 void mlxsw_sp_router_nve_demote_decap(struct mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
 				      enum mlxsw_sp_l3proto ul_proto,
 				      const union mlxsw_sp_l3addr *ul_sip);
+int mlxsw_sp_router_tb_id_vr_id(struct mlxsw_sp *mlxsw_sp, u32 tb_id,
+				u16 *vr_id);
 
 /* spectrum_kvdl.c */
 enum mlxsw_sp_kvdl_entry_type {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index ca4289d15561..31b5491d6737 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -742,6 +742,19 @@ static struct mlxsw_sp_vr *mlxsw_sp_vr_find(struct mlxsw_sp *mlxsw_sp,
 	return NULL;
 }
 
+int mlxsw_sp_router_tb_id_vr_id(struct mlxsw_sp *mlxsw_sp, u32 tb_id,
+				u16 *vr_id)
+{
+	struct mlxsw_sp_vr *vr;
+
+	vr = mlxsw_sp_vr_find(mlxsw_sp, tb_id);
+	if (!vr)
+		return -ESRCH;
+	*vr_id = vr->id;
+
+	return 0;
+}
+
 static struct mlxsw_sp_fib *mlxsw_sp_vr_fib(const struct mlxsw_sp_vr *vr,
 					    enum mlxsw_sp_l3proto proto)
 {
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 02/18] mlxsw: spectrum_fid: Add APIs to lookup FID without creating it
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Current APIs only allow looking for a FID and creating it in case it
does not exist.

With VxLAN, in case the bridge to which the VxLAN device was enslaved
does not already have a corresponding FID, then it means that something
went wrong that we need to be aware of.

Add an API to look up a FID, but without creating it in order to catch
above-mentioned situation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  4 ++
 .../ethernet/mellanox/mlxsw/spectrum_fid.c    | 45 ++++++++++++++++---
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 7f96953f0409..f463be58c6dc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -679,6 +679,8 @@ int mlxsw_sp_setup_tc_prio(struct mlxsw_sp_port *mlxsw_sp_port,
 			   struct tc_prio_qopt_offload *p);
 
 /* spectrum_fid.c */
+struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_vni(struct mlxsw_sp *mlxsw_sp,
+						__be32 vni);
 int mlxsw_sp_fid_vni(const struct mlxsw_sp_fid *fid, __be32 *vni);
 int mlxsw_sp_fid_nve_flood_index_set(struct mlxsw_sp_fid *fid,
 				     u32 nve_flood_index);
@@ -705,6 +707,8 @@ u16 mlxsw_sp_fid_8021q_vid(const struct mlxsw_sp_fid *fid);
 struct mlxsw_sp_fid *mlxsw_sp_fid_8021q_get(struct mlxsw_sp *mlxsw_sp, u16 vid);
 struct mlxsw_sp_fid *mlxsw_sp_fid_8021d_get(struct mlxsw_sp *mlxsw_sp,
 					    int br_ifindex);
+struct mlxsw_sp_fid *mlxsw_sp_fid_8021d_lookup(struct mlxsw_sp *mlxsw_sp,
+					       int br_ifindex);
 struct mlxsw_sp_fid *mlxsw_sp_fid_rfid_get(struct mlxsw_sp *mlxsw_sp,
 					   u16 rif_index);
 struct mlxsw_sp_fid *mlxsw_sp_fid_dummy_get(struct mlxsw_sp *mlxsw_sp);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 7e07cf368e90..0ba3d90d4632 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -113,6 +113,19 @@ static const int *mlxsw_sp_packet_type_sfgc_types[] = {
 	[MLXSW_SP_FLOOD_TYPE_MC]	= mlxsw_sp_sfgc_mc_packet_types,
 };
 
+struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_vni(struct mlxsw_sp *mlxsw_sp,
+						__be32 vni)
+{
+	struct mlxsw_sp_fid *fid;
+
+	fid = rhashtable_lookup_fast(&mlxsw_sp->fid_core->vni_ht, &vni,
+				     mlxsw_sp_fid_vni_ht_params);
+	if (fid)
+		fid->ref_count++;
+
+	return fid;
+}
+
 int mlxsw_sp_fid_vni(const struct mlxsw_sp_fid *fid, __be32 *vni)
 {
 	if (!fid->vni_valid)
@@ -879,14 +892,12 @@ static const struct mlxsw_sp_fid_family *mlxsw_sp_fid_family_arr[] = {
 	[MLXSW_SP_FID_TYPE_DUMMY]	= &mlxsw_sp_fid_dummy_family,
 };
 
-static struct mlxsw_sp_fid *mlxsw_sp_fid_get(struct mlxsw_sp *mlxsw_sp,
-					     enum mlxsw_sp_fid_type type,
-					     const void *arg)
+static struct mlxsw_sp_fid *mlxsw_sp_fid_lookup(struct mlxsw_sp *mlxsw_sp,
+						enum mlxsw_sp_fid_type type,
+						const void *arg)
 {
 	struct mlxsw_sp_fid_family *fid_family;
 	struct mlxsw_sp_fid *fid;
-	u16 fid_index;
-	int err;
 
 	fid_family = mlxsw_sp->fid_core->fid_family_arr[type];
 	list_for_each_entry(fid, &fid_family->fids_list, list) {
@@ -896,6 +907,23 @@ static struct mlxsw_sp_fid *mlxsw_sp_fid_get(struct mlxsw_sp *mlxsw_sp,
 		return fid;
 	}
 
+	return NULL;
+}
+
+static struct mlxsw_sp_fid *mlxsw_sp_fid_get(struct mlxsw_sp *mlxsw_sp,
+					     enum mlxsw_sp_fid_type type,
+					     const void *arg)
+{
+	struct mlxsw_sp_fid_family *fid_family;
+	struct mlxsw_sp_fid *fid;
+	u16 fid_index;
+	int err;
+
+	fid = mlxsw_sp_fid_lookup(mlxsw_sp, type, arg);
+	if (fid)
+		return fid;
+
+	fid_family = mlxsw_sp->fid_core->fid_family_arr[type];
 	fid = kzalloc(fid_family->fid_size, GFP_KERNEL);
 	if (!fid)
 		return ERR_PTR(-ENOMEM);
@@ -955,6 +983,13 @@ struct mlxsw_sp_fid *mlxsw_sp_fid_8021d_get(struct mlxsw_sp *mlxsw_sp,
 	return mlxsw_sp_fid_get(mlxsw_sp, MLXSW_SP_FID_TYPE_8021D, &br_ifindex);
 }
 
+struct mlxsw_sp_fid *mlxsw_sp_fid_8021d_lookup(struct mlxsw_sp *mlxsw_sp,
+					       int br_ifindex)
+{
+	return mlxsw_sp_fid_lookup(mlxsw_sp, MLXSW_SP_FID_TYPE_8021D,
+				   &br_ifindex);
+}
+
 struct mlxsw_sp_fid *mlxsw_sp_fid_rfid_get(struct mlxsw_sp *mlxsw_sp,
 					   u16 rif_index)
 {
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 01/18] mlxsw: spectrum_fid: Allow setting and clearing NVE properties on FID
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

In the device, the VNI and the list of remote VTEPs a packet should be
flooded to is a property of the filtering identifier (FID).

During encapsulation, the VNI is taken from the FID the packet was
classified to. During decapsulation, the overlay packet is injected into
a bridge and classified to a FID based on the VNI it came with.

Allow NVE configuration for a FID. Currently, this is only supported
with 802.1D FIDs which are used for VLAN-unaware bridges. However, NVE
configuration is going to be supported with 802.1Q FIDs which is why the
related fields are placed in the common FID struct.

Since the device requires a 1:1 mapping between FID and VNI, the driver
maintains a hashtable keyed by VNI and checks if the VNI is already
associated with an existing FID.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |   8 +
 .../ethernet/mellanox/mlxsw/spectrum_fid.c    | 178 ++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 1f68ac2a20f4..7f96953f0409 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -679,6 +679,14 @@ int mlxsw_sp_setup_tc_prio(struct mlxsw_sp_port *mlxsw_sp_port,
 			   struct tc_prio_qopt_offload *p);
 
 /* spectrum_fid.c */
+int mlxsw_sp_fid_vni(const struct mlxsw_sp_fid *fid, __be32 *vni);
+int mlxsw_sp_fid_nve_flood_index_set(struct mlxsw_sp_fid *fid,
+				     u32 nve_flood_index);
+void mlxsw_sp_fid_nve_flood_index_clear(struct mlxsw_sp_fid *fid);
+bool mlxsw_sp_fid_nve_flood_index_is_set(const struct mlxsw_sp_fid *fid);
+int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, __be32 vni);
+void mlxsw_sp_fid_vni_clear(struct mlxsw_sp_fid *fid);
+bool mlxsw_sp_fid_vni_is_set(const struct mlxsw_sp_fid *fid);
 int mlxsw_sp_fid_flood_set(struct mlxsw_sp_fid *fid,
 			   enum mlxsw_sp_flood_type packet_type, u8 local_port,
 			   bool member);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 715d24ff937e..7e07cf368e90 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -6,6 +6,7 @@
 #include <linux/if_vlan.h>
 #include <linux/if_bridge.h>
 #include <linux/netdevice.h>
+#include <linux/rhashtable.h>
 #include <linux/rtnetlink.h>
 
 #include "spectrum.h"
@@ -14,6 +15,7 @@
 struct mlxsw_sp_fid_family;
 
 struct mlxsw_sp_fid_core {
+	struct rhashtable vni_ht;
 	struct mlxsw_sp_fid_family *fid_family_arr[MLXSW_SP_FID_TYPE_MAX];
 	unsigned int *port_fid_mappings;
 };
@@ -24,6 +26,12 @@ struct mlxsw_sp_fid {
 	unsigned int ref_count;
 	u16 fid_index;
 	struct mlxsw_sp_fid_family *fid_family;
+
+	struct rhash_head vni_ht_node;
+	__be32 vni;
+	u32 nve_flood_index;
+	u8 vni_valid:1,
+	   nve_flood_index_valid:1;
 };
 
 struct mlxsw_sp_fid_8021q {
@@ -36,6 +44,12 @@ struct mlxsw_sp_fid_8021d {
 	int br_ifindex;
 };
 
+static const struct rhashtable_params mlxsw_sp_fid_vni_ht_params = {
+	.key_len = sizeof_field(struct mlxsw_sp_fid, vni),
+	.key_offset = offsetof(struct mlxsw_sp_fid, vni),
+	.head_offset = offsetof(struct mlxsw_sp_fid, vni_ht_node),
+};
+
 struct mlxsw_sp_flood_table {
 	enum mlxsw_sp_flood_type packet_type;
 	enum mlxsw_reg_sfgc_bridge_type bridge_type;
@@ -56,6 +70,11 @@ struct mlxsw_sp_fid_ops {
 			    struct mlxsw_sp_port *port, u16 vid);
 	void (*port_vid_unmap)(struct mlxsw_sp_fid *fid,
 			       struct mlxsw_sp_port *port, u16 vid);
+	int (*vni_set)(struct mlxsw_sp_fid *fid, __be32 vni);
+	void (*vni_clear)(struct mlxsw_sp_fid *fid);
+	int (*nve_flood_index_set)(struct mlxsw_sp_fid *fid,
+				   u32 nve_flood_index);
+	void (*nve_flood_index_clear)(struct mlxsw_sp_fid *fid);
 };
 
 struct mlxsw_sp_fid_family {
@@ -94,6 +113,104 @@ static const int *mlxsw_sp_packet_type_sfgc_types[] = {
 	[MLXSW_SP_FLOOD_TYPE_MC]	= mlxsw_sp_sfgc_mc_packet_types,
 };
 
+int mlxsw_sp_fid_vni(const struct mlxsw_sp_fid *fid, __be32 *vni)
+{
+	if (!fid->vni_valid)
+		return -EINVAL;
+
+	*vni = fid->vni;
+
+	return 0;
+}
+
+int mlxsw_sp_fid_nve_flood_index_set(struct mlxsw_sp_fid *fid,
+				     u32 nve_flood_index)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+	const struct mlxsw_sp_fid_ops *ops = fid_family->ops;
+	int err;
+
+	if (WARN_ON(!ops->nve_flood_index_set || fid->nve_flood_index_valid))
+		return -EINVAL;
+
+	err = ops->nve_flood_index_set(fid, nve_flood_index);
+	if (err)
+		return err;
+
+	fid->nve_flood_index = nve_flood_index;
+	fid->nve_flood_index_valid = true;
+
+	return 0;
+}
+
+void mlxsw_sp_fid_nve_flood_index_clear(struct mlxsw_sp_fid *fid)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+	const struct mlxsw_sp_fid_ops *ops = fid_family->ops;
+
+	if (WARN_ON(!ops->nve_flood_index_clear || !fid->nve_flood_index_valid))
+		return;
+
+	fid->nve_flood_index_valid = false;
+	ops->nve_flood_index_clear(fid);
+}
+
+bool mlxsw_sp_fid_nve_flood_index_is_set(const struct mlxsw_sp_fid *fid)
+{
+	return fid->nve_flood_index_valid;
+}
+
+int mlxsw_sp_fid_vni_set(struct mlxsw_sp_fid *fid, __be32 vni)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+	const struct mlxsw_sp_fid_ops *ops = fid_family->ops;
+	struct mlxsw_sp *mlxsw_sp = fid_family->mlxsw_sp;
+	int err;
+
+	if (WARN_ON(!ops->vni_set || fid->vni_valid))
+		return -EINVAL;
+
+	fid->vni = vni;
+	err = rhashtable_lookup_insert_fast(&mlxsw_sp->fid_core->vni_ht,
+					    &fid->vni_ht_node,
+					    mlxsw_sp_fid_vni_ht_params);
+	if (err)
+		return err;
+
+	err = ops->vni_set(fid, vni);
+	if (err)
+		goto err_vni_set;
+
+	fid->vni_valid = true;
+
+	return 0;
+
+err_vni_set:
+	rhashtable_remove_fast(&mlxsw_sp->fid_core->vni_ht, &fid->vni_ht_node,
+			       mlxsw_sp_fid_vni_ht_params);
+	return err;
+}
+
+void mlxsw_sp_fid_vni_clear(struct mlxsw_sp_fid *fid)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+	const struct mlxsw_sp_fid_ops *ops = fid_family->ops;
+	struct mlxsw_sp *mlxsw_sp = fid_family->mlxsw_sp;
+
+	if (WARN_ON(!ops->vni_clear || !fid->vni_valid))
+		return;
+
+	fid->vni_valid = false;
+	ops->vni_clear(fid);
+	rhashtable_remove_fast(&mlxsw_sp->fid_core->vni_ht, &fid->vni_ht_node,
+			       mlxsw_sp_fid_vni_ht_params);
+}
+
+bool mlxsw_sp_fid_vni_is_set(const struct mlxsw_sp_fid *fid)
+{
+	return fid->vni_valid;
+}
+
 static const struct mlxsw_sp_flood_table *
 mlxsw_sp_fid_flood_table_lookup(const struct mlxsw_sp_fid *fid,
 				enum mlxsw_sp_flood_type packet_type)
@@ -217,6 +334,21 @@ static int mlxsw_sp_fid_op(struct mlxsw_sp *mlxsw_sp, u16 fid_index,
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfmr), sfmr_pl);
 }
 
+static int mlxsw_sp_fid_vni_op(struct mlxsw_sp *mlxsw_sp, u16 fid_index,
+			       __be32 vni, bool vni_valid, u32 nve_flood_index,
+			       bool nve_flood_index_valid)
+{
+	char sfmr_pl[MLXSW_REG_SFMR_LEN];
+
+	mlxsw_reg_sfmr_pack(sfmr_pl, MLXSW_REG_SFMR_OP_CREATE_FID, fid_index,
+			    0);
+	mlxsw_reg_sfmr_vv_set(sfmr_pl, vni_valid);
+	mlxsw_reg_sfmr_vni_set(sfmr_pl, be32_to_cpu(vni));
+	mlxsw_reg_sfmr_vtfp_set(sfmr_pl, nve_flood_index_valid);
+	mlxsw_reg_sfmr_nve_tunnel_flood_ptr_set(sfmr_pl, nve_flood_index);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfmr), sfmr_pl);
+}
+
 static int mlxsw_sp_fid_vid_map(struct mlxsw_sp *mlxsw_sp, u16 fid_index,
 				u16 vid, bool valid)
 {
@@ -531,6 +663,41 @@ mlxsw_sp_fid_8021d_port_vid_unmap(struct mlxsw_sp_fid *fid,
 				    mlxsw_sp_port->local_port, vid, false);
 }
 
+static int mlxsw_sp_fid_8021d_vni_set(struct mlxsw_sp_fid *fid, __be32 vni)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+
+	return mlxsw_sp_fid_vni_op(fid_family->mlxsw_sp, fid->fid_index, vni,
+				   true, fid->nve_flood_index,
+				   fid->nve_flood_index_valid);
+}
+
+static void mlxsw_sp_fid_8021d_vni_clear(struct mlxsw_sp_fid *fid)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+
+	mlxsw_sp_fid_vni_op(fid_family->mlxsw_sp, fid->fid_index, 0, false,
+			    fid->nve_flood_index, fid->nve_flood_index_valid);
+}
+
+static int mlxsw_sp_fid_8021d_nve_flood_index_set(struct mlxsw_sp_fid *fid,
+						  u32 nve_flood_index)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+
+	return mlxsw_sp_fid_vni_op(fid_family->mlxsw_sp, fid->fid_index,
+				   fid->vni, fid->vni_valid, nve_flood_index,
+				   true);
+}
+
+static void mlxsw_sp_fid_8021d_nve_flood_index_clear(struct mlxsw_sp_fid *fid)
+{
+	struct mlxsw_sp_fid_family *fid_family = fid->fid_family;
+
+	mlxsw_sp_fid_vni_op(fid_family->mlxsw_sp, fid->fid_index, fid->vni,
+			    fid->vni_valid, 0, false);
+}
+
 static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021d_ops = {
 	.setup			= mlxsw_sp_fid_8021d_setup,
 	.configure		= mlxsw_sp_fid_8021d_configure,
@@ -540,6 +707,10 @@ static const struct mlxsw_sp_fid_ops mlxsw_sp_fid_8021d_ops = {
 	.flood_index		= mlxsw_sp_fid_8021d_flood_index,
 	.port_vid_map		= mlxsw_sp_fid_8021d_port_vid_map,
 	.port_vid_unmap		= mlxsw_sp_fid_8021d_port_vid_unmap,
+	.vni_set		= mlxsw_sp_fid_8021d_vni_set,
+	.vni_clear		= mlxsw_sp_fid_8021d_vni_clear,
+	.nve_flood_index_set	= mlxsw_sp_fid_8021d_nve_flood_index_set,
+	.nve_flood_index_clear	= mlxsw_sp_fid_8021d_nve_flood_index_clear,
 };
 
 static const struct mlxsw_sp_flood_table mlxsw_sp_fid_8021d_flood_tables[] = {
@@ -918,6 +1089,10 @@ int mlxsw_sp_fids_init(struct mlxsw_sp *mlxsw_sp)
 		return -ENOMEM;
 	mlxsw_sp->fid_core = fid_core;
 
+	err = rhashtable_init(&fid_core->vni_ht, &mlxsw_sp_fid_vni_ht_params);
+	if (err)
+		goto err_rhashtable_init;
+
 	fid_core->port_fid_mappings = kcalloc(max_ports, sizeof(unsigned int),
 					      GFP_KERNEL);
 	if (!fid_core->port_fid_mappings) {
@@ -944,6 +1119,8 @@ int mlxsw_sp_fids_init(struct mlxsw_sp *mlxsw_sp)
 	}
 	kfree(fid_core->port_fid_mappings);
 err_alloc_port_fid_mappings:
+	rhashtable_destroy(&fid_core->vni_ht);
+err_rhashtable_init:
 	kfree(fid_core);
 	return err;
 }
@@ -957,5 +1134,6 @@ void mlxsw_sp_fids_fini(struct mlxsw_sp *mlxsw_sp)
 		mlxsw_sp_fid_family_unregister(mlxsw_sp,
 					       fid_core->fid_family_arr[i]);
 	kfree(fid_core->port_fid_mappings);
+	rhashtable_destroy(&fid_core->vni_ht);
 	kfree(fid_core);
 }
-- 
2.17.2

^ permalink raw reply related

* [PATCH net-next v2 00/18] mlxsw: Add VxLAN support
From: Ido Schimmel @ 2018-10-17  8:53 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel

This patchset adds support for VxLAN offload in the mlxsw driver.

With regards to the forwarding plane, VxLAN support is composed from two
main parts: Encapsulation and decapsulation.

In the device, NVE encapsulation (and VxLAN in particular) takes place
in the bridge. A packet can be encapsulated using VxLAN either because
it hit an FDB entry that forwards it to the router with the IP of the
remote VTEP or because it was flooded, in which case it is sent to a
list of remote VTEPs (in addition to local ports). In either case, the
VNI is derived from the filtering identifier (FID) the packet was
classified to at ingress and the underlay source IP is taken from a
device global configuration.

VxLAN decapsulation takes place in the underlay router, where packets
that hit a local route that corresponds to the source IP of the local
VTEP are decapsulated and injected to the bridge. The packets are
classified to a FID based on the VNI they came with.

The first six patches export the required APIs in the VxLAN and mlxsw
drivers in order to allow for the introduction of the NVE core in the
next two patches. The NVE core is designed to support a variety of NVE
encapsulations (e.g., VxLAN, NVGRE) and different ASICs, but currently
only VxLAN and Spectrum are supported. Spectrum-2 support will be added
in the future.

The last 10 patches add support for VxLAN decapsulation and
encapsulation and include the addition of the required switchdev APIs in
the VxLAN driver. These APIs allow capable drivers to get a notification
about the addition / deletion of FDB entries to / from the VxLAN's FDB.

Subsequent patchset will add selftests (generic and mlxsw-specific),
data plane learning, FDB extack and vetoing and support for VLAN-aware
bridges (one VNI per VxLAN device model).

v2:
* Implement netif_is_vxlan() using rtnl_link_ops->kind (Jakub & Stephen)

Ido Schimmel (14):
  mlxsw: spectrum_fid: Allow setting and clearing NVE properties on FID
  mlxsw: spectrum_fid: Add APIs to lookup FID without creating it
  mlxsw: spectrum_router: Enable local routes promotion to perform NVE
    decap
  mlxsw: spectrum_router: Allow querying VR ID based on table ID
  vxlan: Export address checking functions
  inet: Refactor INET_ECN_decapsulate()
  mlxsw: spectrum_nve: Implement common NVE core
  mlxsw: spectrum_nve: Implement VxLAN operations
  mlxsw: spectrum_fid: Clear NVE configuration when destroying 802.1D
    FIDs
  mlxsw: spectrum_router: Configure matching local routes for NVE decap
  net: Add netif_is_vxlan()
  bridge: switchdev: Allow clearing FDB entry offload indication
  mlxsw: spectrum: Enable VxLAN enslavement to bridges
  mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation

Petr Machata (4):
  vxlan: Add switchdev notifications
  vxlan: Add vxlan_fdb_find_uc() for FDB querying
  vxlan: Support marking RDSTs as offloaded
  vxlan: Notify for each remote of a removed FDB entry

 drivers/net/ethernet/mellanox/mlxsw/Makefile  |   3 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 125 +++
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  89 ++
 .../ethernet/mellanox/mlxsw/spectrum_fid.c    | 225 +++-
 .../ethernet/mellanox/mlxsw/spectrum_nve.c    | 982 ++++++++++++++++++
 .../ethernet/mellanox/mlxsw/spectrum_nve.h    |  49 +
 .../mellanox/mlxsw/spectrum_nve_vxlan.c       | 249 +++++
 .../ethernet/mellanox/mlxsw/spectrum_router.c | 138 ++-
 .../mellanox/mlxsw/spectrum_switchdev.c       | 552 +++++++++-
 .../netronome/nfp/flower/tunnel_conf.c        |   3 +-
 drivers/net/ethernet/rocker/rocker_main.c     |   1 +
 drivers/net/vxlan.c                           | 176 +++-
 include/net/inet_ecn.h                        |  18 +-
 include/net/switchdev.h                       |   7 +-
 include/net/vxlan.h                           |  64 ++
 net/bridge/br.c                               |   4 +-
 net/bridge/br_fdb.c                           |   4 +-
 net/bridge/br_private.h                       |   2 +-
 net/bridge/br_switchdev.c                     |   9 +-
 net/dsa/slave.c                               |   1 +
 20 files changed, 2644 insertions(+), 57 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c

-- 
2.17.2

^ permalink raw reply

* Re: [PATCH net-next] net: ena: Fix Kconfig dependencies X86
From: Sergei Shtylyov @ 2018-10-17  8:37 UTC (permalink / raw)
  To: netanel, davem, netdev
  Cc: akiyano, alisaidi, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik
In-Reply-To: <20181017081601.42717-1-netanel@amazon.com>

Hello!

On 17.10.2018 11:16, netanel@amazon.com wrote:

> From: Netanel Belgazal <netanel@amazon.com>
>
> The Kconfig limitation of X86 is to too wide.
> The ENA driver only requires a little endian dependency.
>
> Change the dependency to be on little endian CPU.
>
> Signed-off-by: Netanel Belgazal <netanel@amazon.com>
> ---
>  drivers/net/ethernet/amazon/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/amazon/Kconfig b/drivers/net/ethernet/amazon/Kconfig
> index 99b30353541a..f4d16c7e104f 100644
> --- a/drivers/net/ethernet/amazon/Kconfig
> +++ b/drivers/net/ethernet/amazon/Kconfig
> @@ -17,7 +17,7 @@ if NET_VENDOR_AMAZON
>
>  config ENA_ETHERNET
>  	tristate "Elastic Network Adapter (ENA) support"
> -	depends on (PCI_MSI && X86)
> +	depends on (PCI_MSI && !CPU_BIG_ENDIAN)

     Parens not needed here. High time to remove them, I think.

[...]

MBR, Sergei

^ permalink raw reply

* Re: [PATCH] [PATCH net-next] openvswitch: Use correct reply values in datapath and vport ops
From: Koichiro Den @ 2018-10-17  8:24 UTC (permalink / raw)
  To: David Miller, pkusunyifeng; +Cc: netdev
In-Reply-To: <20180929.114446.1833643989991465370.davem@davemloft.net>

On Sat, 2018-09-29 at 11:44 -0700, David Miller wrote:
> From: Yifeng Sun <pkusunyifeng@gmail.com>
> Date: Wed, 26 Sep 2018 11:40:14 -0700
> 
> > This patch fixes the bug that all datapath and vport ops are returning
> > wrong values (OVS_FLOW_CMD_NEW or OVS_DP_CMD_NEW) in their replies.
> > 
> > Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
> 
> Applied.

Not directly relating to ovs, but as a similar existing behaviour, we do have
somewhat odd response for CTRL_CMD_GETFAMILY. I mean this part:

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 25eeb6d2a75a..17428d5d1434 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -789,7 +789,7 @@ static int ctrl_dumpfamily(struct sk_buff *skb, struct
netlink_callback *cb)

                if (ctrl_fill_info(rt, NETLINK_CB(cb->skb).portid,
                                   cb->nlh->nlmsg_seq, NLM_F_MULTI,
-                                  skb, CTRL_CMD_NEWFAMILY) < 0) {
+                                  skb, CTRL_CMD_GETFAMILY) < 0) {
                        n--;
                        break;
                }
@@ -886,7 +886,7 @@ static int ctrl_getfamily(struct sk_buff *skb, struct
genl_info *info)
        }

        msg = ctrl_build_family_msg(res, info->snd_portid, info->snd_seq,
-                                   CTRL_CMD_NEWFAMILY);
+                                   CTRL_CMD_GETFAMILY);
        if (IS_ERR(msg))
                return PTR_ERR(msg);

David, do you think we should retain this existing bahaviour, since this would
be a relatively huge breaking change for userland apps?

^ permalink raw reply related

* [PATCH net-next] net: ena: Fix Kconfig dependencies X86
From: netanel @ 2018-10-17  8:16 UTC (permalink / raw)
  To: davem, netdev
  Cc: akiyano, alisaidi, Netanel Belgazal, dwmw, zorik, matua, saeedb,
	msw, aliguori, nafea, gtzalik

From: Netanel Belgazal <netanel@amazon.com>

The Kconfig limitation of X86 is to too wide.
The ENA driver only requires a little endian dependency.

Change the dependency to be on little endian CPU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/Kconfig b/drivers/net/ethernet/amazon/Kconfig
index 99b30353541a..f4d16c7e104f 100644
--- a/drivers/net/ethernet/amazon/Kconfig
+++ b/drivers/net/ethernet/amazon/Kconfig
@@ -17,7 +17,7 @@ if NET_VENDOR_AMAZON
 
 config ENA_ETHERNET
 	tristate "Elastic Network Adapter (ENA) support"
-	depends on (PCI_MSI && X86)
+	depends on (PCI_MSI && !CPU_BIG_ENDIAN)
 	---help---
 	  This driver supports Elastic Network Adapter (ENA)"
 
-- 
2.15.2.AMZN

^ permalink raw reply related

* Re: [PATCH net-next] ixgbe: fix XFRM_ALGO dependency
From: Arnd Bergmann @ 2018-10-17 16:04 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: shannon.nelson, David Miller, Steffen Klassert, Herbert Xu,
	Jesse Brandeburg, Björn Töpel, Alexander Duyck,
	intel-wired-lan, Networking, Linux Kernel Mailing List
In-Reply-To: <3f8e7a6e9532abcd772927432d5f00cebeba08b8.camel@intel.com>

On Wed, Oct 17, 2018 at 5:53 PM Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> On Tue, 2018-10-16 at 09:35 -0700, Shannon Nelson wrote:
> > On 10/16/2018 3:03 AM, Arnd Bergmann wrote:
> > > A separate Kconfig symbol now controls whether we include the ipsec
> > > offload code. To keep the old behavior, this is left as 'default
> > > y'. The
> > > dependency in XFRM_OFFLOAD still causes a circular dependency but
> > > is
> > > not actually needed because this symbol is not user visible, so
> > > removing
> > > that dependency on top makes it all work.
> > >
> > > Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> I agree with Shannon's suggested changes.  Arnd, are you working on v2?
> Or would you like me to take care of it?

I was planning to respin it, but didn't get around to it yet, and will
be travelling for the next week, so I'd welcome if you can take over
from here. Shannon's comments all make sense to me as well.

> > > +config IXGBE_IPSEC
> > > +   bool "IPSec XFRM cryptography-offload accelaration"
> > > +   default n
> >
> > remove this "default n" line?

I meant for this to say "default y", as I said in the changelog,
but feel free to pick whichever default makes sense to you
make make the description match ;-)

Thanks,

       Arnd

^ permalink raw reply

* [PATCH net] mlxsw: core: Fix use-after-free when flashing firmware during init
From: Ido Schimmel @ 2018-10-17  8:05 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	Alexander Petrovskiy, mlxsw, Ido Schimmel

When the switch driver (e.g., mlxsw_spectrum) determines it needs to
flash a new firmware version it resets the ASIC after the flashing
process. The bus driver (e.g., mlxsw_pci) then registers itself again
with mlxsw_core which means (among other things) that the device
registers itself again with the hwmon subsystem again.

Since the device was registered with the hwmon subsystem using
devm_hwmon_device_register_with_groups(), then the old hwmon device
(registered before the flashing) was never unregistered and was
referencing stale data, resulting in a use-after free.

Fix by removing reliance on device managed APIs in mlxsw_hwmon_init().

Fixes: c86d62cc410c ("mlxsw: spectrum: Reset FW after flash")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c      |  2 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h      |  4 ++++
 .../net/ethernet/mellanox/mlxsw/core_hwmon.c    | 17 +++++++++++------
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 81533d7f395c..937d0ace699a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1055,6 +1055,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 err_driver_init:
 	mlxsw_thermal_fini(mlxsw_core->thermal);
 err_thermal_init:
+	mlxsw_hwmon_fini(mlxsw_core->hwmon);
 err_hwmon_init:
 	if (!reload)
 		devlink_unregister(devlink);
@@ -1088,6 +1089,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
 	if (mlxsw_core->driver->fini)
 		mlxsw_core->driver->fini(mlxsw_core);
 	mlxsw_thermal_fini(mlxsw_core->thermal);
+	mlxsw_hwmon_fini(mlxsw_core->hwmon);
 	if (!reload)
 		devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 655ddd204ab2..c35be477856f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -359,6 +359,10 @@ static inline int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 	return 0;
 }
 
+static inline void mlxsw_hwmon_fini(struct mlxsw_hwmon *mlxsw_hwmon)
+{
+}
+
 #endif
 
 struct mlxsw_thermal;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c b/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
index f6cf2896d337..e04e8162aa14 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_hwmon.c
@@ -303,8 +303,7 @@ int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 	struct device *hwmon_dev;
 	int err;
 
-	mlxsw_hwmon = devm_kzalloc(mlxsw_bus_info->dev, sizeof(*mlxsw_hwmon),
-				   GFP_KERNEL);
+	mlxsw_hwmon = kzalloc(sizeof(*mlxsw_hwmon), GFP_KERNEL);
 	if (!mlxsw_hwmon)
 		return -ENOMEM;
 	mlxsw_hwmon->core = mlxsw_core;
@@ -321,10 +320,9 @@ int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 	mlxsw_hwmon->groups[0] = &mlxsw_hwmon->group;
 	mlxsw_hwmon->group.attrs = mlxsw_hwmon->attrs;
 
-	hwmon_dev = devm_hwmon_device_register_with_groups(mlxsw_bus_info->dev,
-							   "mlxsw",
-							   mlxsw_hwmon,
-							   mlxsw_hwmon->groups);
+	hwmon_dev = hwmon_device_register_with_groups(mlxsw_bus_info->dev,
+						      "mlxsw", mlxsw_hwmon,
+						      mlxsw_hwmon->groups);
 	if (IS_ERR(hwmon_dev)) {
 		err = PTR_ERR(hwmon_dev);
 		goto err_hwmon_register;
@@ -337,5 +335,12 @@ int mlxsw_hwmon_init(struct mlxsw_core *mlxsw_core,
 err_hwmon_register:
 err_fans_init:
 err_temp_init:
+	kfree(mlxsw_hwmon);
 	return err;
 }
+
+void mlxsw_hwmon_fini(struct mlxsw_hwmon *mlxsw_hwmon)
+{
+	hwmon_device_unregister(mlxsw_hwmon->hwmon_dev);
+	kfree(mlxsw_hwmon);
+}
-- 
2.17.2

^ permalink raw reply related

* Re: [RFC PATCH 10/11] dt-bindings: net: ti: deprecate cpsw-phy-sel bindings
From: Rob Herring @ 2018-10-17 15:41 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Tony Lindgren, Kishon Vijay Abraham I,
	Sekhar Nori, linux-kernel, linux-omap, devicetree,
	Grygorii Strashko
In-Reply-To: <20181008234949.15416-11-grygorii.strashko@ti.com>

On Mon, 8 Oct 2018 18:49:48 -0500, Grygorii Strashko wrote:
> The cpsw-phy-sel driver was replaced with new PHY driver phy-gmii-sel, so
> deprecate cpsw-phy-sel bindings.
> 
> Cc: Kishon Vijay Abraham I <kishon@ti.com>
> Cc: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  Documentation/devicetree/bindings/net/cpsw-phy-sel.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* Re: [RFC PATCH 04/11] dt-bindings: net: ti: cpsw: switch to use phy-gmii-sel phy
From: Rob Herring @ 2018-10-17 15:41 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Tony Lindgren, Kishon Vijay Abraham I,
	Sekhar Nori, linux-kernel, linux-omap, devicetree
In-Reply-To: <20181008234949.15416-5-grygorii.strashko@ti.com>

On Mon, Oct 08, 2018 at 06:49:42PM -0500, Grygorii Strashko wrote:
> The cpsw-phy-sel driver was replaced with new PHY driver phy-gmii-sel, so
> deprecate cpsw-phy-sel bindings and update CPSW binding to use phy-gmii-sel
> PHY bindings.
> 
> Cc: Kishon Vijay Abraham I <kishon@ti.com>
> Cc: Tony Lindgren <tony@atomide.com>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  Documentation/devicetree/bindings/net/cpsw.txt | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
> index b3acebe..69d7ef8 100644
> --- a/Documentation/devicetree/bindings/net/cpsw.txt
> +++ b/Documentation/devicetree/bindings/net/cpsw.txt
> @@ -22,7 +22,8 @@ Required properties:
>  - cpsw-phy-sel		: Specifies the phandle to the CPSW phy mode selection
>  			  device. See also cpsw-phy-sel.txt for it's binding.
>  			  Note that in legacy cases cpsw-phy-sel may be
> -			  a child device instead of a phandle.
> +			  a child device instead of a phandle
> +			  (DEPRECATED, use phy-gmii-sel PHY phandle).

phy-gmii-sel is outside the scope of this binding. Just say use 'phys' 
property instead.

>  
>  Optional properties:
>  - ti,hwmods		: Must be "cpgmac0"
> @@ -44,6 +45,7 @@ Optional properties:
>  Slave Properties:
>  Required properties:
>  - phy-mode		: See ethernet.txt file in the same directory
> +- phys			: phandle on phy-gmii-sel PHY (see phy/ti-phy-gmii-sel.txt)
>  
>  Optional properties:
>  - dual_emac_res_vlan	: Specifies VID to be used to segregate the ports
> @@ -85,12 +87,14 @@ Examples:
>  			phy-mode = "rgmii-txid";
>  			/* Filled in by U-Boot */
>  			mac-address = [ 00 00 00 00 00 00 ];
> +			phys = <&phy_gmii_sel 1 0>;
>  		};
>  		cpsw_emac1: slave@1 {
>  			phy_id = <&davinci_mdio>, <1>;
>  			phy-mode = "rgmii-txid";
>  			/* Filled in by U-Boot */
>  			mac-address = [ 00 00 00 00 00 00 ];
> +			phys = <&phy_gmii_sel 2 0>;
>  		};
>  	};
>  
> @@ -114,11 +118,13 @@ Examples:
>  			phy-mode = "rgmii-txid";
>  			/* Filled in by U-Boot */
>  			mac-address = [ 00 00 00 00 00 00 ];
> +			phys = <&phy_gmii_sel 1 0>;
>  		};
>  		cpsw_emac1: slave@1 {
>  			phy_id = <&davinci_mdio>, <1>;
>  			phy-mode = "rgmii-txid";
>  			/* Filled in by U-Boot */
>  			mac-address = [ 00 00 00 00 00 00 ];
> +			phys = <&phy_gmii_sel 2 0>;
>  		};
>  	};
> -- 
> 2.10.5
> 

^ permalink raw reply

* Re: [RFC PATCH 02/11] dt-bindings: phy: add cpsw port interface mode selection phy bindings
From: Rob Herring @ 2018-10-17 15:39 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Tony Lindgren, David S. Miller, netdev, Kishon Vijay Abraham I,
	Sekhar Nori, linux-kernel, linux-omap, devicetree
In-Reply-To: <3fa09831-0f70-dfcc-3fd9-f877215a4631@ti.com>

On Tue, Oct 09, 2018 at 03:10:34PM -0500, Grygorii Strashko wrote:
> 
> 
> On 10/09/2018 09:40 AM, Tony Lindgren wrote:
> > * Grygorii Strashko <grygorii.strashko@ti.com> [181008 23:54]:
> >> +Examples:
> >> +	phy_gmii_sel: phy-gmii-sel {
> >> +		compatible = "ti,am3352-phy-gmii-sel";
> >> +		syscon-scm = <&scm_conf>;
> >> +		#phy-cells = <2>;
> >> +	};
> > 
> > Now that this driver can live in it's proper place in the
> 
> right
> 
> > dts, you may want to consider just using standard reg
> > property for it instead of the syscon-scm. And also get
> > rid of the syscon reads and writes.
> 
> Could you help clarify how to get syscon in this case?
> syscon_node_to_regmap(dev->parent->of_node)?
> 
> Also, there are could be more then one gmii_sel registers in SCM in the future,
> so I hidden offsets in of_match data. 
> As result, "reg" not needed at all now.

If there's a defined register range which doesn't overlap with other 
things (other than the parent), then use reg whether you currently need 
it or not.

Rob

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox