Netdev List
* [PATCH net-next v2 09/18] mlxsw: spectrum_fid: Clear NVE configuration when destroying 802.1D FIDs
From: Ido Schimmel @ 2018-10-17  8:53 UTC
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

802.1D FIDs are used to represent VLAN-unaware bridges and currently
this is the only type of FID that supports NVE configuration.

Since the NVE tunnel device does not take a reference on the FID, it is
possible for the FID to be destroyed while it still has NVE
configuration.

Therefore, when destroying the FID, make sure to disable its NVE
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
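Illustration only, not part of the patch: a minimal userspace sketch of
the teardown ordering described above, assuming a simplified FID model.
The names fid_model, fid_model_nve_disable() and fid_model_deconfigure()
are hypothetical stand-ins and are not mlxsw APIs.

```c
#include <stdbool.h>

/* Hypothetical userspace model; not the mlxsw data structures. */
struct fid_model {
	bool vni_valid;		/* NVE (VNI) configuration is present */
	bool configured;	/* the FID itself is programmed */
	int nve_disable_calls;	/* bookkeeping for the example only */
};

/* Stand-in for disabling NVE on the FID. */
static void fid_model_nve_disable(struct fid_model *fid)
{
	fid->vni_valid = false;
	fid->nve_disable_calls++;
}

/*
 * Same ordering as the hunk below: drop any leftover NVE state first,
 * then tear down the FID itself.
 */
static void fid_model_deconfigure(struct fid_model *fid)
{
	if (fid->vni_valid)
		fid_model_nve_disable(fid);
	fid->configured = false;
}
```

The point of the guard is that a FID without NVE configuration takes
the old path unchanged.
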
 drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 0ba3d90d4632..a3db033d7399 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -538,6 +538,8 @@ static int mlxsw_sp_fid_8021d_configure(struct mlxsw_sp_fid *fid)
 
 static void mlxsw_sp_fid_8021d_deconfigure(struct mlxsw_sp_fid *fid)
 {
+	if (fid->vni_valid)
+		mlxsw_sp_nve_fid_disable(fid->fid_family->mlxsw_sp, fid);
 	mlxsw_sp_fid_op(fid->fid_family->mlxsw_sp, fid->fid_index, 0, false);
 }
 
-- 
2.17.2

* [PATCH net-next v2 15/18] vxlan: Notify for each remote of a removed FDB entry
From: Ido Schimmel @ 2018-10-17  8:53 UTC
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

When notifications are sent about FDB activity, and an FDB entry with
several remotes is removed, the notification is sent only for the first
destination. That makes it impossible to distinguish between the case
where only this first remote is removed, and the one where the FDB entry
is removed as a whole.

Therefore, send one notification for each remote of a removed FDB entry.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
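Illustration only, not part of the patch: a minimal userspace sketch of
"one notification per remote", assuming a plain singly linked list in
place of the kernel's list_head. The names remote_model, fdb_model,
fdb_model_notify_all() and sum_remote_ips() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct vxlan_rdst / struct vxlan_fdb. */
struct remote_model {
	unsigned int ip;		/* remote VTEP address, opaque here */
	struct remote_model *next;
};

struct fdb_model {
	struct remote_model *remotes;	/* singly linked for brevity */
};

typedef void (*notify_fn)(const struct fdb_model *f,
			  const struct remote_model *rd, void *ctx);

/* Emit one "delete" notification per remote; returns how many were sent. */
static size_t fdb_model_notify_all(const struct fdb_model *f,
				   notify_fn notify, void *ctx)
{
	const struct remote_model *rd;
	size_t sent = 0;

	for (rd = f->remotes; rd; rd = rd->next) {
		notify(f, rd, ctx);
		sent++;
	}
	return sent;
}

/* Example callback: sums the remote addresses it was notified about. */
static void sum_remote_ips(const struct fdb_model *f,
			   const struct remote_model *rd, void *ctx)
{
	(void)f;
	*(unsigned int *)ctx += rd->ip;
}
```

A listener that receives a notification for every remote can tell
"whole entry removed" apart from "first remote removed".
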
 drivers/net/vxlan.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e98fc54379f8..1d74f90d6f5d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -843,12 +843,15 @@ static void vxlan_fdb_free(struct rcu_head *head)
 static void vxlan_fdb_destroy(struct vxlan_dev *vxlan, struct vxlan_fdb *f,
 			      bool do_notify)
 {
+	struct vxlan_rdst *rd;
+
 	netdev_dbg(vxlan->dev,
 		    "delete %pM\n", f->eth_addr);
 
 	--vxlan->addrcnt;
 	if (do_notify)
-		vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_DELNEIGH);
+		list_for_each_entry(rd, &f->remotes, list)
+			vxlan_fdb_notify(vxlan, f, rd, RTM_DELNEIGH);
 
 	hlist_del_rcu(&f->hlist);
 	call_rcu(&f->rcu, vxlan_fdb_free);
-- 
2.17.2

* [PATCH net-next v2 10/18] mlxsw: spectrum_router: Configure matching local routes for NVE decap
From: Ido Schimmel @ 2018-10-17  8:53 UTC
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

When a local route that matches the source IP of an offloaded NVE tunnel
is notified, the driver needs to program it to perform NVE decapsulation
instead of merely trapping packets to the CPU.

This patch complements "mlxsw: spectrum_router: Enable local routes
promotion to perform NVE decap" where existing local routes were
promoted to perform NVE decapsulation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
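Illustration only, not part of the patch: a minimal userspace sketch of
the decision the hunk below adds, assuming that a local route qualifies
for decap when both its table ID and its destination IP match the
tunnel's underlay. The types nve_model and entry_type_model and the
function local_route_type() are hypothetical; the real check lives in
mlxsw_sp_nve_ipv4_route_is_decap(), whose internals are not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

enum entry_type_model { ENTRY_TRAP, ENTRY_NVE_DECAP };

/* Hypothetical description of the one offloaded NVE tunnel. */
struct nve_model {
	bool active;
	uint32_t ul_tb_id;	/* underlay routing table (assumed field) */
	uint32_t ul_sip;	/* underlay source IP, host byte order here */
};

/*
 * A local route that matches the tunnel's source IP is programmed to
 * decapsulate; every other local route keeps trapping to the CPU.
 */
static enum entry_type_model
local_route_type(const struct nve_model *nve, uint32_t tb_id, uint32_t dip)
{
	if (nve->active && nve->ul_tb_id == tb_id && nve->ul_sip == dip)
		return ENTRY_NVE_DECAP;
	return ENTRY_TRAP;
}
```
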
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 31b5491d6737..9e9bb57134f2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -4247,6 +4247,7 @@ mlxsw_sp_fib4_entry_type_set(struct mlxsw_sp *mlxsw_sp,
 			     struct mlxsw_sp_fib_entry *fib_entry)
 {
 	union mlxsw_sp_l3addr dip = { .addr4 = htonl(fen_info->dst) };
+	u32 tb_id = mlxsw_sp_fix_tb_id(fen_info->tb_id);
 	struct net_device *dev = fen_info->fi->fib_dev;
 	struct mlxsw_sp_ipip_entry *ipip_entry;
 	struct fib_info *fi = fen_info->fi;
@@ -4261,6 +4262,15 @@ mlxsw_sp_fib4_entry_type_set(struct mlxsw_sp *mlxsw_sp,
 							     fib_entry,
 							     ipip_entry);
 		}
+		if (mlxsw_sp_nve_ipv4_route_is_decap(mlxsw_sp, tb_id,
+						     dip.addr4)) {
+			u32 t_index;
+
+			t_index = mlxsw_sp_nve_decap_tunnel_index_get(mlxsw_sp);
+			fib_entry->decap.tunnel_index = t_index;
+			fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP;
+			return 0;
+		}
 		/* fall through */
 	case RTN_BROADCAST:
 		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
-- 
2.17.2

* [PATCH net-next v2 11/18] net: Add netif_is_vxlan()
From: Ido Schimmel @ 2018-10-17  8:53 UTC
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Add the ability to determine whether a netdev is a VxLAN netdev by
calling netif_is_vxlan(), which checks the netdev's rtnl_link_ops.

This will allow modules to identify netdev events involving a VxLAN
netdev and act accordingly. For example, drivers capable of VxLAN
offload will need to configure the underlying device when a VxLAN netdev
is being enslaved to an offloaded bridge.

Convert nfp to use the newly introduced helper.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
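Illustration only, not part of the patch: a minimal userspace sketch of
the kind check, assuming cut-down structs in place of struct net_device
and struct rtnl_link_ops. The names link_ops_model, netdev_model and
netdev_model_is_vxlan() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct link_ops_model {
	const char *kind;
};

struct netdev_model {
	const struct link_ops_model *link_ops;	/* may be NULL */
};

/*
 * Same shape as the helper in the hunk below: guard against a missing
 * ops pointer before comparing the kind string.
 */
static bool netdev_model_is_vxlan(const struct netdev_model *dev)
{
	return dev->link_ops && !strcmp(dev->link_ops->kind, "vxlan");
}
```

The NULL guard matters because not every netdev is created through
rtnetlink link ops.
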
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 3 ++-
 include/net/vxlan.h                                     | 7 +++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 30c926c4bc47..8e5bec04d1f9 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -4,6 +4,7 @@
 #include <linux/etherdevice.h>
 #include <linux/inetdevice.h>
 #include <net/netevent.h>
+#include <net/vxlan.h>
 #include <linux/idr.h>
 #include <net/dst_metadata.h>
 #include <net/arp.h>
@@ -187,7 +188,7 @@ static bool nfp_tun_is_netdev_to_offload(struct net_device *netdev)
 		return false;
 	if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
 		return true;
-	if (!strcmp(netdev->rtnl_link_ops->kind, "vxlan"))
+	if (netif_is_vxlan(netdev))
 		return true;
 
 	return false;
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index dd3d72ce64b6..95227fa925e8 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -5,6 +5,7 @@
 #include <linux/if_vlan.h>
 #include <net/udp_tunnel.h>
 #include <net/dst_metadata.h>
+#include <net/rtnetlink.h>
 
 /* VXLAN protocol (RFC 7348) header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -402,4 +403,10 @@ static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
 
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
+static inline bool netif_is_vxlan(const struct net_device *dev)
+{
+	return dev->rtnl_link_ops &&
+	       !strcmp(dev->rtnl_link_ops->kind, "vxlan");
+}
+
 #endif
-- 
2.17.2

* [PATCH net-next v2 16/18] bridge: switchdev: Allow clearing FDB entry offload indication
From: Ido Schimmel @ 2018-10-17  8:53 UTC
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Currently, an FDB entry only ceases being offloaded when it is deleted.
This changes with VxLAN encapsulation.

Devices capable of performing VxLAN encapsulation usually have only one
FDB table, unlike the software data path, which has two: one in the
bridge driver and another in the VxLAN driver.

Therefore, bridge FDB entries pointing to a VxLAN device are only
offloaded if there is a corresponding entry in the VxLAN FDB.

Allow clearing the offload indication in case the corresponding entry
was deleted from the VxLAN FDB.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
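Illustration only, not part of the patch: a minimal userspace sketch of
setting and clearing the offload indication, assuming a flat array in
place of the bridge's FDB hash table and no locking. The names
fdb_entry_model, fdb_model_find() and fdb_model_offloaded_set() are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced FDB entry keyed by {MAC, VID}. */
struct fdb_entry_model {
	uint8_t addr[6];
	uint16_t vid;
	unsigned int added_by_user:1,
		     offloaded:1;
};

static struct fdb_entry_model *
fdb_model_find(struct fdb_entry_model *tbl, size_t n,
	       const uint8_t *addr, uint16_t vid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].vid == vid && !memcmp(tbl[i].addr, addr, 6))
			return &tbl[i];
	}
	return NULL;
}

/*
 * The caller now passes the desired state instead of the flag being
 * unconditionally set to 1, so the indication can also be cleared.
 */
static void fdb_model_offloaded_set(struct fdb_entry_model *tbl, size_t n,
				    const uint8_t *addr, uint16_t vid,
				    bool offloaded)
{
	struct fdb_entry_model *fdb = fdb_model_find(tbl, n, addr, vid);

	if (fdb)
		fdb->offloaded = offloaded;
}
```
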
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 9 +++++----
 drivers/net/ethernet/rocker/rocker_main.c                | 1 +
 include/net/switchdev.h                                  | 3 ++-
 net/bridge/br.c                                          | 4 ++--
 net/bridge/br_fdb.c                                      | 4 ++--
 net/bridge/br_private.h                                  | 2 +-
 net/bridge/br_switchdev.c                                | 9 ++++++---
 net/dsa/slave.c                                          | 1 +
 8 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index fa16ad2c6a50..a89075beef94 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2090,12 +2090,13 @@ void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
 static void
 mlxsw_sp_fdb_call_notifiers(enum switchdev_notifier_type type,
 			    const char *mac, u16 vid,
-			    struct net_device *dev)
+			    struct net_device *dev, bool offloaded)
 {
 	struct switchdev_notifier_fdb_info info;
 
 	info.addr = mac;
 	info.vid = vid;
+	info.offloaded = offloaded;
 	call_switchdev_notifiers(type, dev, &info.info);
 }
 
@@ -2147,7 +2148,7 @@ static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
 	if (!do_notification)
 		return;
 	type = adding ? SWITCHDEV_FDB_ADD_TO_BRIDGE : SWITCHDEV_FDB_DEL_TO_BRIDGE;
-	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev);
+	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev, adding);
 
 	return;
 
@@ -2207,7 +2208,7 @@ static void mlxsw_sp_fdb_notify_mac_lag_process(struct mlxsw_sp *mlxsw_sp,
 	if (!do_notification)
 		return;
 	type = adding ? SWITCHDEV_FDB_ADD_TO_BRIDGE : SWITCHDEV_FDB_DEL_TO_BRIDGE;
-	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev);
+	mlxsw_sp_fdb_call_notifiers(type, mac, vid, bridge_port->dev, adding);
 
 	return;
 
@@ -2312,7 +2313,7 @@ static void mlxsw_sp_switchdev_bridge_fdb_event_work(struct work_struct *work)
 			break;
 		mlxsw_sp_fdb_call_notifiers(SWITCHDEV_FDB_OFFLOADED,
 					    fdb_info->addr,
-					    fdb_info->vid, dev);
+					    fdb_info->vid, dev, true);
 		break;
 	case SWITCHDEV_FDB_DEL_TO_DEVICE:
 		fdb_info = &switchdev_work->fdb_info;
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index aeafdb9ac015..8721c0506af3 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2728,6 +2728,7 @@ rocker_fdb_offload_notify(struct rocker_port *rocker_port,
 
 	info.addr = recv_info->addr;
 	info.vid = recv_info->vid;
+	info.offloaded = true;
 	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
 				 rocker_port->dev, &info.info);
 }
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index b040f82351ba..881ecb1555bf 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -159,7 +159,8 @@ struct switchdev_notifier_fdb_info {
 	struct switchdev_notifier_info info; /* must be first */
 	const unsigned char *addr;
 	u16 vid;
-	bool added_by_user;
+	u8 added_by_user:1,
+	   offloaded:1;
 };
 
 static inline struct net_device *
diff --git a/net/bridge/br.c b/net/bridge/br.c
index e411e40333e2..360ad66c21e9 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -151,7 +151,7 @@ static int br_switchdev_event(struct notifier_block *unused,
 			break;
 		}
 		br_fdb_offloaded_set(br, p, fdb_info->addr,
-				     fdb_info->vid);
+				     fdb_info->vid, true);
 		break;
 	case SWITCHDEV_FDB_DEL_TO_BRIDGE:
 		fdb_info = ptr;
@@ -163,7 +163,7 @@ static int br_switchdev_event(struct notifier_block *unused,
 	case SWITCHDEV_FDB_OFFLOADED:
 		fdb_info = ptr;
 		br_fdb_offloaded_set(br, p, fdb_info->addr,
-				     fdb_info->vid);
+				     fdb_info->vid, fdb_info->offloaded);
 		break;
 	}
 
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 74331690a390..e56ba3912a90 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -1152,7 +1152,7 @@ int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
 }
 
 void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
-			  const unsigned char *addr, u16 vid)
+			  const unsigned char *addr, u16 vid, bool offloaded)
 {
 	struct net_bridge_fdb_entry *fdb;
 
@@ -1160,7 +1160,7 @@ void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
 
 	fdb = br_fdb_find(br, addr, vid);
 	if (fdb)
-		fdb->offloaded = 1;
+		fdb->offloaded = offloaded;
 
 	spin_unlock_bh(&br->hash_lock);
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 10ee39fdca5c..2920e06a5403 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -574,7 +574,7 @@ int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
 			      const unsigned char *addr, u16 vid,
 			      bool swdev_notify);
 void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
-			  const unsigned char *addr, u16 vid);
+			  const unsigned char *addr, u16 vid, bool offloaded);
 
 /* br_forward.c */
 enum br_pkt_type {
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index d77f807420c4..b993df770675 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -103,7 +103,7 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
 static void
 br_switchdev_fdb_call_notifiers(bool adding, const unsigned char *mac,
 				u16 vid, struct net_device *dev,
-				bool added_by_user)
+				bool added_by_user, bool offloaded)
 {
 	struct switchdev_notifier_fdb_info info;
 	unsigned long notifier_type;
@@ -111,6 +111,7 @@ br_switchdev_fdb_call_notifiers(bool adding, const unsigned char *mac,
 	info.addr = mac;
 	info.vid = vid;
 	info.added_by_user = added_by_user;
+	info.offloaded = offloaded;
 	notifier_type = adding ? SWITCHDEV_FDB_ADD_TO_DEVICE : SWITCHDEV_FDB_DEL_TO_DEVICE;
 	call_switchdev_notifiers(notifier_type, dev, &info.info);
 }
@@ -126,13 +127,15 @@ br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
 		br_switchdev_fdb_call_notifiers(false, fdb->key.addr.addr,
 						fdb->key.vlan_id,
 						fdb->dst->dev,
-						fdb->added_by_user);
+						fdb->added_by_user,
+						fdb->offloaded);
 		break;
 	case RTM_NEWNEIGH:
 		br_switchdev_fdb_call_notifiers(true, fdb->key.addr.addr,
 						fdb->key.vlan_id,
 						fdb->dst->dev,
-						fdb->added_by_user);
+						fdb->added_by_user,
+						fdb->offloaded);
 		break;
 	}
 }
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3f840b6eea69..5428ef529019 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1478,6 +1478,7 @@ static void dsa_slave_switchdev_event_work(struct work_struct *work)
 			netdev_dbg(dev, "fdb add failed err=%d\n", err);
 			break;
 		}
+		fdb_info->offloaded = true;
 		call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED, dev,
 					 &fdb_info->info);
 		break;
-- 
2.17.2

* [PATCH net-next v2 17/18] mlxsw: spectrum: Enable VxLAN enslavement to bridges
From: Ido Schimmel @ 2018-10-17  8:53 UTC
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

Enslavement of VxLAN devices to offloaded bridges was never forbidden by
mlxsw, but this patch makes sure the required configuration is performed
in order to allow VxLAN encapsulation and decapsulation to take place in
the device.

The patch handles both the case where a VxLAN device is enslaved to an
already offloaded bridge and the case where the first mlxsw port is
enslaved to a bridge that already has a VxLAN device configured.

Invalid configurations are rejected and an error string is returned via
extack.

Since encapsulation and decapsulation do not occur when the VxLAN device
is down, the driver makes sure to enable / disable these functionalities
based on NETDEV_PRE_UP and NETDEV_DOWN events.

Note that NETDEV_PRE_UP is used in favor of NETDEV_UP, as the former
allows the driver to veto the operation, if necessary.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
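Illustration only, not part of the patch: a minimal userspace sketch of
the configuration checks, assuming a plain struct snapshot of the
bridge and a const char ** out-parameter standing in for extack. The
names bridge_model and bridge_model_vxlan_is_valid() are hypothetical;
the message strings are copied from the hunk below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the bridge properties the checks look at. */
struct bridge_model {
	bool multicast_enabled;
	bool vlan_filtering;
	unsigned int num_vxlans;	/* VxLAN devices enslaved to the bridge */
};

/*
 * Returns true when the configuration can be offloaded. On failure a
 * reason is stored through errmsg (if non-NULL), standing in for extack.
 */
static bool bridge_model_vxlan_is_valid(const struct bridge_model *br,
					const char **errmsg)
{
	const char *msg = NULL;

	if (br->multicast_enabled)
		msg = "Multicast can not be enabled on a bridge with a VxLAN device";
	else if (br->vlan_filtering)
		msg = "VLAN filtering can not be enabled on a bridge with a VxLAN device";
	else if (br->num_vxlans > 1)
		msg = "Multiple VxLAN devices are not supported in a VLAN-unaware bridge";

	if (msg && errmsg)
		*errmsg = msg;
	return msg == NULL;
}
```

In the driver, a failed check translates into -EOPNOTSUPP from the
notifier, which is what vetoes the enslavement.
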
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 104 +++++++++++++
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  27 ++++
 .../mellanox/mlxsw/spectrum_switchdev.c       | 137 +++++++++++++++++-
 3 files changed, 267 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 68079b16adfa..8a4983adae94 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4587,6 +4587,41 @@ static void mlxsw_sp_port_ovs_leave(struct mlxsw_sp_port *mlxsw_sp_port)
 	mlxsw_sp_port_vp_mode_set(mlxsw_sp_port, false);
 }
 
+static bool mlxsw_sp_bridge_has_multiple_vxlans(struct net_device *br_dev)
+{
+	unsigned int num_vxlans = 0;
+	struct net_device *dev;
+	struct list_head *iter;
+
+	netdev_for_each_lower_dev(br_dev, dev, iter) {
+		if (netif_is_vxlan(dev))
+			num_vxlans++;
+	}
+
+	return num_vxlans > 1;
+}
+
+static bool mlxsw_sp_bridge_vxlan_is_valid(struct net_device *br_dev,
+					   struct netlink_ext_ack *extack)
+{
+	if (br_multicast_enabled(br_dev)) {
+		NL_SET_ERR_MSG_MOD(extack, "Multicast can not be enabled on a bridge with a VxLAN device");
+		return false;
+	}
+
+	if (br_vlan_enabled(br_dev)) {
+		NL_SET_ERR_MSG_MOD(extack, "VLAN filtering can not be enabled on a bridge with a VxLAN device");
+		return false;
+	}
+
+	if (mlxsw_sp_bridge_has_multiple_vxlans(br_dev)) {
+		NL_SET_ERR_MSG_MOD(extack, "Multiple VxLAN devices are not supported in a VLAN-unaware bridge");
+		return false;
+	}
+
+	return true;
+}
+
 static int mlxsw_sp_netdevice_port_upper_event(struct net_device *lower_dev,
 					       struct net_device *dev,
 					       unsigned long event, void *ptr)
@@ -4616,6 +4651,11 @@ static int mlxsw_sp_netdevice_port_upper_event(struct net_device *lower_dev,
 		}
 		if (!info->linking)
 			break;
+		if (netif_is_bridge_master(upper_dev) &&
+		    !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp, upper_dev) &&
+		    mlxsw_sp_bridge_has_vxlan(upper_dev) &&
+		    !mlxsw_sp_bridge_vxlan_is_valid(upper_dev, extack))
+			return -EOPNOTSUPP;
 		if (netdev_has_any_upper_dev(upper_dev) &&
 		    (!netif_is_bridge_master(upper_dev) ||
 		     !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp,
@@ -4773,6 +4813,11 @@ static int mlxsw_sp_netdevice_port_vlan_event(struct net_device *vlan_dev,
 		}
 		if (!info->linking)
 			break;
+		if (netif_is_bridge_master(upper_dev) &&
+		    !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp, upper_dev) &&
+		    mlxsw_sp_bridge_has_vxlan(upper_dev) &&
+		    !mlxsw_sp_bridge_vxlan_is_valid(upper_dev, extack))
+			return -EOPNOTSUPP;
 		if (netdev_has_any_upper_dev(upper_dev) &&
 		    (!netif_is_bridge_master(upper_dev) ||
 		     !mlxsw_sp_bridge_device_is_offloaded(mlxsw_sp,
@@ -4919,6 +4964,63 @@ static bool mlxsw_sp_is_vrf_event(unsigned long event, void *ptr)
 	return netif_is_l3_master(info->upper_dev);
 }
 
+static int mlxsw_sp_netdevice_vxlan_event(struct mlxsw_sp *mlxsw_sp,
+					  struct net_device *dev,
+					  unsigned long event, void *ptr)
+{
+	struct netdev_notifier_changeupper_info *cu_info;
+	struct netdev_notifier_info *info = ptr;
+	struct netlink_ext_ack *extack;
+	struct net_device *upper_dev;
+
+	extack = netdev_notifier_info_to_extack(info);
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		cu_info = container_of(info,
+				       struct netdev_notifier_changeupper_info,
+				       info);
+		upper_dev = cu_info->upper_dev;
+		if (!netif_is_bridge_master(upper_dev))
+			return 0;
+		if (!mlxsw_sp_lower_get(upper_dev))
+			return 0;
+		if (!mlxsw_sp_bridge_vxlan_is_valid(upper_dev, extack))
+			return -EOPNOTSUPP;
+		if (cu_info->linking) {
+			if (!netif_running(dev))
+				return 0;
+			return mlxsw_sp_bridge_vxlan_join(mlxsw_sp, upper_dev,
+							  dev, extack);
+		} else {
+			mlxsw_sp_bridge_vxlan_leave(mlxsw_sp, upper_dev, dev);
+		}
+		break;
+	case NETDEV_PRE_UP:
+		upper_dev = netdev_master_upper_dev_get(dev);
+		if (!upper_dev)
+			return 0;
+		if (!netif_is_bridge_master(upper_dev))
+			return 0;
+		if (!mlxsw_sp_lower_get(upper_dev))
+			return 0;
+		return mlxsw_sp_bridge_vxlan_join(mlxsw_sp, upper_dev, dev,
+						  extack);
+	case NETDEV_DOWN:
+		upper_dev = netdev_master_upper_dev_get(dev);
+		if (!upper_dev)
+			return 0;
+		if (!netif_is_bridge_master(upper_dev))
+			return 0;
+		if (!mlxsw_sp_lower_get(upper_dev))
+			return 0;
+		mlxsw_sp_bridge_vxlan_leave(mlxsw_sp, upper_dev, dev);
+		break;
+	}
+
+	return 0;
+}
+
 static int mlxsw_sp_netdevice_event(struct notifier_block *nb,
 				    unsigned long event, void *ptr)
 {
@@ -4935,6 +5037,8 @@ static int mlxsw_sp_netdevice_event(struct notifier_block *nb,
 	}
 	mlxsw_sp_span_respin(mlxsw_sp);
 
+	if (netif_is_vxlan(dev))
+		err = mlxsw_sp_netdevice_vxlan_event(mlxsw_sp, dev, event, ptr);
 	if (mlxsw_sp_netdev_is_ipip_ol(mlxsw_sp, dev))
 		err = mlxsw_sp_netdevice_ipip_ol_event(mlxsw_sp, dev,
 						       event, ptr);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 2d5eca78576a..0875a79cbe7b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -16,6 +16,7 @@
 #include <net/psample.h>
 #include <net/pkt_cls.h>
 #include <net/red.h>
+#include <net/vxlan.h>
 
 #include "port.h"
 #include "core.h"
@@ -241,6 +242,25 @@ struct mlxsw_sp_port {
 	struct mlxsw_sp_acl_block *eg_acl_block;
 };
 
+static inline struct net_device *
+mlxsw_sp_bridge_vxlan_dev_find(struct net_device *br_dev)
+{
+	struct net_device *dev;
+	struct list_head *iter;
+
+	netdev_for_each_lower_dev(br_dev, dev, iter) {
+		if (netif_is_vxlan(dev))
+			return dev;
+	}
+
+	return NULL;
+}
+
+static inline bool mlxsw_sp_bridge_has_vxlan(struct net_device *br_dev)
+{
+	return !!mlxsw_sp_bridge_vxlan_dev_find(br_dev);
+}
+
 static inline bool
 mlxsw_sp_port_is_pause_en(const struct mlxsw_sp_port *mlxsw_sp_port)
 {
@@ -336,6 +356,13 @@ void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
 				struct net_device *br_dev);
 bool mlxsw_sp_bridge_device_is_offloaded(const struct mlxsw_sp *mlxsw_sp,
 					 const struct net_device *br_dev);
+int mlxsw_sp_bridge_vxlan_join(struct mlxsw_sp *mlxsw_sp,
+			       const struct net_device *br_dev,
+			       const struct net_device *vxlan_dev,
+			       struct netlink_ext_ack *extack);
+void mlxsw_sp_bridge_vxlan_leave(struct mlxsw_sp *mlxsw_sp,
+				 const struct net_device *br_dev,
+				 const struct net_device *vxlan_dev);
 
 /* spectrum.c */
 int mlxsw_sp_port_ets_set(struct mlxsw_sp_port *mlxsw_sp_port,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index a89075beef94..bab7712e1721 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -15,6 +15,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/netlink.h>
 #include <net/switchdev.h>
+#include <net/vxlan.h>
 
 #include "spectrum_span.h"
 #include "spectrum_switchdev.h"
@@ -83,6 +84,11 @@ struct mlxsw_sp_bridge_ops {
 	void (*port_leave)(struct mlxsw_sp_bridge_device *bridge_device,
 			   struct mlxsw_sp_bridge_port *bridge_port,
 			   struct mlxsw_sp_port *mlxsw_sp_port);
+	int (*vxlan_join)(struct mlxsw_sp_bridge_device *bridge_device,
+			  const struct net_device *vxlan_dev,
+			  struct netlink_ext_ack *extack);
+	void (*vxlan_leave)(struct mlxsw_sp_bridge_device *bridge_device,
+			    const struct net_device *vxlan_dev);
 	struct mlxsw_sp_fid *
 		(*fid_get)(struct mlxsw_sp_bridge_device *bridge_device,
 			   u16 vid);
@@ -1949,6 +1955,21 @@ mlxsw_sp_bridge_8021q_port_leave(struct mlxsw_sp_bridge_device *bridge_device,
 	mlxsw_sp_port_pvid_set(mlxsw_sp_port, 1);
 }
 
+static int
+mlxsw_sp_bridge_8021q_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
+				 const struct net_device *vxlan_dev,
+				 struct netlink_ext_ack *extack)
+{
+	WARN_ON(1);
+	return -EINVAL;
+}
+
+static void
+mlxsw_sp_bridge_8021q_vxlan_leave(struct mlxsw_sp_bridge_device *bridge_device,
+				  const struct net_device *vxlan_dev)
+{
+}
+
 static struct mlxsw_sp_fid *
 mlxsw_sp_bridge_8021q_fid_get(struct mlxsw_sp_bridge_device *bridge_device,
 			      u16 vid)
@@ -1961,6 +1982,8 @@ mlxsw_sp_bridge_8021q_fid_get(struct mlxsw_sp_bridge_device *bridge_device,
 static const struct mlxsw_sp_bridge_ops mlxsw_sp_bridge_8021q_ops = {
 	.port_join	= mlxsw_sp_bridge_8021q_port_join,
 	.port_leave	= mlxsw_sp_bridge_8021q_port_leave,
+	.vxlan_join	= mlxsw_sp_bridge_8021q_vxlan_join,
+	.vxlan_leave	= mlxsw_sp_bridge_8021q_vxlan_leave,
 	.fid_get	= mlxsw_sp_bridge_8021q_fid_get,
 };
 
@@ -2025,18 +2048,103 @@ mlxsw_sp_bridge_8021d_port_leave(struct mlxsw_sp_bridge_device *bridge_device,
 	mlxsw_sp_port_vlan_bridge_leave(mlxsw_sp_port_vlan);
 }
 
+static int
+mlxsw_sp_bridge_8021d_vxlan_join(struct mlxsw_sp_bridge_device *bridge_device,
+				 const struct net_device *vxlan_dev,
+				 struct netlink_ext_ack *extack)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev);
+	struct vxlan_dev *vxlan = netdev_priv(vxlan_dev);
+	struct mlxsw_sp_nve_params params = {
+		.type = MLXSW_SP_NVE_TYPE_VXLAN,
+		.vni = vxlan->cfg.vni,
+		.dev = vxlan_dev,
+	};
+	struct mlxsw_sp_fid *fid;
+	int err;
+
+	fid = mlxsw_sp_fid_8021d_lookup(mlxsw_sp, bridge_device->dev->ifindex);
+	if (!fid)
+		return -EINVAL;
+
+	if (mlxsw_sp_fid_vni_is_set(fid))
+		return -EINVAL;
+
+	err = mlxsw_sp_nve_fid_enable(mlxsw_sp, fid, &params, extack);
+	if (err)
+		goto err_nve_fid_enable;
+
+	/* The tunnel port does not hold a reference on the FID. Only
+	 * local ports and the router port
+	 */
+	mlxsw_sp_fid_put(fid);
+
+	return 0;
+
+err_nve_fid_enable:
+	mlxsw_sp_fid_put(fid);
+	return err;
+}
+
+static void
+mlxsw_sp_bridge_8021d_vxlan_leave(struct mlxsw_sp_bridge_device *bridge_device,
+				  const struct net_device *vxlan_dev)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev);
+	struct mlxsw_sp_fid *fid;
+
+	fid = mlxsw_sp_fid_8021d_lookup(mlxsw_sp, bridge_device->dev->ifindex);
+	if (WARN_ON(!fid))
+		return;
+
+	/* If the VxLAN device is down, then the FID does not have a VNI */
+	if (!mlxsw_sp_fid_vni_is_set(fid))
+		goto out;
+
+	mlxsw_sp_nve_fid_disable(mlxsw_sp, fid);
+out:
+	mlxsw_sp_fid_put(fid);
+}
+
 static struct mlxsw_sp_fid *
 mlxsw_sp_bridge_8021d_fid_get(struct mlxsw_sp_bridge_device *bridge_device,
 			      u16 vid)
 {
 	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev);
+	struct net_device *vxlan_dev;
+	struct mlxsw_sp_fid *fid;
+	int err;
 
-	return mlxsw_sp_fid_8021d_get(mlxsw_sp, bridge_device->dev->ifindex);
+	fid = mlxsw_sp_fid_8021d_get(mlxsw_sp, bridge_device->dev->ifindex);
+	if (IS_ERR(fid))
+		return fid;
+
+	if (mlxsw_sp_fid_vni_is_set(fid))
+		return fid;
+
+	vxlan_dev = mlxsw_sp_bridge_vxlan_dev_find(bridge_device->dev);
+	if (!vxlan_dev)
+		return fid;
+
+	if (!netif_running(vxlan_dev))
+		return fid;
+
+	err = mlxsw_sp_bridge_8021d_vxlan_join(bridge_device, vxlan_dev, NULL);
+	if (err)
+		goto err_vxlan_join;
+
+	return fid;
+
+err_vxlan_join:
+	mlxsw_sp_fid_put(fid);
+	return ERR_PTR(err);
 }
 
 static const struct mlxsw_sp_bridge_ops mlxsw_sp_bridge_8021d_ops = {
 	.port_join	= mlxsw_sp_bridge_8021d_port_join,
 	.port_leave	= mlxsw_sp_bridge_8021d_port_leave,
+	.vxlan_join	= mlxsw_sp_bridge_8021d_vxlan_join,
+	.vxlan_leave	= mlxsw_sp_bridge_8021d_vxlan_leave,
 	.fid_get	= mlxsw_sp_bridge_8021d_fid_get,
 };
 
@@ -2087,6 +2195,33 @@ void mlxsw_sp_port_bridge_leave(struct mlxsw_sp_port *mlxsw_sp_port,
 	mlxsw_sp_bridge_port_put(mlxsw_sp->bridge, bridge_port);
 }
 
+int mlxsw_sp_bridge_vxlan_join(struct mlxsw_sp *mlxsw_sp,
+			       const struct net_device *br_dev,
+			       const struct net_device *vxlan_dev,
+			       struct netlink_ext_ack *extack)
+{
+	struct mlxsw_sp_bridge_device *bridge_device;
+
+	bridge_device = mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
+	if (WARN_ON(!bridge_device))
+		return -EINVAL;
+
+	return bridge_device->ops->vxlan_join(bridge_device, vxlan_dev, extack);
+}
+
+void mlxsw_sp_bridge_vxlan_leave(struct mlxsw_sp *mlxsw_sp,
+				 const struct net_device *br_dev,
+				 const struct net_device *vxlan_dev)
+{
+	struct mlxsw_sp_bridge_device *bridge_device;
+
+	bridge_device = mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
+	if (WARN_ON(!bridge_device))
+		return;
+
+	bridge_device->ops->vxlan_leave(bridge_device, vxlan_dev);
+}
+
 static void
 mlxsw_sp_fdb_call_notifiers(enum switchdev_notifier_type type,
 			    const char *mac, u16 vid,
-- 
2.17.2

* [PATCH net-next v2 18/18] mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation
From: Ido Schimmel @ 2018-10-17  8:53 UTC
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata,
	jakub.kicinski@netronome.com, ivecera@redhat.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	andrew@lunn.ch, vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, stephen@networkplumber.org,
	bridge@lists.linux-foundation.org, mlxsw, Ido Schimmel
In-Reply-To: <20181017085215.26607-1-idosch@mellanox.com>

In the device, VxLAN encapsulation takes place in the FDB table where
certain {MAC, FID} entries are programmed with an underlay unicast IP.
Packets whose destination MAC is not programmed in the FDB are flooded
to the relevant local ports and also to a list of underlay unicast IPs
that are programmed using the all-zeros MAC address in the VxLAN driver.

One difference between the hardware and software data paths is the fact
that in the software data path there are two FDB lookups prior to the
encapsulation of the packet. The first is in the bridge's FDB table
using {MAC, VID} and the second is in the VxLAN's FDB table using
{MAC, VNI}.

Therefore, when a new VxLAN FDB entry is notified, it is only programmed
to the device if there is a corresponding entry in the bridge's FDB
table. Similarly, when a new bridge FDB entry pointing to the VxLAN
device is notified, it is only programmed to the device if there is a
corresponding entry in the VxLAN's FDB table.
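
As a minimal sketch of the rule above (not the driver code itself), the
decision can be modelled as a conjunction of the two software lookups.
The struct and function names here are hypothetical and exist only for
illustration:

```c
#include <stdbool.h>

/* Hypothetical summary of what the software data path knows about one MAC. */
struct fdb_state {
	bool in_bridge_fdb;	/* bridge FDB: {MAC, VID} points at the VxLAN device */
	bool in_vxlan_fdb;	/* VxLAN FDB: {MAC, VNI} has a unicast remote IP */
};

/*
 * The device has a single FDB table, so a tunnel entry is only worth
 * programming when both software lookups would succeed.
 */
bool should_offload_tunnel_entry(const struct fdb_state *s)
{
	return s->in_bridge_fdb && s->in_vxlan_fdb;
}
```

Whichever notification arrives second is the one that finds both entries
present and triggers the programming.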

Note that the above scheme will result in a discrepancy between both
data paths if only one FDB table is populated in the software data path.
For example, if only the bridge's FDB is populated with an entry
pointing to a VxLAN device, then a packet hitting the entry will only be
flooded by the kernel to remote VTEPs, whereas the device will also
flood the packet to other local ports that are members of the VLAN.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
---
 .../mellanox/mlxsw/spectrum_switchdev.c       | 406 +++++++++++++++++-
 1 file changed, 405 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index bab7712e1721..bc60d7a8b49d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -92,6 +92,11 @@ struct mlxsw_sp_bridge_ops {
 	struct mlxsw_sp_fid *
 		(*fid_get)(struct mlxsw_sp_bridge_device *bridge_device,
 			   u16 vid);
+	struct mlxsw_sp_fid *
+		(*fid_lookup)(struct mlxsw_sp_bridge_device *bridge_device,
+			      u16 vid);
+	u16 (*fid_vid)(struct mlxsw_sp_bridge_device *bridge_device,
+		       const struct mlxsw_sp_fid *fid);
 };
 
 static int
@@ -1242,6 +1247,51 @@ static enum mlxsw_reg_sfd_op mlxsw_sp_sfd_op(bool adding)
 			MLXSW_REG_SFD_OP_WRITE_REMOVE;
 }
 
+static int mlxsw_sp_port_fdb_tunnel_uc_op(struct mlxsw_sp *mlxsw_sp,
+					  const char *mac, u16 fid,
+					  enum mlxsw_sp_l3proto proto,
+					  const union mlxsw_sp_l3addr *addr,
+					  bool adding, bool dynamic)
+{
+	enum mlxsw_reg_sfd_uc_tunnel_protocol sfd_proto;
+	char *sfd_pl;
+	u8 num_rec;
+	u32 uip;
+	int err;
+
+	switch (proto) {
+	case MLXSW_SP_L3_PROTO_IPV4:
+		uip = be32_to_cpu(addr->addr4);
+		sfd_proto = MLXSW_REG_SFD_UC_TUNNEL_PROTOCOL_IPV4;
+		break;
+	case MLXSW_SP_L3_PROTO_IPV6: /* fall through */
+	default:
+		WARN_ON(1);
+		return -EOPNOTSUPP;
+	}
+
+	sfd_pl = kmalloc(MLXSW_REG_SFD_LEN, GFP_KERNEL);
+	if (!sfd_pl)
+		return -ENOMEM;
+
+	mlxsw_reg_sfd_pack(sfd_pl, mlxsw_sp_sfd_op(adding), 0);
+	mlxsw_reg_sfd_uc_tunnel_pack(sfd_pl, 0,
+				     mlxsw_sp_sfd_rec_policy(dynamic), mac, fid,
+				     MLXSW_REG_SFD_REC_ACTION_NOP, uip,
+				     sfd_proto);
+	num_rec = mlxsw_reg_sfd_num_rec_get(sfd_pl);
+	err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfd), sfd_pl);
+	if (err)
+		goto out;
+
+	if (num_rec != mlxsw_reg_sfd_num_rec_get(sfd_pl))
+		err = -EBUSY;
+
+out:
+	kfree(sfd_pl);
+	return err;
+}
+
 static int __mlxsw_sp_port_fdb_uc_op(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 				     const char *mac, u16 fid, bool adding,
 				     enum mlxsw_reg_sfd_rec_action action,
@@ -1979,12 +2029,29 @@ mlxsw_sp_bridge_8021q_fid_get(struct mlxsw_sp_bridge_device *bridge_device,
 	return mlxsw_sp_fid_8021q_get(mlxsw_sp, vid);
 }
 
+static struct mlxsw_sp_fid *
+mlxsw_sp_bridge_8021q_fid_lookup(struct mlxsw_sp_bridge_device *bridge_device,
+				 u16 vid)
+{
+	WARN_ON(1);
+	return NULL;
+}
+
+static u16
+mlxsw_sp_bridge_8021q_fid_vid(struct mlxsw_sp_bridge_device *bridge_device,
+			      const struct mlxsw_sp_fid *fid)
+{
+	return mlxsw_sp_fid_8021q_vid(fid);
+}
+
 static const struct mlxsw_sp_bridge_ops mlxsw_sp_bridge_8021q_ops = {
 	.port_join	= mlxsw_sp_bridge_8021q_port_join,
 	.port_leave	= mlxsw_sp_bridge_8021q_port_leave,
 	.vxlan_join	= mlxsw_sp_bridge_8021q_vxlan_join,
 	.vxlan_leave	= mlxsw_sp_bridge_8021q_vxlan_leave,
 	.fid_get	= mlxsw_sp_bridge_8021q_fid_get,
+	.fid_lookup	= mlxsw_sp_bridge_8021q_fid_lookup,
+	.fid_vid	= mlxsw_sp_bridge_8021q_fid_vid,
 };
 
 static bool
@@ -2140,12 +2207,34 @@ mlxsw_sp_bridge_8021d_fid_get(struct mlxsw_sp_bridge_device *bridge_device,
 	return ERR_PTR(err);
 }
 
+static struct mlxsw_sp_fid *
+mlxsw_sp_bridge_8021d_fid_lookup(struct mlxsw_sp_bridge_device *bridge_device,
+				 u16 vid)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_device->dev);
+
+	/* The only valid VLAN for a VLAN-unaware bridge is 0 */
+	if (vid)
+		return NULL;
+
+	return mlxsw_sp_fid_8021d_lookup(mlxsw_sp, bridge_device->dev->ifindex);
+}
+
+static u16
+mlxsw_sp_bridge_8021d_fid_vid(struct mlxsw_sp_bridge_device *bridge_device,
+			      const struct mlxsw_sp_fid *fid)
+{
+	return 0;
+}
+
 static const struct mlxsw_sp_bridge_ops mlxsw_sp_bridge_8021d_ops = {
 	.port_join	= mlxsw_sp_bridge_8021d_port_join,
 	.port_leave	= mlxsw_sp_bridge_8021d_port_leave,
 	.vxlan_join	= mlxsw_sp_bridge_8021d_vxlan_join,
 	.vxlan_leave	= mlxsw_sp_bridge_8021d_vxlan_leave,
 	.fid_get	= mlxsw_sp_bridge_8021d_fid_get,
+	.fid_lookup	= mlxsw_sp_bridge_8021d_fid_lookup,
+	.fid_vid	= mlxsw_sp_bridge_8021d_fid_vid,
 };
 
 int mlxsw_sp_port_bridge_join(struct mlxsw_sp_port *mlxsw_sp_port,
@@ -2419,11 +2508,126 @@ static void mlxsw_sp_fdb_notify_work(struct work_struct *work)
 
 struct mlxsw_sp_switchdev_event_work {
 	struct work_struct work;
-	struct switchdev_notifier_fdb_info fdb_info;
+	union {
+		struct switchdev_notifier_fdb_info fdb_info;
+		struct switchdev_notifier_vxlan_fdb_info vxlan_fdb_info;
+	};
 	struct net_device *dev;
 	unsigned long event;
 };
 
+static void
+mlxsw_sp_switchdev_vxlan_addr_convert(const union vxlan_addr *vxlan_addr,
+				      enum mlxsw_sp_l3proto *proto,
+				      union mlxsw_sp_l3addr *addr)
+{
+	if (vxlan_addr->sa.sa_family == AF_INET) {
+		addr->addr4 = vxlan_addr->sin.sin_addr.s_addr;
+		*proto = MLXSW_SP_L3_PROTO_IPV4;
+	} else {
+		addr->addr6 = vxlan_addr->sin6.sin6_addr;
+		*proto = MLXSW_SP_L3_PROTO_IPV6;
+	}
+}
+
+static void
+mlxsw_sp_switchdev_bridge_vxlan_fdb_event(struct mlxsw_sp *mlxsw_sp,
+					  struct mlxsw_sp_switchdev_event_work *
+					  switchdev_work,
+					  struct mlxsw_sp_fid *fid, __be32 vni)
+{
+	struct switchdev_notifier_vxlan_fdb_info vxlan_fdb_info;
+	struct switchdev_notifier_fdb_info *fdb_info;
+	struct net_device *dev = switchdev_work->dev;
+	enum mlxsw_sp_l3proto proto;
+	union mlxsw_sp_l3addr addr;
+	int err;
+
+	fdb_info = &switchdev_work->fdb_info;
+	err = vxlan_fdb_find_uc(dev, fdb_info->addr, vni, &vxlan_fdb_info);
+	if (err)
+		return;
+
+	mlxsw_sp_switchdev_vxlan_addr_convert(&vxlan_fdb_info.remote_ip,
+					      &proto, &addr);
+
+	switch (switchdev_work->event) {
+	case SWITCHDEV_FDB_ADD_TO_DEVICE:
+		err = mlxsw_sp_port_fdb_tunnel_uc_op(mlxsw_sp,
+						     vxlan_fdb_info.eth_addr,
+						     mlxsw_sp_fid_index(fid),
+						     proto, &addr, true, false);
+		if (err)
+			return;
+		vxlan_fdb_info.offloaded = true;
+		call_switchdev_notifiers(SWITCHDEV_VXLAN_FDB_OFFLOADED, dev,
+					 &vxlan_fdb_info.info);
+		mlxsw_sp_fdb_call_notifiers(SWITCHDEV_FDB_OFFLOADED,
+					    vxlan_fdb_info.eth_addr,
+					    fdb_info->vid, dev, true);
+		break;
+	case SWITCHDEV_FDB_DEL_TO_DEVICE:
+		err = mlxsw_sp_port_fdb_tunnel_uc_op(mlxsw_sp,
+						     vxlan_fdb_info.eth_addr,
+						     mlxsw_sp_fid_index(fid),
+						     proto, &addr, false,
+						     false);
+		vxlan_fdb_info.offloaded = false;
+		call_switchdev_notifiers(SWITCHDEV_VXLAN_FDB_OFFLOADED, dev,
+					 &vxlan_fdb_info.info);
+		break;
+	}
+}
+
+static void
+mlxsw_sp_switchdev_bridge_nve_fdb_event(struct mlxsw_sp_switchdev_event_work *
+					switchdev_work)
+{
+	struct mlxsw_sp_bridge_device *bridge_device;
+	struct net_device *dev = switchdev_work->dev;
+	struct net_device *br_dev;
+	struct mlxsw_sp *mlxsw_sp;
+	struct mlxsw_sp_fid *fid;
+	__be32 vni;
+	int err;
+
+	if (switchdev_work->event != SWITCHDEV_FDB_ADD_TO_DEVICE &&
+	    switchdev_work->event != SWITCHDEV_FDB_DEL_TO_DEVICE)
+		return;
+
+	if (!switchdev_work->fdb_info.added_by_user)
+		return;
+
+	if (!netif_running(dev))
+		return;
+	br_dev = netdev_master_upper_dev_get(dev);
+	if (!br_dev)
+		return;
+	if (!netif_is_bridge_master(br_dev))
+		return;
+	mlxsw_sp = mlxsw_sp_lower_get(br_dev);
+	if (!mlxsw_sp)
+		return;
+	bridge_device = mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
+	if (!bridge_device)
+		return;
+
+	fid = bridge_device->ops->fid_lookup(bridge_device,
+					     switchdev_work->fdb_info.vid);
+	if (!fid)
+		return;
+
+	err = mlxsw_sp_fid_vni(fid, &vni);
+	if (err)
+		goto out;
+
+	mlxsw_sp_switchdev_bridge_vxlan_fdb_event(mlxsw_sp, switchdev_work, fid,
+						  vni);
+
+out:
+	mlxsw_sp_fid_put(fid);
+}
+
 static void mlxsw_sp_switchdev_bridge_fdb_event_work(struct work_struct *work)
 {
 	struct mlxsw_sp_switchdev_event_work *switchdev_work =
@@ -2434,6 +2638,11 @@ static void mlxsw_sp_switchdev_bridge_fdb_event_work(struct work_struct *work)
 	int err;
 
 	rtnl_lock();
+	if (netif_is_vxlan(dev)) {
+		mlxsw_sp_switchdev_bridge_nve_fdb_event(switchdev_work);
+		goto out;
+	}
+
 	mlxsw_sp_port = mlxsw_sp_port_dev_lower_find(dev);
 	if (!mlxsw_sp_port)
 		goto out;
@@ -2473,6 +2682,189 @@ static void mlxsw_sp_switchdev_bridge_fdb_event_work(struct work_struct *work)
 	dev_put(dev);
 }
 
+static void
+mlxsw_sp_switchdev_vxlan_fdb_add(struct mlxsw_sp *mlxsw_sp,
+				 struct mlxsw_sp_switchdev_event_work *
+				 switchdev_work)
+{
+	struct switchdev_notifier_vxlan_fdb_info *vxlan_fdb_info;
+	struct mlxsw_sp_bridge_device *bridge_device;
+	struct net_device *dev = switchdev_work->dev;
+	u8 all_zeros_mac[ETH_ALEN] = { 0 };
+	enum mlxsw_sp_l3proto proto;
+	union mlxsw_sp_l3addr addr;
+	struct net_device *br_dev;
+	struct mlxsw_sp_fid *fid;
+	u16 vid;
+	int err;
+
+	vxlan_fdb_info = &switchdev_work->vxlan_fdb_info;
+	br_dev = netdev_master_upper_dev_get(dev);
+
+	bridge_device = mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
+	if (!bridge_device)
+		return;
+
+	fid = mlxsw_sp_fid_lookup_by_vni(mlxsw_sp, vxlan_fdb_info->vni);
+	if (!fid)
+		return;
+
+	mlxsw_sp_switchdev_vxlan_addr_convert(&vxlan_fdb_info->remote_ip,
+					      &proto, &addr);
+
+	if (ether_addr_equal(vxlan_fdb_info->eth_addr, all_zeros_mac)) {
+		err = mlxsw_sp_nve_flood_ip_add(mlxsw_sp, fid, proto, &addr);
+		if (err) {
+			mlxsw_sp_fid_put(fid);
+			return;
+		}
+		vxlan_fdb_info->offloaded = true;
+		call_switchdev_notifiers(SWITCHDEV_VXLAN_FDB_OFFLOADED, dev,
+					 &vxlan_fdb_info->info);
+		mlxsw_sp_fid_put(fid);
+		return;
+	}
+
+	/* The device has a single FDB table, whereas Linux has two - one
+	 * in the bridge driver and another in the VxLAN driver. We only
+	 * program an entry to the device if the MAC points to the VxLAN
+	 * device in the bridge's FDB table
+	 */
+	vid = bridge_device->ops->fid_vid(bridge_device, fid);
+	if (br_fdb_find_port(br_dev, vxlan_fdb_info->eth_addr, vid) != dev)
+		goto err_br_fdb_find;
+
+	err = mlxsw_sp_port_fdb_tunnel_uc_op(mlxsw_sp, vxlan_fdb_info->eth_addr,
+					     mlxsw_sp_fid_index(fid), proto,
+					     &addr, true, false);
+	if (err)
+		goto err_fdb_tunnel_uc_op;
+	vxlan_fdb_info->offloaded = true;
+	call_switchdev_notifiers(SWITCHDEV_VXLAN_FDB_OFFLOADED, dev,
+				 &vxlan_fdb_info->info);
+	mlxsw_sp_fdb_call_notifiers(SWITCHDEV_FDB_OFFLOADED,
+				    vxlan_fdb_info->eth_addr, vid, dev, true);
+
+	mlxsw_sp_fid_put(fid);
+
+	return;
+
+err_fdb_tunnel_uc_op:
+err_br_fdb_find:
+	mlxsw_sp_fid_put(fid);
+}
+
+static void
+mlxsw_sp_switchdev_vxlan_fdb_del(struct mlxsw_sp *mlxsw_sp,
+				 struct mlxsw_sp_switchdev_event_work *
+				 switchdev_work)
+{
+	struct switchdev_notifier_vxlan_fdb_info *vxlan_fdb_info;
+	struct mlxsw_sp_bridge_device *bridge_device;
+	struct net_device *dev = switchdev_work->dev;
+	struct net_device *br_dev = netdev_master_upper_dev_get(dev);
+	u8 all_zeros_mac[ETH_ALEN] = { 0 };
+	enum mlxsw_sp_l3proto proto;
+	union mlxsw_sp_l3addr addr;
+	struct mlxsw_sp_fid *fid;
+	u16 vid;
+
+	vxlan_fdb_info = &switchdev_work->vxlan_fdb_info;
+
+	bridge_device = mlxsw_sp_bridge_device_find(mlxsw_sp->bridge, br_dev);
+	if (!bridge_device)
+		return;
+
+	fid = mlxsw_sp_fid_lookup_by_vni(mlxsw_sp, vxlan_fdb_info->vni);
+	if (!fid)
+		return;
+
+	mlxsw_sp_switchdev_vxlan_addr_convert(&vxlan_fdb_info->remote_ip,
+					      &proto, &addr);
+
+	if (ether_addr_equal(vxlan_fdb_info->eth_addr, all_zeros_mac)) {
+		mlxsw_sp_nve_flood_ip_del(mlxsw_sp, fid, proto, &addr);
+		mlxsw_sp_fid_put(fid);
+		return;
+	}
+
+	mlxsw_sp_port_fdb_tunnel_uc_op(mlxsw_sp, vxlan_fdb_info->eth_addr,
+				       mlxsw_sp_fid_index(fid), proto, &addr,
+				       false, false);
+	vid = bridge_device->ops->fid_vid(bridge_device, fid);
+	mlxsw_sp_fdb_call_notifiers(SWITCHDEV_FDB_OFFLOADED,
+				    vxlan_fdb_info->eth_addr, vid, dev, false);
+
+	mlxsw_sp_fid_put(fid);
+}
+
+static void mlxsw_sp_switchdev_vxlan_fdb_event_work(struct work_struct *work)
+{
+	struct mlxsw_sp_switchdev_event_work *switchdev_work =
+		container_of(work, struct mlxsw_sp_switchdev_event_work, work);
+	struct net_device *dev = switchdev_work->dev;
+	struct mlxsw_sp *mlxsw_sp;
+	struct net_device *br_dev;
+
+	rtnl_lock();
+
+	if (!netif_running(dev))
+		goto out;
+	br_dev = netdev_master_upper_dev_get(dev);
+	if (!br_dev)
+		goto out;
+	if (!netif_is_bridge_master(br_dev))
+		goto out;
+	mlxsw_sp = mlxsw_sp_lower_get(br_dev);
+	if (!mlxsw_sp)
+		goto out;
+
+	switch (switchdev_work->event) {
+	case SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE:
+		mlxsw_sp_switchdev_vxlan_fdb_add(mlxsw_sp, switchdev_work);
+		break;
+	case SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE:
+		mlxsw_sp_switchdev_vxlan_fdb_del(mlxsw_sp, switchdev_work);
+		break;
+	}
+
+out:
+	rtnl_unlock();
+	kfree(switchdev_work);
+	dev_put(dev);
+}
+
+static int
+mlxsw_sp_switchdev_vxlan_work_prepare(struct mlxsw_sp_switchdev_event_work *
+				      switchdev_work,
+				      struct switchdev_notifier_info *info)
+{
+	struct vxlan_dev *vxlan = netdev_priv(switchdev_work->dev);
+	struct switchdev_notifier_vxlan_fdb_info *vxlan_fdb_info;
+	struct vxlan_config *cfg = &vxlan->cfg;
+
+	vxlan_fdb_info = container_of(info,
+				      struct switchdev_notifier_vxlan_fdb_info,
+				      info);
+
+	if (vxlan_fdb_info->remote_port != cfg->dst_port)
+		return -EOPNOTSUPP;
+	if (vxlan_fdb_info->remote_vni != cfg->vni)
+		return -EOPNOTSUPP;
+	if (vxlan_fdb_info->vni != cfg->vni)
+		return -EOPNOTSUPP;
+	if (vxlan_fdb_info->remote_ifindex)
+		return -EOPNOTSUPP;
+	if (is_multicast_ether_addr(vxlan_fdb_info->eth_addr))
+		return -EOPNOTSUPP;
+	if (vxlan_addr_multicast(&vxlan_fdb_info->remote_ip))
+		return -EOPNOTSUPP;
+
+	switchdev_work->vxlan_fdb_info = *vxlan_fdb_info;
+
+	return 0;
+}
+
 /* Called under rcu_read_lock() */
 static int mlxsw_sp_switchdev_event(struct notifier_block *unused,
 				    unsigned long event, void *ptr)
@@ -2482,6 +2874,7 @@ static int mlxsw_sp_switchdev_event(struct notifier_block *unused,
 	struct switchdev_notifier_fdb_info *fdb_info;
 	struct switchdev_notifier_info *info = ptr;
 	struct net_device *br_dev;
+	int err;
 
 	/* Tunnel devices are not our uppers, so check their master instead */
 	br_dev = netdev_master_upper_dev_get_rcu(dev);
@@ -2522,6 +2915,16 @@ static int mlxsw_sp_switchdev_event(struct notifier_block *unused,
 		 */
 		dev_hold(dev);
 		break;
+	case SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE: /* fall through */
+	case SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE:
+		INIT_WORK(&switchdev_work->work,
+			  mlxsw_sp_switchdev_vxlan_fdb_event_work);
+		err = mlxsw_sp_switchdev_vxlan_work_prepare(switchdev_work,
+							    info);
+		if (err)
+			goto err_vxlan_work_prepare;
+		dev_hold(dev);
+		break;
 	default:
 		kfree(switchdev_work);
 		return NOTIFY_DONE;
@@ -2531,6 +2934,7 @@ static int mlxsw_sp_switchdev_event(struct notifier_block *unused,
 
 	return NOTIFY_DONE;
 
+err_vxlan_work_prepare:
 err_addr_alloc:
 	kfree(switchdev_work);
 	return NOTIFY_BAD;
-- 
2.17.2

^ permalink raw reply related

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-17  9:39 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BC70069.4000600@huawei.com>


On 2018/10/17 5:27 PM, jiangyiwen wrote:
> On 2018/10/15 14:12, jiangyiwen wrote:
>> On 2018/10/15 10:33, Jason Wang wrote:
>>>
>>> On 2018-10-15 09:43, jiangyiwen wrote:
>>>> Hi Stefan & All:
>>>>
>>>> I have found that vhost-vsock has two performance problems, even
>>>> though it was not designed for performance.
>>>>
>>>> First, I think vhost-vsock should be faster than vhost-net because
>>>> it has no TCP/IP stack, but in real tests vhost-net is 5~10 times
>>>> faster than vhost-vsock. I am currently looking for the reason.
>>> TCP/IP is not a must for vhost-net.
>>>
>>> How do you test and compare the performance?
>>>
>>> Thanks
>>>
>> I tested the performance using my own test tool, as follows:
>>
>> Server                   Client
>> socket()
>> bind()
>> listen()
>>
>>                           socket(AF_VSOCK) or socket(AF_INET)
>> Accept() <-------------->connect()
>>                           *======Start Record Time======*
>>                           Call syscall sendfile()
>> Recv()
>>                           Send end
>> Receive end
>> Send(file_size)
>>                           Recv(file_size)
>>                           *======End Record Time======*
>>
>> In the test results, vhost-vsock reaches about 500MB/s and vhost-net about 2500MB/s.
>>
>> By the way, vhost-net uses a single queue.
>>
>> Thanks.
>>
>>>> Second, vhost-vsock only supports two vqs (tx and rx), which means
>>>> that multiple sockets in the guest will use the same vq to transmit
>>>> messages and get responses. So if there are multiple applications
>>>> in the guest, we should support a "Multiqueue" feature for virtio-vsock.
>>>>
>>>> Stefan, have you encountered these problems?
>>>>
>>>> Thanks,
>>>> Yiwen.
>>>>
>>>
>>> .
>>>
>>
> Hi Jason and Stefan,
>
> I may have found the reason for the bad performance.
>
> I found that pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K),
> which limits the bandwidth to 500~600MB/s. Once I increase it to 64K,
> throughput improves about 3 times (~1500MB/s).


Looks like the value was chosen as a balance between rx buffer size and
performance. Always allocating 64K, even for small packets, is kind of
a waste and puts stress on guest memory. Virtio-net tries to avoid this
by inventing the mergeable rx buffer, which allows a big packet to be
scattered into different buffers. We can reuse this idea or revisit
the idea of using virtio-net/vhost-net as a transport of vsock.
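
To put rough numbers on that trade-off, here is a small sketch. The
helper names are made up for illustration and this is not virtio code;
it only does the arithmetic for fixed-size buffers versus spreading a
packet over several small ones:

```c
#include <stddef.h>

/* Number of fixed-size rx buffers a packet of `len` bytes would span. */
size_t rx_bufs_needed(size_t len, size_t buf_size)
{
	return (len + buf_size - 1) / buf_size;
}

/* Bytes left unused in the buffers that hold a packet of `len` bytes. */
size_t rx_bytes_wasted(size_t len, size_t buf_size)
{
	return rx_bufs_needed(len, buf_size) * buf_size - len;
}
```

With 4K buffers a 64K packet spans 16 buffers, while a 100-byte packet
leaves 3996 bytes unused in a 4K buffer versus 65436 bytes in a 64K one.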

What is interesting is that the performance is still behind vhost-net.

Thanks

>
> By the way, I send 64K at once from the application, and instead of
> using sg_init_one I rewrote the function to pack an sg list, because
> pkt_len spans multiple pages.
>
> Thanks,
> Yiwen.
>
>> _______________________________________________
>> Virtualization mailing list
>> Virtualization@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>>
>

^ permalink raw reply

* [PATCH net] udp6: fix encap return code for resubmitting
From: Paolo Abeni @ 2018-10-17  9:44 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller

The commit eb63f2964dbe ("udp6: add missing checks on edumux packet
processing") used the same return code convention as the ipv4 counterpart,
but ipv6 uses the opposite one: positive values mean resubmit.

This change addresses the issue, using a positive return value for
resubmitting. Also update the related comment, which was broken, too.
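
For illustration only, the two conventions can be sketched with a few
hypothetical helpers (none of these names exist in the kernel). They
model how the caller interprets the handler's return value before and
after this change:

```c
/* ipv4 convention as described above: a negative value asks for a resubmit. */
int ipv4_wants_resubmit(int handler_ret)
{
	return handler_ret < 0;
}

/* ipv6 convention: a positive value asks for a resubmit. */
int ipv6_wants_resubmit(int handler_ret)
{
	return handler_ret > 0;
}

/* Tail of the ipv6 unicast path before the fix: positive values were negated. */
int udp6_ret_before_fix(int queue_ret)
{
	return queue_ret > 0 ? -queue_ret : 0;
}

/* Tail of the ipv6 unicast path after the fix: positive values pass through. */
int udp6_ret_after_fix(int queue_ret)
{
	return queue_ret > 0 ? queue_ret : 0;
}
```

With the old code a resubmit request from the encap handler was turned
into a negative value, which the ipv6 input path does not treat as a
resubmit.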

Fixes: eb63f2964dbe ("udp6: add missing checks on edumux packet processing")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: I could not find any in-kernel udp6 encap using the above
feature, which would explain why nobody has complained so far...
---
 net/ipv6/udp.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 28c4aa5078fc..b36694b6716e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -766,11 +766,9 @@ static int udp6_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
 
 	ret = udpv6_queue_rcv_skb(sk, skb);
 
-	/* a return value > 0 means to resubmit the input, but
-	 * it wants the return to be -protocol, or 0
-	 */
+	/* a return value > 0 means to resubmit the input */
 	if (ret > 0)
-		return -ret;
+		return ret;
 	return 0;
 }
 
-- 
2.17.2

^ permalink raw reply related

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-17  9:51 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <d16d2052-bfb1-7861-e210-b53b4ea3260c@redhat.com>


On 2018/10/17 5:39 PM, Jason Wang wrote:
>>>
>> Hi Jason and Stefan,
>>
>> I may have found the reason for the bad performance.
>>
>> I found that pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K),
>> which limits the bandwidth to 500~600MB/s. Once I increase it to 64K,
>> throughput improves about 3 times (~1500MB/s).
>
>
> Looks like the value was chosen as a balance between rx buffer size
> and performance. Always allocating 64K, even for small packets, is
> kind of a waste and puts stress on guest memory. Virtio-net tries to
> avoid this by inventing the mergeable rx buffer, which allows a big
> packet to be scattered into different buffers. We can reuse this idea
> or revisit the idea of using virtio-net/vhost-net as a transport of
> vsock.
>
> What is interesting is that the performance is still behind vhost-net.
>
> Thanks
>
>>
>> By the way, I send 64K at once from the application, and instead of
>> using sg_init_one I rewrote the function to pack an sg list, because
>> pkt_len spans multiple pages.
>>
>> Thanks,
>> Yiwen. 


Btw, if you're using vsock for transferring large files, maybe it's more
efficient to implement sendpage() for vsock to allow sendfile()/splice()
to work.

Thanks

^ permalink raw reply

* Re: [PATCH net-next] net: ena: Fix Kconfig dependencies X86
From: Belgazal, Netanel @ 2018-10-17 10:01 UTC (permalink / raw)
  To: Sergei Shtylyov, davem@davemloft.net, netdev@vger.kernel.org
  Cc: Kiyanovski, Arthur, Saidi, Ali, Woodhouse, David,
	Machulsky, Zorik, Matushevsky, Alexander, Bshara, Saeed,
	Wilson, Matt, Liguori, Anthony, Bshara, Nafea, Tzalik, Guy
In-Reply-To: <155c40da-faf6-ad9d-4d0b-f64d67217160@cogentembedded.com>

Sure.
I will remove them and resubmit.

On 10/17/18, 11:37 AM, "Sergei Shtylyov" <sergei.shtylyov@cogentembedded.com> wrote:

    Hello!
    
    On 17.10.2018 11:16, netanel@amazon.com wrote:
    
    > From: Netanel Belgazal <netanel@amazon.com>
    >
    > The Kconfig limitation of X86 is too wide.
    > The ENA driver only requires a little endian dependency.
    >
    > Change the dependency to be on little endian CPU.
    >
    > Signed-off-by: Netanel Belgazal <netanel@amazon.com>
    > ---
    >  drivers/net/ethernet/amazon/Kconfig | 2 +-
    >  1 file changed, 1 insertion(+), 1 deletion(-)
    >
    > diff --git a/drivers/net/ethernet/amazon/Kconfig b/drivers/net/ethernet/amazon/Kconfig
    > index 99b30353541a..f4d16c7e104f 100644
    > --- a/drivers/net/ethernet/amazon/Kconfig
    > +++ b/drivers/net/ethernet/amazon/Kconfig
    > @@ -17,7 +17,7 @@ if NET_VENDOR_AMAZON
    >
    >  config ENA_ETHERNET
    >  	tristate "Elastic Network Adapter (ENA) support"
    > -	depends on (PCI_MSI && X86)
    > +	depends on (PCI_MSI && !CPU_BIG_ENDIAN)
    
         Parens not needed here. High time to remove them, I think.
    
    [...]
    
    MBR, Sergei
    
    
    


^ permalink raw reply

* [PATCH V2 net-next] net: ena: Fix Kconfig dependency on X86
From: netanel @ 2018-10-17 10:04 UTC (permalink / raw)
  To: davem, netdev
  Cc: akiyano, alisaidi, Netanel Belgazal, dwmw, zorik, matua, saeedb,
	msw, aliguori, nafea, gtzalik

From: Netanel Belgazal <netanel@amazon.com>

The Kconfig limitation of X86 is too wide.
The ENA driver only requires a little endian dependency.

Change the dependency to be on little endian CPU.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
 drivers/net/ethernet/amazon/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/Kconfig b/drivers/net/ethernet/amazon/Kconfig
index 99b30353541a..9e87d7b8360f 100644
--- a/drivers/net/ethernet/amazon/Kconfig
+++ b/drivers/net/ethernet/amazon/Kconfig
@@ -17,7 +17,7 @@ if NET_VENDOR_AMAZON
 
 config ENA_ETHERNET
 	tristate "Elastic Network Adapter (ENA) support"
-	depends on (PCI_MSI && X86)
+	depends on PCI_MSI && !CPU_BIG_ENDIAN
 	---help---
 	  This driver supports Elastic Network Adapter (ENA)"
 
-- 
2.15.2.AMZN

^ permalink raw reply related

* [PATCH] atm: eni: Move semicolon to a new line after empty for loop
From: Nathan Chancellor @ 2018-10-17 18:03 UTC (permalink / raw)
  To: Chas Williams; +Cc: linux-atm-general, netdev, linux-kernel, Nathan Chancellor

Clang warns:

drivers/atm/eni.c:244:48: error: for loop has empty body
[-Werror,-Wempty-body]
        for (order = 0; (1 << order) < *size; order++);
                                                      ^
drivers/atm/eni.c:244:48: note: put the semicolon on a separate line to
silence this warning

In this case, that loop is expected to be empty so silence the warning
in the way that Clang suggests.
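
As a standalone sketch of the same idiom (the function name is
hypothetical, and 1UL is used here instead of the driver's plain 1 so
that the shift is done in unsigned long):

```c
/*
 * Smallest order such that (1UL << order) >= size. All of the work is
 * done in the loop header, so the body is intentionally empty.
 */
int order_for_size(unsigned long size)
{
	int order;

	for (order = 0; (1UL << order) < size; order++)
		;	/* empty on purpose; semicolon on its own line */

	return order;
}
```

Placing the semicolon on its own line is what tells Clang that the empty
body is deliberate.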

Link: https://github.com/ClangBuiltLinux/linux/issues/42
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/atm/eni.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index 6470e3c4c990..f8c703426c90 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -241,7 +241,8 @@ static void __iomem *eni_alloc_mem(struct eni_dev *eni_dev, unsigned long *size)
 	len = eni_dev->free_len;
 	if (*size < MID_MIN_BUF_SIZE) *size = MID_MIN_BUF_SIZE;
 	if (*size > MID_MAX_BUF_SIZE) return NULL;
-	for (order = 0; (1 << order) < *size; order++);
+	for (order = 0; (1 << order) < *size; order++)
+		;
 	DPRINTK("trying: %ld->%d\n",*size,order);
 	best_order = 65; /* we don't have more than 2^64 of anything ... */
 	index = 0; /* silence GCC */
-- 
2.19.1

^ permalink raw reply related

* [PATCH] atm: zatm: Fix empty body Clang warnings
From: Nathan Chancellor @ 2018-10-17 18:04 UTC (permalink / raw)
  To: Chas Williams; +Cc: linux-atm-general, netdev, linux-kernel, Nathan Chancellor

Clang warns:

drivers/atm/zatm.c:513:7: error: while loop has empty body
[-Werror,-Wempty-body]
        zwait;
             ^
drivers/atm/zatm.c:513:7: note: put the semicolon on a separate line to
silence this warning

Get rid of this warning by using an empty do-while loop. While we're at
it, add parentheses to make it clear that this is a function-like macro.
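
A minimal userspace sketch of the pattern, with a hypothetical counter
standing in for the hardware busy bit (none of these names come from
the driver):

```c
static int busy_polls_left;	/* stand-in for a hardware busy flag */
static int polls;		/* how many times the flag was read */

/* Reports "busy" until the stand-in counter runs out. */
static int device_busy(void)
{
	polls++;
	if (busy_polls_left > 0) {
		busy_polls_left--;
		return 1;
	}
	return 0;
}

/*
 * Function-like macro: the explicit empty block makes the busy-wait
 * obvious and leaves no dangling `while (...);` for Clang to flag.
 */
#define wait_until_idle() do {} while (device_busy())

/* Spins until the device is idle and returns how many reads that took. */
int poll_until_idle(int initial_busy_polls)
{
	busy_polls_left = initial_busy_polls;
	polls = 0;
	wait_until_idle();
	return polls;
}
```

Because the body is empty, `do {} while (cond)` reads the condition
exactly as often as `while (cond);` would, so the behaviour should be
unchanged.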

Link: https://github.com/ClangBuiltLinux/linux/issues/42
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/atm/zatm.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/atm/zatm.c b/drivers/atm/zatm.c
index e89146ddede6..d5c76b50d357 100644
--- a/drivers/atm/zatm.c
+++ b/drivers/atm/zatm.c
@@ -126,7 +126,7 @@ static unsigned long dummy[2] = {0,0};
 #define zin_n(r) inl(zatm_dev->base+r*4)
 #define zin(r) inl(zatm_dev->base+uPD98401_##r*4)
 #define zout(v,r) outl(v,zatm_dev->base+uPD98401_##r*4)
-#define zwait while (zin(CMR) & uPD98401_BUSY)
+#define zwait() do {} while (zin(CMR) & uPD98401_BUSY)
 
 /* RX0, RX1, TX0, TX1 */
 static const int mbx_entries[NR_MBX] = { 1024,1024,1024,1024 };
@@ -140,7 +140,7 @@ static const int mbx_esize[NR_MBX] = { 16,16,4,4 }; /* entry size in bytes */
 
 static void zpokel(struct zatm_dev *zatm_dev,u32 value,u32 addr)
 {
-	zwait;
+	zwait();
 	zout(value,CER);
 	zout(uPD98401_IND_ACC | uPD98401_IA_BALL |
 	    (uPD98401_IA_TGT_CM << uPD98401_IA_TGT_SHIFT) | addr,CMR);
@@ -149,10 +149,10 @@ static void zpokel(struct zatm_dev *zatm_dev,u32 value,u32 addr)
 
 static u32 zpeekl(struct zatm_dev *zatm_dev,u32 addr)
 {
-	zwait;
+	zwait();
 	zout(uPD98401_IND_ACC | uPD98401_IA_BALL | uPD98401_IA_RW |
 	  (uPD98401_IA_TGT_CM << uPD98401_IA_TGT_SHIFT) | addr,CMR);
-	zwait;
+	zwait();
 	return zin(CER);
 }
 
@@ -241,7 +241,7 @@ static void refill_pool(struct atm_dev *dev,int pool)
 	}
 	if (first) {
 		spin_lock_irqsave(&zatm_dev->lock, flags);
-		zwait;
+		zwait();
 		zout(virt_to_bus(first),CER);
 		zout(uPD98401_ADD_BAT | (pool << uPD98401_POOL_SHIFT) | count,
 		    CMR);
@@ -508,9 +508,9 @@ static int open_rx_first(struct atm_vcc *vcc)
 	}
 	if (zatm_vcc->pool < 0) return -EMSGSIZE;
 	spin_lock_irqsave(&zatm_dev->lock, flags);
-	zwait;
+	zwait();
 	zout(uPD98401_OPEN_CHAN,CMR);
-	zwait;
+	zwait();
 	DPRINTK("0x%x 0x%x\n",zin(CMR),zin(CER));
 	chan = (zin(CMR) & uPD98401_CHAN_ADDR) >> uPD98401_CHAN_ADDR_SHIFT;
 	spin_unlock_irqrestore(&zatm_dev->lock, flags);
@@ -571,21 +571,21 @@ static void close_rx(struct atm_vcc *vcc)
 		pos = vcc->vci >> 1;
 		shift = (1-(vcc->vci & 1)) << 4;
 		zpokel(zatm_dev,zpeekl(zatm_dev,pos) & ~(0xffff << shift),pos);
-		zwait;
+		zwait();
 		zout(uPD98401_NOP,CMR);
-		zwait;
+		zwait();
 		zout(uPD98401_NOP,CMR);
 		spin_unlock_irqrestore(&zatm_dev->lock, flags);
 	}
 	spin_lock_irqsave(&zatm_dev->lock, flags);
-	zwait;
+	zwait();
 	zout(uPD98401_DEACT_CHAN | uPD98401_CHAN_RT | (zatm_vcc->rx_chan <<
 	    uPD98401_CHAN_ADDR_SHIFT),CMR);
-	zwait;
+	zwait();
 	udelay(10); /* why oh why ... ? */
 	zout(uPD98401_CLOSE_CHAN | uPD98401_CHAN_RT | (zatm_vcc->rx_chan <<
 	    uPD98401_CHAN_ADDR_SHIFT),CMR);
-	zwait;
+	zwait();
 	if (!(zin(CMR) & uPD98401_CHAN_ADDR))
 		printk(KERN_CRIT DEV_LABEL "(itf %d): can't close RX channel "
 		    "%d\n",vcc->dev->number,zatm_vcc->rx_chan);
@@ -699,7 +699,7 @@ printk("NONONONOO!!!!\n");
 	skb_queue_tail(&zatm_vcc->tx_queue,skb);
 	DPRINTK("QRP=0x%08lx\n",zpeekl(zatm_dev,zatm_vcc->tx_chan*VC_SIZE/4+
 	  uPD98401_TXVC_QRP));
-	zwait;
+	zwait();
 	zout(uPD98401_TX_READY | (zatm_vcc->tx_chan <<
 	    uPD98401_CHAN_ADDR_SHIFT),CMR);
 	spin_unlock_irqrestore(&zatm_dev->lock, flags);
@@ -891,12 +891,12 @@ static void close_tx(struct atm_vcc *vcc)
 	}
 	spin_lock_irqsave(&zatm_dev->lock, flags);
 #if 0
-	zwait;
+	zwait();
 	zout(uPD98401_DEACT_CHAN | (chan << uPD98401_CHAN_ADDR_SHIFT),CMR);
 #endif
-	zwait;
+	zwait();
 	zout(uPD98401_CLOSE_CHAN | (chan << uPD98401_CHAN_ADDR_SHIFT),CMR);
-	zwait;
+	zwait();
 	if (!(zin(CMR) & uPD98401_CHAN_ADDR))
 		printk(KERN_CRIT DEV_LABEL "(itf %d): can't close TX channel "
 		    "%d\n",vcc->dev->number,chan);
@@ -926,9 +926,9 @@ static int open_tx_first(struct atm_vcc *vcc)
 	zatm_vcc->tx_chan = 0;
 	if (vcc->qos.txtp.traffic_class == ATM_NONE) return 0;
 	spin_lock_irqsave(&zatm_dev->lock, flags);
-	zwait;
+	zwait();
 	zout(uPD98401_OPEN_CHAN,CMR);
-	zwait;
+	zwait();
 	DPRINTK("0x%x 0x%x\n",zin(CMR),zin(CER));
 	chan = (zin(CMR) & uPD98401_CHAN_ADDR) >> uPD98401_CHAN_ADDR_SHIFT;
 	spin_unlock_irqrestore(&zatm_dev->lock, flags);
@@ -1557,7 +1557,7 @@ static void zatm_phy_put(struct atm_dev *dev,unsigned char value,
 	struct zatm_dev *zatm_dev;
 
 	zatm_dev = ZATM_DEV(dev);
-	zwait;
+	zwait();
 	zout(value,CER);
 	zout(uPD98401_IND_ACC | uPD98401_IA_B0 |
 	    (uPD98401_IA_TGT_PHY << uPD98401_IA_TGT_SHIFT) | addr,CMR);
@@ -1569,10 +1569,10 @@ static unsigned char zatm_phy_get(struct atm_dev *dev,unsigned long addr)
 	struct zatm_dev *zatm_dev;
 
 	zatm_dev = ZATM_DEV(dev);
-	zwait;
+	zwait();
 	zout(uPD98401_IND_ACC | uPD98401_IA_B0 | uPD98401_IA_RW |
 	  (uPD98401_IA_TGT_PHY << uPD98401_IA_TGT_SHIFT) | addr,CMR);
-	zwait;
+	zwait();
 	return zin(CER) & 0xff;
 }
 
-- 
2.19.1

^ permalink raw reply related

* [PATCH] isdn: hfc_{pci,sx}: Avoid empty body if statements and use proper register accessors
From: Nathan Chancellor @ 2018-10-17 18:06 UTC (permalink / raw)
  To: Karsten Keil; +Cc: netdev, linux-kernel, Nathan Chancellor

Clang warns:

drivers/isdn/hisax/hfc_pci.c:131:34: error: if statement has empty body
[-Werror,-Wempty-body]
        if (Read_hfc(cs, HFCPCI_INT_S1));
                                        ^
drivers/isdn/hisax/hfc_pci.c:131:34: note: put the semicolon on a
separate line to silence this warning

Use the format found in drivers/isdn/hardware/mISDN/hfcpci.c of casting
the return of Read_hfc to void, instead of using an empty if statement.

While we're at it, Masahiro Yamada pointed out that {Read,Write}_hfc
should be using a standard access method in hfc_pci.h. Use the one found
in drivers/isdn/hardware/mISDN/hfc_pci.h.

Link: https://github.com/ClangBuiltLinux/linux/issues/66
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/isdn/hisax/hfc_pci.c | 6 +++---
 drivers/isdn/hisax/hfc_pci.h | 4 ++--
 drivers/isdn/hisax/hfc_sx.c  | 6 +++---
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/isdn/hisax/hfc_pci.c b/drivers/isdn/hisax/hfc_pci.c
index 8e5b03161b2f..a63b9155b697 100644
--- a/drivers/isdn/hisax/hfc_pci.c
+++ b/drivers/isdn/hisax/hfc_pci.c
@@ -128,7 +128,7 @@ reset_hfcpci(struct IsdnCardState *cs)
 	Write_hfc(cs, HFCPCI_INT_M1, cs->hw.hfcpci.int_m1);
 
 	/* Clear already pending ints */
-	if (Read_hfc(cs, HFCPCI_INT_S1));
+	(void) Read_hfc(cs, HFCPCI_INT_S1);
 
 	Write_hfc(cs, HFCPCI_STATES, HFCPCI_LOAD_STATE | 2);	/* HFC ST 2 */
 	udelay(10);
@@ -158,7 +158,7 @@ reset_hfcpci(struct IsdnCardState *cs)
 	/* Finally enable IRQ output */
 	cs->hw.hfcpci.int_m2 = HFCPCI_IRQ_ENABLE;
 	Write_hfc(cs, HFCPCI_INT_M2, cs->hw.hfcpci.int_m2);
-	if (Read_hfc(cs, HFCPCI_INT_S1));
+	(void) Read_hfc(cs, HFCPCI_INT_S1);
 }
 
 /***************************************************/
@@ -1537,7 +1537,7 @@ hfcpci_bh(struct work_struct *work)
 					cs->hw.hfcpci.int_m1 &= ~HFCPCI_INTS_TIMER;
 					Write_hfc(cs, HFCPCI_INT_M1, cs->hw.hfcpci.int_m1);
 					/* Clear already pending ints */
-					if (Read_hfc(cs, HFCPCI_INT_S1));
+					(void) Read_hfc(cs, HFCPCI_INT_S1);
 					Write_hfc(cs, HFCPCI_STATES, 4 | HFCPCI_LOAD_STATE);
 					udelay(10);
 					Write_hfc(cs, HFCPCI_STATES, 4);
diff --git a/drivers/isdn/hisax/hfc_pci.h b/drivers/isdn/hisax/hfc_pci.h
index 4e58700a3e61..4c3b3ba35726 100644
--- a/drivers/isdn/hisax/hfc_pci.h
+++ b/drivers/isdn/hisax/hfc_pci.h
@@ -228,8 +228,8 @@ typedef union {
 } fifo_area;
 
 
-#define Write_hfc(a, b, c) (*(((u_char *)a->hw.hfcpci.pci_io) + b) = c)
-#define Read_hfc(a, b) (*(((u_char *)a->hw.hfcpci.pci_io) + b))
+#define Write_hfc(a, b, c) (writeb(c, (a->hw.hfcpci.pci_io) + b))
+#define Read_hfc(a, b) (readb((a->hw.hfcpci.pci_io) + b))
 
 extern void main_irq_hcpci(struct BCState *bcs);
 extern void releasehfcpci(struct IsdnCardState *cs);
diff --git a/drivers/isdn/hisax/hfc_sx.c b/drivers/isdn/hisax/hfc_sx.c
index 4d3b4b2f2612..c4f3f37adfc8 100644
--- a/drivers/isdn/hisax/hfc_sx.c
+++ b/drivers/isdn/hisax/hfc_sx.c
@@ -381,7 +381,7 @@ reset_hfcsx(struct IsdnCardState *cs)
 	Write_hfc(cs, HFCSX_INT_M1, cs->hw.hfcsx.int_m1);
 
 	/* Clear already pending ints */
-	if (Read_hfc(cs, HFCSX_INT_S1));
+	(void) Read_hfc(cs, HFCSX_INT_S1);
 
 	Write_hfc(cs, HFCSX_STATES, HFCSX_LOAD_STATE | 2);	/* HFC ST 2 */
 	udelay(10);
@@ -411,7 +411,7 @@ reset_hfcsx(struct IsdnCardState *cs)
 	/* Finally enable IRQ output */
 	cs->hw.hfcsx.int_m2 = HFCSX_IRQ_ENABLE;
 	Write_hfc(cs, HFCSX_INT_M2, cs->hw.hfcsx.int_m2);
-	if (Read_hfc(cs, HFCSX_INT_S2));
+	(void) Read_hfc(cs, HFCSX_INT_S2);
 }
 
 /***************************************************/
@@ -1288,7 +1288,7 @@ hfcsx_bh(struct work_struct *work)
 					cs->hw.hfcsx.int_m1 &= ~HFCSX_INTS_TIMER;
 					Write_hfc(cs, HFCSX_INT_M1, cs->hw.hfcsx.int_m1);
 					/* Clear already pending ints */
-					if (Read_hfc(cs, HFCSX_INT_S1));
+					(void) Read_hfc(cs, HFCSX_INT_S1);
 
 					Write_hfc(cs, HFCSX_STATES, 4 | HFCSX_LOAD_STATE);
 					udelay(10);
-- 
2.19.1
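
For readers unfamiliar with the idiom in the commit message, here is a
minimal user-space sketch of a register read performed only for its side
effect, with the result discarded through a (void) cast. The fake_dev
structure and both helpers are hypothetical stand-ins, not driver code;
the read-to-clear behaviour is an assumption taken from the driver's
"Clear already pending ints" comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device model: reading the status register returns the
 * pending bits and clears them (assumed read-to-clear semantics). */
struct fake_dev {
	uint8_t int_s1;		/* pending interrupt status bits */
	unsigned int reads;	/* number of status reads performed */
};

/* Stand-in for Read_hfc(cs, HFCPCI_INT_S1). */
static uint8_t fake_read_status(struct fake_dev *dev)
{
	uint8_t pending;

	assert(dev);
	pending = dev->int_s1;
	dev->int_s1 = 0;
	dev->reads++;
	return pending;
}

/* Only the clearing side effect is wanted, so the value is discarded
 * explicitly instead of through an empty "if (...);" statement. */
static void clear_pending(struct fake_dev *dev)
{
	(void) fake_read_status(dev);
}
```

The cast documents that ignoring the return value is intentional, which
is also what silences -Wempty-body here.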

^ permalink raw reply related

* Re: [PATCH net-next] ixgbe: fix XFRM_ALGO dependency
From: Jeff Kirsher @ 2018-10-17 18:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: shannon.nelson, David Miller, Steffen Klassert, Herbert Xu,
	Jesse Brandeburg, Björn Töpel, Alexander Duyck,
	intel-wired-lan, Networking, Linux Kernel Mailing List
In-Reply-To: <CAK8P3a3RiMHVkoA+Wp_abPJ5Fzwk5UdhoOaxt++q4_YnzDRwfA@mail.gmail.com>

On Wed, 2018-10-17 at 18:04 +0200, Arnd Bergmann wrote:
> On Wed, Oct 17, 2018 at 5:53 PM Jeff Kirsher
> <jeffrey.t.kirsher@intel.com> wrote:
> > On Tue, 2018-10-16 at 09:35 -0700, Shannon Nelson wrote:
> > > On 10/16/2018 3:03 AM, Arnd Bergmann wrote:
> > > > A separate Kconfig symbol now controls whether we include the ipsec
> > > > offload code. To keep the old behavior, this is left as 'default y'.
> > > > The dependency in XFRM_OFFLOAD still causes a circular dependency but
> > > > is not actually needed because this symbol is not user visible, so
> > > > removing that dependency on top makes it all work.
> > > > 
> > > > Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
> > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > 
> > I agree with Shannon's suggested changes.  Arnd, are you working on v2?
> > Or would you like me to take care of it?
> 
> I was planning to respin it, but didn't get around to it yet, and will
> be travelling for the next week, so I'd welcome if you can take over
> from here. Shannon's comments all make sense to me as well.

Ok, I will run with it and make a v2 for you.

> 
> > > > +config IXGBE_IPSEC
> > > > +   bool "IPSec XFRM cryptography-offload accelaration"
> > > > +   default n
> > > 
> > > remove this "default n" line?
> 
> I meant for this to say "default y", as I said in the changelog,
> but feel free to pick whichever default makes sense to you
> and make the description match ;-)


^ permalink raw reply

* Re: [PATCH bpf-next v2 00/13] bpf: add btf func info support
From: Edward Cree @ 2018-10-17 11:02 UTC (permalink / raw)
  To: Yonghong Song, ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017072315.2766920-1-yhs@fb.com>

I think the BTF work needs to be better documented; at the moment the only way
 to determine how BTF sections are structured is to read through the headers,
 and cross-reference with the DWARF spec to guess at the semantics of various
 fields.  I've been working on adding BTF support to ebpf_asm, and am finding
 the amount of guesswork required very frustrating.
Therefore please make sure that each patch extending the BTF format includes
 documentation patches describing both the layout and the semantics of the new
 extensions.  For example in patch #9 there is no explanation of
 btf_ext_header.line_info_off and btf_ext_header.line_info_len (they're not
 even used by the code, so one cannot reverse-engineer it); while it's fairly
 clear that they indicate the bounds of the line_info subsection, there is no
 specification of what this subsection contains.

-Ed

^ permalink raw reply

* Re: [PATCH bpf-next v2 13/13] tools/bpf: bpftool: add support for jited func types
From: Edward Cree @ 2018-10-17 11:11 UTC (permalink / raw)
  To: Yonghong Song, ast, kafai, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20181017072400.2768484-4-yhs@fb.com>

On 17/10/18 08:24, Yonghong Song wrote:
> This patch added support to print function signature
> if btf func_info is available. Note that ksym
> now uses function name instead of prog_name as
> prog_name has a limit of 16 bytes including
> ending '\0'.
>
> The following is a sample output for selftests
> test_btf with file test_btf_haskv.o:
>
>   $ bpftool prog dump jited id 1
>   int _dummy_tracepoint(struct dummy_tracepoint_args * ):
>   bpf_prog_b07ccb89267cf242__dummy_tracepoint:
>      0:   push   %rbp
>      1:   mov    %rsp,%rbp
>     ......
>     3c:   add    $0x28,%rbp
>     40:   leaveq
>     41:   retq
>
>   int test_long_fname_1(struct dummy_tracepoint_args * ):
>   bpf_prog_2dcecc18072623fc_test_long_fname_1:
>      0:   push   %rbp
>      1:   mov    %rsp,%rbp
>     ......
>     3a:   add    $0x28,%rbp
>     3e:   leaveq
>     3f:   retq
>
>   int test_long_fname_2(struct dummy_tracepoint_args * ):
>   bpf_prog_89d64e4abf0f0126_test_long_fname_2:
>      0:   push   %rbp
>      1:   mov    %rsp,%rbp
>     ......
>     80:   add    $0x28,%rbp
>     84:   leaveq
>     85:   retq
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  tools/bpf/bpftool/btf_dumper.c | 96 ++++++++++++++++++++++++++++++++++
>  tools/bpf/bpftool/main.h       |  2 +
>  tools/bpf/bpftool/prog.c       | 54 +++++++++++++++++++
>  3 files changed, 152 insertions(+)
>
> diff --git a/tools/bpf/bpftool/btf_dumper.c b/tools/bpf/bpftool/btf_dumper.c
> index 55bc512a1831..a31df4202335 100644
> --- a/tools/bpf/bpftool/btf_dumper.c
> +++ b/tools/bpf/bpftool/btf_dumper.c
> @@ -249,3 +249,99 @@ int btf_dumper_type(const struct btf_dumper *d, __u32 type_id,
>  {
>  	return btf_dumper_do_type(d, type_id, 0, data);
>  }
> +
> +#define BTF_PRINT_STRING(str)						\
> +	{								\
> +		pos += snprintf(func_sig + pos, size - pos, str);	\
> +		if (pos >= size)					\
> +			return -1;					\
> +	}
Usual kernel practice for this sort of macro is to use
    do { \
    } while(0)
 to ensure correct behaviour if the macro is used within another control
 flow statement, e.g.
    if (x)
        BTF_PRINT_STRING(x);
    else
        do_something_else();
 will not compile with the bare braces as the else will be detached.
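 A minimal, self-contained sketch of the idiom follows. APPEND_STR and
 describe() are hypothetical names used only for illustration; they are
 not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for BTF_PRINT_STRING: append str to buf at pos
 * and return -1 from the enclosing function on truncation. The
 * do/while (0) wrapper makes the expansion a single statement, so a
 * trailing ';' followed by 'else' parses as intended. */
#define APPEND_STR(buf, pos, size, str) \
	do { \
		int n_ = snprintf((buf) + (pos), (size_t)((size) - (pos)), \
				  "%s", (str)); \
		if (n_ < 0 || n_ >= (size) - (pos)) \
			return -1; \
		(pos) += n_; \
	} while (0)

/* Uses the macro as the body of an if/else, which the bare-brace form
 * would turn into a syntax error. */
static int describe(char *buf, int size, int is_ptr)
{
	int pos = 0;

	assert(buf && size > 0);
	if (is_ptr)
		APPEND_STR(buf, pos, size, "* ");
	else
		APPEND_STR(buf, pos, size, "value ");
	return pos;
}
```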
> +#define BTF_PRINT_ONE_ARG(fmt, arg)					\
> +	{								\
> +		pos += snprintf(func_sig + pos, size - pos, fmt, arg);	\
> +		if (pos >= size)					\
> +			return -1;					\
> +	}
Any reason for not just using a variadic macro?
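 Something along these lines, as a rough sketch only: SIG_PRINTF and
 build_sig() are hypothetical names, and the format string is passed as
 the first variadic argument so that plain C99 __VA_ARGS__ suffices.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical single macro covering both the "string only" and the
 * "format plus argument" cases. */
#define SIG_PRINTF(buf, pos, size, ...) \
	do { \
		int n_ = snprintf((buf) + (pos), (size_t)((size) - (pos)), \
				  __VA_ARGS__); \
		if (n_ < 0 || n_ >= (size) - (pos)) \
			return -1; \
		(pos) += n_; \
	} while (0)

/* Builds a small type string; returns its length, or -1 on truncation. */
static int build_sig(char *buf, int size, const char *name, int nelems)
{
	int pos = 0;

	assert(buf && name && size > 0);
	SIG_PRINTF(buf, pos, size, "struct %s ", name);	/* one argument */
	SIG_PRINTF(buf, pos, size, "[%d]", nelems);	/* one argument */
	SIG_PRINTF(buf, pos, size, ";");		/* no arguments */
	return pos;
}
```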
> +#define BTF_PRINT_TYPE_ONLY(type)					\
> +	{								\
> +		pos = __btf_dumper_type_only(btf, type, func_sig,	\
> +					     pos, size);		\
> +		if (pos == -1)						\
> +			return -1;					\
> +	}
> +
> +static int __btf_dumper_type_only(struct btf *btf, __u32 type_id,
> +				  char *func_sig, int pos, int size)
> +{
> +	const struct btf_type *t = btf__type_by_id(btf, type_id);
> +	const struct btf_array *array;
> +	int i, vlen;
> +
> +	switch (BTF_INFO_KIND(t->info)) {
> +	case BTF_KIND_INT:
> +		BTF_PRINT_ONE_ARG("%s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_STRUCT:
> +		BTF_PRINT_ONE_ARG("struct %s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_UNION:
> +		BTF_PRINT_ONE_ARG("union %s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_ENUM:
> +		BTF_PRINT_ONE_ARG("enum %s ",
> +				  btf__name_by_offset(btf, t->name_off));
> +		break;
> +	case BTF_KIND_ARRAY:
> +		array = (struct btf_array *)(t + 1);
> +		BTF_PRINT_TYPE_ONLY(array->type);
> +		BTF_PRINT_ONE_ARG("[%d]", array->nelems);
> +		break;
> +	case BTF_KIND_PTR:
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		BTF_PRINT_STRING("* ");
> +		break;
> +	case BTF_KIND_UNKN:
> +	case BTF_KIND_FWD:
> +	case BTF_KIND_TYPEDEF:
> +		return -1;
> +	case BTF_KIND_VOLATILE:
> +		BTF_PRINT_STRING("volatile ");
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		break;
> +	case BTF_KIND_CONST:
> +		BTF_PRINT_STRING("const ");
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		break;
> +	case BTF_KIND_RESTRICT:
> +		BTF_PRINT_STRING("restrict ");
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		break;
> +	case BTF_KIND_FUNC:
> +	case BTF_KIND_FUNC_PROTO:
> +		BTF_PRINT_TYPE_ONLY(t->type);
> +		BTF_PRINT_ONE_ARG("%s(", btf__name_by_offset(btf, t->name_off));
> +		vlen = BTF_INFO_VLEN(t->info);
> +		for (i = 0; i < vlen; i++) {
> +			__u32 arg_type = ((__u32 *)(t + 1))[i];
> +
> +			BTF_PRINT_TYPE_ONLY(arg_type);
> +			if (i != (vlen - 1))
> +				BTF_PRINT_STRING(", ");
> +		}
In this kind of loop I find it cleaner to print the comma before the item;
 that way the test becomes i != 0.  Thus:
    for (i = 0; i < vlen; i++) {
        __u32 arg_type = ((__u32 *)(t + 1))[i];

        if (i)
            BTF_PRINT_STRING(", ");
        BTF_PRINT_TYPE_ONLY(arg_type);
    }

-Ed

^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: jiangyiwen @ 2018-10-17 11:32 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <d16d2052-bfb1-7861-e210-b53b4ea3260c@redhat.com>

On 2018/10/17 17:39, Jason Wang wrote:
> 
> On 2018/10/17 5:27 PM, jiangyiwen wrote:
>> On 2018/10/15 14:12, jiangyiwen wrote:
>>> On 2018/10/15 10:33, Jason Wang wrote:
>>>>
>>>> On 2018/10/15 09:43, jiangyiwen wrote:
>>>>> Hi Stefan & All:
>>>>>
>>>>> I have found two performance problems in vhost-vsock, even though
>>>>> it is not designed for performance.
>>>>>
>>>>> First, I expected vhost-vsock to be faster than vhost-net because it
>>>>> has no TCP/IP stack, but in my tests vhost-net is 5~10 times faster
>>>>> than vhost-vsock. I am currently looking for the reason.
>>>> TCP/IP is not a must for vhost-net.
>>>>
>>>> How do you test and compare the performance?
>>>>
>>>> Thanks
>>>>
>>> I tested the performance with my own test tool, as follows:
>>>
>>> Server                   Client
>>> socket()
>>> bind()
>>> listen()
>>>
>>>                           socket(AF_VSOCK) or socket(AF_INET)
>>> Accept() <-------------->connect()
>>>                           *======Start Record Time======*
>>>                           Call syscall sendfile()
>>> Recv()
>>>                           Send end
>>> Receive end
>>> Send(file_size)
>>>                           Recv(file_size)
>>>                           *======End Record Time======*
>>>
>>> The result: vhost-vsock reaches about 500MB/s and vhost-net about 2500MB/s.
>>>
>>> By the way, vhost-net uses a single queue.
>>>
>>> Thanks.
>>>
>>>>> Second, vhost-vsock only supports two vqs (tx and rx), which means
>>>>> that multiple sockets in the guest all use the same vq to transmit
>>>>> messages and get responses. So if there are multiple applications
>>>>> in the guest, we should support a "Multiqueue" feature for virtio-vsock.
>>>>>
>>>>> Stefan, have you encountered these problems?
>>>>>
>>>>> Thanks,
>>>>> Yiwen.
>>>>>
>>>>
>>>> .
>>>>
>>>
>> Hi Jason and Stefan,
>>
>> I may have found the reason for the bad performance.
>>
>> pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K), which
>> limits the bandwidth to 500~600MB/s. Once I increase it to 64K,
>> throughput improves about 3 times (~1500MB/s).
> 
> 
> Looks like the value was chosen as a balance between rx buffer size and performance. Always allocating 64K, even for small packets, wastes guest memory and puts it under stress. Virtio-net tries to avoid this with mergeable rx buffers, which allow a big packet to be scattered across different buffers. We can reuse this idea or revisit the idea of using virtio-net/vhost-net as a transport for vsock.
> 
> What's interesting is that the performance is still behind vhost-net.
> 
> Thanks
> 

Actually, I don't understand why pkt_len is limited to
VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE in virtio_transport_send_pkt_info();
I think it should use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE instead.

Thanks.
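
To put numbers on the cap, here is a small sketch of how many packets a
single send is split into under a per-packet limit. The two constants
are local stand-ins that use the 4K and 64K sizes mentioned in this
thread; pkts_needed() is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the two limits discussed in this thread. */
#define RX_BUF_SIZE_4K   (4u * 1024u)
#define PKT_BUF_SIZE_64K (64u * 1024u)

/* Number of packets needed to carry len bytes when each packet holds at
 * most max_pkt_len bytes (ceiling division). */
static size_t pkts_needed(size_t len, size_t max_pkt_len)
{
	assert(max_pkt_len > 0);
	return (len + max_pkt_len - 1) / max_pkt_len;
}
```

With these values a 64K send becomes 16 packets under the 4K cap and a
single packet under the 64K cap, so the per-packet overhead is paid 16
times as often.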

>>
>> By the way, I send 64K at a time from the application, and instead of
>> using sg_init_one I rewrote the function to pack an sg list, because
>> pkt_len spans multiple pages.
>>
>> Thanks,
>> Yiwen.
>>
>>
> 
> .
> 

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload
From: Arnaldo Carvalho de Melo @ 2018-10-17 12:11 UTC (permalink / raw)
  To: Song Liu
  Cc: David Ahern, Alexei Starovoitov, Peter Zijlstra,
	Alexei Starovoitov, Alexey Budankov, David S . Miller,
	Daniel Borkmann, Namhyung Kim, Jiri Olsa, Networking, kernel-team
In-Reply-To: <CAPhsuW7zZE51ibma__y8SDVUU_YQMjGyHRhYDhBsuJF_b89h6g@mail.gmail.com>

Adding Alexey, Jiri and Namhyung as they worked/are working on
multithreading 'perf record'.

On Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu wrote:
> On Tue, Oct 16, 2018 at 4:43 PM David Ahern <dsahern@gmail.com> wrote:
> > On 10/15/18 4:33 PM, Song Liu wrote:
> > > I am working with Alexei on the idea of fetching BPF program information via
> > > BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > to perf_event_type, and dumped these events to perf event ring buffer.

> > > I found that perf will not process events until the end of perf-record:

> > > root@virt-test:~# ~/perf record -ag -- sleep 10
> > > ...... 10 seconds later
> > > [ perf record: Woken up 34 times to write data ]
> > > machine__process_bpf_event: prog_id 6 loaded
> > > machine__process_bpf_event: prog_id 6 unloaded
> > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]

> > > In this example, the bpf program was loaded and then unloaded in
> > > another terminal. When machine__process_bpf_event() processes
> > > the load event, the bpf program is already unloaded. Therefore,
> > > machine__process_bpf_event() will not be able to get information
> > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.

> > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > as soon as perf gets the event from kernel. I looked around the perf
> > > code for a while. But I haven't found a good example where some
> > > events are processed before the end of perf-record. Could you
> > > please help me with this?

> > perf record does not process events as they are generated. Its sole job
> > is pushing data from the maps to a file as fast as possible meaning in
> > bulk based on current read and write locations.

> > Adding code to process events will add significant overhead to the
> > record command and will not really solve your race problem.

> I agree that processing events while recording has significant overhead.
> In this case, perf user space needs to know details about the jited BPF
> program. It is impossible to pass all these details to user space through
> the relatively stable ring_buffer API. Therefore, some processing of the
> data is necessary (get bpf prog_id from ring buffer, and then fetch program
> details via BPF_OBJ_GET_INFO_BY_FD).
 
> I have an idea for processing important data with relatively low overhead.
> Let me try to implement it.

Well, you could have a separate thread processing just those kinds of
events, associate it with a dummy event where you only ask for
PERF_RECORD_BPF_EVENTs.

Here is how to set up the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
perf_event_attr:

[root@seventh ~]# perf record -vv -e dummy sleep 01
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             112
  config                           0x9
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
------------------------------------------------------------
sys_perf_event_open: pid 12046  cpu 0  group_fd -1  flags 0x8 = 4
sys_perf_event_open: pid 12046  cpu 1  group_fd -1  flags 0x8 = 5
sys_perf_event_open: pid 12046  cpu 2  group_fd -1  flags 0x8 = 6
sys_perf_event_open: pid 12046  cpu 3  group_fd -1  flags 0x8 = 8
mmap size 528384B
perf event ring buffer mmapped per cpu
Synthesizing TSC conversion information
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]
[root@seventh ~]#

[root@seventh ~]# perf evlist -v
dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
[root@seventh ~]# 

There is ongoing work on dumping one file per cpu and then, at post
processing time, merging all those files to get ordering. One more
file, for these VIP events that require per-event processing, would be
ordered at that time with all the other per-cpu files.
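
In C, the dummy-event attr from the dump above could be filled in
roughly like this. This is only a sketch: init_dummy_attr() is a
hypothetical helper, leaving the mmap/comm/task side-band bits at zero
is an assumption about what a dedicated event would want, and no bit
requesting PERF_RECORD_BPF_EVENT is set because that record type is
only proposed in this thread.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Fill in a PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY attr, matching
 * "type 1, config 0x9" in the perf record -vv output. */
static void init_dummy_attr(struct perf_event_attr *attr)
{
	assert(attr);
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_DUMMY;
	attr->size = sizeof(*attr);
	attr->sample_id_all = 1;
	attr->exclude_guest = 1;
	/* mmap, comm, task, mmap2 and comm_exec stay 0 (assumption): this
	 * event would exist only to carry the BPF records. */
}
```

The filled-in attr would then be handed to perf_event_open() once per
cpu, as in the sys_perf_event_open lines above.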

- Arnaldo

^ permalink raw reply

* Re: [RFC PATCH 0/3] M_CAN Framework rework
From: Dan Murphy @ 2018-10-17 20:21 UTC (permalink / raw)
  To: wg, mkl, davem; +Cc: linux-can, netdev, linux-kernel
In-Reply-To: <20181010142055.25271-1-dmurphy@ti.com>

Bump

On 10/10/2018 09:20 AM, Dan Murphy wrote:
> All
> 
> This patch series creates a m_can core framework that devices can register
> to.  The m_can core manages the Bosch IP and CAN frames.  Each device that
> is registered is responsible for managing device specific functions.
> 
> This rewrite was suggested in a device driver submission for the TCAN4x5x
> device.
> Reference upstream post:
> https://lore.kernel.org/patchwork/patch/984163/
> 
> For instance the TCAN device is a SPI device that uses a specific data payload to
> determine writes and reads.  In addition the device has a reset input as well
> as a wakeup pin.  The register offset of the m_can registers differs and must
> be set by the device attached to the core.
> 
> The m_can core will use iomapped writes and reads as the default access
> mechanism.  The device driver can provide overrides for this.
> 
> This patch series is not complete: it does not handle the CAN interrupts
> and cannot yet perform a CAN write.  If this patch series is deemed
> acceptable I will finish debugging the driver and post a non-RFC series.
> 
> Finally, I did attempt to reduce the first patch with various git
> format-patch options, but none seemed to reduce its size.
> 
> Dan
> 
> Dan Murphy (3):
>   can: m_can: Create m_can core to leverage common code
>   dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver
>   can: tcan4x5x: Add tcan4x5x driver to the kernel
> 
>  .../devicetree/bindings/net/can/tcan4x5x.txt  |   34 +
>  drivers/net/can/m_can/Kconfig                 |   18 +
>  drivers/net/can/m_can/Makefile                |    4 +-
>  drivers/net/can/m_can/m_can.c                 | 1683 +----------------
>  .../net/can/m_can/{m_can.c => m_can_core.c}   |  479 +++--
>  drivers/net/can/m_can/m_can_core.h            |  100 +
>  drivers/net/can/m_can/tcan4x5x.c              |  321 ++++
>  7 files changed, 722 insertions(+), 1917 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/can/tcan4x5x.txt
>  copy drivers/net/can/m_can/{m_can.c => m_can_core.c} (83%)
>  create mode 100644 drivers/net/can/m_can/m_can_core.h
>  create mode 100644 drivers/net/can/m_can/tcan4x5x.c
> 


-- 
------------------
Dan Murphy

^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-17 12:31 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BC72006.9010000@huawei.com>


On 2018/10/17 7:41 PM, jiangyiwen wrote:
> On 2018/10/17 17:51, Jason Wang wrote:
>> On 2018/10/17 5:39 PM, Jason Wang wrote:
>>>> Hi Jason and Stefan,
>>>>
>>>> I may have found the reason for the bad performance.
>>>>
>>>> pkt_len is limited to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (4K), which
>>>> limits the bandwidth to 500~600MB/s. Once I increase it to 64K,
>>>> throughput improves about 3 times (~1500MB/s).
>>>
>>> Looks like the value was chosen as a balance between rx buffer size and performance. Always allocating 64K, even for small packets, wastes guest memory and puts it under stress. Virtio-net tries to avoid this with mergeable rx buffers, which allow a big packet to be scattered across different buffers. We can reuse this idea or revisit the idea of using virtio-net/vhost-net as a transport for vsock.
>>>
>>> What's interesting is that the performance is still behind vhost-net.
>>>
>>> Thanks
>>>
>>>> By the way, I send 64K at a time from the application, and instead of
>>>> using sg_init_one I rewrote the function to pack an sg list, because
>>>> pkt_len spans multiple pages.
>>>>
>>>> Thanks,
>>>> Yiwen.
>>
>> Btw, if you're using vsock for transferring large files, maybe it's more efficient to implement sendpage() for vsock to allow sendfile()/splice() to work.
>>
>> Thanks
>>
> I can't agree more.
>
> Why is vhost_vsock still behind vhost_net?
> I used sendfile() to test performance at first, and then found that
> vsock doesn't implement sendpage(), so the bandwidth can't be
> increased. So I replaced sendfile() with read() and send(), which
> adds switches between kernel and user mode, whereas sendfile()
> can support zero copy. I think this is the main reason.
>
> Thanks.


Want to post patches for this then :) ?

Thanks
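
For reference, the read()/send() fallback described in the quoted text
looks roughly like this in user space. It is a sketch only:
copy_read_send() is a hypothetical helper and the 4096-byte bounce
buffer is an arbitrary choice. Every chunk costs two syscalls and two
copies through user memory, which is the overhead sendfile() avoids.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Copy up to len bytes from in_fd to sock through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_read_send(int sock, int in_fd, size_t len)
{
	char buf[4096];
	size_t done = 0;

	assert(sock >= 0 && in_fd >= 0);
	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = read(in_fd, buf, want);	/* kernel -> user copy */
		ssize_t off = 0;

		if (n < 0)
			return -1;
		if (n == 0)
			break;				/* EOF on in_fd */
		while (off < n) {			/* send() may be partial */
			ssize_t w = send(sock, buf + off, (size_t)(n - off), 0);

			if (w < 0)
				return -1;		/* user -> kernel copy failed */
			off += w;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```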


>
>> .
>>
>

^ permalink raw reply

* [PATCH V1 net-next] net: ena: enable Low Latency Queues
From: akiyano @ 2018-10-17 12:33 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi

From: Arthur Kiyanovski <akiyano@amazon.com>

Use the new API to enable usage of LLQ.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 284a0a6..18956e7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3022,20 +3022,10 @@ static int ena_calc_io_queue_num(struct pci_dev *pdev,
 	int io_sq_num, io_queue_num;
 
 	/* In case of LLQ use the llq number in the get feature cmd */
-	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) {
-		io_sq_num = get_feat_ctx->max_queues.max_legacy_llq_num;
-
-		if (io_sq_num == 0) {
-			dev_err(&pdev->dev,
-				"Trying to use LLQ but llq_num is 0. Fall back into regular queues\n");
-
-			ena_dev->tx_mem_queue_type =
-				ENA_ADMIN_PLACEMENT_POLICY_HOST;
-			io_sq_num = get_feat_ctx->max_queues.max_sq_num;
-		}
-	} else {
+	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
+		io_sq_num = get_feat_ctx->llq.max_llq_num;
+	else
 		io_sq_num = get_feat_ctx->max_queues.max_sq_num;
-	}
 
 	io_queue_num = min_t(int, num_online_cpus(), ENA_MAX_NUM_IO_QUEUES);
 	io_queue_num = min_t(int, io_queue_num, io_sq_num);
@@ -3238,7 +3228,7 @@ static int ena_calc_queue_size(struct pci_dev *pdev,
 
 	if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV)
 		queue_size = min_t(u32, queue_size,
-				   get_feat_ctx->max_queues.max_legacy_llq_depth);
+				   get_feat_ctx->llq.max_llq_depth);
 
 	queue_size = rounddown_pow_of_two(queue_size);
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload
From: Arnaldo Carvalho de Melo @ 2018-10-17 12:50 UTC (permalink / raw)
  To: Song Liu
  Cc: David Ahern, Alexei Starovoitov, Peter Zijlstra,
	Alexei Starovoitov, Alexey Budankov, David S . Miller,
	Daniel Borkmann, Namhyung Kim, Jiri Olsa, Networking, kernel-team
In-Reply-To: <20181017121140.GA31465@kernel.org>

On Wed, Oct 17, 2018 at 09:11:40AM -0300, Arnaldo Carvalho de Melo wrote:
> Adding Alexey, Jiri and Namhyung as they worked/are working on
> multithreading 'perf record'.
> 
> On Tue, Oct 16, 2018 at 11:43:11PM -0700, Song Liu wrote:
> > On Tue, Oct 16, 2018 at 4:43 PM David Ahern <dsahern@gmail.com> wrote:
> > > On 10/15/18 4:33 PM, Song Liu wrote:
> > > > I am working with Alexei on the idea of fetching BPF program information via
> > > > BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
> > > > to perf_event_type, and dumped these events to perf event ring buffer.
> 
> > > > I found that perf will not process events until the end of perf-record:
> 
> > > > root@virt-test:~# ~/perf record -ag -- sleep 10
> > > > ...... 10 seconds later
> > > > [ perf record: Woken up 34 times to write data ]
> > > > machine__process_bpf_event: prog_id 6 loaded
> > > > machine__process_bpf_event: prog_id 6 unloaded
> > > > [ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]
> 
> > > > In this example, the bpf program was loaded and then unloaded in
> > > > another terminal. When machine__process_bpf_event() processes
> > > > the load event, the bpf program is already unloaded. Therefore,
> > > > machine__process_bpf_event() will not be able to get information
> > > > about the program via BPF_OBJ_GET_INFO_BY_FD cmd.
> 
> > > > To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
> > > > as soon as perf gets the event from kernel. I looked around the perf
> > > > code for a while. But I haven't found a good example where some
> > > > events are processed before the end of perf-record. Could you
> > > > please help me with this?
> 
> > > perf record does not process events as they are generated. Its sole job
> > > is pushing data from the maps to a file as fast as possible meaning in
> > > bulk based on current read and write locations.
> 
> > > Adding code to process events will add significant overhead to the
> > > record command and will not really solve your race problem.
> 
> > I agree that processing events while recording has significant overhead.
> > In this case, perf user space needs to know details about the jited BPF
> > program. It is impossible to pass all these details to user space through
> > the relatively stable ring_buffer API. Therefore, some processing of the
> > data is necessary (get bpf prog_id from ring buffer, and then fetch program
> > details via BPF_OBJ_GET_INFO_BY_FD).
>  
> > I have an idea for processing important data with relatively low overhead.
> > Let me try to implement it.
> 
> Well, you could have a separate thread processing just those kinds of
> events, associate it with a dummy event where you only ask for
> PERF_RECORD_BPF_EVENTs.
> 
> Here is how to set up the PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY
> perf_event_attr:
> 
> [root@seventh ~]# perf record -vv -e dummy sleep 01
> ------------------------------------------------------------
> perf_event_attr:
>   type                             1
>   size                             112
>   config                           0x9
>   { sample_period, sample_freq }   4000
>   sample_type                      IP|TID|TIME|PERIOD
>   disabled                         1
>   inherit                          1

The following you would have disabled; there is no need for
PERF_RECORD_{MMAP*,COMM,FORK,EXIT}, just PERF_RECORD_BPF_EVENT:

>   mmap                             1
>   comm                             1
>   task                             1
>   mmap2                            1
>   comm_exec                        1


>   freq                             1
>   enable_on_exec                   1
>   sample_id_all                    1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 12046  cpu 0  group_fd -1  flags 0x8 = 4
> sys_perf_event_open: pid 12046  cpu 1  group_fd -1  flags 0x8 = 5
> sys_perf_event_open: pid 12046  cpu 2  group_fd -1  flags 0x8 = 6
> sys_perf_event_open: pid 12046  cpu 3  group_fd -1  flags 0x8 = 8
> mmap size 528384B
> perf event ring buffer mmapped per cpu
> Synthesizing TSC conversion information
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data ]
> [root@seventh ~]#
> 
> [root@seventh ~]# perf evlist -v
> dummy: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
> [root@seventh ~]# 
> 
> There is ongoing work on dumping one file per cpu and then, at post
> processing time, merging all those files to get ordering. One more
> file, for these VIP events that require per-event processing, would be
> ordered at that time with all the other per-cpu files.
> 
> - Arnaldo

^ permalink raw reply

