Netdev List

Netdev List
 help / color / mirror / Atom feed

* [iwl next-queue PATCH 08/10] ixgbe/fm10k: Only support macvlan offload for types that support destination filtering
From: Alexander Duyck @ 2018-04-03 21:16 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

Both the ixgbe and fm10k drivers support destination filtering.

Instead of adding a ton of complexity to support either source or passthru
mode we can instead just avoid offloading them for now. Doing this we avoid
leaking packets into interfaces that aren't meant to receive them.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |    8 ++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |    7 +++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index ee645ba..26e7497 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -22,6 +22,7 @@
 #include "fm10k.h"
 #include <linux/vmalloc.h>
 #include <net/udp_tunnel.h>
+#include <linux/if_macvlan.h>
 
 /**
  * fm10k_setup_tx_resources - allocate Tx resources (Descriptors)
@@ -1449,6 +1450,13 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 	int size = 0, i;
 	u16 glort;
 
+	/* The hardware supported by fm10k only filters on the destination MAC
+	 * address. In order to avoid issues we only support offloading modes
+	 * where the hardware can actually provide the functionality.
+	 */
+	if (!macvlan_supports_dest_filter(sdev))
+		return ERR_PTR(-EMEDIUMTYPE);
+
 	/* allocate l2 accel structure if it is not available */
 	if (!l2_accel) {
 		/* verify there is enough free GLORTs to support l2_accel */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2c5100e..7c0f310 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9744,6 +9744,13 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 	unsigned int limit;
 	int pool, err;
 
+	/* The hardware supported by ixgbe only filters on the destination MAC
+	 * address. In order to avoid issues we only support offloading modes
+	 * where the hardware can actually provide the functionality.
+	 */
+	if (!macvlan_supports_dest_filter(vdev))
+		return ERR_PTR(-EMEDIUMTYPE);
+
 	/* Hardware has a limited number of available pools. Each VF, and the
 	 * PF require a pool. Check to ensure we don't attempt to use more
 	 * then the available number of pools.

^ permalink raw reply related

* [iwl next-queue PATCH 07/10] macvlan: Provide function for interfaces to release HW offload
From: Alexander Duyck @ 2018-04-03 21:16 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

This patch provides a basic function to allow a lower device to disable
macvlan offload if it was previously enabled on a given macvlan. The idea
here is to allow for recovery from failure should the lowerdev run out of
resources.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 include/linux/if_macvlan.h |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 0221390..2e55e4c 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -97,4 +97,12 @@ static inline bool macvlan_supports_dest_filter(struct net_device *dev)
 	       macvlan->mode == MACVLAN_MODE_VEPA ||
 	       macvlan->mode == MACVLAN_MODE_BRIDGE;
 }
+
+static inline int macvlan_release_l2fw_offload(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	macvlan->accel_priv = NULL;
+	return dev_uc_add(macvlan->lowerdev, dev->dev_addr);
+}
 #endif /* _LINUX_IF_MACVLAN_H */

^ permalink raw reply related

* [iwl next-queue PATCH 06/10] macvlan: Add function to test for destination filtering support
From: Alexander Duyck @ 2018-04-03 21:16 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

This patch adds a function indicating if a given macvlan can fully supports
destination filtering, especially as it relates to unicast traffic. For
those macvlan interfaces that do not support destination filtering such
passthru or source mode filtering we should not be enabling offload
support.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 include/linux/if_macvlan.h |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 80089d6..0221390 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -88,4 +88,13 @@ static inline void *macvlan_accel_priv(struct net_device *dev)
 
 	return macvlan->accel_priv;
 }
+
+static inline bool macvlan_supports_dest_filter(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	return macvlan->mode == MACVLAN_MODE_PRIVATE ||
+	       macvlan->mode == MACVLAN_MODE_VEPA ||
+	       macvlan->mode == MACVLAN_MODE_BRIDGE;
+}
 #endif /* _LINUX_IF_MACVLAN_H */

^ permalink raw reply related

* [iwl next-queue PATCH 05/10] macvlan: macvlan_count_rx shouldn't be static inline AND extern
From: Alexander Duyck @ 2018-04-03 21:16 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

It doesn't make sense to define macvlan_count_rx as a static inline and
then add a forward declaration after that as an extern. I am dropping the
extern declaration since it seems like it is something that likely got
missed when the function was made an inline.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 include/linux/if_macvlan.h |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index c5106ce..80089d6 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -61,10 +61,6 @@ extern int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 				  struct nlattr *tb[], struct nlattr *data[],
 				  struct netlink_ext_ack *extack);
 
-extern void macvlan_count_rx(const struct macvlan_dev *vlan,
-			     unsigned int len, bool success,
-			     bool multicast);
-
 extern void macvlan_dellink(struct net_device *dev, struct list_head *head);
 
 extern int macvlan_link_register(struct rtnl_link_ops *ops);

^ permalink raw reply related

* [iwl next-queue PATCH 04/10] ixgbe/fm10k: Drop tracking stats for macvlan broadcast/multicast
From: Alexander Duyck @ 2018-04-03 21:16 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

Drop dead code now that we shouldn't be receiving broadcast or multicast
frames on the queues associated to the macvlan netdev.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    7 +++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index df86070..c51d61f 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -445,15 +445,14 @@ static void fm10k_type_trans(struct fm10k_ring *rx_ring,
 			l2_accel = NULL;
 	}
 
-	skb->protocol = eth_type_trans(skb, dev);
-
 	/* Record Rx queue, or update macvlan statistics */
 	if (!l2_accel)
 		skb_record_rx_queue(skb, rx_ring->queue_index);
 	else
 		macvlan_count_rx(netdev_priv(dev), skb->len + ETH_HLEN, true,
-				 (skb->pkt_type == PACKET_BROADCAST) ||
-				 (skb->pkt_type == PACKET_MULTICAST));
+				 false);
+
+	skb->protocol = eth_type_trans(skb, dev);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6172c1b..2c5100e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1768,15 +1768,14 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 	if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_STAT_SECP))
 		ixgbe_ipsec_rx(rx_ring, rx_desc, skb);
 
-	skb->protocol = eth_type_trans(skb, dev);
-
 	/* record Rx queue, or update MACVLAN statistics */
 	if (netif_is_ixgbe(dev))
 		skb_record_rx_queue(skb, rx_ring->queue_index);
 	else
 		macvlan_count_rx(netdev_priv(dev), skb->len + ETH_HLEN, true,
-				 (skb->pkt_type == PACKET_BROADCAST) ||
-				 (skb->pkt_type == PACKET_MULTICAST));
+				 false);
+
+	skb->protocol = eth_type_trans(skb, dev);
 }
 
 static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,

^ permalink raw reply related

* [iwl next-queue PATCH 03/10] macvlan: Use software path for offloaded local, broadcast, and multicast traffic
From: Alexander Duyck @ 2018-04-03 21:16 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

This change makes it so that we use a software path for packets that are
going to be locally switched between two macvlan interfaces on the same
device. In addition we resort to software replication of broadcast and
multicast packets instead of offloading that to hardware.

The general idea is that using the device for east/west traffic local to
the system is extremely inefficient. We can only support up to whatever the
PCIe limit is for any given device so this caps us at somewhere around 20G
for devices supported by ixgbe. This is compounded even further when you
take broadcast and multicast into account as a single 10G port can come to
a crawl as a packet is replicated up to 60+ times in some cases. In order
to get away from that I am implementing changes so that we handle
broadcast/multicast replication and east/west local traffic all in
software.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |    4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |   39 +++++--------------
 drivers/net/macvlan.c                           |   47 ++++++++++-------------
 3 files changed, 33 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 4579349..ee645ba 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1254,7 +1254,7 @@ void fm10k_restore_rx_state(struct fm10k_intfc *interface)
 			glort = l2_accel->dglort + 1 + i;
 
 			hw->mac.ops.update_xcast_mode(hw, glort,
-						      FM10K_XCAST_MODE_MULTI);
+						      FM10K_XCAST_MODE_NONE);
 			fm10k_queue_mac_request(interface, glort,
 						sdev->dev_addr,
 						hw->mac.default_vid, true);
@@ -1515,7 +1515,7 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 
 	if (fm10k_host_mbx_ready(interface)) {
 		hw->mac.ops.update_xcast_mode(hw, glort,
-					      FM10K_XCAST_MODE_MULTI);
+					      FM10K_XCAST_MODE_NONE);
 		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
 					hw->mac.default_vid, true);
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c08e6b5..6172c1b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4211,7 +4211,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 reg_offset, vf_shift;
+	u16 pool = adapter->num_rx_pools;
+	u32 reg_offset, vf_shift, vmolr;
 	u32 gcr_ext, vmdctl;
 	int i;
 
@@ -4225,6 +4226,13 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	vmdctl |= IXGBE_VT_CTL_REPLEN;
 	IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vmdctl);
 
+	/* accept untagged packets until a vlan tag is
+	 * specifically set for the VMDQ queue/pool
+	 */
+	vmolr = IXGBE_VMOLR_AUPE;
+	while (pool--)
+		IXGBE_WRITE_REG(hw, IXGBE_VMOLR(VMDQ_P(pool)), vmolr);
+
 	vf_shift = VMDQ_P(0) % 32;
 	reg_offset = (VMDQ_P(0) >= 32) ? 1 : 0;
 
@@ -5271,28 +5279,6 @@ static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
 	spin_unlock(&adapter->fdir_perfect_lock);
 }
 
-static void ixgbe_macvlan_set_rx_mode(struct net_device *dev, unsigned int pool,
-				      struct ixgbe_adapter *adapter)
-{
-	struct ixgbe_hw *hw = &adapter->hw;
-	u32 vmolr;
-
-	/* No unicast promiscuous support for VMDQ devices. */
-	vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(pool));
-	vmolr |= (IXGBE_VMOLR_ROMPE | IXGBE_VMOLR_BAM | IXGBE_VMOLR_AUPE);
-
-	/* clear the affected bit */
-	vmolr &= ~IXGBE_VMOLR_MPE;
-
-	if (dev->flags & IFF_ALLMULTI) {
-		vmolr |= IXGBE_VMOLR_MPE;
-	} else {
-		vmolr |= IXGBE_VMOLR_ROMPE;
-		hw->mac.ops.update_mc_addr_list(hw, dev);
-	}
-	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr);
-}
-
 /**
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
@@ -5376,10 +5362,8 @@ static int ixgbe_fwd_ring_up(struct net_device *vdev,
 	 */
 	err = ixgbe_add_mac_filter(adapter, vdev->dev_addr,
 				   VMDQ_P(accel->pool));
-	if (err >= 0) {
-		ixgbe_macvlan_set_rx_mode(vdev, accel->pool, adapter);
+	if (err >= 0)
 		return 0;
-	}
 
 	for (i = 0; i < adapter->num_rx_queues_per_pool; i++)
 		adapter->rx_ring[baseq + i]->netdev = NULL;
@@ -9816,9 +9800,6 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
 	ixgbe_del_mac_filter(adapter, accel->netdev->dev_addr,
 			     VMDQ_P(accel->pool));
 
-	/* disable ability to receive packets for this pool */
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_VMOLR(accel->pool), 0);
-
 	/* Allow remaining Rx packets to get flushed out of the
 	 * Rx FIFO before we drop the netdev for the ring.
 	 */
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 7ddc94f..adde8fc 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -514,6 +514,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 	const struct macvlan_dev *vlan = netdev_priv(dev);
 	const struct macvlan_port *port = vlan->port;
 	const struct macvlan_dev *dest;
+	void *accel_priv = NULL;
 
 	if (vlan->mode == MACVLAN_MODE_BRIDGE) {
 		const struct ethhdr *eth = (void *)skb->data;
@@ -533,9 +534,14 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 
+	/* For packets that are non-multicast and not bridged we will pass
+	 * the necessary information so that the lowerdev can distinguish
+	 * the source of the packets via the accel_priv value.
+	 */
+	accel_priv = vlan->accel_priv;
 xmit_world:
 	skb->dev = vlan->lowerdev;
-	return dev_queue_xmit(skb);
+	return dev_queue_xmit_accel(skb, accel_priv);
 }
 
 static inline netdev_tx_t macvlan_netpoll_send_skb(struct macvlan_dev *vlan, struct sk_buff *skb)
@@ -552,19 +558,14 @@ static inline netdev_tx_t macvlan_netpoll_send_skb(struct macvlan_dev *vlan, str
 static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 				      struct net_device *dev)
 {
+	struct macvlan_dev *vlan = netdev_priv(dev);
 	unsigned int len = skb->len;
 	int ret;
-	struct macvlan_dev *vlan = netdev_priv(dev);
 
 	if (unlikely(netpoll_tx_running(dev)))
 		return macvlan_netpoll_send_skb(vlan, skb);
 
-	if (vlan->accel_priv) {
-		skb->dev = vlan->lowerdev;
-		ret = dev_queue_xmit_accel(skb, vlan->accel_priv);
-	} else {
-		ret = macvlan_queue_xmit(skb, dev);
-	}
+	ret = macvlan_queue_xmit(skb, dev);
 
 	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
 		struct vlan_pcpu_stats *pcpu_stats;
@@ -620,26 +621,20 @@ static int macvlan_open(struct net_device *dev)
 	/* Attempt to populate accel_priv which is used to offload the L2
 	 * forwarding requests for unicast packets.
 	 */
-	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
+	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD)
 		vlan->accel_priv =
 		      lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
 
-		/* If we get a NULL pointer back, or if we get an error
-		 * then we should just fall through to the non accelerated path
-		 */
-		if (IS_ERR_OR_NULL(vlan->accel_priv))
-			vlan->accel_priv = NULL;
-		else
-			return 0;
+	/* If earlier attempt to offload failed, or accel_priv is not
+	 * populated we must add the unicast address to the lower device.
+	 */
+	if (IS_ERR_OR_NULL(vlan->accel_priv)) {
+		vlan->accel_priv = NULL;
+		err = dev_uc_add(lowerdev, dev->dev_addr);
+		if (err < 0)
+			goto out;
 	}
 
-	err = -EBUSY;
-	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
-		goto out;
-
-	err = dev_uc_add(lowerdev, dev->dev_addr);
-	if (err < 0)
-		goto out;
 	if (dev->flags & IFF_ALLMULTI) {
 		err = dev_set_allmulti(lowerdev, 1);
 		if (err < 0)
@@ -660,13 +655,14 @@ static int macvlan_open(struct net_device *dev)
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(lowerdev, -1);
 del_unicast:
-	dev_uc_del(lowerdev, dev->dev_addr);
-out:
 	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
 							   vlan->accel_priv);
 		vlan->accel_priv = NULL;
+	} else {
+		dev_uc_del(lowerdev, dev->dev_addr);
 	}
+out:
 	return err;
 }
 
@@ -679,7 +675,6 @@ static int macvlan_stop(struct net_device *dev)
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
 							   vlan->accel_priv);
 		vlan->accel_priv = NULL;
-		return 0;
 	}
 
 	dev_uc_unsync(lowerdev, dev);

^ permalink raw reply related

* [iwl next-queue PATCH 02/10] macvlan: Rename fwd_priv to accel_priv and add accessor function
From: Alexander Duyck @ 2018-04-03 21:16 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

This change renames the fwd_priv member to accel_priv as this more
accurately reflects the actual purpose of this value. In addition I am
adding an accessor which will allow us to further abstract this in the
future if needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   10 +++-----
 drivers/net/macvlan.c                         |   31 +++++++++++++++----------
 include/linux/if_macvlan.h                    |    8 ++++++
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4d2c0ed..c08e6b5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5390,10 +5390,9 @@ static int ixgbe_fwd_ring_up(struct net_device *vdev,
 static int ixgbe_upper_dev_walk(struct net_device *upper, void *data)
 {
 	if (netif_is_macvlan(upper)) {
-		struct macvlan_dev *dfwd = netdev_priv(upper);
-		struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+		struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
 
-		if (dfwd->fwd_priv)
+		if (vadapter)
 			ixgbe_fwd_ring_up(upper, vadapter);
 	}
 
@@ -8967,13 +8966,12 @@ struct upper_walk_data {
 static int get_macvlan_queue(struct net_device *upper, void *_data)
 {
 	if (netif_is_macvlan(upper)) {
-		struct macvlan_dev *dfwd = netdev_priv(upper);
-		struct ixgbe_fwd_adapter *vadapter = dfwd->fwd_priv;
+		struct ixgbe_fwd_adapter *vadapter = macvlan_accel_priv(upper);
 		struct upper_walk_data *data = _data;
 		struct ixgbe_adapter *adapter = data->adapter;
 		int ifindex = data->ifindex;
 
-		if (vadapter && vadapter->netdev->ifindex == ifindex) {
+		if (vadapter && upper->ifindex == ifindex) {
 			data->queue = adapter->rx_ring[vadapter->rx_base_queue]->reg_idx;
 			data->action = data->queue;
 			return 1;
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 725f4b4..7ddc94f 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -559,9 +559,9 @@ static netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 	if (unlikely(netpoll_tx_running(dev)))
 		return macvlan_netpoll_send_skb(vlan, skb);
 
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		skb->dev = vlan->lowerdev;
-		ret = dev_queue_xmit_accel(skb, vlan->fwd_priv);
+		ret = dev_queue_xmit_accel(skb, vlan->accel_priv);
 	} else {
 		ret = macvlan_queue_xmit(skb, dev);
 	}
@@ -613,16 +613,23 @@ static int macvlan_open(struct net_device *dev)
 		goto hash_add;
 	}
 
+	err = -EBUSY;
+	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
+		goto out;
+
+	/* Attempt to populate accel_priv which is used to offload the L2
+	 * forwarding requests for unicast packets.
+	 */
 	if (lowerdev->features & NETIF_F_HW_L2FW_DOFFLOAD) {
-		vlan->fwd_priv =
+		vlan->accel_priv =
 		      lowerdev->netdev_ops->ndo_dfwd_add_station(lowerdev, dev);
 
 		/* If we get a NULL pointer back, or if we get an error
 		 * then we should just fall through to the non accelerated path
 		 */
-		if (IS_ERR_OR_NULL(vlan->fwd_priv)) {
-			vlan->fwd_priv = NULL;
-		} else
+		if (IS_ERR_OR_NULL(vlan->accel_priv))
+			vlan->accel_priv = NULL;
+		else
 			return 0;
 	}
 
@@ -655,10 +662,10 @@ static int macvlan_open(struct net_device *dev)
 del_unicast:
 	dev_uc_del(lowerdev, dev->dev_addr);
 out:
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
-							   vlan->fwd_priv);
-		vlan->fwd_priv = NULL;
+							   vlan->accel_priv);
+		vlan->accel_priv = NULL;
 	}
 	return err;
 }
@@ -668,10 +675,10 @@ static int macvlan_stop(struct net_device *dev)
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
 
-	if (vlan->fwd_priv) {
+	if (vlan->accel_priv) {
 		lowerdev->netdev_ops->ndo_dfwd_del_station(lowerdev,
-							   vlan->fwd_priv);
-		vlan->fwd_priv = NULL;
+							   vlan->accel_priv);
+		vlan->accel_priv = NULL;
 		return 0;
 	}
 
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 4cb7aee..c5106ce 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -21,7 +21,7 @@ struct macvlan_dev {
 	struct hlist_node	hlist;
 	struct macvlan_port	*port;
 	struct net_device	*lowerdev;
-	void			*fwd_priv;
+	void			*accel_priv;
 	struct vlan_pcpu_stats __percpu *pcpu_stats;
 
 	DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
@@ -86,4 +86,10 @@ extern void macvlan_count_rx(const struct macvlan_dev *vlan,
 }
 #endif
 
+static inline void *macvlan_accel_priv(struct net_device *dev)
+{
+	struct macvlan_dev *macvlan = netdev_priv(dev);
+
+	return macvlan->accel_priv;
+}
 #endif /* _LINUX_IF_MACVLAN_H */

^ permalink raw reply related

* [iwl next-queue PATCH 01/10] ixgbe: Drop support for macvlan specific unicast lists
From: Alexander Duyck @ 2018-04-03 21:15 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev
In-Reply-To: <20180403211519.7880.70243.stgit@ahduyck-green-test.jf.intel.com>

Drop the code for handling macvlan specific unicast lists. It isn't needed
since we don't take any efforts to maintain it when we bring the interface
up and it takes the slow path anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   31 -------------------------
 1 file changed, 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba9..4d2c0ed 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4892,36 +4892,6 @@ int ixgbe_del_mac_filter(struct ixgbe_adapter *adapter,
 	return -ENOMEM;
 }
 
-/**
- * ixgbe_write_uc_addr_list - write unicast addresses to RAR table
- * @netdev: network interface device structure
- * @vfn: pool to associate with unicast addresses
- *
- * Writes unicast address list to the RAR table.
- * Returns: -ENOMEM on failure/insufficient address space
- *                0 on no addresses written
- *                X on writing X addresses to the RAR table
- **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev, int vfn)
-{
-	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int count = 0;
-
-	/* return ENOMEM indicating insufficient memory for addresses */
-	if (netdev_uc_count(netdev) > ixgbe_available_rars(adapter, vfn))
-		return -ENOMEM;
-
-	if (!netdev_uc_empty(netdev)) {
-		struct netdev_hw_addr *ha;
-		netdev_for_each_uc_addr(ha, netdev) {
-			ixgbe_del_mac_filter(adapter, ha->addr, vfn);
-			ixgbe_add_mac_filter(adapter, ha->addr, vfn);
-			count++;
-		}
-	}
-	return count;
-}
-
 static int ixgbe_uc_sync(struct net_device *netdev, const unsigned char *addr)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
@@ -5320,7 +5290,6 @@ static void ixgbe_macvlan_set_rx_mode(struct net_device *dev, unsigned int pool,
 		vmolr |= IXGBE_VMOLR_ROMPE;
 		hw->mac.ops.update_mc_addr_list(hw, dev);
 	}
-	ixgbe_write_uc_addr_list(adapter->netdev, pool);
 	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(pool), vmolr);
 }
 

^ permalink raw reply related

* [iwl next-queue PATCH 00/10] Clean-up macvlan offloading
From: Alexander Duyck @ 2018-04-03 21:15 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev

This patch set represents yet another phase of the macvlan cleanup I have
been working on.

The main goal of these changes is to make it so that we only support
offloading what we can actually offload and we don't break any existing
functionality. So for example we were claiming to advertise source mode
macvlan and we were doing nothing of the sort, so support for that has been
dropped.

The biggest change with this set is that broadcast/multicast replication is
no longer being supported in software. I am dropping it as it leads to
scaling issues when a broadcast frame has to be replciated up to 64 times.

Beyond that this set goes through and optimized the time needed to bring up
and tear down the macvlan interfaces on ixgbe and provides a clean way for
us to disable the macvlan offload when needed.

---

Alexander Duyck (10):
      ixgbe: Drop support for macvlan specific unicast lists
      macvlan: Rename fwd_priv to accel_priv and add accessor function
      macvlan: Use software path for offloaded local, broadcast, and multicast traffic
      ixgbe/fm10k: Drop tracking stats for macvlan broadcast/multicast
      macvlan: macvlan_count_rx shouldn't be static inline AND extern
      macvlan: Add function to test for destination filtering support
      macvlan: Provide function for interfaces to release HW offload
      ixgbe/fm10k: Only support macvlan offload for types that support hairpin switching
      ixgbe: Drop real_adapter from l2 fwd acceleration structure
      ixgbe: Avoid performing unnecessary resets for macvlan offload

 drivers/net/ethernet/intel/fm10k/fm10k_main.c   |    7 -
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |   12 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h        |    1 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |  306 +++++++++++++----------
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c  |    5 
 drivers/net/macvlan.c                           |   68 +++--
 include/linux/if_macvlan.h                      |   29 ++
 7 files changed, 243 insertions(+), 185 deletions(-)

^ permalink raw reply

* [PATCH net 1/2] net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()
From: Florian Fainelli @ 2018-04-03 20:40 UTC (permalink / raw)
  To: Al Viro, David Miller; +Cc: netdev, opendmb, linux-kernel
In-Reply-To: <20180403164545.GK30522@ZenIV.linux.org.uk>

On 04/03/2018 09:45 AM, Al Viro wrote:
> On Tue, Apr 03, 2018 at 12:33:05PM -0400, David Miller wrote:
> 
>> Yes Al, however the pattern choosen here is probably cheaper on little endian:
>>
>> 	__beXX val = x;
>> 	switch (val) {
>> 	case htons(ETH_P_FOO):
>> 	 ...
>> 	}
>>
>> This way only the compiler byte swaps the constants at compile time,
>> no code is actually generated to do it.
> 
> That's not obvious, actually - depends upon how sparse the switch ends
> up being.  You can easily lose more than a single byteswap insn on
> worse cascase of comparisons.

David is right though here and this is used primarily on LE hosts, the
code produced with your suggested change does fix the sparse warning and
is functional, but results in less efficient code with a rev16
instruction being used. If you would prefer me to use __constant_htons()
to make that clearer, I don't have any objection to that.

172 instructions including a rev16

00002da8 <bcmgenet_insert_status>:
    2da8:       e1d038b8        ldrh    r3, [r0, #136]  ; 0x88
    2dac:       e92d4070        push    {r4, r5, r6, lr}
    2db0:       e6bf3fb3        rev16   r3, r3			<==============
    2db4:       e6ff3073        uxth    r3, r3
    2db8:       e3530b02        cmp     r3, #2048       ; 0x800
    2dbc:       0a000020        beq     2e44 <bcmgenet_insert_status+0x9c>
    2dc0:       e30826dd        movw    r2, #34525      ; 0x86dd
    2dc4:       e1530002        cmp     r3, r2
    2dc8:       1a00001c        bne     2e40 <bcmgenet_insert_status+0x98>
    2dcc:       e5906098        ldr     r6, [r0, #152]  ; 0x98
    2dd0:       e1d028bc        ldrh    r2, [r0, #140]  ; 0x8c
    2dd4:       e0862002        add     r2, r6, r2
    2dd8:       e5d22006        ldrb    r2, [r2, #6]
    2ddc:       e242e011        sub     lr, r2, #17
    2de0:       e1d0c6b4        ldrh    ip, [r0, #100]  ; 0x64
    2de4:       e16fef1e        clz     lr, lr
    2de8:       e590509c        ldr     r5, [r0, #156]  ; 0x9c
    2dec:       e1d046b6        ldrh    r4, [r0, #102]  ; 0x66
    2df0:       e1a0e2ae        lsr     lr, lr, #5
    2df4:       e3520006        cmp     r2, #6
    2df8:       11a0200e        movne   r2, lr
    2dfc:       038e2001        orreq   r2, lr, #1
    2e00:       e3520000        cmp     r2, #0
    2e04:       0a00000b        beq     2e38 <bcmgenet_insert_status+0x90>
    2e08:       e0455006        sub     r5, r5, r6
    2e0c:       e3530b02        cmp     r3, #2048       ; 0x800
    2e10:       13a0e000        movne   lr, #0
    2e14:       020ee001        andeq   lr, lr, #1
    2e18:       e04c3005        sub     r3, ip, r5
    2e1c:       e35e0000        cmp     lr, #0
    2e20:       e2433040        sub     r3, r3, #64     ; 0x40
    2e24:       e6ff3073        uxth    r3, r3
    2e28:       e0844003        add     r4, r4, r3
    2e2c:       e1842803        orr     r2, r4, r3, lsl #16
    2e30:       e3822102        orr     r2, r2, #-2147483648    ; 0x80000000
    2e34:       13822902        orrne   r2, r2, #32768  ; 0x8000
    2e38:       e5812030        str     r2, [r1, #48]   ; 0x30
    2e3c:       e8bd8070        pop     {r4, r5, r6, pc}
    2e40:       e8bd8070        pop     {r4, r5, r6, pc}
    2e44:       e5906098        ldr     r6, [r0, #152]  ; 0x98
    2e48:       e1d028bc        ldrh    r2, [r0, #140]  ; 0x8c
    2e4c:       e0862002        add     r2, r6, r2
    2e50:       e5d22009        ldrb    r2, [r2, #9]
    2e54:       eaffffe0        b       2ddc <bcmgenet_insert_status+0x34>


Whereas my changes result in:

164 instructions, and no rev* since everything is resolved at compile time.

00000810 <bcmgenet_insert_status>:
     810:       e92d4070        push    {r4, r5, r6, lr}
     814:       e1d0e8b8        ldrh    lr, [r0, #136]  ; 0x88
     818:       e35e0008        cmp     lr, #8
     81c:       0a000020        beq     8a4 <bcmgenet_insert_status+0x94>
     820:       e30d3d86        movw    r3, #56710      ; 0xdd86
     824:       e15e0003        cmp     lr, r3
     828:       1a00001c        bne     8a0 <bcmgenet_insert_status+0x90>
     82c:       e5906098        ldr     r6, [r0, #152]  ; 0x98
     830:       e1d038bc        ldrh    r3, [r0, #140]  ; 0x8c
     834:       e0863003        add     r3, r6, r3
     838:       e5d33006        ldrb    r3, [r3, #6]
     83c:       e2434011        sub     r4, r3, #17
     840:       e1d0c6b4        ldrh    ip, [r0, #100]  ; 0x64
     844:       e16f4f14        clz     r4, r4
     848:       e590209c        ldr     r2, [r0, #156]  ; 0x9c
     84c:       e1d056b6        ldrh    r5, [r0, #102]  ; 0x66
     850:       e1a042a4        lsr     r4, r4, #5
     854:       e3530006        cmp     r3, #6
     858:       11a03004        movne   r3, r4
     85c:       03843001        orreq   r3, r4, #1
     860:       e3530000        cmp     r3, #0
     864:       0a00000b        beq     898 <bcmgenet_insert_status+0x88>
     868:       e0422006        sub     r2, r2, r6
     86c:       e35e0008        cmp     lr, #8
     870:       13a0e000        movne   lr, #0
     874:       0204e001        andeq   lr, r4, #1
     878:       e04c2002        sub     r2, ip, r2
     87c:       e35e0000        cmp     lr, #0
     880:       e2422040        sub     r2, r2, #64     ; 0x40
     884:       e6ff2072        uxth    r2, r2
     888:       e0855002        add     r5, r5, r2
     88c:       e1853802        orr     r3, r5, r2, lsl #16
     890:       e3833102        orr     r3, r3, #-2147483648    ; 0x80000000
     894:       13833902        orrne   r3, r3, #32768  ; 0x8000
     898:       e5813030        str     r3, [r1, #48]   ; 0x30
     89c:       e8bd8070        pop     {r4, r5, r6, pc}
     8a0:       e8bd8070        pop     {r4, r5, r6, pc}
     8a4:       e5906098        ldr     r6, [r0, #152]  ; 0x98
     8a8:       e1d038bc        ldrh    r3, [r0, #140]  ; 0x8c
     8ac:       e0863003        add     r3, r6, r3
     8b0:       e5d33009        ldrb    r3, [r3, #9]
     8b4:       eaffffe0        b       83c <bcmgenet_insert_status+0x2c>

-- 
Florian

^ permalink raw reply

* Re: [PATCH bpf-next 00/10] BTF: BPF Type Format
From: Arnaldo Carvalho de Melo @ 2018-04-03 19:30 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20180330182643.3539371-1-kafai@fb.com>

Em Fri, Mar 30, 2018 at 11:26:33AM -0700, Martin KaFai Lau escreveu:
> This patch introduces BPF Type Format (BTF).
> 
> BTF (BPF Type Format) is the meta data format which describes
> the data types of BPF program/map.  Hence, it basically focus
> on the C programming language which the modern BPF is primary
> using.  The first use case is to provide a generic pretty print
> capability for a BPF map.
> 
> A modified pahole that can convert dwarf to BTF is here:
> https://github.com/iamkafai/pahole/tree/btf

Hey, great, I'll try to review this and if all is well, merge this,
please consider CCing me in patches to pahole :-)

- Arnaldo
 
> Please see individual patch for details.
> 
> Martin KaFai Lau (10):
>   bpf: btf: Introduce BPF Type Format (BTF)
>   bpf: btf: Validate type reference
>   bpf: btf: Check members of struct/union
>   bpf: btf: Add pretty print capability for data with BTF type info
>   bpf: btf: Add BPF_BTF_LOAD command
>   bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
>   bpf: btf: Add pretty print support to the basic arraymap
>   bpf: btf: Sync bpf.h and btf.h to tools/
>   bpf: btf: Add BTF support to libbpf
>   bpf: btf: Add BTF tests
> 
>  include/linux/bpf.h                          |   20 +-
>  include/linux/btf.h                          |   48 +
>  include/uapi/linux/bpf.h                     |   12 +
>  include/uapi/linux/btf.h                     |  132 ++
>  kernel/bpf/Makefile                          |    1 +
>  kernel/bpf/arraymap.c                        |   50 +
>  kernel/bpf/btf.c                             | 2064 ++++++++++++++++++++++++++
>  kernel/bpf/inode.c                           |  146 +-
>  kernel/bpf/syscall.c                         |   51 +-
>  tools/include/uapi/linux/bpf.h               |   13 +
>  tools/include/uapi/linux/btf.h               |  132 ++
>  tools/lib/bpf/Build                          |    2 +-
>  tools/lib/bpf/bpf.c                          |   92 +-
>  tools/lib/bpf/bpf.h                          |   16 +
>  tools/lib/bpf/btf.c                          |  377 +++++
>  tools/lib/bpf/btf.h                          |   22 +
>  tools/lib/bpf/libbpf.c                       |  148 +-
>  tools/lib/bpf/libbpf.h                       |    3 +
>  tools/testing/selftests/bpf/Makefile         |   25 +-
>  tools/testing/selftests/bpf/test_btf.c       | 1539 +++++++++++++++++++
>  tools/testing/selftests/bpf/test_btf_haskv.c |   48 +
>  tools/testing/selftests/bpf/test_btf_nokv.c  |   43 +
>  22 files changed, 4943 insertions(+), 41 deletions(-)
>  create mode 100644 include/linux/btf.h
>  create mode 100644 include/uapi/linux/btf.h
>  create mode 100644 kernel/bpf/btf.c
>  create mode 100644 tools/include/uapi/linux/btf.h
>  create mode 100644 tools/lib/bpf/btf.c
>  create mode 100644 tools/lib/bpf/btf.h
>  create mode 100644 tools/testing/selftests/bpf/test_btf.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_haskv.c
>  create mode 100644 tools/testing/selftests/bpf/test_btf_nokv.c
> 
> -- 
> 2.9.5

^ permalink raw reply

* [PATCH iproute2-next 1/1] tc: jsonify skbedit action
From: Roman Mashak @ 2018-04-03 19:24 UTC (permalink / raw)
  To: dsahern; +Cc: stephen, netdev, kernel, jhs, xiyou.wangcong, jiri, Roman Mashak

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
---
 tc/m_skbedit.c | 53 +++++++++++++++++++++++++++++------------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/tc/m_skbedit.c b/tc/m_skbedit.c
index db5c64caf2ba..070280cea29e 100644
--- a/tc/m_skbedit.c
+++ b/tc/m_skbedit.c
@@ -168,9 +168,8 @@ static int print_skbedit(struct action_util *au, FILE *f, struct rtattr *arg)
 	struct rtattr *tb[TCA_SKBEDIT_MAX + 1];
 
 	SPRINT_BUF(b1);
-	__u32 *priority;
-	__u32 *mark;
-	__u16 *queue_mapping, *ptype;
+	__u32 priority;
+	__u16 ptype;
 	struct tc_skbedit *p = NULL;
 
 	if (arg == NULL)
@@ -179,43 +178,49 @@ static int print_skbedit(struct action_util *au, FILE *f, struct rtattr *arg)
 	parse_rtattr_nested(tb, TCA_SKBEDIT_MAX, arg);
 
 	if (tb[TCA_SKBEDIT_PARMS] == NULL) {
-		fprintf(f, "[NULL skbedit parameters]");
+		print_string(PRINT_FP, NULL, "%s", "[NULL skbedit parameters]");
 		return -1;
 	}
 	p = RTA_DATA(tb[TCA_SKBEDIT_PARMS]);
 
-	fprintf(f, " skbedit");
+	print_string(PRINT_ANY, "kind", "%s ", "skbedit");
 
 	if (tb[TCA_SKBEDIT_QUEUE_MAPPING] != NULL) {
-		queue_mapping = RTA_DATA(tb[TCA_SKBEDIT_QUEUE_MAPPING]);
-		fprintf(f, " queue_mapping %u", *queue_mapping);
+		print_uint(PRINT_ANY, "queue_mapping", "queue_mapping %u",
+			   rta_getattr_u16(tb[TCA_SKBEDIT_QUEUE_MAPPING]));
 	}
 	if (tb[TCA_SKBEDIT_PRIORITY] != NULL) {
-		priority = RTA_DATA(tb[TCA_SKBEDIT_PRIORITY]);
-		fprintf(f, " priority %s", sprint_tc_classid(*priority, b1));
+		priority = rta_getattr_u32(tb[TCA_SKBEDIT_PRIORITY]);
+		print_string(PRINT_ANY, "priority", " priority %s",
+			     sprint_tc_classid(priority, b1));
 	}
 	if (tb[TCA_SKBEDIT_MARK] != NULL) {
-		mark = RTA_DATA(tb[TCA_SKBEDIT_MARK]);
-		fprintf(f, " mark %d", *mark);
+		print_uint(PRINT_ANY, "mark", " mark %u",
+			   rta_getattr_u32(tb[TCA_SKBEDIT_MARK]));
 	}
 	if (tb[TCA_SKBEDIT_PTYPE] != NULL) {
-		ptype = RTA_DATA(tb[TCA_SKBEDIT_PTYPE]);
-		if (*ptype == PACKET_HOST)
-			fprintf(f, " ptype host");
-		else if (*ptype == PACKET_BROADCAST)
-			fprintf(f, " ptype broadcast");
-		else if (*ptype == PACKET_MULTICAST)
-			fprintf(f, " ptype multicast");
-		else if (*ptype == PACKET_OTHERHOST)
-			fprintf(f, " ptype otherhost");
+		ptype = rta_getattr_u16(tb[TCA_SKBEDIT_PTYPE]);
+		if (ptype == PACKET_HOST)
+			print_string(PRINT_ANY, "ptype", " %s", "ptype host");
+		else if (ptype == PACKET_BROADCAST)
+			print_string(PRINT_ANY, "ptype", " %s",
+				     "ptype broadcast");
+		else if (ptype == PACKET_MULTICAST)
+			print_string(PRINT_ANY, "ptype", " %s",
+				     "ptype multicast");
+		else if (ptype == PACKET_OTHERHOST)
+			print_string(PRINT_ANY, "ptype", " %s",
+				     "ptype otherhost");
 		else
-			fprintf(f, " ptype %d", *ptype);
+			print_uint(PRINT_ANY, "ptype", " %u", ptype);
 	}
 
 	print_action_control(f, " ", p->action, "");
 
-	fprintf(f, "\n\t index %u ref %d bind %d",
-		p->index, p->refcnt, p->bindcnt);
+	print_string(PRINT_FP, NULL, "%s", _SL_);
+	print_uint(PRINT_ANY, "index", "\t index %u", p->index);
+	print_int(PRINT_ANY, "ref", " ref %d", p->refcnt);
+	print_int(PRINT_ANY, "bind", " bind %d", p->bindcnt);
 
 	if (show_stats) {
 		if (tb[TCA_SKBEDIT_TM]) {
@@ -225,7 +230,7 @@ static int print_skbedit(struct action_util *au, FILE *f, struct rtattr *arg)
 		}
 	}
 
-	fprintf(f, "\n ");
+	print_string(PRINT_FP, NULL, "%s", _SL_);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Siwei Liu @ 2018-04-03 19:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Si-Wei Liu, Michael S. Tsirkin, Stephen Hemminger,
	Alexander Duyck, David Miller, Brandeburg, Jesse, Jakub Kicinski,
	Jason Wang, Samudrala, Sridhar, Netdev, virtualization,
	virtio-dev
In-Reply-To: <20180403154210.GK3313@nanopsycho>

On Tue, Apr 3, 2018 at 8:42 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Sun, Apr 01, 2018 at 06:11:29PM CEST, dsahern@gmail.com wrote:
>>On 4/1/18 3:13 AM, Si-Wei Liu wrote:
>>> Hidden netdevice is not visible to userspace such that
>>> typical network utilites e.g. ip, ifconfig and et al,
>>> cannot sense its existence or configure it. Internally
>>> hidden netdev may associate with an upper level netdev
>>> that userspace has access to. Although userspace cannot
>>> manipulate the lower netdev directly, user may control
>>> or configure the underlying hidden device through the
>>> upper-level netdev. For identification purpose, the
>>> kobject for hidden netdev still presents in the sysfs
>>> hierarchy, however, no uevent message will be generated
>>> when the sysfs entry is created, modified or destroyed.
>>>
>>> For that end, a separate namescope needs to be carved
>>> out for IFF_HIDDEN netdevs. As of now netdev name that
>>> starts with colon i.e. ':' is invalid in userspace,
>>> since socket ioctls such as SIOCGIFCONF use ':' as the
>>> separator for ifname. The absence of namescope started
>>> with ':' can rightly be used as the namescope for
>>> the kernel-only IFF_HIDDEN netdevs.
>>>
>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>> ---
>>>  include/linux/netdevice.h   |  12 ++
>>>  include/net/net_namespace.h |   2 +
>>>  net/core/dev.c              | 281 ++++++++++++++++++++++++++++++++++++++------
>>>  net/core/net_namespace.c    |   1 +
>>>  4 files changed, 263 insertions(+), 33 deletions(-)
>>>
>>
>>There are other use cases that want to hide a device from userspace. I
>
> What usecases do you have in mind?

Hope you're not staring at me and shouting. :)

I think we had discussed a lot, and if the common goal is to merge two
drivers rather than diverge, there's no better way than to hide the
lower devices from all existing userspace management utiliies
(NetworManager, cloud-init). This does not mean loss of visibility as
we can add new API or CLI later on to get those missing ones exposed
as needed, in a way existing userspace apps don't break while new apps
aware of the feature know where to get it. This requirement is
critical to cloud providers, which I wouldn't repeat enough why it
drove me crazy if not seeing this resolved.

Thanks,
-Siwei

>
>>would prefer a better solution than playing games with name prefixes and
>>one that includes an API for users to list all devices -- even ones
>>hidden by default.
>
> Netdevice hiding feels a bit scarry for me. This smells like a workaround
> for userspace issues. Why can't the netdevice be visible always and
> userspace would know what is it and what should it do with it?
>
> Once we start with hiding, there are other things related to that which
> appear. Like who can see what, levels of visibility etc...
>
>
>>
>>https://github.com/dsahern/linux/commit/48a80a00eac284e58bae04af10a5a932dd7aee00
>>
>>https://github.com/dsahern/iproute2/commit/7563f5b26f5539960e99066e34a995d22ea908ed
>>
>>Also, why are you suggesting that the device should still be visible via
>>/sysfs? That leads to inconsistent views of networking state - /sys
>>shows a device but a link dump does not.

^ permalink raw reply

* Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Rustad, Mark D @ 2018-04-03 19:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, Daly, Dan, Bjorn Helgaas, Duyck, Alexander H,
	linux-pci@vger.kernel.org, virtio-dev@lists.oasis-open.org,
	kvm-devel, Netdev, LKML, linux-nvme@lists.infradead.org,
	Busch, Keith, netanel@amazon.com, Don Dutile, Maximilian Heyne,
	Wang, Liang-min, David Woodhouse, Christoph Hellwig,
	dwmw@amazon.co.uk
In-Reply-To: <20180403212503-mutt-send-email-mst@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 956 bytes --]

On Apr 3, 2018, at 11:27 AM, Michael S. Tsirkin <mst@redhat.com> wrote:

>> I'm not sure why you would need a feature bit. The capability is
>> controlled via PCI configuration space. If it is present the device
>> has the capability. If it is not then it does not.
>>
>> Basically if the PCI configuration space is not present then the sysfs
>> entries will not be spawned and nothing will attempt to use this
>> function.
>>
>> - ALex
>
> It's about compability with older guests which ignore the
> capability.
>
> The feature is thus helpful so host knows whether guest supports VFs.

This is not about a guest creating its own VFs. This is about a host PF  
that happens to have a virtio interface to be able to create virtio VFs  
that can be assigned to guests. Nothing changes at all from a guest  
perspective. Or maybe I am not understanding what you mean by "whether  
guest supports VFs".

--
Mark Rustad, Networking Division, Intel Corporation

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply

* Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Alexander Duyck @ 2018-04-03 19:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daly, Dan, Bjorn Helgaas, Duyck, Alexander H, linux-pci,
	virtio-dev, kvm, Netdev, LKML, linux-nvme, Keith Busch, netanel,
	Don Dutile, Maximilian Heyne, Wang, Liang-min, Rustad, Mark D,
	David Woodhouse, Christoph Hellwig, dwmw
In-Reply-To: <20180403212503-mutt-send-email-mst@kernel.org>

On Tue, Apr 3, 2018 at 11:27 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Apr 03, 2018 at 10:32:00AM -0700, Alexander Duyck wrote:
>> On Tue, Apr 3, 2018 at 6:12 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Fri, Mar 16, 2018 at 09:40:34AM -0700, Alexander Duyck wrote:
>> >> On Fri, Mar 16, 2018 at 9:34 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > On Thu, Mar 15, 2018 at 11:42:41AM -0700, Alexander Duyck wrote:
>> >> >> From: Alexander Duyck <alexander.h.duyck@intel.com>
>> >> >>
>> >> >> Hardware-realized virtio_pci devices can implement SR-IOV, so this
>> >> >> patch enables its use. The device in question is an upcoming Intel
>> >> >> NIC that implements both a virtio_net PF and virtio_net VFs. These
>> >> >> are hardware realizations of what has been up to now been a software
>> >> >> interface.
>> >> >>
>> >> >> The device in question has the following 4-part PCI IDs:
>> >> >>
>> >> >> PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
>> >> >> VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe
>> >> >>
>> >> >> The patch currently needs no check for device ID, because the callback
>> >> >> will never be made for devices that do not assert the capability or
>> >> >> when run on a platform incapable of SR-IOV.
>> >> >>
>> >> >> One reason for this patch is because the hardware requires the
>> >> >> vendor ID of a VF to be the same as the vendor ID of the PF that
>> >> >> created it. So it seemed logical to simply have a fully-functioning
>> >> >> virtio_net PF create the VFs. This patch makes that possible.
>> >> >>
>> >> >> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> >> >> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>> >> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> >> >
>> >> > So if and when virtio PFs can manage the VFs, then we can
>> >> > add a feature bit for that?
>> >> > Seems reasonable.
>> >>
>> >> Yes. If nothing else you may not even need a feature bit depending on
>> >> how things go.
>> >
>> > OTOH if the interface is changed in an incompatible way,
>> > and old Linux will attempt to drive the new device
>> > since there is no check.
>> >
>> > I think we should add a feature bit right away.
>>
>> I'm not sure why you would need a feature bit. The capability is
>> controlled via PCI configuration space. If it is present the device
>> has the capability. If it is not then it does not.
>>
>> Basically if the PCI configuration space is not present then the sysfs
>> entries will not be spawned and nothing will attempt to use this
>> function.
>>
>> - ALex
>
> It's about compability with older guests which ignore the
> capability.
>
> The feature is thus helpful so host knows whether guest supports VFs.

The thing is if the capability is ignored then the feature isn't used.
So for SR-IOV it isn't an uncommon thing for there to be drivers for
the PF floating around that do not support SR-IOV. In such cases
SR-IOV just isn't used while the hardware could support it.

I would think in the case of virtio it would be the same kind of
thing. Basically if SR-IOV is supported by the host then the
capability would be present. If SR-IOV is supported by the guest then
it would make use of the capability to spawn VFs. If either the
capability isn't present, or the driver doesn't use it then you won't
be able to spawn VFs in the guest.

Maybe I am missing something. Do you support dynamically changing the
PCI configuration space for Virtio devices based on the presence of
feature bits provided by the guest?

Also are you saying this patch set should wait on the feature bit to
be added, or are you talking about doing this as some sort of
follow-up?

- Alex

^ permalink raw reply

* Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Michael S. Tsirkin @ 2018-04-03 18:27 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Daly, Dan, Bjorn Helgaas, Duyck, Alexander H, linux-pci,
	virtio-dev, kvm, Netdev, LKML, linux-nvme, Keith Busch, netanel,
	Don Dutile, Maximilian Heyne, Wang, Liang-min, Rustad, Mark D,
	David Woodhouse, Christoph Hellwig, dwmw
In-Reply-To: <CAKgT0UfeSmG=xsd6sTkg4cOMnZ0QiqtCRDt4PynpF3d6nvvFCA@mail.gmail.com>

On Tue, Apr 03, 2018 at 10:32:00AM -0700, Alexander Duyck wrote:
> On Tue, Apr 3, 2018 at 6:12 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Mar 16, 2018 at 09:40:34AM -0700, Alexander Duyck wrote:
> >> On Fri, Mar 16, 2018 at 9:34 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> > On Thu, Mar 15, 2018 at 11:42:41AM -0700, Alexander Duyck wrote:
> >> >> From: Alexander Duyck <alexander.h.duyck@intel.com>
> >> >>
> >> >> Hardware-realized virtio_pci devices can implement SR-IOV, so this
> >> >> patch enables its use. The device in question is an upcoming Intel
> >> >> NIC that implements both a virtio_net PF and virtio_net VFs. These
> >> >> are hardware realizations of what has been up to now been a software
> >> >> interface.
> >> >>
> >> >> The device in question has the following 4-part PCI IDs:
> >> >>
> >> >> PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
> >> >> VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe
> >> >>
> >> >> The patch currently needs no check for device ID, because the callback
> >> >> will never be made for devices that do not assert the capability or
> >> >> when run on a platform incapable of SR-IOV.
> >> >>
> >> >> One reason for this patch is because the hardware requires the
> >> >> vendor ID of a VF to be the same as the vendor ID of the PF that
> >> >> created it. So it seemed logical to simply have a fully-functioning
> >> >> virtio_net PF create the VFs. This patch makes that possible.
> >> >>
> >> >> Reviewed-by: Christoph Hellwig <hch@lst.de>
> >> >> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
> >> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> >> >
> >> > So if and when virtio PFs can manage the VFs, then we can
> >> > add a feature bit for that?
> >> > Seems reasonable.
> >>
> >> Yes. If nothing else you may not even need a feature bit depending on
> >> how things go.
> >
> > OTOH if the interface is changed in an incompatible way,
> > and old Linux will attempt to drive the new device
> > since there is no check.
> >
> > I think we should add a feature bit right away.
> 
> I'm not sure why you would need a feature bit. The capability is
> controlled via PCI configuration space. If it is present the device
> has the capability. If it is not then it does not.
> 
> Basically if the PCI configuration space is not present then the sysfs
> entries will not be spawned and nothing will attempt to use this
> function.
> 
> - ALex

It's about compability with older guests which ignore the
capability.

The feature is thus helpful so host knows whether guest supports VFs.


-- 
MSR

^ permalink raw reply

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
From: Jeff Kirsher @ 2018-04-03 18:26 UTC (permalink / raw)
  To: Sinan Kaya, Alexander Duyck, intel-wired-lan
  Cc: Netdev, Timur Tabi, sulrich, linux-arm-msm, linux-arm-kernel
In-Reply-To: <92f46034-f041-05e4-56b7-01ae4cb04efc@codeaurora.org>

[-- Attachment #1: Type: text/plain, Size: 1148 bytes --]

On Tue, 2018-04-03 at 13:50 -0400, Sinan Kaya wrote:
> > > What do you think about this version? Did I miss any SMP
> > > barriers?
> > 
> > I would say we should probably just drop the whole set and start
> > over.
> > If we don't need the wmb() we are going to need to go through and
> > clean up all of these paths and consider the barriers when updating
> > the layout of the code.
> > 
> > For example I have been thinking about it and in the case of x86 we
> > are probably better off not bothering with the wmb() and
> > writel_relaxed() and instead switch over to the smp_wmb() and
> > writel()
> > since in the case of a strongly ordered system like x86 or sparc
> > this
> > ends up translating out to a couple of compile barriers.
> > 
> > I will also need time to reevaluate the Rx paths since dropping the
> > wmb() may have other effects which I need to verify.
> 
> Sounds good, I'll let you work on it.
> 
> @Jeff Kirsher: can you drop this version from your development branch
> until
> Alex posts the next version?

Already on it, I will work with Alex on any possible future versions of
these patches.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v15 ] net/veth/XDP: Line-rate packet forwarding in kernel
From: Jesper Dangaard Brouer @ 2018-04-03 18:21 UTC (permalink / raw)
  To: David Ahern
  Cc: John Fastabend, Alexei Starovoitov, Md. Islam, netdev,
	David Miller, stephen, agaceph, Pavel Emelyanov, Eric Dumazet,
	brouer
In-Reply-To: <8832a8a4-24bb-7841-bb84-31f7c05c2b72@gmail.com>

On Tue, 3 Apr 2018 11:00:20 -0600 David Ahern <dsahern@gmail.com> wrote:

> On 4/3/18 10:41 AM, John Fastabend wrote:
> > On 04/03/2018 08:07 AM, David Ahern wrote:  
> >> On 4/2/18 12:16 PM, Alexei Starovoitov wrote:  
> >>> On Mon, Apr 02, 2018 at 12:09:44PM -0600, David Ahern wrote:  
> >>>> On 4/2/18 12:03 PM, John Fastabend wrote:  
> >>>>>
[...]
> >>>> That's what I have here:
> >>>>
> >>>> https://github.com/dsahern/linux/commit/bab42f158c0925339f7519df7fb2cde8eac33aa8  
> >>>
> >>> was wondering what's up with the delay and when are you going to
> >>> submit them officially...

I'm also eager to see these go upstream! :-)

[...]
> >>
> >> 2. Jesper had concerns about xdp redirect to any netdev. e.g., How does
> >> the lookup know the egress netdev supports xdp? Right now you can try
> >> and the packet is dropped if it is not supported.
> >>  
> > 
> > There should probably be a tracepoint there if not already. Otherwise
> > I think the orchestration/loader layer should be ensuring that xdp
> > support is sufficient. I don't think we need anything specific in the
> > XDP/BPF code to handle unsupported devices.  
> 
> ok. I am fine with starting with that.

David this concern was not related to your code.  Your code need to
stay generic, to be usable from both XDP and clsact.  As John writes
the error cases can be troubleshooted through XDP tracepoints.

My request/concern was actually performance related.  I wanted to add a
flag to the helper, to allow for a later optimization.  Today, we have
pushed the responsibility, of knowing if an ifindex is XDP-xmit
capable, to the BPF program(mer).  This implies that the BPF-prog need
extra code/lookup into a map to validate the ifindex returned from your
helper.  Later, we might get some XDP feature bits, and a flag could
specify to only return the looked up device if XDP-xmit is enabled.
Thus, simplifying the work need in the bpf-prog.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
From: Sinan Kaya @ 2018-04-03 17:50 UTC (permalink / raw)
  To: Alexander Duyck, intel-wired-lan
  Cc: Jeff Kirsher, Netdev, Timur Tabi, sulrich, linux-arm-msm,
	linux-arm-kernel
In-Reply-To: <CAKgT0UfbCzTGuwCnYiXOBL5Lm3AhYw6Kf5i0K2e19mTjise+Ew@mail.gmail.com>

On 4/3/2018 1:47 PM, Alexander Duyck wrote:
> On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
>> Alex,
>>
>> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>>> Code includes wmb() followed by writel() in multiple places. writel()
>>> already has a barrier on some architectures like arm64.
>>>
>>> This ends up CPU observing two barriers back to back before executing the
>>> register write.
>>>
>>> Since code already has an explicit barrier call, changing writel() to
>>> writel_relaxed().
>>>
>>> I did a regex search for wmb() followed by writel() in each drivers
>>> directory. I scrubbed the ones I care about in this series.
>>>
>>> I considered "ease of change", "popular usage" and "performance critical
>>> path" as the determining criteria for my filtering.
>>>
>>> We used relaxed API heavily on ARM for a long time but
>>> it did not exist on other architectures. For this reason, relaxed
>>> architectures have been paying double penalty in order to use the common
>>> drivers.
>>>
>>> Now that relaxed API is present on all architectures, we can go and scrub
>>> all drivers to see what needs to change and what can remain.
>>>
>>> We start with mostly used ones and hope to increase the coverage over time.
>>> It will take a while to cover all drivers.
>>>
>>> Feel free to apply patches individually.
>>>
>>> Change since 7:
>>>
>>> * API clarification with Linus and several architecture folks regarding
>>> writel()
>>>
>>> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
>>>
>>> * removed wmb() in front of writel() at several places.
>>> * remove old IA64 comments regarding ordering.
>>>
>>
>> What do you think about this version? Did I miss any SMP barriers?
> 
> I would say we should probably just drop the whole set and start over.
> If we don't need the wmb() we are going to need to go through and
> clean up all of these paths and consider the barriers when updating
> the layout of the code.
> 
> For example I have been thinking about it and in the case of x86 we
> are probably better off not bothering with the wmb() and
> writel_relaxed() and instead switch over to the smp_wmb() and writel()
> since in the case of a strongly ordered system like x86 or sparc this
> ends up translating out to a couple of compile barriers.
> 
> I will also need time to reevaluate the Rx paths since dropping the
> wmb() may have other effects which I need to verify.

Sounds good, I'll let you work on it.

@Jeff Kirsher: can you drop this version from your development branch until
Alex posts the next version?

> 
> Thanks.
> 
> - Alex
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: [PATCH v8 0/7] netdev: intel: Eliminate duplicate barriers on weakly-ordered archs
From: Alexander Duyck @ 2018-04-03 17:47 UTC (permalink / raw)
  To: Sinan Kaya, intel-wired-lan
  Cc: Jeff Kirsher, Netdev, Timur Tabi, sulrich, linux-arm-msm,
	linux-arm-kernel
In-Reply-To: <b8355df0-f3ba-c8c8-ccbb-53b1a294eab6@codeaurora.org>

On Mon, Apr 2, 2018 at 7:59 PM, Sinan Kaya <okaya@codeaurora.org> wrote:
> Alex,
>
> On 4/2/2018 3:06 PM, Sinan Kaya wrote:
>> Code includes wmb() followed by writel() in multiple places. writel()
>> already has a barrier on some architectures like arm64.
>>
>> This ends up CPU observing two barriers back to back before executing the
>> register write.
>>
>> Since code already has an explicit barrier call, changing writel() to
>> writel_relaxed().
>>
>> I did a regex search for wmb() followed by writel() in each drivers
>> directory. I scrubbed the ones I care about in this series.
>>
>> I considered "ease of change", "popular usage" and "performance critical
>> path" as the determining criteria for my filtering.
>>
>> We used relaxed API heavily on ARM for a long time but
>> it did not exist on other architectures. For this reason, relaxed
>> architectures have been paying double penalty in order to use the common
>> drivers.
>>
>> Now that relaxed API is present on all architectures, we can go and scrub
>> all drivers to see what needs to change and what can remain.
>>
>> We start with mostly used ones and hope to increase the coverage over time.
>> It will take a while to cover all drivers.
>>
>> Feel free to apply patches individually.
>>
>> Change since 7:
>>
>> * API clarification with Linus and several architecture folks regarding
>> writel()
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg225806.html
>>
>> * removed wmb() in front of writel() at several places.
>> * remove old IA64 comments regarding ordering.
>>
>
> What do you think about this version? Did I miss any SMP barriers?

I would say we should probably just drop the whole set and start over.
If we don't need the wmb() we are going to need to go through and
clean up all of these paths and consider the barriers when updating
the layout of the code.

For example I have been thinking about it and in the case of x86 we
are probably better off not bothering with the wmb() and
writel_relaxed() and instead switch over to the smp_wmb() and writel()
since in the case of a strongly ordered system like x86 or sparc this
ends up translating out to a couple of compile barriers.

I will also need time to reevaluate the Rx paths since dropping the
wmb() may have other effects which I need to verify.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH v15 ] net/veth/XDP: Line-rate packet forwarding in kernel
From: Alexei Starovoitov @ 2018-04-03 17:37 UTC (permalink / raw)
  To: David Ahern
  Cc: John Fastabend, Md. Islam, netdev, David Miller, stephen, agaceph,
	Pavel Emelyanov, Eric Dumazet, brouer
In-Reply-To: <dea228d2-8483-b291-493d-3627b824fd00@gmail.com>

On Tue, Apr 03, 2018 at 11:14:00AM -0600, David Ahern wrote:
> On 4/3/18 11:06 AM, Alexei Starovoitov wrote:
> >> For 3 and 4 above I was referring to the route lookup part of it; sorry
> >> for not being clear.
> >>
> >> For example, eth1 is enslaved to bond1 which is in VRF red. The lookup
> >> needs to go to the table associated with the VRF. That is not known by
> >> just looking at eth1. The code exists to walk the upper layers and do
> >> the effective translations, just need to cover those cases.
> >>
> >> The VLAN part of it is a bit more difficult - ingress device for the
> >> lookup should be eth1.100 for example not eth1, and then if eth1.100 is
> >> enslaved to a VRF, ...
> >>
> >> None of it is that complex, just need to walk through the various use
> >> cases and make sure bpf_ipv4_fwd_lookup and bpf_ipv6_fwd_lookup can do
> >> the right thing for these common use cases.
> > I'm a bit lost here. Why this is a concern?
> > 'index' as argument that bpf prog is passing into the helper.
> > The clsbpf program may choose to pass ifindex of the netdev
> > it's attached to or some other one.
> > In your patch you have:
> > +BPF_CALL_3(bpf_ipv4_fwd_lookup, int, index, const struct iphdr *, iph,
> > +	   struct ethhdr *, eth)
> > +{
> > +	struct flowi4 fl4 = {
> > +		.daddr = iph->daddr,
> > +		.saddr = iph->saddr,
> > +		.flowi4_iif = index,
> > +		.flowi4_tos = iph->tos & IPTOS_RT_MASK,
> > +		.flowi4_scope = RT_SCOPE_UNIVERSE,
> > +	};
> > 
> > As you saying there is concern with .flowi4_iif = index line ?
> 
> yes. BPF / XDP programs are installed on the bottom device ... e.g.,
> eth1. The L3 lookup is not necessarily done on that device index.

right, but I still don't see any problem with this helper and vlans.
If xdp program passes incorrect ifindex, it's program's mistake.
If clsbpf attached to vlan passed good ifindex, the lookup will
happen in the correct scope, but even in this case the prog
can pass whatever ifindex it wants.

> 
> > In the above the only thing Daniel and myself pointed out that
> > passing struct iphdr * like this is not safe.
> > We either need size argument which would be a bit cumbersome or
> > extend verifier a little to specify size as part of helper proto,
> > so that verifier can eforce it without having program to pass it.
> > imo that's the only bit missing from that patch to upstream it.
> 
> sure. I did not mean that item 1. was a big deal, just something that
> needed to be fixed.
> 
> > 
> > Also the helper isn't really related to XDP. It should work as-is
> > for clsbpf and xdp programs as far as I can tell.
> > 
> 
> yes.

^ permalink raw reply

* Re: [RFC PATCH 2/3] netdev: kernel-only IFF_HIDDEN netdevice
From: Stephen Hemminger @ 2018-04-03 17:35 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: mst, jiri, alexander.h.duyck, davem, jesse.brandeburg, kubakici,
	jasowang, sridhar.samudrala, netdev, virtualization, virtio-dev
In-Reply-To: <1522573990-5242-3-git-send-email-si-wei.liu@oracle.com>

On Sun,  1 Apr 2018 05:13:09 -0400
Si-Wei Liu <si-wei.liu@oracle.com> wrote:

> Hidden netdevice is not visible to userspace such that
> typical network utilites e.g. ip, ifconfig and et al,
> cannot sense its existence or configure it. Internally
> hidden netdev may associate with an upper level netdev
> that userspace has access to. Although userspace cannot
> manipulate the lower netdev directly, user may control
> or configure the underlying hidden device through the
> upper-level netdev. For identification purpose, the
> kobject for hidden netdev still presents in the sysfs
> hierarchy, however, no uevent message will be generated
> when the sysfs entry is created, modified or destroyed.
> 
> For that end, a separate namescope needs to be carved
> out for IFF_HIDDEN netdevs. As of now netdev name that
> starts with colon i.e. ':' is invalid in userspace,
> since socket ioctls such as SIOCGIFCONF use ':' as the
> separator for ifname. The absence of namescope started
> with ':' can rightly be used as the namescope for
> the kernel-only IFF_HIDDEN netdevs.
> 
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---

I understand the use case. I proposed using . as a prefix before
but that ran into resistance. Using colon seems worse.

Rather than playing with names and all the issues that can cause,
why not make it an attribute flag of the device in netlink.

^ permalink raw reply

* Re: [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Alexander Duyck @ 2018-04-03 17:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daly, Dan, Bjorn Helgaas, Duyck, Alexander H, linux-pci,
	virtio-dev, kvm, Netdev, LKML, linux-nvme, Keith Busch, netanel,
	Don Dutile, Maximilian Heyne, Wang, Liang-min, Rustad, Mark D,
	David Woodhouse, Christoph Hellwig, dwmw
In-Reply-To: <20180403161151-mutt-send-email-mst@kernel.org>

On Tue, Apr 3, 2018 at 6:12 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Mar 16, 2018 at 09:40:34AM -0700, Alexander Duyck wrote:
>> On Fri, Mar 16, 2018 at 9:34 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Thu, Mar 15, 2018 at 11:42:41AM -0700, Alexander Duyck wrote:
>> >> From: Alexander Duyck <alexander.h.duyck@intel.com>
>> >>
>> >> Hardware-realized virtio_pci devices can implement SR-IOV, so this
>> >> patch enables its use. The device in question is an upcoming Intel
>> >> NIC that implements both a virtio_net PF and virtio_net VFs. These
>> >> are hardware realizations of what has been up to now been a software
>> >> interface.
>> >>
>> >> The device in question has the following 4-part PCI IDs:
>> >>
>> >> PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
>> >> VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe
>> >>
>> >> The patch currently needs no check for device ID, because the callback
>> >> will never be made for devices that do not assert the capability or
>> >> when run on a platform incapable of SR-IOV.
>> >>
>> >> One reason for this patch is because the hardware requires the
>> >> vendor ID of a VF to be the same as the vendor ID of the PF that
>> >> created it. So it seemed logical to simply have a fully-functioning
>> >> virtio_net PF create the VFs. This patch makes that possible.
>> >>
>> >> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> >> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> >
>> > So if and when virtio PFs can manage the VFs, then we can
>> > add a feature bit for that?
>> > Seems reasonable.
>>
>> Yes. If nothing else you may not even need a feature bit depending on
>> how things go.
>
> OTOH if the interface is changed in an incompatible way,
> and old Linux will attempt to drive the new device
> since there is no check.
>
> I think we should add a feature bit right away.

I'm not sure why you would need a feature bit. The capability is
controlled via PCI configuration space. If it is present the device
has the capability. If it is not then it does not.

Basically if the PCI configuration space is not present then the sysfs
entries will not be spawned and nothing will attempt to use this
function.

- ALex

^ permalink raw reply

* Re: [PATCH net] net: dsa: mt7530: Use NULL instead of plain integer
From: Sean Wang @ 2018-04-03 17:31 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Vivien Didelot,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, open list,
	Andrew Lunn
In-Reply-To: <f5aa0467-8124-b097-2b89-dcf1698a5ba5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Tue, 2018-04-03 at 10:02 -0700, Florian Fainelli wrote:
> On 04/02/2018 07:18 PM, Sean Wang wrote:
> > On Mon, 2018-04-02 at 16:24 -0700, Florian Fainelli wrote:
> >> We would be passing 0 instead of NULL as the rsp argument to
> >> mt7530_fdb_cmd(), fix that.
> >>
> > 
> > Acked-by: Sean Wang <sean.wang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> > 
> > BTW, does the part of the commit message should be updated with "passing
> > NULL instead of 0"?
> 
> I don't follow you, the commit message indicates what we were doing
> which implies it was wrong and fixes it. Would you want me to reword
> that part?

thanks for the explanation.

the commit message is clear and okay for me now

^ permalink raw reply

* RE: general protection fault in tipc_nametbl_unsubscribe
From: Jon Maloy @ 2018-04-03 17:16 UTC (permalink / raw)
  To: syzbot, davem@davemloft.net, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	tipc-discussion@lists.sourceforge.net, ying.xue@windriver.com
In-Reply-To: <000000000000a7c16a0568d750ce@google.com>

#syz dup: general protection fault in __list_del_entry_valid (3)

> -----Original Message-----
> From: syzbot
> [mailto:syzbot+4859fe19555ea87c42f3@syzkaller.appspotmail.com]
> Sent: Monday, April 02, 2018 02:01
> To: davem@davemloft.net; Jon Maloy <jon.maloy@ericsson.com>; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org; syzkaller-
> bugs@googlegroups.com; tipc-discussion@lists.sourceforge.net;
> ying.xue@windriver.com
> Subject: general protection fault in tipc_nametbl_unsubscribe
> 
> Hello,
> 
> syzbot hit the following crash on upstream commit
> 10b84daddbec72c6b440216a69de9a9605127f7a (Sat Mar 31 17:59:00 2018
> +0000) Merge branch 'perf-urgent-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=4859fe19555ea87c42f3
> 
> So far this crash happened 3 times on upstream.
> C reproducer:
> https://syzkaller.appspot.com/x/repro.c?id=4775372465897472
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=4868734988582912
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5073802094444544
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-2760467897697295172
> compiler: gcc (GCC) 7.1.1 20170620
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4859fe19555ea87c42f3@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
> 
> R13: ffffffffffffffff R14: 0000000000000000 R15: 0000000000000000 Name
> sequence creation failed, no memory Failed to create subscription for
> {24576,0,4294967295}
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer:
>     (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 4447 Comm: syzkaller851181 Not tainted 4.16.0-rc7+ #374
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> RIP: 0010:__list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51
> RSP: 0018:ffff8801ae1aef48 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff8801cf54c760 RDI: ffff8801cf54c768
> RBP: ffff8801ae1aef60 R08: 1ffff10035c35cff R09: ffffffff89956150
> R10: ffff8801ae1aee28 R11: 000000000000168a R12: ffffffff87745ea0
> R13: ffff8801ae1af100 R14: ffff8801cf54c760 R15: ffff8801cf4c8cc0
> FS:  0000000000000000(0000) GS:ffff8801db100000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055dce15c3090 CR3: 000000000846a002 CR4: 00000000001606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call
> Trace:
>   __list_del_entry include/linux/list.h:117 [inline]
>   list_del_init include/linux/list.h:159 [inline]
>   tipc_nametbl_unsubscribe+0x318/0x990 net/tipc/name_table.c:848
>   tipc_subscrb_subscrp_delete+0x1e9/0x460 net/tipc/subscr.c:212
>   tipc_subscrb_delete net/tipc/subscr.c:242 [inline]
>   tipc_subscrb_release_cb+0x17/0x30 net/tipc/subscr.c:321
>   tipc_topsrv_kern_unsubscr+0x2c3/0x430 net/tipc/server.c:535
>   tipc_group_delete+0x2c0/0x3d0 net/tipc/group.c:231
>   tipc_sk_leave+0x10b/0x200 net/tipc/socket.c:2795
>   tipc_release+0x154/0xff0 net/tipc/socket.c:577
>   sock_release+0x8d/0x1e0 net/socket.c:595
>   sock_close+0x16/0x20 net/socket.c:1149
>   __fput+0x327/0x7e0 fs/file_table.c:209
>   ____fput+0x15/0x20 fs/file_table.c:243
>   task_work_run+0x199/0x270 kernel/task_work.c:113
>   exit_task_work include/linux/task_work.h:22 [inline]
>   do_exit+0x9bb/0x1ad0 kernel/exit.c:865
>   do_group_exit+0x149/0x400 kernel/exit.c:968
>   SYSC_exit_group kernel/exit.c:979 [inline]
>   SyS_exit_group+0x1d/0x20 kernel/exit.c:977
>   do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x43f228
> RSP: 002b:00007ffde31217e8 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f228
> RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> RBP: 00000000004bf308 R08: 00000000000000e7 R09: ffffffffffffffd0
> R10: 00000000204ee000 R11: 0000000000000246 R12: 0000000000000001
> R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
> Code: 00 00 00 00 ad de 49 39 c4 74 66 48 b8 00 02 00 00 00 00 ad de 48 89 da 48
> 39 c3 74 65 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df <80> 3c 02 00
> 75 7b 48 8b 13 48 39 f2 75 57 49 8d 7c 24 08 48 b8
> RIP: __list_del_entry_valid+0x7e/0x150 lib/list_debug.c:51 RSP:
> ffff8801ae1aef48
> ---[ end trace ba18c1598e2d5535 ]---
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch and provide the patch inline or as an
> attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report,
> please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug report.
> Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox