Netdev List


* [PATCH 1/7] hv_netvsc: use consume_skb
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: Stephen Hemminger <sthemmin@microsoft.com>

Packets that are transmitted in the normal path should be freed with
consume_skb instead of kfree_skb, so that drop tracing only reports
packets that were actually dropped.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index ff05b9b..720b5fa 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -635,7 +635,7 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
 		q_idx = nvsc_packet->q_idx;
 		channel = incoming_channel;
 
-		dev_kfree_skb_any(skb);
+		dev_consume_skb_any(skb);
 	}
 
 	num_outstanding_sends =
@@ -944,7 +944,7 @@ int netvsc_send(struct hv_device *device,
 		}
 
 		if (msdp->skb)
-			dev_kfree_skb_any(msdp->skb);
+			dev_consume_skb_any(msdp->skb);
 
 		if (xmit_more && !packet->cp_partial) {
 			msdp->skb = skb;
-- 
1.7.4.1
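
For readers unfamiliar with the distinction, here is a minimal userspace
sketch of the idea, not kernel code. struct pkt, struct free_stats,
pkt_alloc(), pkt_consume() and pkt_drop() are hypothetical stand-ins invented
for illustration; only the split between "consumed" and "dropped" frees
reflects what the patch relies on.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct pkt {
	size_t len;
};

/* Counters a tracer might watch; the names are illustrative only. */
struct free_stats {
	unsigned long consumed;	/* freed after successful handling */
	unsigned long dropped;	/* freed because the packet was discarded */
};

static struct pkt *pkt_alloc(size_t len)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p)
		p->len = len;
	return p;
}

/* Models consume_skb(): normal end of life, not reported as a drop. */
static void pkt_consume(struct free_stats *st, struct pkt *p)
{
	assert(p != NULL);
	st->consumed++;
	free(p);
}

/* Models kfree_skb(): the packet is being discarded, so count a drop. */
static void pkt_drop(struct free_stats *st, struct pkt *p)
{
	assert(p != NULL);
	st->dropped++;
	free(p);
}
```

With this split, a transmit-completion path that calls pkt_consume() leaves
the dropped counter untouched, which is the effect the patch is after for
drop tracing.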

* [PATCH 3/7] hv_netvsc: simplify callback event code
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: Stephen Hemminger <sthemmin@microsoft.com>

The callback handler for netlink events can be simplified:
 * Consolidate check for netlink callback events about this driver itself.
 * Ignore non-Ethernet devices.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |   28 ++++++++++------------------
 1 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index e74dbcc..849b566 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1238,10 +1238,6 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	struct net_device *ndev;
 	struct net_device_context *net_device_ctx;
 	struct netvsc_device *netvsc_dev;
-	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
-
-	if (eth_ops == NULL || eth_ops == &ethtool_ops)
-		return NOTIFY_DONE;
 
 	/*
 	 * We will use the MAC address to locate the synthetic interface to
@@ -1286,12 +1282,8 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
 {
 	struct net_device *ndev;
 	struct netvsc_device *netvsc_dev;
-	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
 	struct net_device_context *net_device_ctx;
 
-	if (eth_ops == &ethtool_ops)
-		return NOTIFY_DONE;
-
 	ndev = get_netvsc_net_device(vf_netdev->dev_addr);
 	if (!ndev)
 		return NOTIFY_DONE;
@@ -1329,10 +1321,6 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
 	struct net_device *ndev;
 	struct netvsc_device *netvsc_dev;
 	struct net_device_context *net_device_ctx;
-	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
-
-	if (eth_ops == &ethtool_ops)
-		return NOTIFY_DONE;
 
 	ndev = get_netvsc_net_device(vf_netdev->dev_addr);
 	if (!ndev)
@@ -1361,12 +1349,8 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 {
 	struct net_device *ndev;
 	struct netvsc_device *netvsc_dev;
-	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
 	struct net_device_context *net_device_ctx;
 
-	if (eth_ops == &ethtool_ops)
-		return NOTIFY_DONE;
-
 	ndev = get_netvsc_net_device(vf_netdev->dev_addr);
 	if (!ndev)
 		return NOTIFY_DONE;
@@ -1542,13 +1526,21 @@ static int netvsc_netdev_event(struct notifier_block *this,
 {
 	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
 
+	/* Skip our own events */
+	if (event_dev->netdev_ops == &device_ops)
+		return NOTIFY_DONE;
+
+	/* Avoid non-Ethernet type devices */
+	if (event_dev->type != ARPHRD_ETHER)
+		return NOTIFY_DONE;
+
 	/* Avoid Vlan dev with same MAC registering as VF */
 	if (event_dev->priv_flags & IFF_802_1Q_VLAN)
 		return NOTIFY_DONE;
 
 	/* Avoid Bonding master dev with same MAC registering as VF */
-	if (event_dev->priv_flags & IFF_BONDING &&
-	    event_dev->flags & IFF_MASTER)
+	if ((event_dev->priv_flags & IFF_BONDING) &&
+	    (event_dev->flags & IFF_MASTER))
 		return NOTIFY_DONE;
 
 	switch (event) {
-- 
1.7.4.1
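
The consolidated checks in netvsc_netdev_event() can be read as a single
predicate. The following is a userspace sketch under stated assumptions:
struct dev, struct ops, event_is_vf_candidate() and every constant below are
hypothetical stand-ins, and the constant values are not the kernel's
definitions. Only the order and meaning of the checks follow the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins; the values are NOT the kernel's definitions. */
enum { HW_ETHER = 1, HW_OTHER = 2 };	/* models dev->type */
#define FLAG_MASTER	0x1u	/* models IFF_MASTER in dev->flags */
#define PRIV_VLAN	0x1u	/* models IFF_802_1Q_VLAN in dev->priv_flags */
#define PRIV_BONDING	0x2u	/* models IFF_BONDING in dev->priv_flags */

struct ops {
	int unused;
};

/* Models the driver's own device_ops table. */
static const struct ops netvsc_ops = { 0 };

struct dev {
	const struct ops *ops;
	int type;
	unsigned int flags;
	unsigned int priv_flags;
};

/* Returns true when an event for this device should be processed further. */
static bool event_is_vf_candidate(const struct dev *d)
{
	assert(d != NULL);

	if (d->ops == &netvsc_ops)
		return false;	/* skip our own events */
	if (d->type != HW_ETHER)
		return false;	/* avoid non-Ethernet devices */
	if (d->priv_flags & PRIV_VLAN)
		return false;	/* VLAN device sharing the MAC */
	if ((d->priv_flags & PRIV_BONDING) && (d->flags & FLAG_MASTER))
		return false;	/* bonding master sharing the MAC */
	return true;
}
```

Doing the ownership check once in the notifier is what lets the per-event
handlers drop their individual ethtool_ops comparisons.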

* [PATCH 2/7] hv_netvsc: dev hold/put reference to VF
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: Stephen Hemminger <sthemmin@microsoft.com>

The netvsc driver holds a pointer to the virtual function network device
when managing an SR-IOV association. To ensure that the VF network device
does not disappear while that pointer is in use, the driver should take a
reference with dev_hold and release it with dev_put.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 2360e70..e74dbcc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1262,6 +1262,8 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	 * Take a reference on the module.
 	 */
 	try_module_get(THIS_MODULE);
+
+	dev_hold(vf_netdev);
 	net_device_ctx->vf_netdev = vf_netdev;
 	return NOTIFY_OK;
 }
@@ -1376,6 +1378,7 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 	netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
 	netvsc_inject_disable(net_device_ctx);
 	net_device_ctx->vf_netdev = NULL;
+	dev_put(vf_netdev);
 	module_put(THIS_MODULE);
 	return NOTIFY_OK;
 }
-- 
1.7.4.1
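
The pairing introduced by this patch can be sketched in userspace as follows.
This is a simplified model, not the kernel implementation: struct vf_dev,
struct synth_ctx, vf_hold(), vf_put(), register_vf() and unregister_vf() are
hypothetical names, and the plain integer counter stands in for the real
reference counting done by dev_hold/dev_put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VF's struct net_device. */
struct vf_dev {
	int refcnt;
};

/* Hypothetical stand-in for struct net_device_context. */
struct synth_ctx {
	struct vf_dev *vf;
};

/* Models dev_hold(): take a reference before storing the pointer. */
static void vf_hold(struct vf_dev *vf)
{
	vf->refcnt++;
}

/* Models dev_put(): release the reference taken by vf_hold(). */
static void vf_put(struct vf_dev *vf)
{
	assert(vf->refcnt > 0);
	vf->refcnt--;
}

static void register_vf(struct synth_ctx *ctx, struct vf_dev *vf)
{
	vf_hold(vf);		/* the stored pointer now owns a reference */
	ctx->vf = vf;
}

static void unregister_vf(struct synth_ctx *ctx)
{
	struct vf_dev *vf = ctx->vf;

	if (!vf)
		return;
	ctx->vf = NULL;		/* stop using the pointer first */
	vf_put(vf);		/* then drop the reference */
}
```

The point of the ordering in unregister_vf() is that the stored pointer is
cleared before the reference that kept it valid is released.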

* [PATCH 4/7] hv_netvsc: improve VF device matching
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: Stephen Hemminger <sthemmin@microsoft.com>

The code to associate netvsc and VF devices can be made less error prone
by using better matching algorithms.

On registration, use the permanent address which avoids any possible
issues caused by device MAC address being changed. For all other callbacks,
search by the netdevice pointer value to ensure getting the correct
network device.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c |   60 +++++++++++++++++++++++++-------------
 1 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 849b566..8768219 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1215,22 +1215,44 @@ static void netvsc_free_netdev(struct net_device *netdev)
 	free_netdev(netdev);
 }
 
-static struct net_device *get_netvsc_net_device(char *mac)
+static struct net_device *get_netvsc_bymac(const u8 *mac)
 {
-	struct net_device *dev, *found = NULL;
+	struct net_device *dev;
 
 	ASSERT_RTNL();
 
 	for_each_netdev(&init_net, dev) {
-		if (memcmp(dev->dev_addr, mac, ETH_ALEN) == 0) {
-			if (dev->netdev_ops != &device_ops)
-				continue;
-			found = dev;
-			break;
-		}
+		if (dev->netdev_ops != &device_ops)
+			continue;	/* not a netvsc device */
+
+		if (ether_addr_equal(mac, dev->perm_addr))
+			return dev;
+	}
+
+	return NULL;
+}
+
+static struct net_device *get_netvsc_byref(const struct net_device *vf_netdev)
+{
+	struct net_device *dev;
+
+	ASSERT_RTNL();
+
+	for_each_netdev(&init_net, dev) {
+		struct net_device_context *net_device_ctx;
+
+		if (dev->netdev_ops != &device_ops)
+			continue;	/* not a netvsc device */
+
+		net_device_ctx = netdev_priv(dev);
+		if (net_device_ctx->nvdev == NULL)
+			continue;	/* device is removed */
+
+		if (net_device_ctx->vf_netdev == vf_netdev)
+			return dev;	/* a match */
 	}
 
-	return found;
+	return NULL;
 }
 
 static int netvsc_register_vf(struct net_device *vf_netdev)
@@ -1239,12 +1261,15 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	struct net_device_context *net_device_ctx;
 	struct netvsc_device *netvsc_dev;
 
+	if (vf_netdev->addr_len != ETH_ALEN)
+		return NOTIFY_DONE;
+
 	/*
 	 * We will use the MAC address to locate the synthetic interface to
 	 * associate with the VF interface. If we don't find a matching
 	 * synthetic interface, move on.
 	 */
-	ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+	ndev = get_netvsc_bymac(vf_netdev->perm_addr);
 	if (!ndev)
 		return NOTIFY_DONE;
 
@@ -1284,16 +1309,13 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
 	struct netvsc_device *netvsc_dev;
 	struct net_device_context *net_device_ctx;
 
-	ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+	ndev = get_netvsc_byref(vf_netdev);
 	if (!ndev)
 		return NOTIFY_DONE;
 
 	net_device_ctx = netdev_priv(ndev);
 	netvsc_dev = net_device_ctx->nvdev;
 
-	if (!netvsc_dev || !net_device_ctx->vf_netdev)
-		return NOTIFY_DONE;
-
 	netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
 	netvsc_inject_enable(net_device_ctx);
 
@@ -1322,16 +1344,13 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
 	struct netvsc_device *netvsc_dev;
 	struct net_device_context *net_device_ctx;
 
-	ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+	ndev = get_netvsc_byref(vf_netdev);
 	if (!ndev)
 		return NOTIFY_DONE;
 
 	net_device_ctx = netdev_priv(ndev);
 	netvsc_dev = net_device_ctx->nvdev;
 
-	if (!netvsc_dev || !net_device_ctx->vf_netdev)
-		return NOTIFY_DONE;
-
 	netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
 	netvsc_inject_disable(net_device_ctx);
 	netvsc_switch_datapath(ndev, false);
@@ -1351,14 +1370,13 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 	struct netvsc_device *netvsc_dev;
 	struct net_device_context *net_device_ctx;
 
-	ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+	ndev = get_netvsc_byref(vf_netdev);
 	if (!ndev)
 		return NOTIFY_DONE;
 
 	net_device_ctx = netdev_priv(ndev);
 	netvsc_dev = net_device_ctx->nvdev;
-	if (!netvsc_dev || !net_device_ctx->vf_netdev)
-		return NOTIFY_DONE;
+
 	netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
 	netvsc_inject_disable(net_device_ctx);
 	net_device_ctx->vf_netdev = NULL;
-- 
1.7.4.1
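
The two lookups can be modelled in userspace as below. This is a sketch under
stated assumptions: struct netdev, find_by_perm_mac() and find_by_vf() are
hypothetical stand-ins for the kernel structures and for get_netvsc_bymac()
and get_netvsc_byref(), and a plain array replaces the for_each_netdev() walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, flattened stand-in for net_device plus its private data. */
struct netdev {
	bool is_netvsc;				/* models netdev_ops == &device_ops */
	bool removed;				/* models nvdev == NULL */
	unsigned char cur_addr[MAC_LEN];	/* models dev_addr, can change */
	unsigned char perm_addr[MAC_LEN];	/* models perm_addr, fixed */
	const struct netdev *vf;		/* models vf_netdev */
};

/* Registration path: match on the permanent address only. */
static struct netdev *find_by_perm_mac(struct netdev *devs, size_t n,
				       const unsigned char *mac)
{
	size_t i;

	assert(mac != NULL);
	for (i = 0; i < n; i++) {
		if (!devs[i].is_netvsc)
			continue;	/* not a netvsc device */
		if (memcmp(devs[i].perm_addr, mac, MAC_LEN) == 0)
			return &devs[i];
	}
	return NULL;
}

/* All other callbacks: match on the stored VF pointer. */
static struct netdev *find_by_vf(struct netdev *devs, size_t n,
				 const struct netdev *vf)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].is_netvsc)
			continue;	/* not a netvsc device */
		if (devs[i].removed)
			continue;	/* device is removed */
		if (devs[i].vf == vf)
			return &devs[i];
	}
	return NULL;
}
```

Because find_by_vf() only returns a device whose stored pointer matches, the
callers no longer need their own checks for a missing VF association.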

* [PATCH 6/7] hv_netvsc: remove VF in flight counters
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: Stephen Hemminger <sthemmin@microsoft.com>

Since the VF reference is now protected by RCU, the VF usage counter is no
longer needed; the device flags can be used to decide whether to inject.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |    3 +-
 drivers/net/hyperv/netvsc_drv.c |   81 ++++++++++-----------------------------
 2 files changed, 21 insertions(+), 63 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 6b79487..1d49740 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -696,8 +696,7 @@ struct net_device_context {
 
 	/* State to manage the associated VF interface. */
 	struct net_device __rcu *vf_netdev;
-	bool vf_inject;
-	atomic_t vf_use_cnt;
+
 	/* 1: allocated, serial number is valid. 0: not allocated */
 	u32 vf_alloc;
 	/* Serial number of the VF to team with */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index dde17c0..9375d82 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -670,50 +670,20 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 	struct net_device *vf_netdev;
 	struct sk_buff *skb;
 	struct netvsc_stats *rx_stats;
-	u32 bytes_recvd = packet->total_data_buflen;
-	int ret = 0;
 
-	if (!net || net->reg_state != NETREG_REGISTERED)
+	if (net->reg_state != NETREG_REGISTERED)
 		return NVSP_STAT_FAIL;
 
+	/*
+	 * If necessary, inject this packet into the VF interface.
+	 * On Hyper-V, multicast and broadcast packets are only delivered
+	 * to the synthetic interface (after subjecting these to
+	 * policy filters on the host). Deliver these via the VF
+	 * interface in the guest.
+	 */
 	vf_netdev = rcu_dereference(net_device_ctx->vf_netdev);
-	if (vf_netdev) {
-		struct sk_buff *vf_skb;
-
-		atomic_inc(&net_device_ctx->vf_use_cnt);
-		if (!net_device_ctx->vf_inject) {
-			/*
-			 * We raced; just move on.
-			 */
-			atomic_dec(&net_device_ctx->vf_use_cnt);
-			goto vf_injection_done;
-		}
-
-		/*
-		 * Inject this packet into the VF inerface.
-		 * On Hyper-V, multicast and brodcast packets
-		 * are only delivered on the synthetic interface
-		 * (after subjecting these to policy filters on
-		 * the host). Deliver these via the VF interface
-		 * in the guest.
-		 */
-		vf_skb = netvsc_alloc_recv_skb(vf_netdev,
-					       packet, csum_info, *data,
-					       vlan_tci);
-		if (vf_skb != NULL) {
-			++vf_netdev->stats.rx_packets;
-			vf_netdev->stats.rx_bytes += bytes_recvd;
-			netif_receive_skb(vf_skb);
-		} else {
-			++net->stats.rx_dropped;
-			ret = NVSP_STAT_FAIL;
-		}
-		atomic_dec(&net_device_ctx->vf_use_cnt);
-		return ret;
-	}
-
-vf_injection_done:
-	rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
+	if (vf_netdev && (vf_netdev->flags & IFF_UP))
+		net = vf_netdev;
 
 	/* Allocate a skb - TODO direct I/O to pages? */
 	skb = netvsc_alloc_recv_skb(net, packet, csum_info, *data, vlan_tci);
@@ -721,9 +691,17 @@ vf_injection_done:
 		++net->stats.rx_dropped;
 		return NVSP_STAT_FAIL;
 	}
-	skb_record_rx_queue(skb, channel->
-			    offermsg.offer.sub_channel_index);
 
+	if (net != vf_netdev)
+		skb_record_rx_queue(skb,
+				    channel->offermsg.offer.sub_channel_index);
+
+	/*
+	 * Even if injecting the packet, record the statistics
+	 * on the synthetic device because modifying the VF device
+	 * statistics will not work correctly.
+	 */
+	rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
 	u64_stats_update_begin(&rx_stats->syncp);
 	rx_stats->packets++;
 	rx_stats->bytes += packet->total_data_buflen;
@@ -1291,20 +1269,6 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	return NOTIFY_OK;
 }
 
-static void netvsc_inject_enable(struct net_device_context *net_device_ctx)
-{
-	net_device_ctx->vf_inject = true;
-}
-
-static void netvsc_inject_disable(struct net_device_context *net_device_ctx)
-{
-	net_device_ctx->vf_inject = false;
-
-	/* Wait for currently active users to drain out. */
-	while (atomic_read(&net_device_ctx->vf_use_cnt) != 0)
-		udelay(50);
-}
-
 static int netvsc_vf_up(struct net_device *vf_netdev)
 {
 	struct net_device *ndev;
@@ -1319,7 +1283,6 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
 	netvsc_dev = net_device_ctx->nvdev;
 
 	netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
-	netvsc_inject_enable(net_device_ctx);
 
 	/*
 	 * Open the device before switching data path.
@@ -1354,7 +1317,6 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
 	netvsc_dev = net_device_ctx->nvdev;
 
 	netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
-	netvsc_inject_disable(net_device_ctx);
 	netvsc_switch_datapath(ndev, false);
 	netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
 	rndis_filter_close(netvsc_dev);
@@ -1380,7 +1342,6 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 	netvsc_dev = net_device_ctx->nvdev;
 
 	netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
-	netvsc_inject_disable(net_device_ctx);
 
 	RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL);
 	dev_put(vf_netdev);
@@ -1435,8 +1396,6 @@ static int netvsc_probe(struct hv_device *dev,
 	spin_lock_init(&net_device_ctx->lock);
 	INIT_LIST_HEAD(&net_device_ctx->reconfig_events);
 
-	atomic_set(&net_device_ctx->vf_use_cnt, 0);
-
 	net->netdev_ops = &device_ops;
 
 	net->hw_features = NETVSC_HW_FEATURES;
-- 
1.7.4.1
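
The simplified receive decision can be sketched in userspace like this. It is
a model of the control flow only: struct rx_dev, struct synth, rx_select() and
DEV_UP are hypothetical names, and the skb allocation and delivery steps of
netvsc_recv_callback() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEV_UP 0x1u	/* models IFF_UP; the value is illustrative */

/* Hypothetical stand-in for struct net_device. */
struct rx_dev {
	unsigned int flags;
};

struct rx_stats {
	unsigned long packets;
	unsigned long bytes;
};

/* Hypothetical stand-in for the synthetic device and its context. */
struct synth {
	struct rx_dev dev;
	struct rx_dev *vf;	/* NULL when no VF is associated */
	struct rx_stats stats;
};

/*
 * Pick the device a received packet is delivered to. Statistics are
 * always recorded on the synthetic device, as in the patch.
 */
static struct rx_dev *rx_select(struct synth *s, unsigned long len,
				bool *record_queue)
{
	struct rx_dev *target = &s->dev;

	assert(s != NULL && record_queue != NULL);

	if (s->vf && (s->vf->flags & DEV_UP))
		target = s->vf;		/* inject into the VF */

	/* The rx queue is only recorded when not injecting. */
	*record_queue = (target != s->vf);

	s->stats.packets++;
	s->stats.bytes += len;
	return target;
}
```

A VF that is registered but not up is treated the same as no VF at all, which
is the role the removed vf_inject flag used to play.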

* [PATCH 5/7] hv_netvsc: use RCU to protect vf_netdev
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: Stephen Hemminger <sthemmin@microsoft.com>

The vf_netdev pointer in the netvsc device context can simply be protected
by RCU because network device destruction is already RCU synchronized.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |    2 +-
 drivers/net/hyperv/netvsc_drv.c |   29 +++++++++++++++--------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 284b97b..6b79487 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -695,7 +695,7 @@ struct net_device_context {
 	bool start_remove;
 
 	/* State to manage the associated VF interface. */
-	struct net_device *vf_netdev;
+	struct net_device __rcu *vf_netdev;
 	bool vf_inject;
 	atomic_t vf_use_cnt;
 	/* 1: allocated, serial number is valid. 0: not allocated */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 8768219..dde17c0 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -667,8 +667,8 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 {
 	struct net_device *net = hv_get_drvdata(device_obj);
 	struct net_device_context *net_device_ctx = netdev_priv(net);
+	struct net_device *vf_netdev;
 	struct sk_buff *skb;
-	struct sk_buff *vf_skb;
 	struct netvsc_stats *rx_stats;
 	u32 bytes_recvd = packet->total_data_buflen;
 	int ret = 0;
@@ -676,9 +676,12 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 	if (!net || net->reg_state != NETREG_REGISTERED)
 		return NVSP_STAT_FAIL;
 
-	if (READ_ONCE(net_device_ctx->vf_inject)) {
+	vf_netdev = rcu_dereference(net_device_ctx->vf_netdev);
+	if (vf_netdev) {
+		struct sk_buff *vf_skb;
+
 		atomic_inc(&net_device_ctx->vf_use_cnt);
-		if (!READ_ONCE(net_device_ctx->vf_inject)) {
+		if (!net_device_ctx->vf_inject) {
 			/*
 			 * We raced; just move on.
 			 */
@@ -694,13 +697,12 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 		 * the host). Deliver these via the VF interface
 		 * in the guest.
 		 */
-		vf_skb = netvsc_alloc_recv_skb(net_device_ctx->vf_netdev,
+		vf_skb = netvsc_alloc_recv_skb(vf_netdev,
 					       packet, csum_info, *data,
 					       vlan_tci);
 		if (vf_skb != NULL) {
-			++net_device_ctx->vf_netdev->stats.rx_packets;
-			net_device_ctx->vf_netdev->stats.rx_bytes +=
-				bytes_recvd;
+			++vf_netdev->stats.rx_packets;
+			vf_netdev->stats.rx_bytes += bytes_recvd;
 			netif_receive_skb(vf_skb);
 		} else {
 			++net->stats.rx_dropped;
@@ -1232,7 +1234,7 @@ static struct net_device *get_netvsc_bymac(const u8 *mac)
 	return NULL;
 }
 
-static struct net_device *get_netvsc_byref(const struct net_device *vf_netdev)
+static struct net_device *get_netvsc_byref(struct net_device *vf_netdev)
 {
 	struct net_device *dev;
 
@@ -1248,7 +1250,7 @@ static struct net_device *get_netvsc_byref(const struct net_device *vf_netdev)
 		if (net_device_ctx->nvdev == NULL)
 			continue;	/* device is removed */
 
-		if (net_device_ctx->vf_netdev == vf_netdev)
+		if (rtnl_dereference(net_device_ctx->vf_netdev) == vf_netdev)
 			return dev;	/* a match */
 	}
 
@@ -1275,7 +1277,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 
 	net_device_ctx = netdev_priv(ndev);
 	netvsc_dev = net_device_ctx->nvdev;
-	if (!netvsc_dev || net_device_ctx->vf_netdev)
+	if (!netvsc_dev || rtnl_dereference(net_device_ctx->vf_netdev))
 		return NOTIFY_DONE;
 
 	netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
@@ -1285,7 +1287,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	try_module_get(THIS_MODULE);
 
 	dev_hold(vf_netdev);
-	net_device_ctx->vf_netdev = vf_netdev;
+	rcu_assign_pointer(net_device_ctx->vf_netdev, vf_netdev);
 	return NOTIFY_OK;
 }
 
@@ -1379,7 +1381,8 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 
 	netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
 	netvsc_inject_disable(net_device_ctx);
-	net_device_ctx->vf_netdev = NULL;
+
+	RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL);
 	dev_put(vf_netdev);
 	module_put(THIS_MODULE);
 	return NOTIFY_OK;
@@ -1433,8 +1436,6 @@ static int netvsc_probe(struct hv_device *dev,
 	INIT_LIST_HEAD(&net_device_ctx->reconfig_events);
 
 	atomic_set(&net_device_ctx->vf_use_cnt, 0);
-	net_device_ctx->vf_netdev = NULL;
-	net_device_ctx->vf_inject = false;
 
 	net->netdev_ops = &device_ops;
 
-- 
1.7.4.1
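
As a rough userspace analogy for the publish and read sides, C11 atomics can
stand in for the RCU accessors. This is an assumption-laden sketch: struct vf,
struct ctx and the four helpers are hypothetical, acquire ordering is used as
a conservative substitute for rcu_dereference(), and grace periods (the part
that makes freeing safe) are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the VF's struct net_device. */
struct vf {
	int ifindex;
};

/* Hypothetical stand-in for struct net_device_context. */
struct ctx {
	_Atomic(struct vf *) vf;
};

static void ctx_init(struct ctx *c)
{
	assert(c != NULL);
	atomic_init(&c->vf, (struct vf *)NULL);
}

/* Roughly models rcu_assign_pointer(): publish a fully set up object. */
static void publish_vf(struct ctx *c, struct vf *v)
{
	atomic_store_explicit(&c->vf, v, memory_order_release);
}

/* Roughly models RCU_INIT_POINTER(..., NULL): no ordering is needed. */
static void clear_vf(struct ctx *c)
{
	atomic_store_explicit(&c->vf, (struct vf *)NULL,
			      memory_order_relaxed);
}

/* Roughly models rcu_dereference() on the receive path. */
static struct vf *read_vf(struct ctx *c)
{
	return atomic_load_explicit(&c->vf, memory_order_acquire);
}
```

The reader loads the pointer once into a local and uses only that local, the
same pattern the patch adopts with the vf_netdev variable in
netvsc_recv_callback().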

* [PATCH 7/7] hv_netvsc: count multicast packets received
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1474588595-16054-1-git-send-email-sthemmin@exchange.microsoft.com>

From: Stephen Hemminger <sthemmin@microsoft.com>

Keep track of the number of received multicast packets. This is useful
for debugging issues with multicast and SR-IOV.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |    2 ++
 drivers/net/hyperv/netvsc_drv.c |    9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 1d49740..7130bf9 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -649,6 +649,8 @@ struct multi_recv_comp {
 struct netvsc_stats {
 	u64 packets;
 	u64 bytes;
+	u64 broadcast;
+	u64 multicast;
 	struct u64_stats_sync syncp;
 };
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 9375d82..52eeb2f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -705,6 +705,11 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 	u64_stats_update_begin(&rx_stats->syncp);
 	rx_stats->packets++;
 	rx_stats->bytes += packet->total_data_buflen;
+
+	if (skb->pkt_type == PACKET_BROADCAST)
+		++rx_stats->broadcast;
+	else if (skb->pkt_type == PACKET_MULTICAST)
+		++rx_stats->multicast;
 	u64_stats_update_end(&rx_stats->syncp);
 
 	/*
@@ -947,7 +952,7 @@ static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
 							    cpu);
 		struct netvsc_stats *rx_stats = per_cpu_ptr(ndev_ctx->rx_stats,
 							    cpu);
-		u64 tx_packets, tx_bytes, rx_packets, rx_bytes;
+		u64 tx_packets, tx_bytes, rx_packets, rx_bytes, rx_multicast;
 		unsigned int start;
 
 		do {
@@ -960,12 +965,14 @@ static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
 			start = u64_stats_fetch_begin_irq(&rx_stats->syncp);
 			rx_packets = rx_stats->packets;
 			rx_bytes = rx_stats->bytes;
+			rx_multicast = rx_stats->multicast + rx_stats->broadcast;
 		} while (u64_stats_fetch_retry_irq(&rx_stats->syncp, start));
 
 		t->tx_bytes	+= tx_bytes;
 		t->tx_packets	+= tx_packets;
 		t->rx_bytes	+= rx_bytes;
 		t->rx_packets	+= rx_packets;
+		t->multicast	+= rx_multicast;
 	}
 
 	t->tx_dropped	= net->stats.tx_dropped;
-- 
1.7.4.1
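
The accounting can be illustrated with a small userspace sketch. The names
struct rx_counters, classify_dest(), count_rx() and reported_multicast() are
hypothetical; the driver itself relies on skb->pkt_type rather than inspecting
the destination address, so the classification below is only an approximation
used to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU counters mirroring the fields added by the patch. */
struct rx_counters {
	uint64_t packets;
	uint64_t bytes;
	uint64_t broadcast;
	uint64_t multicast;
};

enum pkt_class { PKT_UNICAST, PKT_BROADCAST, PKT_MULTICAST };

/* Classify an Ethernet destination address. */
static enum pkt_class classify_dest(const uint8_t dest[6])
{
	static const uint8_t bcast[6] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	if (memcmp(dest, bcast, sizeof(bcast)) == 0)
		return PKT_BROADCAST;
	if (dest[0] & 0x01)	/* group bit set in the first octet */
		return PKT_MULTICAST;
	return PKT_UNICAST;
}

static void count_rx(struct rx_counters *c, const uint8_t dest[6],
		     uint64_t len)
{
	assert(c != NULL && dest != NULL);

	c->packets++;
	c->bytes += len;

	switch (classify_dest(dest)) {
	case PKT_BROADCAST:
		c->broadcast++;
		break;
	case PKT_MULTICAST:
		c->multicast++;
		break;
	default:
		break;
	}
}

/* As in the patch, the reported figure sums multicast and broadcast. */
static uint64_t reported_multicast(const struct rx_counters *per_cpu,
				   size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += per_cpu[i].multicast + per_cpu[i].broadcast;
	return total;
}
```

Note that the single "multicast" figure reported to the stack therefore
includes broadcast packets as well.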

* [PATCH net-next 0/7] hv_netvsc changes
From: sthemmin @ 2016-09-22 23:56 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, davem; +Cc: netdev, Stephen Hemminger

From: Stephen Hemminger <sthemmin@microsoft.com>

These are mostly about improving the handling of interaction between
the virtual network device (netvsc) and the SR-IOV VF network device.

Stephen Hemminger (7):
  hv_netvsc: use consume_skb
  hv_netvsc: dev hold/put reference to VF
  hv_netvsc: simplify callback event code
  hv_netvsc: improve VF device matching
  hv_netvsc: use RCU to protect vf_netdev
  hv_netvsc: remove VF in flight counters
  hv_netvsc: count multicast packets received

 drivers/net/hyperv/hyperv_net.h |    7 +-
 drivers/net/hyperv/netvsc.c     |    4 +-
 drivers/net/hyperv/netvsc_drv.c |  188 +++++++++++++++++---------------------
 3 files changed, 90 insertions(+), 109 deletions(-)

-- 
1.7.4.1

* Re: [PATCH] net: VRF: Fix receiving multicast traffic
From: Mark Tomlinson @ 2016-09-22 22:10 UTC
  To: David Ahern, netdev@vger.kernel.org
In-Reply-To: <aac5b560-d757-3281-8e69-7953e7c4b418@cumulusnetworks.com>


On 09/23/2016 03:14 AM, David Ahern wrote:
>
> l3mdev devices do not support IPv4 multicast so checking mcast against that device should not be working at all. For that reason I was fine with the change in the previous patch. ie., you want the real ingress device there not the vrf device.
>
> What test are you running that says your previous patch broke something?
Although we do not expect any multicast routing to work in an l3mdev, 
(IGMP snooping or PIM), we still want to have multicast packets 
delivered for protocols such as RIP. This was working before my previous 
patch, but these multicast packets are now dropped. This current patch 
fixes that again, hopefully still with the benefits of my first patch.

* Re: [PATCH net-next] tcp: add tcp_add_backlog()
From: Marcelo Ricardo Leitner @ 2016-09-22 22:34 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <1472308674.14381.226.camel@edumazet-glaptop3.roam.corp.google.com>

On Sat, Aug 27, 2016 at 07:37:54AM -0700, Eric Dumazet wrote:
> +bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
> +{
> +	u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
                                 ^^^
...
> +	if (!skb->data_len)
> +		skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> +
> +	if (unlikely(sk_add_backlog(sk, skb, limit))) {
...
> -	} else if (unlikely(sk_add_backlog(sk, skb,
> -					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
	                                                 ^---- [1]
> -		bh_unlock_sock(sk);
> -		__NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
> +	} else if (tcp_add_backlog(sk, skb)) {

Hi Eric, after this patch, do you think we still need to add sk_sndbuf
as a stretching factor to the backlog here?

It was added by [1] and it was justified that the (s)ack packets were
just too big for the rx buf size. Maybe this new patch alone is enough
already, as such packets will have a very small truesize then.

  Marcelo

[1] da882c1f2eca ("tcp: sk_add_backlog() is too agressive for TCP")

* Re: [PATCH net] act_ife: Add support for machines with hard_header_len != mac_len
From: Jamal Hadi Salim @ 2016-09-22 22:39 UTC
  To: Yotam Gigi, davem, netdev, Roman Mashak
In-Reply-To: <1474462453-37668-1-git-send-email-yotamg@mellanox.com>

On 16-09-21 08:54 AM, Yotam Gigi wrote:
> Without that fix, the following could occur:
>  - On encode ingress, the total amount of skb_pushes (in lines 751 and
>    753) was more than specified in cow.
>  - On machines with hard_header_len > mac_len, the packet format was not

Just curious: What hardware would this be?


> Fixes: ef6980b6becb ("net sched: introduce IFE action")
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> ---
>  net/sched/act_ife.c | 34 +++++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
> index e87cd81..27b19ca 100644
> --- a/net/sched/act_ife.c
> +++ b/net/sched/act_ife.c
> @@ -708,11 +708,13 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
>  	   where ORIGDATA = original ethernet header ...
>  	 */
>  	u16 metalen = ife_get_sz(skb, ife);
> -	int hdrm = metalen + skb->dev->hard_header_len + IFE_METAHDRLEN;
> -	unsigned int skboff = skb->dev->hard_header_len;
>  	u32 at = G_TC_AT(skb->tc_verd);
> -	int new_len = skb->len + hdrm;
>  	bool exceed_mtu = false;
> +	unsigned int skboff;
> +	int total_push;
> +	int reserve;
> +	int new_len;
> +	int hdrm;
>  	int err;
>
>  	if (at & AT_EGRESS) {
> @@ -724,6 +726,22 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
>  	bstats_update(&ife->tcf_bstats, skb);
>  	tcf_lastuse_update(&ife->tcf_tm);
>
> +	if (at & AT_EGRESS) {
> +		/* on egress, reserve space for hard_header_len instead of
> +		 * mac_len
> +		 */
> +		skb_reset_mac_len(skb);

The skb_reset_mac_len() above is unneeded.

> +		hdrm = metalen + skb->mac_len + IFE_METAHDRLEN;

Can you move this line outside of the if? It appears on the else
so factoring it out is useful.

> +		total_push = hdrm;
> +		reserve = metalen + skb->dev->hard_header_len + IFE_METAHDRLEN;
> +	} else {
> +		/* on ingress, push mac_len as it already get parsed from tc */
> +		hdrm = metalen + skb->mac_len + IFE_METAHDRLEN;
> +		total_push = hdrm + skb->mac_len;
> +		reserve = total_push;
> +	}
> +	new_len =  skb->len + hdrm;
> +
>  	if (!metalen) {		/* no metadata to send */
>  		/* abuse overlimits to count when we allow packet
>  		 * with no metadata
> @@ -742,19 +760,17 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
>
>  	iethh = eth_hdr(skb);
>
> -	err = skb_cow_head(skb, hdrm);
> +	err = skb_cow_head(skb, reserve);
>  	if (unlikely(err)) {
>  		ife->tcf_qstats.drops++;
>  		spin_unlock(&ife->tcf_lock);
>  		return TC_ACT_SHOT;
>  	}
>
> -	if (!(at & AT_EGRESS))
> -		skb_push(skb, skb->dev->hard_header_len);
> -
> -	__skb_push(skb, hdrm);
> +	__skb_push(skb, total_push);
>  	memcpy(skb->data, iethh, skb->mac_len);
>  	skb_reset_mac_header(skb);
> +	skboff += skb->mac_len;

Above looks dangerous. Did the compiler not warn?
Maybe init skboff to skb->mac_len at the top.

Otherwise the ingress bits look good. Thanks!

Please fix above and resend with:
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal
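
The length bookkeeping under review can be written as a pure function, with
hdrm computed once outside the branch as suggested above. This is a userspace
sketch: struct ife_lens and ife_encode_lens() are hypothetical names, and
META_HDR_LEN is an assumed value standing in for IFE_METAHDRLEN. The
arithmetic itself is copied from the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define META_HDR_LEN 2	/* assumed stand-in for IFE_METAHDRLEN */

struct ife_lens {
	int hdrm;	/* bytes added to the packet length */
	int total_push;	/* bytes pushed in front of skb->data */
	int reserve;	/* headroom requested from skb_cow_head() */
};

static struct ife_lens ife_encode_lens(bool egress, int metalen,
				       int mac_len, int hard_header_len)
{
	struct ife_lens l;

	assert(metalen >= 0 && mac_len >= 0 && hard_header_len >= 0);

	/* Same on both paths, so it is computed once. */
	l.hdrm = metalen + mac_len + META_HDR_LEN;

	if (egress) {
		/* on egress, reserve space for hard_header_len */
		l.total_push = l.hdrm;
		l.reserve = metalen + hard_header_len + META_HDR_LEN;
	} else {
		/* on ingress, the MAC header was already pulled */
		l.total_push = l.hdrm + mac_len;
		l.reserve = l.total_push;
	}
	return l;
}
```

For example, with metalen 8, mac_len 14 and hard_header_len 18, the egress
case pushes 24 bytes and reserves 28, while the ingress case pushes and
reserves 38.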

* Re: [PATCH] net: VRF: Fix receiving multicast traffic
From: David Ahern @ 2016-09-22 22:41 UTC
  To: Mark Tomlinson, netdev@vger.kernel.org
In-Reply-To: <ee8de997-007c-4035-5f65-a8b389a138ac@alliedtelesis.co.nz>

On 9/22/16 4:10 PM, Mark Tomlinson wrote:
> 
> On 09/23/2016 03:14 AM, David Ahern wrote:
>>
>> l3mdev devices do not support IPv4 multicast so checking mcast against that device should not be working at all. For that reason I was fine with the change in the previous patch. ie., you want the real ingress device there not the vrf device.
>>
>> What test are you running that says your previous patch broke something?
> Although we do not expect any multicast routing to work in an l3mdev, 
> (IGMP snooping or PIM), we still want to have multicast packets 
> delivered for protocols such as RIP. This was working before my previous 
> patch, but these multicast packets are now dropped. This current patch 
> fixes that again, hopefully still with the benefits of my first patch.
> 

can you discern which check is making that happen?

It does not make sense to look at the in_device of a vrf device for mcast addresses. For IPv6, linklocal and mcast are specifically blocked. IPv4 should do the same. So, how is RIP getting the packet at all?

* [PATCH net-next] drivers: net: xgene: Fix MSS programming
From: Iyappan Subramanian @ 2016-09-22 22:47 UTC
  To: davem, netdev; +Cc: linux-arm-kernel, patches, Iyappan Subramanian, Toan Le

Current driver programs static value of MSS in hardware register for TSO
offload engine to segment the TCP payload regardless the MSS value
provided by network stack.

This patch fixes this by programming hardware registers with the
stack provided MSS value.

Since the hardware is limited to 4 MSS registers, this patch keeps a
reference count for each MSS value in use.
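
A minimal user-space sketch of that slot policy, assuming four slots as
in the patch. The names mss_table, mss_get() and mss_put() are
hypothetical and exist only for illustration; locking and the actual
register write are left out.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_MSS_REG 4	/* the hardware has four MSS registers */

/* Hypothetical model of the per-device MSS slot table. */
struct mss_table {
	uint32_t mss[NUM_MSS_REG];
	uint32_t refcnt[NUM_MSS_REG];
};

/* Return a slot index holding @mss, or -1 if every slot is in use. */
int mss_get(struct mss_table *t, uint32_t mss)
{
	int i;

	/* First pass: reuse a slot already programmed with this MSS. */
	for (i = 0; i < NUM_MSS_REG; i++) {
		if (t->mss[i] == mss) {
			t->refcnt[i]++;
			return i;
		}
	}

	/* Second pass: claim a slot that nobody references. */
	for (i = 0; i < NUM_MSS_REG; i++) {
		if (!t->refcnt[i]) {
			t->mss[i] = mss;	/* the driver would write the register here */
			t->refcnt[i] = 1;
			return i;
		}
	}

	return -1;	/* all slots busy, caller should retry later */
}

/* Drop one reference, as done on TX completion. */
void mss_put(struct mss_table *t, int idx)
{
	assert(idx >= 0 && idx < NUM_MSS_REG && t->refcnt[idx] > 0);
	t->refcnt[idx]--;
}
```

A fifth distinct MSS value is refused until one of the four slots drops
back to a zero reference count, which corresponds to the -EBUSY /
NETDEV_TX_BUSY path in the diff below.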

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h    |  7 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 90 ++++++++++++++++++-----
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  8 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 18 ++++-
 4 files changed, 100 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index 8a8d055..8456337 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -237,6 +237,8 @@ enum xgene_enet_rm {
 #define TCPHDR_LEN			6
 #define IPHDR_POS			6
 #define IPHDR_LEN			6
+#define MSS_POS				20
+#define MSS_LEN				2
 #define EC_POS				22	/* Enable checksum */
 #define EC_LEN				1
 #define ET_POS				23	/* Enable TSO */
@@ -253,6 +255,11 @@ enum xgene_enet_rm {
 
 #define LAST_BUFFER			(0x7800ULL << BUFDATALEN_POS)
 
+#define TSO_MSS0_POS			0
+#define TSO_MSS0_LEN			14
+#define TSO_MSS1_POS			16
+#define TSO_MSS1_LEN			14
+
 struct xgene_enet_raw_desc {
 	__le64 m0;
 	__le64 m1;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 522ba92..429f18f 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -137,6 +137,7 @@ static irqreturn_t xgene_enet_rx_irq(const int irq, void *data)
 static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 				    struct xgene_enet_raw_desc *raw_desc)
 {
+	struct xgene_enet_pdata *pdata = netdev_priv(cp_ring->ndev);
 	struct sk_buff *skb;
 	struct device *dev;
 	skb_frag_t *frag;
@@ -144,6 +145,7 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 	u16 skb_index;
 	u8 status;
 	int i, ret = 0;
+	u8 mss_index;
 
 	skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0));
 	skb = cp_ring->cp_skb[skb_index];
@@ -160,6 +162,13 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 			       DMA_TO_DEVICE);
 	}
 
+	if (GET_BIT(ET, le64_to_cpu(raw_desc->m3))) {
+		mss_index = GET_VAL(MSS, le64_to_cpu(raw_desc->m3));
+		spin_lock(&pdata->mss_lock);
+		pdata->mss_refcnt[mss_index]--;
+		spin_unlock(&pdata->mss_lock);
+	}
+
 	/* Checking for error */
 	status = GET_VAL(LERR, le64_to_cpu(raw_desc->m0));
 	if (unlikely(status > 2)) {
@@ -178,15 +187,53 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 	return ret;
 }
 
-static u64 xgene_enet_work_msg(struct sk_buff *skb)
+static int xgene_enet_setup_mss(struct net_device *ndev, u32 mss)
+{
+	struct xgene_enet_pdata *pdata = netdev_priv(ndev);
+	bool mss_index_found = false;
+	int mss_index;
+	int i;
+
+	spin_lock(&pdata->mss_lock);
+
+	/* Reuse the slot if MSS matches */
+	for (i = 0; !mss_index_found && i < NUM_MSS_REG; i++) {
+		if (pdata->mss[i] == mss) {
+			pdata->mss_refcnt[i]++;
+			mss_index = i;
+			mss_index_found = true;
+		}
+	}
+
+	/* Overwrite the slot with ref_count = 0 */
+	for (i = 0; !mss_index_found && i < NUM_MSS_REG; i++) {
+		if (!pdata->mss_refcnt[i]) {
+			pdata->mss_refcnt[i]++;
+			pdata->mac_ops->set_mss(pdata, mss, i);
+			pdata->mss[i] = mss;
+			mss_index = i;
+			mss_index_found = true;
+		}
+	}
+
+	spin_unlock(&pdata->mss_lock);
+
+	/* No slots with ref_count = 0 available, return busy */
+	if (!mss_index_found)
+		return -EBUSY;
+
+	return mss_index;
+}
+
+static int xgene_enet_work_msg(struct sk_buff *skb, u64 *hopinfo)
 {
 	struct net_device *ndev = skb->dev;
 	struct iphdr *iph;
 	u8 l3hlen = 0, l4hlen = 0;
 	u8 ethhdr, proto = 0, csum_enable = 0;
-	u64 hopinfo = 0;
 	u32 hdr_len, mss = 0;
 	u32 i, len, nr_frags;
+	int mss_index;
 
 	ethhdr = xgene_enet_hdr_len(skb->data);
 
@@ -226,7 +273,11 @@ static u64 xgene_enet_work_msg(struct sk_buff *skb)
 			if (!mss || ((skb->len - hdr_len) <= mss))
 				goto out;
 
-			hopinfo |= SET_BIT(ET);
+			mss_index = xgene_enet_setup_mss(ndev, mss);
+			if (unlikely(mss_index < 0))
+				return -EBUSY;
+
+			*hopinfo |= SET_BIT(ET) | SET_VAL(MSS, mss_index);
 		}
 	} else if (iph->protocol == IPPROTO_UDP) {
 		l4hlen = UDP_HDR_SIZE;
@@ -234,15 +285,15 @@ static u64 xgene_enet_work_msg(struct sk_buff *skb)
 	}
 out:
 	l3hlen = ip_hdrlen(skb) >> 2;
-	hopinfo |= SET_VAL(TCPHDR, l4hlen) |
-		  SET_VAL(IPHDR, l3hlen) |
-		  SET_VAL(ETHHDR, ethhdr) |
-		  SET_VAL(EC, csum_enable) |
-		  SET_VAL(IS, proto) |
-		  SET_BIT(IC) |
-		  SET_BIT(TYPE_ETH_WORK_MESSAGE);
-
-	return hopinfo;
+	*hopinfo |= SET_VAL(TCPHDR, l4hlen) |
+		    SET_VAL(IPHDR, l3hlen) |
+		    SET_VAL(ETHHDR, ethhdr) |
+		    SET_VAL(EC, csum_enable) |
+		    SET_VAL(IS, proto) |
+		    SET_BIT(IC) |
+		    SET_BIT(TYPE_ETH_WORK_MESSAGE);
+
+	return 0;
 }
 
 static u16 xgene_enet_encode_len(u16 len)
@@ -282,20 +333,22 @@ static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
 	dma_addr_t dma_addr, pbuf_addr, *frag_dma_addr;
 	skb_frag_t *frag;
 	u16 tail = tx_ring->tail;
-	u64 hopinfo;
+	u64 hopinfo = 0;
 	u32 len, hw_len;
 	u8 ll = 0, nv = 0, idx = 0;
 	bool split = false;
 	u32 size, offset, ell_bytes = 0;
 	u32 i, fidx, nr_frags, count = 1;
+	int ret;
 
 	raw_desc = &tx_ring->raw_desc[tail];
 	tail = (tail + 1) & (tx_ring->slots - 1);
 	memset(raw_desc, 0, sizeof(struct xgene_enet_raw_desc));
 
-	hopinfo = xgene_enet_work_msg(skb);
-	if (!hopinfo)
-		return -EINVAL;
+	ret = xgene_enet_work_msg(skb, &hopinfo);
+	if (ret)
+		return ret;
+
 	raw_desc->m3 = cpu_to_le64(SET_VAL(HENQNUM, tx_ring->dst_ring_num) |
 				   hopinfo);
 
@@ -435,6 +488,9 @@ static netdev_tx_t xgene_enet_start_xmit(struct sk_buff *skb,
 		return NETDEV_TX_OK;
 
 	count = xgene_enet_setup_tx_desc(tx_ring, skb);
+	if (count == -EBUSY)
+		return NETDEV_TX_BUSY;
+
 	if (count <= 0) {
 		dev_kfree_skb_any(skb);
 		return NETDEV_TX_OK;
@@ -1669,7 +1725,7 @@ static int xgene_enet_probe(struct platform_device *pdev)
 
 	if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
 		ndev->features |= NETIF_F_TSO;
-		pdata->mss = XGENE_ENET_MSS;
+		spin_lock_init(&pdata->mss_lock);
 	}
 	ndev->hw_features = ndev->features;
 
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 7735371..0cda58f 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -47,7 +47,7 @@
 #define NUM_PKT_BUF	64
 #define NUM_BUFPOOL	32
 #define MAX_EXP_BUFFS	256
-#define XGENE_ENET_MSS	1448
+#define NUM_MSS_REG	4
 #define XGENE_MIN_ENET_FRAME_SIZE	60
 
 #define XGENE_MAX_ENET_IRQ	16
@@ -143,7 +143,7 @@ struct xgene_mac_ops {
 	void (*rx_disable)(struct xgene_enet_pdata *pdata);
 	void (*set_speed)(struct xgene_enet_pdata *pdata);
 	void (*set_mac_addr)(struct xgene_enet_pdata *pdata);
-	void (*set_mss)(struct xgene_enet_pdata *pdata);
+	void (*set_mss)(struct xgene_enet_pdata *pdata, u16 mss, u8 index);
 	void (*link_state)(struct work_struct *work);
 };
 
@@ -212,7 +212,9 @@ struct xgene_enet_pdata {
 	u8 eth_bufnum;
 	u8 bp_bufnum;
 	u16 ring_num;
-	u32 mss;
+	u32 mss[NUM_MSS_REG];
+	u32 mss_refcnt[NUM_MSS_REG];
+	spinlock_t mss_lock;  /* mss lock */
 	u8 tx_delay;
 	u8 rx_delay;
 	bool mdio_driver;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
index 279ee27..6475f38 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c
@@ -232,9 +232,22 @@ static void xgene_xgmac_set_mac_addr(struct xgene_enet_pdata *pdata)
 	xgene_enet_wr_mac(pdata, HSTMACADR_MSW_ADDR, addr1);
 }
 
-static void xgene_xgmac_set_mss(struct xgene_enet_pdata *pdata)
+static void xgene_xgmac_set_mss(struct xgene_enet_pdata *pdata,
+				u16 mss, u8 index)
 {
-	xgene_enet_wr_csr(pdata, XG_TSIF_MSS_REG0_ADDR, pdata->mss);
+	u8 offset;
+	u32 data;
+
+	offset = (index < 2) ? 0 : 4;
+	xgene_enet_rd_csr(pdata, XG_TSIF_MSS_REG0_ADDR + offset, &data);
+
+	if (!(index & 0x1))
+		data = SET_VAL(TSO_MSS1, data >> TSO_MSS1_POS) |
+			SET_VAL(TSO_MSS0, mss);
+	else
+		data = SET_VAL(TSO_MSS1, mss) | SET_VAL(TSO_MSS0, data);
+
+	xgene_enet_wr_csr(pdata, XG_TSIF_MSS_REG0_ADDR + offset, data);
 }
 
 static u32 xgene_enet_link_status(struct xgene_enet_pdata *pdata)
@@ -258,7 +271,6 @@ static void xgene_xgmac_init(struct xgene_enet_pdata *pdata)
 	xgene_enet_wr_mac(pdata, AXGMAC_CONFIG_1, data);
 
 	xgene_xgmac_set_mac_addr(pdata);
-	xgene_xgmac_set_mss(pdata);
 
 	xgene_enet_rd_csr(pdata, XG_RSIF_CONFIG_REG_ADDR, &data);
 	data |= CFG_RSIF_FPBUFF_TIMEOUT_EN;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net 1/2] act_ife: Fix external mac header on encode
From: Jamal Hadi Salim @ 2016-09-22 23:16 UTC (permalink / raw)
  To: Yotam Gigi, davem, netdev, Roman Mashak
In-Reply-To: <1474548926-22815-2-git-send-email-yotamg@mellanox.com>

On 16-09-22 08:55 AM, Yotam Gigi wrote:
> On ife encode side, external mac header is copied from the original packet
> and may be overridden if the user requests. Before, the mac header copy
> was done from memory region that might not be accessible anymore, as
> skb_cow_head might free it and copy the packet. This led to random values
> in the external mac header once the values were not set by user.
>
> This fix takes the internal mac header from the packet, after the call to
> skb_cow_head.


Since this depends on the previous patch, can you double check for me? I 
will test later, but here's a very simple test case:

sudo $TC qdisc del dev $ETH root handle 1: prio
sudo $TC qdisc add dev $ETH root handle 1: prio

#set mark of decimal 17 and allow sending out
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

I am not going to comment on your other patch, but I suggest you
test with this (encoding at least two TLVs):

sudo $TC qdisc del dev $ETH root handle 1: prio
sudo $TC qdisc add dev $ETH root handle 1: prio
#Override mark and send prio of 0x33 (unfortunately
#skbedit is not very consistent 33 means 0x33)
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbedit prio 33 \
action ife encode \
type 0xDEAD \
use mark 12 \
allow prio \
dst 02:15:15:15:15:15


cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next] tcp: add tcp_add_backlog()
From: Eric Dumazet @ 2016-09-22 23:21 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: David Miller, netdev, Neal Cardwell, Yuchung Cheng
In-Reply-To: <20160922223411.GA17222@localhost.localdomain>

On Thu, 2016-09-22 at 19:34 -0300, Marcelo Ricardo Leitner wrote:
> On Sat, Aug 27, 2016 at 07:37:54AM -0700, Eric Dumazet wrote:
> > +bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
> > +{
> > +	u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
>                                  ^^^
> ...
> > +	if (!skb->data_len)
> > +		skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> > +
> > +	if (unlikely(sk_add_backlog(sk, skb, limit))) {
> ...
> > -	} else if (unlikely(sk_add_backlog(sk, skb,
> > -					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
> 	                                                 ^---- [1]
> > -		bh_unlock_sock(sk);
> > -		__NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
> > +	} else if (tcp_add_backlog(sk, skb)) {
> 
> Hi Eric, after this patch, do you think we still need to add sk_sndbuf
> as a stretching factor to the backlog here?
> 
> It was added by [1] and it was justified that the (s)ack packets were
> just too big for the rx buf size. Maybe this new patch alone is enough
> already, as such packets will have a very small truesize then.
> 
>   Marcelo
> 
> [1] da882c1f2eca ("tcp: sk_add_backlog() is too agressive for TCP")
> 

Hi Marcelo

Yes, it is still needed: some drivers provide linear skbs, so the
skb->truesize of ack packets will likely be the same (skb->head points
to a full-size frame allocated by the driver).
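
To make the arithmetic concrete, here is a simplified user-space model
of the admission check. The limit expression is the one from the quoted
patch; struct sock_model and backlog_accepts() are hypothetical, and the
real sk_add_backlog() does more than this sketch assumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few socket fields the check needs. */
struct sock_model {
	uint32_t rcvbuf;	/* models sk_rcvbuf */
	uint32_t sndbuf;	/* models sk_sndbuf */
	uint32_t queued;	/* bytes of truesize already charged */
};

/* Backlog limit as computed in tcp_add_backlog() in the quoted patch. */
uint32_t backlog_limit(const struct sock_model *sk)
{
	return sk->rcvbuf + sk->sndbuf;
}

/* Simplified admission test: refuse once the charged bytes exceed the limit. */
bool backlog_accepts(struct sock_model *sk, uint32_t truesize)
{
	if (sk->queued > backlog_limit(sk))
		return false;	/* packet would be dropped */
	sk->queued += truesize;
	return true;
}
```

With a 4096 byte receive buffer and acks charged at 2048 bytes each
(a linear skb holding a full frame), the fourth ack is refused unless
sk_sndbuf stretches the limit.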

^ permalink raw reply

* Re: [Patch net-next] net_sched: check NULL on error path in route4_change()
From: Jamal Hadi Salim @ 2016-09-22 23:26 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <1474239140-6738-1-git-send-email-xiyou.wangcong@gmail.com>

On 16-09-18 06:52 PM, Cong Wang wrote:
> On error path in route4_change(), 'f' could be NULL,
> so we should check NULL before calling tcf_exts_destroy().
>
> Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [Patch net] sch_qfq: keep backlog updated with qlen
From: Jamal Hadi Salim @ 2016-09-22 23:27 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <1474240968-15202-1-git-send-email-xiyou.wangcong@gmail.com>

On 16-09-18 07:22 PM, Cong Wang wrote:
> Reported-by: Stas Nichiporovich <stasn77@gmail.com>
> Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>


cheers,
jamal

^ permalink raw reply

* Re: [Patch net] sch_sfb: keep backlog updated with qlen
From: Jamal Hadi Salim @ 2016-09-22 23:28 UTC (permalink / raw)
  To: Cong Wang, netdev
In-Reply-To: <1474240968-15202-2-git-send-email-xiyou.wangcong@gmail.com>

On 16-09-18 07:22 PM, Cong Wang wrote:
> Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* [PATCH net-next 0/3] Few minor BPF helper improvements
From: Daniel Borkmann @ 2016-09-22 23:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann

Just a few minor improvements around BPF helpers. The first one is a
fix, but given this late stage and that it's not really a critical
one, I think net-next is just fine. For details please see the
individual patches.

Thanks!

Daniel Borkmann (3):
  bpf: use skb_to_full_sk helper in bpf_skb_under_cgroup
  bpf: use bpf_get_smp_processor_id_proto instead of raw one
  bpf: add helper to invalidate hash

 include/uapi/linux/bpf.h |  7 +++++++
 net/core/filter.c        | 22 +++++++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

-- 
1.9.3

^ permalink raw reply

* [PATCH net-next 2/3] bpf: use bpf_get_smp_processor_id_proto instead of raw one
From: Daniel Borkmann @ 2016-09-22 23:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1474586162.git.daniel@iogearbox.net>

Same motivation as in commit 80b48c445797 ("bpf: don't use raw processor
id in generic helper"), but this time for XDP typed programs. Thus, allow
for preemption checks when we have DEBUG_PREEMPT enabled, and otherwise
use the raw variant.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 net/core/filter.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index e5d9977..acf84fb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2551,6 +2551,8 @@ xdp_func_proto(enum bpf_func_id func_id)
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
 		return &bpf_xdp_event_output_proto;
+	case BPF_FUNC_get_smp_processor_id:
+		return &bpf_get_smp_processor_id_proto;
 	default:
 		return sk_filter_func_proto(func_id);
 	}
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 3/3] bpf: add helper to invalidate hash
From: Daniel Borkmann @ 2016-09-22 23:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1474586162.git.daniel@iogearbox.net>

Add a small helper that complements 36bbef52c7eb ("bpf: direct packet
write and access for helpers for clsact progs") for invalidating the
current skb->hash after mangling on headers via direct packet write.
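
For illustration, a user-space model of the lazy recalculation this
relies on. Everything here is hypothetical (struct pkt_model, the toy
FNV-1a hash, the function names); the kernel computes skb->hash through
flow dissection, not like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet model: a few header bytes plus a cached hash. */
struct pkt_model {
	uint8_t hdr[4];
	uint32_t hash;
	bool hash_valid;
};

/* Toy FNV-1a hash over the header bytes. */
uint32_t pkt_compute_hash(const struct pkt_model *p)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < sizeof(p->hdr); i++)
		h = (h ^ p->hdr[i]) * 16777619u;
	return h;
}

/* Return the cached hash, recomputing it only if it was invalidated. */
uint32_t pkt_get_hash(struct pkt_model *p)
{
	if (!p->hash_valid) {
		p->hash = pkt_compute_hash(p);
		p->hash_valid = true;
	}
	return p->hash;
}

/* Counterpart of the new helper: drop the cached value. */
void pkt_clear_hash(struct pkt_model *p)
{
	p->hash = 0;
	p->hash_valid = false;
}
```

Without the clear step, a header rewrite leaves the stale cached value
in place; one clear after all writes is enough to force a recompute on
the next lookup.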

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/uapi/linux/bpf.h |  7 +++++++
 net/core/filter.c        | 18 ++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e07432b..f09c70b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -419,6 +419,13 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_csum_update,
 
+	/**
+	 * bpf_set_hash_invalid(skb)
+	 * Invalidate current skb->hash.
+	 * @skb: pointer to skb
+	 */
+	BPF_FUNC_set_hash_invalid,
+
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index acf84fb..00351cd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1777,6 +1777,22 @@ static const struct bpf_func_proto bpf_get_hash_recalc_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_1(bpf_set_hash_invalid, struct sk_buff *, skb)
+{
+	/* After all direct packet write, this can be used once for
+	 * triggering a lazy recalc on next skb_get_hash() invocation.
+	 */
+	skb_clear_hash(skb);
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_set_hash_invalid_proto = {
+	.func		= bpf_set_hash_invalid,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
 BPF_CALL_3(bpf_skb_vlan_push, struct sk_buff *, skb, __be16, vlan_proto,
 	   u16, vlan_tci)
 {
@@ -2534,6 +2550,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 		return &bpf_get_route_realm_proto;
 	case BPF_FUNC_get_hash_recalc:
 		return &bpf_get_hash_recalc_proto;
+	case BPF_FUNC_set_hash_invalid:
+		return &bpf_set_hash_invalid_proto;
 	case BPF_FUNC_perf_event_output:
 		return &bpf_skb_event_output_proto;
 	case BPF_FUNC_get_smp_processor_id:
-- 
1.9.3

^ permalink raw reply related

* [PATCH net-next 1/3] bpf: use skb_to_full_sk helper in bpf_skb_under_cgroup
From: Daniel Borkmann @ 2016-09-22 23:28 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, netdev, Daniel Borkmann
In-Reply-To: <cover.1474586162.git.daniel@iogearbox.net>

We need to use skb_to_full_sk() helper introduced in commit bd5eb35f16a9
("xfrm: take care of request sockets") as otherwise we miss tcp synack
messages, since ownership is on request socket and therefore it would
miss the sk_fullsock() check. Use skb_to_full_sk() as also done similarly
in the bpf_get_cgroup_classid() helper via 2309236c13fe ("cls_cgroup:
get sk_classid only from full sockets") fix to not let this fall through.
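
A rough user-space model of the lookup, for illustration only. The
types and names (sk_model, to_full_sk(), owner_cgroup()) are
hypothetical and much simpler than the kernel's socket structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket kinds; only the distinction matters here. */
enum sk_kind { SK_FULL, SK_REQUEST };

struct sk_model {
	enum sk_kind kind;
	struct sk_model *listener;	/* set only for request sockets */
	int cgroup_id;			/* stands in for cgroup membership */
};

/* Map a request socket to its listener; pass other sockets through. */
struct sk_model *to_full_sk(struct sk_model *sk)
{
	if (sk && sk->kind == SK_REQUEST)
		sk = sk->listener;
	return sk;
}

/* Return the cgroup id of a packet's owner, or -1 if there is none. */
int owner_cgroup(struct sk_model *sk)
{
	sk = to_full_sk(sk);
	if (!sk || sk->kind != SK_FULL)
		return -1;
	return sk->cgroup_id;
}
```

In this model a synack owned by a request socket resolves to the
listener's cgroup instead of failing the full-socket check.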

Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0920c2a..e5d9977 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2408,7 +2408,7 @@ BPF_CALL_3(bpf_skb_under_cgroup, struct sk_buff *, skb, struct bpf_map *, map,
 	struct cgroup *cgrp;
 	struct sock *sk;

-	sk = skb->sk;
+	sk = skb_to_full_sk(skb);
 	if (!sk || !sk_fullsock(sk))
 		return -ENOENT;
 	if (unlikely(idx >= array->map.max_entries))
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH iproute2] ss: Support displaying and filtering on socket marks.
From: Stephen Hemminger @ 2016-09-22 23:37 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: netdev, shemming, zenczykowski
In-Reply-To: <1474473770-81126-1-git-send-email-lorenzo@google.com>

On Thu, 22 Sep 2016 01:02:50 +0900
Lorenzo Colitti <lorenzo@google.com> wrote:

> This allows the user to dump sockets with a given mark (via
> "fwmark = 0x1234/0x1234" or "fwmark = 12345", etc.), and to
> display the socket marks of dumped sockets.
> 
> The relevant kernel commits are: d545caca827b ("net: inet: diag:
> expose the socket mark to privileged processes.") and
> a52e95abf772 ("net: diag: allow socket bytecode filters to
> match socket marks")
> 
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

Applied to net-next.
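
For reference, a small sketch of what a value/mask mark filter boils
down to. This is an assumption about the matching rule, not code from
ss or the kernel; struct mark_cond and mark_matches() are hypothetical
names, and the all-ones default mask is likewise assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filter condition parsed from "fwmark = VALUE/MASK". */
struct mark_cond {
	uint32_t value;
	uint32_t mask;	/* assumed all ones when no "/MASK" is given */
};

/* A socket matches when its mark, restricted to the mask, equals the value. */
bool mark_matches(uint32_t sk_mark, const struct mark_cond *c)
{
	return (sk_mark & c->mask) == c->value;
}
```

Under this rule "fwmark = 0x1234/0x1234" matches any mark that has all
of the 0x1234 bits set, while "fwmark = 12345" matches that exact mark
only.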

^ permalink raw reply

* Re: [PATCH iproute2] misc/ss: tcp cwnd should be unsigned
From: Stephen Hemminger @ 2016-09-22 23:39 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev, Phil Sutter
In-Reply-To: <1474533628-5784-1-git-send-email-liuhangbin@gmail.com>

On Thu, 22 Sep 2016 16:40:28 +0800
Hangbin Liu <liuhangbin@gmail.com> wrote:

> tcp->snd_cwnd is a u32, but ss treats it like a signed int. This may
> result in negative bandwidth calculations.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Sure, applied.
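
A short sketch of the failure mode, for illustration. The rate formula
is only assumed to have the same shape as the one in ss, and the
function names are hypothetical. The signed conversion is
implementation-defined in C; the negative result is what common
two's-complement platforms produce.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated send rate in bits per second from cwnd, MSS and RTT (usec). */
double rate_bps(double cwnd, double mss, double rtt_us)
{
	assert(rtt_us > 0);
	return cwnd * mss * 8.0 * 1000000.0 / rtt_us;
}

/* Buggy variant: cwnd is squeezed through a signed int first. */
double rate_bps_signed(uint32_t cwnd, uint32_t mss, double rtt_us)
{
	int32_t scwnd = (int32_t)cwnd;	/* implementation-defined above INT32_MAX */

	return rate_bps((double)scwnd, mss, rtt_us);
}

/* Fixed variant: cwnd stays unsigned all the way. */
double rate_bps_unsigned(uint32_t cwnd, uint32_t mss, double rtt_us)
{
	return rate_bps((double)cwnd, mss, rtt_us);
}
```

Both variants agree for small windows; they diverge only once cwnd
exceeds INT32_MAX.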

^ permalink raw reply

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Jamal Hadi Salim @ 2016-09-22 23:40 UTC (permalink / raw)
  To: Shmulik Ladkani, David S. Miller
  Cc: WANG Cong, Eric Dumazet, netdev, Shmulik Ladkani
In-Reply-To: <1474550512-7552-5-git-send-email-shmulik.ladkani@gmail.com>

On 16-09-22 09:21 AM, Shmulik Ladkani wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
>
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
>
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.
>

Thank you for doing this. Something made me remove the initial support
for this feature. I am blanking on it right now, but I will find my notes
and give more details. It may have been about preventing loops. If so:
is there a use case where a packet redirected from egress of ethx to
ingress of ethy then needs to be classified again on ingress of ethy?
Otherwise you could just set the "don't classify" flag,
i.e., SET_TC_NCLS().

cheers,
jamal

^ permalink raw reply
