netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 00/10] netvsc fixes and new features
@ 2017-07-26 23:40 Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 01/10] netvsc: fix return value for set_channels Stephen Hemminger
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, linux-doc, netdev

This series has updates for Hyper-V network driver.
The first four patches are just minor fixes.

The next group of patches aims to reduce the driver memory footprint
by reducing the size of the per-device receive and send buffers, and
the per-channel receive completion queue.

The biggest change is in the last three patches that implement transparent
management of SR-IOV VF device. A little more explaination is required.

What does it do?

With the new netvsc VF logic, the Linux hyper-V network
virtual driver will directly manage the link to SR-IOV VF device.
When VF device is detected (hot plug) it is automatically made a
slave device of the netvsc device. The VF device state reflects
the state of the netvsc device; i.e. if netvsc is set down, then
VF is set down. If netvsc is set up, then VF is brought up.
 
Packet flow is independent of VF status; all packets are sent and
received as if they were associated with the netvsc device. If VF is
removed or link is down then the synthetic VMBUS path is used.
 
What was wrong with using bonding script?

A lot of work went into getting the bonding script to work on all
distributions, but it was a major struggle. Linux network devices
can be configured many, many ways and there is no one solution from
userspace to make it all work. What is really hard is when
configuration is attached to synthetic device during boot (eth0) and
then the same addresses and firewall rules needs to also work later if
doing bonding. The new code gets around all of this.
 
How does VF work during initialization?

Since all packets are sent and received through the logical netvsc
device, initialization is much easier. Just configure the regular
netvsc Ethernet device; when/if SR-IOV is enabled it just
works. Provisioning and cloud init only need to worry about setting up
netvsc device (eth0). If SR-IOV is enabled (even as a later step), the
address and rules stay the same.
 
What devices show up?

Both netvsc and PCI devices are visible in the system. The netvsc
device is active and named in usual manner (eth0). The PCI device is
visible to Linux and gets renamed by udev to a persistent name
(enP2p3s0). The PCI device name is now irrelevant now.

The logica also sets the PCI VF device SLAVE flag on the network
device so network tools can see the relationship if they are smart
enough to understand how layered devices work.
 
This is a lot like how I see Windows working.
The VF device is visible in Device Manager, but is not configured.
 
Is there any performance impact?
There is no visible change in performance. The bonding
and netvsc driver both have equivalent steps.
 
Is it compatible with old bonding script?

It turns out that if you use the old bonding script, then everything
still works but in a sub-optimum manner. What happens is that bonding
is unable to steal the VF from the netvsc device so it creates a one
legged bond.  Packet flow then is:
	bond0 <--> eth0 <- -> VF (enP2p3s0).
In other words, if you get it wrong it still works, just
awkward and slower.
 
What if I add address or firewall rule onto the VF?

Same problems occur with now as already occur with bonding, bridging,
teaming on Linux if user incorrectly does configuration onto
an underlying slave device. It will sort of work, packets will come in
and out but the Linux kernel gets confused and things like ARP don’t
work right.  There is no way to block manipulation of the slave
device, and I am sure someone will find some special use case where
they want it.

Stephen Hemminger (10):
  netvsc: fix return value for set_channels
  netvsc: fix warnings reported by lockdep
  netvsc: don't print pointer value in error message
  netvsc: remove unnecessary indirection of page_buffer
  netvsc: optimize receive completions
  netvsc: signal host if receive ring is emptied
  netvsc: allow smaller send/recv buffer size
  netvsc: transparent VF management
  netvsc: add documentation
  netvsc: remove bonding setup script

 Documentation/networking/netvsc.txt |  50 ++++
 MAINTAINERS                         |   1 +
 drivers/net/hyperv/hyperv_net.h     |  33 ++-
 drivers/net/hyperv/netvsc.c         | 317 +++++++++++-------------
 drivers/net/hyperv/netvsc_drv.c     | 476 ++++++++++++++++++++++++++++--------
 drivers/net/hyperv/rndis_filter.c   | 108 ++++----
 tools/hv/bondvf.sh                  | 255 -------------------
 7 files changed, 642 insertions(+), 598 deletions(-)
 create mode 100644 Documentation/networking/netvsc.txt
 delete mode 100755 tools/hv/bondvf.sh

-- 
2.11.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 01/10] netvsc: fix return value for set_channels
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 02/10] netvsc: fix warnings reported by lockdep Stephen Hemminger
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, linux-doc, netdev

The error and normal case got swapped.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 262486ce8e2a..f1eaf675d2e9 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -758,8 +758,8 @@ static int netvsc_set_channels(struct net_device *net,
 	if (!IS_ERR(nvdev)) {
 		netif_set_real_num_tx_queues(net, nvdev->num_chn);
 		netif_set_real_num_rx_queues(net, nvdev->num_chn);
-		ret = PTR_ERR(nvdev);
 	} else {
+		ret = PTR_ERR(nvdev);
 		device_info.num_chn = orig;
 		rndis_filter_device_add(dev, &device_info);
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 02/10] netvsc: fix warnings reported by lockdep
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 01/10] netvsc: fix return value for set_channels Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 03/10] netvsc: don't print pointer value in error message Stephen Hemminger
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, netdev, linux-doc

This includes a bunch of fixups for issues reported by
lockdep.
   * ethtool routines can assume RTNL
   * send is done with RCU lock (and BH disable)
   * avoid refetching internal device struct (netvsc)
     instead pass it as a parameter.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h   |  3 +-
 drivers/net/hyperv/netvsc.c       |  2 +-
 drivers/net/hyperv/netvsc_drv.c   | 15 ++++---
 drivers/net/hyperv/rndis_filter.c | 84 +++++++++++++++++++--------------------
 4 files changed, 54 insertions(+), 50 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 4e7ff348327e..fb62ea632914 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -217,7 +217,8 @@ int rndis_filter_receive(struct net_device *ndev,
 			 struct vmbus_channel *channel,
 			 void *data, u32 buflen);
 
-int rndis_filter_set_device_mac(struct net_device *ndev, char *mac);
+int rndis_filter_set_device_mac(struct netvsc_device *ndev,
+				const char *mac);
 
 void netvsc_switch_datapath(struct net_device *nv_dev, bool vf);
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 06f39a99da7c..94c00acac58a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -832,7 +832,7 @@ int netvsc_send(struct net_device_context *ndev_ctx,
 		struct sk_buff *skb)
 {
 	struct netvsc_device *net_device
-		= rcu_dereference_rtnl(ndev_ctx->nvdev);
+		= rcu_dereference_bh(ndev_ctx->nvdev);
 	struct hv_device *device = ndev_ctx->device_ctx;
 	int ret = 0;
 	struct netvsc_channel *nvchan;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f1eaf675d2e9..a04f2efbbc25 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -923,6 +923,8 @@ static void netvsc_get_stats64(struct net_device *net,
 
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
 {
+	struct net_device_context *ndc = netdev_priv(ndev);
+	struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
 	struct sockaddr *addr = p;
 	char save_adr[ETH_ALEN];
 	unsigned char save_aatype;
@@ -935,7 +937,10 @@ static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
 	if (err != 0)
 		return err;
 
-	err = rndis_filter_set_device_mac(ndev, addr->sa_data);
+	if (!nvdev)
+		return -ENODEV;
+
+	err = rndis_filter_set_device_mac(nvdev, addr->sa_data);
 	if (err != 0) {
 		/* roll back to saved MAC */
 		memcpy(ndev->dev_addr, save_adr, ETH_ALEN);
@@ -981,7 +986,7 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
 				     struct ethtool_stats *stats, u64 *data)
 {
 	struct net_device_context *ndc = netdev_priv(dev);
-	struct netvsc_device *nvdev = rcu_dereference(ndc->nvdev);
+	struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
 	const void *nds = &ndc->eth_stats;
 	const struct netvsc_stats *qstats;
 	unsigned int start;
@@ -1019,7 +1024,7 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
 static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 {
 	struct net_device_context *ndc = netdev_priv(dev);
-	struct netvsc_device *nvdev = rcu_dereference(ndc->nvdev);
+	struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
 	u8 *p = data;
 	int i;
 
@@ -1077,7 +1082,7 @@ netvsc_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info,
 		 u32 *rules)
 {
 	struct net_device_context *ndc = netdev_priv(dev);
-	struct netvsc_device *nvdev = rcu_dereference(ndc->nvdev);
+	struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
 
 	if (!nvdev)
 		return -ENODEV;
@@ -1127,7 +1132,7 @@ static int netvsc_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
 			   u8 *hfunc)
 {
 	struct net_device_context *ndc = netdev_priv(dev);
-	struct netvsc_device *ndev = rcu_dereference(ndc->nvdev);
+	struct netvsc_device *ndev = rtnl_dereference(ndc->nvdev);
 	struct rndis_device *rndis_dev;
 	int i;
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index eaa3f0d5682a..bf21ea92c743 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -85,14 +85,6 @@ static struct rndis_device *get_rndis_device(void)
 	return device;
 }
 
-static struct netvsc_device *
-net_device_to_netvsc_device(struct net_device *ndev)
-{
-	struct net_device_context *net_device_ctx = netdev_priv(ndev);
-
-	return rtnl_dereference(net_device_ctx->nvdev);
-}
-
 static struct rndis_request *get_rndis_request(struct rndis_device *dev,
 					     u32 msg_type,
 					     u32 msg_len)
@@ -252,7 +244,10 @@ static int rndis_filter_send_request(struct rndis_device *dev,
 			pb[0].len;
 	}
 
+	rcu_read_lock_bh();
 	ret = netvsc_send(net_device_ctx, packet, NULL, &pb, NULL);
+	rcu_read_unlock_bh();
+
 	return ret;
 }
 
@@ -452,8 +447,9 @@ int rndis_filter_receive(struct net_device *ndev,
 	return 0;
 }
 
-static int rndis_filter_query_device(struct rndis_device *dev, u32 oid,
-				  void *result, u32 *result_size)
+static int rndis_filter_query_device(struct rndis_device *dev,
+				     struct netvsc_device *nvdev,
+				     u32 oid, void *result, u32 *result_size)
 {
 	struct rndis_request *request;
 	u32 inresult_size = *result_size;
@@ -480,8 +476,6 @@ static int rndis_filter_query_device(struct rndis_device *dev, u32 oid,
 	query->dev_vc_handle = 0;
 
 	if (oid == OID_TCP_OFFLOAD_HARDWARE_CAPABILITIES) {
-		struct net_device_context *ndevctx = netdev_priv(dev->ndev);
-		struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
 		struct ndis_offload *hwcaps;
 		u32 nvsp_version = nvdev->nvsp_version;
 		u8 ndis_rev;
@@ -550,14 +544,15 @@ static int rndis_filter_query_device(struct rndis_device *dev, u32 oid,
 
 /* Get the hardware offload capabilities */
 static int
-rndis_query_hwcaps(struct rndis_device *dev, struct ndis_offload *caps)
+rndis_query_hwcaps(struct rndis_device *dev, struct netvsc_device *net_device,
+		   struct ndis_offload *caps)
 {
 	u32 caps_len = sizeof(*caps);
 	int ret;
 
 	memset(caps, 0, sizeof(*caps));
 
-	ret = rndis_filter_query_device(dev,
+	ret = rndis_filter_query_device(dev, net_device,
 					OID_TCP_OFFLOAD_HARDWARE_CAPABILITIES,
 					caps, &caps_len);
 	if (ret)
@@ -586,11 +581,12 @@ rndis_query_hwcaps(struct rndis_device *dev, struct ndis_offload *caps)
 	return 0;
 }
 
-static int rndis_filter_query_device_mac(struct rndis_device *dev)
+static int rndis_filter_query_device_mac(struct rndis_device *dev,
+					 struct netvsc_device *net_device)
 {
 	u32 size = ETH_ALEN;
 
-	return rndis_filter_query_device(dev,
+	return rndis_filter_query_device(dev, net_device,
 				      RNDIS_OID_802_3_PERMANENT_ADDRESS,
 				      dev->hw_mac_adr, &size);
 }
@@ -598,9 +594,9 @@ static int rndis_filter_query_device_mac(struct rndis_device *dev)
 #define NWADR_STR "NetworkAddress"
 #define NWADR_STRLEN 14
 
-int rndis_filter_set_device_mac(struct net_device *ndev, char *mac)
+int rndis_filter_set_device_mac(struct netvsc_device *nvdev,
+				const char *mac)
 {
-	struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
 	struct rndis_device *rdev = nvdev->extension;
 	struct rndis_request *request;
 	struct rndis_set_request *set;
@@ -654,11 +650,8 @@ int rndis_filter_set_device_mac(struct net_device *ndev, char *mac)
 	wait_for_completion(&request->wait_event);
 
 	set_complete = &request->response_msg.msg.set_complete;
-	if (set_complete->status != RNDIS_STATUS_SUCCESS) {
-		netdev_err(ndev, "Fail to set MAC on host side:0x%x\n",
-			   set_complete->status);
-		ret = -EINVAL;
-	}
+	if (set_complete->status != RNDIS_STATUS_SUCCESS)
+		ret = -EIO;
 
 cleanup:
 	put_rndis_request(rdev, request);
@@ -791,27 +784,27 @@ int rndis_filter_set_rss_param(struct rndis_device *rdev,
 	return ret;
 }
 
-static int rndis_filter_query_device_link_status(struct rndis_device *dev)
+static int rndis_filter_query_device_link_status(struct rndis_device *dev,
+						 struct netvsc_device *net_device)
 {
 	u32 size = sizeof(u32);
 	u32 link_status;
-	int ret;
 
-	ret = rndis_filter_query_device(dev,
-				      RNDIS_OID_GEN_MEDIA_CONNECT_STATUS,
-				      &link_status, &size);
-
-	return ret;
+	return rndis_filter_query_device(dev, net_device,
+					 RNDIS_OID_GEN_MEDIA_CONNECT_STATUS,
+					 &link_status, &size);
 }
 
-static int rndis_filter_query_link_speed(struct rndis_device *dev)
+static int rndis_filter_query_link_speed(struct rndis_device *dev,
+					 struct netvsc_device *net_device)
 {
 	u32 size = sizeof(u32);
 	u32 link_speed;
 	struct net_device_context *ndc;
 	int ret;
 
-	ret = rndis_filter_query_device(dev, RNDIS_OID_GEN_LINK_SPEED,
+	ret = rndis_filter_query_device(dev, net_device,
+					RNDIS_OID_GEN_LINK_SPEED,
 					&link_speed, &size);
 
 	if (!ret) {
@@ -880,14 +873,14 @@ void rndis_filter_update(struct netvsc_device *nvdev)
 	schedule_work(&rdev->mcast_work);
 }
 
-static int rndis_filter_init_device(struct rndis_device *dev)
+static int rndis_filter_init_device(struct rndis_device *dev,
+				    struct netvsc_device *nvdev)
 {
 	struct rndis_request *request;
 	struct rndis_initialize_request *init;
 	struct rndis_initialize_complete *init_complete;
 	u32 status;
 	int ret;
-	struct netvsc_device *nvdev = net_device_to_netvsc_device(dev->ndev);
 
 	request = get_rndis_request(dev, RNDIS_MSG_INIT,
 			RNDIS_MESSAGE_SIZE(struct rndis_initialize_request));
@@ -1024,12 +1017,17 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
 {
 	struct net_device *ndev =
 		hv_get_drvdata(new_sc->primary_channel->device_obj);
-	struct netvsc_device *nvscdev = net_device_to_netvsc_device(ndev);
+	struct net_device_context *ndev_ctx = netdev_priv(ndev);
+	struct netvsc_device *nvscdev;
 	u16 chn_index = new_sc->offermsg.offer.sub_channel_index;
 	struct netvsc_channel *nvchan;
 	int ret;
 
-	if (chn_index >= nvscdev->num_chn)
+	/* This is safe because this callback only happens when
+	 * new device is being setup and waiting on the channel_init_wait.
+	 */
+	nvscdev = rcu_dereference_raw(ndev_ctx->nvdev);
+	if (!nvscdev || chn_index >= nvscdev->num_chn)
 		return;
 
 	nvchan = nvscdev->chan_table + chn_index;
@@ -1104,27 +1102,27 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 	rndis_device->ndev = net;
 
 	/* Send the rndis initialization message */
-	ret = rndis_filter_init_device(rndis_device);
+	ret = rndis_filter_init_device(rndis_device, net_device);
 	if (ret != 0)
 		goto err_dev_remv;
 
 	/* Get the MTU from the host */
 	size = sizeof(u32);
-	ret = rndis_filter_query_device(rndis_device,
+	ret = rndis_filter_query_device(rndis_device, net_device,
 					RNDIS_OID_GEN_MAXIMUM_FRAME_SIZE,
 					&mtu, &size);
 	if (ret == 0 && size == sizeof(u32) && mtu < net->mtu)
 		net->mtu = mtu;
 
 	/* Get the mac address */
-	ret = rndis_filter_query_device_mac(rndis_device);
+	ret = rndis_filter_query_device_mac(rndis_device, net_device);
 	if (ret != 0)
 		goto err_dev_remv;
 
 	memcpy(device_info->mac_adr, rndis_device->hw_mac_adr, ETH_ALEN);
 
 	/* Find HW offload capabilities */
-	ret = rndis_query_hwcaps(rndis_device, &hwcaps);
+	ret = rndis_query_hwcaps(rndis_device, net_device, &hwcaps);
 	if (ret != 0)
 		goto err_dev_remv;
 
@@ -1185,7 +1183,7 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 	if (ret)
 		goto err_dev_remv;
 
-	rndis_filter_query_device_link_status(rndis_device);
+	rndis_filter_query_device_link_status(rndis_device, net_device);
 
 	netdev_dbg(net, "Device MAC %pM link state %s\n",
 		   rndis_device->hw_mac_adr,
@@ -1194,11 +1192,11 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_5)
 		return net_device;
 
-	rndis_filter_query_link_speed(rndis_device);
+	rndis_filter_query_link_speed(rndis_device, net_device);
 
 	/* vRSS setup */
 	memset(&rsscap, 0, rsscap_size);
-	ret = rndis_filter_query_device(rndis_device,
+	ret = rndis_filter_query_device(rndis_device, net_device,
 					OID_GEN_RECEIVE_SCALE_CAPABILITIES,
 					&rsscap, &rsscap_size);
 	if (ret || rsscap.num_recv_que < 2)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 03/10] netvsc: don't print pointer value in error message
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 01/10] netvsc: fix return value for set_channels Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 02/10] netvsc: fix warnings reported by lockdep Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 04/10] netvsc: remove unnecessary indirection of page_buffer Stephen Hemminger
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, linux-doc, netdev

Using %p to print pointer to packet meta-data doesn't give any
good info, and exposes kernel memory offsets.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 94c00acac58a..f0c15e782ce0 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -805,8 +805,10 @@ static inline int netvsc_send_pkt(
 			ret = -ENOSPC;
 		}
 	} else {
-		netdev_err(ndev, "Unable to send packet %p ret %d\n",
-			   packet, ret);
+		netdev_err(ndev,
+			   "Unable to send packet pages %u len %u, ret %d\n",
+			   packet->page_buf_cnt, packet->total_data_buflen,
+			   ret);
 	}
 
 	return ret;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 04/10] netvsc: remove unnecessary indirection of page_buffer
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (2 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 03/10] netvsc: don't print pointer value in error message Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 05/10] netvsc: optimize receive completions Stephen Hemminger
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, netdev, linux-doc

The internal API was passing struct hv_page_buffer **
when only simple struct hv_page_buffer * was necessary
for passing an array.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h   |  2 +-
 drivers/net/hyperv/netvsc.c       | 21 ++++++++++-----------
 drivers/net/hyperv/netvsc_drv.c   | 10 ++++------
 drivers/net/hyperv/rndis_filter.c |  4 ++--
 4 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index fb62ea632914..9ca3ed692d73 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -190,7 +190,7 @@ void netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct net_device_context *ndc,
 		struct hv_netvsc_packet *packet,
 		struct rndis_message *rndis_msg,
-		struct hv_page_buffer **page_buffer,
+		struct hv_page_buffer *page_buffer,
 		struct sk_buff *skb);
 void netvsc_linkstatus_callback(struct hv_device *device_obj,
 				struct rndis_message *resp);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index f0c15e782ce0..d3c0b19f6d34 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -697,7 +697,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 				   u32 pend_size,
 				   struct hv_netvsc_packet *packet,
 				   struct rndis_message *rndis_msg,
-				   struct hv_page_buffer **pb,
+				   struct hv_page_buffer *pb,
 				   struct sk_buff *skb)
 {
 	char *start = net_device->send_buf;
@@ -718,9 +718,9 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 	}
 
 	for (i = 0; i < page_count; i++) {
-		char *src = phys_to_virt((*pb)[i].pfn << PAGE_SHIFT);
-		u32 offset = (*pb)[i].offset;
-		u32 len = (*pb)[i].len;
+		char *src = phys_to_virt(pb[i].pfn << PAGE_SHIFT);
+		u32 offset = pb[i].offset;
+		u32 len = pb[i].len;
 
 		memcpy(dest, (src + offset), len);
 		msg_size += len;
@@ -739,7 +739,7 @@ static inline int netvsc_send_pkt(
 	struct hv_device *device,
 	struct hv_netvsc_packet *packet,
 	struct netvsc_device *net_device,
-	struct hv_page_buffer **pb,
+	struct hv_page_buffer *pb,
 	struct sk_buff *skb)
 {
 	struct nvsp_message nvmsg;
@@ -750,7 +750,6 @@ static inline int netvsc_send_pkt(
 	struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
 	u64 req_id;
 	int ret;
-	struct hv_page_buffer *pgbuf;
 	u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound);
 
 	nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
@@ -776,11 +775,11 @@ static inline int netvsc_send_pkt(
 		return -ENODEV;
 
 	if (packet->page_buf_cnt) {
-		pgbuf = packet->cp_partial ? (*pb) +
-			packet->rmsg_pgcnt : (*pb);
+		if (packet->cp_partial)
+			pb += packet->rmsg_pgcnt;
+
 		ret = vmbus_sendpacket_pagebuffer_ctl(out_channel,
-						      pgbuf,
-						      packet->page_buf_cnt,
+						      pb, packet->page_buf_cnt,
 						      &nvmsg,
 						      sizeof(struct nvsp_message),
 						      req_id,
@@ -830,7 +829,7 @@ static inline void move_pkt_msd(struct hv_netvsc_packet **msd_send,
 int netvsc_send(struct net_device_context *ndev_ctx,
 		struct hv_netvsc_packet *packet,
 		struct rndis_message *rndis_msg,
-		struct hv_page_buffer **pb,
+		struct hv_page_buffer *pb,
 		struct sk_buff *skb)
 {
 	struct netvsc_device *net_device
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a04f2efbbc25..8ff4cbf582cc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -282,9 +282,8 @@ static u32 fill_pg_buf(struct page *page, u32 offset, u32 len,
 
 static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
 			   struct hv_netvsc_packet *packet,
-			   struct hv_page_buffer **page_buf)
+			   struct hv_page_buffer *pb)
 {
-	struct hv_page_buffer *pb = *page_buf;
 	u32 slots_used = 0;
 	char *data = skb->data;
 	int frags = skb_shinfo(skb)->nr_frags;
@@ -359,8 +358,7 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
 	u32 rndis_msg_size;
 	struct rndis_per_packet_info *ppi;
 	u32 hash;
-	struct hv_page_buffer page_buf[MAX_PAGE_BUFFER_COUNT];
-	struct hv_page_buffer *pb = page_buf;
+	struct hv_page_buffer pb[MAX_PAGE_BUFFER_COUNT];
 
 	/* We can only transmit MAX_PAGE_BUFFER_COUNT number
 	 * of pages in a single packet. If skb is scattered around
@@ -503,12 +501,12 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
 	rndis_msg->msg_len += rndis_msg_size;
 	packet->total_data_buflen = rndis_msg->msg_len;
 	packet->page_buf_cnt = init_page_array(rndis_msg, rndis_msg_size,
-					       skb, packet, &pb);
+					       skb, packet, pb);
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
 
-	ret = netvsc_send(net_device_ctx, packet, rndis_msg, &pb, skb);
+	ret = netvsc_send(net_device_ctx, packet, rndis_msg, pb, skb);
 	if (likely(ret == 0))
 		return NETDEV_TX_OK;
 
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index bf21ea92c743..d80e9e3f433e 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -214,11 +214,11 @@ static void dump_rndis_message(struct hv_device *hv_dev,
 static int rndis_filter_send_request(struct rndis_device *dev,
 				  struct rndis_request *req)
 {
-	int ret;
 	struct hv_netvsc_packet *packet;
 	struct hv_page_buffer page_buf[2];
 	struct hv_page_buffer *pb = page_buf;
 	struct net_device_context *net_device_ctx = netdev_priv(dev->ndev);
+	int ret;
 
 	/* Setup the packet to send it */
 	packet = &req->pkt;
@@ -245,7 +245,7 @@ static int rndis_filter_send_request(struct rndis_device *dev,
 	}
 
 	rcu_read_lock_bh();
-	ret = netvsc_send(net_device_ctx, packet, NULL, &pb, NULL);
+	ret = netvsc_send(net_device_ctx, packet, NULL, pb, NULL);
 	rcu_read_unlock_bh();
 
 	return ret;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 05/10] netvsc: optimize receive completions
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (3 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 04/10] netvsc: remove unnecessary indirection of page_buffer Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 06/10] netvsc: signal host if receive ring is emptied Stephen Hemminger
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, linux-doc, netdev

Optimize how receive completion ring are managed.
   * Allocate only as many slots as needed for all buffers from host
   * Allocate before setting up sub channel for better error detection
   * Don't need to keep copy of initial receive section message
   * Precompute the watermark for when receive flushing is needed
   * Replace division with conditional test
   * Replace atomic per-device variable with per-channel check.
   * Handle corner case where receive completion send
     fails if ring buffer to host is full.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h   |  14 +-
 drivers/net/hyperv/netvsc.c       | 267 ++++++++++++++++----------------------
 drivers/net/hyperv/rndis_filter.c |  20 +--
 3 files changed, 126 insertions(+), 175 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 9ca3ed692d73..f2cef5aaed1f 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -186,6 +186,7 @@ struct net_device_context;
 
 struct netvsc_device *netvsc_device_add(struct hv_device *device,
 					const struct netvsc_device_info *info);
+int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx);
 void netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct net_device_context *ndc,
 		struct hv_netvsc_packet *packet,
@@ -657,13 +658,10 @@ struct recv_comp_data {
 	u32 status;
 };
 
-/* Netvsc Receive Slots Max */
-#define NETVSC_RECVSLOT_MAX (NETVSC_RECEIVE_BUFFER_SIZE / ETH_DATA_LEN + 1)
-
 struct multi_recv_comp {
-	void *buf; /* queued receive completions */
-	u32 first; /* first data entry */
-	u32 next; /* next entry for writing */
+	struct recv_comp_data *slots;
+	u32 first;	/* first data entry */
+	u32 next;	/* next entry for writing */
 };
 
 struct netvsc_stats {
@@ -750,7 +748,7 @@ struct netvsc_device {
 	u32 recv_buf_size;
 	u32 recv_buf_gpadl_handle;
 	u32 recv_section_cnt;
-	struct nvsp_1_receive_buffer_section *recv_section;
+	u32 recv_completion_cnt;
 
 	/* Send buffer allocated by us */
 	void *send_buf;
@@ -778,8 +776,6 @@ struct netvsc_device {
 	u32 max_pkt; /* max number of pkt in one send, e.g. 8 */
 	u32 pkt_align; /* alignment bytes, e.g. 8 */
 
-	atomic_t num_outstanding_recvs;
-
 	atomic_t open_cnt;
 
 	struct netvsc_channel chan_table[VRSS_CHANNEL_MAX];
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index d3c0b19f6d34..4c709b454d34 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -72,9 +72,6 @@ static struct netvsc_device *alloc_net_device(void)
 	if (!net_device)
 		return NULL;
 
-	net_device->chan_table[0].mrc.buf
-		= vzalloc(NETVSC_RECVSLOT_MAX * sizeof(struct recv_comp_data));
-
 	init_waitqueue_head(&net_device->wait_drain);
 	net_device->destroy = false;
 	atomic_set(&net_device->open_cnt, 0);
@@ -92,7 +89,7 @@ static void free_netvsc_device(struct rcu_head *head)
 	int i;
 
 	for (i = 0; i < VRSS_CHANNEL_MAX; i++)
-		vfree(nvdev->chan_table[i].mrc.buf);
+		vfree(nvdev->chan_table[i].mrc.slots);
 
 	kfree(nvdev);
 }
@@ -171,12 +168,6 @@ static void netvsc_destroy_buf(struct hv_device *device)
 		net_device->recv_buf = NULL;
 	}
 
-	if (net_device->recv_section) {
-		net_device->recv_section_cnt = 0;
-		kfree(net_device->recv_section);
-		net_device->recv_section = NULL;
-	}
-
 	/* Deal with the send buffer we may have setup.
 	 * If we got a  send section size, it means we received a
 	 * NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE msg (ie sent
@@ -239,11 +230,26 @@ static void netvsc_destroy_buf(struct hv_device *device)
 	kfree(net_device->send_section_map);
 }
 
+int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
+{
+	struct netvsc_channel *nvchan = &net_device->chan_table[q_idx];
+	int node = cpu_to_node(nvchan->channel->target_cpu);
+	size_t size;
+
+	size = net_device->recv_completion_cnt * sizeof(struct recv_comp_data);
+	nvchan->mrc.slots = vzalloc_node(size, node);
+	if (!nvchan->mrc.slots)
+		nvchan->mrc.slots = vzalloc(size);
+
+	return nvchan->mrc.slots ? 0 : -ENOMEM;
+}
+
 static int netvsc_init_buf(struct hv_device *device,
 			   struct netvsc_device *net_device)
 {
 	int ret = 0;
 	struct nvsp_message *init_packet;
+	struct nvsp_1_message_send_receive_buffer_complete *resp;
 	struct net_device *ndev;
 	size_t map_words;
 	int node;
@@ -300,43 +306,41 @@ static int netvsc_init_buf(struct hv_device *device,
 	wait_for_completion(&net_device->channel_init_wait);
 
 	/* Check the response */
-	if (init_packet->msg.v1_msg.
-	    send_recv_buf_complete.status != NVSP_STAT_SUCCESS) {
-		netdev_err(ndev, "Unable to complete receive buffer "
-			   "initialization with NetVsp - status %d\n",
-			   init_packet->msg.v1_msg.
-			   send_recv_buf_complete.status);
+	resp = &init_packet->msg.v1_msg.send_recv_buf_complete;
+	if (resp->status != NVSP_STAT_SUCCESS) {
+		netdev_err(ndev,
+			   "Unable to complete receive buffer initialization with NetVsp - status %d\n",
+			   resp->status);
 		ret = -EINVAL;
 		goto cleanup;
 	}
 
 	/* Parse the response */
+	netdev_dbg(ndev, "Receive sections: %u sub_allocs: size %u count: %u\n",
+		   resp->num_sections, resp->sections[0].sub_alloc_size,
+		   resp->sections[0].num_sub_allocs);
 
-	net_device->recv_section_cnt = init_packet->msg.
-		v1_msg.send_recv_buf_complete.num_sections;
-
-	net_device->recv_section = kmemdup(
-		init_packet->msg.v1_msg.send_recv_buf_complete.sections,
-		net_device->recv_section_cnt *
-		sizeof(struct nvsp_1_receive_buffer_section),
-		GFP_KERNEL);
-	if (net_device->recv_section == NULL) {
-		ret = -EINVAL;
-		goto cleanup;
-	}
+	net_device->recv_section_cnt = resp->num_sections;
 
 	/*
 	 * For 1st release, there should only be 1 section that represents the
 	 * entire receive buffer
 	 */
 	if (net_device->recv_section_cnt != 1 ||
-	    net_device->recv_section->offset != 0) {
+	    resp->sections[0].offset != 0) {
 		ret = -EINVAL;
 		goto cleanup;
 	}
 
-	/* Now setup the send buffer.
-	 */
+	/* Setup receive completion ring */
+	net_device->recv_completion_cnt
+		= round_up(resp->sections[0].num_sub_allocs + 1,
+			   PAGE_SIZE / sizeof(u64));
+	ret = netvsc_alloc_recv_comp_ring(net_device, 0);
+	if (ret)
+		goto cleanup;
+
+	/* Now setup the send buffer. */
 	net_device->send_buf = vzalloc_node(net_device->send_buf_size, node);
 	if (!net_device->send_buf)
 		net_device->send_buf = vzalloc(net_device->send_buf_size);
@@ -951,130 +955,94 @@ int netvsc_send(struct net_device_context *ndev_ctx,
 	return ret;
 }
 
-static int netvsc_send_recv_completion(struct vmbus_channel *channel,
-				       u64 transaction_id, u32 status)
+/* Send pending recv completions */
+static int send_recv_completions(struct netvsc_channel *nvchan)
 {
-	struct nvsp_message recvcompMessage;
+	struct netvsc_device *nvdev = nvchan->net_device;
+	struct multi_recv_comp *mrc = &nvchan->mrc;
+	struct recv_comp_msg {
+		struct nvsp_message_header hdr;
+		u32 status;
+	}  __packed;
+	struct recv_comp_msg msg = {
+		.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE,
+	};
 	int ret;
 
-	recvcompMessage.hdr.msg_type =
-				NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE;
-
-	recvcompMessage.msg.v1_msg.send_rndis_pkt_complete.status = status;
-
-	/* Send the completion */
-	ret = vmbus_sendpacket(channel, &recvcompMessage,
-			       sizeof(struct nvsp_message_header) + sizeof(u32),
-			       transaction_id, VM_PKT_COMP, 0);
+	while (mrc->first != mrc->next) {
+		const struct recv_comp_data *rcd
+			= mrc->slots + mrc->first;
 
-	return ret;
-}
-
-static inline void count_recv_comp_slot(struct netvsc_device *nvdev, u16 q_idx,
-					u32 *filled, u32 *avail)
-{
-	struct multi_recv_comp *mrc = &nvdev->chan_table[q_idx].mrc;
-	u32 first = mrc->first;
-	u32 next = mrc->next;
+		msg.status = rcd->status;
+		ret = vmbus_sendpacket(nvchan->channel, &msg, sizeof(msg),
+				       rcd->tid, VM_PKT_COMP, 0);
+		if (unlikely(ret))
+			return ret;
 
-	*filled = (first > next) ? NETVSC_RECVSLOT_MAX - first + next :
-		  next - first;
-
-	*avail = NETVSC_RECVSLOT_MAX - *filled - 1;
-}
-
-/* Read the first filled slot, no change to index */
-static inline struct recv_comp_data *read_recv_comp_slot(struct netvsc_device
-							 *nvdev, u16 q_idx)
-{
-	struct multi_recv_comp *mrc = &nvdev->chan_table[q_idx].mrc;
-	u32 filled, avail;
-
-	if (unlikely(!mrc->buf))
-		return NULL;
+		if (++mrc->first == nvdev->recv_completion_cnt)
+			mrc->first = 0;
+	}
 
-	count_recv_comp_slot(nvdev, q_idx, &filled, &avail);
-	if (!filled)
-		return NULL;
+	/* receive completion ring has been emptied */
+	if (unlikely(nvdev->destroy))
+		wake_up(&nvdev->wait_drain);
 
-	return mrc->buf + mrc->first * sizeof(struct recv_comp_data);
+	return 0;
 }
 
-/* Put the first filled slot back to available pool */
-static inline void put_recv_comp_slot(struct netvsc_device *nvdev, u16 q_idx)
+/* Count how many receive completions are outstanding */
+static void recv_comp_slot_avail(const struct netvsc_device *nvdev,
+				 const struct multi_recv_comp *mrc,
+				 u32 *filled, u32 *avail)
 {
-	struct multi_recv_comp *mrc = &nvdev->chan_table[q_idx].mrc;
-	int num_recv;
+	u32 count = nvdev->recv_completion_cnt;
 
-	mrc->first = (mrc->first + 1) % NETVSC_RECVSLOT_MAX;
-
-	num_recv = atomic_dec_return(&nvdev->num_outstanding_recvs);
+	if (mrc->next >= mrc->first)
+		*filled = mrc->next - mrc->first;
+	else
+		*filled = (count - mrc->first) + mrc->next;
 
-	if (nvdev->destroy && num_recv == 0)
-		wake_up(&nvdev->wait_drain);
+	*avail = count - *filled - 1;
 }
 
-/* Check and send pending recv completions */
-static void netvsc_chk_recv_comp(struct netvsc_device *nvdev,
-				 struct vmbus_channel *channel, u16 q_idx)
+/* Add receive complete to ring to send to host. */
+static void enq_receive_complete(struct net_device *ndev,
+				 struct netvsc_device *nvdev, u16 q_idx,
+				 u64 tid, u32 status)
 {
+	struct netvsc_channel *nvchan = &nvdev->chan_table[q_idx];
+	struct multi_recv_comp *mrc = &nvchan->mrc;
 	struct recv_comp_data *rcd;
-	int ret;
-
-	while (true) {
-		rcd = read_recv_comp_slot(nvdev, q_idx);
-		if (!rcd)
-			break;
+	u32 filled, avail;
 
-		ret = netvsc_send_recv_completion(channel, rcd->tid,
-						  rcd->status);
-		if (ret)
-			break;
+	recv_comp_slot_avail(nvdev, mrc, &filled, &avail);
 
-		put_recv_comp_slot(nvdev, q_idx);
+	if (unlikely(filled > NAPI_POLL_WEIGHT)) {
+		send_recv_completions(nvchan);
+		recv_comp_slot_avail(nvdev, mrc, &filled, &avail);
 	}
-}
-
-#define NETVSC_RCD_WATERMARK 80
 
-/* Get next available slot */
-static inline struct recv_comp_data *get_recv_comp_slot(
-	struct netvsc_device *nvdev, struct vmbus_channel *channel, u16 q_idx)
-{
-	struct multi_recv_comp *mrc = &nvdev->chan_table[q_idx].mrc;
-	u32 filled, avail, next;
-	struct recv_comp_data *rcd;
-
-	if (unlikely(!nvdev->recv_section))
-		return NULL;
-
-	if (unlikely(!mrc->buf))
-		return NULL;
-
-	if (atomic_read(&nvdev->num_outstanding_recvs) >
-	    nvdev->recv_section->num_sub_allocs * NETVSC_RCD_WATERMARK / 100)
-		netvsc_chk_recv_comp(nvdev, channel, q_idx);
-
-	count_recv_comp_slot(nvdev, q_idx, &filled, &avail);
-	if (!avail)
-		return NULL;
-
-	next = mrc->next;
-	rcd = mrc->buf + next * sizeof(struct recv_comp_data);
-	mrc->next = (next + 1) % NETVSC_RECVSLOT_MAX;
+	if (unlikely(!avail)) {
+		netdev_err(ndev, "Recv_comp full buf q:%hd, tid:%llx\n",
+			   q_idx, tid);
+		return;
+	}
 
-	atomic_inc(&nvdev->num_outstanding_recvs);
+	rcd = mrc->slots + mrc->next;
+	rcd->tid = tid;
+	rcd->status = status;
 
-	return rcd;
+	if (++mrc->next == nvdev->recv_completion_cnt)
+		mrc->next = 0;
 }
 
 static int netvsc_receive(struct net_device *ndev,
-		   struct netvsc_device *net_device,
-		   struct net_device_context *net_device_ctx,
-		   struct hv_device *device,
-		   struct vmbus_channel *channel,
-		   const struct vmpacket_descriptor *desc,
-		   struct nvsp_message *nvsp)
+			  struct netvsc_device *net_device,
+			  struct net_device_context *net_device_ctx,
+			  struct hv_device *device,
+			  struct vmbus_channel *channel,
+			  const struct vmpacket_descriptor *desc,
+			  struct nvsp_message *nvsp)
 {
 	const struct vmtransfer_page_packet_header *vmxferpage_packet
 		= container_of(desc, const struct vmtransfer_page_packet_header, d);
@@ -1083,7 +1051,6 @@ static int netvsc_receive(struct net_device *ndev,
 	u32 status = NVSP_STAT_SUCCESS;
 	int i;
 	int count = 0;
-	int ret;
 
 	/* Make sure this is a valid nvsp packet */
 	if (unlikely(nvsp->hdr.msg_type != NVSP_MSG1_TYPE_SEND_RNDIS_PKT)) {
@@ -1114,25 +1081,9 @@ static int netvsc_receive(struct net_device *ndev,
 					      channel, data, buflen);
 	}
 
-	if (net_device->chan_table[q_idx].mrc.buf) {
-		struct recv_comp_data *rcd;
+	enq_receive_complete(ndev, net_device, q_idx,
+			     vmxferpage_packet->d.trans_id, status);
 
-		rcd = get_recv_comp_slot(net_device, channel, q_idx);
-		if (rcd) {
-			rcd->tid = vmxferpage_packet->d.trans_id;
-			rcd->status = status;
-		} else {
-			netdev_err(ndev, "Recv_comp full buf q:%hd, tid:%llx\n",
-				   q_idx, vmxferpage_packet->d.trans_id);
-		}
-	} else {
-		ret = netvsc_send_recv_completion(channel,
-						  vmxferpage_packet->d.trans_id,
-						  status);
-		if (ret)
-			netdev_err(ndev, "Recv_comp q:%hd, tid:%llx, err:%d\n",
-				   q_idx, vmxferpage_packet->d.trans_id, ret);
-	}
 	return count;
 }
 
@@ -1231,7 +1182,6 @@ int netvsc_poll(struct napi_struct *napi, int budget)
 	struct netvsc_device *net_device = nvchan->net_device;
 	struct vmbus_channel *channel = nvchan->channel;
 	struct hv_device *device = netvsc_channel_to_device(channel);
-	u16 q_idx = channel->offermsg.offer.sub_channel_index;
 	struct net_device *ndev = hv_get_drvdata(device);
 	int work_done = 0;
 
@@ -1245,17 +1195,18 @@ int netvsc_poll(struct napi_struct *napi, int budget)
 		nvchan->desc = hv_pkt_iter_next(channel, nvchan->desc);
 	}
 
-	/* If receive ring was exhausted
-	 * and not doing busy poll
-	 * then re-enable host interrupts
-	 *  and reschedule if ring is not empty.
+	/* If send of  pending receive completions suceeded
+	 *   and did not exhaust NAPI budget
+	 *   and not doing busy poll
+	 * then reschedule if more data has arrived from host
 	 */
-	if (work_done < budget &&
+	if (send_recv_completions(nvchan) == 0 &&
+	    work_done < budget &&
 	    napi_complete_done(napi, work_done) &&
-	    hv_end_read(&channel->inbound) != 0)
+	    hv_end_read(&channel->inbound)) {
+		hv_begin_read(&channel->inbound);
 		napi_reschedule(napi);
-
-	netvsc_chk_recv_comp(net_device, channel, q_idx);
+	}
 
 	/* Driver may overshoot since multiple packets per descriptor */
 	return min(work_done, budget);
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index d80e9e3f433e..44165fe328a4 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -928,12 +928,12 @@ static bool netvsc_device_idle(const struct netvsc_device *nvdev)
 {
 	int i;
 
-	if (atomic_read(&nvdev->num_outstanding_recvs) > 0)
-		return false;
-
 	for (i = 0; i < nvdev->num_chn; i++) {
 		const struct netvsc_channel *nvchan = &nvdev->chan_table[i];
 
+		if (nvchan->mrc.first != nvchan->mrc.next)
+			return false;
+
 		if (atomic_read(&nvchan->queue_sends) > 0)
 			return false;
 	}
@@ -1031,11 +1031,6 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
 		return;
 
 	nvchan = nvscdev->chan_table + chn_index;
-	nvchan->mrc.buf
-		= vzalloc(NETVSC_RECVSLOT_MAX * sizeof(struct recv_comp_data));
-
-	if (!nvchan->mrc.buf)
-		return;
 
 	/* Because the device uses NAPI, all the interrupt batching and
 	 * control is done via Net softirq, not the channel handling
@@ -1225,6 +1220,15 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 	if (num_rss_qs == 0)
 		return net_device;
 
+	for (i = 1; i < net_device->num_chn; i++) {
+		ret = netvsc_alloc_recv_comp_ring(net_device, i);
+		if (ret) {
+			while (--i != 0)
+				vfree(net_device->chan_table[i].mrc.slots);
+			goto out;
+		}
+	}
+
 	refcount_set(&net_device->sc_offered, num_rss_qs);
 	vmbus_set_sc_create_callback(dev->channel, netvsc_sc_open);
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 06/10] netvsc: signal host if receive ring is emptied
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (4 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 05/10] netvsc: optimize receive completions Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 07/10] netvsc: allow smaller send/recv buffer size Stephen Hemminger
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, netdev, linux-doc

Latency improvement related to NAPI conversion.
If all packets are processed from receive ring then need
to signal host.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 4c709b454d34..c5e6f7fc4f2b 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1195,10 +1195,15 @@ int netvsc_poll(struct napi_struct *napi, int budget)
 		nvchan->desc = hv_pkt_iter_next(channel, nvchan->desc);
 	}
 
-	/* If send of  pending receive completions suceeded
-	 *   and did not exhaust NAPI budget
+	/* if ring is empty, signal host */
+	if (!nvchan->desc)
+		hv_pkt_iter_close(channel);
+
+	/* If send of pending receive completions suceeded
+	 *   and did not exhaust NAPI budget this time
 	 *   and not doing busy poll
-	 * then reschedule if more data has arrived from host
+	 * then re-enable host interrupts
+	 *     and reschedule if ring is not empty.
 	 */
 	if (send_recv_completions(nvchan) == 0 &&
 	    work_done < budget &&
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 07/10] netvsc: allow smaller send/recv buffer size
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (5 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 06/10] netvsc: signal host if receive ring is emptied Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 08/10] netvsc: transparent VF management Stephen Hemminger
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, linux-doc, netdev

The default value of send and receive buffer area for host DMA
is much larger than it needs to be. Experimentation shows that
4M receive and 1M send is sufficient.

Make the size a module parameter so that it can be adjusted
as needed for testing or special needs.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |  2 ++
 drivers/net/hyperv/netvsc.c     | 18 ++++++++++++------
 drivers/net/hyperv/netvsc_drv.c | 31 ++++++++++++++++++++++++++++++-
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f2cef5aaed1f..4779422868cf 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -148,6 +148,8 @@ struct netvsc_device_info {
 	unsigned char mac_adr[ETH_ALEN];
 	int  ring_size;
 	u32  num_chn;
+	u32  recv_buf_size;
+	u32  send_buf_size;
 };
 
 enum rndis_device_state {
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index c5e6f7fc4f2b..dd64227c736c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -482,14 +482,16 @@ static int negotiate_nvsp_ver(struct hv_device *device,
 }
 
 static int netvsc_connect_vsp(struct hv_device *device,
-			      struct netvsc_device *net_device)
+			      struct netvsc_device *net_device,
+			      const struct netvsc_device_info *device_info)
 {
 	const u32 ver_list[] = {
 		NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2,
 		NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5
 	};
 	struct nvsp_message *init_packet;
-	int ndis_version, i, ret;
+	u32 max_recv_buf_size, ndis_version;
+	int i, ret;
 
 	init_packet = &net_device->channel_init_pkt;
 
@@ -534,10 +536,14 @@ static int netvsc_connect_vsp(struct hv_device *device,
 
 	/* Post the big receive buffer to NetVSP */
 	if (net_device->nvsp_version <= NVSP_PROTOCOL_VERSION_2)
-		net_device->recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE_LEGACY;
+		max_recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE_LEGACY;
 	else
-		net_device->recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE;
-	net_device->send_buf_size = NETVSC_SEND_BUFFER_SIZE;
+		max_recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE;
+
+	net_device->recv_buf_size = min(max_recv_buf_size,
+					device_info->recv_buf_size);
+	net_device->send_buf_size = min_t(u32, NETVSC_SEND_BUFFER_SIZE,
+					  device_info->send_buf_size);
 
 	ret = netvsc_init_buf(device, net_device);
 
@@ -1302,7 +1308,7 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
 	rcu_assign_pointer(net_device_ctx->nvdev, net_device);
 
 	/* Connect with the NetVsp */
-	ret = netvsc_connect_vsp(device, net_device);
+	ret = netvsc_connect_vsp(device, net_device, device_info);
 	if (ret != 0) {
 		netdev_err(ndev,
 			"unable to connect to NetVSP - %d\n", ret);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 8ff4cbf582cc..b662c0198807 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -44,13 +44,23 @@
 
 #include "hyperv_net.h"
 
-#define RING_SIZE_MIN 64
+#define RING_SIZE_MIN	 64
+#define RECV_BUFFER_MIN	 16
+#define SEND_BUFFER_MIN	 4
 #define LINKCHANGE_INT (2 * HZ)
 
 static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 
+static unsigned int recv_buffer_size = (4 * 1024 * 1024) / PAGE_SIZE;
+module_param(recv_buffer_size, uint, S_IRUGO);
+MODULE_PARM_DESC(recv_buffer_size, "Receive buffer size (# of pages)");
+
+static unsigned int send_buffer_size = (1024 * 1024) / PAGE_SIZE;
+module_param(send_buffer_size, uint, S_IRUGO);
+MODULE_PARM_DESC(send_buffer_size, "Send buffer size (# of pages)");
+
 static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
 				NETIF_MSG_LINK | NETIF_MSG_IFUP |
 				NETIF_MSG_IFDOWN | NETIF_MSG_RX_ERR |
@@ -751,6 +761,8 @@ static int netvsc_set_channels(struct net_device *net,
 	memset(&device_info, 0, sizeof(device_info));
 	device_info.num_chn = count;
 	device_info.ring_size = ring_size;
+	device_info.send_buf_size = nvdev->send_buf_size;
+	device_info.recv_buf_size = nvdev->recv_buf_size;
 
 	nvdev = rndis_filter_device_add(dev, &device_info);
 	if (!IS_ERR(nvdev)) {
@@ -848,6 +860,8 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	memset(&device_info, 0, sizeof(device_info));
 	device_info.ring_size = ring_size;
 	device_info.num_chn = nvdev->num_chn;
+	device_info.send_buf_size = nvdev->send_buf_size;
+	device_info.recv_buf_size = nvdev->recv_buf_size;
 
 	rndis_filter_device_remove(hdev, nvdev);
 
@@ -1518,6 +1532,8 @@ static int netvsc_probe(struct hv_device *dev,
 	memset(&device_info, 0, sizeof(device_info));
 	device_info.ring_size = ring_size;
 	device_info.num_chn = VRSS_CHANNEL_DEFAULT;
+	device_info.send_buf_size = send_buffer_size * PAGE_SIZE;
+	device_info.recv_buf_size = recv_buffer_size * PAGE_SIZE;
 
 	nvdev = rndis_filter_device_add(dev, &device_info);
 	if (IS_ERR(nvdev)) {
@@ -1669,6 +1685,19 @@ static int __init netvsc_drv_init(void)
 		pr_info("Increased ring_size to %d (min allowed)\n",
 			ring_size);
 	}
+
+	if (recv_buffer_size < RECV_BUFFER_MIN) {
+		recv_buffer_size = RECV_BUFFER_MIN;
+		pr_notice("Increased receive buffer size to %u (min allowed)\n",
+			  recv_buffer_size);
+	}
+
+	if (send_buffer_size < SEND_BUFFER_MIN) {
+		send_buffer_size = SEND_BUFFER_MIN;
+		pr_notice("Increased receive buffer size to %u (min allowed)\n",
+			  send_buffer_size);
+	}
+
 	ret = vmbus_driver_register(&netvsc_drv);
 
 	if (ret)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 08/10] netvsc: transparent VF management
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (6 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 07/10] netvsc: allow smaller send/recv buffer size Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 09/10] netvsc: add documentation Stephen Hemminger
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, linux-doc, netdev

This patch implements transparent fail over from synthetic NIC to
SR-IOV virtual function NIC in Hyper-V environment. It is a better
alternative to using bonding as is done now. Instead, the receive and
transmit fail over is done internally inside the driver.

Using bonding driver has lots of issues because it depends on the
script being run early enough in the boot process and with sufficient
information to make the association. This patch moves all that
functionality into the kernel.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |  12 ++
 drivers/net/hyperv/netvsc_drv.c | 418 +++++++++++++++++++++++++++++++---------
 2 files changed, 341 insertions(+), 89 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 4779422868cf..8ff4ff0d77dd 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -682,6 +682,15 @@ struct netvsc_ethtool_stats {
 	unsigned long tx_busy;
 };
 
+struct netvsc_vf_pcpu_stats {
+	u64     rx_packets;
+	u64     rx_bytes;
+	u64     tx_packets;
+	u64     tx_bytes;
+	struct u64_stats_sync   syncp;
+	u32	tx_dropped;
+};
+
 struct netvsc_reconfig {
 	struct list_head list;
 	u32 event;
@@ -715,6 +724,9 @@ struct net_device_context {
 
 	/* State to manage the associated VF interface. */
 	struct net_device __rcu *vf_netdev;
+	struct netvsc_vf_pcpu_stats __percpu *vf_stats;
+	struct work_struct vf_takeover;
+	struct work_struct vf_notify;
 
 	/* 1: allocated, serial number is valid. 0: not allocated */
 	u32 vf_alloc;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b662c0198807..73edc3db97b2 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -34,6 +34,7 @@
 #include <linux/in.h>
 #include <linux/slab.h>
 #include <linux/rtnetlink.h>
+#include <linux/netpoll.h>
 
 #include <net/arp.h>
 #include <net/route.h>
@@ -81,6 +82,7 @@ static void netvsc_set_multicast_list(struct net_device *net)
 static int netvsc_open(struct net_device *net)
 {
 	struct net_device_context *ndev_ctx = netdev_priv(net);
+	struct net_device *vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
 	struct netvsc_device *nvdev = rtnl_dereference(ndev_ctx->nvdev);
 	struct rndis_device *rdev;
 	int ret = 0;
@@ -97,15 +99,29 @@ static int netvsc_open(struct net_device *net)
 	netif_tx_wake_all_queues(net);
 
 	rdev = nvdev->extension;
-	if (!rdev->link_state && !ndev_ctx->datapath)
+
+	if (!rdev->link_state)
 		netif_carrier_on(net);
 
-	return ret;
+	if (vf_netdev) {
+		/* Setting synthetic device up transparently sets
+		 * slave as up. If open fails, then slave will be
+		 * still be offline (and not used).
+		 */
+		ret = dev_open(vf_netdev);
+		if (ret)
+			netdev_warn(net,
+				    "unable to open slave: %s: %d\n",
+				    vf_netdev->name, ret);
+	}
+	return 0;
 }
 
 static int netvsc_close(struct net_device *net)
 {
 	struct net_device_context *net_device_ctx = netdev_priv(net);
+	struct net_device *vf_netdev
+		= rtnl_dereference(net_device_ctx->vf_netdev);
 	struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
 	int ret;
 	u32 aread, i, msec = 10, retry = 0, retry_max = 20;
@@ -151,6 +167,9 @@ static int netvsc_close(struct net_device *net)
 		ret = -ETIMEDOUT;
 	}
 
+	if (vf_netdev)
+		dev_close(vf_netdev);
+
 	return ret;
 }
 
@@ -234,13 +253,11 @@ static inline int netvsc_get_tx_queue(struct net_device *ndev,
  *
  * TODO support XPS - but get_xps_queue not exported
  */
-static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
-			void *accel_priv, select_queue_fallback_t fallback)
+static u16 netvsc_pick_tx(struct net_device *ndev, struct sk_buff *skb)
 {
-	unsigned int num_tx_queues = ndev->real_num_tx_queues;
 	int q_idx = sk_tx_queue_get(skb->sk);
 
-	if (q_idx < 0 || skb->ooo_okay) {
+	if (q_idx < 0 || skb->ooo_okay || q_idx >= ndev->real_num_tx_queues) {
 		/* If forwarding a packet, we use the recorded queue when
 		 * available for better cache locality.
 		 */
@@ -250,12 +267,33 @@ static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
 			q_idx = netvsc_get_tx_queue(ndev, skb, q_idx);
 	}
 
-	while (unlikely(q_idx >= num_tx_queues))
-		q_idx -= num_tx_queues;
-
 	return q_idx;
 }
 
+static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
+			       void *accel_priv,
+			       select_queue_fallback_t fallback)
+{
+	struct net_device_context *ndc = netdev_priv(ndev);
+	struct net_device *vf_netdev;
+	u16 txq;
+
+	rcu_read_lock();
+	vf_netdev = rcu_dereference(ndc->vf_netdev);
+	if (vf_netdev) {
+		txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) : 0;
+		qdisc_skb_cb(skb)->slave_dev_queue_mapping = skb->queue_mapping;
+	} else {
+		txq = netvsc_pick_tx(ndev, skb);
+	}
+	rcu_read_unlock();
+
+	while (unlikely(txq >= ndev->real_num_tx_queues))
+		txq -= ndev->real_num_tx_queues;
+
+	return txq;
+}
+
 static u32 fill_pg_buf(struct page *page, u32 offset, u32 len,
 			struct hv_page_buffer *pb)
 {
@@ -357,6 +395,32 @@ static u32 net_checksum_info(struct sk_buff *skb)
 	return TRANSPORT_INFO_NOT_IP;
 }
 
+/* Send skb on the slave VF device. */
+static int netvsc_vf_xmit(struct net_device *net, struct net_device *vf_netdev,
+			  struct sk_buff *skb)
+{
+	struct net_device_context *ndev_ctx = netdev_priv(net);
+	int rc;
+
+	skb->dev = vf_netdev;
+	skb->queue_mapping = qdisc_skb_cb(skb)->slave_dev_queue_mapping;
+
+	rc = dev_queue_xmit(skb);
+	if (likely(rc == NET_XMIT_SUCCESS || rc == NET_XMIT_CN)) {
+		struct netvsc_vf_pcpu_stats *pcpu_stats
+			= this_cpu_ptr(ndev_ctx->vf_stats);
+
+		u64_stats_update_begin(&pcpu_stats->syncp);
+		pcpu_stats->tx_packets++;
+		pcpu_stats->tx_bytes += skb->len;
+		u64_stats_update_end(&pcpu_stats->syncp);
+	} else {
+		this_cpu_inc(ndev_ctx->vf_stats->tx_dropped);
+	}
+
+	return rc;
+}
+
 static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
 {
 	struct net_device_context *net_device_ctx = netdev_priv(net);
@@ -365,11 +429,20 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
 	unsigned int num_data_pgs;
 	struct rndis_message *rndis_msg;
 	struct rndis_packet *rndis_pkt;
+	struct net_device *vf_netdev;
 	u32 rndis_msg_size;
 	struct rndis_per_packet_info *ppi;
 	u32 hash;
 	struct hv_page_buffer pb[MAX_PAGE_BUFFER_COUNT];
 
+	/* if VF is present and up then redirect packets
+	 * already called with rcu_read_lock_bh
+	 */
+	vf_netdev = rcu_dereference_bh(net_device_ctx->vf_netdev);
+	if (vf_netdev && netif_running(vf_netdev) &&
+	    !netpoll_tx_running(net))
+		return netvsc_vf_xmit(net, vf_netdev, skb);
+
 	/* We can only transmit MAX_PAGE_BUFFER_COUNT number
 	 * of pages in a single packet. If skb is scattered around
 	 * more pages we try linearizing it.
@@ -645,29 +718,18 @@ int netvsc_recv_callback(struct net_device *net,
 	struct netvsc_device *net_device;
 	u16 q_idx = channel->offermsg.offer.sub_channel_index;
 	struct netvsc_channel *nvchan;
-	struct net_device *vf_netdev;
 	struct sk_buff *skb;
 	struct netvsc_stats *rx_stats;
 
 	if (net->reg_state != NETREG_REGISTERED)
 		return NVSP_STAT_FAIL;
 
-	/*
-	 * If necessary, inject this packet into the VF interface.
-	 * On Hyper-V, multicast and brodcast packets are only delivered
-	 * to the synthetic interface (after subjecting these to
-	 * policy filters on the host). Deliver these via the VF
-	 * interface in the guest.
-	 */
 	rcu_read_lock();
 	net_device = rcu_dereference(net_device_ctx->nvdev);
 	if (unlikely(!net_device))
 		goto drop;
 
 	nvchan = &net_device->chan_table[q_idx];
-	vf_netdev = rcu_dereference(net_device_ctx->vf_netdev);
-	if (vf_netdev && (vf_netdev->flags & IFF_UP))
-		net = vf_netdev;
 
 	/* Allocate a skb - TODO direct I/O to pages? */
 	skb = netvsc_alloc_recv_skb(net, &nvchan->napi,
@@ -679,8 +741,7 @@ int netvsc_recv_callback(struct net_device *net,
 		return NVSP_STAT_FAIL;
 	}
 
-	if (net != vf_netdev)
-		skb_record_rx_queue(skb, q_idx);
+	skb_record_rx_queue(skb, q_idx);
 
 	/*
 	 * Even if injecting the packet, record the statistics
@@ -842,6 +903,7 @@ static int netvsc_set_link_ksettings(struct net_device *dev,
 static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 {
 	struct net_device_context *ndevctx = netdev_priv(ndev);
+	struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev);
 	struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
 	struct hv_device *hdev = ndevctx->device_ctx;
 	int orig_mtu = ndev->mtu;
@@ -852,6 +914,13 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	if (!nvdev || nvdev->destroy)
 		return -ENODEV;
 
+	/* Change MTU of underlying VF netdev first. */
+	if (vf_netdev) {
+		ret = dev_set_mtu(vf_netdev, mtu);
+		if (ret)
+			return ret;
+	}
+
 	netif_device_detach(ndev);
 	was_opened = rndis_filter_opened(nvdev);
 	if (was_opened)
@@ -874,6 +943,9 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 		/* Attempt rollback to original MTU */
 		ndev->mtu = orig_mtu;
 		rndis_filter_device_add(hdev, &device_info);
+
+		if (vf_netdev)
+			dev_set_mtu(vf_netdev, orig_mtu);
 	}
 
 	if (was_opened)
@@ -887,16 +959,56 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	return ret;
 }
 
+static void netvsc_get_vf_stats(struct net_device *net,
+				struct netvsc_vf_pcpu_stats *tot)
+{
+	struct net_device_context *ndev_ctx = netdev_priv(net);
+	int i;
+
+	memset(tot, 0, sizeof(*tot));
+
+	for_each_possible_cpu(i) {
+		const struct netvsc_vf_pcpu_stats *stats
+			= per_cpu_ptr(ndev_ctx->vf_stats, i);
+		u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
+		unsigned int start;
+
+		do {
+			start = u64_stats_fetch_begin_irq(&stats->syncp);
+			rx_packets = stats->rx_packets;
+			tx_packets = stats->tx_packets;
+			rx_bytes = stats->rx_bytes;
+			tx_bytes = stats->tx_bytes;
+		} while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+		tot->rx_packets += rx_packets;
+		tot->tx_packets += tx_packets;
+		tot->rx_bytes   += rx_bytes;
+		tot->tx_bytes   += tx_bytes;
+		tot->tx_dropped += stats->tx_dropped;
+	}
+}
+
 static void netvsc_get_stats64(struct net_device *net,
 			       struct rtnl_link_stats64 *t)
 {
 	struct net_device_context *ndev_ctx = netdev_priv(net);
 	struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);
-	int i;
+	struct netvsc_vf_pcpu_stats vf_tot;
+		int i;
 
 	if (!nvdev)
 		return;
 
+	netdev_stats_to_stats64(t, &net->stats);
+
+	netvsc_get_vf_stats(net, &vf_tot);
+	t->rx_packets += vf_tot.rx_packets;
+	t->tx_packets += vf_tot.tx_packets;
+	t->rx_bytes   += vf_tot.rx_bytes;
+	t->tx_bytes   += vf_tot.tx_bytes;
+	t->tx_dropped += vf_tot.tx_dropped;
+
 	for (i = 0; i < nvdev->num_chn; i++) {
 		const struct netvsc_channel *nvchan = &nvdev->chan_table[i];
 		const struct netvsc_stats *stats;
@@ -925,12 +1037,6 @@ static void netvsc_get_stats64(struct net_device *net,
 		t->rx_packets	+= packets;
 		t->multicast	+= multicast;
 	}
-
-	t->tx_dropped	= net->stats.tx_dropped;
-	t->tx_errors	= net->stats.tx_errors;
-
-	t->rx_dropped	= net->stats.rx_dropped;
-	t->rx_errors	= net->stats.rx_errors;
 }
 
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
@@ -971,9 +1077,16 @@ static const struct {
 	{ "tx_no_space",  offsetof(struct netvsc_ethtool_stats, tx_no_space) },
 	{ "tx_too_big",	  offsetof(struct netvsc_ethtool_stats, tx_too_big) },
 	{ "tx_busy",	  offsetof(struct netvsc_ethtool_stats, tx_busy) },
+}, vf_stats[] = {
+	{ "vf_rx_packets", offsetof(struct netvsc_vf_pcpu_stats, rx_packets) },
+	{ "vf_rx_bytes",   offsetof(struct netvsc_vf_pcpu_stats, rx_bytes) },
+	{ "vf_tx_packets", offsetof(struct netvsc_vf_pcpu_stats, tx_packets) },
+	{ "vf_tx_bytes",   offsetof(struct netvsc_vf_pcpu_stats, tx_bytes) },
+	{ "vf_tx_dropped", offsetof(struct netvsc_vf_pcpu_stats, tx_dropped) },
 };
 
 #define NETVSC_GLOBAL_STATS_LEN	ARRAY_SIZE(netvsc_stats)
+#define NETVSC_VF_STATS_LEN	ARRAY_SIZE(vf_stats)
 
 /* 4 statistics per queue (rx/tx packets/bytes) */
 #define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 4)
@@ -988,7 +1101,9 @@ static int netvsc_get_sset_count(struct net_device *dev, int string_set)
 
 	switch (string_set) {
 	case ETH_SS_STATS:
-		return NETVSC_GLOBAL_STATS_LEN + NETVSC_QUEUE_STATS_LEN(nvdev);
+		return NETVSC_GLOBAL_STATS_LEN
+			+ NETVSC_VF_STATS_LEN
+			+ NETVSC_QUEUE_STATS_LEN(nvdev);
 	default:
 		return -EINVAL;
 	}
@@ -1001,6 +1116,7 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
 	struct netvsc_device *nvdev = rtnl_dereference(ndc->nvdev);
 	const void *nds = &ndc->eth_stats;
 	const struct netvsc_stats *qstats;
+	struct netvsc_vf_pcpu_stats sum;
 	unsigned int start;
 	u64 packets, bytes;
 	int i, j;
@@ -1011,6 +1127,10 @@ static void netvsc_get_ethtool_stats(struct net_device *dev,
 	for (i = 0; i < NETVSC_GLOBAL_STATS_LEN; i++)
 		data[i] = *(unsigned long *)(nds + netvsc_stats[i].offset);
 
+	netvsc_get_vf_stats(dev, &sum);
+	for (j = 0; j < NETVSC_VF_STATS_LEN; j++)
+		data[i++] = *(u64 *)((void *)&sum + vf_stats[j].offset);
+
 	for (j = 0; j < nvdev->num_chn; j++) {
 		qstats = &nvdev->chan_table[j].tx_stats;
 
@@ -1045,11 +1165,16 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 
 	switch (stringset) {
 	case ETH_SS_STATS:
-		for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++)
-			memcpy(p + i * ETH_GSTRING_LEN,
-			       netvsc_stats[i].name, ETH_GSTRING_LEN);
+		for (i = 0; i < ARRAY_SIZE(netvsc_stats); i++) {
+			memcpy(p, netvsc_stats[i].name, ETH_GSTRING_LEN);
+			p += ETH_GSTRING_LEN;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(vf_stats); i++) {
+			memcpy(p, vf_stats[i].name, ETH_GSTRING_LEN);
+			p += ETH_GSTRING_LEN;
+		}
 
-		p += i * ETH_GSTRING_LEN;
 		for (i = 0; i < nvdev->num_chn; i++) {
 			sprintf(p, "tx_queue_%u_packets", i);
 			p += ETH_GSTRING_LEN;
@@ -1289,8 +1414,7 @@ static void netvsc_link_change(struct work_struct *w)
 	case RNDIS_STATUS_MEDIA_CONNECT:
 		if (rdev->link_state) {
 			rdev->link_state = false;
-			if (!ndev_ctx->datapath)
-				netif_carrier_on(net);
+			netif_carrier_on(net);
 			netif_tx_wake_all_queues(net);
 		} else {
 			notify = true;
@@ -1377,6 +1501,104 @@ static struct net_device *get_netvsc_byref(struct net_device *vf_netdev)
 	return NULL;
 }
 
+/* Called when VF is injecting data into network stack.
+ * Change the associated network device from VF to netvsc.
+ * note: already called with rcu_read_lock
+ */
+static rx_handler_result_t netvsc_vf_handle_frame(struct sk_buff **pskb)
+{
+	struct sk_buff *skb = *pskb;
+	struct net_device *ndev = rcu_dereference(skb->dev->rx_handler_data);
+	struct net_device_context *ndev_ctx = netdev_priv(ndev);
+	struct netvsc_vf_pcpu_stats *pcpu_stats
+		 = this_cpu_ptr(ndev_ctx->vf_stats);
+
+	skb->dev = ndev;
+
+	u64_stats_update_begin(&pcpu_stats->syncp);
+	pcpu_stats->rx_packets++;
+	pcpu_stats->rx_bytes += skb->len;
+	u64_stats_update_end(&pcpu_stats->syncp);
+
+	return RX_HANDLER_ANOTHER;
+}
+
+static int netvsc_vf_join(struct net_device *vf_netdev,
+			  struct net_device *ndev)
+{
+	struct net_device_context *ndev_ctx = netdev_priv(ndev);
+	int ret;
+
+	ret = netdev_rx_handler_register(vf_netdev,
+					 netvsc_vf_handle_frame, ndev);
+	if (ret != 0) {
+		netdev_err(vf_netdev,
+			   "can not register netvsc VF receive handler (err = %d)\n",
+			   ret);
+		goto rx_handler_failed;
+	}
+
+	ret = netdev_upper_dev_link(vf_netdev, ndev);
+	if (ret != 0) {
+		netdev_err(vf_netdev,
+			   "can not set master device %s (err = %d)\n",
+			   ndev->name, ret);
+		goto upper_link_failed;
+	}
+
+	/* set slave flag before open to prevent IPv6 addrconf */
+	vf_netdev->flags |= IFF_SLAVE;
+
+	schedule_work(&ndev_ctx->vf_takeover);
+
+	netdev_info(vf_netdev, "joined to %s\n", ndev->name);
+	return 0;
+
+upper_link_failed:
+	netdev_rx_handler_unregister(vf_netdev);
+rx_handler_failed:
+	return ret;
+}
+
+static void __netvsc_vf_setup(struct net_device *ndev,
+			      struct net_device *vf_netdev)
+{
+	int ret;
+
+	call_netdevice_notifiers(NETDEV_JOIN, vf_netdev);
+
+	/* Align MTU of VF with master */
+	ret = dev_set_mtu(vf_netdev, ndev->mtu);
+	if (ret)
+		netdev_warn(vf_netdev,
+			    "unable to change mtu to %u\n", ndev->mtu);
+
+	if (netif_running(ndev)) {
+		ret = dev_open(vf_netdev);
+		if (ret)
+			netdev_warn(vf_netdev,
+				    "unable to open: %d\n", ret);
+	}
+}
+
+/* Setup VF as slave of the synthetic device.
+ * Runs in workqueue to avoid recursion in netlink callbacks.
+ */
+static void netvsc_vf_setup(struct work_struct *w)
+{
+	struct net_device_context *ndev_ctx
+		= container_of(w, struct net_device_context, vf_takeover);
+	struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
+	struct net_device *vf_netdev;
+
+	rtnl_lock();
+	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
+	if (vf_netdev)
+		__netvsc_vf_setup(ndev, vf_netdev);
+
+	rtnl_unlock();
+}
+
 static int netvsc_register_vf(struct net_device *vf_netdev)
 {
 	struct net_device *ndev;
@@ -1400,10 +1622,12 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	if (!netvsc_dev || rtnl_dereference(net_device_ctx->vf_netdev))
 		return NOTIFY_DONE;
 
+	if (netvsc_vf_join(vf_netdev, ndev) != 0)
+		return NOTIFY_DONE;
+
 	netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
-	/*
-	 * Take a reference on the module.
-	 */
+
+	/* Prevent this module from being unloaded while VF is registered */
 	try_module_get(THIS_MODULE);
 
 	dev_hold(vf_netdev);
@@ -1411,61 +1635,59 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 	return NOTIFY_OK;
 }
 
-static int netvsc_vf_up(struct net_device *vf_netdev)
+/* Change datapath */
+static void netvsc_vf_update(struct work_struct *w)
 {
-	struct net_device *ndev;
+	struct net_device_context *ndev_ctx
+		= container_of(w, struct net_device_context, vf_notify);
+	struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
 	struct netvsc_device *netvsc_dev;
-	struct net_device_context *net_device_ctx;
-
-	ndev = get_netvsc_byref(vf_netdev);
-	if (!ndev)
-		return NOTIFY_DONE;
-
-	net_device_ctx = netdev_priv(ndev);
-	netvsc_dev = rtnl_dereference(net_device_ctx->nvdev);
-
-	netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
-
-	/*
-	 * Open the device before switching data path.
-	 */
-	rndis_filter_open(netvsc_dev);
-
-	/*
-	 * notify the host to switch the data path.
-	 */
-	netvsc_switch_datapath(ndev, true);
-	netdev_info(ndev, "Data path switched to VF: %s\n", vf_netdev->name);
-
-	netif_carrier_off(ndev);
+	struct net_device *vf_netdev;
+	bool vf_is_up;
 
-	/* Now notify peers through VF device. */
-	call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, vf_netdev);
+	rtnl_lock();
+	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
+	if (!vf_netdev)
+		goto unlock;
+
+	netvsc_dev = rtnl_dereference(ndev_ctx->nvdev);
+	if (!netvsc_dev)
+		goto unlock;
+
+	vf_is_up = netif_running(vf_netdev);
+	if (vf_is_up != ndev_ctx->datapath) {
+		if (vf_is_up) {
+			netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
+			rndis_filter_open(netvsc_dev);
+			netvsc_switch_datapath(ndev, true);
+			netdev_info(ndev, "Data path switched to VF: %s\n",
+				    vf_netdev->name);
+		} else {
+			netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
+			netvsc_switch_datapath(ndev, false);
+			rndis_filter_close(netvsc_dev);
+			netdev_info(ndev, "Data path switched from VF: %s\n",
+				    vf_netdev->name);
+		}
 
-	return NOTIFY_OK;
+		/* Now notify peers through VF device. */
+		call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, ndev);
+	}
+unlock:
+	rtnl_unlock();
 }
 
-static int netvsc_vf_down(struct net_device *vf_netdev)
+static int netvsc_vf_notify(struct net_device *vf_netdev)
 {
-	struct net_device *ndev;
-	struct netvsc_device *netvsc_dev;
 	struct net_device_context *net_device_ctx;
+	struct net_device *ndev;
 
 	ndev = get_netvsc_byref(vf_netdev);
 	if (!ndev)
 		return NOTIFY_DONE;
 
 	net_device_ctx = netdev_priv(ndev);
-	netvsc_dev = rtnl_dereference(net_device_ctx->nvdev);
-
-	netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
-	netvsc_switch_datapath(ndev, false);
-	netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
-	rndis_filter_close(netvsc_dev);
-	netif_carrier_on(ndev);
-
-	/* Now notify peers through netvsc device. */
-	call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, ndev);
+	schedule_work(&net_device_ctx->vf_notify);
 
 	return NOTIFY_OK;
 }
@@ -1480,9 +1702,12 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 		return NOTIFY_DONE;
 
 	net_device_ctx = netdev_priv(ndev);
+	cancel_work_sync(&net_device_ctx->vf_takeover);
+	cancel_work_sync(&net_device_ctx->vf_notify);
 
 	netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
 
+	netdev_upper_dev_unlink(vf_netdev, ndev);
 	RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL);
 	dev_put(vf_netdev);
 	module_put(THIS_MODULE);
@@ -1496,12 +1721,12 @@ static int netvsc_probe(struct hv_device *dev,
 	struct net_device_context *net_device_ctx;
 	struct netvsc_device_info device_info;
 	struct netvsc_device *nvdev;
-	int ret;
+	int ret = -ENOMEM;
 
 	net = alloc_etherdev_mq(sizeof(struct net_device_context),
 				VRSS_CHANNEL_MAX);
 	if (!net)
-		return -ENOMEM;
+		goto no_net;
 
 	netif_carrier_off(net);
 
@@ -1520,6 +1745,13 @@ static int netvsc_probe(struct hv_device *dev,
 
 	spin_lock_init(&net_device_ctx->lock);
 	INIT_LIST_HEAD(&net_device_ctx->reconfig_events);
+	INIT_WORK(&net_device_ctx->vf_takeover, netvsc_vf_setup);
+	INIT_WORK(&net_device_ctx->vf_notify, netvsc_vf_update);
+
+	net_device_ctx->vf_stats
+		= netdev_alloc_pcpu_stats(struct netvsc_vf_pcpu_stats);
+	if (!net_device_ctx->vf_stats)
+		goto no_stats;
 
 	net->netdev_ops = &device_ops;
 	net->ethtool_ops = &ethtool_ops;
@@ -1539,10 +1771,9 @@ static int netvsc_probe(struct hv_device *dev,
 	if (IS_ERR(nvdev)) {
 		ret = PTR_ERR(nvdev);
 		netdev_err(net, "unable to add netvsc device (ret %d)\n", ret);
-		free_netdev(net);
-		hv_set_drvdata(dev, NULL);
-		return ret;
+		goto rndis_failed;
 	}
+
 	memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
 
 	/* hw_features computed in rndis_filter_device_add */
@@ -1566,11 +1797,20 @@ static int netvsc_probe(struct hv_device *dev,
 	ret = register_netdev(net);
 	if (ret != 0) {
 		pr_err("Unable to register netdev.\n");
-		rndis_filter_device_remove(dev, nvdev);
-		free_netdev(net);
+		goto register_failed;
 	}
 
 	return ret;
+
+register_failed:
+	rndis_filter_device_remove(dev, nvdev);
+rndis_failed:
+	free_percpu(net_device_ctx->vf_stats);
+no_stats:
+	hv_set_drvdata(dev, NULL);
+	free_netdev(net);
+no_net:
+	return ret;
 }
 
 static int netvsc_remove(struct hv_device *dev)
@@ -1604,6 +1844,7 @@ static int netvsc_remove(struct hv_device *dev)
 
 	hv_set_drvdata(dev, NULL);
 
+	free_percpu(ndev_ctx->vf_stats);
 	free_netdev(net);
 	return 0;
 }
@@ -1658,9 +1899,8 @@ static int netvsc_netdev_event(struct notifier_block *this,
 	case NETDEV_UNREGISTER:
 		return netvsc_unregister_vf(event_dev);
 	case NETDEV_UP:
-		return netvsc_vf_up(event_dev);
 	case NETDEV_DOWN:
-		return netvsc_vf_down(event_dev);
+		return netvsc_vf_notify(event_dev);
 	default:
 		return NOTIFY_DONE;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 09/10] netvsc: add documentation
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (7 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 08/10] netvsc: transparent VF management Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-26 23:40 ` [PATCH net-next v2 10/10] netvsc: remove bonding setup script Stephen Hemminger
  2017-07-27  6:59 ` [PATCH net-next v2 00/10] netvsc fixes and new features David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, linux-doc, netdev

Add some background documentation on netvsc device options
and limitations.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 Documentation/networking/netvsc.txt | 50 +++++++++++++++++++++++++++++++++++++
 MAINTAINERS                         |  1 +
 2 files changed, 51 insertions(+)
 create mode 100644 Documentation/networking/netvsc.txt

diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/netvsc.txt
new file mode 100644
index 000000000000..86f72dced4da
--- /dev/null
+++ b/Documentation/networking/netvsc.txt
@@ -0,0 +1,50 @@
+Hyper-V network driver
+======================
+
+Compatibility
+=============
+
+This driver is compatible and tested on Windows Server 2012 R2, 2016 and
+Windows 10.
+
+Features
+========
+
+  Checksum offload
+  ----------------
+  The netvsc driver supports checksum offload as long as the
+  Hyper-V host version does. Windows Server 2016 and Azure
+  support checksum offload for TCP and UDP for both IPv4 and
+  IPv6. Windows Server 2012 only supports checksum offload for TCP.
+
+  Receive Side Scaling
+  --------------------
+  Hyper-V supports receive side scaling. For TCP, packets are
+  distributed among available queues based on IP address and port
+  number. Current versions of Hyper-V host, only distribute UDP
+  packets based on the IP source and destination address.
+  The port number is not used as part of the hash value for UDP.
+  Fragmented IP packets are not distributed between queues;
+  all fragmented packets arrive on the first channel.
+
+  Generic Receive Offload, aka GRO
+  --------------------------------
+  The driver supports GRO and it is enabled by default. GRO coalesces
+  like packets and significantly reduces CPU usage under heavy Rx
+  load.
+
+  SR-IOV support
+  --------------
+  Hyper-V supports SR-IOV as a hardware acceleration option. If SR-IOV
+  is enabled in both the vSwitch and the guest configuration, then the
+  Virtual Function (VF) device is passed to the guest as a PCI
+  device. In this case, both a synthetic (netvsc) and VF device are
+  visible in the guest OS and both NIC's have the same MAC address.
+
+  The VF is enslaved by netvsc device.  The netvsc driver will transparently
+  switch the data path to the VF when it is available and up.
+  Network state (addresses, firewall, etc) should be applied only to the
+  netvsc device; the slave device should not be accessed directly in
+  most cases.  The exceptions are if some special queue discipline or
+  flow direction is desired, these should be applied directly to the
+  VF slave device.
diff --git a/MAINTAINERS b/MAINTAINERS
index 297e610c9163..d30c17df1deb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6294,6 +6294,7 @@ M:	Haiyang Zhang <haiyangz@microsoft.com>
 M:	Stephen Hemminger <sthemmin@microsoft.com>
 L:	devel@linuxdriverproject.org
 S:	Maintained
+F:	Documentation/networking/netvsc.txt
 F:	arch/x86/include/asm/mshyperv.h
 F:	arch/x86/include/uapi/asm/hyperv.h
 F:	arch/x86/kernel/cpu/mshyperv.c
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net-next v2 10/10] netvsc: remove bonding setup script
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (8 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 09/10] netvsc: add documentation Stephen Hemminger
@ 2017-07-26 23:40 ` Stephen Hemminger
  2017-07-27  6:59 ` [PATCH net-next v2 00/10] netvsc fixes and new features David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-26 23:40 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, corbet; +Cc: devel, netdev, linux-doc

No longer needed, now all managed by transparent VF logic.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 tools/hv/bondvf.sh | 255 -----------------------------------------------------
 1 file changed, 255 deletions(-)
 delete mode 100755 tools/hv/bondvf.sh

diff --git a/tools/hv/bondvf.sh b/tools/hv/bondvf.sh
deleted file mode 100755
index 80f102860cf8..000000000000
--- a/tools/hv/bondvf.sh
+++ /dev/null
@@ -1,255 +0,0 @@
-#!/bin/bash
-
-# This example script creates bonding network devices based on synthetic NIC
-# (the virtual network adapter usually provided by Hyper-V) and the matching
-# VF NIC (SRIOV virtual function). So the synthetic NIC and VF NIC can
-# function as one network device, and fail over to the synthetic NIC if VF is
-# down.
-#
-# Usage:
-# - After configured vSwitch and vNIC with SRIOV, start Linux virtual
-#   machine (VM)
-# - Run this scripts on the VM. It will create configuration files in
-#   distro specific directory.
-# - Reboot the VM, so that the bonding config are enabled.
-#
-# The config files are DHCP by default. You may edit them if you need to change
-# to Static IP or change other settings.
-#
-
-sysdir=/sys/class/net
-netvsc_cls={f8615163-df3e-46c5-913f-f2d2f965ed0e}
-bondcnt=0
-
-# Detect Distro
-if [ -f /etc/redhat-release ];
-then
-	cfgdir=/etc/sysconfig/network-scripts
-	distro=redhat
-elif grep -q 'Ubuntu' /etc/issue
-then
-	cfgdir=/etc/network
-	distro=ubuntu
-elif grep -q 'SUSE' /etc/issue
-then
-	cfgdir=/etc/sysconfig/network
-	distro=suse
-else
-	echo "Unsupported Distro"
-	exit 1
-fi
-
-echo Detected Distro: $distro, or compatible
-
-# Get a list of ethernet names
-list_eth=(`cd $sysdir && ls -d */ | cut -d/ -f1 | grep -v bond`)
-eth_cnt=${#list_eth[@]}
-
-echo List of net devices:
-
-# Get the MAC addresses
-for (( i=0; i < $eth_cnt; i++ ))
-do
-	list_mac[$i]=`cat $sysdir/${list_eth[$i]}/address`
-	echo ${list_eth[$i]}, ${list_mac[$i]}
-done
-
-# Find NIC with matching MAC
-for (( i=0; i < $eth_cnt-1; i++ ))
-do
-	for (( j=i+1; j < $eth_cnt; j++ ))
-	do
-		if [ "${list_mac[$i]}" = "${list_mac[$j]}" ]
-		then
-			list_match[$i]=${list_eth[$j]}
-			break
-		fi
-	done
-done
-
-function create_eth_cfg_redhat {
-	local fn=$cfgdir/ifcfg-$1
-
-	rm -f $fn
-	echo DEVICE=$1 >>$fn
-	echo TYPE=Ethernet >>$fn
-	echo BOOTPROTO=none >>$fn
-	echo UUID=`uuidgen` >>$fn
-	echo ONBOOT=yes >>$fn
-	echo PEERDNS=yes >>$fn
-	echo IPV6INIT=yes >>$fn
-	echo MASTER=$2 >>$fn
-	echo SLAVE=yes >>$fn
-}
-
-function create_eth_cfg_pri_redhat {
-	create_eth_cfg_redhat $1 $2
-}
-
-function create_bond_cfg_redhat {
-	local fn=$cfgdir/ifcfg-$1
-
-	rm -f $fn
-	echo DEVICE=$1 >>$fn
-	echo TYPE=Bond >>$fn
-	echo BOOTPROTO=dhcp >>$fn
-	echo UUID=`uuidgen` >>$fn
-	echo ONBOOT=yes >>$fn
-	echo PEERDNS=yes >>$fn
-	echo IPV6INIT=yes >>$fn
-	echo BONDING_MASTER=yes >>$fn
-	echo BONDING_OPTS=\"mode=active-backup miimon=100 primary=$2\" >>$fn
-}
-
-function del_eth_cfg_ubuntu {
-	local mainfn=$cfgdir/interfaces
-	local fnlist=( $mainfn )
-
-	local dirlist=(`awk '/^[ \t]*source/{print $2}' $mainfn`)
-
-	local i
-	for i in "${dirlist[@]}"
-	do
-		fnlist+=(`ls $i 2>/dev/null`)
-	done
-
-	local tmpfl=$(mktemp)
-
-	local nic_start='^[ \t]*(auto|iface|mapping|allow-.*)[ \t]+'$1
-	local nic_end='^[ \t]*(auto|iface|mapping|allow-.*|source)'
-
-	local fn
-	for fn in "${fnlist[@]}"
-	do
-		awk "/$nic_end/{x=0} x{next} /$nic_start/{x=1;next} 1" \
-			$fn >$tmpfl
-
-		cp $tmpfl $fn
-	done
-
-	rm $tmpfl
-}
-
-function create_eth_cfg_ubuntu {
-	local fn=$cfgdir/interfaces
-
-	del_eth_cfg_ubuntu $1
-	echo $'\n'auto $1 >>$fn
-	echo iface $1 inet manual >>$fn
-	echo bond-master $2 >>$fn
-}
-
-function create_eth_cfg_pri_ubuntu {
-	local fn=$cfgdir/interfaces
-
-	del_eth_cfg_ubuntu $1
-	echo $'\n'allow-hotplug $1 >>$fn
-	echo iface $1 inet manual >>$fn
-	echo bond-master $2 >>$fn
-	echo bond-primary $1 >>$fn
-}
-
-function create_bond_cfg_ubuntu {
-	local fn=$cfgdir/interfaces
-
-	del_eth_cfg_ubuntu $1
-
-	echo $'\n'auto $1 >>$fn
-	echo iface $1 inet dhcp >>$fn
-	echo bond-mode active-backup >>$fn
-	echo bond-miimon 100 >>$fn
-	echo bond-slaves none >>$fn
-}
-
-function create_eth_cfg_suse {
-        local fn=$cfgdir/ifcfg-$1
-
-        rm -f $fn
-	echo BOOTPROTO=none >>$fn
-	echo STARTMODE=auto >>$fn
-}
-
-function create_eth_cfg_pri_suse {
-	local fn=$cfgdir/ifcfg-$1
-
-	rm -f $fn
-	echo BOOTPROTO=none >>$fn
-	echo STARTMODE=hotplug >>$fn
-}
-
-function create_bond_cfg_suse {
-	local fn=$cfgdir/ifcfg-$1
-
-	rm -f $fn
-	echo BOOTPROTO=dhcp >>$fn
-	echo STARTMODE=auto >>$fn
-	echo BONDING_MASTER=yes >>$fn
-	echo BONDING_SLAVE_0=$2 >>$fn
-	echo BONDING_SLAVE_1=$3 >>$fn
-	echo BONDING_MODULE_OPTS=\'mode=active-backup miimon=100 primary=$2\' >>$fn
-}
-
-function create_bond {
-	local bondname=bond$bondcnt
-	local primary
-	local secondary
-
-	local class_id1=`cat $sysdir/$1/device/class_id 2>/dev/null`
-	local class_id2=`cat $sysdir/$2/device/class_id 2>/dev/null`
-
-	if [ "$class_id1" = "$netvsc_cls" ]
-	then
-		primary=$2
-		secondary=$1
-	elif [ "$class_id2" = "$netvsc_cls" ]
-	then
-		primary=$1
-		secondary=$2
-	else
-		return 0
-	fi
-
-	echo $'\nBond name:' $bondname
-
-	if [ $distro == ubuntu ]
-	then
-		local mainfn=$cfgdir/interfaces
-		local s="^[ \t]*(auto|iface|mapping|allow-.*)[ \t]+${bondname}"
-
-		grep -E "$s" $mainfn
-		if [ $? -eq 0 ]
-		then
-			echo "WARNING: ${bondname} has been configured already"
-			return
-		fi
-	elif [ $distro == redhat ] || [ $distro == suse ]
-	then
-		local fn=$cfgdir/ifcfg-$bondname
-		if [ -f $fn ]
-		then
-			echo "WARNING: ${bondname} has been configured already"
-			return
-		fi
-	else
-		echo "Unsupported Distro: ${distro}"
-		return
-	fi
-
-	echo configuring $primary
-	create_eth_cfg_pri_$distro $primary $bondname
-
-	echo configuring $secondary
-	create_eth_cfg_$distro $secondary $bondname
-
-	echo creating: $bondname with primary slave: $primary
-	create_bond_cfg_$distro $bondname $primary $secondary
-}
-
-for (( i=0; i < $eth_cnt-1; i++ ))
-do
-        if [ -n "${list_match[$i]}" ]
-        then
-		create_bond ${list_eth[$i]} ${list_match[$i]}
-		let bondcnt=bondcnt+1
-        fi
-done
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v2 00/10] netvsc fixes and new features
  2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
                   ` (9 preceding siblings ...)
  2017-07-26 23:40 ` [PATCH net-next v2 10/10] netvsc: remove bonding setup script Stephen Hemminger
@ 2017-07-27  6:59 ` David Miller
  10 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2017-07-27  6:59 UTC (permalink / raw)
  To: stephen; +Cc: kys, haiyangz, sthemmin, corbet, devel, linux-doc, netdev

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 26 Jul 2017 16:40:19 -0700

> The next group of patches aims to reduce the driver memory footprint
> by reducing the size of the per-device receive and send buffers, and
> the per-channel receive completion queue.

Sorry Stephen, I have to draw the line with those module parameters.

Search out alternate mechanism which can work at run time, such as
using devlink or similar.

Don't tell me it's "impossible" or "hard", simply do.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-07-27  6:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-26 23:40 [PATCH net-next v2 00/10] netvsc fixes and new features Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 01/10] netvsc: fix return value for set_channels Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 02/10] netvsc: fix warnings reported by lockdep Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 03/10] netvsc: don't print pointer value in error message Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 04/10] netvsc: remove unnecessary indirection of page_buffer Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 05/10] netvsc: optimize receive completions Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 06/10] netvsc: signal host if receive ring is emptied Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 07/10] netvsc: allow smaller send/recv buffer size Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 08/10] netvsc: transparent VF management Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 09/10] netvsc: add documentation Stephen Hemminger
2017-07-26 23:40 ` [PATCH net-next v2 10/10] netvsc: remove bonding setup script Stephen Hemminger
2017-07-27  6:59 ` [PATCH net-next v2 00/10] netvsc fixes and new features David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).