Netdev List

* [PATCH RFC 2/2] tg3: Convert to get_pauseparamext
From: Matt Carlson @ 2011-10-14 20:54 UTC
  To: davem; +Cc: netdev, mcarlson, bhutchings

This patch converts the tg3 driver to the new get_pauseparamext ethtool
callback, so that it reports both the configured flow control settings
and the currently active ones.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c |   28 ++++++++++++++++++++--------
 1 files changed, 20 insertions(+), 8 deletions(-)
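The conversion above fills the configured pause settings (cfg) from tp->link_config.flowctrl and the new status fields from tp->link_config.active_flowctrl. The following is a minimal userspace sketch of that mapping, not driver code: the structures are local copies of the layout proposed in patch 1/2, the FLOW_CTRL_* values are assumed to mirror <linux/mii.h>, and fill_pauseparamext() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror FLOW_CTRL_TX / FLOW_CTRL_RX in <linux/mii.h>. */
#define FLOW_CTRL_TX	0x01
#define FLOW_CTRL_RX	0x02

/* Local copies of the layout proposed in patch 1/2 of this RFC. */
struct ethtool_pauseparam {
	uint32_t cmd;
	uint32_t autoneg;
	uint32_t rx_pause;
	uint32_t tx_pause;
};

struct ethtool_pauseparamext {
	struct ethtool_pauseparam cfg;	/* configured settings */
	uint32_t rx_pause_status;	/* currently active RX pause */
	uint32_t tx_pause_status;	/* currently active TX pause */
	uint32_t reserved[2];
};

/*
 * Hypothetical helper: configured bits go into cfg, active (resolved) bits
 * go into the *_status fields, as tg3_get_pauseparamext() does with
 * link_config.flowctrl and link_config.active_flowctrl.
 */
static void fill_pauseparamext(struct ethtool_pauseparamext *ep, int autoneg,
			       uint8_t flowctrl, uint8_t active_flowctrl)
{
	assert(ep != NULL);

	ep->cfg.autoneg = !!autoneg;
	ep->cfg.rx_pause = !!(flowctrl & FLOW_CTRL_RX);
	ep->cfg.tx_pause = !!(flowctrl & FLOW_CTRL_TX);

	ep->rx_pause_status = !!(active_flowctrl & FLOW_CTRL_RX);
	ep->tx_pause_status = !!(active_flowctrl & FLOW_CTRL_TX);
}
```

Writing each field as !!(mask & bit) is equivalent to the if/else pairs in the patch; it is only a more compact spelling of the same test.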

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index fe712f9..a7b7ddd 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -10576,24 +10576,36 @@ static int tg3_set_ringparam(struct net_device *dev, struct ethtool_ringparam *e
 	return err;
 }
 
-static void tg3_get_pauseparam(struct net_device *dev, struct ethtool_pauseparam *epause)
+static void tg3_get_pauseparamext(struct net_device *dev,
+				  struct ethtool_pauseparamext *epause)
 {
 	struct tg3 *tp = netdev_priv(dev);
 
-	epause->autoneg = !!tg3_flag(tp, PAUSE_AUTONEG);
+	epause->cfg.autoneg = !!tg3_flag(tp, PAUSE_AUTONEG);
+
+	if (tp->link_config.flowctrl & FLOW_CTRL_RX)
+		epause->cfg.rx_pause = 1;
+	else
+		epause->cfg.rx_pause = 0;
+
+	if (tp->link_config.flowctrl & FLOW_CTRL_TX)
+		epause->cfg.tx_pause = 1;
+	else
+		epause->cfg.tx_pause = 0;
 
 	if (tp->link_config.active_flowctrl & FLOW_CTRL_RX)
-		epause->rx_pause = 1;
+		epause->rx_pause_status = 1;
 	else
-		epause->rx_pause = 0;
+		epause->rx_pause_status = 0;
 
 	if (tp->link_config.active_flowctrl & FLOW_CTRL_TX)
-		epause->tx_pause = 1;
+		epause->tx_pause_status = 1;
 	else
-		epause->tx_pause = 0;
+		epause->tx_pause_status = 0;
 }
 
-static int tg3_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam *epause)
+static int tg3_set_pauseparam(struct net_device *dev,
+			      struct ethtool_pauseparam *epause)
 {
 	struct tg3 *tp = netdev_priv(dev);
 	int err = 0;
@@ -11926,7 +11938,7 @@ static const struct ethtool_ops tg3_ethtool_ops = {
 	.set_eeprom		= tg3_set_eeprom,
 	.get_ringparam		= tg3_get_ringparam,
 	.set_ringparam		= tg3_set_ringparam,
-	.get_pauseparam		= tg3_get_pauseparam,
+	.get_pauseparamext	= tg3_get_pauseparamext,
 	.set_pauseparam		= tg3_set_pauseparam,
 	.self_test		= tg3_self_test,
 	.get_strings		= tg3_get_strings,
-- 
1.7.3.4

* [PATCH RFC 0/2] Add extended pause query capability
From: Matt Carlson @ 2011-10-14 20:54 UTC
  To: davem; +Cc: netdev, mcarlson, bhutchings

The current implementation of get_pauseparam allows userspace to query the
flow control configuration, but not the flow control status.  This patchset
defines a new struct ethtool_pauseparamext, a matching get_pauseparamext
callback in struct ethtool_ops, and new ETHTOOL_GPAUSEPARAMEXT and
ETHTOOL_SPAUSEPARAMEXT commands.  Together they allow a driver to report
both the configuration and the status in a single query.
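To illustrate why reporting both in one query is useful, here is a minimal sketch of how a userspace tool might present the two halves of struct ethtool_pauseparamext. It assumes the structure has already been filled in by the kernel (no ioctl is issued here), the structures are local copies of the layout proposed in patch 1/2, and report_pause() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the layout proposed in patch 1/2 of this RFC. */
struct ethtool_pauseparam {
	uint32_t cmd;
	uint32_t autoneg;
	uint32_t rx_pause;
	uint32_t tx_pause;
};

struct ethtool_pauseparamext {
	struct ethtool_pauseparam cfg;
	uint32_t rx_pause_status;
	uint32_t tx_pause_status;
	uint32_t reserved[2];
};

static const char *onoff(uint32_t v)
{
	return v ? "on" : "off";
}

/*
 * Hypothetical userspace helper: print the configured and the active pause
 * state side by side and return 1 if they differ (for example, because the
 * link partner did not agree to the configured pause mode), else 0.
 */
static int report_pause(const struct ethtool_pauseparamext *ep)
{
	assert(ep != NULL);

	printf("Autonegotiate: %s\n", onoff(ep->cfg.autoneg));
	printf("RX pause: configured %s, active %s\n",
	       onoff(ep->cfg.rx_pause), onoff(ep->rx_pause_status));
	printf("TX pause: configured %s, active %s\n",
	       onoff(ep->cfg.tx_pause), onoff(ep->tx_pause_status));

	return (!!ep->cfg.rx_pause != !!ep->rx_pause_status) ||
	       (!!ep->cfg.tx_pause != !!ep->tx_pause_status);
}
```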

Please note that Ben Hutchings' suggestion to deduce the resolved flow
control settings from the 'advertising' and 'lp_advertising' fields returned
by ETHTOOL_GSET was considered, but rejected because there is no way to know
whether the reported flow control advertisements are valid.
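For comparison, the deduction Ben suggested would resemble the standard full-duplex pause resolution of IEEE 802.3 (Annex 28B), which the kernel implements in the style of mii_resolve_flowctrl_fdx(). The sketch below is a userspace approximation: ADV_PAUSE and ADV_ASYM_PAUSE are local names assumed to match the ADVERTISED_Pause and ADVERTISED_Asym_Pause bits, and resolve_pause_fdx() is hypothetical. Its answer is only meaningful if both advertisement masks are valid, which is exactly the objection raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match ADVERTISED_Pause / ADVERTISED_Asym_Pause in ethtool.h. */
#define ADV_PAUSE	(1u << 13)
#define ADV_ASYM_PAUSE	(1u << 14)

/*
 * Hypothetical helper: resolve full-duplex pause from the local (adv) and
 * link partner (lp_adv) advertisement masks.  *rx_pause means "we honor
 * received pause frames", *tx_pause means "we send pause frames".
 */
static void resolve_pause_fdx(uint32_t adv, uint32_t lp_adv,
			      uint32_t *rx_pause, uint32_t *tx_pause)
{
	assert(rx_pause != NULL && tx_pause != NULL);

	*rx_pause = 0;
	*tx_pause = 0;

	if (adv & lp_adv & ADV_PAUSE) {
		/* Both sides support symmetric pause. */
		*rx_pause = 1;
		*tx_pause = 1;
	} else if (adv & lp_adv & ADV_ASYM_PAUSE) {
		if (adv & ADV_PAUSE)
			*rx_pause = 1;	/* partner sends, we honor */
		else if (lp_adv & ADV_PAUSE)
			*tx_pause = 1;	/* we send, partner honors */
	}
}
```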

* [PATCH RFC 1/2] ethtool: Add extended flow control query facility
From: Matt Carlson @ 2011-10-14 20:54 UTC
  To: davem; +Cc: netdev, mcarlson, bhutchings

This patch creates a new struct ethtool_pauseparamext that extends the
older struct ethtool_pauseparam.  The new structure embeds the old one as
its cfg member and adds two fields, rx_pause_status and tx_pause_status,
that represent the current flow control status.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 include/linux/ethtool.h |   14 ++++++++++-
 net/core/ethtool.c      |   59 +++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 67 insertions(+), 6 deletions(-)
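The dispatch this patch adds to net/core/ethtool.c can be modelled in userspace as follows. This is a sketch, not kernel code: struct pause_ops is a hypothetical stand-in for the two struct ethtool_ops callbacks (the net_device argument is dropped), memcpy() stands in for copy_to_user(), and the structures copy the proposed layout. It shows the preference order of ethtool_get_pauseparamext() and that, on the legacy fallback, the status fields are left at zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHTOOL_GPAUSEPARAMEXT	0x00000041	/* value proposed in this RFC */

struct ethtool_pauseparam {
	uint32_t cmd;
	uint32_t autoneg;
	uint32_t rx_pause;
	uint32_t tx_pause;
};

struct ethtool_pauseparamext {
	struct ethtool_pauseparam cfg;
	uint32_t rx_pause_status;
	uint32_t tx_pause_status;
	uint32_t reserved[2];
};

/* cfg must stay first so the legacy path can copy only its leading bytes. */
_Static_assert(offsetof(struct ethtool_pauseparamext, cfg) == 0,
	       "cfg must be the first member");

/* Hypothetical stand-in for the relevant struct ethtool_ops members. */
struct pause_ops {
	void (*get_pauseparam)(struct ethtool_pauseparam *);
	void (*get_pauseparamext)(struct ethtool_pauseparamext *);
};

/* Models ethtool_get_pauseparamext(): prefer the extended callback. */
static int model_get_pauseparamext(const struct pause_ops *ops,
				   struct ethtool_pauseparamext *out)
{
	struct ethtool_pauseparamext pp = { { .cmd = ETHTOOL_GPAUSEPARAMEXT } };

	assert(ops != NULL && out != NULL);

	if (ops->get_pauseparamext)
		ops->get_pauseparamext(&pp);
	else if (ops->get_pauseparam)
		ops->get_pauseparam(&pp.cfg);	/* status stays zero */
	else
		return -EOPNOTSUPP;

	memcpy(out, &pp, sizeof(pp));	/* stands in for copy_to_user() */
	return 0;
}

/* Example driver callbacks used to exercise the model. */
static void legacy_get(struct ethtool_pauseparam *p)
{
	p->autoneg = 1;
	p->rx_pause = 1;
	p->tx_pause = 1;
}

static void ext_get(struct ethtool_pauseparamext *p)
{
	p->cfg.autoneg = 1;
	p->cfg.rx_pause = 1;
	p->cfg.tx_pause = 1;
	p->rx_pause_status = 1;
	p->tx_pause_status = 0;
}
```

The _Static_assert records the layout property that lets ethtool_get_pauseparam() copy only sizeof(struct ethtool_pauseparam) bytes back to legacy userspace.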

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 45f00b6..cd60d74 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -284,6 +284,15 @@ struct ethtool_pauseparam {
 	__u32	tx_pause;
 };
 
+/* for querying link flow control configuration and status */
+struct ethtool_pauseparamext {
+	struct ethtool_pauseparam cfg;
+
+	__u32	rx_pause_status;
+	__u32	tx_pause_status;
+	__u32	reserved[2];
+};
+
 #define ETH_GSTRING_LEN		32
 enum ethtool_stringset {
 	ETH_SS_TEST		= 0,
@@ -956,7 +965,8 @@ struct ethtool_ops {
 	int	(*get_dump_data)(struct net_device *,
 				 struct ethtool_dump *, void *);
 	int	(*set_dump)(struct net_device *, struct ethtool_dump *);
-
+	void	(*get_pauseparamext)(struct net_device *,
+				  struct ethtool_pauseparamext *);
 };
 #endif /* __KERNEL__ */
 
@@ -1030,6 +1040,8 @@ struct ethtool_ops {
 #define ETHTOOL_SET_DUMP	0x0000003e /* Set dump settings */
 #define ETHTOOL_GET_DUMP_FLAG	0x0000003f /* Get dump settings */
 #define ETHTOOL_GET_DUMP_DATA	0x00000040 /* Get dump data */
+#define ETHTOOL_GPAUSEPARAMEXT	0x00000041 /* Get pause parameters v2 */
+#define ETHTOOL_SPAUSEPARAMEXT	0x00000042 /* Set pause parameters v2 */
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f444817..1c0f58c 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1206,15 +1206,21 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 
 static int ethtool_get_pauseparam(struct net_device *dev, void __user *useraddr)
 {
-	struct ethtool_pauseparam pauseparam = { ETHTOOL_GPAUSEPARAM };
+	struct ethtool_pauseparamext pauseparam = {
+		{ .cmd = ETHTOOL_GPAUSEPARAM }
+	};
 
-	if (!dev->ethtool_ops->get_pauseparam)
+	if (dev->ethtool_ops->get_pauseparam)
+		dev->ethtool_ops->get_pauseparam(dev, &pauseparam.cfg);
+	else if (dev->ethtool_ops->get_pauseparamext)
+		dev->ethtool_ops->get_pauseparamext(dev, &pauseparam);
+	else
 		return -EOPNOTSUPP;
 
-	dev->ethtool_ops->get_pauseparam(dev, &pauseparam);
-
-	if (copy_to_user(useraddr, &pauseparam, sizeof(pauseparam)))
+	if (copy_to_user(useraddr, &pauseparam,
+			 sizeof(struct ethtool_pauseparam)))
 		return -EFAULT;
+
 	return 0;
 }
 
@@ -1231,6 +1237,42 @@ static int ethtool_set_pauseparam(struct net_device *dev, void __user *useraddr)
 	return dev->ethtool_ops->set_pauseparam(dev, &pauseparam);
 }
 
+static int ethtool_get_pauseparamext(struct net_device *dev,
+				     void __user *useraddr)
+{
+	struct ethtool_pauseparamext pauseparam = {
+		{ .cmd = ETHTOOL_GPAUSEPARAMEXT }
+	};
+
+	if (dev->ethtool_ops->get_pauseparamext) {
+		dev->ethtool_ops->get_pauseparamext(dev, &pauseparam);
+	} else {
+		if (!dev->ethtool_ops->get_pauseparam)
+			return -EOPNOTSUPP;
+
+		dev->ethtool_ops->get_pauseparam(dev, &pauseparam.cfg);
+	}
+
+	if (copy_to_user(useraddr, &pauseparam, sizeof(pauseparam)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int ethtool_set_pauseparamext(struct net_device *dev,
+				     void __user *useraddr)
+{
+	struct ethtool_pauseparamext pauseparam;
+
+	if (!dev->ethtool_ops->set_pauseparam)
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&pauseparam, useraddr, sizeof(pauseparam)))
+		return -EFAULT;
+
+	return dev->ethtool_ops->set_pauseparam(dev, &pauseparam.cfg);
+}
+
 static int __ethtool_set_sg(struct net_device *dev, u32 data)
 {
 	int err;
@@ -1667,6 +1709,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GCOALESCE:
 	case ETHTOOL_GRINGPARAM:
 	case ETHTOOL_GPAUSEPARAM:
+	case ETHTOOL_GPAUSEPARAMEXT:
 	case ETHTOOL_GRXCSUM:
 	case ETHTOOL_GTXCSUM:
 	case ETHTOOL_GSG:
@@ -1754,6 +1797,12 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_SPAUSEPARAM:
 		rc = ethtool_set_pauseparam(dev, useraddr);
 		break;
+	case ETHTOOL_GPAUSEPARAMEXT:
+		rc = ethtool_get_pauseparamext(dev, useraddr);
+		break;
+	case ETHTOOL_SPAUSEPARAMEXT:
+		rc = ethtool_set_pauseparamext(dev, useraddr);
+		break;
 	case ETHTOOL_TEST:
 		rc = ethtool_self_test(dev, useraddr);
 		break;
-- 
1.7.3.4

* Bulk email software + buyer search tool + latest Canton Fair buyers and customs data; 5 million B2B inquiry buyers.
From: 保证有买家回复 ("Guaranteed buyer replies") @ 2011-10-14 20:41 UTC
  To: netacrylichaiying, netcomsales, netcross1960, netdev, neted,
	netherwerks, netinfo, <netlaw_

Bulk email software + buyer search tool + buyers from the 109th Canton Fair, trade show buyers and customs data; 5 million B2B inquiry buyers.

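8 packages in total (the data covers all industries, is sorted by industry, and can be searched by keyword):
1. Buyer database from the 109th Canton Fair (spring 2011), just released: very fresh buyers and fresh data, easy to close deals!
2. Latest global buyer database, 451,660 records in total.
3. Spring and autumn Canton Fair buyer directories for 2008, 2009 and 2010: six sessions (103rd to 108th), 1.206 million records in total.
4. 2010 Promotional Products Association International (PPAI) member list (PPAI Members Directory): very important large buyers.
5. 2010 directory of foreign visitors sourcing in Hong Kong (provided by the Hong Kong Trade Development Council), 72,000 records: extremely important buyers.
6. 608,000 of the latest foreign B2B buyer inquiries.
7. 2009 customs bill-of-lading data, PIERS edition: 10 million records.
8. Bulk email software, including deployment and installation.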

5 million buyers in total, each with an email address.

Buyer replies guaranteed every day.

If interested, contact us quickly on QQ: 1339625218, or reply right away by email: 1339625218@qq.com

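Integrity comes first. If you do not trust me, we can trade through Taobao and you pay only after receiving and verifying the goods; this is the best protection for you.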

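Canton Fair buyers are classified by product category as follows:
1 Office equipment
2 Woven, rattan and iron handicrafts
3 Glass
4 Kitchenware and tableware
5 Vehicles
6 Large machinery and equipment
7 Electronic and electrical products
8 Consumer electronics
9 Textiles
10 Garments
11 Personal care
12 Construction machinery
13 Tools
14 Chemicals
15 Computers and communications
16 Household items
17 Home decoration
18 Furniture
19 Household appliances
20 Building and decoration materials
21 Festival products
22 Gifts and premiums
23 Motorcycles
24 Auto parts
25 Food
26 Ceramics
27 Iron and stone products
28 Toys
29 Sanitary ware
30 Hardware
31 Small machinery
32 Shoes
33 Leisure products
34 Medical
35 Bathroom products
36 Gardening
37 Lighting
38 Clocks, watches and eyeglasses
39 Bicycles
40 Bags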

* [PATCH] staging: hv: move hv_netvsc into drivers/net directory
From: Haiyang Zhang @ 2011-10-14 20:34 UTC
  To: haiyangz, kys, gregkh, linux-kernel, devel, virtualization
  Cc: Mike Sterling, NetDev

hv_netvsc was reviewed on the netdev mailing list on 6/09/2011, and all
recommended changes have been made. We request that it be moved from the
staging area into the drivers/net directory.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Mike Sterling <Mike.Sterling@microsoft.com>
Cc: NetDev <netdev@vger.kernel.org>

---
 drivers/net/Kconfig               |    2 +
 drivers/net/Makefile              |    2 +
 drivers/net/hyperv/Kconfig        |    5 +
 drivers/net/hyperv/Makefile       |    3 +
 drivers/net/hyperv/hyperv_net.h   | 1058 +++++++++++++++++++++++++++++++++++++
 drivers/net/hyperv/netvsc.c       |  944 +++++++++++++++++++++++++++++++++
 drivers/net/hyperv/netvsc_drv.c   |  456 ++++++++++++++++
 drivers/net/hyperv/rndis_filter.c |  855 ++++++++++++++++++++++++++++++
 8 files changed, 3325 insertions(+), 0 deletions(-)
 create mode 100644 drivers/net/hyperv/Kconfig
 create mode 100644 drivers/net/hyperv/Makefile
 create mode 100644 drivers/net/hyperv/hyperv_net.h
 create mode 100644 drivers/net/hyperv/netvsc.c
 create mode 100644 drivers/net/hyperv/netvsc_drv.c
 create mode 100644 drivers/net/hyperv/rndis_filter.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8d0314d..088c330 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3451,4 +3451,6 @@ config VMXNET3
 	  To compile this driver as a module, choose M here: the
 	  module will be called vmxnet3.
 
+source "drivers/net/hyperv/Kconfig"
+
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e1eca2a..647c878 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -306,3 +306,5 @@ obj-$(CONFIG_CAIF) += caif/
 obj-$(CONFIG_OCTEON_MGMT_ETHERNET) += octeon/
 obj-$(CONFIG_PCH_GBE) += pch_gbe/
 obj-$(CONFIG_TILE_NET) += tile/
+
+obj-$(CONFIG_HYPERV_NET) += hyperv/
diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig
new file mode 100644
index 0000000..936968d
--- /dev/null
+++ b/drivers/net/hyperv/Kconfig
@@ -0,0 +1,5 @@
+config HYPERV_NET
+	tristate "Microsoft Hyper-V virtual network driver"
+	depends on HYPERV
+	help
+	  Select this option to enable the Hyper-V virtual network driver.
diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile
new file mode 100644
index 0000000..c8a6682
--- /dev/null
+++ b/drivers/net/hyperv/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_HYPERV_NET) += hv_netvsc.o
+
+hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
new file mode 100644
index 0000000..ac1ec84
--- /dev/null
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -0,0 +1,1058 @@
+/*
+ *
+ * Copyright (c) 2011, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ *   Haiyang Zhang <haiyangz@microsoft.com>
+ *   Hank Janssen  <hjanssen@microsoft.com>
+ *   K. Y. Srinivasan <kys@microsoft.com>
+ *
+ */
+
+#ifndef _HYPERV_NET_H
+#define _HYPERV_NET_H
+
+#include <linux/list.h>
+#include <linux/hyperv.h>
+
+/* Fwd declaration */
+struct hv_netvsc_packet;
+
+/* Represents the xfer page packet, which contains 1 or more netvsc packets */
+struct xferpage_packet {
+	struct list_head list_ent;
+
+	/* # of netvsc packets this xfer packet contains */
+	u32 count;
+};
+
+/* The number of pages which are enough to cover jumbo frame buffer. */
+#define NETVSC_PACKET_MAXPAGE		4
+
+/*
+ * Represent netvsc packet which contains 1 RNDIS and 1 ethernet frame
+ * within the RNDIS
+ */
+struct hv_netvsc_packet {
+	/* Bookkeeping stuff */
+	struct list_head list_ent;
+
+	struct hv_device *device;
+	bool is_data_pkt;
+
+	/*
+	 * Valid only for receives when we break a xfer page packet
+	 * into multiple netvsc packets
+	 */
+	struct xferpage_packet *xfer_page_pkt;
+
+	union {
+		struct {
+			u64 recv_completion_tid;
+			void *recv_completion_ctx;
+			void (*recv_completion)(void *context);
+		} recv;
+		struct {
+			u64 send_completion_tid;
+			void *send_completion_ctx;
+			void (*send_completion)(void *context);
+		} send;
+	} completion;
+
+	/* This points to the memory after page_buf */
+	void *extension;
+
+	u32 total_data_buflen;
+	/* Points to the send/receive buffer where the ethernet frame is */
+	u32 page_buf_cnt;
+	struct hv_page_buffer page_buf[NETVSC_PACKET_MAXPAGE];
+};
+
+struct netvsc_device_info {
+	unsigned char mac_adr[6];
+	bool link_state;	/* 0 - link up, 1 - link down */
+	int  ring_size;
+};
+
+/* Interface */
+int netvsc_device_add(struct hv_device *device, void *additional_info);
+int netvsc_device_remove(struct hv_device *device);
+int netvsc_send(struct hv_device *device,
+		struct hv_netvsc_packet *packet);
+void netvsc_linkstatus_callback(struct hv_device *device_obj,
+				unsigned int status);
+int netvsc_recv_callback(struct hv_device *device_obj,
+			struct hv_netvsc_packet *packet);
+int rndis_filter_open(struct hv_device *dev);
+int rndis_filter_close(struct hv_device *dev);
+int rndis_filter_device_add(struct hv_device *dev,
+			void *additional_info);
+void rndis_filter_device_remove(struct hv_device *dev);
+int rndis_filter_receive(struct hv_device *dev,
+			struct hv_netvsc_packet *pkt);
+
+
+
+int rndis_filter_send(struct hv_device *dev,
+			struct hv_netvsc_packet *pkt);
+
+#define NVSP_INVALID_PROTOCOL_VERSION	((u32)0xFFFFFFFF)
+
+#define NVSP_PROTOCOL_VERSION_1		2
+#define NVSP_MIN_PROTOCOL_VERSION	NVSP_PROTOCOL_VERSION_1
+#define NVSP_MAX_PROTOCOL_VERSION	NVSP_PROTOCOL_VERSION_1
+
+enum {
+	NVSP_MSG_TYPE_NONE = 0,
+
+	/* Init Messages */
+	NVSP_MSG_TYPE_INIT			= 1,
+	NVSP_MSG_TYPE_INIT_COMPLETE		= 2,
+
+	NVSP_VERSION_MSG_START			= 100,
+
+	/* Version 1 Messages */
+	NVSP_MSG1_TYPE_SEND_NDIS_VER		= NVSP_VERSION_MSG_START,
+
+	NVSP_MSG1_TYPE_SEND_RECV_BUF,
+	NVSP_MSG1_TYPE_SEND_RECV_BUF_COMPLETE,
+	NVSP_MSG1_TYPE_REVOKE_RECV_BUF,
+
+	NVSP_MSG1_TYPE_SEND_SEND_BUF,
+	NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE,
+	NVSP_MSG1_TYPE_REVOKE_SEND_BUF,
+
+	NVSP_MSG1_TYPE_SEND_RNDIS_PKT,
+	NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE,
+
+	/*
+	 * This should be set to the number of messages for the version with
+	 * the maximum number of messages.
+	 */
+	NVSP_NUM_MSG_PER_VERSION		= 9,
+};
+
+enum {
+	NVSP_STAT_NONE = 0,
+	NVSP_STAT_SUCCESS,
+	NVSP_STAT_FAIL,
+	NVSP_STAT_PROTOCOL_TOO_NEW,
+	NVSP_STAT_PROTOCOL_TOO_OLD,
+	NVSP_STAT_INVALID_RNDIS_PKT,
+	NVSP_STAT_BUSY,
+	NVSP_STAT_MAX,
+};
+
+struct nvsp_message_header {
+	u32 msg_type;
+};
+
+/* Init Messages */
+
+/*
+ * This message is used by the VSC to initialize the channel after the channel
+ * has been opened. This message should never include anything other than
+ * versioning (i.e. this message will be the same forever).
+ */
+struct nvsp_message_init {
+	u32 min_protocol_ver;
+	u32 max_protocol_ver;
+} __packed;
+
+/*
+ * This message is used by the VSP to complete the initialization of the
+ * channel. This message should never include anything other than versioning
+ * (i.e. this message will be the same forever).
+ */
+struct nvsp_message_init_complete {
+	u32 negotiated_protocol_ver;
+	u32 max_mdl_chain_len;
+	u32 status;
+} __packed;
+
+union nvsp_message_init_uber {
+	struct nvsp_message_init init;
+	struct nvsp_message_init_complete init_complete;
+} __packed;
+
+/* Version 1 Messages */
+
+/*
+ * This message is used by the VSC to send the NDIS version to the VSP. The VSP
+ * can use this information when handling OIDs sent by the VSC.
+ */
+struct nvsp_1_message_send_ndis_version {
+	u32 ndis_major_ver;
+	u32 ndis_minor_ver;
+} __packed;
+
+/*
+ * This message is used by the VSC to send a receive buffer to the VSP. The VSP
+ * can then use the receive buffer to send data to the VSC.
+ */
+struct nvsp_1_message_send_receive_buffer {
+	u32 gpadl_handle;
+	u16 id;
+} __packed;
+
+struct nvsp_1_receive_buffer_section {
+	u32 offset;
+	u32 sub_alloc_size;
+	u32 num_sub_allocs;
+	u32 end_offset;
+} __packed;
+
+/*
+ * This message is used by the VSP to acknowledge a receive buffer sent by the
+ * VSC. This message must be sent by the VSP before the VSP uses the receive
+ * buffer.
+ */
+struct nvsp_1_message_send_receive_buffer_complete {
+	u32 status;
+	u32 num_sections;
+
+	/*
+	 * The receive buffer is split into two parts, a large suballocation
+	 * section and a small suballocation section. These sections are then
+	 * suballocated by a certain size.
+	 */
+
+	/*
+	 * For example, the following break up of the receive buffer has 6
+	 * large suballocations and 10 small suballocations.
+	 */
+
+	/*
+	 * |            Large Section          |  |   Small Section   |
+	 * ------------------------------------------------------------
+	 * |     |     |     |     |     |     |  | | | | | | | | | | |
+	 * |                                      |
+	 *  LargeOffset                            SmallOffset
+	 */
+
+	struct nvsp_1_receive_buffer_section sections[1];
+} __packed;
+
+/*
+ * This message is sent by the VSC to revoke the receive buffer.  After the VSP
+ * completes this transaction, the VSP should never use the receive buffer
+ * again.
+ */
+struct nvsp_1_message_revoke_receive_buffer {
+	u16 id;
+};
+
+/*
+ * This message is used by the VSC to send a send buffer to the VSP. The VSC
+ * can then use the send buffer to send data to the VSP.
+ */
+struct nvsp_1_message_send_send_buffer {
+	u32 gpadl_handle;
+	u16 id;
+} __packed;
+
+/*
+ * This message is used by the VSP to acknowledge a send buffer sent by the
+ * VSC. This message must be sent by the VSP before the VSP uses the sent
+ * buffer.
+ */
+struct nvsp_1_message_send_send_buffer_complete {
+	u32 status;
+
+	/*
+	 * The VSC gets to choose the size of the send buffer and the VSP gets
+	 * to choose the section size of the buffer.  This was done to enable
+	 * dynamic reconfigurations when the cost of GPA-direct buffers
+	 * decreases.
+	 */
+	u32 section_size;
+} __packed;
+
+/*
+ * This message is sent by the VSC to revoke the send buffer.  After the VSP
+ * completes this transaction, the VSP should never use the send buffer again.
+ */
+struct nvsp_1_message_revoke_send_buffer {
+	u16 id;
+};
+
+/*
+ * This message is used by both the VSP and the VSC to send a RNDIS message to
+ * the opposite channel endpoint.
+ */
+struct nvsp_1_message_send_rndis_packet {
+	/*
+	 * This field is specified by RNDIS. They assume there are two different
+	 * channels of communication. However, the Network VSP only has one.
+	 * Therefore, the channel travels with the RNDIS packet.
+	 */
+	u32 channel_type;
+
+	/*
+	 * This field is used to send part or all of the data through a send
+	 * buffer. This value specifies an index into the send buffer. If the
+	 * index is 0xFFFFFFFF, then the send buffer is not being used and all
+	 * of the data was sent through other VMBus mechanisms.
+	 */
+	u32 send_buf_section_index;
+	u32 send_buf_section_size;
+} __packed;
+
+/*
+ * This message is used by both the VSP and the VSC to complete a RNDIS message
+ * to the opposite channel endpoint. At this point, the initiator of this
+ * message cannot use any resources associated with the original RNDIS packet.
+ */
+struct nvsp_1_message_send_rndis_packet_complete {
+	u32 status;
+};
+
+union nvsp_1_message_uber {
+	struct nvsp_1_message_send_ndis_version send_ndis_ver;
+
+	struct nvsp_1_message_send_receive_buffer send_recv_buf;
+	struct nvsp_1_message_send_receive_buffer_complete
+						send_recv_buf_complete;
+	struct nvsp_1_message_revoke_receive_buffer revoke_recv_buf;
+
+	struct nvsp_1_message_send_send_buffer send_send_buf;
+	struct nvsp_1_message_send_send_buffer_complete send_send_buf_complete;
+	struct nvsp_1_message_revoke_send_buffer revoke_send_buf;
+
+	struct nvsp_1_message_send_rndis_packet send_rndis_pkt;
+	struct nvsp_1_message_send_rndis_packet_complete
+						send_rndis_pkt_complete;
+} __packed;
+
+union nvsp_all_messages {
+	union nvsp_message_init_uber init_msg;
+	union nvsp_1_message_uber v1_msg;
+} __packed;
+
+/* ALL Messages */
+struct nvsp_message {
+	struct nvsp_message_header hdr;
+	union nvsp_all_messages msg;
+} __packed;
+
+
+
+
+/* #define NVSC_MIN_PROTOCOL_VERSION		1 */
+/* #define NVSC_MAX_PROTOCOL_VERSION		1 */
+
+#define NETVSC_RECEIVE_BUFFER_SIZE		(1024*1024)	/* 1MB */
+
+#define NETVSC_RECEIVE_BUFFER_ID		0xcafe
+
+#define NETVSC_RECEIVE_SG_COUNT			1
+
+/* Preallocated receive packets */
+#define NETVSC_RECEIVE_PACKETLIST_COUNT		256
+
+#define NETVSC_PACKET_SIZE                      2048
+
+/* Per netvsc channel-specific */
+struct netvsc_device {
+	struct hv_device *dev;
+
+	atomic_t num_outstanding_sends;
+	bool destroy;
+	/*
+	 * List of free preallocated hv_netvsc_packet to represent receive
+	 * packet
+	 */
+	struct list_head recv_pkt_list;
+	spinlock_t recv_pkt_list_lock;
+
+	/* Receive buffer allocated by us but managed by NetVSP */
+	void *recv_buf;
+	u32 recv_buf_size;
+	u32 recv_buf_gpadl_handle;
+	u32 recv_section_cnt;
+	struct nvsp_1_receive_buffer_section *recv_section;
+
+	/* Used for NetVSP initialization protocol */
+	struct completion channel_init_wait;
+	struct nvsp_message channel_init_pkt;
+
+	struct nvsp_message revoke_packet;
+	/* unsigned char HwMacAddr[HW_MACADDR_LEN]; */
+
+	struct net_device *ndev;
+
+	/* Holds rndis device info */
+	void *extension;
+};
+
+
+/*  Status codes */
+
+
+#ifndef STATUS_SUCCESS
+#define STATUS_SUCCESS				(0x00000000L)
+#endif
+
+#ifndef STATUS_UNSUCCESSFUL
+#define STATUS_UNSUCCESSFUL			(0xC0000001L)
+#endif
+
+#ifndef STATUS_PENDING
+#define STATUS_PENDING				(0x00000103L)
+#endif
+
+#ifndef STATUS_INSUFFICIENT_RESOURCES
+#define STATUS_INSUFFICIENT_RESOURCES		(0xC000009AL)
+#endif
+
+#ifndef STATUS_BUFFER_OVERFLOW
+#define STATUS_BUFFER_OVERFLOW			(0x80000005L)
+#endif
+
+#ifndef STATUS_NOT_SUPPORTED
+#define STATUS_NOT_SUPPORTED			(0xC00000BBL)
+#endif
+
+#define RNDIS_STATUS_SUCCESS			(STATUS_SUCCESS)
+#define RNDIS_STATUS_PENDING			(STATUS_PENDING)
+#define RNDIS_STATUS_NOT_RECOGNIZED		(0x00010001L)
+#define RNDIS_STATUS_NOT_COPIED			(0x00010002L)
+#define RNDIS_STATUS_NOT_ACCEPTED		(0x00010003L)
+#define RNDIS_STATUS_CALL_ACTIVE		(0x00010007L)
+
+#define RNDIS_STATUS_ONLINE			(0x40010003L)
+#define RNDIS_STATUS_RESET_START		(0x40010004L)
+#define RNDIS_STATUS_RESET_END			(0x40010005L)
+#define RNDIS_STATUS_RING_STATUS		(0x40010006L)
+#define RNDIS_STATUS_CLOSED			(0x40010007L)
+#define RNDIS_STATUS_WAN_LINE_UP		(0x40010008L)
+#define RNDIS_STATUS_WAN_LINE_DOWN		(0x40010009L)
+#define RNDIS_STATUS_WAN_FRAGMENT		(0x4001000AL)
+#define RNDIS_STATUS_MEDIA_CONNECT		(0x4001000BL)
+#define RNDIS_STATUS_MEDIA_DISCONNECT		(0x4001000CL)
+#define RNDIS_STATUS_HARDWARE_LINE_UP		(0x4001000DL)
+#define RNDIS_STATUS_HARDWARE_LINE_DOWN		(0x4001000EL)
+#define RNDIS_STATUS_INTERFACE_UP		(0x4001000FL)
+#define RNDIS_STATUS_INTERFACE_DOWN		(0x40010010L)
+#define RNDIS_STATUS_MEDIA_BUSY			(0x40010011L)
+#define RNDIS_STATUS_MEDIA_SPECIFIC_INDICATION	(0x40010012L)
+#define RNDIS_STATUS_WW_INDICATION		RNDIS_STATUS_MEDIA_SPECIFIC_INDICATION
+#define RNDIS_STATUS_LINK_SPEED_CHANGE		(0x40010013L)
+
+#define RNDIS_STATUS_NOT_RESETTABLE		(0x80010001L)
+#define RNDIS_STATUS_SOFT_ERRORS		(0x80010003L)
+#define RNDIS_STATUS_HARD_ERRORS		(0x80010004L)
+#define RNDIS_STATUS_BUFFER_OVERFLOW		(STATUS_BUFFER_OVERFLOW)
+
+#define RNDIS_STATUS_FAILURE			(STATUS_UNSUCCESSFUL)
+#define RNDIS_STATUS_RESOURCES			(STATUS_INSUFFICIENT_RESOURCES)
+#define RNDIS_STATUS_CLOSING			(0xC0010002L)
+#define RNDIS_STATUS_BAD_VERSION		(0xC0010004L)
+#define RNDIS_STATUS_BAD_CHARACTERISTICS	(0xC0010005L)
+#define RNDIS_STATUS_ADAPTER_NOT_FOUND		(0xC0010006L)
+#define RNDIS_STATUS_OPEN_FAILED		(0xC0010007L)
+#define RNDIS_STATUS_DEVICE_FAILED		(0xC0010008L)
+#define RNDIS_STATUS_MULTICAST_FULL		(0xC0010009L)
+#define RNDIS_STATUS_MULTICAST_EXISTS		(0xC001000AL)
+#define RNDIS_STATUS_MULTICAST_NOT_FOUND	(0xC001000BL)
+#define RNDIS_STATUS_REQUEST_ABORTED		(0xC001000CL)
+#define RNDIS_STATUS_RESET_IN_PROGRESS		(0xC001000DL)
+#define RNDIS_STATUS_CLOSING_INDICATING		(0xC001000EL)
+#define RNDIS_STATUS_NOT_SUPPORTED		(STATUS_NOT_SUPPORTED)
+#define RNDIS_STATUS_INVALID_PACKET		(0xC001000FL)
+#define RNDIS_STATUS_OPEN_LIST_FULL		(0xC0010010L)
+#define RNDIS_STATUS_ADAPTER_NOT_READY		(0xC0010011L)
+#define RNDIS_STATUS_ADAPTER_NOT_OPEN		(0xC0010012L)
+#define RNDIS_STATUS_NOT_INDICATING		(0xC0010013L)
+#define RNDIS_STATUS_INVALID_LENGTH		(0xC0010014L)
+#define RNDIS_STATUS_INVALID_DATA		(0xC0010015L)
+#define RNDIS_STATUS_BUFFER_TOO_SHORT		(0xC0010016L)
+#define RNDIS_STATUS_INVALID_OID		(0xC0010017L)
+#define RNDIS_STATUS_ADAPTER_REMOVED		(0xC0010018L)
+#define RNDIS_STATUS_UNSUPPORTED_MEDIA		(0xC0010019L)
+#define RNDIS_STATUS_GROUP_ADDRESS_IN_USE	(0xC001001AL)
+#define RNDIS_STATUS_FILE_NOT_FOUND		(0xC001001BL)
+#define RNDIS_STATUS_ERROR_READING_FILE		(0xC001001CL)
+#define RNDIS_STATUS_ALREADY_MAPPED		(0xC001001DL)
+#define RNDIS_STATUS_RESOURCE_CONFLICT		(0xC001001EL)
+#define RNDIS_STATUS_NO_CABLE			(0xC001001FL)
+
+#define RNDIS_STATUS_INVALID_SAP		(0xC0010020L)
+#define RNDIS_STATUS_SAP_IN_USE			(0xC0010021L)
+#define RNDIS_STATUS_INVALID_ADDRESS		(0xC0010022L)
+#define RNDIS_STATUS_VC_NOT_ACTIVATED		(0xC0010023L)
+#define RNDIS_STATUS_DEST_OUT_OF_ORDER		(0xC0010024L)
+#define RNDIS_STATUS_VC_NOT_AVAILABLE		(0xC0010025L)
+#define RNDIS_STATUS_CELLRATE_NOT_AVAILABLE	(0xC0010026L)
+#define RNDIS_STATUS_INCOMPATABLE_QOS		(0xC0010027L)
+#define RNDIS_STATUS_AAL_PARAMS_UNSUPPORTED	(0xC0010028L)
+#define RNDIS_STATUS_NO_ROUTE_TO_DESTINATION	(0xC0010029L)
+
+#define RNDIS_STATUS_TOKEN_RING_OPEN_ERROR	(0xC0011000L)
+
+/* Object Identifiers used by NdisRequest Query/Set Information */
+/* General Objects */
+#define RNDIS_OID_GEN_SUPPORTED_LIST		0x00010101
+#define RNDIS_OID_GEN_HARDWARE_STATUS		0x00010102
+#define RNDIS_OID_GEN_MEDIA_SUPPORTED		0x00010103
+#define RNDIS_OID_GEN_MEDIA_IN_USE		0x00010104
+#define RNDIS_OID_GEN_MAXIMUM_LOOKAHEAD		0x00010105
+#define RNDIS_OID_GEN_MAXIMUM_FRAME_SIZE	0x00010106
+#define RNDIS_OID_GEN_LINK_SPEED		0x00010107
+#define RNDIS_OID_GEN_TRANSMIT_BUFFER_SPACE	0x00010108
+#define RNDIS_OID_GEN_RECEIVE_BUFFER_SPACE	0x00010109
+#define RNDIS_OID_GEN_TRANSMIT_BLOCK_SIZE	0x0001010A
+#define RNDIS_OID_GEN_RECEIVE_BLOCK_SIZE	0x0001010B
+#define RNDIS_OID_GEN_VENDOR_ID			0x0001010C
+#define RNDIS_OID_GEN_VENDOR_DESCRIPTION	0x0001010D
+#define RNDIS_OID_GEN_CURRENT_PACKET_FILTER	0x0001010E
+#define RNDIS_OID_GEN_CURRENT_LOOKAHEAD		0x0001010F
+#define RNDIS_OID_GEN_DRIVER_VERSION		0x00010110
+#define RNDIS_OID_GEN_MAXIMUM_TOTAL_SIZE	0x00010111
+#define RNDIS_OID_GEN_PROTOCOL_OPTIONS		0x00010112
+#define RNDIS_OID_GEN_MAC_OPTIONS		0x00010113
+#define RNDIS_OID_GEN_MEDIA_CONNECT_STATUS	0x00010114
+#define RNDIS_OID_GEN_MAXIMUM_SEND_PACKETS	0x00010115
+#define RNDIS_OID_GEN_VENDOR_DRIVER_VERSION	0x00010116
+#define RNDIS_OID_GEN_NETWORK_LAYER_ADDRESSES	0x00010118
+#define RNDIS_OID_GEN_TRANSPORT_HEADER_OFFSET	0x00010119
+#define RNDIS_OID_GEN_MACHINE_NAME		0x0001021A
+#define RNDIS_OID_GEN_RNDIS_CONFIG_PARAMETER	0x0001021B
+
+#define RNDIS_OID_GEN_XMIT_OK			0x00020101
+#define RNDIS_OID_GEN_RCV_OK			0x00020102
+#define RNDIS_OID_GEN_XMIT_ERROR		0x00020103
+#define RNDIS_OID_GEN_RCV_ERROR			0x00020104
+#define RNDIS_OID_GEN_RCV_NO_BUFFER		0x00020105
+
+#define RNDIS_OID_GEN_DIRECTED_BYTES_XMIT	0x00020201
+#define RNDIS_OID_GEN_DIRECTED_FRAMES_XMIT	0x00020202
+#define RNDIS_OID_GEN_MULTICAST_BYTES_XMIT	0x00020203
+#define RNDIS_OID_GEN_MULTICAST_FRAMES_XMIT	0x00020204
+#define RNDIS_OID_GEN_BROADCAST_BYTES_XMIT	0x00020205
+#define RNDIS_OID_GEN_BROADCAST_FRAMES_XMIT	0x00020206
+#define RNDIS_OID_GEN_DIRECTED_BYTES_RCV	0x00020207
+#define RNDIS_OID_GEN_DIRECTED_FRAMES_RCV	0x00020208
+#define RNDIS_OID_GEN_MULTICAST_BYTES_RCV	0x00020209
+#define RNDIS_OID_GEN_MULTICAST_FRAMES_RCV	0x0002020A
+#define RNDIS_OID_GEN_BROADCAST_BYTES_RCV	0x0002020B
+#define RNDIS_OID_GEN_BROADCAST_FRAMES_RCV	0x0002020C
+
+#define RNDIS_OID_GEN_RCV_CRC_ERROR		0x0002020D
+#define RNDIS_OID_GEN_TRANSMIT_QUEUE_LENGTH	0x0002020E
+
+#define RNDIS_OID_GEN_GET_TIME_CAPS		0x0002020F
+#define RNDIS_OID_GEN_GET_NETCARD_TIME		0x00020210
+
+/* These are connection-oriented general OIDs. */
+/* These replace the above OIDs for connection-oriented media. */
+#define RNDIS_OID_GEN_CO_SUPPORTED_LIST		0x00010101
+#define RNDIS_OID_GEN_CO_HARDWARE_STATUS	0x00010102
+#define RNDIS_OID_GEN_CO_MEDIA_SUPPORTED	0x00010103
+#define RNDIS_OID_GEN_CO_MEDIA_IN_USE		0x00010104
+#define RNDIS_OID_GEN_CO_LINK_SPEED		0x00010105
+#define RNDIS_OID_GEN_CO_VENDOR_ID		0x00010106
+#define RNDIS_OID_GEN_CO_VENDOR_DESCRIPTION	0x00010107
+#define RNDIS_OID_GEN_CO_DRIVER_VERSION		0x00010108
+#define RNDIS_OID_GEN_CO_PROTOCOL_OPTIONS	0x00010109
+#define RNDIS_OID_GEN_CO_MAC_OPTIONS		0x0001010A
+#define RNDIS_OID_GEN_CO_MEDIA_CONNECT_STATUS	0x0001010B
+#define RNDIS_OID_GEN_CO_VENDOR_DRIVER_VERSION	0x0001010C
+#define RNDIS_OID_GEN_CO_MINIMUM_LINK_SPEED	0x0001010D
+
+#define RNDIS_OID_GEN_CO_GET_TIME_CAPS		0x00010201
+#define RNDIS_OID_GEN_CO_GET_NETCARD_TIME	0x00010202
+
+/* These are connection-oriented statistics OIDs. */
+#define RNDIS_OID_GEN_CO_XMIT_PDUS_OK		0x00020101
+#define RNDIS_OID_GEN_CO_RCV_PDUS_OK		0x00020102
+#define RNDIS_OID_GEN_CO_XMIT_PDUS_ERROR	0x00020103
+#define RNDIS_OID_GEN_CO_RCV_PDUS_ERROR		0x00020104
+#define RNDIS_OID_GEN_CO_RCV_PDUS_NO_BUFFER	0x00020105
+
+
+#define RNDIS_OID_GEN_CO_RCV_CRC_ERROR		0x00020201
+#define RNDIS_OID_GEN_CO_TRANSMIT_QUEUE_LENGTH	0x00020202
+#define RNDIS_OID_GEN_CO_BYTES_XMIT		0x00020203
+#define RNDIS_OID_GEN_CO_BYTES_RCV		0x00020204
+#define RNDIS_OID_GEN_CO_BYTES_XMIT_OUTSTANDING	0x00020205
+#define RNDIS_OID_GEN_CO_NETCARD_LOAD		0x00020206
+
+/* These are objects for Connection-oriented media call-managers. */
+#define RNDIS_OID_CO_ADD_PVC			0xFF000001
+#define RNDIS_OID_CO_DELETE_PVC			0xFF000002
+#define RNDIS_OID_CO_GET_CALL_INFORMATION	0xFF000003
+#define RNDIS_OID_CO_ADD_ADDRESS		0xFF000004
+#define RNDIS_OID_CO_DELETE_ADDRESS		0xFF000005
+#define RNDIS_OID_CO_GET_ADDRESSES		0xFF000006
+#define RNDIS_OID_CO_ADDRESS_CHANGE		0xFF000007
+#define RNDIS_OID_CO_SIGNALING_ENABLED		0xFF000008
+#define RNDIS_OID_CO_SIGNALING_DISABLED		0xFF000009
+
+/* 802.3 Objects (Ethernet) */
+#define RNDIS_OID_802_3_PERMANENT_ADDRESS	0x01010101
+#define RNDIS_OID_802_3_CURRENT_ADDRESS		0x01010102
+#define RNDIS_OID_802_3_MULTICAST_LIST		0x01010103
+#define RNDIS_OID_802_3_MAXIMUM_LIST_SIZE	0x01010104
+#define RNDIS_OID_802_3_MAC_OPTIONS		0x01010105
+
+#define NDIS_802_3_MAC_OPTION_PRIORITY		0x00000001
+
+#define RNDIS_OID_802_3_RCV_ERROR_ALIGNMENT	0x01020101
+#define RNDIS_OID_802_3_XMIT_ONE_COLLISION	0x01020102
+#define RNDIS_OID_802_3_XMIT_MORE_COLLISIONS	0x01020103
+
+#define RNDIS_OID_802_3_XMIT_DEFERRED		0x01020201
+#define RNDIS_OID_802_3_XMIT_MAX_COLLISIONS	0x01020202
+#define RNDIS_OID_802_3_RCV_OVERRUN		0x01020203
+#define RNDIS_OID_802_3_XMIT_UNDERRUN		0x01020204
+#define RNDIS_OID_802_3_XMIT_HEARTBEAT_FAILURE	0x01020205
+#define RNDIS_OID_802_3_XMIT_TIMES_CRS_LOST	0x01020206
+#define RNDIS_OID_802_3_XMIT_LATE_COLLISIONS	0x01020207
+
+/* Remote NDIS message types */
+#define REMOTE_NDIS_PACKET_MSG			0x00000001
+#define REMOTE_NDIS_INITIALIZE_MSG		0x00000002
+#define REMOTE_NDIS_HALT_MSG			0x00000003
+#define REMOTE_NDIS_QUERY_MSG			0x00000004
+#define REMOTE_NDIS_SET_MSG			0x00000005
+#define REMOTE_NDIS_RESET_MSG			0x00000006
+#define REMOTE_NDIS_INDICATE_STATUS_MSG		0x00000007
+#define REMOTE_NDIS_KEEPALIVE_MSG		0x00000008
+
+#define REMOTE_CONDIS_MP_CREATE_VC_MSG		0x00008001
+#define REMOTE_CONDIS_MP_DELETE_VC_MSG		0x00008002
+#define REMOTE_CONDIS_MP_ACTIVATE_VC_MSG	0x00008005
+#define REMOTE_CONDIS_MP_DEACTIVATE_VC_MSG	0x00008006
+#define REMOTE_CONDIS_INDICATE_STATUS_MSG	0x00008007
+
+/* Remote NDIS message completion types */
+#define REMOTE_NDIS_INITIALIZE_CMPLT		0x80000002
+#define REMOTE_NDIS_QUERY_CMPLT			0x80000004
+#define REMOTE_NDIS_SET_CMPLT			0x80000005
+#define REMOTE_NDIS_RESET_CMPLT			0x80000006
+#define REMOTE_NDIS_KEEPALIVE_CMPLT		0x80000008
+
+#define REMOTE_CONDIS_MP_CREATE_VC_CMPLT	0x80008001
+#define REMOTE_CONDIS_MP_DELETE_VC_CMPLT	0x80008002
+#define REMOTE_CONDIS_MP_ACTIVATE_VC_CMPLT	0x80008005
+#define REMOTE_CONDIS_MP_DEACTIVATE_VC_CMPLT	0x80008006
+
+/*
+ * Reserved message type for private communication between lower-layer host
+ * driver and remote device, if necessary.
+ */
+#define REMOTE_NDIS_BUS_MSG			0xff000001
+
+/*  Defines for DeviceFlags in struct rndis_initialize_complete */
+#define RNDIS_DF_CONNECTIONLESS			0x00000001
+#define RNDIS_DF_CONNECTION_ORIENTED		0x00000002
+#define RNDIS_DF_RAW_DATA			0x00000004
+
+/*  Remote NDIS medium types. */
+#define RNDIS_MEDIUM_802_3			0x00000000
+#define RNDIS_MEDIUM_802_5			0x00000001
+#define RNDIS_MEDIUM_FDDI				0x00000002
+#define RNDIS_MEDIUM_WAN				0x00000003
+#define RNDIS_MEDIUM_LOCAL_TALK			0x00000004
+#define RNDIS_MEDIUM_ARCNET_RAW			0x00000006
+#define RNDIS_MEDIUM_ARCNET_878_2			0x00000007
+#define RNDIS_MEDIUM_ATM				0x00000008
+#define RNDIS_MEDIUM_WIRELESS_WAN			0x00000009
+#define RNDIS_MEDIUM_IRDA				0x0000000a
+#define RNDIS_MEDIUM_CO_WAN			0x0000000b
+/* Not a real medium, defined as an upper-bound */
+#define RNDIS_MEDIUM_MAX				0x0000000d
+
+
+/* Remote NDIS medium connection states. */
+#define RNDIS_MEDIA_STATE_CONNECTED		0x00000000
+#define RNDIS_MEDIA_STATE_DISCONNECTED		0x00000001
+
+/*  Remote NDIS version numbers */
+#define RNDIS_MAJOR_VERSION			0x00000001
+#define RNDIS_MINOR_VERSION			0x00000000
+
+
+/* NdisInitialize message */
+struct rndis_initialize_request {
+	u32 req_id;
+	u32 major_ver;
+	u32 minor_ver;
+	u32 max_xfer_size;
+};
+
+/* Response to NdisInitialize */
+struct rndis_initialize_complete {
+	u32 req_id;
+	u32 status;
+	u32 major_ver;
+	u32 minor_ver;
+	u32 dev_flags;
+	u32 medium;
+	u32 max_pkt_per_msg;
+	u32 max_xfer_size;
+	u32 pkt_alignment_factor;
+	u32 af_list_offset;
+	u32 af_list_size;
+};
+
+/* Call manager devices only: Information about an address family */
+/* supported by the device is appended to the response to NdisInitialize. */
+struct rndis_co_address_family {
+	u32 address_family;
+	u32 major_ver;
+	u32 minor_ver;
+};
+
+/* NdisHalt message */
+struct rndis_halt_request {
+	u32 req_id;
+};
+
+/* NdisQueryRequest message */
+struct rndis_query_request {
+	u32 req_id;
+	u32 oid;
+	u32 info_buflen;
+	u32 info_buf_offset;
+	u32 dev_vc_handle;
+};
+
+/* Response to NdisQueryRequest */
+struct rndis_query_complete {
+	u32 req_id;
+	u32 status;
+	u32 info_buflen;
+	u32 info_buf_offset;
+};
+
+/* NdisSetRequest message */
+struct rndis_set_request {
+	u32 req_id;
+	u32 oid;
+	u32 info_buflen;
+	u32 info_buf_offset;
+	u32 dev_vc_handle;
+};
+
+/* Response to NdisSetRequest */
+struct rndis_set_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/* NdisReset message */
+struct rndis_reset_request {
+	u32 reserved;
+};
+
+/* Response to NdisReset */
+struct rndis_reset_complete {
+	u32 status;
+	u32 addressing_reset;
+};
+
+/* NdisMIndicateStatus message */
+struct rndis_indicate_status {
+	u32 status;
+	u32 status_buflen;
+	u32 status_buf_offset;
+};
+
+/* Diagnostic information passed as the status buffer in */
+/* struct rndis_indicate_status messages signifying error conditions. */
+struct rndis_diagnostic_info {
+	u32 diag_status;
+	u32 error_offset;
+};
+
+/* NdisKeepAlive message */
+struct rndis_keepalive_request {
+	u32 req_id;
+};
+
+/* Response to NdisKeepAlive */
+struct rndis_keepalive_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/*
+ * Data message. All Offset fields contain byte offsets from the beginning of
+ * struct rndis_packet. All Length fields are in bytes.  VcHandle is set
+ * to 0 for connectionless data, otherwise it contains the VC handle.
+ */
+struct rndis_packet {
+	u32 data_offset;
+	u32 data_len;
+	u32 oob_data_offset;
+	u32 oob_data_len;
+	u32 num_oob_data_elements;
+	u32 per_pkt_info_offset;
+	u32 per_pkt_info_len;
+	u32 vc_handle;
+	u32 reserved;
+};
+
+/* Optional Out of Band data associated with a Data message. */
+struct rndis_oobd {
+	u32 size;
+	u32 type;
+	u32 class_info_offset;
+};
+
+/* Packet extension field contents associated with a Data message. */
+struct rndis_per_packet_info {
+	u32 size;
+	u32 type;
+	u32 per_pkt_info_offset;
+};
+
+/* Format of Information buffer passed in a SetRequest for the OID */
+/* OID_GEN_RNDIS_CONFIG_PARAMETER. */
+struct rndis_config_parameter_info {
+	u32 parameter_name_offset;
+	u32 parameter_name_length;
+	u32 parameter_type;
+	u32 parameter_value_offset;
+	u32 parameter_value_length;
+};
+
+/* Values for ParameterType in struct rndis_config_parameter_info */
+#define RNDIS_CONFIG_PARAM_TYPE_INTEGER     0
+#define RNDIS_CONFIG_PARAM_TYPE_STRING      2
+
+/* CONDIS Miniport messages for connection oriented devices */
+/* that do not implement a call manager. */
+
+/* CoNdisMiniportCreateVc message */
+struct rcondis_mp_create_vc {
+	u32 req_id;
+	u32 ndis_vc_handle;
+};
+
+/* Response to CoNdisMiniportCreateVc */
+struct rcondis_mp_create_vc_complete {
+	u32 req_id;
+	u32 dev_vc_handle;
+	u32 status;
+};
+
+/* CoNdisMiniportDeleteVc message */
+struct rcondis_mp_delete_vc {
+	u32 req_id;
+	u32 dev_vc_handle;
+};
+
+/* Response to CoNdisMiniportDeleteVc */
+struct rcondis_mp_delete_vc_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/* CoNdisMiniportQueryRequest message */
+struct rcondis_mp_query_request {
+	u32 req_id;
+	u32 request_type;
+	u32 oid;
+	u32 dev_vc_handle;
+	u32 info_buflen;
+	u32 info_buf_offset;
+};
+
+/* CoNdisMiniportSetRequest message */
+struct rcondis_mp_set_request {
+	u32 req_id;
+	u32 request_type;
+	u32 oid;
+	u32 dev_vc_handle;
+	u32 info_buflen;
+	u32 info_buf_offset;
+};
+
+/* CoNdisIndicateStatus message */
+struct rcondis_indicate_status {
+	u32 ndis_vc_handle;
+	u32 status;
+	u32 status_buflen;
+	u32 status_buf_offset;
+};
+
+/* CONDIS Call/VC parameters */
+struct rcondis_specific_parameters {
+	u32 parameter_type;
+	u32 parameter_length;
+	u32 parameter_lffset;
+};
+
+struct rcondis_media_parameters {
+	u32 flags;
+	u32 reserved1;
+	u32 reserved2;
+	struct rcondis_specific_parameters media_specific;
+};
+
+struct rndis_flowspec {
+	u32 token_rate;
+	u32 token_bucket_size;
+	u32 peak_bandwidth;
+	u32 latency;
+	u32 delay_variation;
+	u32 service_type;
+	u32 max_sdu_size;
+	u32 minimum_policed_size;
+};
+
+struct rcondis_call_manager_parameters {
+	struct rndis_flowspec transmit;
+	struct rndis_flowspec receive;
+	struct rcondis_specific_parameters call_mgr_specific;
+};
+
+/* CoNdisMiniportActivateVc message */
+struct rcondis_mp_activate_vc_request {
+	u32 req_id;
+	u32 flags;
+	u32 dev_vc_handle;
+	u32 media_params_offset;
+	u32 media_params_length;
+	u32 call_mgr_params_offset;
+	u32 call_mgr_params_length;
+};
+
+/* Response to CoNdisMiniportActivateVc */
+struct rcondis_mp_activate_vc_complete {
+	u32 req_id;
+	u32 status;
+};
+
+/* CoNdisMiniportDeactivateVc message */
+struct rcondis_mp_deactivate_vc_request {
+	u32 req_id;
+	u32 flags;
+	u32 dev_vc_handle;
+};
+
+/* Response to CoNdisMiniportDeactivateVc */
+struct rcondis_mp_deactivate_vc_complete {
+	u32 req_id;
+	u32 status;
+};
+
+
+/* union with all of the RNDIS messages */
+union rndis_message_container {
+	struct rndis_packet pkt;
+	struct rndis_initialize_request init_req;
+	struct rndis_halt_request halt_req;
+	struct rndis_query_request query_req;
+	struct rndis_set_request set_req;
+	struct rndis_reset_request reset_req;
+	struct rndis_keepalive_request keep_alive_req;
+	struct rndis_indicate_status indicate_status;
+	struct rndis_initialize_complete init_complete;
+	struct rndis_query_complete query_complete;
+	struct rndis_set_complete set_complete;
+	struct rndis_reset_complete reset_complete;
+	struct rndis_keepalive_complete keep_alive_complete;
+	struct rcondis_mp_create_vc co_miniport_create_vc;
+	struct rcondis_mp_delete_vc co_miniport_delete_vc;
+	struct rcondis_indicate_status co_indicate_status;
+	struct rcondis_mp_activate_vc_request co_miniport_activate_vc;
+	struct rcondis_mp_deactivate_vc_request co_miniport_deactivate_vc;
+	struct rcondis_mp_create_vc_complete co_miniport_create_vc_complete;
+	struct rcondis_mp_delete_vc_complete co_miniport_delete_vc_complete;
+	struct rcondis_mp_activate_vc_complete co_miniport_activate_vc_complete;
+	struct rcondis_mp_deactivate_vc_complete
+		co_miniport_deactivate_vc_complete;
+};
+
+/* Remote NDIS message format */
+struct rndis_message {
+	u32 ndis_msg_type;
+
+	/* Total length of this message, from the beginning */
+	/* of the struct rndis_message, in bytes. */
+	u32 msg_len;
+
+	/* Actual message */
+	union rndis_message_container msg;
+};
+
+
+struct rndis_filter_packet {
+	void *completion_ctx;
+	void (*completion)(void *context);
+	struct rndis_message msg;
+};
+
+/* Handy macros */
+
+/* Get the size of an RNDIS message. Pass in the message struct type, */
+/* e.g. struct rndis_set_request or struct rndis_packet. */
+#define RNDIS_MESSAGE_SIZE(msg)				\
+	(sizeof(msg) + (sizeof(struct rndis_message) -	\
+	 sizeof(union rndis_message_container)))
+
+/* get pointer to info buffer with message pointer */
+#define MESSAGE_TO_INFO_BUFFER(msg)				\
+	(((unsigned char *)(msg)) + (msg)->info_buf_offset)
+
+/* get pointer to status buffer with message pointer */
+#define MESSAGE_TO_STATUS_BUFFER(msg)			\
+	(((unsigned char *)(msg)) + (msg)->status_buf_offset)
+
+/* get pointer to OOBD buffer with message pointer */
+#define MESSAGE_TO_OOBD_BUFFER(msg)				\
+	(((unsigned char *)(msg)) + (msg)->oob_data_offset)
+
+/* get pointer to data buffer with message pointer */
+#define MESSAGE_TO_DATA_BUFFER(msg)				\
+	(((unsigned char *)(msg)) + (msg)->per_pkt_info_offset)
+
+/* get pointer to contained message from NDIS_MESSAGE pointer */
+#define RNDIS_MESSAGE_PTR_TO_MESSAGE_PTR(rndis_msg)		\
+	((void *) &(rndis_msg)->msg)
+
+/* get the NDIS_MESSAGE pointer itself as a raw void pointer */
+#define RNDIS_MESSAGE_RAW_PTR_TO_MESSAGE_PTR(rndis_msg)	\
+	((void *)(rndis_msg))
+
+
+#define __struct_bcount(x)
+
+
+
+#define RNDIS_HEADER_SIZE	(sizeof(struct rndis_message) - \
+				 sizeof(union rndis_message_container))
+
+#define NDIS_PACKET_TYPE_DIRECTED	0x00000001
+#define NDIS_PACKET_TYPE_MULTICAST	0x00000002
+#define NDIS_PACKET_TYPE_ALL_MULTICAST	0x00000004
+#define NDIS_PACKET_TYPE_BROADCAST	0x00000008
+#define NDIS_PACKET_TYPE_SOURCE_ROUTING	0x00000010
+#define NDIS_PACKET_TYPE_PROMISCUOUS	0x00000020
+#define NDIS_PACKET_TYPE_SMT		0x00000040
+#define NDIS_PACKET_TYPE_ALL_LOCAL	0x00000080
+#define NDIS_PACKET_TYPE_GROUP		0x00000100
+#define NDIS_PACKET_TYPE_ALL_FUNCTIONAL	0x00000200
+#define NDIS_PACKET_TYPE_FUNCTIONAL	0x00000400
+#define NDIS_PACKET_TYPE_MAC_FRAME	0x00000800
+
+
+
+#endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
new file mode 100644
index 0000000..b902579
--- /dev/null
+++ b/drivers/net/hyperv/netvsc.c
@@ -0,0 +1,944 @@
+/*
+ * Copyright (c) 2009, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ *   Haiyang Zhang <haiyangz@microsoft.com>
+ *   Hank Janssen  <hjanssen@microsoft.com>
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/mm.h>
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <linux/slab.h>
+#include <linux/netdevice.h>
+
+#include "hyperv_net.h"
+
+
+static struct netvsc_device *alloc_net_device(struct hv_device *device)
+{
+	struct netvsc_device *net_device;
+	struct net_device *ndev = hv_get_drvdata(device);
+
+	net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
+	if (!net_device)
+		return NULL;
+
+
+	net_device->destroy = false;
+	net_device->dev = device;
+	net_device->ndev = ndev;
+
+	hv_set_drvdata(device, net_device);
+	return net_device;
+}
+
+static struct netvsc_device *get_outbound_net_device(struct hv_device *device)
+{
+	struct netvsc_device *net_device;
+
+	net_device = hv_get_drvdata(device);
+	if (net_device && net_device->destroy)
+		net_device = NULL;
+
+	return net_device;
+}
+
+static struct netvsc_device *get_inbound_net_device(struct hv_device *device)
+{
+	struct netvsc_device *net_device;
+
+	net_device = hv_get_drvdata(device);
+
+	if (!net_device)
+		goto get_in_err;
+
+	if (net_device->destroy &&
+		atomic_read(&net_device->num_outstanding_sends) == 0)
+		net_device = NULL;
+
+get_in_err:
+	return net_device;
+}
+
+
+static int netvsc_destroy_recv_buf(struct netvsc_device *net_device)
+{
+	struct nvsp_message *revoke_packet;
+	int ret = 0;
+	struct net_device *ndev = net_device->ndev;
+
+	/*
+	 * If we got a section count, it means we received a
+	 * SendReceiveBufferComplete msg (i.e. we sent a
+	 * NvspMessage1TypeSendReceiveBuffer msg); therefore, we need
+	 * to send a revoke msg here.
+	 */
+	if (net_device->recv_section_cnt) {
+		/* Send the revoke receive buffer */
+		revoke_packet = &net_device->revoke_packet;
+		memset(revoke_packet, 0, sizeof(struct nvsp_message));
+
+		revoke_packet->hdr.msg_type =
+			NVSP_MSG1_TYPE_REVOKE_RECV_BUF;
+		revoke_packet->msg.v1_msg.
+		revoke_recv_buf.id = NETVSC_RECEIVE_BUFFER_ID;
+
+		ret = vmbus_sendpacket(net_device->dev->channel,
+				       revoke_packet,
+				       sizeof(struct nvsp_message),
+				       (unsigned long)revoke_packet,
+				       VM_PKT_DATA_INBAND, 0);
+		/*
+		 * If this failed, return now and accept a leak
+		 * rather than continue and risk a bugcheck.
+		 */
+		if (ret != 0) {
+			netdev_err(ndev, "unable to send "
+				"revoke receive buffer to netvsp\n");
+			return ret;
+		}
+	}
+
+	/* Teardown the gpadl on the vsp end */
+	if (net_device->recv_buf_gpadl_handle) {
+		ret = vmbus_teardown_gpadl(net_device->dev->channel,
+			   net_device->recv_buf_gpadl_handle);
+
+		/* If this failed, return now and accept a leak rather
+		 * than continue and risk a bugcheck.
+		 */
+		if (ret != 0) {
+			netdev_err(ndev,
+				   "unable to teardown receive buffer's gpadl\n");
+			return ret;
+		}
+		net_device->recv_buf_gpadl_handle = 0;
+	}
+
+	if (net_device->recv_buf) {
+		/* Free up the receive buffer */
+		free_pages((unsigned long)net_device->recv_buf,
+			get_order(net_device->recv_buf_size));
+		net_device->recv_buf = NULL;
+	}
+
+	if (net_device->recv_section) {
+		net_device->recv_section_cnt = 0;
+		kfree(net_device->recv_section);
+		net_device->recv_section = NULL;
+	}
+
+	return ret;
+}
+
+static int netvsc_init_recv_buf(struct hv_device *device)
+{
+	int ret = 0;
+	int t;
+	struct netvsc_device *net_device;
+	struct nvsp_message *init_packet;
+	struct net_device *ndev;
+
+	net_device = get_outbound_net_device(device);
+	if (!net_device)
+		return -ENODEV;
+	ndev = net_device->ndev;
+
+	net_device->recv_buf =
+		(void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO,
+				get_order(net_device->recv_buf_size));
+	if (!net_device->recv_buf) {
+		netdev_err(ndev, "unable to allocate receive "
+			"buffer of size %d\n", net_device->recv_buf_size);
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	/*
+	 * Establish the gpadl handle for this buffer on this
+	 * channel.  Note: This call uses the vmbus connection rather
+	 * than the channel to establish the gpadl handle.
+	 */
+	ret = vmbus_establish_gpadl(device->channel, net_device->recv_buf,
+				    net_device->recv_buf_size,
+				    &net_device->recv_buf_gpadl_handle);
+	if (ret != 0) {
+		netdev_err(ndev,
+			"unable to establish receive buffer's gpadl\n");
+		goto cleanup;
+	}
+
+
+	/* Notify the NetVsp of the gpadl handle */
+	init_packet = &net_device->channel_init_pkt;
+
+	memset(init_packet, 0, sizeof(struct nvsp_message));
+
+	init_packet->hdr.msg_type = NVSP_MSG1_TYPE_SEND_RECV_BUF;
+	init_packet->msg.v1_msg.send_recv_buf.
+		gpadl_handle = net_device->recv_buf_gpadl_handle;
+	init_packet->msg.v1_msg.
+		send_recv_buf.id = NETVSC_RECEIVE_BUFFER_ID;
+
+	/* Send the gpadl notification request */
+	ret = vmbus_sendpacket(device->channel, init_packet,
+			       sizeof(struct nvsp_message),
+			       (unsigned long)init_packet,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+	if (ret != 0) {
+		netdev_err(ndev,
+			"unable to send receive buffer's gpadl to netvsp\n");
+		goto cleanup;
+	}
+
+	t = wait_for_completion_timeout(&net_device->channel_init_wait, 5*HZ);
+	BUG_ON(t == 0);
+
+
+	/* Check the response */
+	if (init_packet->msg.v1_msg.
+	    send_recv_buf_complete.status != NVSP_STAT_SUCCESS) {
+		netdev_err(ndev, "Unable to complete receive buffer "
+			   "initialization with NetVsp - status %d\n",
+			   init_packet->msg.v1_msg.
+			   send_recv_buf_complete.status);
+		ret = -EINVAL;
+		goto cleanup;
+	}
+
+	/* Parse the response */
+
+	net_device->recv_section_cnt = init_packet->msg.
+		v1_msg.send_recv_buf_complete.num_sections;
+
+	net_device->recv_section = kmalloc(net_device->recv_section_cnt
+		* sizeof(struct nvsp_1_receive_buffer_section), GFP_KERNEL);
+	if (net_device->recv_section == NULL) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	memcpy(net_device->recv_section,
+		init_packet->msg.v1_msg.
+	       send_recv_buf_complete.sections,
+		net_device->recv_section_cnt *
+	       sizeof(struct nvsp_1_receive_buffer_section));
+
+	/*
+	 * For 1st release, there should only be 1 section that represents the
+	 * entire receive buffer
+	 */
+	if (net_device->recv_section_cnt != 1 ||
+	    net_device->recv_section->offset != 0) {
+		ret = -EINVAL;
+		goto cleanup;
+	}
+
+	goto exit;
+
+cleanup:
+	netvsc_destroy_recv_buf(net_device);
+
+exit:
+	return ret;
+}
+
+
+static int netvsc_connect_vsp(struct hv_device *device)
+{
+	int ret, t;
+	struct netvsc_device *net_device;
+	struct nvsp_message *init_packet;
+	int ndis_version;
+	struct net_device *ndev;
+
+	net_device = get_outbound_net_device(device);
+	if (!net_device)
+		return -ENODEV;
+	ndev = net_device->ndev;
+
+	init_packet = &net_device->channel_init_pkt;
+
+	memset(init_packet, 0, sizeof(struct nvsp_message));
+	init_packet->hdr.msg_type = NVSP_MSG_TYPE_INIT;
+	init_packet->msg.init_msg.init.min_protocol_ver =
+		NVSP_MIN_PROTOCOL_VERSION;
+	init_packet->msg.init_msg.init.max_protocol_ver =
+		NVSP_MAX_PROTOCOL_VERSION;
+
+	/* Send the init request */
+	ret = vmbus_sendpacket(device->channel, init_packet,
+			       sizeof(struct nvsp_message),
+			       (unsigned long)init_packet,
+			       VM_PKT_DATA_INBAND,
+			       VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+
+	if (ret != 0)
+		goto cleanup;
+
+	t = wait_for_completion_timeout(&net_device->channel_init_wait, 5*HZ);
+
+	if (t == 0) {
+		ret = -ETIMEDOUT;
+		goto cleanup;
+	}
+
+	if (init_packet->msg.init_msg.init_complete.status !=
+	    NVSP_STAT_SUCCESS) {
+		ret = -EINVAL;
+		goto cleanup;
+	}
+
+	if (init_packet->msg.init_msg.init_complete.
+	    negotiated_protocol_ver != NVSP_PROTOCOL_VERSION_1) {
+		ret = -EPROTO;
+		goto cleanup;
+	}
+	/* Send the ndis version */
+	memset(init_packet, 0, sizeof(struct nvsp_message));
+
+	ndis_version = 0x00050000;
+
+	init_packet->hdr.msg_type = NVSP_MSG1_TYPE_SEND_NDIS_VER;
+	init_packet->msg.v1_msg.
+		send_ndis_ver.ndis_major_ver =
+				(ndis_version & 0xFFFF0000) >> 16;
+	init_packet->msg.v1_msg.
+		send_ndis_ver.ndis_minor_ver =
+				ndis_version & 0xFFFF;
+
+	/* Send the NDIS version message */
+	ret = vmbus_sendpacket(device->channel, init_packet,
+				sizeof(struct nvsp_message),
+				(unsigned long)init_packet,
+				VM_PKT_DATA_INBAND, 0);
+	if (ret != 0)
+		goto cleanup;
+
+	/* Post the big receive buffer to NetVSP */
+	ret = netvsc_init_recv_buf(device);
+
+cleanup:
+	return ret;
+}
+
+static void netvsc_disconnect_vsp(struct netvsc_device *net_device)
+{
+	netvsc_destroy_recv_buf(net_device);
+}
+
+/*
+ * netvsc_device_remove - Callback when the root bus device is removed
+ */
+int netvsc_device_remove(struct hv_device *device)
+{
+	struct netvsc_device *net_device;
+	struct hv_netvsc_packet *netvsc_packet, *pos;
+	unsigned long flags;
+
+	net_device = hv_get_drvdata(device);
+	spin_lock_irqsave(&device->channel->inbound_lock, flags);
+	net_device->destroy = true;
+	spin_unlock_irqrestore(&device->channel->inbound_lock, flags);
+
+	/* Wait for all send completions */
+	while (atomic_read(&net_device->num_outstanding_sends)) {
+		dev_info(&device->device,
+			"waiting for %d requests to complete...\n",
+			atomic_read(&net_device->num_outstanding_sends));
+		udelay(100);
+	}
+
+	netvsc_disconnect_vsp(net_device);
+
+	/*
+	 * Since we have already drained, we don't need to busy wait
+	 * as was done in final_release_stor_device().
+	 * Note that we cannot set the ext pointer to NULL until
+	 * we have drained: to drain the outgoing packets, we need to
+	 * allow incoming packets.
+	 */
+
+	spin_lock_irqsave(&device->channel->inbound_lock, flags);
+	hv_set_drvdata(device, NULL);
+	spin_unlock_irqrestore(&device->channel->inbound_lock, flags);
+
+	/*
+	 * At this point, no one should be accessing net_device
+	 * except in here
+	 */
+	dev_notice(&device->device, "net device safe to remove\n");
+
+	/* Now, we can close the channel safely */
+	vmbus_close(device->channel);
+
+	/* Release all resources */
+	list_for_each_entry_safe(netvsc_packet, pos,
+				 &net_device->recv_pkt_list, list_ent) {
+		list_del(&netvsc_packet->list_ent);
+		kfree(netvsc_packet);
+	}
+
+	kfree(net_device);
+	return 0;
+}
+
+static void netvsc_send_completion(struct hv_device *device,
+				   struct vmpacket_descriptor *packet)
+{
+	struct netvsc_device *net_device;
+	struct nvsp_message *nvsp_packet;
+	struct hv_netvsc_packet *nvsc_packet;
+	struct net_device *ndev;
+
+	net_device = get_inbound_net_device(device);
+	if (!net_device)
+		return;
+	ndev = net_device->ndev;
+
+	nvsp_packet = (struct nvsp_message *)((unsigned long)packet +
+			(packet->offset8 << 3));
+
+	if ((nvsp_packet->hdr.msg_type == NVSP_MSG_TYPE_INIT_COMPLETE) ||
+	    (nvsp_packet->hdr.msg_type ==
+	     NVSP_MSG1_TYPE_SEND_RECV_BUF_COMPLETE) ||
+	    (nvsp_packet->hdr.msg_type ==
+	     NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE)) {
+		/* Copy the response back */
+		memcpy(&net_device->channel_init_pkt, nvsp_packet,
+		       sizeof(struct nvsp_message));
+		complete(&net_device->channel_init_wait);
+	} else if (nvsp_packet->hdr.msg_type ==
+		   NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE) {
+		/* Get the send context */
+		nvsc_packet = (struct hv_netvsc_packet *)(unsigned long)
+			packet->trans_id;
+
+		/* Notify the layer above us */
+		nvsc_packet->completion.send.send_completion(
+			nvsc_packet->completion.send.send_completion_ctx);
+
+		atomic_dec(&net_device->num_outstanding_sends);
+	} else {
+		netdev_err(ndev, "Unknown send completion packet type - "
+			   "%d received!!\n", nvsp_packet->hdr.msg_type);
+	}
+
+}
+
+int netvsc_send(struct hv_device *device,
+			struct hv_netvsc_packet *packet)
+{
+	struct netvsc_device *net_device;
+	int ret = 0;
+	struct nvsp_message sendMessage;
+	struct net_device *ndev;
+
+	net_device = get_outbound_net_device(device);
+	if (!net_device)
+		return -ENODEV;
+	ndev = net_device->ndev;
+
+	sendMessage.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
+	if (packet->is_data_pkt) {
+		/* 0 is RMC_DATA; */
+		sendMessage.msg.v1_msg.send_rndis_pkt.channel_type = 0;
+	} else {
+		/* 1 is RMC_CONTROL; */
+		sendMessage.msg.v1_msg.send_rndis_pkt.channel_type = 1;
+	}
+
+	/* Not using send buffer section */
+	sendMessage.msg.v1_msg.send_rndis_pkt.send_buf_section_index =
+		0xFFFFFFFF;
+	sendMessage.msg.v1_msg.send_rndis_pkt.send_buf_section_size = 0;
+
+	if (packet->page_buf_cnt) {
+		ret = vmbus_sendpacket_pagebuffer(device->channel,
+						  packet->page_buf,
+						  packet->page_buf_cnt,
+						  &sendMessage,
+						  sizeof(struct nvsp_message),
+						  (unsigned long)packet);
+	} else {
+		ret = vmbus_sendpacket(device->channel, &sendMessage,
+				sizeof(struct nvsp_message),
+				(unsigned long)packet,
+				VM_PKT_DATA_INBAND,
+				VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+
+	}
+
+	if (ret != 0)
+		netdev_err(ndev, "Unable to send packet %p ret %d\n",
+			   packet, ret);
+	else
+		atomic_inc(&net_device->num_outstanding_sends);
+
+	return ret;
+}
+
+static void netvsc_send_recv_completion(struct hv_device *device,
+					u64 transaction_id)
+{
+	struct nvsp_message recvcompMessage;
+	int retries = 0;
+	int ret;
+	struct net_device *ndev;
+	struct netvsc_device *net_device = hv_get_drvdata(device);
+
+	ndev = net_device->ndev;
+
+	recvcompMessage.hdr.msg_type =
+				NVSP_MSG1_TYPE_SEND_RNDIS_PKT_COMPLETE;
+
+	/* FIXME: Pass in the status */
+	recvcompMessage.msg.v1_msg.send_rndis_pkt_complete.status =
+		NVSP_STAT_SUCCESS;
+
+retry_send_cmplt:
+	/* Send the completion */
+	ret = vmbus_sendpacket(device->channel, &recvcompMessage,
+			       sizeof(struct nvsp_message), transaction_id,
+			       VM_PKT_COMP, 0);
+	if (ret == 0) {
+		/* success */
+		/* no-op */
+	} else if (ret == -EAGAIN) {
+		/* no more room...wait a bit and attempt to retry 3 times */
+		retries++;
+		netdev_err(ndev, "unable to send receive completion pkt"
+			" (tid %llx)...retrying %d\n", transaction_id, retries);
+
+		if (retries < 4) {
+			udelay(100);
+			goto retry_send_cmplt;
+		} else {
+			netdev_err(ndev, "unable to send receive "
+				"completion pkt (tid %llx)...give up retrying\n",
+				transaction_id);
+		}
+	} else {
+		netdev_err(ndev, "unable to send receive "
+			"completion pkt - %llx\n", transaction_id);
+	}
+}
+
+/* Send a receive completion packet to RNDIS device (ie NetVsp) */
+static void netvsc_receive_completion(void *context)
+{
+	struct hv_netvsc_packet *packet = context;
+	struct hv_device *device = (struct hv_device *)packet->device;
+	struct netvsc_device *net_device;
+	u64 transaction_id = 0;
+	bool fsend_receive_comp = false;
+	unsigned long flags;
+	struct net_device *ndev;
+
+	/*
+	 * Even though it seems logical to use get_outbound_net_device() here
+	 * to send out the receive completion, we use get_inbound_net_device()
+	 * since outbound traffic may already have been disabled.
+	 */
+	net_device = get_inbound_net_device(device);
+	if (!net_device)
+		return;
+	ndev = net_device->ndev;
+
+	/* Overloading use of the lock. */
+	spin_lock_irqsave(&net_device->recv_pkt_list_lock, flags);
+
+	packet->xfer_page_pkt->count--;
+
+	/*
+	 * If this is the last netvsc packet of its xfer page packet,
+	 * return the xfer page packet itself to the freelist.
+	 */
+	if (packet->xfer_page_pkt->count == 0) {
+		fsend_receive_comp = true;
+		transaction_id = packet->completion.recv.recv_completion_tid;
+		list_add_tail(&packet->xfer_page_pkt->list_ent,
+			      &net_device->recv_pkt_list);
+
+	}
+
+	/* Put the packet back */
+	list_add_tail(&packet->list_ent, &net_device->recv_pkt_list);
+	spin_unlock_irqrestore(&net_device->recv_pkt_list_lock, flags);
+
+	/* Send a receive completion for the xfer page packet */
+	if (fsend_receive_comp)
+		netvsc_send_recv_completion(device, transaction_id);
+
+}
+
+static void netvsc_receive(struct hv_device *device,
+			    struct vmpacket_descriptor *packet)
+{
+	struct netvsc_device *net_device;
+	struct vmtransfer_page_packet_header *vmxferpage_packet;
+	struct nvsp_message *nvsp_packet;
+	struct hv_netvsc_packet *netvsc_packet = NULL;
+	unsigned long start;
+	unsigned long end, end_virtual;
+	/* struct netvsc_driver *netvscDriver; */
+	struct xferpage_packet *xferpage_packet = NULL;
+	int i, j;
+	int count = 0, bytes_remain = 0;
+	unsigned long flags;
+	struct net_device *ndev;
+
+	LIST_HEAD(listHead);
+
+	net_device = get_inbound_net_device(device);
+	if (!net_device)
+		return;
+	ndev = net_device->ndev;
+
+	/*
+	 * All inbound packets other than send completions should be xfer
+	 * page packets.
+	 */
+	if (packet->type != VM_PKT_DATA_USING_XFER_PAGES) {
+		netdev_err(ndev, "Unknown packet type received - %d\n",
+			   packet->type);
+		return;
+	}
+
+	nvsp_packet = (struct nvsp_message *)((unsigned long)packet +
+			(packet->offset8 << 3));
+
+	/* Make sure this is a valid nvsp packet */
+	if (nvsp_packet->hdr.msg_type !=
+	    NVSP_MSG1_TYPE_SEND_RNDIS_PKT) {
+		netdev_err(ndev, "Unknown nvsp packet type received - "
+			"%d\n", nvsp_packet->hdr.msg_type);
+		return;
+	}
+
+	vmxferpage_packet = (struct vmtransfer_page_packet_header *)packet;
+
+	if (vmxferpage_packet->xfer_pageset_id != NETVSC_RECEIVE_BUFFER_ID) {
+		netdev_err(ndev, "Invalid xfer page set id - "
+			   "expecting %x got %x\n", NETVSC_RECEIVE_BUFFER_ID,
+			   vmxferpage_packet->xfer_pageset_id);
+		return;
+	}
+
+	/*
+	 * Grab free packets (range count + 1) to represent this xfer
+	 * page packet. +1 to represent the xfer page packet itself.
+	 * We grab them here so that we know exactly how many ranges we
+	 * can fulfil.
+	 */
+	spin_lock_irqsave(&net_device->recv_pkt_list_lock, flags);
+	while (!list_empty(&net_device->recv_pkt_list)) {
+		list_move_tail(net_device->recv_pkt_list.next, &listHead);
+		if (++count == vmxferpage_packet->range_cnt + 1)
+			break;
+	}
+	spin_unlock_irqrestore(&net_device->recv_pkt_list_lock, flags);
+
+	/*
+	 * We need at least 2 netvsc pkts (1 to represent the xfer
+	 * page packet and at least 1 for a range) so that we can
+	 * handle at least some of the xfer page packet ranges.
+	 */
+	if (count < 2) {
+		netdev_err(ndev, "Got only %d netvsc pkt...needed "
+			"%d pkts. Dropping this xfer page packet completely!\n",
+			count, vmxferpage_packet->range_cnt + 1);
+
+		/* Return it to the freelist */
+		spin_lock_irqsave(&net_device->recv_pkt_list_lock, flags);
+		for (i = count; i != 0; i--) {
+			list_move_tail(listHead.next,
+				       &net_device->recv_pkt_list);
+		}
+		spin_unlock_irqrestore(&net_device->recv_pkt_list_lock,
+				       flags);
+
+		netvsc_send_recv_completion(device,
+					    vmxferpage_packet->d.trans_id);
+
+		return;
+	}
+
+	/* Remove the 1st packet to represent the xfer page packet itself */
+	xferpage_packet = (struct xferpage_packet *)listHead.next;
+	list_del(&xferpage_packet->list_ent);
+
+	/* This is how much we can satisfy */
+	xferpage_packet->count = count - 1;
+
+	if (xferpage_packet->count != vmxferpage_packet->range_cnt) {
+		netdev_err(ndev, "Needed %d netvsc pkts to satisfy "
+			"this xfer page...got %d\n",
+			vmxferpage_packet->range_cnt, xferpage_packet->count);
+	}
+
+	/* Each range represents 1 RNDIS pkt that contains 1 ethernet frame */
+	for (i = 0; i < (count - 1); i++) {
+		netvsc_packet = (struct hv_netvsc_packet *)listHead.next;
+		list_del(&netvsc_packet->list_ent);
+
+		/* Initialize the netvsc packet */
+		netvsc_packet->xfer_page_pkt = xferpage_packet;
+		netvsc_packet->completion.recv.recv_completion =
+					netvsc_receive_completion;
+		netvsc_packet->completion.recv.recv_completion_ctx =
+					netvsc_packet;
+		netvsc_packet->device = device;
+		/* Save this so that we can send it back */
+		netvsc_packet->completion.recv.recv_completion_tid =
+					vmxferpage_packet->d.trans_id;
+
+		netvsc_packet->total_data_buflen =
+					vmxferpage_packet->ranges[i].byte_count;
+		netvsc_packet->page_buf_cnt = 1;
+
+		netvsc_packet->page_buf[0].len =
+					vmxferpage_packet->ranges[i].byte_count;
+
+		start = virt_to_phys((void *)((unsigned long)net_device->
+		recv_buf + vmxferpage_packet->ranges[i].byte_offset));
+
+		netvsc_packet->page_buf[0].pfn = start >> PAGE_SHIFT;
+		end_virtual = (unsigned long)net_device->recv_buf
+		    + vmxferpage_packet->ranges[i].byte_offset
+		    + vmxferpage_packet->ranges[i].byte_count - 1;
+		end = virt_to_phys((void *)end_virtual);
+
+		/* Calculate the page relative offset */
+		netvsc_packet->page_buf[0].offset =
+			vmxferpage_packet->ranges[i].byte_offset &
+			(PAGE_SIZE - 1);
+		if ((end >> PAGE_SHIFT) != (start >> PAGE_SHIFT)) {
+			/* Handle frame across multiple pages: */
+			netvsc_packet->page_buf[0].len =
+				(netvsc_packet->page_buf[0].pfn <<
+				 PAGE_SHIFT)
+				+ PAGE_SIZE - start;
+			bytes_remain = netvsc_packet->total_data_buflen -
+					netvsc_packet->page_buf[0].len;
+			for (j = 1; j < NETVSC_PACKET_MAXPAGE; j++) {
+				netvsc_packet->page_buf[j].offset = 0;
+				if (bytes_remain <= PAGE_SIZE) {
+					netvsc_packet->page_buf[j].len =
+						bytes_remain;
+					bytes_remain = 0;
+				} else {
+					netvsc_packet->page_buf[j].len =
+						PAGE_SIZE;
+					bytes_remain -= PAGE_SIZE;
+				}
+				netvsc_packet->page_buf[j].pfn =
+				    virt_to_phys((void *)(end_virtual -
+						bytes_remain)) >> PAGE_SHIFT;
+				netvsc_packet->page_buf_cnt++;
+				if (bytes_remain == 0)
+					break;
+			}
+		}
+
+		/* Pass it to the upper layer */
+		rndis_filter_receive(device, netvsc_packet);
+
+		netvsc_receive_completion(netvsc_packet->
+				completion.recv.recv_completion_ctx);
+	}
+
+}
+
+static void netvsc_channel_cb(void *context)
+{
+	int ret;
+	struct hv_device *device = context;
+	struct netvsc_device *net_device;
+	u32 bytes_recvd;
+	u64 request_id;
+	unsigned char *packet;
+	struct vmpacket_descriptor *desc;
+	unsigned char *buffer;
+	int bufferlen = NETVSC_PACKET_SIZE;
+	struct net_device *ndev;
+
+	packet = kzalloc(NETVSC_PACKET_SIZE * sizeof(unsigned char),
+			 GFP_ATOMIC);
+	if (!packet)
+		return;
+	buffer = packet;
+
+	net_device = get_inbound_net_device(device);
+	if (!net_device)
+		goto out;
+	ndev = net_device->ndev;
+
+	do {
+		ret = vmbus_recvpacket_raw(device->channel, buffer, bufferlen,
+					   &bytes_recvd, &request_id);
+		if (ret == 0) {
+			if (bytes_recvd > 0) {
+				desc = (struct vmpacket_descriptor *)buffer;
+				switch (desc->type) {
+				case VM_PKT_COMP:
+					netvsc_send_completion(device, desc);
+					break;
+
+				case VM_PKT_DATA_USING_XFER_PAGES:
+					netvsc_receive(device, desc);
+					break;
+
+				default:
+					netdev_err(ndev,
+						   "unhandled packet type %d, "
+						   "tid %llx len %d\n",
+						   desc->type, request_id,
+						   bytes_recvd);
+					break;
+				}
+
+				/* reset */
+				if (bufferlen > NETVSC_PACKET_SIZE) {
+					kfree(buffer);
+					buffer = packet;
+					bufferlen = NETVSC_PACKET_SIZE;
+				}
+			} else {
+				/* reset */
+				if (bufferlen > NETVSC_PACKET_SIZE) {
+					kfree(buffer);
+					buffer = packet;
+					bufferlen = NETVSC_PACKET_SIZE;
+				}
+
+				break;
+			}
+		} else if (ret == -ENOBUFS) {
+			/* Handle large packet */
+			buffer = kmalloc(bytes_recvd, GFP_ATOMIC);
+			if (buffer == NULL) {
+				buffer = packet; /* retry next time */
+				netdev_err(ndev,
+					   "unable to allocate buffer of size "
+					   "(%d)!!\n", bytes_recvd);
+				break;
+			}
+
+			bufferlen = bytes_recvd;
+		}
+	} while (1);
+
+out:
+	kfree(buffer);
+	return;
+}
+
+/*
+ * netvsc_device_add - Callback when the device belonging to this
+ * driver is added
+ */
+int netvsc_device_add(struct hv_device *device, void *additional_info)
+{
+	int ret = 0;
+	int i;
+	int ring_size =
+	((struct netvsc_device_info *)additional_info)->ring_size;
+	struct netvsc_device *net_device;
+	struct hv_netvsc_packet *packet, *pos;
+	struct net_device *ndev;
+
+	net_device = alloc_net_device(device);
+	if (!net_device) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	/*
+	 * Coming into this function, struct net_device * is
+	 * registered as the driver private data.
+	 * In alloc_net_device(), we register struct netvsc_device *
+	 * as the driver private data and stash away struct net_device *
+	 * in struct netvsc_device *.
+	 */
+	ndev = net_device->ndev;
+
+	/* Initialize the NetVSC channel extension */
+	net_device->recv_buf_size = NETVSC_RECEIVE_BUFFER_SIZE;
+	spin_lock_init(&net_device->recv_pkt_list_lock);
+
+	INIT_LIST_HEAD(&net_device->recv_pkt_list);
+
+	for (i = 0; i < NETVSC_RECEIVE_PACKETLIST_COUNT; i++) {
+		packet = kzalloc(sizeof(struct hv_netvsc_packet) +
+				 (NETVSC_RECEIVE_SG_COUNT *
+				  sizeof(struct hv_page_buffer)), GFP_KERNEL);
+		if (!packet)
+			break;
+
+		list_add_tail(&packet->list_ent,
+			      &net_device->recv_pkt_list);
+	}
+	init_completion(&net_device->channel_init_wait);
+
+	/* Open the channel */
+	ret = vmbus_open(device->channel, ring_size * PAGE_SIZE,
+			 ring_size * PAGE_SIZE, NULL, 0,
+			 netvsc_channel_cb, device);
+
+	if (ret != 0) {
+		netdev_err(ndev, "unable to open channel: %d\n", ret);
+		goto cleanup;
+	}
+
+	/* Channel is opened */
+	pr_info("hv_netvsc channel opened successfully\n");
+
+	/* Connect with the NetVsp */
+	ret = netvsc_connect_vsp(device);
+	if (ret != 0) {
+		netdev_err(ndev,
+			"unable to connect to NetVSP - %d\n", ret);
+		goto close;
+	}
+
+	return ret;
+
+close:
+	/* Now, we can close the channel safely */
+	vmbus_close(device->channel);
+
+cleanup:
+
+	if (net_device) {
+		list_for_each_entry_safe(packet, pos,
+					 &net_device->recv_pkt_list,
+					 list_ent) {
+			list_del(&packet->list_ent);
+			kfree(packet);
+		}
+
+		kfree(net_device);
+	}
+
+	return ret;
+}
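The range-to-page splitting in netvsc_receive() above is easier to follow in isolation. Below is a minimal user-space sketch of it, not driver code. It assumes 4 KiB pages and treats the range as one contiguous address space, which stands in for calling virt_to_phys() on recv_buf. The names split_range, page_entry and SKETCH_* are hypothetical. Like the driver, it emits a first entry that runs to the end of the starting page, then whole pages, then a final partial page. Note that the driver takes the first in-page offset from byte_offset alone. That matches the true page offset only if recv_buf starts on a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for the sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical analogue of one hv_page_buffer entry. */
struct page_entry {
	uint64_t pfn;		/* page frame number */
	uint32_t offset;	/* offset of the data within that page */
	uint32_t len;		/* bytes of data in that page */
};

/*
 * Split the byte range [addr, addr + len) into per-page entries.
 * Returns the number of entries written, or 0 if more than max entries
 * would be needed (an empty range also yields 0).
 */
static size_t split_range(uint64_t addr, uint32_t len,
			  struct page_entry *out, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		/* The first chunk may start mid-page; later chunks start at 0. */
		uint32_t off = (uint32_t)(addr & (SKETCH_PAGE_SIZE - 1));
		uint32_t room = (uint32_t)SKETCH_PAGE_SIZE - off;
		uint32_t chunk = len < room ? len : room;

		if (n == max)
			return 0;
		out[n].pfn = addr >> SKETCH_PAGE_SHIFT;
		out[n].offset = off;
		out[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

Take a 6000-byte frame that starts 100 bytes before a page boundary. It yields three entries of 100, 4096 and 1804 bytes. When the frame needs too many entries, the driver simply stops filling at NETVSC_PACKET_MAXPAGE. This sketch instead reports failure so that a caller could drop the frame.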
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
new file mode 100644
index 0000000..561ba58
--- /dev/null
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -0,0 +1,456 @@
+/*
+ * Copyright (c) 2009, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ *   Haiyang Zhang <haiyangz@microsoft.com>
+ *   Hank Janssen  <hjanssen@microsoft.com>
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/init.h>
+#include <linux/atomic.h>
+#include <linux/module.h>
+#include <linux/highmem.h>
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/delay.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/skbuff.h>
+#include <linux/in.h>
+#include <linux/slab.h>
+#include <net/arp.h>
+#include <net/route.h>
+#include <net/sock.h>
+#include <net/pkt_sched.h>
+
+#include "hyperv_net.h"
+
+struct net_device_context {
+	/* point back to our device context */
+	struct hv_device *device_ctx;
+	atomic_t avail;
+	struct delayed_work dwork;
+};
+
+
+#define PACKET_PAGES_LOWATER  8
+/* Need this many pages to handle worst case fragmented packet */
+#define PACKET_PAGES_HIWATER  (MAX_SKB_FRAGS + 2)
+
+static int ring_size = 128;
+module_param(ring_size, int, S_IRUGO);
+MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
+
+/* no-op so the netdev core doesn't return -EINVAL when modifying the
+ * multicast address list in SIOCADDMULTI. hv is set up to get all multicast
+ * when it calls rndis_filter_open() */
+static void netvsc_set_multicast_list(struct net_device *net)
+{
+}
+
+static int netvsc_open(struct net_device *net)
+{
+	struct net_device_context *net_device_ctx = netdev_priv(net);
+	struct hv_device *device_obj = net_device_ctx->device_ctx;
+	int ret = 0;
+
+	/* Open up the device */
+	ret = rndis_filter_open(device_obj);
+	if (ret != 0) {
+		netdev_err(net, "unable to open device (ret %d).\n", ret);
+		return ret;
+	}
+
+	netif_start_queue(net);
+
+	return ret;
+}
+
+static int netvsc_close(struct net_device *net)
+{
+	struct net_device_context *net_device_ctx = netdev_priv(net);
+	struct hv_device *device_obj = net_device_ctx->device_ctx;
+	int ret;
+
+	netif_stop_queue(net);
+
+	ret = rndis_filter_close(device_obj);
+	if (ret != 0)
+		netdev_err(net, "unable to close device (ret %d).\n", ret);
+
+	return ret;
+}
+
+static void netvsc_xmit_completion(void *context)
+{
+	struct hv_netvsc_packet *packet = (struct hv_netvsc_packet *)context;
+	struct sk_buff *skb = (struct sk_buff *)
+		(unsigned long)packet->completion.send.send_completion_tid;
+
+	kfree(packet);
+
+	if (skb) {
+		struct net_device *net = skb->dev;
+		struct net_device_context *net_device_ctx = netdev_priv(net);
+		unsigned int num_pages = skb_shinfo(skb)->nr_frags + 2;
+
+		dev_kfree_skb_any(skb);
+
+		atomic_add(num_pages, &net_device_ctx->avail);
+		if (atomic_read(&net_device_ctx->avail) >=
+				PACKET_PAGES_HIWATER)
+			netif_wake_queue(net);
+	}
+}
+
+static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
+{
+	struct net_device_context *net_device_ctx = netdev_priv(net);
+	struct hv_netvsc_packet *packet;
+	int ret;
+	unsigned int i, num_pages;
+
+	/* Add 1 for skb->data and additional one for RNDIS */
+	num_pages = skb_shinfo(skb)->nr_frags + 1 + 1;
+	if (num_pages > atomic_read(&net_device_ctx->avail))
+		return NETDEV_TX_BUSY;
+
+	/* Allocate a netvsc packet based on # of frags. */
+	packet = kzalloc(sizeof(struct hv_netvsc_packet) +
+			 (num_pages * sizeof(struct hv_page_buffer)) +
+			 sizeof(struct rndis_filter_packet), GFP_ATOMIC);
+	if (!packet) {
+		/* out of memory, drop packet */
+		netdev_err(net, "unable to allocate hv_netvsc_packet\n");
+
+		dev_kfree_skb(skb);
+		net->stats.tx_dropped++;
+		return NETDEV_TX_OK;
+	}
+
+	packet->extension = (void *)(unsigned long)packet +
+				sizeof(struct hv_netvsc_packet) +
+				    (num_pages * sizeof(struct hv_page_buffer));
+
+	/* One page buffer each for the RNDIS header, skb->data and frags */
+	packet->page_buf_cnt = num_pages;
+
+	/* Initialize it from the skb */
+	packet->total_data_buflen	= skb->len;
+
+	/* Fill in the page buffers, starting after the RNDIS buffer slot. */
+	packet->page_buf[1].pfn = virt_to_phys(skb->data) >> PAGE_SHIFT;
+	packet->page_buf[1].offset
+		= (unsigned long)skb->data & (PAGE_SIZE - 1);
+	packet->page_buf[1].len = skb_headlen(skb);
+
+	/* Additional fragments are after SKB data */
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+
+		packet->page_buf[i+2].pfn = page_to_pfn(f->page);
+		packet->page_buf[i+2].offset = f->page_offset;
+		packet->page_buf[i+2].len = f->size;
+	}
+
+	/* Set the completion routine */
+	packet->completion.send.send_completion = netvsc_xmit_completion;
+	packet->completion.send.send_completion_ctx = packet;
+	packet->completion.send.send_completion_tid = (unsigned long)skb;
+
+	ret = rndis_filter_send(net_device_ctx->device_ctx,
+				  packet);
+	if (ret == 0) {
+		net->stats.tx_bytes += skb->len;
+		net->stats.tx_packets++;
+
+		atomic_sub(num_pages, &net_device_ctx->avail);
+		if (atomic_read(&net_device_ctx->avail) < PACKET_PAGES_LOWATER)
+			netif_stop_queue(net);
+	} else {
+		/* we are shutting down or bus overloaded, just drop packet */
+		net->stats.tx_dropped++;
+		kfree(packet);
+		dev_kfree_skb_any(skb);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+/*
+ * netvsc_linkstatus_callback - Link up/down notification
+ */
+void netvsc_linkstatus_callback(struct hv_device *device_obj,
+				       unsigned int status)
+{
+	struct net_device *net;
+	struct net_device_context *ndev_ctx;
+	struct netvsc_device *net_device;
+
+	net_device = hv_get_drvdata(device_obj);
+	net = net_device->ndev;
+
+	if (!net) {
+		netdev_err(net, "got link status but net device "
+				"not initialized yet\n");
+		return;
+	}
+
+	if (status == 1) {
+		netif_carrier_on(net);
+		netif_wake_queue(net);
+		ndev_ctx = netdev_priv(net);
+		schedule_delayed_work(&ndev_ctx->dwork, 0);
+		schedule_delayed_work(&ndev_ctx->dwork, msecs_to_jiffies(20));
+	} else {
+		netif_carrier_off(net);
+		netif_stop_queue(net);
+	}
+}
+
+/*
+ * netvsc_recv_callback -  Callback when we receive a packet from the
+ * "wire" on the specified device.
+ */
+int netvsc_recv_callback(struct hv_device *device_obj,
+				struct hv_netvsc_packet *packet)
+{
+	struct net_device *net;
+	struct sk_buff *skb;
+	void *data;
+	int i;
+	unsigned long flags;
+	struct netvsc_device *net_device;
+
+	net_device = hv_get_drvdata(device_obj);
+	net = net_device->ndev;
+
+	if (!net) {
+		netdev_err(net, "got receive callback but net device"
+			" not initialized yet\n");
+		return 0;
+	}
+
+	/* Allocate a skb - TODO direct I/O to pages? */
+	skb = netdev_alloc_skb_ip_align(net, packet->total_data_buflen);
+	if (unlikely(!skb)) {
+		++net->stats.rx_dropped;
+		return 0;
+	}
+
+	/* for kmap_atomic */
+	local_irq_save(flags);
+
+	/*
+	 * Copy to skb. This copy is needed here since the memory pointed to
+	 * by hv_netvsc_packet cannot be deallocated
+	 */
+	for (i = 0; i < packet->page_buf_cnt; i++) {
+		data = kmap_atomic(pfn_to_page(packet->page_buf[i].pfn),
+					       KM_IRQ1);
+		data = (void *)(unsigned long)data +
+				packet->page_buf[i].offset;
+
+		memcpy(skb_put(skb, packet->page_buf[i].len), data,
+		       packet->page_buf[i].len);
+
+		kunmap_atomic((void *)((unsigned long)data -
+				       packet->page_buf[i].offset), KM_IRQ1);
+	}
+
+	local_irq_restore(flags);
+
+	skb->protocol = eth_type_trans(skb, net);
+	skb->ip_summed = CHECKSUM_NONE;
+
+	net->stats.rx_packets++;
+	net->stats.rx_bytes += skb->len;
+
+	/*
+	 * Pass the skb back up. Network stack will deallocate the skb when it
+	 * is done.
+	 * TODO - use NAPI?
+	 */
+	netif_rx(skb);
+
+	return 0;
+}
+
+static void netvsc_get_drvinfo(struct net_device *net,
+			       struct ethtool_drvinfo *info)
+{
+	strcpy(info->driver, "hv_netvsc");
+	strcpy(info->version, HV_DRV_VERSION);
+	strcpy(info->fw_version, "N/A");
+}
+
+static const struct ethtool_ops ethtool_ops = {
+	.get_drvinfo	= netvsc_get_drvinfo,
+	.get_link	= ethtool_op_get_link,
+};
+
+static const struct net_device_ops device_ops = {
+	.ndo_open =			netvsc_open,
+	.ndo_stop =			netvsc_close,
+	.ndo_start_xmit =		netvsc_start_xmit,
+	.ndo_set_multicast_list =	netvsc_set_multicast_list,
+	.ndo_change_mtu =		eth_change_mtu,
+	.ndo_validate_addr =		eth_validate_addr,
+	.ndo_set_mac_address =		eth_mac_addr,
+};
+
+/*
+ * Send GARP packet to network peers after migrations.
+ * After Quick Migration, the network is not immediately operational in the
+ * current context when receiving RNDIS_STATUS_MEDIA_CONNECT event. So, add
+ * another netif_notify_peers() into a delayed work; otherwise the GARP packet
+ * is not sent after quick migration, causing a network disconnection.
+ */
+static void netvsc_send_garp(struct work_struct *w)
+{
+	struct net_device_context *ndev_ctx;
+	struct net_device *net;
+	struct netvsc_device *net_device;
+
+	ndev_ctx = container_of(w, struct net_device_context, dwork.work);
+	net_device = hv_get_drvdata(ndev_ctx->device_ctx);
+	net = net_device->ndev;
+	netif_notify_peers(net);
+}
+
+
+static int netvsc_probe(struct hv_device *dev,
+			const struct hv_vmbus_device_id *dev_id)
+{
+	struct net_device *net = NULL;
+	struct net_device_context *net_device_ctx;
+	struct netvsc_device_info device_info;
+	int ret;
+
+	net = alloc_etherdev(sizeof(struct net_device_context));
+	if (!net)
+		return -ENOMEM;
+
+	/* Set initial state */
+	netif_carrier_off(net);
+
+	net_device_ctx = netdev_priv(net);
+	net_device_ctx->device_ctx = dev;
+	atomic_set(&net_device_ctx->avail, ring_size);
+	hv_set_drvdata(dev, net);
+	INIT_DELAYED_WORK(&net_device_ctx->dwork, netvsc_send_garp);
+
+	net->netdev_ops = &device_ops;
+
+	/* TODO: Add GSO and Checksum offload */
+	net->hw_features = NETIF_F_SG;
+	net->features = NETIF_F_SG;
+
+	SET_ETHTOOL_OPS(net, &ethtool_ops);
+	SET_NETDEV_DEV(net, &dev->device);
+
+	ret = register_netdev(net);
+	if (ret != 0) {
+		pr_err("Unable to register netdev.\n");
+		free_netdev(net);
+		goto out;
+	}
+
+	/* Notify the netvsc driver of the new device */
+	device_info.ring_size = ring_size;
+	ret = rndis_filter_device_add(dev, &device_info);
+	if (ret != 0) {
+		netdev_err(net, "unable to add netvsc device (ret %d)\n", ret);
+		unregister_netdev(net);
+		free_netdev(net);
+		hv_set_drvdata(dev, NULL);
+		return ret;
+	}
+	memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
+
+	netif_carrier_on(net);
+
+out:
+	return ret;
+}
+
+static int netvsc_remove(struct hv_device *dev)
+{
+	struct net_device *net;
+	struct net_device_context *ndev_ctx;
+	struct netvsc_device *net_device;
+
+	net_device = hv_get_drvdata(dev);
+	net = net_device->ndev;
+
+	if (net == NULL) {
+		dev_err(&dev->device, "No net device to remove\n");
+		return 0;
+	}
+
+	ndev_ctx = netdev_priv(net);
+	cancel_delayed_work_sync(&ndev_ctx->dwork);
+
+	/* Stop outbound asap */
+	netif_stop_queue(net);
+
+	unregister_netdev(net);
+
+	/*
+	 * Call to the vsc driver to let it know that the device is being
+	 * removed
+	 */
+	rndis_filter_device_remove(dev);
+
+	free_netdev(net);
+	return 0;
+}
+
+static const struct hv_vmbus_device_id id_table[] = {
+	/* Network guid */
+	{ VMBUS_DEVICE(0x63, 0x51, 0x61, 0xF8, 0x3E, 0xDF, 0xc5, 0x46,
+		       0x91, 0x3F, 0xF2, 0xD2, 0xF9, 0x65, 0xED, 0x0E) },
+	{ },
+};
+
+MODULE_DEVICE_TABLE(vmbus, id_table);
+
+/* The one and only one */
+static struct  hv_driver netvsc_drv = {
+	.name = "netvsc",
+	.id_table = id_table,
+	.probe = netvsc_probe,
+	.remove = netvsc_remove,
+};
+
+static void __exit netvsc_drv_exit(void)
+{
+	vmbus_driver_unregister(&netvsc_drv);
+}
+
+static int __init netvsc_drv_init(void)
+{
+	return vmbus_driver_register(&netvsc_drv);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_VERSION(HV_DRV_VERSION);
+MODULE_DESCRIPTION("Microsoft Hyper-V network driver");
+
+module_init(netvsc_drv_init);
+module_exit(netvsc_drv_exit);
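netvsc_start_xmit() and netvsc_xmit_completion() above implement send-side flow control with two watermarks. Each skb charges nr_frags + 2 page-buffer slots against avail. The queue is stopped when avail drops below PACKET_PAGES_LOWATER and is woken only once avail climbs back to PACKET_PAGES_HIWATER. The single-threaded sketch below models that hysteresis. It illustrates the scheme under stated assumptions and is not the driver's code:

- plain ints replace atomic_t;
- a bool replaces netif_stop_queue()/netif_wake_queue();
- MAX_SKB_FRAGS is assumed to be 18 (65536/PAGE_SIZE + 2 with 4 KiB pages), which gives a high watermark of 20.

All SKETCH_, budget_ and XMIT_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's watermarks. */
#define SKETCH_LOWATER 8		/* PACKET_PAGES_LOWATER */
#define SKETCH_HIWATER (18 + 2)	/* MAX_SKB_FRAGS + 2, assuming 18 frags */

struct tx_budget {
	int avail;	/* free page-buffer slots (atomic_t in the driver) */
	bool stopped;	/* models netif_stop_queue()/netif_wake_queue() */
};

enum xmit_result { XMIT_OK, XMIT_BUSY };

/* Charge one skb: skb->data + RNDIS header + one slot per fragment. */
static enum xmit_result budget_xmit(struct tx_budget *b, int nr_frags)
{
	int need = nr_frags + 2;

	if (need > b->avail)
		return XMIT_BUSY;	/* nothing charged; caller retries */
	b->avail -= need;
	if (b->avail < SKETCH_LOWATER)
		b->stopped = true;
	return XMIT_OK;
}

/* Credit the slots back when the host completes the send. */
static void budget_complete(struct tx_budget *b, int nr_frags)
{
	b->avail += nr_frags + 2;
	if (b->avail >= SKETCH_HIWATER)
		b->stopped = false;
}
```

The gap between the two thresholds is the point of the design. A queue stopped at 6 free slots stays stopped until completions return at least 14 more. The stack is therefore not woken and re-stopped on every single completion.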
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
new file mode 100644
index 0000000..bafccb3
--- /dev/null
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -0,0 +1,855 @@
+/*
+ * Copyright (c) 2009, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Authors:
+ *   Haiyang Zhang <haiyangz@microsoft.com>
+ *   Hank Janssen  <hjanssen@microsoft.com>
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/highmem.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/if_ether.h>
+#include <linux/netdevice.h>
+
+#include "hyperv_net.h"
+
+
+enum rndis_device_state {
+	RNDIS_DEV_UNINITIALIZED = 0,
+	RNDIS_DEV_INITIALIZING,
+	RNDIS_DEV_INITIALIZED,
+	RNDIS_DEV_DATAINITIALIZED,
+};
+
+struct rndis_device {
+	struct netvsc_device *net_dev;
+
+	enum rndis_device_state state;
+	bool link_state;
+	atomic_t new_req_id;
+
+	spinlock_t request_lock;
+	struct list_head req_list;
+
+	unsigned char hw_mac_adr[ETH_ALEN];
+};
+
+struct rndis_request {
+	struct list_head list_ent;
+	struct completion  wait_event;
+
+	/*
+	 * FIXME: We assumed a fixed size response here. If we do ever need to
+	 * handle a bigger response, we can either define a max response
+	 * message or add a response buffer variable above this field
+	 */
+	struct rndis_message response_msg;
+
+	/* Simplify allocation by having a netvsc packet inline */
+	struct hv_netvsc_packet	pkt;
+	struct hv_page_buffer buf;
+	/* FIXME: We assumed a fixed size request here. */
+	struct rndis_message request_msg;
+};
+
+static void rndis_filter_send_completion(void *ctx);
+
+static void rndis_filter_send_request_completion(void *ctx);
+
+
+
+static struct rndis_device *get_rndis_device(void)
+{
+	struct rndis_device *device;
+
+	device = kzalloc(sizeof(struct rndis_device), GFP_KERNEL);
+	if (!device)
+		return NULL;
+
+	spin_lock_init(&device->request_lock);
+
+	INIT_LIST_HEAD(&device->req_list);
+
+	device->state = RNDIS_DEV_UNINITIALIZED;
+
+	return device;
+}
+
+static struct rndis_request *get_rndis_request(struct rndis_device *dev,
+					     u32 msg_type,
+					     u32 msg_len)
+{
+	struct rndis_request *request;
+	struct rndis_message *rndis_msg;
+	struct rndis_set_request *set;
+	unsigned long flags;
+
+	request = kzalloc(sizeof(struct rndis_request), GFP_KERNEL);
+	if (!request)
+		return NULL;
+
+	init_completion(&request->wait_event);
+
+	rndis_msg = &request->request_msg;
+	rndis_msg->ndis_msg_type = msg_type;
+	rndis_msg->msg_len = msg_len;
+
+	/*
+	 * Set the request id. This field is always after the rndis header for
+	 * request/response packet types, so we just use the SetRequest as a
+	 * template
+	 */
+	set = &rndis_msg->msg.set_req;
+	set->req_id = atomic_inc_return(&dev->new_req_id);
+
+	/* Add to the request list */
+	spin_lock_irqsave(&dev->request_lock, flags);
+	list_add_tail(&request->list_ent, &dev->req_list);
+	spin_unlock_irqrestore(&dev->request_lock, flags);
+
+	return request;
+}
+
+static void put_rndis_request(struct rndis_device *dev,
+			    struct rndis_request *req)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->request_lock, flags);
+	list_del(&req->list_ent);
+	spin_unlock_irqrestore(&dev->request_lock, flags);
+
+	kfree(req);
+}
+
+static void dump_rndis_message(struct hv_device *hv_dev,
+			struct rndis_message *rndis_msg)
+{
+	struct net_device *netdev;
+	struct netvsc_device *net_device;
+
+	net_device = hv_get_drvdata(hv_dev);
+	netdev = net_device->ndev;
+
+	switch (rndis_msg->ndis_msg_type) {
+	case REMOTE_NDIS_PACKET_MSG:
+		netdev_dbg(netdev, "REMOTE_NDIS_PACKET_MSG (len %u, "
+			   "data offset %u data len %u, # oob %u, "
+			   "oob offset %u, oob len %u, pkt offset %u, "
+			   "pkt len %u\n",
+			   rndis_msg->msg_len,
+			   rndis_msg->msg.pkt.data_offset,
+			   rndis_msg->msg.pkt.data_len,
+			   rndis_msg->msg.pkt.num_oob_data_elements,
+			   rndis_msg->msg.pkt.oob_data_offset,
+			   rndis_msg->msg.pkt.oob_data_len,
+			   rndis_msg->msg.pkt.per_pkt_info_offset,
+			   rndis_msg->msg.pkt.per_pkt_info_len);
+		break;
+
+	case REMOTE_NDIS_INITIALIZE_CMPLT:
+		netdev_dbg(netdev, "REMOTE_NDIS_INITIALIZE_CMPLT "
+			"(len %u, id 0x%x, status 0x%x, major %d, minor %d, "
+			"device flags %d, max xfer size 0x%x, max pkts %u, "
+			"pkt aligned %u)\n",
+			rndis_msg->msg_len,
+			rndis_msg->msg.init_complete.req_id,
+			rndis_msg->msg.init_complete.status,
+			rndis_msg->msg.init_complete.major_ver,
+			rndis_msg->msg.init_complete.minor_ver,
+			rndis_msg->msg.init_complete.dev_flags,
+			rndis_msg->msg.init_complete.max_xfer_size,
+			rndis_msg->msg.init_complete.
+			   max_pkt_per_msg,
+			rndis_msg->msg.init_complete.
+			   pkt_alignment_factor);
+		break;
+
+	case REMOTE_NDIS_QUERY_CMPLT:
+		netdev_dbg(netdev, "REMOTE_NDIS_QUERY_CMPLT "
+			"(len %u, id 0x%x, status 0x%x, buf len %u, "
+			"buf offset %u)\n",
+			rndis_msg->msg_len,
+			rndis_msg->msg.query_complete.req_id,
+			rndis_msg->msg.query_complete.status,
+			rndis_msg->msg.query_complete.
+			   info_buflen,
+			rndis_msg->msg.query_complete.
+			   info_buf_offset);
+		break;
+
+	case REMOTE_NDIS_SET_CMPLT:
+		netdev_dbg(netdev,
+			"REMOTE_NDIS_SET_CMPLT (len %u, id 0x%x, status 0x%x)\n",
+			rndis_msg->msg_len,
+			rndis_msg->msg.set_complete.req_id,
+			rndis_msg->msg.set_complete.status);
+		break;
+
+	case REMOTE_NDIS_INDICATE_STATUS_MSG:
+		netdev_dbg(netdev, "REMOTE_NDIS_INDICATE_STATUS_MSG "
+			"(len %u, status 0x%x, buf len %u, buf offset %u)\n",
+			rndis_msg->msg_len,
+			rndis_msg->msg.indicate_status.status,
+			rndis_msg->msg.indicate_status.status_buflen,
+			rndis_msg->msg.indicate_status.status_buf_offset);
+		break;
+
+	default:
+		netdev_dbg(netdev, "0x%x (len %u)\n",
+			rndis_msg->ndis_msg_type,
+			rndis_msg->msg_len);
+		break;
+	}
+}
+
+static int rndis_filter_send_request(struct rndis_device *dev,
+				  struct rndis_request *req)
+{
+	int ret;
+	struct hv_netvsc_packet *packet;
+
+	/* Setup the packet to send it */
+	packet = &req->pkt;
+
+	packet->is_data_pkt = false;
+	packet->total_data_buflen = req->request_msg.msg_len;
+	packet->page_buf_cnt = 1;
+
+	packet->page_buf[0].pfn = virt_to_phys(&req->request_msg) >>
+					PAGE_SHIFT;
+	packet->page_buf[0].len = req->request_msg.msg_len;
+	packet->page_buf[0].offset =
+		(unsigned long)&req->request_msg & (PAGE_SIZE - 1);
+
+	packet->completion.send.send_completion_ctx = req;
+	packet->completion.send.send_completion =
+		rndis_filter_send_request_completion;
+	packet->completion.send.send_completion_tid = (unsigned long)dev;
+
+	ret = netvsc_send(dev->net_dev->dev, packet);
+	return ret;
+}
+
+static void rndis_filter_receive_response(struct rndis_device *dev,
+				       struct rndis_message *resp)
+{
+	struct rndis_request *request = NULL;
+	bool found = false;
+	unsigned long flags;
+	struct net_device *ndev;
+
+	ndev = dev->net_dev->ndev;
+
+	spin_lock_irqsave(&dev->request_lock, flags);
+	list_for_each_entry(request, &dev->req_list, list_ent) {
+		/*
+		 * All request/response messages contain RequestId as the 1st
+		 * field
+		 */
+		if (request->request_msg.msg.init_req.req_id
+		    == resp->msg.init_complete.req_id) {
+			found = true;
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&dev->request_lock, flags);
+
+	if (found) {
+		if (resp->msg_len <= sizeof(struct rndis_message)) {
+			memcpy(&request->response_msg, resp,
+			       resp->msg_len);
+		} else {
+			netdev_err(ndev,
+				"rndis response buffer overflow "
+				"detected (size %u max %zu)\n",
+				resp->msg_len,
+				sizeof(struct rndis_message));
+
+			if (resp->ndis_msg_type ==
+			    REMOTE_NDIS_RESET_CMPLT) {
+				/* does not have a request id field */
+				request->response_msg.msg.reset_complete.
+					status = STATUS_BUFFER_OVERFLOW;
+			} else {
+				request->response_msg.msg.
+				init_complete.status =
+					STATUS_BUFFER_OVERFLOW;
+			}
+		}
+
+		complete(&request->wait_event);
+	} else {
+		netdev_err(ndev,
+			"no rndis request found for this response "
+			"(id 0x%x res type 0x%x)\n",
+			resp->msg.init_complete.req_id,
+			resp->ndis_msg_type);
+	}
+}
+
+static void rndis_filter_receive_indicate_status(struct rndis_device *dev,
+					     struct rndis_message *resp)
+{
+	struct rndis_indicate_status *indicate =
+			&resp->msg.indicate_status;
+
+	if (indicate->status == RNDIS_STATUS_MEDIA_CONNECT) {
+		netvsc_linkstatus_callback(
+			dev->net_dev->dev, 1);
+	} else if (indicate->status == RNDIS_STATUS_MEDIA_DISCONNECT) {
+		netvsc_linkstatus_callback(
+			dev->net_dev->dev, 0);
+	} else {
+		/*
+		 * TODO:
+		 */
+	}
+}
+
+static void rndis_filter_receive_data(struct rndis_device *dev,
+				   struct rndis_message *msg,
+				   struct hv_netvsc_packet *pkt)
+{
+	struct rndis_packet *rndis_pkt;
+	u32 data_offset;
+	int i;
+
+	rndis_pkt = &msg->msg.pkt;
+
+	/*
+	 * FIXME: Handle multiple rndis pkt msgs that may be enclosed in this
+	 * netvsc packet (ie TotalDataBufferLength != MessageLength)
+	 */
+
+	/* Remove the rndis header and pass it back up the stack */
+	data_offset = RNDIS_HEADER_SIZE + rndis_pkt->data_offset;
+
+	pkt->total_data_buflen -= data_offset;
+	pkt->page_buf[0].offset += data_offset;
+	pkt->page_buf[0].len -= data_offset;
+
+	/* Drop the 0th page if the rndis data starts beyond its boundary */
+	if (pkt->page_buf[0].offset >= PAGE_SIZE) {
+		pkt->page_buf[1].offset = pkt->page_buf[0].offset - PAGE_SIZE;
+		pkt->page_buf[1].len -= pkt->page_buf[1].offset;
+		pkt->page_buf_cnt--;
+		for (i = 0; i < pkt->page_buf_cnt; i++)
+			pkt->page_buf[i] = pkt->page_buf[i+1];
+	}
+
+	pkt->is_data_pkt = true;
+
+	netvsc_recv_callback(dev->net_dev->dev, pkt);
+}
+
+int rndis_filter_receive(struct hv_device *dev,
+				struct hv_netvsc_packet	*pkt)
+{
+	struct netvsc_device *net_dev = hv_get_drvdata(dev);
+	struct rndis_device *rndis_dev;
+	struct rndis_message rndis_msg;
+	struct rndis_message *rndis_hdr;
+	struct net_device *ndev;
+
+	if (!net_dev)
+		return -EINVAL;
+
+	ndev = net_dev->ndev;
+
+	/* Make sure the rndis device state is initialized */
+	if (!net_dev->extension) {
+		netdev_err(ndev, "got rndis message but no rndis device - "
+			  "dropping this message!\n");
+		return -ENODEV;
+	}
+
+	rndis_dev = (struct rndis_device *)net_dev->extension;
+	if (rndis_dev->state == RNDIS_DEV_UNINITIALIZED) {
+		netdev_err(ndev, "got rndis message but rndis device "
+			   "uninitialized...dropping this message!\n");
+		return -ENODEV;
+	}
+
+	rndis_hdr = (struct rndis_message *)kmap_atomic(
+			pfn_to_page(pkt->page_buf[0].pfn), KM_IRQ0);
+
+	rndis_hdr = (void *)((unsigned long)rndis_hdr +
+			pkt->page_buf[0].offset);
+
+	/* Make sure we got a valid rndis message */
+	if ((rndis_hdr->ndis_msg_type != REMOTE_NDIS_PACKET_MSG) &&
+	    (rndis_hdr->msg_len > sizeof(struct rndis_message))) {
+		netdev_err(ndev, "incoming rndis message buffer overflow "
+			   "detected (got %u, max %zu)..marking it an error!\n",
+			   rndis_hdr->msg_len,
+			   sizeof(struct rndis_message));
+	}
+
+	memcpy(&rndis_msg, rndis_hdr,
+		(rndis_hdr->msg_len > sizeof(struct rndis_message)) ?
+			sizeof(struct rndis_message) :
+			rndis_hdr->msg_len);
+
+	kunmap_atomic((void *)((unsigned long)rndis_hdr -
+		      pkt->page_buf[0].offset), KM_IRQ0);
+	dump_rndis_message(dev, &rndis_msg);
+
+	switch (rndis_msg.ndis_msg_type) {
+	case REMOTE_NDIS_PACKET_MSG:
+		/* data msg */
+		rndis_filter_receive_data(rndis_dev, &rndis_msg, pkt);
+		break;
+
+	case REMOTE_NDIS_INITIALIZE_CMPLT:
+	case REMOTE_NDIS_QUERY_CMPLT:
+	case REMOTE_NDIS_SET_CMPLT:
+		/* completion msgs */
+		rndis_filter_receive_response(rndis_dev, &rndis_msg);
+		break;
+
+	case REMOTE_NDIS_INDICATE_STATUS_MSG:
+		/* notification msgs */
+		rndis_filter_receive_indicate_status(rndis_dev, &rndis_msg);
+		break;
+	default:
+		netdev_err(ndev,
+			"unhandled rndis message (type %u len %u)\n",
+			   rndis_msg.ndis_msg_type,
+			   rndis_msg.msg_len);
+		break;
+	}
+
+	return 0;
+}
+
+static int rndis_filter_query_device(struct rndis_device *dev, u32 oid,
+				  void *result, u32 *result_size)
+{
+	struct rndis_request *request;
+	u32 inresult_size = *result_size;
+	struct rndis_query_request *query;
+	struct rndis_query_complete *query_complete;
+	int ret = 0;
+	int t;
+
+	if (!result)
+		return -EINVAL;
+
+	*result_size = 0;
+	request = get_rndis_request(dev, REMOTE_NDIS_QUERY_MSG,
+			RNDIS_MESSAGE_SIZE(struct rndis_query_request));
+	if (!request) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	/* Setup the rndis query */
+	query = &request->request_msg.msg.query_req;
+	query->oid = oid;
+	query->info_buf_offset = sizeof(struct rndis_query_request);
+	query->info_buflen = 0;
+	query->dev_vc_handle = 0;
+
+	ret = rndis_filter_send_request(dev, request);
+	if (ret != 0)
+		goto cleanup;
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+	if (t == 0) {
+		ret = -ETIMEDOUT;
+		goto cleanup;
+	}
+
+	/* Copy the response back */
+	query_complete = &request->response_msg.msg.query_complete;
+
+	if (query_complete->info_buflen > inresult_size) {
+		ret = -1;
+		goto cleanup;
+	}
+
+	memcpy(result,
+	       (void *)((unsigned long)query_complete +
+			 query_complete->info_buf_offset),
+	       query_complete->info_buflen);
+
+	*result_size = query_complete->info_buflen;
+
+cleanup:
+	if (request)
+		put_rndis_request(dev, request);
+
+	return ret;
+}
+
+static int rndis_filter_query_device_mac(struct rndis_device *dev)
+{
+	u32 size = ETH_ALEN;
+
+	return rndis_filter_query_device(dev,
+				      RNDIS_OID_802_3_PERMANENT_ADDRESS,
+				      dev->hw_mac_adr, &size);
+}
+
+static int rndis_filter_query_device_link_status(struct rndis_device *dev)
+{
+	u32 size = sizeof(u32);
+	u32 link_status;
+	int ret;
+
+	ret = rndis_filter_query_device(dev,
+				      RNDIS_OID_GEN_MEDIA_CONNECT_STATUS,
+				      &link_status, &size);
+	dev->link_state = (link_status != 0) ? true : false;
+
+	return ret;
+}
+
+static int rndis_filter_set_packet_filter(struct rndis_device *dev,
+				      u32 new_filter)
+{
+	struct rndis_request *request;
+	struct rndis_set_request *set;
+	struct rndis_set_complete *set_complete;
+	u32 status;
+	int ret, t;
+	struct net_device *ndev;
+
+	ndev = dev->net_dev->ndev;
+
+	request = get_rndis_request(dev, REMOTE_NDIS_SET_MSG,
+			RNDIS_MESSAGE_SIZE(struct rndis_set_request) +
+			sizeof(u32));
+	if (!request) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	/* Setup the rndis set */
+	set = &request->request_msg.msg.set_req;
+	set->oid = RNDIS_OID_GEN_CURRENT_PACKET_FILTER;
+	set->info_buflen = sizeof(u32);
+	set->info_buf_offset = sizeof(struct rndis_set_request);
+
+	memcpy((void *)(unsigned long)set + sizeof(struct rndis_set_request),
+	       &new_filter, sizeof(u32));
+
+	ret = rndis_filter_send_request(dev, request);
+	if (ret != 0)
+		goto cleanup;
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+
+	if (t == 0) {
+		netdev_err(ndev,
+			"timeout before we got a set response...\n");
+		/*
+		 * We can't deallocate the request since we may still receive a
+		 * send completion for it.
+		 */
+		goto exit;
+	} else {
+		set_complete = &request->response_msg.msg.set_complete;
+		status = set_complete->status;
+	}
+
+cleanup:
+	if (request)
+		put_rndis_request(dev, request);
+exit:
+	return ret;
+}
+
+
+static int rndis_filter_init_device(struct rndis_device *dev)
+{
+	struct rndis_request *request;
+	struct rndis_initialize_request *init;
+	struct rndis_initialize_complete *init_complete;
+	u32 status;
+	int ret, t;
+
+	request = get_rndis_request(dev, REMOTE_NDIS_INITIALIZE_MSG,
+			RNDIS_MESSAGE_SIZE(struct rndis_initialize_request));
+	if (!request) {
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	/* Setup the rndis set */
+	init = &request->request_msg.msg.init_req;
+	init->major_ver = RNDIS_MAJOR_VERSION;
+	init->minor_ver = RNDIS_MINOR_VERSION;
+	/* FIXME: Use 1536 - rounded ethernet frame size */
+	init->max_xfer_size = 2048;
+
+	dev->state = RNDIS_DEV_INITIALIZING;
+
+	ret = rndis_filter_send_request(dev, request);
+	if (ret != 0) {
+		dev->state = RNDIS_DEV_UNINITIALIZED;
+		goto cleanup;
+	}
+
+
+	t = wait_for_completion_timeout(&request->wait_event, 5*HZ);
+
+	if (t == 0) {
+		ret = -ETIMEDOUT;
+		goto cleanup;
+	}
+
+	init_complete = &request->response_msg.msg.init_complete;
+	status = init_complete->status;
+	if (status == RNDIS_STATUS_SUCCESS) {
+		dev->state = RNDIS_DEV_INITIALIZED;
+		ret = 0;
+	} else {
+		dev->state = RNDIS_DEV_UNINITIALIZED;
+		ret = -EINVAL;
+	}
+
+cleanup:
+	if (request)
+		put_rndis_request(dev, request);
+
+	return ret;
+}
+
+static void rndis_filter_halt_device(struct rndis_device *dev)
+{
+	struct rndis_request *request;
+	struct rndis_halt_request *halt;
+
+	/* Attempt to do a rndis device halt */
+	request = get_rndis_request(dev, REMOTE_NDIS_HALT_MSG,
+				RNDIS_MESSAGE_SIZE(struct rndis_halt_request));
+	if (!request)
+		goto cleanup;
+
+	/* Setup the rndis set */
+	halt = &request->request_msg.msg.halt_req;
+	halt->req_id = atomic_inc_return(&dev->new_req_id);
+
+	/* Ignore return since this msg is optional. */
+	rndis_filter_send_request(dev, request);
+
+	dev->state = RNDIS_DEV_UNINITIALIZED;
+
+cleanup:
+	if (request)
+		put_rndis_request(dev, request);
+	return;
+}
+
+static int rndis_filter_open_device(struct rndis_device *dev)
+{
+	int ret;
+
+	if (dev->state != RNDIS_DEV_INITIALIZED)
+		return 0;
+
+	ret = rndis_filter_set_packet_filter(dev,
+					 NDIS_PACKET_TYPE_BROADCAST |
+					 NDIS_PACKET_TYPE_ALL_MULTICAST |
+					 NDIS_PACKET_TYPE_DIRECTED);
+	if (ret == 0)
+		dev->state = RNDIS_DEV_DATAINITIALIZED;
+
+	return ret;
+}
+
+static int rndis_filter_close_device(struct rndis_device *dev)
+{
+	int ret;
+
+	if (dev->state != RNDIS_DEV_DATAINITIALIZED)
+		return 0;
+
+	ret = rndis_filter_set_packet_filter(dev, 0);
+	if (ret == 0)
+		dev->state = RNDIS_DEV_INITIALIZED;
+
+	return ret;
+}
+
+int rndis_filter_device_add(struct hv_device *dev,
+				  void *additional_info)
+{
+	int ret;
+	struct netvsc_device *net_device;
+	struct rndis_device *rndis_device;
+	struct netvsc_device_info *device_info = additional_info;
+
+	rndis_device = get_rndis_device();
+	if (!rndis_device)
+		return -ENODEV;
+
+	/*
+	 * Let the inner driver handle this first to create the netvsc channel
+	 * NOTE! Once the channel is created, we may get a receive callback
+	 * (rndis_filter_receive()) before this call is completed
+	 */
+	ret = netvsc_device_add(dev, additional_info);
+	if (ret != 0) {
+		kfree(rndis_device);
+		return ret;
+	}
+
+
+	/* Initialize the rndis device */
+	net_device = hv_get_drvdata(dev);
+
+	net_device->extension = rndis_device;
+	rndis_device->net_dev = net_device;
+
+	/* Send the rndis initialization message */
+	ret = rndis_filter_init_device(rndis_device);
+	if (ret != 0) {
+		/*
+		 * TODO: If rndis init failed, we will need to shut down the
+		 * channel
+		 */
+	}
+
+	/* Get the mac address */
+	ret = rndis_filter_query_device_mac(rndis_device);
+	if (ret != 0) {
+		/*
+		 * TODO: shutdown rndis device and the channel
+		 */
+	}
+
+	memcpy(device_info->mac_adr, rndis_device->hw_mac_adr, ETH_ALEN);
+
+	rndis_filter_query_device_link_status(rndis_device);
+
+	device_info->link_state = rndis_device->link_state;
+
+	dev_info(&dev->device, "Device MAC %pM link state %s\n",
+		 rndis_device->hw_mac_adr,
+		 device_info->link_state ? "down" : "up");
+
+	return ret;
+}
+
+void rndis_filter_device_remove(struct hv_device *dev)
+{
+	struct netvsc_device *net_dev = hv_get_drvdata(dev);
+	struct rndis_device *rndis_dev = net_dev->extension;
+
+	/* Halt and release the rndis device */
+	rndis_filter_halt_device(rndis_dev);
+
+	kfree(rndis_dev);
+	net_dev->extension = NULL;
+
+	netvsc_device_remove(dev);
+}
+
+
+int rndis_filter_open(struct hv_device *dev)
+{
+	struct netvsc_device *net_device = hv_get_drvdata(dev);
+
+	if (!net_device)
+		return -EINVAL;
+
+	return rndis_filter_open_device(net_device->extension);
+}
+
+int rndis_filter_close(struct hv_device *dev)
+{
+	struct netvsc_device *netDevice = hv_get_drvdata(dev);
+
+	if (!netDevice)
+		return -EINVAL;
+
+	return rndis_filter_close_device(netDevice->extension);
+}
+
+int rndis_filter_send(struct hv_device *dev,
+			     struct hv_netvsc_packet *pkt)
+{
+	int ret;
+	struct rndis_filter_packet *filterPacket;
+	struct rndis_message *rndisMessage;
+	struct rndis_packet *rndisPacket;
+	u32 rndisMessageSize;
+
+	/* Add the rndis header */
+	filterPacket = (struct rndis_filter_packet *)pkt->extension;
+
+	memset(filterPacket, 0, sizeof(struct rndis_filter_packet));
+
+	rndisMessage = &filterPacket->msg;
+	rndisMessageSize = RNDIS_MESSAGE_SIZE(struct rndis_packet);
+
+	rndisMessage->ndis_msg_type = REMOTE_NDIS_PACKET_MSG;
+	rndisMessage->msg_len = pkt->total_data_buflen +
+				      rndisMessageSize;
+
+	rndisPacket = &rndisMessage->msg.pkt;
+	rndisPacket->data_offset = sizeof(struct rndis_packet);
+	rndisPacket->data_len = pkt->total_data_buflen;
+
+	pkt->is_data_pkt = true;
+	pkt->page_buf[0].pfn = virt_to_phys(rndisMessage) >> PAGE_SHIFT;
+	pkt->page_buf[0].offset =
+			(unsigned long)rndisMessage & (PAGE_SIZE-1);
+	pkt->page_buf[0].len = rndisMessageSize;
+
+	/* Save the packet send completion and context */
+	filterPacket->completion = pkt->completion.send.send_completion;
+	filterPacket->completion_ctx =
+				pkt->completion.send.send_completion_ctx;
+
+	/* Use ours */
+	pkt->completion.send.send_completion = rndis_filter_send_completion;
+	pkt->completion.send.send_completion_ctx = filterPacket;
+
+	ret = netvsc_send(dev, pkt);
+	if (ret != 0) {
+		/*
+		 * Reset the completion to originals to allow retries from
+		 * above
+		 */
+		pkt->completion.send.send_completion =
+				filterPacket->completion;
+		pkt->completion.send.send_completion_ctx =
+				filterPacket->completion_ctx;
+	}
+
+	return ret;
+}
+
+static void rndis_filter_send_completion(void *ctx)
+{
+	struct rndis_filter_packet *filterPacket = ctx;
+
+	/* Pass it back to the original handler */
+	filterPacket->completion(filterPacket->completion_ctx);
+}
+
+
+static void rndis_filter_send_request_completion(void *ctx)
+{
+	/* Noop */
+}
-- 
1.7.3.4
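
rndis_filter_receive() above advances the kmap_atomic() pointer by pkt->page_buf[0].offset bytes through an unsigned long cast, so getting back to the mapped address has to undo the offset in bytes as well. Subtracting an integer from a struct rndis_message * instead moves by offset * sizeof(struct rndis_message) bytes. The standalone sketch below shows the difference; struct fake_rndis_message and the helper names are hypothetical stand-ins, not driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rndis_message; only its size matters. */
struct fake_rndis_message {
	uint32_t ndis_msg_type;
	uint32_t msg_len;
	unsigned char body[56];
};

/* Signed byte distance from a to b (b - a), both inside the same object. */
static ptrdiff_t byte_distance(const void *a, const void *b)
{
	return (const char *)b - (const char *)a;
}

/* Correct: step back `offset` bytes, like the unsigned long cast in the driver. */
static void *undo_offset_bytes(struct fake_rndis_message *hdr, size_t offset)
{
	return (char *)hdr - offset;
}

/* Pitfall: typed pointer arithmetic steps back offset * sizeof(*hdr) bytes. */
static struct fake_rndis_message *undo_offset_typed(struct fake_rndis_message *hdr,
						    size_t offset)
{
	return hdr - offset;
}
```

With the 64-byte stand-in struct, undoing an offset of 2 moves back 2 bytes with the byte-based helper and 128 bytes with the typed one.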

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: John Fastabend @ 2011-10-14 20:17 UTC (permalink / raw)
  To: Michael Chan
  Cc: 'Rick Jones', 'davem@davemloft.net',
	'netdev@vger.kernel.org', Dmitry Kravkov,
	Eilon Greenstein
In-Reply-To: <C27F8246C663564A84BB7AB34397724266D2018306@IRVEXCHCCR01.corp.ad.broadcom.com>

On 10/14/2011 9:15 AM, Michael Chan wrote:
> Rick Jones wrote:
> 
>> On 10/14/2011 08:53 AM, Michael Chan wrote:
>>> Rick Jones wrote:
>>>
>>>> Is this perhaps saying that a bnx2x-driven device being used for
>>>> FCoE or iSCSI boot must not permit *any* run-time configuration
>>>> change which leads to a NIC reset?
>>>>
>>>
>>> That is right, unless you have a multipath configuration with
>>> multiple ports; then you can reset one port at a time.
>>
>> So, should there also be a "cnic_boot_device" check in many of the
>> "capital letter" ethtool paths?
>>
> 
> If the user is doing ethtool configuration changes or device shutdown,
> it is more obvious what the consequence will be.  The user may also be
> careful to do it on a multipath setup.
> 
> The reset caused by the auto turn-off of LRO when you enable
> ip_forward or bridging will not be obvious to the user.  In addition,
> all devices with LRO turned on will be reset at the same time so even
> multipath will not survive.
>

But after the reset the device should log in again and the SCSI layer
should handle retries, so I don't see why this is a problem. Why do we
need to handle this differently from any other link event?

.John

* Re: [PATCH] net: ipv6: Allow netlink to set IPv6 address scope
From: Brian Haley @ 2011-10-14 20:14 UTC (permalink / raw)
  To: Lorenzo Colitti; +Cc: maze, yoshfuji, netdev
In-Reply-To: <CAKD1Yr29dAC+_=bT5-_W3XuTzoxMiEzOhpHvwrAzyA8NdcCjSQ@mail.gmail.com>

On 10/13/2011 07:55 PM, Lorenzo Colitti wrote:
> On Mon, Oct 10, 2011 at 09:16, Brian Haley <brian.haley@hp.com> wrote:
>>> net: ipv6: Allow netlink to set IPv6 address scope
>>>
>>> Currently, userspace cannot specify the scope of IPv6
>>> addresses when creating or modifying them. Instead, the
>>> scope is automatically determined from the address itself.
>>> In IPv4, userspace can set whatever scope it likes.
>>>
>>> Allow userspace to specify the scope of IPv6 addresses in
>>> a backwards-compatible way: if the scope passed in is zero,
>>> use the old behaviour of automatically determining the
>>> scope based on the address.
>>>
>>> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
>>
>> Hi Lorenzo,
>>
>> I remember someone proposing a similar patch before and it was not accepted, do you have a use case for doing this?  It just seems like it will cause problems.
> 
> Well, to begin with it's a question of feature parity. If we allow
> users to set the scope of IPv4 addresses, then we should allow them to
> set the scope of IPv6 addresses as well. Policy belongs in userspace,
> not in the kernel.

I can understand the feature parity need, and we are just providing the rope for
the sysadmin to hang themselves with, but I'm still not convinced.  Hopefully
someone else will have an opinion.
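
The backward-compatible rule in the quoted patch description (a requested scope of zero means "derive the scope from the address, as before") can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the SCOPE_* constants reuse the numeric values of RT_SCOPE_* from <linux/rtnetlink.h>, and scope_from_addr() is a hypothetical, simplified classifier that only recognises loopback, fe80::/10 and fec0::/10.

```c
#include <stdint.h>
#include <string.h>

/* Same numeric values as RT_SCOPE_* in <linux/rtnetlink.h>. */
enum {
	SCOPE_UNIVERSE = 0,
	SCOPE_SITE = 200,
	SCOPE_LINK = 253,
	SCOPE_HOST = 254,
};

/*
 * Simplified address-derived scope (hypothetical): loopback, link-local
 * (fe80::/10) and site-local (fec0::/10) are recognised; everything else,
 * including multicast, is treated as global in this sketch.
 */
static uint8_t scope_from_addr(const uint8_t addr[16])
{
	static const uint8_t loopback[16] = { [15] = 1 };

	if (memcmp(addr, loopback, sizeof(loopback)) == 0)
		return SCOPE_HOST;
	if (addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80)
		return SCOPE_LINK;
	if (addr[0] == 0xfe && (addr[1] & 0xc0) == 0xc0)
		return SCOPE_SITE;
	return SCOPE_UNIVERSE;
}

/* Rule from the patch description: a requested scope of 0 means "derive it". */
static uint8_t effective_scope(uint8_t requested, const uint8_t addr[16])
{
	return requested != 0 ? requested : scope_from_addr(addr);
}
```

Under this rule a request for scope 0 (RT_SCOPE_UNIVERSE, i.e. global) cannot be told apart from "not specified", so fe80::1 still ends up with link scope. That is one concrete angle on Brian's question further down about configuring fe80::1/64 with global scope.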

> One use case is as follows. I have a phone with an always-on IPv6
> interface which is used for signaling, provisioning, SMS, etc. This
> IPv6 address is a global unicast IPv6 address, but it belongs to a
> closed carrier network, and cannot be used to communicate with the
> IPv6 Internet in either direction.

Playing devil's advocate here, isn't this a brain-dead ISP?  If they're giving
you a global IPv6 address you should get Internet connectivity with it.  If not,
you probably knew it up front, or you're going to find another provider that
does.  It's like they're giving you a site-local address...

So are you talking about being able to dynamically change the scope of an
address? Wifi comes up: change the provider addresses to host-local. Wifi
goes down: change them back to global. That looks like a hack.

> The phone also supports wifi. When there is IPv6 on the wifi
> interface, things get messy because the phone can decide to use the
> carrier source address on the wifi interface. This will not work,
> because the replies will be dropped in the carrier network.
> 
> You can stop this from happening in most cases by not putting a
> default route on the walled garden interface, and only using host
> routes as needed. Unfortunately, you can't stop it from happening in
> at least the following two cases:

The IETF MIF working group is looking at a lot of these scenarios, I only lurk
there and don't have any good drafts I can point you at though.

> 1. You receive an IPv6 RA on the wifi interface which provides a
> default route and a 6to4 address, or a default route and no IPv6
> address (for example, because the network uses DHCPv6 and not
> RFC4862-style autoconf)

A default route with only a link-local address isn't very useful.  Will the
kernel ever use this interface with the global address of your carrier - isn't
it going to prefer the interface that address is configured on?

> 2. You receive an IPv6 RA on the wifi interface which provides a
> default route and a "native" IPv6 address, but the phone attempts a
> connection while the address is in the tentative state (i.e., the
> phone is performing DAD on the address)

That's a pretty small window (< 2 sec) during which the address is tentative,
and retrying shortly afterwards will work.

> In both cases, ipv6_get_saddr_eval will return the carrier IPv6
> address (because "avoid tentative and optimistic addresses" takes
> precedence over "prefer outgoing interface" in RFC3484) and the kernel
> will pick the default route on the wifi interface. The return packets
> will get dropped and the connection will time out after several
> minutes.
> The only way I can think of to get this right is to set the carrier
> IPv6 address to a scope less than global - which, in effect, it is,
> because it can't reach the Internet.

You can also use routing rules, like anyone running a dual-homed IPv4 setup
does: only use 1.2.3.4 on eth0, and only use 4.3.2.1 on eth1.

And there's also gai.conf, although I haven't played with that in a while.

The other trick/hack is to change the preferred lifetime of an address to zero,
which should mark it deprecated, moving it down in the selection list.
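
The precedence problem in Lorenzo's case 2, and the effect of the zero-preferred-lifetime trick, can be illustrated with a heavily simplified comparator. It is a hypothetical sketch, not the kernel's ipv6_get_saddr_eval(): it models only three checks, with tentative avoidance placed first (Lorenzo describes it as taking precedence over "prefer outgoing interface"; its position relative to the deprecated check is an assumption here), followed by RFC 3484 Rule 3 (avoid deprecated addresses) and Rule 5 (prefer outgoing interface).

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified view of a candidate source address. */
struct saddr_candidate {
	bool tentative;   /* DAD still in progress */
	bool deprecated;  /* preferred lifetime has expired */
	int ifindex;      /* interface the address is configured on */
};

/*
 * Return true if a is a better source than b for a packet leaving through
 * out_ifindex. Checks run in order; the first one that tells the two
 * candidates apart decides.
 */
static bool better_source(const struct saddr_candidate *a,
			  const struct saddr_candidate *b, int out_ifindex)
{
	bool a_out = a->ifindex == out_ifindex;
	bool b_out = b->ifindex == out_ifindex;

	/* Avoid tentative addresses (placed first in this sketch). */
	if (a->tentative != b->tentative)
		return !a->tentative;
	/* RFC 3484 Rule 3: avoid deprecated addresses. */
	if (a->deprecated != b->deprecated)
		return !a->deprecated;
	/* RFC 3484 Rule 5: prefer the outgoing interface. */
	if (a_out != b_out)
		return a_out;
	return false; /* no preference among the checks modeled here */
}
```

With these checks, a tentative address on the outgoing interface loses to a non-tentative address on another interface, which is the DAD-window situation Lorenzo describes. A deprecated address loses to a preferred one before the outgoing-interface rule is consulted, which is why a zero preferred lifetime moves an address down the list.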

>> Also, there are other parts of the kernel (NFS, SCTP, IPv6 multicast) that are still calling ipv6_addr_scope() on a plain address - won't those be broken since they'll return the correct, RFC-implied scope?
> 
> Good point. I looked at these and don't think there is a serious problem though:
> 
> - SCTP doesn't look at the scope in IPv4 either, it just looks at the
> address itself. So at worst this change will make IPv6 match IPv4.
> 
> - NFS only looks at the scope to check whether it's link-local, and if
> so only declares an address to be unique if the scopes match. In this
> case I think it's the right thing to do, because it's really the
> network that decides whether an address can be duplicate on different
> links, not the host.
> 
> - Unicast and multicast do it when dumping the addresses to userspace.
> I need to fix these.
> 
> Does that make sense?

What else will break though?  If I configure fe80::1/64 and set the scope to
global, do applications know to look at ifa_scope and not just the address
itself to determine the scope?  Should they?

-Brian

* RE: [PATCH] staging: hv: move hv_netvsc out of staging area
From: Haiyang Zhang @ 2011-10-14 19:40 UTC (permalink / raw)
  To: Sasha Levin
  Cc: KY Srinivasan, gregkh@suse.de, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, virtualization@lists.osdl.org,
	Mike Sterling, NetDev
In-Reply-To: <1318620950.31522.9.camel@lappy>

> -----Original Message-----
> From: Sasha Levin [mailto:levinsasha928@gmail.com]
> Sent: Friday, October 14, 2011 3:36 PM
> > +
> > +obj-$(CONFIG_HYPERV_NET) += hyperv/
> > diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig
> > new file mode 100644
> > index 0000000..936968d
> > --- /dev/null
> > +++ b/drivers/net/hyperv/Kconfig
> > @@ -0,0 +1,5 @@
> > +config HYPERV_NET
> > +	tristate "Microsoft Hyper-V virtual network driver"
> > +	depends on HYPERV
> 
> It doesn't depend on NET anymore?

All drivers in drivers/net depend on NET, which is already handled by
net/Kconfig, so we don't need to duplicate the dependency (suggested by
Randy Dunlap <randy.dunlap@oracle.com> in the last review).

Thanks,
- Haiyang

* Re: [PATCH] staging: hv: move hv_netvsc out of staging area
From: Sasha Levin @ 2011-10-14 19:35 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: NetDev, gregkh, linux-kernel, virtualization, Mike Sterling,
	devel
In-Reply-To: <1318620026-8349-1-git-send-email-haiyangz@microsoft.com>

On Fri, 2011-10-14 at 12:20 -0700, Haiyang Zhang wrote:
> hv_netvsc was reviewed on the netdev mailing list on 6/09/2011.
> All recommended changes have been made. We are requesting to move
> it out of the staging area.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: KY Srinivasan <kys@microsoft.com>
> Signed-off-by: Mike Sterling <Mike.Sterling@microsoft.com>
> Cc: NetDev <netdev@vger.kernel.org>
> 
> ---
>  drivers/net/Kconfig                               |    2 ++
>  drivers/net/Makefile                              |    2 ++
>  drivers/net/hyperv/Kconfig                        |    5 +++++
>  drivers/net/hyperv/Makefile                       |    3 +++
>  drivers/{staging/hv => net/hyperv}/hyperv_net.h   |    0
>  drivers/{staging/hv => net/hyperv}/netvsc.c       |    0
>  drivers/{staging/hv => net/hyperv}/netvsc_drv.c   |    0
>  drivers/{staging/hv => net/hyperv}/rndis_filter.c |    0
>  drivers/staging/hv/Kconfig                        |    6 ------
>  drivers/staging/hv/Makefile                       |    2 --
>  drivers/staging/hv/TODO                           |    1 -
>  11 files changed, 12 insertions(+), 9 deletions(-)
>  create mode 100644 drivers/net/hyperv/Kconfig
>  create mode 100644 drivers/net/hyperv/Makefile
>  rename drivers/{staging/hv => net/hyperv}/hyperv_net.h (100%)
>  rename drivers/{staging/hv => net/hyperv}/netvsc.c (100%)
>  rename drivers/{staging/hv => net/hyperv}/netvsc_drv.c (100%)
>  rename drivers/{staging/hv => net/hyperv}/rndis_filter.c (100%)
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 8d0314d..088c330 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -3451,4 +3451,6 @@ config VMXNET3
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called vmxnet3.
>  
> +source "drivers/net/hyperv/Kconfig"
> +
>  endif # NETDEVICES
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index e1eca2a..647c878 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -306,3 +306,5 @@ obj-$(CONFIG_CAIF) += caif/
>  obj-$(CONFIG_OCTEON_MGMT_ETHERNET) += octeon/
>  obj-$(CONFIG_PCH_GBE) += pch_gbe/
>  obj-$(CONFIG_TILE_NET) += tile/
> +
> +obj-$(CONFIG_HYPERV_NET) += hyperv/
> diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig
> new file mode 100644
> index 0000000..936968d
> --- /dev/null
> +++ b/drivers/net/hyperv/Kconfig
> @@ -0,0 +1,5 @@
> +config HYPERV_NET
> +	tristate "Microsoft Hyper-V virtual network driver"
> +	depends on HYPERV

It doesn't depend on NET anymore?

-- 

Sasha.

* RE: [PATCH] staging: hv: move hv_netvsc out of staging area
From: Haiyang Zhang @ 2011-10-14 19:28 UTC (permalink / raw)
  To: Greg KH
  Cc: NetDev, linux-kernel@vger.kernel.org,
	virtualization@lists.osdl.org, Mike Sterling,
	devel@linuxdriverproject.org
In-Reply-To: <20111014191605.GA31039@suse.de>

> -----Original Message-----
> From: Greg KH [mailto:gregkh@suse.de]
> Sent: Friday, October 14, 2011 3:16 PM
> Because of renames, the network developers really can't review this.
> I suggest posting a new patch, that just adds the driver to the
> drivers/net/ directory, and have the network developer review it that
> way.
> 
> Then, when it is accepted, I can delete the version in the staging
> directory.  We've done it this way for other drivers and it is the
> best way to get proper reviews as well as handling cross-subsystem
> merge issues.

During the previous review in June, Joe Perches suggested that I use the -M
option to show only the changed code. But I will make a new patch that shows
all of the code, as you requested.

Thanks,
- Haiyang

* Problem with ixgbe and TX locked on one cpu
From: Paweł Staszewski @ 2011-10-14 19:18 UTC (permalink / raw)
  To: Linux Network Development list

[-- Attachment #1: Type: text/plain, Size: 4338 bytes --]

Hello

I have a weird problem with ixgbe and IRQ affinity / RX-TX queue assignment.

Statistics for my Ethernet interface (ixgbe driver):
ethtool -S eth4
NIC statistics:
      rx_packets: 5815535848808
      tx_packets: 5811202378421
      rx_bytes: 4791001750842200
      tx_bytes: 4781190419358301
      rx_pkts_nic: 5815535848827
      tx_pkts_nic: 5811202378510
      rx_bytes_nic: 4837563124411799
      tx_bytes_nic: 4829987507084013
      lsc_int: 8
      tx_busy: 0
      non_eop_descs: 0
      rx_errors: 0
      tx_errors: 0
      rx_dropped: 0
      tx_dropped: 0
      multicast: 92494273
      broadcast: 268718206
      rx_no_buffer_count: 28829
      collisions: 0
      rx_over_errors: 0
      rx_crc_errors: 0
      rx_frame_errors: 0
      hw_rsc_aggregated: 0
      hw_rsc_flushed: 0
      fdir_match: 0
      fdir_miss: 0
      rx_fifo_errors: 0
      rx_missed_errors: 307051074
      tx_aborted_errors: 0
      tx_carrier_errors: 0
      tx_fifo_errors: 0
      tx_heartbeat_errors: 0
      tx_timeout_count: 0
      tx_restart_queue: 15926219
      rx_long_length_errors: 298
      rx_short_length_errors: 0
      tx_flow_control_xon: 0
      rx_flow_control_xon: 0
      tx_flow_control_xoff: 0
      rx_flow_control_xoff: 0
      rx_csum_offload_errors: 54173917
      alloc_rx_page_failed: 0
      alloc_rx_buff_failed: 0
      rx_no_dma_resources: 0
      tx_queue_0_packets: 68694825
      tx_queue_0_bytes: 9443750332
      tx_queue_1_packets: 8410961
      tx_queue_1_bytes: 2527763233
      tx_queue_2_packets: 14411252
      tx_queue_2_bytes: 1317132394
      tx_queue_3_packets: 15013508147
      tx_queue_3_bytes: 17364767277348
      tx_queue_4_packets: 62779891
      tx_queue_4_bytes: 63476596221
      tx_queue_5_packets: 11176001
      tx_queue_5_bytes: 2763600253
      tx_queue_6_packets: 4416357
      tx_queue_6_bytes: 611874984
      tx_queue_7_packets: 8933405
      tx_queue_7_bytes: 1837198524
      tx_queue_8_packets: 13292669
      tx_queue_8_bytes: 3241333510
      tx_queue_9_packets: 10747236
      tx_queue_9_bytes: 1805109931
      tx_queue_10_packets: 5795935258380
      tx_queue_10_bytes: 4763725304722245
      tx_queue_11_packets: 12073934
      tx_queue_11_bytes: 2982743045
      tx_queue_12_packets: 10523764
      tx_queue_12_bytes: 2637451199
      tx_queue_13_packets: 12480552
      tx_queue_13_bytes: 2434827407
      tx_queue_14_packets: 7401777
      tx_queue_14_bytes: 2413618099
      tx_queue_15_packets: 8269270
      tx_queue_15_bytes: 2854359576
      rx_queue_0_packets: 361373769507
      rx_queue_0_bytes: 298565751248279
      rx_queue_1_packets: 369901571908
      rx_queue_1_bytes: 303414679798160
      rx_queue_2_packets: 362508961738
      rx_queue_2_bytes: 299852439447157
      rx_queue_3_packets: 363449272013
      rx_queue_3_bytes: 299738390792515
      rx_queue_4_packets: 361876234461
      rx_queue_4_bytes: 297483366939732
      rx_queue_5_packets: 361402926316
      rx_queue_5_bytes: 297633876486533
      rx_queue_6_packets: 362261522767
      rx_queue_6_bytes: 298026696344647
      rx_queue_7_packets: 361248593301
      rx_queue_7_bytes: 296756459279986
      rx_queue_8_packets: 361654143416
      rx_queue_8_bytes: 298272433659520
      rx_queue_9_packets: 362781764710
      rx_queue_9_bytes: 298804803191595
      rx_queue_10_packets: 361386593064
      rx_queue_10_bytes: 297434987797644
      rx_queue_11_packets: 369886597895
      rx_queue_11_bytes: 302353350171712
      rx_queue_12_packets: 361582732276
      rx_queue_12_bytes: 298670408005971
      rx_queue_13_packets: 365248093536
      rx_queue_13_bytes: 302573023878287
      rx_queue_14_packets: 366571142073
      rx_queue_14_bytes: 302396739276514
      rx_queue_15_packets: 362401929830
      rx_queue_15_bytes: 299024344526029

The problem is with queue 10:
      tx_queue_10_packets: 5795935258380
      tx_queue_10_bytes: 4763725304722245

As you can see, most of the TX processing happens on queue 10.
The average ratio compared to the other queues is 1.854271229903958e-6.

The problem is that almost all TX packet processing is on one CPU
(the cat /proc/interrupts output is in the attached file).

Is this a driver problem or a kernel problem?

Kernel is: 2.6.38.2

ixgbe driver is:
ethtool -i eth4
driver: ixgbe
version: 3.2.9-k2
firmware-version: 1.12-2
bus-info: 0000:04:00.0
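
To put a number on the imbalance, the sixteen tx_queue_N_packets counters from the ethtool -S output above can be summed and the busiest queue's share computed. A minimal sketch (the function name is illustrative; it only does the arithmetic and says nothing about the cause):

```c
#include <stddef.h>

/* tx_queue_N_packets values copied from the ethtool -S eth4 output above. */
static const unsigned long long tx_queue_packets[16] = {
	68694825ULL, 8410961ULL, 14411252ULL, 15013508147ULL,
	62779891ULL, 11176001ULL, 4416357ULL, 8933405ULL,
	13292669ULL, 10747236ULL, 5795935258380ULL, 12073934ULL,
	10523764ULL, 12480552ULL, 7401777ULL, 8269270ULL,
};

/* Return the index of the busiest queue and store its share of all packets. */
static size_t busiest_queue(const unsigned long long *pkts, size_t n,
			    double *share)
{
	unsigned long long total = 0;
	size_t i, best = 0;

	for (i = 0; i < n; i++) {
		total += pkts[i];
		if (pkts[i] > pkts[best])
			best = i;
	}
	*share = total ? (double)pkts[best] / (double)total : 0.0;
	return best;
}
```

With these counters the per-queue values add up to exactly the tx_packets total (5811202378421), and queue 10 carries about 99.74% of it; queue 3 is a distant second with roughly 0.26%.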


Thanks
Pawel


-- 


[-- Attachment #2: interrupts.txt --]
[-- Type: text/plain, Size: 3675 bytes --]

cat /proc/interrupts  | grep eth4
 135: 3109261876    4289060          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-0
 136: 2738300312 2654348120    4055848          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-1
 137:         43 2636245312 3776381478    4281702          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-2
 138:        340 2156086460    3340495 3269054231    4487452          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-3
 139:         38          0 2738519426          0 1088719123    4176363          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-4
 140:         39          0 2632858749    3512903          0 2307156010    4310322          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-5
 141:         41          0          0 2655130571          0          0 2492897896    4249569          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-6
 142:        173          0          0 2625727263          0          0          0 2509835335    8038276          0          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-7
 143:         44          0          0          0    2115559          0          0          0 3275187626    5066092          0          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-8
 144:         51          0          0          0          0          0          0          0 2668196538 1238317991    4373599          0          0          0          0          0   PCI-MSI-edge      eth4-TxRx-9
 145:     528852          0          0          0     386077          0          0          0     438158  294605430 1867115075    4806187          0          0          0          0   PCI-MSI-edge      eth4-TxRx-10
 146:         65          0          0          0          0          0          0          0          0 2378013639    3357280 1179087288    4668439          0          0          0   PCI-MSI-edge      eth4-TxRx-11
 147:         83          0          0          0          0          0          0          0          0          0 2447343915          0 1621496283    4718715          0          0   PCI-MSI-edge      eth4-TxRx-12
 148:         64          0          0          0          0          0          0          0          0          0 2719008413    3788138          0 2492359875    4697458          0   PCI-MSI-edge      eth4-TxRx-13
 149:         41          0          0          0          0          0          0          0          0          0          0 2569764726          0          0 3216415633    4546711   PCI-MSI-edge      eth4-TxRx-14
 150:         45          0          0          0          0          0          0          0          0          0          0 2553654902          0          0          0 2587543392   PCI-MSI-edge      eth4-TxRx-15
 151:          5          0          0          0          0          0          0          0          0          0          0          0          0          0          0          3   PCI-MSI-edge      eth4:lsc

[-- Attachment #3: pstaszewski.vcf --]
[-- Type: text/x-vcard, Size: 336 bytes --]

begin:vcard
fn;quoted-printable:Pawe=C5=82 Staszewski
n;quoted-printable:Staszewski;Pawe=C5=82
org:ITCare
adr;quoted-printable;quoted-printable;dom:;;Sikorskiego 22;Libi=C4=85=C5=BC;Ma=C5=82opolskie;32-590
title:IT Manager
tel;work:+48 32 7203681
tel;fax:+48 32 7203682
tel;cell:+48 0 609911040
url:www.itcare.pl
version:2.1
end:vcard


^ permalink raw reply

* Re: iwlagn: WARN_ON() in iwl_get_idle_rx_chain_count()
From: Michał Mirosław @ 2011-10-14 19:21 UTC (permalink / raw)
  To: wwguy
  Cc: Intel Linux Wireless, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <1318606158.12009.2.camel@wwguy-ubuntu>

On Fri, Oct 14, 2011 at 08:29:18AM -0700, wwguy wrote:
> Could you try the attached patch and see if it fixes your problem.
[attached patch removed]

Backported and applied. I'll test it for a couple of days.

Best Regards,
Michał Mirosław

* [PATCH] staging: hv: move hv_netvsc out of staging area
From: Haiyang Zhang @ 2011-10-14 19:20 UTC (permalink / raw)
  To: haiyangz, kys, gregkh, linux-kernel, devel, virtualization
  Cc: Mike Sterling, NetDev

hv_netvsc was reviewed on the netdev mailing list on 6/09/2011.
All recommended changes have been made. We are requesting that it be
moved out of the staging area.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Mike Sterling <Mike.Sterling@microsoft.com>
Cc: NetDev <netdev@vger.kernel.org>

---
 drivers/net/Kconfig                               |    2 ++
 drivers/net/Makefile                              |    2 ++
 drivers/net/hyperv/Kconfig                        |    5 +++++
 drivers/net/hyperv/Makefile                       |    3 +++
 drivers/{staging/hv => net/hyperv}/hyperv_net.h   |    0
 drivers/{staging/hv => net/hyperv}/netvsc.c       |    0
 drivers/{staging/hv => net/hyperv}/netvsc_drv.c   |    0
 drivers/{staging/hv => net/hyperv}/rndis_filter.c |    0
 drivers/staging/hv/Kconfig                        |    6 ------
 drivers/staging/hv/Makefile                       |    2 --
 drivers/staging/hv/TODO                           |    1 -
 11 files changed, 12 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/hyperv/Kconfig
 create mode 100644 drivers/net/hyperv/Makefile
 rename drivers/{staging/hv => net/hyperv}/hyperv_net.h (100%)
 rename drivers/{staging/hv => net/hyperv}/netvsc.c (100%)
 rename drivers/{staging/hv => net/hyperv}/netvsc_drv.c (100%)
 rename drivers/{staging/hv => net/hyperv}/rndis_filter.c (100%)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8d0314d..088c330 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3451,4 +3451,6 @@ config VMXNET3
 	  To compile this driver as a module, choose M here: the
 	  module will be called vmxnet3.
 
+source "drivers/net/hyperv/Kconfig"
+
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e1eca2a..647c878 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -306,3 +306,5 @@ obj-$(CONFIG_CAIF) += caif/
 obj-$(CONFIG_OCTEON_MGMT_ETHERNET) += octeon/
 obj-$(CONFIG_PCH_GBE) += pch_gbe/
 obj-$(CONFIG_TILE_NET) += tile/
+
+obj-$(CONFIG_HYPERV_NET) += hyperv/
diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig
new file mode 100644
index 0000000..936968d
--- /dev/null
+++ b/drivers/net/hyperv/Kconfig
@@ -0,0 +1,5 @@
+config HYPERV_NET
+	tristate "Microsoft Hyper-V virtual network driver"
+	depends on HYPERV
+	help
+	  Select this option to enable the Hyper-V virtual network driver.
diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile
new file mode 100644
index 0000000..c8a6682
--- /dev/null
+++ b/drivers/net/hyperv/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_HYPERV_NET) += hv_netvsc.o
+
+hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o
diff --git a/drivers/staging/hv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
similarity index 100%
rename from drivers/staging/hv/hyperv_net.h
rename to drivers/net/hyperv/hyperv_net.h
diff --git a/drivers/staging/hv/netvsc.c b/drivers/net/hyperv/netvsc.c
similarity index 100%
rename from drivers/staging/hv/netvsc.c
rename to drivers/net/hyperv/netvsc.c
diff --git a/drivers/staging/hv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
similarity index 100%
rename from drivers/staging/hv/netvsc_drv.c
rename to drivers/net/hyperv/netvsc_drv.c
diff --git a/drivers/staging/hv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
similarity index 100%
rename from drivers/staging/hv/rndis_filter.c
rename to drivers/net/hyperv/rndis_filter.c
diff --git a/drivers/staging/hv/Kconfig b/drivers/staging/hv/Kconfig
index 072185e..8a51166 100644
--- a/drivers/staging/hv/Kconfig
+++ b/drivers/staging/hv/Kconfig
@@ -4,12 +4,6 @@ config HYPERV_STORAGE
 	help
 	 Select this option to enable the Hyper-V virtual storage driver.
 
-config HYPERV_NET
-	tristate "Microsoft Hyper-V virtual network driver"
-	depends on HYPERV && NET
-	help
-	  Select this option to enable the Hyper-V virtual network driver.
-
 config HYPERV_MOUSE
 	tristate "Microsoft Hyper-V mouse driver"
 	depends on HYPERV && HID
diff --git a/drivers/staging/hv/Makefile b/drivers/staging/hv/Makefile
index e071c12..b536584 100644
--- a/drivers/staging/hv/Makefile
+++ b/drivers/staging/hv/Makefile
@@ -1,7 +1,5 @@
 obj-$(CONFIG_HYPERV)		+= hv_timesource.o
 obj-$(CONFIG_HYPERV_STORAGE)	+= hv_storvsc.o
-obj-$(CONFIG_HYPERV_NET)	+= hv_netvsc.o
 obj-$(CONFIG_HYPERV_MOUSE)	+= hv_mouse.o
 
 hv_storvsc-y := storvsc_drv.o
-hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o
diff --git a/drivers/staging/hv/TODO b/drivers/staging/hv/TODO
index ed4d636..fd080cb 100644
--- a/drivers/staging/hv/TODO
+++ b/drivers/staging/hv/TODO
@@ -1,5 +1,4 @@
 TODO:
-	- audit the network driver
 	- audit the scsi driver
 
 Please send patches for this code to Greg Kroah-Hartman <gregkh@suse.de>,
-- 
1.7.3.4

* Re: [PATCH] staging: hv: move hv_netvsc out of staging area
From: Greg KH @ 2011-10-14 19:16 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: kys, linux-kernel, devel, virtualization, Mike Sterling, NetDev
In-Reply-To: <1318620026-8349-1-git-send-email-haiyangz@microsoft.com>

On Fri, Oct 14, 2011 at 12:20:26PM -0700, Haiyang Zhang wrote:
> hv_netvsc was reviewed on the netdev mailing list on 6/09/2011.
> All recommended changes have been made. We are requesting that it be
> moved out of the staging area.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: KY Srinivasan <kys@microsoft.com>
> Signed-off-by: Mike Sterling <Mike.Sterling@microsoft.com>
> Cc: NetDev <netdev@vger.kernel.org>
> 
> ---
>  drivers/net/Kconfig                               |    2 ++
>  drivers/net/Makefile                              |    2 ++
>  drivers/net/hyperv/Kconfig                        |    5 +++++
>  drivers/net/hyperv/Makefile                       |    3 +++
>  drivers/{staging/hv => net/hyperv}/hyperv_net.h   |    0
>  drivers/{staging/hv => net/hyperv}/netvsc.c       |    0
>  drivers/{staging/hv => net/hyperv}/netvsc_drv.c   |    0
>  drivers/{staging/hv => net/hyperv}/rndis_filter.c |    0

Because of the renames, the network developers really can't review this.
I suggest posting a new patch that just adds the driver to the
drivers/net/ directory, and having the network developers review it
that way.

Then, when it is accepted, I can delete the version in the staging
directory.  We've done it this way for other drivers and it is the best
way to get proper reviews as well as handling cross-subsystem merge
issues.

thanks,

greg k-h

* Re: [PATCH] staging: hv: move hv_netvsc out of staging area
From: Stephen Hemminger @ 2011-10-14 19:12 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: kys, gregkh, linux-kernel, devel, virtualization, Mike Sterling,
	NetDev
In-Reply-To: <1318620026-8349-1-git-send-email-haiyangz@microsoft.com>

On Fri, 14 Oct 2011 12:20:26 -0700
Haiyang Zhang <haiyangz@microsoft.com> wrote:

> hv_netvsc was reviewed on the netdev mailing list on 6/09/2011.
> All recommended changes have been made. We are requesting that it be
> moved out of the staging area.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: KY Srinivasan <kys@microsoft.com>
> Signed-off-by: Mike Sterling <Mike.Sterling@microsoft.com>
> Cc: NetDev <netdev@vger.kernel.org>

Thanks for all the work.

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

* Re: [PATCH net-next] niu: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-14 18:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111014.003427.1515514811425011051.davem@davemloft.net>

On Friday, 14 October 2011 at 00:34 -0400, David Miller wrote:
> 
> It would be pretty amazing for a leak of this magnitude to exist for
> so long. :-)
> 
> A page can be split into multiple blocks, each of which is some power
> of two in size.
> 
> The chip splits up "blocks" into smaller (also power of two)
> fragments, and these fragments are what we attach to the SKBs.
> 
> So at the top level we give the chip blocks.  We try to make this
> equal to PAGE_SIZE.  But if PAGE_SIZE is really large we limit the
> block size to 1 << 15.  Note that it is only when we enforce this
> block size limit that the compound_head(page)->_count atomic increment
> will occur.  As long as PAGE_SIZE <= 1 << 15, rbr_blocks_per_page
> will be 1.
> 
> When the chip takes a block and starts using it, it decides which
> fragment size to use for that block.  Once a fragment size has been
> chosen for a block, it will not change.
> 
> The fragment sizes the chip can use are stored in rp->rbr_sizes[].  We
> always configure the chip to use 256-byte and 1024-byte fragments, then
> depending upon the MTU and the PAGE_SIZE we'll optionally enable other
> sizes such as 2048, 4096, and 8192.
> 
> When we get an RX packet the descriptor tells us the DMA address
> and the fragment size in use for the block that the memory at
> that DMA address belongs to.
> 
> So the two separate page reference count grabs you see are handling
> references for memory being chopped up at two different levels.
> 
> I can't see how we could optimize the intra-block refcounts any
> further.  Part of the problem is that we don't know a priori what
> fragment size the chip will use for a given block.
> 

Thanks for taking the time to explain this David :)
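The two-level split Dave describes can be modelled as plain arithmetic. This is a minimal user-space sketch, not niu.c itself: the struct `rbr_layout` and the helpers below are hypothetical names, and it assumes the block size is min(PAGE_SIZE, 1 << 15) as described in the quoted explanation.

```c
#include <stdint.h>

/* Hypothetical model of the RX buffer layout described above: a page
 * is split into blocks of at most 1 << 15 bytes, and the chip splits
 * each block into fragments of one fixed, power-of-two size. */
#define MAX_BLOCK_SHIFT 15u

struct rbr_layout {
	uint32_t block_size;      /* bytes per block handed to the chip */
	uint32_t blocks_per_page; /* 1 unless PAGE_SIZE > 1 << 15 */
};

static struct rbr_layout rbr_layout_for(uint32_t page_shift)
{
	uint32_t bss = page_shift < MAX_BLOCK_SHIFT ? page_shift
						    : MAX_BLOCK_SHIFT;
	struct rbr_layout l;

	l.block_size = 1u << bss;
	l.blocks_per_page = 1u << (page_shift - bss);
	return l;
}

/* First level: extra page references taken when the page is posted.
 * The allocation already holds one, so only blocks_per_page - 1 are
 * added, which is zero whenever PAGE_SIZE <= 1 << 15. */
static uint32_t page_refs_at_post(struct rbr_layout l)
{
	return l.blocks_per_page - 1;
}

/* Second level: how many fragments one block yields once the chip has
 * picked a fragment size for it; fragments attached to skbs are the
 * intra-block references. */
static uint32_t frags_per_block(struct rbr_layout l, uint32_t frag_size)
{
	return l.block_size / frag_size;
}
```

With 4 KiB pages (`page_shift` 12) this gives one block per page and no extra reference at post time; with 64 KiB pages it gives two 32 KiB blocks and one extra reference, which is the only case where the block-size limit matters.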

* Re: [PATCH 2/4] drivers/net/can: the mailinglist moved to vger.kernel.org
From: David Miller @ 2011-10-14 18:00 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can
In-Reply-To: <1318581817-12352-3-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Fri, 14 Oct 2011 10:43:35 +0200

> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Same comment as for the net/can patch: just get rid of these
references altogether, please.

Thanks.

* Re: [PATCH 3/4] net/can: the mailinglist moved to vger.kernel.org
From: David Miller @ 2011-10-14 17:59 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can
In-Reply-To: <1318581817-12352-4-git-send-email-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Fri, 14 Oct 2011 10:43:36 +0200

> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Please, just remove these references from the source files, they
absolutely do not belong there.

The MAINTAINERS file is the place to obtain this kind of information
and automated tools can let you know which MAINTAINERS entry applies
to a particular source file as long as the entry has appropriate file
tags.

Thanks.

* Re: [PATCH net-next] tcp: reduce memory needs of out of order queue
From: Eric Dumazet @ 2011-10-14 17:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1318579509.2533.110.camel@edumazet-laptop>

On Friday, 14 October 2011 at 10:05 +0200, Eric Dumazet wrote:

> This patch specifically addresses the OFO problem, trying to lower
> memory usage for machines handling a lot of sockets (proxies, for example)

Well, this needs a bit more thought, so please zap the patch.

Thanks

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: Michael Chan @ 2011-10-14 16:15 UTC (permalink / raw)
  To: 'Rick Jones'
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	Dmitry Kravkov, Eilon Greenstein
In-Reply-To: <4E985DE8.3090308@hp.com>

Rick Jones wrote:

> On 10/14/2011 08:53 AM, Michael Chan wrote:
> > Rick Jones wrote:
> >
> >> Is this perhaps saying that a bnx2x-driven device being used for
> >> FCoE or iSCSI boot must not permit *any* run-time configuration
> >> change which leads to a NIC reset?
> >>
> >
> > That is right, unless you have a multipath configuration with multiple
> > ports, in which case you can reset one port at a time.
> 
> So, should there also be a "cnic_boot_device" check in many of the
> "capital letter" ethtool paths?
> 

If the user is making ethtool configuration changes or shutting the
device down, the consequences are more obvious.  The user can also take
care to make such changes only on a multipath setup.

The reset caused by the automatic disabling of LRO when you enable
ip_forward or bridging will not be obvious to the user.  In addition,
all devices with LRO turned on will be reset at the same time, so even
a multipath setup will not survive.
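A toy user-space model of the point above: enabling forwarding turns LRO off on every port that has it, each change resets that port, so all boot paths go down together. The `struct port` type and both functions are hypothetical; the real kernel path runs through dev_disable_lro(), which this sketch does not reproduce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; not a kernel structure. */
struct port {
	bool lro;       /* LRO currently enabled */
	bool boot_path; /* port carries the FCoE/iSCSI boot target */
	int resets;     /* NIC resets seen by this port */
};

/* Model of enabling ip_forward or bridging: every port with LRO on
 * has it turned off, and each such change resets that port. All of
 * them happen in the same pass, i.e. at the same time. */
static void enable_forwarding(struct port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ports[i].lro) {
			ports[i].lro = false;
			ports[i].resets++;
		}
	}
}

/* Count boot-path ports that were reset. If this equals the number
 * of boot paths, multipath leaves no surviving path. */
static size_t boot_paths_reset(const struct port *ports, size_t n)
{
	size_t hit = 0;

	for (size_t i = 0; i < n; i++)
		if (ports[i].boot_path && ports[i].resets > 0)
			hit++;
	return hit;
}
```

In the model, two boot ports with LRO on are both reset by one forwarding change, while two boot ports that had LRO off from the start see no reset at all, which is the situation the patch aims for.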

* Re: [PATCH net-next] tcp: reduce memory needs of out of order queue
From: Eric Dumazet @ 2011-10-14 16:11 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev
In-Reply-To: <1318608052.2223.35.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Friday, 14 October 2011 at 18:00 +0200, Eric Dumazet wrote:

> We could also do the copybreak for frames queued into the regular
> receive_queue, if the current wmem_alloc is above 25% of rcvbuf space...

I mean rmem_alloc of course...


diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c1653fe..0fe0828 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4426,6 +4426,25 @@ static inline int tcp_try_rmem_schedule(struct sock *sk, unsigned int size)
 	return 0;
 }
 
+/*
+ * Caller wants to reduce memory needs before queueing skb.
+ * The (expensive) copy should not be done in the fast path.
+ */
+static struct sk_buff *skb_reduce_truesize(struct sk_buff *skb)
+{
+	if (skb->truesize > 2 * SKB_TRUESIZE(skb->len)) {
+		struct sk_buff *nskb;
+
+		nskb = skb_copy_expand(skb, skb_headroom(skb), 0,
+				       GFP_ATOMIC | __GFP_NOWARN);
+		if (nskb) {
+			__kfree_skb(skb);
+			skb = nskb;
+		}
+	}
+	return skb;
+}
+
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcphdr *th = tcp_hdr(skb);
@@ -4475,6 +4494,10 @@ queue_and_out:
 			    tcp_try_rmem_schedule(sk, skb->truesize))
 				goto drop;
 
+			if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf >> 2) {
+				skb = skb_reduce_truesize(skb);
+				th = tcp_hdr(skb);
+			}
 			skb_set_owner_r(skb, sk);
 			__skb_queue_tail(&sk->sk_receive_queue, skb);
 		}
@@ -4553,6 +4576,11 @@ drop:
 	SOCK_DEBUG(sk, "out of order segment: rcv_next %X seq %X - %X\n",
 		   tp->rcv_nxt, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq);
 
+	/* Since this skb might stay on ofo a long time, try to reduce
+	 * its truesize (if it's too big) to avoid future pruning.
+	 * Many drivers allocate large buffers even to hold tiny frames.
+	 */
+	skb = skb_reduce_truesize(skb);
 	skb_set_owner_r(skb, sk);
 
 	if (!skb_peek(&tp->out_of_order_queue)) {
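The patch uses two thresholds: copy only when `truesize` is more than twice `SKB_TRUESIZE(len)`, and, on the in-order path, only when `sk_rmem_alloc` exceeds a quarter of `sk_rcvbuf`. A user-space sketch of both tests follows. `ASSUMED_SKB_OVERHEAD` is an assumption standing in for the aligned sizes of `struct sk_buff` and `struct skb_shared_info`, which depend on the kernel build; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed metadata overhead; the real value is the aligned size of
 * struct sk_buff plus struct skb_shared_info on the running kernel. */
#define ASSUMED_SKB_OVERHEAD 576u

/* Mirrors SKB_TRUESIZE(X): payload plus fixed per-skb metadata. */
static uint32_t skb_truesize_model(uint32_t len)
{
	return len + ASSUMED_SKB_OVERHEAD;
}

/* Same test as skb_reduce_truesize(): copy only when the buffer costs
 * more than twice what a tightly sized skb for this payload would. */
static bool worth_copying(uint32_t truesize, uint32_t len)
{
	return truesize > 2 * skb_truesize_model(len);
}

/* In-order path: only consider the copy once the socket's receive
 * memory is above 25% of its receive buffer (sk_rcvbuf >> 2). */
static bool rmem_pressure(uint32_t rmem_alloc, uint32_t rcvbuf)
{
	return rmem_alloc > (rcvbuf >> 2);
}
```

With these assumed numbers, a 100-byte segment in a buffer charged at 4608 bytes would be copied, while a full 1448-byte segment in a 2304-byte buffer would be left alone, so well-sized drivers pay nothing.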

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: Rick Jones @ 2011-10-14 16:06 UTC (permalink / raw)
  To: Michael Chan
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	Dmitry Kravkov, Eilon Greenstein
In-Reply-To: <C27F8246C663564A84BB7AB34397724266D2018303@IRVEXCHCCR01.corp.ad.broadcom.com>

On 10/14/2011 08:53 AM, Michael Chan wrote:
> Rick Jones wrote:
>
>> Is this perhaps saying that a bnx2x-driven device being used for
>> FCoE or iSCSI boot must not permit *any* run-time configuration
>> change which leads to a NIC reset?
>>
>
> That is right, unless you have a multipath configuration with multiple
> ports, in which case you can reset one port at a time.

So, should there also be a "cnic_boot_device" check in many of the 
"capital letter" ethtool paths?

rick

* Re: [PATCH net-next] tcp: reduce memory needs of out of order queue
From: Eric Dumazet @ 2011-10-14 16:00 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Miller, netdev
In-Reply-To: <4E985A3F.5080103@hp.com>

On Friday, 14 October 2011 at 08:50 -0700, Rick Jones wrote:

> Is the wireless problem strictly a wireless problem?  Many of the 
> drivers where Eric has been fixing the truesize accounting have been 
> wired devices, no?

Yes, but the goal of such fixes is to make the bugs show up with those
wired devices too ;)

As for WiFi, I see these TCP collapses on two different machines, one
using the drivers/net/wireless/rt2x00 driver.

Extract from drivers/net/wireless/rt2x00/rt2x00queue.h

/**
 * DOC: Entry frame size
 * 
 * Ralink PCI devices demand the Frame size to be a multiple of 128 bytes,
 * for USB devices this restriction does not apply, but the value of
 * 2432 makes sense since it is big enough to contain the maximum fragment
 * size according to the ieee802.11 specs. 
 * The aggregation size depends on support from the driver, but should
 * be something around 3840 bytes.
 */
#define DATA_FRAME_SIZE         2432
#define MGMT_FRAME_SIZE         256
#define AGGREGATION_SIZE        3840

You can see why we end up using buffers with skb->truesize > 4096.

I liked doing the copybreak only when needed; I found that the OFO case
was responsible for most of the collapses.

We could also do the copybreak for frames queued into the regular
receive_queue, if the current wmem_alloc is above 25% of rcvbuf space...

* Re: [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: Michael Chan @ 2011-10-14 15:53 UTC (permalink / raw)
  To: 'Rick Jones'
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	Dmitry Kravkov, Eilon Greenstein
In-Reply-To: <4E9855EC.1020509@hp.com>

Rick Jones wrote:

> Is this perhaps saying that a bnx2x-driven device being used for FCoE
> or iSCSI boot must not permit *any* run-time configuration change which
> leads to a NIC reset?
> 

That is right, unless you have a multipath configuration with multiple
ports, in which case you can reset one port at a time.

* Re: [PATCH net-next] tcp: reduce memory needs of out of order queue
From: Rick Jones @ 2011-10-14 15:50 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20111014.034224.1197576516015404466.davem@davemloft.net>

On 10/14/2011 12:42 AM, David Miller wrote:

> No objection from me, although I wish wireless drivers were able to
> size their SKBs more appropriately.  I wonder how many problems that
> look like "OMG we gotz da Buffer Bloat, arrr!" are actually due to
> this truesize issue.

I think the buffer bloat folks are looking at latency through transmit
queues.  Perhaps some of their latency really comes from retransmissions
of packets dropped because socket buffers overfill, but I'm pretty sure
they are clever enough to look for that.

> I think such large truesize SKBs will cause problems even in non loss
> situations, in that the receive buffer will hit its limits more
> quickly.  I'm not sure that the receive buffer autotuning is built to
> handle this sort of scenario as a common occurrence.

I believe that may be the case - at least during something like:

netperf -t TCP_RR -H <host> -l 30 -- -b 256 -D

which on an otherwise quiet test setup will report a non-trivial number
of retransmissions, visible either in netstat -s output or by
adding local_transport_retrans,remote_transport_retrans to an output
selector for netperf (e.g. -o
throughput,burst_size,local_transport_retrans,remote_transport_retrans,lss_size_end,rsr_size_end)

(I plan on providing more data after a laptop has gone through some 
upgrades)

> You might want to check if this is the actual root cause of your
> problems.  If the receive buffer autotuning doesn't expand the receive
> buffer enough to hold two windows worth of these large truesize SKBs,
> that's the real reason why we end up pruning.
>
> We have to decide if these kinds of SKBs are acceptable as a normal
> situation for MSS sized frames.  And if they are then it's probably
> a good idea to adjust the receive buffer autotuning code too.
>
> Although I realize it might be difficult, getting rid of these weird
> SKBs in the first place would be ideal.

That means a semi-arbitrary alloc/copy in drivers, even when the wasted
space isn't going to be a problem, no?  That TCP_RR test above would run
"just fine" if the burst size were much smaller, but an unconditional
allocate/copy would raise the service demand and thus lower the
transaction rate.

> It would also be a good idea to put the truesize inaccuracies into
> perspective when selecting how to fix this.  It's trying to prevent
> 1 byte packets not accounting for the 256 byte SKB and metadata.
> That kind of case with such a high ratio of wastage is important.
>
> On the other hand, using 2048 bytes for a 1500 byte packet and claiming
> the truesize is 1500 + sizeof(metadata)... that might be an acceptable
> lie to tell :-)  This is especially true if it allows an easy solution
> to this wireless problem.

Is the wireless problem strictly a wireless problem?  Many of the 
drivers where Eric has been fixing the truesize accounting have been 
wired devices, no?

rick jones
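Dave's "acceptable lie" trade-off and his two-windows remark can be put into numbers. The sketch below compares the memory a buffer really consumes with what a payload-sized truesize would claim, and estimates the receive buffer needed to queue two windows of such skbs. It reads the 256 bytes in the 1-byte case as the buffer size; `ASSUMED_META`, the window and MSS values in the usage note, and all function names are illustrative assumptions.

```c
#include <stdint.h>

/* Assumed per-skb metadata cost (struct sk_buff and friends). */
#define ASSUMED_META 256u

/* Real cost as a percentage of claimed cost, in integer arithmetic:
 * real = buffer + metadata, claimed = payload + metadata. */
static uint32_t underaccount_pct(uint32_t buf_size, uint32_t payload)
{
	uint64_t real = (uint64_t)buf_size + ASSUMED_META;
	uint64_t claimed = (uint64_t)payload + ASSUMED_META;

	return (uint32_t)(real * 100 / claimed);
}

/* Receive buffer needed to queue two windows' worth of segments of
 * size mss when each segment is charged truesize bytes. */
static uint64_t rcvbuf_for_two_windows(uint32_t window, uint32_t mss,
				       uint32_t truesize)
{
	uint64_t segs = ((uint64_t)window + mss - 1) / mss; /* round up */

	return 2 * segs * truesize;
}
```

Under these assumptions the 1-byte case really costs about 199% of what it claims, while 1500 bytes in a 2048-byte buffer costs about 131%. For a 64 KiB window of 1448-byte segments, two windows need 423936 bytes of rcvbuf at a 4608-byte truesize but only 211968 bytes at 2304.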
