Netdev List


* [RFC PATCH 3/4] net: VSI: Add virtual station interface support
From: John Fastabend @ 2013-09-11 18:47 UTC
  To: stephen, bhutchings, ogerlitz
  Cc: vfalico, john.ronciak, netdev, shannon.nelson
In-Reply-To: <20130911184441.26914.10336.stgit@nitbit.x32>

This patch adds support for a new device type, VSI (virtual station
interface). This device type exposes additional net devices, complete
with queues and a MAC/VLAN pair, to the host OS. These devices are
logically stacked on top of a switching/routing component, with the
physical link acting as the downlink to the peer switch.

On the receive path the hardware forwards packets to the new VSI
net device using the forwarding database (FDB) already exposed via
the ndo ops ndo_fdb_{add|del|dump}. On transmit the hardware may
use either a VEB (virtual Ethernet bridge) or a VEPA (virtual Ethernet
port aggregator). In the VEB case traffic may be "switched" between
VSI net devices by the hardware; in the VEPA case all traffic is sent
to the adjacent switch. The hardware _should_ expose this functionality
via the ndo_bridge_{set|get}link ndo operations.
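
The transmit-side difference between the two modes can be sketched in
plain user-space C. This is only an illustration, not code from the
patch: the names bridge_mode, fdb_entry, fdb_lookup and tx_egress_port
are hypothetical, and sending an unknown destination to the uplink in
VEB mode is an assumption rather than something stated above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; none of these come from the patch. */
enum bridge_mode { MODE_VEB, MODE_VEPA };

#define PORT_UPLINK (-1)	/* frame leaves through the physical link */

struct fdb_entry {
	uint8_t mac[6];
	uint16_t vlan;
	int port;		/* index of the VSI that owns this MAC/VLAN */
};

/* Look up a MAC/VLAN pair; return the owning port or PORT_UPLINK. */
int fdb_lookup(const struct fdb_entry *fdb, size_t n,
	       const uint8_t mac[6], uint16_t vlan)
{
	for (size_t i = 0; i < n; i++)
		if (fdb[i].vlan == vlan && memcmp(fdb[i].mac, mac, 6) == 0)
			return fdb[i].port;
	/* Assumption: destinations not in the FDB go to the uplink. */
	return PORT_UPLINK;
}

/* Transmit decision: VEB may switch locally, VEPA always uses the uplink. */
int tx_egress_port(enum bridge_mode mode, const struct fdb_entry *fdb,
		   size_t n, const uint8_t dst[6], uint16_t vlan)
{
	if (mode == MODE_VEPA)
		return PORT_UPLINK;
	return fdb_lookup(fdb, n, dst, vlan);
}
```

In this model the same FDB serves both modes; only the transmit
decision changes, which is why the mode can be a bridge-level setting
separate from the per-device FDB entries.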

This net device should be functionally analogous to an offloaded
macvlan device with the ebridge component offloaded into hardware.

Also note that, for now, the ixgbe implementation accompanying this
patch set only supports L2 forwarding. The fdb interfaces could push
L3/L4 forwarding to the hardware for more advanced usages, including
vxlan and other tunnel schemes.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/Kconfig       |    9 +++
 drivers/net/Makefile      |    1 
 drivers/net/vsi.c         |  124 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h |   27 ++++++++++
 include/uapi/linux/if.h   |    1 
 5 files changed, 162 insertions(+)
 create mode 100644 drivers/net/vsi.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b45b240..19be0fb 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -362,4 +362,13 @@ config VMXNET3
 
 source "drivers/net/hyperv/Kconfig"
 
+config VSI
+	tristate "Virtual Station Interfaces (VSI)"
+	help
+	  This supports chip sets with embedded switching components
+	  and allows creating additional net devices that are
+	  logically slaves of a master net device typically the net
+	  device associated with the physical function. For these
+	  child devices switching occurs in the hardware component.
+
 endif # NETDEVICES
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3fef8a8..3ef1d66 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
 obj-$(CONFIG_NLMON) += nlmon.o
+obj-$(CONFIG_VSI) += vsi.o
 
 #
 # Networking Drivers
diff --git a/drivers/net/vsi.c b/drivers/net/vsi.c
new file mode 100644
index 0000000..e9d39da
--- /dev/null
+++ b/drivers/net/vsi.c
@@ -0,0 +1,124 @@
+/*
+ * VSI - Virtual Station Interface
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * John Fastabend <john.r.fastabend@intel.com>
+ */
+#include <linux/module.h>
+#include <net/rtnetlink.h>
+#include <linux/etherdevice.h>
+
+size_t vsi_priv_size(struct net *src_net, struct nlattr *tb[])
+{
+	struct net_device *dev;
+	size_t size = 0;
+
+	if (!tb[IFLA_LINK])
+		return 0;
+
+	dev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+	if (!dev)
+		return -ENODEV;
+
+	if (dev->netdev_ops->ndo_vsi_size)
+		size = dev->netdev_ops->ndo_vsi_size(dev);
+	return size;
+}
+
+static int vsi_newlink(struct net *src_net, struct net_device *dev,
+		       struct nlattr *tb[], struct nlattr *data[])
+{
+	struct net_device *lower;
+	int err;
+
+	if (!tb[IFLA_LINK])
+		return -EINVAL;
+
+	lower = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
+	if (!lower)
+		return -ENODEV;
+
+	if (!tb[IFLA_MTU])
+		dev->mtu = lower->mtu;
+	else if (lower->mtu > dev->mtu)
+		return -EINVAL;
+
+	dev->priv_flags |= IFF_VSI_PORT;
+	err = lower->netdev_ops->ndo_vsi_add(lower, dev);
+	if (err < 0)
+		return err;
+
+	err = netdev_upper_dev_link(lower, dev);
+	if (err)
+		goto destroy_port;
+
+	err = register_netdevice(dev);
+	if (err < 0)
+		goto upper_dev_unlink;
+
+	netif_stacked_transfer_operstate(lower, dev);
+	return 0;
+upper_dev_unlink:
+	netdev_upper_dev_unlink(lower, dev);
+destroy_port:
+	if (lower->netdev_ops->ndo_vsi_del)
+		lower->netdev_ops->ndo_vsi_del(dev);
+	return err;
+}
+
+void vsi_dellink(struct net_device *dev, struct list_head *head)
+{
+	struct net_device *lower;
+	struct list_head *iter;
+
+	netdev_for_each_lower_dev_rcu(dev, lower, iter) {
+		if (lower->netdev_ops->ndo_vsi_del)
+			lower->netdev_ops->ndo_vsi_del(dev);
+		netdev_upper_dev_unlink(lower, dev);
+	}
+
+	unregister_netdevice_queue(dev, head);
+}
+
+static struct rtnl_link_ops vsi_link_ops __read_mostly = {
+	.kind		= "vsi",
+	.priv_size	= vsi_priv_size,
+	.setup		= ether_setup,
+	.newlink	= vsi_newlink,
+	.dellink	= vsi_dellink,
+};
+
+static int __init vsi_init_module(void)
+{
+	return rtnl_link_register(&vsi_link_ops);
+}
+
+static void __exit vsi_cleanup_module(void)
+{
+	rtnl_link_unregister(&vsi_link_ops);
+}
+
+module_init(vsi_init_module);
+module_exit(vsi_cleanup_module);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("John Fastabend <john.r.fastabend@intel.com>");
+MODULE_DESCRIPTION("Virtual Station Interfaces (VSI)");
+MODULE_ALIAS_RTNL_LINK("vsi");
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4d24b38..9817745 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -961,6 +961,24 @@ struct netdev_phys_port_id {
  *	Called by vxlan to notify the driver about a UDP port and socket
  *	address family that vxlan is not listening to anymore. The operation
  *	is protected by the vxlan_net->sock_lock.
+ *
+ * int (*ndo_vsi_add)(struct net_device *lower, struct net_device *dev)
+ *	Called by the virtual station interface (VSI) link type to add a new
+ *	net device 'dev' to an embedded switch where the embedded switch
+ *	management net device is identified by 'lower'. This should return
+ *	0 on success or may return negative error codes. Error codes should
+ *	be used here to signify resource constraints, unsupportable attributes,
+ *	or any other condition which caused the creation to fail.
+ * void (*ndo_vsi_del)(struct net_device *dev)
+ *	Called by the virtual station interface (VSI) link type to remove the
+ *	net device 'dev' from an embedded switch. Drivers may not fail this
+ *	command.
+ * size_t (*ndo_vsi_size)(struct net_device *dev)
+ *	Called by the virtual station interface (VSI) link type to add the
+ *	required private size to a VSI interface that is being created. If
+ *	this routine is not implemented size_t 0 is used. The 'dev' argument
+ *	indicates the embedded switch management interface where the new
+ *	net device is being attached.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1097,6 +1115,10 @@ struct net_device_ops {
 	void			(*ndo_del_vxlan_port)(struct  net_device *dev,
 						      sa_family_t sa_family,
 						      __u16 port);
+	int			(*ndo_vsi_add)(struct net_device *lower,
+					       struct net_device *dev);
+	void			(*ndo_vsi_del)(struct net_device *dev);
+	size_t			(*ndo_vsi_size)(struct net_device *dev);
 };
 
 /*
@@ -2967,6 +2989,11 @@ static inline bool netif_supports_nofcs(struct net_device *dev)
 	return dev->priv_flags & IFF_SUPP_NOFCS;
 }
 
+static inline bool netif_is_vsi_port(struct net_device *dev)
+{
+	return dev->priv_flags & IFF_VSI_PORT;
+}
+
 extern struct pernet_operations __net_initdata loopback_net_ops;
 
 /* Logging, debugging and troubleshooting/diagnostic helpers. */
diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 1ec407b..9b8d6a0 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -83,6 +83,7 @@
 #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
 #define IFF_LIVE_ADDR_CHANGE 0x100000	/* device supports hardware address
 					 * change when it's running */
+#define IFF_VSI_PORT 0x200000		/* Virtual Station Interface port */
 
 
 #define IF_GET_IFACE	0x0001		/* for querying only */


* [RFC PATCH 4/4] ixgbe: Adding VSI support to ixgbe
From: John Fastabend @ 2013-09-11 18:47 UTC
  To: stephen, bhutchings, ogerlitz
  Cc: vfalico, john.ronciak, netdev, shannon.nelson
In-Reply-To: <20130911184441.26914.10336.stgit@nitbit.x32>

This adds initial base support for VSI interfaces to the ixgbe
driver.

This supports up to 32 VSIs per physical function, with each VSI
supporting up to 4 TX/RX queue pairs. The hardware has support
for modes using 2 queue pairs that enable up to 64 VSIs, but they
are not enabled here. VSIs can be instantiated with 1, 2, or 4
queues. Unfortunately, due to hardware restrictions related to RSS
masking across queues, 3 TX/RX queues are not supported. If a user
inputs a queue parameter of 3 for either TX or RX, the vsi_add
will fail with EINVAL and a message will be printed to dmesg
explaining the error. EBUSY is returned if more than 32 VSIs are
added.
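
The queue-count rule described above can be sketched as a small
stand-alone check. This is a user-space illustration under stated
assumptions, not the driver's code: vsi_check_queues is a hypothetical
name, the two constants only mirror IXGBE_MAX_VSI_QUEUES and
IXGBE_BAD_VSI_QUEUE from ixgbe.h, and rejecting a count of 0 is an
assumption.

```c
#include <errno.h>

#define MAX_VSI_QUEUES 4	/* mirrors IXGBE_MAX_VSI_QUEUES in the patch */
#define BAD_VSI_QUEUE  3	/* mirrors IXGBE_BAD_VSI_QUEUE in the patch */

/* Return 0 if the requested TX/RX queue counts are usable, else -EINVAL. */
int vsi_check_queues(unsigned int txq, unsigned int rxq)
{
	/* Assumption: a VSI needs at least one queue in each direction. */
	if (txq == 0 || rxq == 0)
		return -EINVAL;
	if (txq > MAX_VSI_QUEUES || rxq > MAX_VSI_QUEUES)
		return -EINVAL;
	/* 3 queues cannot be expressed by the RSS mask, so reject it. */
	if (txq == BAD_VSI_QUEUE || rxq == BAD_VSI_QUEUE)
		return -EINVAL;
	return 0;
}
```

With this check, requests such as (1, 1), (2, 4) and (4, 4) pass, while
any request containing a 3 is refused.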

The driver uses a bitmask added to the ixgbe_adapter struct to
track VSIs. VSIs spawn net devices that can be independently
managed via their own ndo and ethtool ops.
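
Bitmask tracking of this kind can be sketched as follows. This is a
user-space model rather than the driver's implementation: pool_alloc
and pool_free are hypothetical names, and the mask is modelled as 32
slots. The patch reserves bit 0 for the PF in ixgbe_sw_init(); how that
reserved slot counts against the 32-VSI limit is not shown in this
excerpt, so the sketch makes no claim about it.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_POOLS 32	/* assumption: one bit per pool, 32 pools total */

/* Claim the lowest free pool; return its index or -EBUSY if all are used. */
int pool_alloc(uint32_t *mask)
{
	for (int i = 0; i < NUM_POOLS; i++) {
		if (!(*mask & (UINT32_C(1) << i))) {
			*mask |= UINT32_C(1) << i;
			return i;
		}
	}
	return -EBUSY;
}

/* Release a previously claimed pool so it can be reused. */
void pool_free(uint32_t *mask, int pool)
{
	*mask &= ~(UINT32_C(1) << pool);
}
```

Starting from a mask of 1 (slot 0 already claimed), successive calls
hand out pools 1, 2, 3 and so on, and a freed pool is reused by the
next allocation.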

Currently, MAC addresses must be added manually via 'ip link set'
or similar configuration (netlink) commands before bringing the
spawned devices online. I could easily generate random MAC addresses
if that is preferred, although it is not required in my use case and
seems a bit arbitrary. I expect most users have a pool of known "good"
MAC addresses they can use, or for testing can simply pick their
favorite address.

Additionally, VSIs do not support promisc mode. It is not clear,
to me at least, what promiscuous mode means when the VSI is behind
a non-learning embedded bridge which does not do flooding. The
hardware does not support flooding on the embedded bridge; however,
it does support mirroring, which could be enabled to get something
that looks like promiscuous mode. Currently there is no kernel
support (netlink cmd?) to configure mirroring.

The embedded bridge that connects VSIs looks like an edge relay
in IEEE std terms and can support both VEB and VEPA modes, which
are managed independently of VSIs through the {set|get} bridge
APIs.

VSIs only support a single MAC address in this patch. This
is a software limitation and can/will be extended in subsequent
patches.

NOTE:
The SRIOV and VMDQ flags in ixgbe seem to be a bit overloaded
and convoluted. This patch uses these flags. A follow up patch
could clean up the ixgbe flag usage scheme.

RFC TODO: I'll complete these before sending the real patch
(1) interop with DCB/FCOE/SR-IOV
(2) vmdq_netdev can be dropped from tx_ring in favor of a
    correctly set netdev. In the slowpath we can look up the management
    netdev via vadapter->real_adapter->netdev and save the hotpath
    vmdq_netdev if/else branch.

Otherwise I've been adding/deleting VSIs with multiple netperf
TCP sessions without any crashes, but I still need to do some
more testing.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/Makefile        |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   32 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    4 
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   15 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  307 +++++++++++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c     |  428 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h     |   71 ++++
 7 files changed, 766 insertions(+), 94 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h

diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile b/drivers/net/ethernet/intel/ixgbe/Makefile
index be2989e..24136e7 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -34,7 +34,8 @@ obj-$(CONFIG_IXGBE) += ixgbe.o
 
 ixgbe-objs := ixgbe_main.o ixgbe_common.o ixgbe_ethtool.o \
               ixgbe_82599.o ixgbe_82598.o ixgbe_phy.o ixgbe_sriov.o \
-              ixgbe_mbx.o ixgbe_x540.o ixgbe_lib.o ixgbe_ptp.o
+              ixgbe_mbx.o ixgbe_x540.o ixgbe_lib.o ixgbe_ptp.o \
+	      ixgbe_vsi.o
 
 ixgbe-$(CONFIG_IXGBE_DCB) +=  ixgbe_dcb.o ixgbe_dcb_82598.o \
                               ixgbe_dcb_82599.o ixgbe_dcb_nl.o
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 0ac6b11..ba2ab14 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -244,6 +244,7 @@ struct ixgbe_ring {
 	unsigned long last_rx_timestamp;
 	unsigned long state;
 	u8 __iomem *tail;
+	struct net_device *vmdq_netdev;
 	dma_addr_t dma;			/* phys. address of descriptor ring */
 	unsigned int size;		/* length in bytes */
 
@@ -288,11 +289,15 @@ enum ixgbe_ring_f_enum {
 };
 
 #define IXGBE_MAX_RSS_INDICES  16
-#define IXGBE_MAX_VMDQ_INDICES 64
+#define IXGBE_MAX_VMDQ_INDICES 32
 #define IXGBE_MAX_FDIR_INDICES 63	/* based on q_vector limit */
 #define IXGBE_MAX_FCOE_INDICES  8
 #define MAX_RX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES (IXGBE_MAX_FDIR_INDICES + 1)
+#define IXGBE_MAX_VSI_QUEUES 4
+#define IXGBE_MAX_VSI_QUEUES 4
+#define IXGBE_BAD_VSI_QUEUE 3
+
 struct ixgbe_ring_feature {
 	u16 limit;	/* upper limit on feature indices */
 	u16 indices;	/* current value of indices */
@@ -738,6 +743,7 @@ struct ixgbe_adapter {
 #endif /*CONFIG_DEBUG_FS*/
 
 	u8 default_up;
+	unsigned long vsi_bitmask; /* Bitmask indicating in use pools */
 };
 
 struct ixgbe_fdir_filter {
@@ -747,6 +753,17 @@ struct ixgbe_fdir_filter {
 	u16 action;
 };
 
+struct ixgbe_vsi_adapter {
+	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
+	struct net_device *netdev;
+	struct ixgbe_adapter *real_adapter;
+	unsigned int tx_base_queue;
+	unsigned int rx_base_queue;
+	struct net_device_stats net_stats;
+	int pool;
+	bool online;
+};
+
 enum ixgbe_state_t {
 	__IXGBE_TESTING,
 	__IXGBE_RESETTING,
@@ -879,9 +896,14 @@ static inline void ixgbe_dbg_adapter_exit(struct ixgbe_adapter *adapter) {}
 static inline void ixgbe_dbg_init(void) {}
 static inline void ixgbe_dbg_exit(void) {}
 #endif /* CONFIG_DEBUG_FS */
+static inline struct net_device *netdev_ring(const struct ixgbe_ring *ring)
+{
+	return ring->vmdq_netdev ? ring->vmdq_netdev : ring->netdev;
+}
+
 static inline struct netdev_queue *txring_txq(const struct ixgbe_ring *ring)
 {
-	return netdev_get_tx_queue(ring->netdev, ring->queue_index);
+	return netdev_get_tx_queue(netdev_ring(ring), ring->queue_index);
 }
 
 extern void ixgbe_ptp_init(struct ixgbe_adapter *adapter);
@@ -915,4 +937,10 @@ extern void ixgbe_ptp_check_pps_event(struct ixgbe_adapter *adapter, u32 eicr);
 void ixgbe_sriov_reinit(struct ixgbe_adapter *adapter);
 #endif
 
+int ixgbe_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd);
+int ixgbe_write_uc_addr_list(struct net_device *netdev);
+netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
+				  struct ixgbe_adapter *adapter,
+				  struct ixgbe_ring *tx_ring);
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring);
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 0e1b973..48b2d81 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -150,8 +150,8 @@ static const char ixgbe_gstrings_test[][ETH_GSTRING_LEN] = {
 };
 #define IXGBE_TEST_LEN sizeof(ixgbe_gstrings_test) / ETH_GSTRING_LEN
 
-static int ixgbe_get_settings(struct net_device *netdev,
-                              struct ethtool_cmd *ecmd)
+int ixgbe_get_settings(struct net_device *netdev,
+		       struct ethtool_cmd *ecmd)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 90b4e10..e2dd635 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -500,7 +500,8 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 #endif
 
 	/* only proceed if SR-IOV is enabled */
-	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
+	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED) &&
+	    !(adapter->flags & IXGBE_FLAG_VMDQ_ENABLED))
 		return false;
 
 	/* Add starting offset to total pool count */
@@ -852,7 +853,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 
 		/* apply Tx specific ring traits */
 		ring->count = adapter->tx_ring_count;
-		ring->queue_index = txr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				txr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = txr_idx;
 
 		/* assign ring to adapter */
 		adapter->tx_ring[txr_idx] = ring;
@@ -895,7 +900,11 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 #endif /* IXGBE_FCOE */
 		/* apply Rx specific ring traits */
 		ring->count = adapter->rx_ring_count;
-		ring->queue_index = rxr_idx;
+		if (adapter->num_rx_pools > 1)
+			ring->queue_index =
+				rxr_idx % adapter->num_rx_queues_per_pool;
+		else
+			ring->queue_index = rxr_idx;
 
 		/* assign ring to adapter */
 		adapter->rx_ring[rxr_idx] = ring;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7aba452..3a2138a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -52,6 +52,7 @@
 #include "ixgbe_common.h"
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
+#include "ixgbe_vsi.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -872,7 +873,8 @@ static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
 
 static u64 ixgbe_get_tx_pending(struct ixgbe_ring *ring)
 {
-	struct ixgbe_adapter *adapter = netdev_priv(ring->netdev);
+	struct net_device *dev = ring->netdev;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
 	struct ixgbe_hw *hw = &adapter->hw;
 
 	u32 head = IXGBE_READ_REG(hw, IXGBE_TDH(ring->reg_idx));
@@ -1055,7 +1057,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1072,16 +1074,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 				  total_packets, total_bytes);
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(netdev_ring(tx_ring)) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
 		 */
 		smp_mb();
-		if (__netif_subqueue_stopped(tx_ring->netdev,
+		if (__netif_subqueue_stopped(netdev_ring(tx_ring),
 					     tx_ring->queue_index)
 		    && !test_bit(__IXGBE_DOWN, &adapter->state)) {
-			netif_wake_subqueue(tx_ring->netdev,
+			netif_wake_subqueue(netdev_ring(tx_ring),
 					    tx_ring->queue_index);
 			++tx_ring->tx_stats.restart_queue;
 		}
@@ -1226,7 +1228,7 @@ static inline void ixgbe_rx_hash(struct ixgbe_ring *ring,
 				 union ixgbe_adv_rx_desc *rx_desc,
 				 struct sk_buff *skb)
 {
-	if (ring->netdev->features & NETIF_F_RXHASH)
+	if (netdev_ring(ring)->features & NETIF_F_RXHASH)
 		skb->rxhash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss);
 }
 
@@ -1260,10 +1262,12 @@ static inline void ixgbe_rx_checksum(struct ixgbe_ring *ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
+	struct net_device *dev = netdev_ring(ring);
+
 	skb_checksum_none_assert(skb);
 
 	/* Rx csum disabled */
-	if (!(ring->netdev->features & NETIF_F_RXCSUM))
+	if (!(dev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* if IP and error */
@@ -1559,7 +1563,7 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 				     union ixgbe_adv_rx_desc *rx_desc,
 				     struct sk_buff *skb)
 {
-	struct net_device *dev = rx_ring->netdev;
+	struct net_device *dev = netdev_ring(rx_ring);
 
 	ixgbe_update_rsc_stats(rx_ring, skb);
 
@@ -1739,7 +1743,7 @@ static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
 				  union ixgbe_adv_rx_desc *rx_desc,
 				  struct sk_buff *skb)
 {
-	struct net_device *netdev = rx_ring->netdev;
+	struct net_device *netdev = netdev_ring(rx_ring);
 
 	/* verify that the packet does not have any known errors */
 	if (unlikely(ixgbe_test_staterr(rx_desc,
@@ -1905,7 +1909,7 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+		skb = netdev_alloc_skb_ip_align(netdev_ring(rx_ring),
 						IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
@@ -1986,6 +1990,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	int ddp_bytes;
 	unsigned int mss = 0;
+	struct net_device *netdev = netdev_ring(rx_ring);
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 
@@ -2041,7 +2046,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			/* include DDPed FCoE data */
 			if (ddp_bytes > 0) {
 				if (!mss) {
-					mss = rx_ring->netdev->mtu -
+					mss = netdev->mtu -
 						sizeof(struct fcoe_hdr) -
 						sizeof(struct fc_frame_header) -
 						sizeof(struct fcoe_crc_eof);
@@ -2455,58 +2460,6 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 	}
 }
 
-static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
-					   u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
-static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
-					    u64 qmask)
-{
-	u32 mask;
-	struct ixgbe_hw *hw = &adapter->hw;
-
-	switch (hw->mac.type) {
-	case ixgbe_mac_82598EB:
-		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
-		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
-		break;
-	case ixgbe_mac_82599EB:
-	case ixgbe_mac_X540:
-		mask = (qmask & 0xFFFFFFFF);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
-		mask = (qmask >> 32);
-		if (mask)
-			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
-		break;
-	default:
-		break;
-	}
-	/* skip the flush */
-}
-
 /**
  * ixgbe_irq_enable - Enable default interrupt generation settings
  * @adapter: board private structure
@@ -2946,6 +2899,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 			     struct ixgbe_ring *ring)
 {
+	struct net_device *netdev = netdev_ring(ring);
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
@@ -3005,7 +2959,7 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 		struct ixgbe_q_vector *q_vector = ring->q_vector;
 
 		if (q_vector)
-			netif_set_xps_queue(adapter->netdev,
+			netif_set_xps_queue(netdev,
 					    &q_vector->affinity_mask,
 					    ring->queue_index);
 	}
@@ -3395,7 +3349,7 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 	int rss_i = adapter->ring_feature[RING_F_RSS].indices;
-	int p;
+	u16 pool;
 
 	/* PSRTYPE must be initialized in non 82598 adapters */
 	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
@@ -3412,9 +3366,8 @@ static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
 	else if (rss_i > 1)
 		psrtype |= 1 << 29;
 
-	for (p = 0; p < adapter->num_rx_pools; p++)
-		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(p)),
-				psrtype);
+	for_each_set_bit(pool, &adapter->vsi_bitmask, 32)
+		IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
 }
 
 static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
@@ -3676,6 +3629,8 @@ static void ixgbe_vlan_strip_disable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl &= ~IXGBE_RXDCTL_VME;
@@ -3706,6 +3661,8 @@ static void ixgbe_vlan_strip_enable(struct ixgbe_adapter *adapter)
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		for (i = 0; i < adapter->num_rx_queues; i++) {
+			if (adapter->rx_ring[i]->vmdq_netdev)
+				continue;
 			j = adapter->rx_ring[i]->reg_idx;
 			vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(j));
 			vlnctrl |= IXGBE_RXDCTL_VME;
@@ -3736,15 +3693,16 @@ static void ixgbe_restore_vlan(struct ixgbe_adapter *adapter)
  *                0 on no addresses written
  *                X on writing X addresses to the RAR table
  **/
-static int ixgbe_write_uc_addr_list(struct net_device *netdev)
+int ixgbe_write_uc_addr_list(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_hw *hw = &adapter->hw;
 	unsigned int rar_entries = hw->mac.num_rar_entries - 1;
 	int count = 0;
 
-	/* In SR-IOV mode significantly less RAR entries are available */
-	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+	/* In SR-IOV/VMDQ modes significantly less RAR entries are available */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED ||
+	    adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)
 		rar_entries = IXGBE_MAX_PF_MACVLANS - 1;
 
 	/* return ENOMEM indicating insufficient memory for addresses */
@@ -3765,6 +3723,7 @@ static int ixgbe_write_uc_addr_list(struct net_device *netdev)
 			count++;
 		}
 	}
+
 	/* write the addresses in reverse order to avoid write combining */
 	for (; rar_entries > 0 ; rar_entries--)
 		hw->mac.ops.clear_rar(hw, rar_entries);
@@ -4114,6 +4073,8 @@ static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
 static void ixgbe_configure(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *upper;
+	struct list_head *iter;
 
 	ixgbe_configure_pb(adapter);
 #ifdef CONFIG_IXGBE_DCB
@@ -4126,6 +4087,13 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	ixgbe_configure_virtualization(adapter);
 
 	ixgbe_set_rx_mode(adapter->netdev);
+	netdev_for_each_upper_dev_rcu(adapter->netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+
+		ixgbe_vsi_set_rx_mode(upper);
+	}
+
 	ixgbe_restore_vlan(adapter);
 
 	switch (hw->mac.type) {
@@ -4452,7 +4420,7 @@ void ixgbe_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_rx_ring - Free Rx Buffers per Queue
  * @rx_ring: ring to free buffers from
  **/
-static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
+void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
 	unsigned long size;
@@ -4576,8 +4544,9 @@ static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
 
 void ixgbe_down(struct ixgbe_adapter *adapter)
 {
-	struct net_device *netdev = adapter->netdev;
+	struct net_device *upper, *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct list_head *iter;
 	u32 rxctrl;
 	int i;
 
@@ -4588,6 +4557,14 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	rxctrl = IXGBE_READ_REG(hw, IXGBE_RXCTRL);
 	IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, rxctrl & ~IXGBE_RXCTRL_RXEN);
 
+	/* disable all VSIs */
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+		if (netif_running(upper))
+			ixgbe_vsi_down(upper);
+	}
+
 	/* disable all enabled rx queues */
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		/* this call also flushes the previous write */
@@ -4831,6 +4808,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
 		return -EIO;
 	}
 
+	/* PF holds first pool slot */
+	set_bit(0, &adapter->vsi_bitmask);
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	return 0;
@@ -5136,7 +5115,9 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
 static int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
-	int err;
+	struct net_device *upper;
+	struct list_head *iter;
+	int err, queues;
 
 	/* disallow open during test */
 	if (test_bit(__IXGBE_TESTING, &adapter->state))
@@ -5161,16 +5142,22 @@ static int ixgbe_open(struct net_device *netdev)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	err = netif_set_real_num_tx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_tx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_tx_queues > IXGBE_MAX_VSI_QUEUES)
+		queues = IXGBE_MAX_VSI_QUEUES;
+	else
+		queues = adapter->num_tx_queues;
+
+	err = netif_set_real_num_tx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
-
-	err = netif_set_real_num_rx_queues(netdev,
-					   adapter->num_rx_pools > 1 ? 1 :
-					   adapter->num_rx_queues);
+	if (adapter->num_rx_pools > 1 &&
+	    adapter->num_rx_queues > IXGBE_MAX_VSI_QUEUES)
+		queues = IXGBE_MAX_VSI_QUEUES;
+	else
+		queues = adapter->num_rx_queues;
+	err = netif_set_real_num_rx_queues(netdev, queues);
 	if (err)
 		goto err_set_queues;
 
@@ -5178,6 +5165,16 @@ static int ixgbe_open(struct net_device *netdev)
 
 	ixgbe_up_complete(adapter);
 
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		struct ixgbe_vsi_adapter *vadapter;
+
+		if (!netif_is_vsi_port(upper))
+			continue;
+		vadapter = netdev_priv(upper);
+		if (vadapter->online)
+			ixgbe_vsi_up(upper);
+	}
+
 	return 0;
 
 err_set_queues:
@@ -5208,7 +5205,6 @@ static int ixgbe_close(struct net_device *netdev)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
 	ixgbe_ptp_stop(adapter);
-
 	ixgbe_down(adapter);
 	ixgbe_free_irq(adapter);
 
@@ -5383,9 +5379,10 @@ static void ixgbe_shutdown(struct pci_dev *pdev)
  **/
 void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 {
-	struct net_device *netdev = adapter->netdev;
+	struct net_device *vsi, *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct ixgbe_hw_stats *hwstats = &adapter->stats;
+	struct list_head *iter;
 	u64 total_mpc = 0;
 	u32 i, missed_rx = 0, mpc, bprc, lxon, lxoff, xon_off_tot;
 	u64 non_eop_descs = 0, restart_queue = 0, tx_busy = 0;
@@ -5407,6 +5404,34 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 		adapter->rsc_total_flush = rsc_flush;
 	}
 
+	netdev_for_each_upper_dev_rcu(netdev, vsi, iter) {
+		struct net_device_stats *vmdq_net_stats;
+		struct ixgbe_ring *rx_ring, *tx_ring;
+		struct ixgbe_vsi_adapter *vadapter;
+		unsigned int txq, rxq;
+		int j;
+
+		if (!netif_is_vsi_port(vsi))
+			continue;
+
+		vmdq_net_stats = &vsi->stats;
+		vadapter = netdev_priv(vsi);
+		txq = vadapter->tx_base_queue;
+		rxq = vadapter->rx_base_queue;
+
+		for (j = txq; j < txq + adapter->num_rx_queues_per_pool; j++) {
+			tx_ring = adapter->tx_ring[j];
+			vmdq_net_stats->tx_packets = tx_ring->stats.packets;
+			vmdq_net_stats->tx_bytes = tx_ring->stats.bytes;
+		}
+
+		for (j = rxq; j < rxq + adapter->num_rx_queues_per_pool; j++) {
+			rx_ring = adapter->rx_ring[j];
+			vmdq_net_stats->rx_packets = rx_ring->stats.packets;
+			vmdq_net_stats->rx_bytes = rx_ring->stats.bytes;
+		}
+	}
+
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		struct ixgbe_ring *rx_ring = adapter->rx_ring[i];
 		non_eop_descs += rx_ring->rx_stats.non_eop_descs;
@@ -5743,6 +5768,8 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 link_speed = adapter->link_speed;
+	struct net_device *upper;
+	struct list_head *iter;
 	bool flow_rx, flow_tx;
 
 	/* only continue if link was previously down */
@@ -5798,6 +5825,11 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 
 	/* ping all the active vfs to let them know link has changed */
 	ixgbe_ping_all_vfs(adapter);
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+		netif_carrier_on(upper);
+	}
 }
 
 /**
@@ -5809,6 +5841,8 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct ixgbe_hw *hw = &adapter->hw;
+	struct net_device *upper;
+	struct list_head *iter;
 
 	adapter->link_up = false;
 	adapter->link_speed = 0;
@@ -5829,6 +5863,11 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
 
 	/* ping all the active vfs to let them know link has changed */
 	ixgbe_ping_all_vfs(adapter);
+	netdev_for_each_upper_dev_rcu(netdev, upper, iter) {
+		if (!netif_is_vsi_port(upper))
+			continue;
+		netif_carrier_off(upper);
+	}
 }
 
 /**
@@ -6561,7 +6600,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring,
 
 static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 {
-	netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_stop_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -6573,7 +6612,7 @@ static int __ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-	netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
+	netif_start_subqueue(netdev_ring(tx_ring), tx_ring->queue_index);
 	++tx_ring->tx_stats.restart_queue;
 	return 0;
 }
@@ -6624,6 +6663,9 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_ring *tx_ring)
 {
 	struct ixgbe_tx_buffer *first;
+#ifdef IXGBE_FCOE
+	struct net_device *dev;
+#endif
 	int tso;
 	u32 tx_flags = 0;
 	unsigned short f;
@@ -6715,9 +6757,10 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 	first->protocol = protocol;
 
 #ifdef IXGBE_FCOE
+	dev = netdev_ring(tx_ring);
 	/* setup tx offload for FCoE */
 	if ((protocol == __constant_htons(ETH_P_FCOE)) &&
-	    (tx_ring->netdev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
+	    (dev->features & (NETIF_F_FSO | NETIF_F_FCOE_CRC))) {
 		tso = ixgbe_fso(tx_ring, first, &hdr_len);
 		if (tso < 0)
 			goto out_drop;
@@ -7042,6 +7085,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	 */
 	if (netif_running(dev))
 		ixgbe_close(dev);
+
 	ixgbe_clear_interrupt_scheme(adapter);
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7290,6 +7334,94 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+static const struct net_device_ops ixgbe_vsi_netdev_ops = {
+	.ndo_init		= ixgbe_vsi_init,
+	.ndo_open		= ixgbe_vsi_open,
+	.ndo_stop		= ixgbe_vsi_close,
+	.ndo_start_xmit		= ixgbe_vsi_xmit_frame,
+	.ndo_set_rx_mode	= ixgbe_vsi_set_rx_mode,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_set_mac_address	= ixgbe_vsi_set_mac,
+	.ndo_change_mtu		= ixgbe_vsi_change_mtu,
+	.ndo_tx_timeout		= ixgbe_vsi_tx_timeout,
+	.ndo_vlan_rx_add_vid	= ixgbe_vsi_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid	= ixgbe_vsi_vlan_rx_kill_vid,
+	.ndo_set_features	= ixgbe_vsi_set_features,
+};
+
+static int ixgbe_vsi_add(struct net_device *dev, struct net_device *vsi)
+{
+	struct ixgbe_vsi_adapter *vsi_adapter = netdev_priv(vsi);
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	int pool, vmdq_pool, base_queue;
+
+	/* Check for hardware restriction on number of rx/tx queues */
+	if (vsi->num_rx_queues != vsi->num_tx_queues ||
+	    vsi->num_tx_queues > IXGBE_MAX_VSI_QUEUES ||
+	    vsi->num_tx_queues == IXGBE_BAD_VSI_QUEUE) {
+		e_info(drv, "%s: Supports RX/TX Queue counts 1,2, and 4\n",
+		       dev->name);
+		return -EINVAL;
+	}
+
+	if (adapter->num_rx_pools > IXGBE_MAX_VMDQ_INDICES)
+		return -EBUSY;
+
+	pool = find_first_zero_bit(&adapter->vsi_bitmask, 32);
+	adapter->num_rx_pools++;
+	set_bit(pool, &adapter->vsi_bitmask);
+
+	/* Enable VMDq flag so device will be set in VM mode */
+	adapter->flags |= IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED;
+	adapter->ring_feature[RING_F_VMDQ].limit = adapter->num_rx_pools;
+	adapter->ring_feature[RING_F_VMDQ].offset = 0;
+	adapter->ring_feature[RING_F_RSS].limit = IXGBE_MAX_VSI_QUEUES;
+
+	/* Force reinit of ring allocation with VMDQ enabled */
+	ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
+
+	/* Configure VSI adapter structure */
+	vmdq_pool = VMDQ_P(pool);
+	base_queue = vmdq_pool * adapter->num_rx_queues_per_pool;
+
+	netdev_dbg(dev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   pool, adapter->num_rx_pools,
+		   base_queue, base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->vsi_bitmask);
+
+	vsi_adapter->pool = pool;
+	vsi_adapter->netdev = vsi;
+	vsi_adapter->real_adapter = adapter;
+	vsi_adapter->rx_base_queue = base_queue;
+	vsi_adapter->tx_base_queue = base_queue;
+
+	vsi->netdev_ops = &ixgbe_vsi_netdev_ops;
+	ixgbe_vsi_set_ethtool_ops(vsi);
+
+	return 0;
+}
+
+static void ixgbe_vsi_del(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+
+	ixgbe_vsi_close(dev);
+	clear_bit(vadapter->pool, &adapter->vsi_bitmask);
+	adapter->num_rx_pools--;
+
+	netdev_dbg(dev, "pool %i:%i queues %i:%i VSI bitmask %lx\n",
+		   vadapter->pool, adapter->num_rx_pools,
+		   vadapter->rx_base_queue,
+		   vadapter->rx_base_queue + adapter->num_rx_queues_per_pool,
+		   adapter->vsi_bitmask);
+}
+
+size_t ixgbe_vsi_size(struct net_device *dev)
+{
+	return sizeof(struct ixgbe_vsi_adapter);
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7334,6 +7466,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
 	.ndo_bridge_setlink	= ixgbe_ndo_bridge_setlink,
 	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
+	.ndo_vsi_add		= ixgbe_vsi_add,
+	.ndo_vsi_del		= ixgbe_vsi_del,
+	.ndo_vsi_size		= ixgbe_vsi_size
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c
new file mode 100644
index 0000000..48ad793
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.c
@@ -0,0 +1,428 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include <linux/tcp.h>
+
+#include "ixgbe.h"
+#include "ixgbe_vsi.h"
+
+/**
+ * ixgbe_del_mac_filter - Remove the mac filter for the VSI netdev
+ * @adapter: pointer to private adapter struct
+ * @pool: VSI pool whose MAC address filter entry (located from the end
+ *        of the RAR table) is cleared
+ */
+static void ixgbe_del_mac_filter(struct ixgbe_adapter *adapter, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool - 1;
+	hw->mac.ops.clear_vmdq(hw, entry, VMDQ_P(pool));
+}
+
+/**
+ * ixgbe_add_mac_filter - Add a mac filter for the VSI netdev
+ * @adapter: pointer to private adapter struct
+ * @addr: pointer to mac address (u8*) to program
+ * @pool: VSI pool to configure with MAC address
+ */
+static void ixgbe_add_mac_filter(struct ixgbe_adapter *adapter,
+				 u8 *addr, u16 pool)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	unsigned int entry;
+
+	entry = hw->mac.num_rar_entries - pool;
+	hw->mac.ops.set_rar(hw, entry, addr, VMDQ_P(pool), IXGBE_RAH_AV);
+}
+
+static void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter, u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMC, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMC_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+}
+
+/**
+ * ixgbe_disable_vsi_ring - shutdown hw queue and release buffers
+ * @vadapter: pointer to private VSI adapter struct
+ * @rx_ring: ring to free
+ */
+static void ixgbe_disable_vsi_ring(struct ixgbe_vsi_adapter *vadapter,
+				    struct ixgbe_ring *rx_ring)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int index = rx_ring->queue_index + vadapter->rx_base_queue;
+
+	/* shutdown specific queue receive and wait for dma to settle */
+	ixgbe_disable_rx_queue(adapter, rx_ring);
+	usleep_range(10000, 20000);
+	ixgbe_irq_disable_queues(adapter, ((u64)1 << index));
+	ixgbe_clean_rx_ring(rx_ring);
+}
+
+/**
+ * ixgbe_enable_vsi_ring - allocate rx buffers and start queue
+ * @adapter: pointer to private adapter struct
+ * @rx_ring: ring to enable with queue
+ */
+static void ixgbe_enable_vsi_ring(struct ixgbe_adapter *adapter,
+				   struct ixgbe_ring *rx_ring)
+{
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+}
+
+int ixgbe_vsi_init(struct net_device *dev)
+{
+	struct net_device *lower;
+	struct list_head *iter;
+
+	/* There can only be one lower dev at this point */
+	netdev_for_each_lower_dev_rcu(dev, lower, iter) {
+		dev->features		= lower->features;
+		dev->vlan_features	= lower->vlan_features;
+		dev->gso_max_size	= lower->gso_max_size;
+		dev->iflink		= lower->ifindex;
+		dev->hard_header_len	= lower->hard_header_len;
+	}
+
+	return 0;
+}
+
+static void ixgbe_vsi_psrtype(struct ixgbe_vsi_adapter *vadapter)
+{
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int rss_i = vadapter->netdev->real_num_rx_queues;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u16 pool = vadapter->pool;
+	u32 psrtype = IXGBE_PSRTYPE_TCPHDR |
+		      IXGBE_PSRTYPE_UDPHDR |
+		      IXGBE_PSRTYPE_IPV4HDR |
+		      IXGBE_PSRTYPE_L2HDR |
+		      IXGBE_PSRTYPE_IPV6HDR;
+
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		return;
+
+	if (rss_i > 3)
+		psrtype |= 2 << 29;
+	else if (rss_i > 1)
+		psrtype |= 1 << 29;
+
+	IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(VMDQ_P(pool)), psrtype);
+}
+
+int ixgbe_vsi_open(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	int err = 0;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state)) {
+		e_dev_info("%s is down\n", adapter->netdev->name);
+		return -EBUSY;
+	}
+
+	vadapter->online = true;
+	err = ixgbe_vsi_up(dev);
+	if (!err)
+		netif_carrier_on(dev);
+	return err;
+}
+
+int ixgbe_vsi_up(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	unsigned int rxbase = vadapter->pool * adapter->num_rx_queues_per_pool;
+	unsigned int txbase = vadapter->pool * adapter->num_rx_queues_per_pool;
+	int err, i;
+
+	netif_carrier_off(dev);
+
+	vadapter->rx_base_queue = rxbase;
+	vadapter->tx_base_queue = txbase;
+
+	for (i = 0; i < dev->num_rx_queues; i++)
+		ixgbe_disable_vsi_ring(vadapter, adapter->rx_ring[rxbase + i]);
+
+	for (i = 0; i < dev->num_rx_queues; i++) {
+		adapter->rx_ring[rxbase + i]->vmdq_netdev = dev;
+		ixgbe_enable_vsi_ring(adapter, adapter->rx_ring[rxbase + i]);
+	}
+
+	for (i = 0; i < dev->num_tx_queues; i++)
+		adapter->tx_ring[txbase + i]->vmdq_netdev = dev;
+
+	if (is_valid_ether_addr(dev->dev_addr))
+		ixgbe_add_mac_filter(adapter, dev->dev_addr, vadapter->pool);
+
+	err = netif_set_real_num_tx_queues(dev, dev->num_tx_queues);
+	if (err)
+		goto err_set_queues;
+	err = netif_set_real_num_rx_queues(dev, dev->num_rx_queues);
+	if (err)
+		goto err_set_queues;
+
+	ixgbe_vsi_psrtype(vadapter);
+	netif_tx_start_all_queues(dev);
+	return 0;
+err_set_queues:
+	for (i = 0; i < dev->num_rx_queues; i++)
+		ixgbe_disable_vsi_ring(vadapter, adapter->rx_ring[rxbase + i]);
+	return err;
+}
+
+int ixgbe_vsi_close(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+
+	vadapter->online = false;
+	return ixgbe_vsi_down(dev);
+}
+
+int ixgbe_vsi_down(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	unsigned int rxbase = vadapter->rx_base_queue;
+	int i;
+
+	netif_tx_stop_all_queues(dev);
+	netif_carrier_off(dev);
+	netif_tx_disable(dev);
+
+	for (i = 0; i < dev->num_rx_queues; i++)
+		ixgbe_disable_vsi_ring(vadapter, adapter->rx_ring[rxbase + i]);
+
+	return 0;
+}
+
+netdev_tx_t ixgbe_vsi_xmit_frame(struct sk_buff *skb, struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_ring *tx_ring;
+	unsigned int queue;
+
+	queue = skb->queue_mapping + vadapter->tx_base_queue;
+	tx_ring = vadapter->real_adapter->tx_ring[queue];
+
+	return ixgbe_xmit_frame_ring(skb, vadapter->real_adapter, tx_ring);
+}
+
+struct net_device_stats *ixgbe_vsi_get_stats(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+
+	/* only return the current stats */
+	return &vadapter->net_stats;
+}
+
+int ixgbe_vsi_set_features(struct net_device *dev, netdev_features_t features)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vlnctrl;
+	int i;
+
+	for (i = 0; i < dev->num_rx_queues; i++) {
+		unsigned int index = i + vadapter->rx_base_queue;
+		int reg_idx = adapter->rx_ring[index]->reg_idx;
+
+		vlnctrl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+		if (features & NETIF_F_HW_VLAN_CTAG_RX)
+			vlnctrl |= IXGBE_RXDCTL_VME;
+		else
+			vlnctrl &= ~IXGBE_RXDCTL_VME;
+		IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), vlnctrl);
+	}
+
+	return 0;
+}
+
+void ixgbe_vsi_set_rx_mode(struct net_device *dev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 vmolr;
+
+	/* No unicast promiscuous support for VMDQ devices. */
+	vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vadapter->pool));
+	vmolr |= (IXGBE_VMOLR_ROMPE | IXGBE_VMOLR_BAM | IXGBE_VMOLR_AUPE);
+
+	/* clear the affected bit */
+	vmolr &= ~IXGBE_VMOLR_MPE;
+
+	if (dev->flags & IFF_ALLMULTI) {
+		vmolr |= IXGBE_VMOLR_MPE;
+	} else {
+		vmolr |= IXGBE_VMOLR_ROMPE;
+		hw->mac.ops.update_mc_addr_list(hw, dev);
+	}
+	ixgbe_write_uc_addr_list(adapter->netdev);
+	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vadapter->pool), vmolr);
+}
+
+int ixgbe_vsi_set_mac(struct net_device *dev, void *p)
+{
+	struct sockaddr *addr = p;
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	ixgbe_del_mac_filter(adapter, vadapter->pool);
+	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+	ixgbe_add_mac_filter(adapter, dev->dev_addr, vadapter->pool);
+	return 0;
+}
+
+int ixgbe_vsi_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+
+	if (adapter->netdev->mtu < new_mtu) {
+		e_warn(probe,
+		       "Set MTU on %s to >= %d before changing MTU on %s\n",
+			adapter->netdev->name, new_mtu, dev->name);
+		return -EINVAL;
+	}
+	dev->mtu = new_mtu;
+	return 0;
+}
+
+void ixgbe_vsi_tx_timeout(struct net_device *dev)
+{
+	return;
+}
+
+int ixgbe_vsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_hw *hw = &vadapter->real_adapter->hw;
+
+	hw->mac.ops.set_vfta(hw, vid, vadapter->pool, true);
+	return 0;
+}
+
+int ixgbe_vsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(dev);
+	struct ixgbe_hw *hw = &vadapter->real_adapter->hw;
+
+	hw->mac.ops.set_vfta(hw, vid, vadapter->pool, false);
+	return 0;
+}
+
+static int ixgbe_vsi_get_settings(struct net_device *netdev,
+				   struct ethtool_cmd *ecmd)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+	struct net_device *real_netdev = vadapter->real_adapter->netdev;
+
+	return ixgbe_get_settings(real_netdev, ecmd);
+}
+
+static u32 ixgbe_vsi_get_msglevel(struct net_device *netdev)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+
+	return vadapter->real_adapter->msg_enable;
+}
+
+static void ixgbe_vsi_get_drvinfo(struct net_device *netdev,
+				   struct ethtool_drvinfo *drvinfo)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+	struct ixgbe_adapter *adapter = vadapter->real_adapter;
+	struct net_device *main_netdev = adapter->netdev;
+
+	strncpy(drvinfo->driver, ixgbe_driver_name, 32);
+	strncpy(drvinfo->version, ixgbe_driver_version, 32);
+
+	strncpy(drvinfo->fw_version, "N/A", 4);
+	snprintf(drvinfo->bus_info, 32, "%s VSI %d",
+		 main_netdev->name, vadapter->pool);
+	drvinfo->n_stats = 0;
+	drvinfo->testinfo_len = 0;
+	drvinfo->regdump_len = 0;
+}
+
+static void ixgbe_vsi_get_ringparam(struct net_device *netdev,
+				     struct ethtool_ringparam *ring)
+{
+	struct ixgbe_vsi_adapter *vadapter = netdev_priv(netdev);
+	unsigned int txr = vadapter->tx_base_queue;
+	unsigned int rxr = vadapter->rx_base_queue;
+	struct ixgbe_ring *tx_ring, *rx_ring;
+
+	tx_ring = vadapter->real_adapter->tx_ring[txr];
+	rx_ring = vadapter->real_adapter->rx_ring[rxr];
+
+	ring->rx_max_pending = IXGBE_MAX_RXD;
+	ring->tx_max_pending = IXGBE_MAX_TXD;
+	ring->rx_mini_max_pending = 0;
+	ring->rx_jumbo_max_pending = 0;
+	ring->rx_pending = rx_ring->count;
+	ring->tx_pending = tx_ring->count;
+	ring->rx_mini_pending = 0;
+	ring->rx_jumbo_pending = 0;
+}
+
+static struct ethtool_ops ixgbe_vsi_ethtool_ops = {
+	.get_settings	= ixgbe_vsi_get_settings,
+	.get_drvinfo	= ixgbe_vsi_get_drvinfo,
+	.get_link	= ethtool_op_get_link,
+	.get_ringparam	= ixgbe_vsi_get_ringparam,
+	.get_msglevel	= ixgbe_vsi_get_msglevel,
+};
+
+void ixgbe_vsi_set_ethtool_ops(struct net_device *netdev)
+{
+	SET_ETHTOOL_OPS(netdev, &ixgbe_vsi_ethtool_ops);
+}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h
new file mode 100644
index 0000000..509f485
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_vsi.h
@@ -0,0 +1,71 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2013 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+#include "ixgbe.h"
+
+void ixgbe_ping_all_vadapters(struct net_device *dev);
+int ixgbe_vsi_init(struct net_device *dev);
+int ixgbe_vsi_up(struct net_device *dev);
+int ixgbe_vsi_open(struct net_device *dev);
+int ixgbe_vsi_close(struct net_device *dev);
+int ixgbe_vsi_down(struct net_device *dev);
+netdev_tx_t ixgbe_vsi_xmit_frame(struct sk_buff *skb, struct net_device *dev);
+struct net_device_stats *ixgbe_vsi_get_stats(struct net_device *dev);
+void ixgbe_vsi_set_rx_mode(struct net_device *dev);
+int ixgbe_vsi_set_mac(struct net_device *dev, void *addr);
+int ixgbe_vsi_change_mtu(struct net_device *dev, int new_mtu);
+void ixgbe_vsi_tx_timeout(struct net_device *dev);
+int ixgbe_vsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid);
+int ixgbe_vsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid);
+void ixgbe_vsi_set_ethtool_ops(struct net_device *netdev);
+int ixgbe_vsi_set_features(struct net_device *netdev,
+			   netdev_features_t features);
+
+static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
+					   u64 qmask)
+{
+	u32 mask;
+	struct ixgbe_hw *hw = &adapter->hw;
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = (IXGBE_EIMS_RTX_QUEUE & qmask);
+		IXGBE_WRITE_REG(hw, IXGBE_EIMS, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+	/* skip the flush */
+}

* ip l2tp - suspected defect using IPv6 local/remote addresses
From: Jeff Loughridge @ 2013-09-11 18:52 UTC (permalink / raw)
  To: netdev

Using IPv6 addresses as L2TPv3 endpoints doesn't seem to work in
iproute2 3.11. I see a cosmetic defect in the output of 'ip l2tp show
tunnel'. In addition, I can't get tunnels to function with UDP or IP
encapsulation.

root@debian:~# ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000
encap udp local a::1 remote a::2 udp_sport 5000 udp_dport 6000
root@debian:~# ip l2tp add session tunnel_id 3000 session_id 1000
peer_session_id 2000
root@debian:~# ip l2tp show tunnel
Tunnel 3000, encap UDP
  From 127.0.0.1 to 127.0.0.1
  Peer tunnel 4000
  UDP source / dest ports: 5000/6000
root@debian:~#

See below for more details about my set-up. I have no problems using
IPv4 endpoints.

root@debian:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN group default qlen 1000
    link/ether 00:0c:29:91:cb:3c brd ff:ff:ff:ff:ff:ff
    inet 5.1.0.100/24 scope global eth0
    inet6 a::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe91:cb3c/64 scope link
       valid_lft forever preferred_lft forever
root@debian:~#
root@debian:~# ip -V
ip utility, iproute2-ss130903
root@debian:~# uname -a
Linux debian 3.2.0-4-686-pae #1 SMP Debian 3.2.41-2 i686 GNU/Linux
root@debian:~#
root@debian:~# modprobe l2tp_eth
root@debian:~# lsmod | grep l2tp
l2tp_eth               12738  0
l2tp_netlink           17263  1 l2tp_eth
l2tp_core              17486  2 l2tp_netlink,l2tp_eth

Jeff L.

* Re: [PATCH] bnx2x: avoid atomic allocations during initialization
From: David Miller @ 2013-09-11 19:44 UTC (permalink / raw)
  To: mschmidt; +Cc: netdev, ariele, eilong
In-Reply-To: <1378411989-19775-1-git-send-email-mschmidt@redhat.com>

From: Michal Schmidt <mschmidt@redhat.com>
Date: Thu,  5 Sep 2013 22:13:09 +0200

> During initialization bnx2x allocates significant amounts of memory
> (for rx data, rx SGEs, TPA pool) using atomic allocations.
> 
> I received a report where bnx2x failed to allocate SGEs and it had
> to fall back to TPA-less operation.
> 
> Let's use GFP_KERNEL allocations during initialization, which runs
> in process context. Add gfp_t parameters to functions that are used
> both in initialization and in the receive path.
> 
> Use an unlikely branch in bnx2x_frag_alloc() to avoid atomic allocation
> by netdev_alloc_frag(). The branch is taken several thousands of times
> during initialization, but then never more. Note that fp->rx_frag_size
> is never greater than PAGE_SIZE, so __get_free_page() can be used here.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

I've applied this, the voiced objections were completely unreasonable
and absolutely do not match the established basic approaches to memory
allocation taken in every other major networking driver.

Thanks Michal.

* Re: [PATCH] cxgb4: remove workqueue when driver registration fails
From: David Miller @ 2013-09-11 19:52 UTC (permalink / raw)
  To: weiyang; +Cc: dm, netdev
In-Reply-To: <1378431145-7203-1-git-send-email-weiyang@linux.vnet.ibm.com>

From: Wei Yang <weiyang@linux.vnet.ibm.com>
Date: Fri,  6 Sep 2013 09:32:25 +0800

> When driver registration fails, we need to clean up the resources allocated
> before. cxgb4 missed to destroy the workqueue allocated at the very beginning.
> 
> This patch destroys the workqueue when registration fails.
> 
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>

This patch was corrupted by your email client, please email the patch
to yourself, make sure the patch you receive in the email applies
cleanly to the current 'net' tree, and then resubmit.

Thanks.

* [PATCH v2 net 1/1] RFC: drivers/net/phy: Fix for a BCM5482S auto-negotiation problem
From: Corey Ashford @ 2013-09-11 19:52 UTC (permalink / raw)
  To: netdev; +Cc: Corey Ashford

Differences from v1 of this patch:
- Added description of the patch remedy for the problem
- Added configurations tested
- Corrected spelling of "negotiation" in the subject line

When a 1Gb network interface is used in SGMII mode, and a
Broadcom 5482S is then used to redrive SGMII (SGMII to SGMII mode)
to a Broadcom 54616 PHY, the standard PHY registers do not appear to
be updated correctly after auto-negotiation.  This causes the
kernel to get confused about the state of the link and also
causes the MAC layer driver to inappropriately configure the MAC.

By 'standard' registers I mean those that are read by genphy_read_link
(MII_BMSR, 0x01) and genphy_read_status (MII_STAT1000, 0x0a; MII_CTRL1000,
0x09; MII_LPA, 0x05).
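
As an illustration only (this is not part of the patch), the sketch below
decodes the link-related bits of two of these standard registers. The bit
masks are the IEEE 802.3 clause 22 values, the same ones defined in
linux/mii.h; the struct and function names are hypothetical and exist only
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard clause 22 bit masks (same values as in linux/mii.h). */
#define BMSR_LSTATUS       0x0004 /* MII_BMSR: link status */
#define BMSR_ANEGCOMPLETE  0x0020 /* MII_BMSR: auto-negotiation complete */
#define LPA_1000HALF       0x0400 /* MII_STAT1000: partner 1000BASE-T half */
#define LPA_1000FULL       0x0800 /* MII_STAT1000: partner 1000BASE-T full */

/* Hypothetical summary of what the standard registers report. */
struct std_link_view {
	bool link_up;      /* BMSR link status bit */
	bool aneg_done;    /* BMSR auto-negotiation complete bit */
	bool partner_1000; /* partner advertises any 1000BASE-T mode */
};

/* Decode raw MII_BMSR and MII_STAT1000 values into a summary. */
static void decode_std_regs(uint16_t bmsr, uint16_t stat1000,
			    struct std_link_view *out)
{
	assert(out != NULL);

	out->link_up = (bmsr & BMSR_LSTATUS) != 0;
	out->aneg_done = (bmsr & BMSR_ANEGCOMPLETE) != 0;
	out->partner_1000 = (stat1000 & (LPA_1000HALF | LPA_1000FULL)) != 0;
}
```

Fed the register 1 (0x7949) and register A (0x0000) values from the dumps
in this message, decode_std_regs() reports no link, auto-negotiation not
complete, and no 1000BASE-T ability from the partner, even though the link
is actually up.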

Below you can find register dumps for the various configurations.

The patch instead reads the Operating Mode Register (OMR) in the BCM5482S,
which appears to be updated correctly with the speed, duplex, and link
status in every configuration of the PHY that was tested. The standard
registers are not always updated correctly, but the OMR always had correct
values for those configurations.

That said, I've only tested two of the configurations:
SGMII to SGMII
SGMII to serdes

SGMII-to-SGMII mode, 1Gb, full duplex:

# /var/dump1GPhy
Shadow register '11111'=0x00007E5C
Dumping registers for PHY at port e:
PHY register 0=0x00001140
PHY register 1=0x00007949
PHY register 2=0x00000143
PHY register 3=0x0000BCB2
PHY register 4=0x000001E1
PHY register 5=0x00000000
PHY register 6=0x00000064
PHY register 7=0x00002001
PHY register 8=0x00000000
PHY register 9=0x00000200
PHY register A=0x00000000
PHY register B=0x00000000
PHY register C=0x00000000
PHY register D=0x00000000
PHY register E=0x00000000
PHY register F=0x00003000
PHY register 10=0x00001000
PHY register 11=0x00002000
PHY register 12=0x00000000
PHY register 13=0x00000C00
PHY register 14=0x00000000
PHY register 15=0x00000000
PHY register 16=0x00000000
PHY register 17=0x00000000
PHY register 18=0x00000400
PHY register 19=0x00001000
PHY register 1A=0x00000000
PHY register 1B=0x0000FFF1
PHY register 1C=0x00007E5C
PHY register 1D=0x00000000
PHY register 1E=0x00000000
PHY register 1F=0x00000000
PHY expansion register E00=0x00001140
PHY expansion register E01=0x0000016D
PHY expansion register E02=0x00000143
PHY expansion register E03=0x0000BCB2
PHY expansion register E04=0x00000001
PHY expansion register E05=0x0000D801
PHY expansion register E06=0x00000064
PHY expansion register E07=0x00002001
PHY expansion register E08=0x00000000
PHY expansion register E09=0x00000000
PHY expansion register E0A=0x00000000
PHY expansion register E0B=0x00000000
PHY expansion register E0C=0x00000000
PHY expansion register E0D=0x00000000
PHY expansion register E0E=0x00000000
PHY expansion register E0F=0x0000C000
PHY expansion register E10=0x00000000
PHY expansion register E11=0x00000000
PHY expansion register E12=0x00000080
PHY expansion register E13=0x00000089
PHY expansion register E14=0x00000000
PHY expansion register E15=0x0000038A
PHY expansion register E16=0x0000002E
Mode status register = 0x0000D072
Successfully completed!

SGMII-to-SGMII mode, 100Mb, half duplex:

# /var/dump1GPhy
Shadow register '11111'=0x00007E5C
Dumping registers for PHY at port e:
PHY register 0=0x00001140
PHY register 1=0x00007949
PHY register 2=0x00000143
PHY register 3=0x0000BCB2
PHY register 4=0x000001E1
PHY register 5=0x00000000
PHY register 6=0x00000064
PHY register 7=0x00002001
PHY register 8=0x00000000
PHY register 9=0x00000200
PHY register A=0x00000000
PHY register B=0x00000000
PHY register C=0x00000000
PHY register D=0x00000000
PHY register E=0x00000000
PHY register F=0x00003000
PHY register 10=0x00001000
PHY register 11=0x00002000
PHY register 12=0x00000000
PHY register 13=0x00000C00
PHY register 14=0x00000000
PHY register 15=0x00000000
PHY register 16=0x00000000
PHY register 17=0x00000000
PHY register 18=0x00000400
PHY register 19=0x00001000
PHY register 1A=0x00000000
PHY register 1B=0x0000FFF1
PHY register 1C=0x00007E5C
PHY register 1D=0x00000000
PHY register 1E=0x00000000
PHY register 1F=0x00000000
PHY expansion register E00=0x00001140
PHY expansion register E01=0x00000169
PHY expansion register E02=0x00000143
PHY expansion register E03=0x0000BCB2
PHY expansion register E04=0x00000001
PHY expansion register E05=0x0000C401
PHY expansion register E06=0x00000066
PHY expansion register E07=0x00002001
PHY expansion register E08=0x00000000
PHY expansion register E09=0x00000000
PHY expansion register E0A=0x00000000
PHY expansion register E0B=0x00000000
PHY expansion register E0C=0x00000000
PHY expansion register E0D=0x00000000
PHY expansion register E0E=0x00000000
PHY expansion register E0F=0x0000C000
PHY expansion register E10=0x00000000
PHY expansion register E11=0x00000000
PHY expansion register E12=0x00000080
PHY expansion register E13=0x00000058
PHY expansion register E14=0x00000000
PHY expansion register E15=0x0000026A
PHY expansion register E16=0x0000002E
Mode status register = 0x0000A072
Successfully completed!

SGMII-to-serdes, 1Gb, full duplex:

# /var/dump1GPhy
Shadow register '11111'=0x00007E5C
Dumping registers for PHY at port e:
PHY register 0=0x00001140
PHY register 1=0x00007949
PHY register 2=0x00000143
PHY register 3=0x0000BCB2
PHY register 4=0x000001E1
PHY register 5=0x00000000
PHY register 6=0x00000064
PHY register 7=0x00002001
PHY register 8=0x00000000
PHY register 9=0x00000200
PHY register A=0x00000000
PHY register B=0x00000000
PHY register C=0x00000000
PHY register D=0x00000000
PHY register E=0x00000000
PHY register F=0x00003000
PHY register 10=0x00001000
PHY register 11=0x00000000
PHY register 12=0x00000000
PHY register 13=0x00000C00
PHY register 14=0x00000000
PHY register 15=0x0000D072
PHY register 16=0x00000000
PHY register 17=0x00000F42
PHY register 18=0x00000400
PHY register 19=0x00001000
PHY register 1A=0x00000000
PHY register 1B=0x0000FFF1
PHY register 1C=0x00007E5C
PHY register 1D=0x00000000
PHY register 1E=0x00000000
PHY register 1F=0x00000000
PHY expansion register E00=0x00001140
PHY expansion register E01=0x0000014D
PHY expansion register E02=0x00000143
PHY expansion register E03=0x0000BCB2
PHY expansion register E04=0x00000060
PHY expansion register E05=0x00000000
PHY expansion register E06=0x00000064
PHY expansion register E07=0x00002001
PHY expansion register E08=0x00000000
PHY expansion register E09=0x00000000
PHY expansion register E0A=0x00000000
PHY expansion register E0B=0x00000000
PHY expansion register E0C=0x00000000
PHY expansion register E0D=0x00000000
PHY expansion register E0E=0x00000000
PHY expansion register E0F=0x0000C000
PHY expansion register E10=0x00000000
PHY expansion register E11=0x00000000
PHY expansion register E12=0x00000080
PHY expansion register E13=0x000002D3
PHY expansion register E14=0x00000000
PHY expansion register E15=0x00000388
PHY expansion register E16=0x0000002E
Mode status register = 0x0000D072
Successfully completed!

Signed-off-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
---
 drivers/net/phy/broadcom.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index f8c90ea..9f5d076 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -142,6 +142,13 @@
 #define  MII_BCM54XX_EXP_EXP96_MYST		0x0010
 #define MII_BCM54XX_EXP_EXP97			0x0f97
 #define  MII_BCM54XX_EXP_EXP97_MYST		0x0c0c
+#define MII_BCM54XX_EXP_OPER_MODE		(MII_BCM54XX_EXP_SEL_ER | 0x42)
+#define MII_BCM54XX_EXP_OPER_MODE_SERDES_LINK		0x8000
+#define MII_BCM54XX_EXP_OPER_MODE_SERDES_SPEED_MASK	0x6000
+#define MII_BCM54XX_EXP_OPER_MODE_SERDES_SPEED_1000	0x4000
+#define MII_BCM54XX_EXP_OPER_MODE_SERDES_SPEED_100	0x2000
+#define MII_BCM54XX_EXP_OPER_MODE_SERDES_SPEED_10	0x0000
+#define MII_BCM54XX_EXP_OPER_MODE_SERDES_DUPLEX		0x1000

 /*
  * BCM5482: Secondary SerDes registers
@@ -491,21 +498,31 @@ static int bcm5482_config_init(struct phy_device *phydev)
 static int bcm5482_read_status(struct phy_device *phydev)
 {
 	int err;
+	err = bcm54xx_exp_read(phydev, MII_BCM54XX_EXP_OPER_MODE);
+	if (err < 0)
+		return err;

-	err = genphy_read_status(phydev);
+	phydev->link = ((err & MII_BCM54XX_EXP_OPER_MODE_SERDES_LINK) ==
+		MII_BCM54XX_EXP_OPER_MODE_SERDES_LINK);

-	if (phydev->dev_flags & PHY_BCM_FLAGS_MODE_1000BX) {
-		/*
-		 * Only link status matters for 1000Base-X mode, so force
-		 * 1000 Mbit/s full-duplex status
-		 */
-		if (phydev->link) {
+	if (phydev->link) {
+		switch (err & MII_BCM54XX_EXP_OPER_MODE_SERDES_SPEED_MASK) {
+		case MII_BCM54XX_EXP_OPER_MODE_SERDES_SPEED_1000:
 			phydev->speed = SPEED_1000;
-			phydev->duplex = DUPLEX_FULL;
+			break;
+		case MII_BCM54XX_EXP_OPER_MODE_SERDES_SPEED_100:
+			phydev->speed = SPEED_100;
+			break;
+		default:
+			phydev->speed = SPEED_10;
+			break;
 		}
+		if (err & MII_BCM54XX_EXP_OPER_MODE_SERDES_DUPLEX)
+			phydev->duplex = DUPLEX_FULL;
+		else
+			phydev->duplex = DUPLEX_HALF;
 	}
-
-	return err;
+	return 0;
 }

 static int bcm54xx_ack_interrupt(struct phy_device *phydev)
-- 
1.8.1.4

^ permalink raw reply related
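
The bit layout this patch adds can be checked against the labelled dumps above: the "100Mb, half duplex" dump reports a mode status of 0x0000A072, and the "SGMII-to-serdes, 1Gb, full duplex" dump reports 0x0000D072. What follows is a minimal userspace sketch, not the driver code. `struct link_state` and `decode_oper_mode()` are hypothetical names, and the macros only mirror the values of the MII_BCM54XX_EXP_OPER_MODE_* defines in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the MII_BCM54XX_EXP_OPER_MODE_* defines in the patch. */
#define OPER_MODE_SERDES_LINK        0x8000
#define OPER_MODE_SERDES_SPEED_MASK  0x6000
#define OPER_MODE_SERDES_SPEED_1000  0x4000
#define OPER_MODE_SERDES_SPEED_100   0x2000
#define OPER_MODE_SERDES_DUPLEX      0x1000

/* Hypothetical stand-in for the link fields of struct phy_device. */
struct link_state {
	bool link;
	int speed;         /* Mbit/s; left at 0 in this sketch when the link is down */
	bool full_duplex;
};

/* Decode a raw mode status value the same way the patched read_status does. */
static struct link_state decode_oper_mode(uint16_t reg)
{
	struct link_state st = { false, 0, false };

	st.link = (reg & OPER_MODE_SERDES_LINK) != 0;
	if (!st.link)
		return st;

	switch (reg & OPER_MODE_SERDES_SPEED_MASK) {
	case OPER_MODE_SERDES_SPEED_1000:
		st.speed = 1000;
		break;
	case OPER_MODE_SERDES_SPEED_100:
		st.speed = 100;
		break;
	default:
		/* As in the patch, every other encoding is reported as 10 Mbit/s. */
		st.speed = 10;
		break;
	}
	st.full_duplex = (reg & OPER_MODE_SERDES_DUPLEX) != 0;
	return st;
}
```

With these definitions, 0xD072 decodes to link up, 1000 Mbit/s, full duplex, and 0xA072 to link up, 100 Mbit/s, half duplex, which agrees with the labels on the two dumps.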

* Re: [patch net/stable] ipv6/exthdrs: accept tlv which includes only padding
From: David Miller @ 2013-09-11 19:53 UTC (permalink / raw)
  To: jiri; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber, eldad
In-Reply-To: <1378476145-6282-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Fri,  6 Sep 2013 16:02:25 +0200

> In rfc4942 and rfc2460 I cannot find anything which would imply that
> packets which have only padding in a TLV should be dropped.
> 
> Current behaviour breaks TAHI Test v6LC.1.2.6.
> 
> Problem was introduced in:
> 9b905fe6843 "ipv6/exthdrs: strict Pad1 and PadN check"
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

Ok I've changed my position, applied and queued up for -stable.

Thanks.

^ permalink raw reply

* Re: [PATCH -net v2 0/2] bonding: fix arp_validate desync state & race
From: David Miller @ 2013-09-11 19:55 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, fubar, andy
In-Reply-To: <1378504826-18855-1-git-send-email-nikolay@redhat.com>

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Sat,  7 Sep 2013 00:00:24 +0200

> Hello all,
> These two patches aim to fix the possible de-sync state which the bond
> can enter if we have arp_validate without arp_interval or the other way
> around. They also fix a race condition between arp_validate setting and
> mode changing.
> 
> Patch 01 - fixes the race condition between store_arp_validate and bond
> mode change by using rtnl for sync
> Patch 02 - fixes the possible de-sync state by setting/unsetting recv_probe
> if arp_interval is set/unset and also if arp_validate is set/unset
> 
> v2: Fix the mode check in store_arp_validate

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net 1/1] qlcnic: Fix warning reported by kbuild test robot.
From: David Miller @ 2013-09-11 20:00 UTC (permalink / raw)
  To: jitendra.kalsaria; +Cc: netdev, sony.chacko, shahed.shaikh, Dept-HSGLinuxNICDev
In-Reply-To: <1378505072-7879-1-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Date: Fri, 6 Sep 2013 18:04:32 -0400

> From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
> 
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c: In function 'qlcnic_handle_fw_message':
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c:922:4: warning: overflow in implicit constant conversion [-Woverflow]
> 
> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Applied, thanks.

^ permalink raw reply

* Re: macvlan: Move skb_clone check closer to call
From: David Miller @ 2013-09-11 20:04 UTC (permalink / raw)
  To: horms; +Cc: herbert, netdev
In-Reply-To: <20130910013810.GA28966@verge.net.au>

From: Simon Horman <horms@verge.net.au>
Date: Tue, 10 Sep 2013 10:38:10 +0900

> On Sat, Sep 07, 2013 at 12:27:11PM +1000, Herbert Xu wrote:
>> Currently macvlan calls skb_clone in macvlan_broadcast but checks
>> for a NULL return in macvlan_broadcast_one instead.  This is
>> needlessly confusing and may lead to bugs introduced later.
>> 
>> This patch moves the error check to where the skb_clone call is.
>> 
>> The only other caller of macvlan_broadcast_one never passes in a
>> NULL value so it doesn't need the check either.
>> 
>> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> This seems good to me as macvlan_handle_frame(), which is
> the only other caller of macvlan_broadcast_one(), already checks
> that skb is non-NULL before calling macvlan_broadcast_one().
> 
> Reviewed-by: Simon Horman <horms@verge.net.au>

Applied, thanks everyone.

^ permalink raw reply

* Re: [PATCH 1/2] irda: donauboe: Remove casting the return value which is a void pointer
From: David Miller @ 2013-09-11 20:13 UTC (permalink / raw)
  To: jg1.han; +Cc: netdev, samuel
In-Reply-To: <004b01cead1c$8e1678d0$aa436a70$%han@samsung.com>

From: Jingoo Han <jg1.han@samsung.com>
Date: Mon, 09 Sep 2013 14:22:19 +0900

> Casting the return value which is a void pointer is redundant.
> The conversion from void pointer to any other pointer type is
> guaranteed by the C programming language.
> 
> Signed-off-by: Jingoo Han <jg1.han@samsung.com>

Applied.

^ permalink raw reply
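
The point made in the changelog above, that a void pointer converts to any other object pointer type without a cast in C, can be seen in a small userspace sketch. `struct ring` and `ring_alloc()` are hypothetical names used only for illustration, and `malloc()` stands in for whichever allocator the driver actually calls.

```c
#include <stdlib.h>

/* Hypothetical structure; any object type behaves the same way. */
struct ring {
	int size;
};

static struct ring *ring_alloc(int size)
{
	/* malloc() returns void *; C converts it implicitly, so no
	 * (struct ring *) cast is needed on the right-hand side. */
	struct ring *r = malloc(sizeof(*r));

	if (r)
		r->size = size;
	return r;
}
```

This applies to C only; C++ does require the cast, which is one reason such casts sometimes appear in code that was ported or written out of habit.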

* Re: [PATCH 2/2] irda: vlsi_ir: Remove casting the return value which is a void pointer
From: David Miller @ 2013-09-11 20:13 UTC (permalink / raw)
  To: jg1.han; +Cc: netdev, samuel
In-Reply-To: <004c01cead1c$d899f120$89cdd360$%han@samsung.com>

From: Jingoo Han <jg1.han@samsung.com>
Date: Mon, 09 Sep 2013 14:24:24 +0900

> Casting the return value which is a void pointer is redundant.
> The conversion from void pointer to any other pointer type is
> guaranteed by the C programming language.
> 
> Signed-off-by: Jingoo Han <jg1.han@samsung.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: fix multiqueue selection
From: David Miller @ 2013-09-11 20:13 UTC (permalink / raw)
  To: alexander.duyck; +Cc: eric.dumazet, netdev, alexander.h.duyck
In-Reply-To: <522BF85B.3000909@gmail.com>

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sat, 07 Sep 2013 21:08:59 -0700

> On 09/07/2013 12:02 PM, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>> 
>> commit 416186fbf8c5b4e4465 ("net: Split core bits of netdev_pick_tx
>> into __netdev_pick_tx") added a bug that disables caching of queue
>> index in the socket.
>> 
>> This is the source of packet reorders for TCP flows, and
>> again this is happening more often when using FQ pacing.
>> 
>> Old code was doing 
>> 
>> if (queue_index != old_index)
>> 	sk_tx_queue_set(sk, queue_index);
>> 
>> Alexander renamed the variables but forgot to change sk_tx_queue_set()
>> 2nd parameter.
>> 
>> if (queue_index != new_index)
>> 	sk_tx_queue_set(sk, queue_index);
>> 
>> This means we store -1 over and over in sk->sk_tx_queue_mapping
>> 
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
 ...
> Ugh, my bad.  This is a nasty one too since the behaviour appeared to be
> correct for most cases.
> 
> It looks like this needs to go into stable for 3.9 - 3.11.
> 
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> 

Applied and queued up for -stable, thanks.

^ permalink raw reply
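
The effect Eric describes can be reproduced in a simplified userspace sketch. Everything here is an assumption made for illustration: `struct sock_sketch`, the two `pick_tx_*` functions and the `picked` argument (standing in for whatever queue the hash or XPS lookup would return) are hypothetical, and the surrounding control flow is reduced to the cache-miss case. The fixed variant assumes the correction is to pass `new_index` as the second parameter, which is what the description implies.

```c
/* Hypothetical stand-in for the one struct sock field that matters here. */
struct sock_sketch {
	int tx_queue_mapping;   /* -1 means "no queue cached yet" */
};

static int sk_tx_queue_get(const struct sock_sketch *sk)
{
	return sk->tx_queue_mapping;
}

static void sk_tx_queue_set(struct sock_sketch *sk, int queue)
{
	sk->tx_queue_mapping = queue;
}

/* Buggy shape: the comparison uses new_index, but the stale cached
 * value (here always -1) is what gets written back to the socket. */
static int pick_tx_buggy(struct sock_sketch *sk, int picked)
{
	int queue_index = sk_tx_queue_get(sk);

	if (queue_index < 0) {
		int new_index = picked;

		if (queue_index != new_index)
			sk_tx_queue_set(sk, queue_index);   /* stores -1 again */
		queue_index = new_index;
	}
	return queue_index;
}

/* Assumed fix: cache the freshly selected index instead. */
static int pick_tx_fixed(struct sock_sketch *sk, int picked)
{
	int queue_index = sk_tx_queue_get(sk);

	if (queue_index < 0) {
		int new_index = picked;

		if (queue_index != new_index)
			sk_tx_queue_set(sk, new_index);
		queue_index = new_index;
	}
	return queue_index;
}
```

In the buggy variant the socket never leaves the "no queue cached" state, so every packet goes through selection again and a flow can hop between queues, which is consistent with the reordering reported for TCP flows. In the fixed variant the first selection sticks and later calls return the cached queue.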

* Re: [PATCH net] net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE
From: David Miller @ 2013-09-11 20:14 UTC (permalink / raw)
  To: dborkman; +Cc: netdev, linux-sctp, jacob.e.keller
In-Reply-To: <1378565099-20987-1-git-send-email-dborkman@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Sat,  7 Sep 2013 16:44:59 +0200

> If we do not add parentheses around ...
> 
>   mask |= POLLERR |
>           sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0;
> 
> ... then this condition always evaluates to true as POLLERR is
> defined as 8 and binary or'd with whatever result comes out of
> sock_flag(). Hence instead of (X | Y) ? A : B, transform it into
> X | (Y ? A : B). Unfortunately, commit 8facd5fb73 ("net: fix
> smatch warnings inside datagram_poll") forgot about SCTP. :-(
> 
> Introduced by 7d4c04fc170 ("net: add option to enable error queue
> packets waking select").
> 
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply
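
The precedence issue in the changelog above can be shown outside the kernel. This is a minimal sketch under stated assumptions: the function names are hypothetical, `err_queue_flag` stands in for the result of `sock_flag(sk, SOCK_SELECT_ERR_QUEUE)`, and the first function writes out explicitly how the compiler groups the unparenthesized expression, since `|` binds tighter than `?:`.

```c
#include <poll.h>

/* How "POLLERR | flag ? POLLPRI : 0" is actually parsed: the bitwise OR
 * becomes the condition, and POLLERR is nonzero, so it is always true. */
static int mask_as_parsed(int err_queue_flag)
{
	int mask = 0;

	mask |= (POLLERR | err_queue_flag) ? POLLPRI : 0;
	return mask;
}

/* The intended grouping: POLLERR is always set, POLLPRI only on request. */
static int mask_as_intended(int err_queue_flag)
{
	int mask = 0;

	mask |= POLLERR | (err_queue_flag ? POLLPRI : 0);
	return mask;
}
```

With the first grouping the result is POLLPRI whatever the flag is, and POLLERR is never reported. With the second, POLLERR is always present and POLLPRI follows the flag.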

* Re: [PATCH net] net: sctp: fix smatch warning in sctp_send_asconf_del_ip
From: David Miller @ 2013-09-11 20:14 UTC (permalink / raw)
  To: dborkman; +Cc: netdev, linux-sctp, nhorman, micchie
In-Reply-To: <1378579881-27881-1-git-send-email-dborkman@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Sat,  7 Sep 2013 20:51:21 +0200

> This was originally reported in [1] and posted by Neil Horman [2], he said:
> 
>   Fix up a missed null pointer check in the asconf code. If we don't find
>   a local address, but we pass in an address length of more than 1, we may
>   dereference a NULL laddr pointer. Currently this can't happen, as the only
>   users of the function pass in the value 1 as the addrcnt parameter, but
>   it's not a hot path, and it doesn't hurt to check for NULL should that ever
>   be the case.
> 
> The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered
> from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1
> while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR
> so that we do *not* find a single address in the association's bind address
> list that is not in the packed array of addresses. If this happens when we
> have an established association with ASCONF-capable peers, then we could get
> a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1
> and call later sctp_make_asconf_update_ip() with NULL laddr.
> 
> BUT: this actually won't happen as sctp_bindx_rem() will catch such a case
> and return with an error earlier. As this is incredibly unintuitive and error
> prone, add a check to catch at least future bugs here. As Neil says, it's not a
> hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the
> single-homed host").
> 
>  [1] http://www.spinics.net/lists/linux-sctp/msg02132.html
>  [2] http://www.spinics.net/lists/linux-sctp/msg02133.html
> 
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] net: fib: fib6_add: fix potential NULL pointer dereference
From: David Miller @ 2013-09-11 20:14 UTC (permalink / raw)
  To: matti.vaittinen; +Cc: dborkman, netdev, mlin
In-Reply-To: <522D69FF.9000001@nsn.com>

From: Matti Vaittinen <matti.vaittinen@nsn.com>
Date: Mon, 09 Sep 2013 09:26:07 +0300

> On 09/07/2013 10:35 PM, ext Hannes Frederic Sowa wrote:
>> On Sat, Sep 07, 2013 at 03:13:20PM +0200, Daniel Borkmann wrote:
>>> When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return
>>> with an error in fn = fib6_add_1(), then error codes are encoded into
>>> the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we
>>> write the error code into err and jump to out, hence enter the if(err)
>>> condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for:
>>>
>>>    if (pn != fn && pn->leaf == rt)
>>>      ...
>>>    if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO))
>>>      ...
>>>
>>> Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn
>>> evaluates to true and causes a NULL-pointer dereference on further
>>> checks on pn. Fix it, by setting both NULL in error case, so that
>>> pn != fn already evaluates to false and no further dereference
>>> takes place.
>>>
>>> This was first correctly implemented in 4a287eba2 ("IPv6 routing,
>>> NLM_F_* flag support: REPLACE and EXCL flags support, warn about
>>> missing CREATE flag"), but the bug got later on introduced by
>>> 188c517a0 ("ipv6: return errno pointers consistently for
>>> fib6_add_1()").
>>>
>>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>>> Cc: Lin Ming <mlin@ss.pku.edu.cn>
>>> Cc: Matti Vaittinen <matti.vaittinen@nsn.com>
>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>
>> Full ACK!
>>
>> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>
> Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com>

Applied, thanks everyone.

^ permalink raw reply
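
The failure mode described in the changelog above comes from comparing a NULL pointer with an error-encoded pointer. The sketch below re-creates the idea in userspace. `err_ptr()` and `is_err()` are simplified imitations of the kernel's ERR_PTR()/IS_ERR() helpers, and `struct fib_node` and `error_path_would_deref_pn()` are hypothetical names, so treat it as an illustration of the comparison only, not of fib6_add() itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace imitations of ERR_PTR() and IS_ERR(). */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int is_err(const void *ptr)
{
	/* Error codes occupy the top 4095 values of the address range. */
	return (uintptr_t)ptr >= (uintptr_t)-4095;
}

/* Hypothetical node type; only its address is compared in this sketch. */
struct fib_node {
	void *leaf;
};

/* Returns nonzero when "pn != fn" holds on the error path, i.e. when the
 * checks that follow it would go on to evaluate pn->leaf with pn == NULL. */
static int error_path_would_deref_pn(int apply_fix)
{
	struct fib_node *pn = NULL;
	struct fib_node *fn = err_ptr(-ENOENT);   /* failing lookup result */

	if (is_err(fn) && apply_fix) {
		/* The fix as described: set both to NULL in the error case. */
		pn = NULL;
		fn = NULL;
	}
	return pn != fn;
}
```

Without the fix, NULL and the encoded -ENOENT differ, so the short-circuit `&&` does not stop the dereference of `pn`. With both pointers NULL the first comparison is already false and nothing further is evaluated.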

* Re: [PATCH net] net: ovs: flow: fix potential illegal memory access in __parse_flow_nlattrs
From: David Miller @ 2013-09-11 20:14 UTC (permalink / raw)
  To: jesse; +Cc: dborkman, netdev, azhou, dev
In-Reply-To: <CAEP_g=_Bgt8OsORTyYRt=tbssfbd-0Jh2HhnjPdj1aVP8Zd1BA@mail.gmail.com>

From: Jesse Gross <jesse@nicira.com>
Date: Sat, 7 Sep 2013 22:35:33 -0700

> On Sat, Sep 7, 2013 at 12:41 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
>> In function __parse_flow_nlattrs(), we check for condition
>> (type > OVS_KEY_ATTR_MAX) and if true, print an error, but we do
>> not return from this function as in other checks. It seems this
>> has been forgotten, as otherwise, we could access beyond the
>> memory of ovs_key_lens, which is of ovs_key_lens[OVS_KEY_ATTR_MAX + 1].
>> Hence, a maliciously prepared nla_type from user space could access
>> beyond this upper limit.
>>
>> Introduced by 03f0d916a ("openvswitch: Mega flow implementation").
>>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> Cc: Andy Zhou <azhou@nicira.com>
> 
> Yeah, looks like a mistake to me.
> 
> Acked-by: Jesse Gross <jesse@nicira.com>

Applied, thanks.

^ permalink raw reply
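
The missing early return described above is a general pattern: a range check that only logs does not protect the table lookup that follows it. A minimal sketch, with a hypothetical table and names (`KEY_ATTR_MAX`, `key_lens`, `expected_len()`) standing in for OVS_KEY_ATTR_MAX and ovs_key_lens, and with made-up length values:

```c
/* Hypothetical maximum attribute type; the real one is OVS_KEY_ATTR_MAX. */
#define KEY_ATTR_MAX 3

/* Table sized MAX + 1, like ovs_key_lens[]; the values are made up. */
static const int key_lens[KEY_ATTR_MAX + 1] = { 0, 4, 2, 8 };

/* Returns the expected length for an attribute type, or -1 when the type
 * is out of range. The early return is the step the patch adds: without
 * it, execution would continue to key_lens[type] and read past the array. */
static int expected_len(unsigned int type)
{
	if (type > KEY_ATTR_MAX)
		return -1;

	return key_lens[type];
}
```

Since the type value comes from a user-supplied netlink attribute, the check has to reject the value before any indexing happens, not merely report it.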

* Re: [PATCH] net: korina: remove deprecated IRQF_DISABLED
From: David Miller @ 2013-09-11 20:14 UTC (permalink / raw)
  To: michael.opdenacker
  Cc: emilio, mugunthanvnm, jg1.han, hsweeten, netdev, linux-kernel
In-Reply-To: <1378531257-5446-1-git-send-email-michael.opdenacker@free-electrons.com>

From: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Date: Sat,  7 Sep 2013 07:20:57 +0200

> This patch proposes to remove the IRQF_DISABLED flag from
> drivers/net/ethernet/korina.c
> 
> It's a NOOP since 2.6.35 and it will be removed one day.
> 
> Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>

Applied.

^ permalink raw reply

* Re: [PATCH] bcm63xx_enet: remove deprecated IRQF_DISABLED
From: David Miller @ 2013-09-11 20:15 UTC (permalink / raw)
  To: michael.opdenacker; +Cc: jogo, joe, jg1.han, mbizon, netdev, linux-kernel
In-Reply-To: <1378537010-7128-1-git-send-email-michael.opdenacker@free-electrons.com>

From: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Date: Sat,  7 Sep 2013 08:56:50 +0200

> This patch proposes to remove the IRQF_DISABLED flag from
> drivers/net/ethernet/broadcom/bcm63xx_enet.c
> 
> It's a NOOP since 2.6.35 and it will be removed one day.
> 
> Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>

Applied.

^ permalink raw reply

* Re: [PATCH net 1/1] r8169: enforce RX_MULTI_EN for the 8168f.
From: David Miller @ 2013-09-11 20:16 UTC (permalink / raw)
  To: dborkman; +Cc: romieu, netdev, david, fredo, hayeswang
In-Reply-To: <522C391A.3020408@redhat.com>

From: Daniel Borkmann <dborkman@redhat.com>
Date: Sun, 08 Sep 2013 10:45:14 +0200

> On 09/08/2013 01:15 AM, Francois Romieu wrote:
>> Same narrative as eb2dc35d99028b698cdedba4f5522bc43e576bd2 ("r8169:
>> RxConfig hack for the 8168evl.") regarding AMD IOMMU errors.
>>
>> RTL_GIGA_MAC_VER_36 - 8168f as well - has not been reported to behave
>> the same.
>>
>> Tested-by: David R <david@unsolicited.net>
>> Tested-by: Frédéric Leroy <fredo@starox.org>
>> Cc: Hayes Wang <hayeswang@realtek.com>
>> ---
> 
> Your signed-off-by is missing.

Francois, if you reply to this thread with your signoff, all will
be well and I will apply this.

Thanks.

^ permalink raw reply

* Re: [PATCH] fib6_rules: fix indentation
From: David Miller @ 2013-09-11 20:16 UTC (permalink / raw)
  To: stefan.tomanek; +Cc: netdev
In-Reply-To: <20130908150943.GR21970@zirkel.wertarbyte.de>

From: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Date: Sun, 8 Sep 2013 17:09:43 +0200

> This change just removes two tabs from the source file.
> 
> Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 1/1] bridge: fix message_age_timer calculation
From: David Miller @ 2013-09-11 20:48 UTC (permalink / raw)
  To: sergei.shtylyov; +Cc: stephen, netdev, bridge, buytenh
In-Reply-To: <5230B93B.3040900@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Wed, 11 Sep 2013 22:40:59 +0400

> Hello.
> 
> On 09/09/2013 08:56 PM, Chris Healy wrote:
> 
>> This changes the message_age_timer calculation to use the BPDU's max
>> age as opposed to the local bridge's max age.  This is in accordance
>> with section 8.6.2.3.2 Step 2 of the 802.1D-1998 specification.
> 
>    You should wrap your changelog lines at 80 chars at most, preferably
>    less.

Yes, please fix this and resubmit.

^ permalink raw reply

* Re: [PATCH] net: tilegx driver: avoid compiler warning
From: David Miller @ 2013-09-11 20:58 UTC (permalink / raw)
  To: cmetcalf; +Cc: netdev, linux-kernel
In-Reply-To: <201309091822.r89IMXmA032018@farm-0002.internal.tilera.com>

From: Chris Metcalf <cmetcalf@tilera.com>
Date: Mon, 9 Sep 2013 14:11:54 -0400

> The "id" variable was being incremented in common code, but only
> initialized and used in IPv4 code.  We move the increment to the IPv4
> code too, and then legitimately use the uninitialized_var() macro to
> avoid the gcc 4.6 warning that 'id' may be used uninitialized.
> Note that gcc 4.7 does not warn.
> 
> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>

Ugly situation, but whatever, applied :-)

^ permalink raw reply

* Re: [PATCH net] ipv6: don't call fib6_run_gc() until routing is ready
From: David Miller @ 2013-09-11 21:05 UTC (permalink / raw)
  To: mkubecek; +Cc: netdev, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20130909194504.1C691E5E72@unicorn.suse.cz>

From: Michal Kubecek <mkubecek@suse.cz>
Date: Mon,  9 Sep 2013 21:45:04 +0200 (CEST)

> When loading the ipv6 module, ndisc_init() is called before
> ip6_route_init(). As the former registers a handler calling
> fib6_run_gc(), this opens a window to run the garbage collector
> before necessary data structures are initialized. If a network
> device is initialized in this window, adding MAC address to it
> triggers a NETDEV_CHANGEADDR event, leading to a crash in
> fib6_clean_all().
> 
> Take the event handler registration out of ndisc_init() into a
> separate function ndisc_late_init() and move it after
> ip6_route_init().
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

Looks good, applied, thanks.

^ permalink raw reply

* Re: [net v7 0/8][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2013-09-11 21:08 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, gospo, sassmann, jesse.brandeburg, shannon.nelson,
	peter.p.waskiewicz.jr, e1000-devel
In-Reply-To: <1378893056-4821-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 11 Sep 2013 02:50:48 -0700

>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master

Ok, I've pulled this.

Please send fixups based upon the trivial issues a few folks have
pointed out.

Thanks.

^ permalink raw reply
