netdev.vger.kernel.org archive mirror
* [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe
@ 2008-05-02  0:42 PJ Waskiewicz
  2008-05-02  0:43 ` [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes PJ Waskiewicz
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: PJ Waskiewicz @ 2008-05-02  0:42 UTC (permalink / raw)
  To: jgarzik; +Cc: netdev

Jeff,

This patchset introduces Data Center Bridging (DCB), a scheduling
technology supported by Intel's 82598 silicon.  The technology uses
802.1p VLAN priority tags to schedule and control traffic rates across
an entire network.  It also uses IEEE 802.1Qaz (priority grouping) and
IEEE 802.1Qbb (priority flow control) to logically separate traffic
flows that coexist on the same physical link.

The technology initially targets carrying storage traffic and regular
LAN traffic over the same physical connection.  Using priority flow
control, one flow can be paused at the MAC level (as with 802.3 flow
control) without affecting flows running at other priorities.

The first patch introduces a netlink interface for ixgbe, which is used
to configure all the DCB parameters, whether they come from userspace or
from negotiation with a DCB-capable switch.

The second patch introduces the hardware initialization code that turns
the technology on in the device.

The third patch hooks the netlink interface and hardware initialization
code into the driver and enables DCB support.

Thanks,
-- 
PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>

* [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-02  0:42 [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe PJ Waskiewicz
@ 2008-05-02  0:43 ` PJ Waskiewicz
  2008-05-02 11:03   ` Jeff Garzik
  2008-05-02  0:43 ` [PATCH 2/3] ixgbe: Add DCB hardware initialization routines PJ Waskiewicz
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: PJ Waskiewicz @ 2008-05-02  0:43 UTC (permalink / raw)
  To: jgarzik; +Cc: netdev

This patch introduces a new generic netlink subsystem for Data Center
Bridging (DCB).  The interface allows userspace applications to
configure the DCB parameters the driver needs to make the technology
work.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---

 drivers/net/ixgbe/ixgbe_dcb_nl.c | 1273 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 1273 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ixgbe/ixgbe_dcb_nl.c
new file mode 100644
index 0000000..95920fb
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_dcb_nl.c
@@ -0,0 +1,1273 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2008 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#include "ixgbe.h"
+
+#include <linux/netlink.h>
+#include <linux/genetlink.h>
+#include <net/genetlink.h>
+#include <linux/netdevice.h>
+
+/* DCB configuration commands */
+enum {
+	DCB_C_UNDEFINED,
+	DCB_C_GSTATE,
+	DCB_C_SSTATE,
+	DCB_C_PG_STATS,
+	DCB_C_PGTX_GCFG,
+	DCB_C_PGTX_SCFG,
+	DCB_C_PGRX_GCFG,
+	DCB_C_PGRX_SCFG,
+	DCB_C_PFC_GCFG,
+	DCB_C_PFC_SCFG,
+	DCB_C_PFC_STATS,
+	DCB_C_GLINK_SPD,
+	DCB_C_SLINK_SPD,
+	DCB_C_SET_ALL,
+	DCB_C_GPERM_HWADDR,
+	__DCB_C_ENUM_MAX,
+};
+
+#define IXGBE_DCB_C_MAX               (__DCB_C_ENUM_MAX - 1)
+
+/* DCB configuration attributes */
+enum {
+	DCB_A_UNDEFINED = 0,
+	DCB_A_IFNAME,
+	DCB_A_STATE,
+	DCB_A_PFC_STATS,
+	DCB_A_PFC_CFG,
+	DCB_A_PG_STATS,
+	DCB_A_PG_CFG,
+	DCB_A_LINK_SPD,
+	DCB_A_SET_ALL,
+	DCB_A_PERM_HWADDR,
+	__DCB_A_ENUM_MAX,
+};
+
+#define IXGBE_DCB_A_MAX               (__DCB_A_ENUM_MAX - 1)
+
+/* PERM HWADDR attributes */
+enum {
+	PERM_HW_A_UNDEFINED,
+	PERM_HW_A_0,
+	PERM_HW_A_1,
+	PERM_HW_A_2,
+	PERM_HW_A_3,
+	PERM_HW_A_4,
+	PERM_HW_A_5,
+	PERM_HW_A_ALL,
+	__PERM_HW_A_ENUM_MAX,
+};
+
+#define IXGBE_DCB_PERM_HW_A_MAX        (__PERM_HW_A_ENUM_MAX - 1)
+
+/* PFC configuration attributes */
+enum {
+	PFC_A_UP_UNDEFINED,
+	PFC_A_UP_0,
+	PFC_A_UP_1,
+	PFC_A_UP_2,
+	PFC_A_UP_3,
+	PFC_A_UP_4,
+	PFC_A_UP_5,
+	PFC_A_UP_6,
+	PFC_A_UP_7,
+	PFC_A_UP_MAX, /* Used as an iterator cap */
+	PFC_A_UP_ALL,
+	__PFC_A_UP_ENUM_MAX,
+};
+
+#define IXGBE_DCB_PFC_A_UP_MAX        (__PFC_A_UP_ENUM_MAX - 1)
+
+/* Priority Group Traffic Class and Bandwidth Group
+ * configuration attributes
+ */
+enum {
+	PG_A_UNDEFINED,
+	PG_A_TC_0,
+	PG_A_TC_1,
+	PG_A_TC_2,
+	PG_A_TC_3,
+	PG_A_TC_4,
+	PG_A_TC_5,
+	PG_A_TC_6,
+	PG_A_TC_7,
+	PG_A_TC_MAX, /* Used as an iterator cap */
+	PG_A_TC_ALL,
+	PG_A_BWG_0,
+	PG_A_BWG_1,
+	PG_A_BWG_2,
+	PG_A_BWG_3,
+	PG_A_BWG_4,
+	PG_A_BWG_5,
+	PG_A_BWG_6,
+	PG_A_BWG_7,
+	PG_A_BWG_MAX, /* Used as an iterator cap */
+	PG_A_BWG_ALL,
+	__PG_A_ENUM_MAX,
+};
+
+#define IXGBE_DCB_PG_A_MAX     (__PG_A_ENUM_MAX - 1)
+
+enum {
+	TC_A_PARAM_UNDEFINED,
+	TC_A_PARAM_STRICT_PRIO,
+	TC_A_PARAM_BW_GROUP_ID,
+	TC_A_PARAM_BW_PCT_IN_GROUP,
+	TC_A_PARAM_UP_MAPPING,
+	TC_A_PARAM_MAX, /* Used as an iterator cap */
+	TC_A_PARAM_ALL,
+	__TC_A_PARAM_ENUM_MAX,
+};
+
+#define IXGBE_DCB_TC_A_PARAM_MAX      (__TC_A_PARAM_ENUM_MAX - 1)
+
+#define DCB_PROTO_VERSION             0x1
+#define is_pci_device(dev) ((dev)->bus == &pci_bus_type)
+
+#define BIT_DCB_MODE   0x01
+#define BIT_PFC        0x02
+#define BIT_PG_RX      0x04
+#define BIT_PG_TX      0x08
+#define BIT_LINKSPEED  0x10
+
+static struct genl_family dcb_family = {
+	.id = GENL_ID_GENERATE,
+	.hdrsize = 0,
+	.name = "IXGBE_DCB",
+	.version = DCB_PROTO_VERSION,
+	.maxattr = IXGBE_DCB_A_MAX,
+};
+
+/* DCB NETLINK attributes policy */
+static struct nla_policy dcb_genl_policy[IXGBE_DCB_A_MAX + 1] = {
+	[DCB_A_IFNAME]    = {.type = NLA_STRING, .len = IFNAMSIZ - 1},
+	[DCB_A_STATE]     = {.type = NLA_U8},
+	[DCB_A_PG_CFG]    = {.type = NLA_NESTED},
+	[DCB_A_PFC_CFG]   = {.type = NLA_NESTED},
+	[DCB_A_PFC_STATS] = {.type = NLA_NESTED},
+	[DCB_A_PG_STATS]  = {.type = NLA_NESTED},
+	[DCB_A_LINK_SPD]  = {.type = NLA_U8},
+	[DCB_A_SET_ALL]   = {.type = NLA_U8},
+	[DCB_A_PERM_HWADDR] = {.type = NLA_NESTED},
+};
+
+/* DCB_A_PERM_HWADDR nested attributes... an array. */
+static struct nla_policy dcb_perm_hwaddr_nest[IXGBE_DCB_PERM_HW_A_MAX + 1] = {
+	[PERM_HW_A_0] = {.type = NLA_U8},
+	[PERM_HW_A_1] = {.type = NLA_U8},
+	[PERM_HW_A_2] = {.type = NLA_U8},
+	[PERM_HW_A_3] = {.type = NLA_U8},
+	[PERM_HW_A_4] = {.type = NLA_U8},
+	[PERM_HW_A_5] = {.type = NLA_U8},
+	[PERM_HW_A_ALL] = {.type = NLA_FLAG},
+};
+
+/* DCB_A_PFC_CFG nested attributes...like an array. */
+static struct nla_policy dcb_pfc_up_nest[IXGBE_DCB_PFC_A_UP_MAX + 1] = {
+	[PFC_A_UP_0]   = {.type = NLA_U8},
+	[PFC_A_UP_1]   = {.type = NLA_U8},
+	[PFC_A_UP_2]   = {.type = NLA_U8},
+	[PFC_A_UP_3]   = {.type = NLA_U8},
+	[PFC_A_UP_4]   = {.type = NLA_U8},
+	[PFC_A_UP_5]   = {.type = NLA_U8},
+	[PFC_A_UP_6]   = {.type = NLA_U8},
+	[PFC_A_UP_7]   = {.type = NLA_U8},
+	[PFC_A_UP_ALL] = {.type = NLA_FLAG},
+};
+
+/* DCB_A_PG_CFG nested attributes...like a struct. */
+static struct nla_policy dcb_pg_nest[IXGBE_DCB_PG_A_MAX + 1] = {
+	[PG_A_TC_0]   = {.type = NLA_NESTED},
+	[PG_A_TC_1]   = {.type = NLA_NESTED},
+	[PG_A_TC_2]   = {.type = NLA_NESTED},
+	[PG_A_TC_3]   = {.type = NLA_NESTED},
+	[PG_A_TC_4]   = {.type = NLA_NESTED},
+	[PG_A_TC_5]   = {.type = NLA_NESTED},
+	[PG_A_TC_6]   = {.type = NLA_NESTED},
+	[PG_A_TC_7]   = {.type = NLA_NESTED},
+	[PG_A_TC_ALL] = {.type = NLA_NESTED},
+	[PG_A_BWG_0]  = {.type = NLA_U8},
+	[PG_A_BWG_1]  = {.type = NLA_U8},
+	[PG_A_BWG_2]  = {.type = NLA_U8},
+	[PG_A_BWG_3]  = {.type = NLA_U8},
+	[PG_A_BWG_4]  = {.type = NLA_U8},
+	[PG_A_BWG_5]  = {.type = NLA_U8},
+	[PG_A_BWG_6]  = {.type = NLA_U8},
+	[PG_A_BWG_7]  = {.type = NLA_U8},
+	[PG_A_BWG_ALL] = {.type = NLA_FLAG},
+};
+
+/* TC_A_CLASS_X nested attributes. */
+static struct nla_policy dcb_tc_param_nest[IXGBE_DCB_TC_A_PARAM_MAX + 1] = {
+	[TC_A_PARAM_STRICT_PRIO]     = {.type = NLA_U8},
+	[TC_A_PARAM_BW_GROUP_ID]     = {.type = NLA_U8},
+	[TC_A_PARAM_BW_PCT_IN_GROUP] = {.type = NLA_U8},
+	[TC_A_PARAM_UP_MAPPING]      = {.type = NLA_U8},
+	[TC_A_PARAM_ALL]             = {.type = NLA_FLAG},
+};
+
+static int ixgbe_dcb_check_adapter(struct net_device *netdev)
+{
+	struct device *busdev;
+	struct pci_dev *pcidev;
+
+	busdev = netdev->dev.parent;
+	if (!busdev)
+		return -EINVAL;
+
+	if (!is_pci_device(busdev))
+		return -EINVAL;
+
+	pcidev = to_pci_dev(busdev);
+	if (!pcidev)
+		return -EINVAL;
+
+	if (ixgbe_is_ixgbe(pcidev))
+		return 0;
+	else
+		return -EINVAL;
+}
+
+static int ixgbe_copy_dcb_cfg(struct ixgbe_dcb_config *src_dcb_cfg,
+			      struct ixgbe_dcb_config *dst_dcb_cfg, int tc_max)
+{
+	struct tc_configuration *src_tc_cfg = NULL;
+	struct tc_configuration *dst_tc_cfg = NULL;
+	int i;
+
+	if (!src_dcb_cfg || !dst_dcb_cfg)
+		return -EINVAL;
+
+	dst_dcb_cfg->link_speed = src_dcb_cfg->link_speed;
+
+	for (i = PG_A_TC_0; i < tc_max + PG_A_TC_0; i++) {
+		src_tc_cfg = &src_dcb_cfg->tc_config[i - PG_A_TC_0];
+		dst_tc_cfg = &dst_dcb_cfg->tc_config[i - PG_A_TC_0];
+
+		dst_tc_cfg->path[DCB_TX_CONFIG].prio_type =
+				src_tc_cfg->path[DCB_TX_CONFIG].prio_type;
+
+		dst_tc_cfg->path[DCB_TX_CONFIG].bwg_id =
+				src_tc_cfg->path[DCB_TX_CONFIG].bwg_id;
+
+		dst_tc_cfg->path[DCB_TX_CONFIG].bwg_percent =
+				src_tc_cfg->path[DCB_TX_CONFIG].bwg_percent;
+
+		dst_tc_cfg->path[DCB_TX_CONFIG].up_to_tc_bitmap =
+				src_tc_cfg->path[DCB_TX_CONFIG].up_to_tc_bitmap;
+
+		dst_tc_cfg->path[DCB_RX_CONFIG].prio_type =
+				src_tc_cfg->path[DCB_RX_CONFIG].prio_type;
+
+		dst_tc_cfg->path[DCB_RX_CONFIG].bwg_id =
+				src_tc_cfg->path[DCB_RX_CONFIG].bwg_id;
+
+		dst_tc_cfg->path[DCB_RX_CONFIG].bwg_percent =
+				src_tc_cfg->path[DCB_RX_CONFIG].bwg_percent;
+
+		dst_tc_cfg->path[DCB_RX_CONFIG].up_to_tc_bitmap =
+				src_tc_cfg->path[DCB_RX_CONFIG].up_to_tc_bitmap;
+	}
+
+	for (i = PG_A_BWG_0; i < PG_A_BWG_MAX; i++) {
+		dst_dcb_cfg->bw_percentage[DCB_TX_CONFIG][i - PG_A_BWG_0] =
+		    src_dcb_cfg->bw_percentage[DCB_TX_CONFIG][i - PG_A_BWG_0];
+		dst_dcb_cfg->bw_percentage[DCB_RX_CONFIG][i - PG_A_BWG_0] =
+	            src_dcb_cfg->bw_percentage[DCB_RX_CONFIG][i - PG_A_BWG_0];
+	}
+
+	for (i = PFC_A_UP_0; i < PFC_A_UP_MAX; i++) {
+		dst_dcb_cfg->tc_config[i - PFC_A_UP_0].dcb_pfc =
+			src_dcb_cfg->tc_config[i - PFC_A_UP_0].dcb_pfc;
+	}
+
+	return 0;
+}
+
+static int ixgbe_nl_reply(u8 value, u8 cmd, u8 attr, struct genl_info *info)
+{
+	struct sk_buff *dcb_skb = NULL;
+	void *data;
+	int ret;
+
+	dcb_skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!dcb_skb)
+		return -EINVAL;
+
+	data = genlmsg_put_reply(dcb_skb, info, &dcb_family, 0, cmd);
+	if (!data)
+		goto err;
+
+	ret = nla_put_u8(dcb_skb, attr, value);
+	if (ret)
+		goto err;
+
+	/* end the message, assign the nlmsg_len. */
+	genlmsg_end(dcb_skb, data);
+
+	/* genlmsg_reply() consumes the skb, even on failure, so it must
+	 * not be freed afterwards.
+	 */
+	return genlmsg_reply(dcb_skb, info);
+
+err:
+	nlmsg_free(dcb_skb);
+	return -EINVAL;
+}
+
+static int ixgbe_dcb_gstate(struct sk_buff *skb, struct genl_info *info)
+{
+	int ret = -ENOMEM;
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+
+	if (!info->attrs[DCB_A_IFNAME])
+		return -EINVAL;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		return -EINVAL;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	ret = ixgbe_nl_reply(!!(adapter->flags & IXGBE_FLAG_DCB_ENABLED),
+				DCB_C_GSTATE, DCB_A_STATE, info);
+	if (ret)
+		goto err_out;
+
+	DPRINTK(DRV, INFO, "Get DCB Admin Mode.\n");
+
+err_out:
+	dev_put(netdev);
+	return ret;
+}
+
+static int ixgbe_dcb_sstate(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	int ret = -EINVAL;
+	u8 value;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_STATE])
+		goto err;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		goto err;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	value = nla_get_u8(info->attrs[DCB_A_STATE]);
+	if ((value & 1) != value) {
+		DPRINTK(DRV, INFO, "Value is not 1 or 0, it is %d.\n", value);
+	} else {
+		switch (value) {
+		case 0:
+			if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
+				set_bit(__IXGBE_DOWN, &adapter->state);
+				if (netdev->flags & IFF_UP)
+					ixgbe_close(netdev);
+				ixgbe_reset_interrupt_capability(adapter);
+				kfree(adapter->tx_ring);
+				kfree(adapter->rx_ring);
+
+				adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED;
+				adapter->flags |= IXGBE_FLAG_RSS_ENABLED;
+				ixgbe_init_interrupt_scheme(adapter);
+				if (netdev->flags & IFF_UP)
+					ixgbe_open(netdev);
+				clear_bit(__IXGBE_DOWN, &adapter->state);
+				break;
+			} else {
+				/* Nothing to do, already off */
+				goto out;
+			}
+		case 1:
+			if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
+				/* Nothing to do, already on */
+				goto out;
+			} else {
+				set_bit(__IXGBE_DOWN, &adapter->state);
+				if (netdev->flags & IFF_UP)
+					ixgbe_close(netdev);
+				ixgbe_reset_interrupt_capability(adapter);
+				kfree(adapter->tx_ring);
+				kfree(adapter->rx_ring);
+
+				adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED;
+				adapter->flags |= IXGBE_FLAG_DCB_ENABLED;
+				ixgbe_init_interrupt_scheme(adapter);
+				if (netdev->flags & IFF_UP)
+					ixgbe_open(netdev);
+				clear_bit(__IXGBE_DOWN, &adapter->state);
+				break;
+			}
+		}
+	}
+
+out:
+	ret = ixgbe_nl_reply(0, DCB_C_SSTATE, DCB_A_STATE, info);
+	if (ret)
+		goto err_out;
+
+	DPRINTK(DRV, INFO, "Set DCB Admin Mode.\n");
+
+err_out:
+	dev_put(netdev);
+err:
+	return ret;
+}
+
+static int ixgbe_dcb_glink_spd(struct sk_buff *skb, struct genl_info *info)
+{
+	int ret = -ENOMEM;
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+
+	if (!info->attrs[DCB_A_IFNAME])
+		return -EINVAL;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		return -EINVAL;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	ret = ixgbe_nl_reply(adapter->dcb_cfg.link_speed & 0xff,
+				DCB_C_GLINK_SPD, DCB_A_LINK_SPD, info);
+	if (ret)
+		goto err_out;
+
+	DPRINTK(DRV, INFO, "Get DCB Link Speed.\n");
+
+err_out:
+	dev_put(netdev);
+	return ret;
+}
+
+static int ixgbe_dcb_slink_spd(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	int ret = -EINVAL;
+	u8 value;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_LINK_SPD])
+		goto err;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		goto err;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	value = nla_get_u8(info->attrs[DCB_A_LINK_SPD]);
+	if (value > 9) {
+		DPRINTK(DRV, ERR, "Value is not 0 thru 9, it is %d.\n", value);
+	} else {
+		if (!adapter->dcb_set_bitmap &&
+		   ixgbe_copy_dcb_cfg(&adapter->dcb_cfg, &adapter->temp_dcb_cfg,
+				adapter->ring_feature[RING_F_DCB].indices)) {
+			ret = -EINVAL;
+			goto err_out;
+		}
+
+		adapter->temp_dcb_cfg.link_speed = value;
+		adapter->dcb_set_bitmap |= BIT_LINKSPEED;
+	}
+
+	ret = ixgbe_nl_reply(0, DCB_C_SLINK_SPD, DCB_A_LINK_SPD, info);
+	if (ret)
+		goto err_out;
+
+	DPRINTK(DRV, INFO, "Set DCB Link Speed to %d.\n", value);
+
+err_out:
+	dev_put(netdev);
+err:
+	return ret;
+}
+
+static int ixgbe_dcb_gperm_hwaddr(struct sk_buff *skb, struct genl_info *info)
+{
+	void *data;
+	struct sk_buff *dcb_skb = NULL;
+	struct nlattr *tb[IXGBE_DCB_PERM_HW_A_MAX + 1], *nest;
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	struct ixgbe_hw *hw = NULL;
+	int ret = -ENOMEM;
+	int i;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_PERM_HWADDR])
+		return -EINVAL;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		return -EINVAL;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	hw = &adapter->hw;
+
+	ret = nla_parse_nested(tb, IXGBE_DCB_PERM_HW_A_MAX,
+				info->attrs[DCB_A_PERM_HWADDR],
+				dcb_perm_hwaddr_nest);
+	if (ret)
+		goto err;
+
+	dcb_skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!dcb_skb)
+		goto err;
+
+	data = genlmsg_put_reply(dcb_skb, info, &dcb_family, 0,
+				  DCB_C_GPERM_HWADDR);
+	if (!data)
+		goto err;
+
+	nest = nla_nest_start(dcb_skb, DCB_A_PERM_HWADDR);
+	if (!nest)
+		goto err;
+
+	for (i = 0; i < netdev->addr_len; i++) {
+		if (!tb[i+PERM_HW_A_0] && !tb[PERM_HW_A_ALL])
+			goto err;
+
+		ret = nla_put_u8(dcb_skb, i + PERM_HW_A_0,
+				 hw->mac.perm_addr[i]);
+
+		if (ret) {
+			nla_nest_cancel(dcb_skb, nest);
+			goto err;
+		}
+	}
+
+	nla_nest_end(dcb_skb, nest);
+
+	genlmsg_end(dcb_skb, data);
+
+	/* genlmsg_reply() consumes the skb, even on failure. */
+	ret = genlmsg_reply(dcb_skb, info);
+	dev_put(netdev);
+	return ret;
+
+err:
+	DPRINTK(DRV, ERR, "Error in get permanent hwaddr.\n");
+	nlmsg_free(dcb_skb);
+err_out:
+	dev_put(netdev);
+	return ret;
+}
+
+static int ixgbe_dcb_pg_scfg(struct sk_buff *skb, struct genl_info *info,
+				int dir)
+{
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	struct tc_configuration *tc_config = NULL;
+	struct tc_configuration *tc_tmpcfg = NULL;
+	struct nlattr *pg_tb[IXGBE_DCB_PG_A_MAX + 1];
+	struct nlattr *param_tb[IXGBE_DCB_TC_A_PARAM_MAX + 1];
+	int i, ret, tc_max;
+	u8 value;
+	u8 changed = 0;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_PG_CFG])
+		return -EINVAL;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		return -EINVAL;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err;
+	else
+		adapter = netdev_priv(netdev);
+
+	ret = nla_parse_nested(pg_tb, IXGBE_DCB_PG_A_MAX,
+			       info->attrs[DCB_A_PG_CFG], dcb_pg_nest);
+	if (ret)
+		goto err;
+
+	if (!adapter->dcb_set_bitmap &&
+	    ixgbe_copy_dcb_cfg(&adapter->dcb_cfg, &adapter->temp_dcb_cfg,
+			       adapter->ring_feature[RING_F_DCB].indices)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	tc_max = adapter->ring_feature[RING_F_DCB].indices;
+	for (i = PG_A_TC_0; i < tc_max + PG_A_TC_0; i++) {
+		if (!pg_tb[i])
+			continue;
+
+		ret = nla_parse_nested(param_tb, IXGBE_DCB_TC_A_PARAM_MAX,
+				       pg_tb[i], dcb_tc_param_nest);
+		if (ret)
+			goto err;
+
+		tc_config = &adapter->dcb_cfg.tc_config[i - PG_A_TC_0];
+		tc_tmpcfg = &adapter->temp_dcb_cfg.tc_config[i - PG_A_TC_0];
+		if (param_tb[TC_A_PARAM_STRICT_PRIO]) {
+			value = nla_get_u8(param_tb[TC_A_PARAM_STRICT_PRIO]);
+			tc_tmpcfg->path[dir].prio_type = value;
+			if (tc_tmpcfg->path[dir].prio_type !=
+				tc_config->path[dir].prio_type)
+				changed = 1;
+		}
+		if (param_tb[TC_A_PARAM_BW_GROUP_ID]) {
+			value = nla_get_u8(param_tb[TC_A_PARAM_BW_GROUP_ID]);
+			tc_tmpcfg->path[dir].bwg_id = value;
+			if (tc_tmpcfg->path[dir].bwg_id !=
+				tc_config->path[dir].bwg_id)
+				changed = 1;
+		}
+		if (param_tb[TC_A_PARAM_BW_PCT_IN_GROUP]) {
+			value = nla_get_u8(param_tb[TC_A_PARAM_BW_PCT_IN_GROUP]);
+			tc_tmpcfg->path[dir].bwg_percent = value;
+			if (tc_tmpcfg->path[dir].bwg_percent !=
+				tc_config->path[dir].bwg_percent)
+				changed = 1;
+		}
+		if (param_tb[TC_A_PARAM_UP_MAPPING]) {
+			value = nla_get_u8(param_tb[TC_A_PARAM_UP_MAPPING]);
+			tc_tmpcfg->path[dir].up_to_tc_bitmap = value;
+			if (tc_tmpcfg->path[dir].up_to_tc_bitmap !=
+				tc_config->path[dir].up_to_tc_bitmap)
+				changed = 1;
+		}
+	}
+
+	for (i = PG_A_BWG_0; i < PG_A_BWG_MAX; i++) {
+		if (!pg_tb[i])
+			continue;
+
+		value = nla_get_u8(pg_tb[i]);
+		adapter->temp_dcb_cfg.bw_percentage[dir][i-PG_A_BWG_0] = value;
+
+		if (adapter->temp_dcb_cfg.bw_percentage[dir][i-PG_A_BWG_0] !=
+			adapter->dcb_cfg.bw_percentage[dir][i-PG_A_BWG_0])
+			changed = 1;
+	}
+
+	adapter->temp_dcb_cfg.round_robin_enable = false;
+
+	if (changed) {
+		if (dir == DCB_TX_CONFIG)
+			adapter->dcb_set_bitmap |= BIT_PG_TX;
+		else
+			adapter->dcb_set_bitmap |= BIT_PG_RX;
+
+		DPRINTK(DRV, INFO, "Set DCB PG\n");
+	} else {
+		DPRINTK(DRV, INFO, "Set DCB PG - no changes\n");
+	}
+
+	ret = ixgbe_nl_reply(0, dir ? DCB_C_PGRX_SCFG : DCB_C_PGTX_SCFG,
+			     DCB_A_PG_CFG, info);
+
+err:
+	dev_put(netdev);
+	return ret;
+}
+
+static int ixgbe_dcb_pgtx_scfg(struct sk_buff *skb, struct genl_info *info)
+{
+	return ixgbe_dcb_pg_scfg(skb, info, DCB_TX_CONFIG);
+}
+
+static int ixgbe_dcb_pgrx_scfg(struct sk_buff *skb, struct genl_info *info)
+{
+	return ixgbe_dcb_pg_scfg(skb, info, DCB_RX_CONFIG);
+}
+
+static int ixgbe_dcb_pg_gcfg(struct sk_buff *skb, struct genl_info *info,
+				int dir)
+{
+	void *data;
+	struct sk_buff *dcb_skb = NULL;
+	struct nlattr *pg_nest, *param_nest, *tb;
+	struct nlattr *pg_tb[IXGBE_DCB_PG_A_MAX + 1];
+	struct nlattr *param_tb[IXGBE_DCB_TC_A_PARAM_MAX + 1];
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	struct tc_configuration *tc_config = NULL;
+	struct tc_bw_alloc *tc = NULL;
+	int ret  = -ENOMEM;
+	int i, tc_max;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_PG_CFG])
+		return -EINVAL;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		return -EINVAL;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	ret = nla_parse_nested(pg_tb, IXGBE_DCB_PG_A_MAX,
+			       info->attrs[DCB_A_PG_CFG], dcb_pg_nest);
+	if (ret)
+		goto err;
+
+	dcb_skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!dcb_skb)
+		goto err;
+
+	data = genlmsg_put_reply(dcb_skb, info, &dcb_family, 0,
+				 dir ? DCB_C_PGRX_GCFG : DCB_C_PGTX_GCFG);
+	if (!data)
+		goto err;
+
+	pg_nest = nla_nest_start(dcb_skb, DCB_A_PG_CFG);
+	if (!pg_nest)
+		goto err;
+
+	tc_max = adapter->ring_feature[RING_F_DCB].indices;
+	for (i = PG_A_TC_0; i < tc_max + PG_A_TC_0; i++) {
+		if (!pg_tb[i] && !pg_tb[PG_A_TC_ALL])
+			continue;
+
+		if (pg_tb[PG_A_TC_ALL])
+			tb = pg_tb[PG_A_TC_ALL];
+		else
+			tb = pg_tb[i];
+		ret = nla_parse_nested(param_tb, IXGBE_DCB_TC_A_PARAM_MAX,
+				       tb, dcb_tc_param_nest);
+		if (ret)
+			goto err_pg;
+
+		param_nest = nla_nest_start(dcb_skb, i);
+		if (!param_nest)
+			goto err_pg;
+
+		tc_config = &adapter->dcb_cfg.tc_config[i - PG_A_TC_0];
+		tc = &adapter->dcb_cfg.tc_config[i - PG_A_TC_0].path[dir];
+
+		if (param_tb[TC_A_PARAM_STRICT_PRIO] ||
+		    param_tb[TC_A_PARAM_ALL]) {
+			ret = nla_put_u8(dcb_skb, TC_A_PARAM_STRICT_PRIO,
+					 tc->prio_type);
+			if (ret)
+				goto err_param;
+		}
+		if (param_tb[TC_A_PARAM_BW_GROUP_ID] ||
+		    param_tb[TC_A_PARAM_ALL]) {
+			ret = nla_put_u8(dcb_skb, TC_A_PARAM_BW_GROUP_ID,
+					 tc->bwg_id);
+			if (ret)
+				goto err_param;
+		}
+		if (param_tb[TC_A_PARAM_BW_PCT_IN_GROUP] ||
+		    param_tb[TC_A_PARAM_ALL]) {
+			ret = nla_put_u8(dcb_skb, TC_A_PARAM_BW_PCT_IN_GROUP,
+					 tc->bwg_percent);
+			if (ret)
+				goto err_param;
+		}
+		if (param_tb[TC_A_PARAM_UP_MAPPING] ||
+		    param_tb[TC_A_PARAM_ALL]) {
+			ret = nla_put_u8(dcb_skb, TC_A_PARAM_UP_MAPPING,
+					 tc->up_to_tc_bitmap);
+			if (ret)
+				goto err_param;
+		}
+		nla_nest_end(dcb_skb, param_nest);
+	}
+
+	for (i = PG_A_BWG_0; i < PG_A_BWG_MAX; i++) {
+		if (!pg_tb[i] && !pg_tb[PG_A_BWG_ALL])
+			continue;
+
+		ret = nla_put_u8(dcb_skb, i,
+		            adapter->dcb_cfg.bw_percentage[dir][i-PG_A_BWG_0]);
+
+		if (ret)
+			goto err_pg;
+	}
+
+	nla_nest_end(dcb_skb, pg_nest);
+
+	genlmsg_end(dcb_skb, data);
+
+	/* genlmsg_reply() consumes the skb, even on failure. */
+	ret = genlmsg_reply(dcb_skb, info);
+	if (!ret)
+		DPRINTK(DRV, INFO, "Get PG %s Attributes.\n",
+			dir ? "RX" : "TX");
+	dev_put(netdev);
+	return ret;
+
+err_param:
+	DPRINTK(DRV, ERR, "Error in get pg %s.\n", dir ? "rx" : "tx");
+	nla_nest_cancel(dcb_skb, param_nest);
+err_pg:
+	nla_nest_cancel(dcb_skb, pg_nest);
+err:
+	nlmsg_free(dcb_skb);
+err_out:
+	dev_put(netdev);
+	return ret;
+}
+
+static int ixgbe_dcb_pgtx_gcfg(struct sk_buff *skb, struct genl_info *info)
+{
+	return ixgbe_dcb_pg_gcfg(skb, info, DCB_TX_CONFIG);
+}
+
+static int ixgbe_dcb_pgrx_gcfg(struct sk_buff *skb, struct genl_info *info)
+{
+	return ixgbe_dcb_pg_gcfg(skb, info, DCB_RX_CONFIG);
+}
+
+static int ixgbe_dcb_spfccfg(struct sk_buff *skb, struct genl_info *info)
+{
+	struct nlattr *tb[IXGBE_DCB_PFC_A_UP_MAX + 1];
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	int i, ret = -ENOMEM;
+	u8 setting;
+	u8 changed = 0;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_PFC_CFG]) {
+		DPRINTK(DRV, INFO, "set pfc: ifname:%d pfc_cfg:%d\n",
+			!info->attrs[DCB_A_IFNAME],
+			!info->attrs[DCB_A_PFC_CFG]);
+		return -EINVAL;
+	}
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		return -EINVAL;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err;
+	else
+		adapter = netdev_priv(netdev);
+
+	ret = nla_parse_nested(tb, IXGBE_DCB_PFC_A_UP_MAX,
+		               info->attrs[DCB_A_PFC_CFG],
+		               dcb_pfc_up_nest);
+	if (ret)
+		goto err;
+
+	if (!adapter->dcb_set_bitmap &&
+	    ixgbe_copy_dcb_cfg(&adapter->dcb_cfg, &adapter->temp_dcb_cfg,
+			       adapter->ring_feature[RING_F_DCB].indices)) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	for (i = PFC_A_UP_0; i < PFC_A_UP_MAX; i++) {
+		if (!tb[i])
+			continue;
+
+		setting = nla_get_u8(tb[i]);
+		adapter->temp_dcb_cfg.tc_config[i-PFC_A_UP_0].dcb_pfc = setting;
+
+		if (adapter->temp_dcb_cfg.tc_config[i-PFC_A_UP_0].dcb_pfc !=
+			adapter->dcb_cfg.tc_config[i-PFC_A_UP_0].dcb_pfc)
+			changed = 1;
+	}
+
+	if (changed) {
+		adapter->dcb_set_bitmap |= BIT_PFC;
+		DPRINTK(DRV, INFO, "Set DCB PFC\n");
+	} else {
+		DPRINTK(DRV, INFO, "Set DCB PFC - no changes\n");
+	}
+
+	ret = ixgbe_nl_reply(0, DCB_C_PFC_SCFG, DCB_A_PFC_CFG, info);
+
+err:
+	dev_put(netdev);
+	return ret;
+}
+}
+
+static int ixgbe_dcb_gpfccfg(struct sk_buff *skb, struct genl_info *info)
+{
+	void *data;
+	struct sk_buff *dcb_skb = NULL;
+	struct nlattr *tb[IXGBE_DCB_PFC_A_UP_MAX + 1], *nest;
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	int ret = -ENOMEM;
+	int i;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_PFC_CFG])
+		return -EINVAL;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		return -EINVAL;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	ret = nla_parse_nested(tb, IXGBE_DCB_PFC_A_UP_MAX,
+			       info->attrs[DCB_A_PFC_CFG], dcb_pfc_up_nest);
+	if (ret)
+		goto err;
+
+	dcb_skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!dcb_skb)
+		goto err;
+
+	data = genlmsg_put_reply(dcb_skb, info, &dcb_family, 0,
+				  DCB_C_PFC_GCFG);
+	if (!data)
+		goto err;
+
+	nest = nla_nest_start(dcb_skb, DCB_A_PFC_CFG);
+	if (!nest)
+		goto err;
+
+	for (i = PFC_A_UP_0; i < PFC_A_UP_MAX; i++) {
+		if (!tb[i] && !tb[PFC_A_UP_ALL])
+			continue;
+
+		ret = nla_put_u8(dcb_skb, i,
+			      adapter->dcb_cfg.tc_config[i-PFC_A_UP_0].dcb_pfc);
+		if (ret) {
+			nla_nest_cancel(dcb_skb, nest);
+			goto err;
+		}
+	}
+
+	nla_nest_end(dcb_skb, nest);
+
+	genlmsg_end(dcb_skb, data);
+
+	/* genlmsg_reply() consumes the skb, even on failure. */
+	ret = genlmsg_reply(dcb_skb, info);
+	if (!ret)
+		DPRINTK(DRV, INFO, "Get PFC CFG.\n");
+	dev_put(netdev);
+	return ret;
+
+err:
+	DPRINTK(DRV, ERR, "Error in get pfc cfg.\n");
+	nlmsg_free(dcb_skb);
+err_out:
+	dev_put(netdev);
+	return ret;
+}
+
+static int ixgbe_dcb_set_all(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net_device *netdev = NULL;
+	struct ixgbe_adapter *adapter = NULL;
+	int ret = -ENOMEM;
+	u8 value;
+	u8 retval = 0;
+
+	if (!info->attrs[DCB_A_IFNAME] || !info->attrs[DCB_A_SET_ALL])
+		goto err;
+
+	netdev = dev_get_by_name(&init_net,
+				 nla_data(info->attrs[DCB_A_IFNAME]));
+	if (!netdev)
+		goto err;
+
+	ret = ixgbe_dcb_check_adapter(netdev);
+	if (ret)
+		goto err_out;
+	else
+		adapter = netdev_priv(netdev);
+
+	value = nla_get_u8(info->attrs[DCB_A_SET_ALL]);
+	if ((value & 1) != value) {
+		DPRINTK(DRV, INFO, "Value is not 1 or 0, it is %d.\n", value);
+	} else {
+		if (!adapter->dcb_set_bitmap) {
+			retval = 1;
+			goto out;
+		}
+
+		while (test_and_set_bit(__IXGBE_RESETTING, &adapter->state))
+			msleep(1);
+
+		ret = ixgbe_copy_dcb_cfg(&adapter->temp_dcb_cfg,
+				&adapter->dcb_cfg,
+				adapter->ring_feature[RING_F_DCB].indices);
+		if (ret) {
+			clear_bit(__IXGBE_RESETTING, &adapter->state);
+			goto err_out;
+		}
+
+		ixgbe_down(adapter);
+		ixgbe_up(adapter);
+		adapter->dcb_set_bitmap = 0x00;
+		clear_bit(__IXGBE_RESETTING, &adapter->state);
+	}
+
+out:
+	ret = ixgbe_nl_reply(retval, DCB_C_SET_ALL, DCB_A_SET_ALL, info);
+	if (ret)
+		goto err_out;
+
+	DPRINTK(DRV, INFO, "Set all pfc pg and link speed configuration.\n");
+
+err_out:
+	dev_put(netdev);
+err:
+	return ret;
+}
+
+
+/* DCB Generic NETLINK command Definitions */
+/* Get DCB Admin Mode */
+static struct genl_ops ixgbe_dcb_genl_c_gstate = {
+	.cmd = DCB_C_GSTATE,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_gstate,
+	.dumpit = NULL,
+};
+
+/* Set DCB Admin Mode */
+static struct genl_ops ixgbe_dcb_genl_c_sstate = {
+	.cmd = DCB_C_SSTATE,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_sstate,
+	.dumpit = NULL,
+};
+
+/* Set TX Traffic Attributes */
+static struct genl_ops ixgbe_dcb_genl_c_spgtx = {
+	.cmd = DCB_C_PGTX_SCFG,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_pgtx_scfg,
+	.dumpit = NULL,
+};
+
+/* Set RX Traffic Attributes */
+static struct genl_ops ixgbe_dcb_genl_c_spgrx = {
+	.cmd = DCB_C_PGRX_SCFG,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_pgrx_scfg,
+	.dumpit = NULL,
+};
+
+/* Set PFC CFG */
+static struct genl_ops ixgbe_dcb_genl_c_spfc = {
+	.cmd = DCB_C_PFC_SCFG,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_spfccfg,
+	.dumpit = NULL,
+};
+
+/* Get TX Traffic Attributes */
+static struct genl_ops ixgbe_dcb_genl_c_gpgtx = {
+	.cmd = DCB_C_PGTX_GCFG,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_pgtx_gcfg,
+	.dumpit = NULL,
+};
+
+/* Get RX Traffic Attributes */
+static struct genl_ops ixgbe_dcb_genl_c_gpgrx = {
+	.cmd = DCB_C_PGRX_GCFG,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_pgrx_gcfg,
+	.dumpit = NULL,
+};
+
+/* Get PFC CFG */
+static struct genl_ops ixgbe_dcb_genl_c_gpfc = {
+	.cmd = DCB_C_PFC_GCFG,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_gpfccfg,
+	.dumpit = NULL,
+};
+
+/* Get Link Speed setting */
+static struct genl_ops ixgbe_dcb_genl_c_glink_spd = {
+	.cmd = DCB_C_GLINK_SPD,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_glink_spd,
+	.dumpit = NULL,
+};
+
+/* Set Link Speed setting */
+static struct genl_ops ixgbe_dcb_genl_c_slink_spd = {
+	.cmd = DCB_C_SLINK_SPD,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_slink_spd,
+	.dumpit = NULL,
+};
+
+/* Commit all staged "set" operations */
+static struct genl_ops ixgbe_dcb_genl_c_set_all = {
+	.cmd = DCB_C_SET_ALL,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_set_all,
+	.dumpit = NULL,
+};
+
+/* Get permanent HW address */
+static struct genl_ops ixgbe_dcb_genl_c_gperm_hwaddr = {
+	.cmd = DCB_C_GPERM_HWADDR,
+	.flags = GENL_ADMIN_PERM,
+	.policy = dcb_genl_policy,
+	.doit = ixgbe_dcb_gperm_hwaddr,
+	.dumpit = NULL,
+};
+
+/**
+ * ixgbe_dcb_netlink_register - Initialize the NETLINK communication channel
+ *
+ * Description:
+ * Call out to the DCB components so they can register their families and
+ * commands with the Generic NETLINK mechanism.  Returns zero on success and
+ * non-zero on failure.
+ *
+ */
+int ixgbe_dcb_netlink_register(void)
+{
+	int ret;
+
+	ret = genl_register_family(&dcb_family);
+	if (ret)
+		return ret;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_gstate);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_sstate);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_spgtx);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_spgrx);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_spfc);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_gpfc);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_gpgtx);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_gpgrx);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_glink_spd);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_slink_spd);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_set_all);
+	if (ret)
+		goto err;
+
+	ret = genl_register_ops(&dcb_family, &ixgbe_dcb_genl_c_gperm_hwaddr);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	genl_unregister_family(&dcb_family);
+	return ret;
+}
+
+int ixgbe_dcb_netlink_unregister(void)
+{
+	return genl_unregister_family(&dcb_family);
+}


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] ixgbe: Add DCB hardware initialization routines
  2008-05-02  0:42 [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe PJ Waskiewicz
  2008-05-02  0:43 ` [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes PJ Waskiewicz
@ 2008-05-02  0:43 ` PJ Waskiewicz
  2008-05-02  0:43 ` [PATCH 3/3] ixgbe: Enable Data Center Bridging (DCB) support PJ Waskiewicz
  2008-05-02 11:19 ` [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe Andi Kleen
  3 siblings, 0 replies; 13+ messages in thread
From: PJ Waskiewicz @ 2008-05-02  0:43 UTC (permalink / raw)
  To: jgarzik; +Cc: netdev

This patch adds the shared code functions for DCB functionality.  These
routines handle the bandwidth credit calculations for the hardware
arbiters, manage priority grouping changes, and perform all the hardware
writes needed to enable these features in the 82598.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---

 drivers/net/ixgbe/ixgbe_dcb.c       |  330 +++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_dcb.h       |  168 +++++++++++++++
 drivers/net/ixgbe/ixgbe_dcb_82598.c |  400 +++++++++++++++++++++++++++++++++++
 drivers/net/ixgbe/ixgbe_dcb_82598.h |   98 +++++++++
 4 files changed, 996 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_dcb.c b/drivers/net/ixgbe/ixgbe_dcb.c
new file mode 100644
index 0000000..28f190a
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_dcb.c
@@ -0,0 +1,330 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2007 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  Linux NICS <linux.nics@intel.com>
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+
+#include "ixgbe.h"
+#include "ixgbe_type.h"
+#include "ixgbe_dcb.h"
+#include "ixgbe_dcb_82598.h"
+
+/**
+ * ixgbe_dcb_check_config - Check DCB configuration for rule violations
+ * @dcb_config: Pointer to DCB config structure
+ *
+ * This function checks DCB rules for DCB settings.
+ * The following rules are checked:
+ * 1. The sum of bandwidth percentages of all Bandwidth Groups must total 100%.
+ * 2. The sum of bandwidth percentages of all Traffic Classes within a
+ *    Bandwidth Group must total 100%.
+ * 3. A Traffic Class should not be set to both Link Strict Priority
+ *    and Group Strict Priority.
+ * 4. Link strict Bandwidth Groups can only have link strict traffic classes
+ *    with zero bandwidth.
+ */
+s32 ixgbe_dcb_check_config(struct ixgbe_dcb_config *dcb_config)
+{
+	struct tc_bw_alloc *p;
+	s32 ret_val = 0;
+	u8 i, j, bw = 0, bw_id;
+	u8 bw_sum[2][MAX_BW_GROUP];
+	bool link_strict[2][MAX_BW_GROUP];
+
+	memset(bw_sum, 0, sizeof(bw_sum));
+	memset(link_strict, 0, sizeof(link_strict));
+
+	/* First Tx, then Rx */
+	for (i = 0; i < 2; i++) {
+		/* Check each traffic class for rule violation */
+		for (j = 0; j < MAX_TRAFFIC_CLASS; j++) {
+			p = &dcb_config->tc_config[j].path[i];
+
+			bw = p->bwg_percent;
+			bw_id = p->bwg_id;
+
+			if (bw_id >= MAX_BW_GROUP) {
+				ret_val = DCB_ERR_CONFIG;
+				goto err_config;
+			}
+			if (p->prio_type == prio_link) {
+				link_strict[i][bw_id] = true;
+				/* Link strict should have zero bandwidth */
+				if (bw) {
+					ret_val = DCB_ERR_LS_BW_NONZERO;
+					goto err_config;
+				}
+			} else if (!bw) {
+				/*
+				 * Traffic classes without link strict
+				 * should have non-zero bandwidth.
+				 */
+				ret_val = DCB_ERR_TC_BW_ZERO;
+				goto err_config;
+			}
+			bw_sum[i][bw_id] += bw;
+		}
+
+		bw = 0;
+
+		/* Check each bandwidth group for rule violation */
+		for (j = 0; j < MAX_BW_GROUP; j++) {
+			bw += dcb_config->bw_percentage[i][j];
+			/*
+			 * Sum of bandwidth percentages of all traffic classes
+			 * within a Bandwidth Group must total 100 except for
+			 * link strict group (zero bandwidth).
+			 */
+			if (link_strict[i][j]) {
+				if (bw_sum[i][j]) {
+					/*
+					 * Link strict group should have zero
+					 * bandwidth.
+					 */
+					ret_val = DCB_ERR_LS_BWG_NONZERO;
+					goto err_config;
+				}
+			} else if (bw_sum[i][j] != BW_PERCENT &&
+			           bw_sum[i][j] != 0) {
+				ret_val = DCB_ERR_TC_BW;
+				goto err_config;
+			}
+		}
+
+		if (bw != BW_PERCENT) {
+			ret_val = DCB_ERR_BW_GROUP;
+			goto err_config;
+		}
+	}
+
+err_config:
+	return ret_val;
+}
+
+/**
+ * ixgbe_dcb_calculate_tc_credits - Calculates traffic class credits
+ * @dcb_config: Struct containing DCB settings.
+ * @direction: Configuring either Tx or Rx.
+ *
+ * This function calculates the credits allocated to each traffic class.
+ * It should be called only after the rules are checked by
+ * ixgbe_dcb_check_config().
+ */
+s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *dcb_config,
+                                   u8 direction)
+{
+	struct tc_bw_alloc *p;
+	s32 ret_val = 0;
+	/* Initialization values default for Tx settings */
+	u32 credit_refill       = 0;
+	u32 credit_max          = 0;
+	u16 link_percentage     = 0;
+	u8  bw_percent          = 0;
+	u8  i;
+
+	if (dcb_config == NULL) {
+		ret_val = DCB_ERR_CONFIG;
+		goto out;
+	}
+
+	/* Find out the link percentage for each TC first */
+	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
+		p = &dcb_config->tc_config[i].path[direction];
+		bw_percent = dcb_config->bw_percentage[direction][p->bwg_id];
+
+		link_percentage = p->bwg_percent;
+		/* Must be careful of integer division for very small nums */
+		link_percentage = (link_percentage * bw_percent) / 100;
+		if (p->bwg_percent > 0 && link_percentage == 0)
+			link_percentage = 1;
+
+		/* Save link_percentage for reference */
+		p->link_percent = (u8)link_percentage;
+
+		/* Calculate credit refill and save it */
+		credit_refill = link_percentage * MINIMUM_CREDIT_REFILL;
+		p->data_credits_refill = (u16)credit_refill;
+
+		/* Calculate maximum credit for the TC */
+		credit_max = (link_percentage * MAX_CREDIT) / 100;
+
+		/*
+		 * Adjustment based on rule checking, if the percentage
+		 * of a TC is too small, the maximum credit may not be
+		 * enough to send out a jumbo frame in data plane arbitration.
+		 */
+		if (credit_max && credit_max < MINIMUM_CREDIT_FOR_JUMBO)
+			credit_max = MINIMUM_CREDIT_FOR_JUMBO;
+
+		if (direction == DCB_TX_CONFIG) {
+			/*
+			 * Adjustment based on rule checking, if the
+			 * percentage of a TC is too small, the maximum
+			 * credit may not be enough to send out a TSO
+			 * packet in descriptor plane arbitration.
+			 */
+			if (credit_max &&
+			    (credit_max < MINIMUM_CREDIT_FOR_TSO))
+				credit_max = MINIMUM_CREDIT_FOR_TSO;
+
+			dcb_config->tc_config[i].desc_credits_max = (u16)credit_max;
+		}
+
+		p->data_credits_max = (u16)credit_max;
+	}
+
+out:
+	return ret_val;
+}
+
+/**
+ * ixgbe_dcb_get_tc_stats - Returns status of each traffic class
+ * @hw: pointer to hardware structure
+ * @stats: pointer to statistics structure
+ * @tc_count: Number of traffic classes to report.
+ *
+ * This function returns the status data for each of the Traffic Classes in use.
+ */
+s32 ixgbe_dcb_get_tc_stats(struct ixgbe_hw *hw, struct ixgbe_hw_stats *stats,
+                           u8 tc_count)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_get_tc_stats_82598(hw, stats, tc_count);
+	return ret;
+}
+
+/**
+ * ixgbe_dcb_get_pfc_stats - Returns CBFC status of each traffic class
+ * @hw: pointer to hardware structure
+ * @stats: pointer to statistics structure
+ * @tc_count: Number of traffic classes to report.
+ *
+ * This function returns the CBFC status data for each of the Traffic Classes.
+ */
+s32 ixgbe_dcb_get_pfc_stats(struct ixgbe_hw *hw, struct ixgbe_hw_stats *stats,
+                            u8 tc_count)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_get_pfc_stats_82598(hw, stats, tc_count);
+	return ret;
+}
+
+/**
+ * ixgbe_dcb_config_rx_arbiter - Config Rx arbiter
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Rx Data Arbiter and credits for each traffic class.
+ */
+s32 ixgbe_dcb_config_rx_arbiter(struct ixgbe_hw *hw,
+                                struct ixgbe_dcb_config *dcb_config)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_config_rx_arbiter_82598(hw, dcb_config);
+	return ret;
+}
+
+/**
+ * ixgbe_dcb_config_tx_desc_arbiter - Config Tx Desc arbiter
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Tx Descriptor Arbiter and credits for each traffic class.
+ */
+s32 ixgbe_dcb_config_tx_desc_arbiter(struct ixgbe_hw *hw,
+                                     struct ixgbe_dcb_config *dcb_config)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_config_tx_desc_arbiter_82598(hw, dcb_config);
+	return ret;
+}
+
+/**
+ * ixgbe_dcb_config_tx_data_arbiter - Config Tx data arbiter
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Tx Data Arbiter and credits for each traffic class.
+ */
+s32 ixgbe_dcb_config_tx_data_arbiter(struct ixgbe_hw *hw,
+                                     struct ixgbe_dcb_config *dcb_config)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_config_tx_data_arbiter_82598(hw, dcb_config);
+	return ret;
+}
+
+/**
+ * ixgbe_dcb_config_pfc - Config priority flow control
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Priority Flow Control for each traffic class.
+ */
+s32 ixgbe_dcb_config_pfc(struct ixgbe_hw *hw,
+                         struct ixgbe_dcb_config *dcb_config)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_config_pfc_82598(hw, dcb_config);
+	return ret;
+}
+
+/**
+ * ixgbe_dcb_config_tc_stats - Config traffic class statistics
+ * @hw: pointer to hardware structure
+ *
+ * Configure queue statistics registers; all queues belonging to the same
+ * traffic class use a single set of queue statistics counters.
+ */
+s32 ixgbe_dcb_config_tc_stats(struct ixgbe_hw *hw)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_config_tc_stats_82598(hw);
+	return ret;
+}
+
+/**
+ * ixgbe_dcb_hw_config - Config and enable DCB
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure dcb settings and enable dcb mode.
+ */
+s32 ixgbe_dcb_hw_config(struct ixgbe_hw *hw,
+                        struct ixgbe_dcb_config *dcb_config)
+{
+	s32 ret = 0;
+	if (hw->mac.type == ixgbe_mac_82598EB)
+		ret = ixgbe_dcb_hw_config_82598(hw, dcb_config);
+	return ret;
+}
diff --git a/drivers/net/ixgbe/ixgbe_dcb.h b/drivers/net/ixgbe/ixgbe_dcb.h
new file mode 100644
index 0000000..e19a5f4
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_dcb.h
@@ -0,0 +1,168 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2007 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  Linux NICS <linux.nics@intel.com>
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#ifndef _DCB_CONFIG_H_
+#define _DCB_CONFIG_H_
+
+#include "ixgbe_type.h"
+
+/* DCB data structures */
+
+#define IXGBE_MAX_PACKET_BUFFERS 8
+#define MAX_USER_PRIORITY        8
+#define MAX_TRAFFIC_CLASS        8
+#define MAX_BW_GROUP             8
+#define BW_PERCENT               100
+
+#define DCB_TX_CONFIG            0
+#define DCB_RX_CONFIG            1
+
+/* DCB error Codes */
+#define DCB_SUCCESS              0
+#define DCB_ERR_CONFIG           -1
+#define DCB_ERR_PARAM            -2
+
+/* Transmit and receive Errors */
+/* Error in bandwidth group allocation */
+#define DCB_ERR_BW_GROUP        -3
+/* Error in traffic class bandwidth allocation */
+#define DCB_ERR_TC_BW           -4
+/* Traffic class has both link strict and group strict enabled */
+#define DCB_ERR_LS_GS           -5
+/* Link strict traffic class has non zero bandwidth */
+#define DCB_ERR_LS_BW_NONZERO   -6
+/* Link strict bandwidth group has non zero bandwidth */
+#define DCB_ERR_LS_BWG_NONZERO  -7
+/* Traffic class has zero bandwidth */
+#define DCB_ERR_TC_BW_ZERO      -8
+
+#define DCB_NOT_IMPLEMENTED      0x7FFFFFFF
+
+struct dcb_pfc_tc_debug {
+	u8  tc;
+	u8  pause_status;
+	u64 pause_quanta;
+};
+
+enum strict_prio_type {
+	prio_none = 0,
+	prio_group,
+	prio_link
+};
+
+/* Traffic class bandwidth allocation per direction */
+struct tc_bw_alloc {
+	u8 bwg_id;                /* Bandwidth Group (BWG) ID */
+	u8 bwg_percent;           /* % of BWG's bandwidth */
+	u8 link_percent;          /* % of link bandwidth */
+	u8 up_to_tc_bitmap;       /* User Priority to Traffic Class mapping */
+	u16 data_credits_refill;  /* Credit refill amount in 64B granularity */
+	u16 data_credits_max;     /* Max credits for a configured packet buffer
+	                           * in 64B granularity.*/
+	enum strict_prio_type prio_type; /* Link or Group Strict Priority */
+};
+
+enum dcb_pfc_type {
+	pfc_disabled = 0,
+	pfc_enabled_full,
+	pfc_enabled_tx,
+	pfc_enabled_rx
+};
+
+/* Traffic class configuration */
+struct tc_configuration {
+	struct tc_bw_alloc path[2]; /* One each for Tx/Rx */
+	enum dcb_pfc_type  dcb_pfc; /* Class based flow control setting */
+
+	u16 desc_credits_max; /* For Tx Descriptor arbitration */
+	u8 tc; /* Traffic class (TC) */
+};
+
+enum dcb_rx_pba_cfg {
+	pba_equal,     /* PBA[0-7] each use 64KB FIFO */
+	pba_80_48      /* PBA[0-3] each use 80KB, PBA[4-7] each use 48KB */
+};
+
+struct ixgbe_dcb_config {
+	struct tc_configuration tc_config[MAX_TRAFFIC_CLASS];
+	u8     bw_percentage[2][MAX_BW_GROUP]; /* One each for Tx/Rx */
+
+	bool  round_robin_enable;
+
+	enum dcb_rx_pba_cfg rx_pba_cfg;
+
+	u32  dcb_cfg_version; /* Not used...OS-specific? */
+	u32  link_speed; /* For bandwidth allocation validation purpose */
+};
+
+/* DCB driver APIs */
+
+/* DCB rule checking function.*/
+s32 ixgbe_dcb_check_config(struct ixgbe_dcb_config *config);
+
+/* DCB credits calculation */
+s32 ixgbe_dcb_calculate_tc_credits(struct ixgbe_dcb_config *config,
+                                   u8 direction);
+
+/* DCB PFC functions */
+s32 ixgbe_dcb_config_pfc(struct ixgbe_hw *hw,
+                         struct ixgbe_dcb_config *dcb_config);
+s32 ixgbe_dcb_get_pfc_stats(struct ixgbe_hw *hw, struct ixgbe_hw_stats *stats,
+                            u8 tc_count);
+
+/* DCB traffic class stats */
+s32 ixgbe_dcb_config_tc_stats(struct ixgbe_hw *);
+s32 ixgbe_dcb_get_tc_stats(struct ixgbe_hw *hw, struct ixgbe_hw_stats *stats,
+                           u8 tc_count);
+
+/* DCB config arbiters */
+s32 ixgbe_dcb_config_tx_desc_arbiter(struct ixgbe_hw *hw,
+                                     struct ixgbe_dcb_config *dcb_config);
+s32 ixgbe_dcb_config_tx_data_arbiter(struct ixgbe_hw *hw,
+                                     struct ixgbe_dcb_config *dcb_config);
+s32 ixgbe_dcb_config_rx_arbiter(struct ixgbe_hw *hw,
+                                struct ixgbe_dcb_config *dcb_config);
+
+/* DCB hw initialization */
+s32 ixgbe_dcb_hw_config(struct ixgbe_hw *hw, struct ixgbe_dcb_config *config);
+
+/* DCB definitions for credit calculation */
+#define MAX_CREDIT_REFILL       511  /* 0x1FF * 64B = 32704B */
+#define MINIMUM_CREDIT_REFILL   5    /* 5*64B = 320B */
+#define MINIMUM_CREDIT_FOR_JUMBO 145  /* 145 = UpperBound((9*1024+54)/64B) for 9KB jumbo frame */
+#define DCB_MAX_TSO_SIZE        (32 * 1024) /* Max TSO packet size supported in DCB mode */
+#define MINIMUM_CREDIT_FOR_TSO  (DCB_MAX_TSO_SIZE/64 + 1) /* 513 for 32KB TSO packet */
+#define MAX_CREDIT              4095 /* Maximum credit supported: 256KB / 64B - 1 */
+
+#endif /* _DCB_CONFIG_H_ */
diff --git a/drivers/net/ixgbe/ixgbe_dcb_82598.c b/drivers/net/ixgbe/ixgbe_dcb_82598.c
new file mode 100644
index 0000000..39b63ee
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_dcb_82598.c
@@ -0,0 +1,400 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2007 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  Linux NICS <linux.nics@intel.com>
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+
+#include "ixgbe_type.h"
+#include "ixgbe_dcb.h"
+#include "ixgbe_dcb_82598.h"
+
+/**
+ * ixgbe_dcb_get_tc_stats_82598 - Return status data for each traffic class
+ * @hw: pointer to hardware structure
+ * @stats: pointer to statistics structure
+ * @tc_count: Number of traffic classes to report.
+ *
+ * This function returns the status data for each of the Traffic Classes in use.
+ */
+s32 ixgbe_dcb_get_tc_stats_82598(struct ixgbe_hw *hw,
+                                 struct ixgbe_hw_stats *stats,
+                                 u8 tc_count)
+{
+	int tc;
+
+	if (tc_count > MAX_TRAFFIC_CLASS)
+		return DCB_ERR_PARAM;
+
+	/* Statistics pertaining to each traffic class */
+	for (tc = 0; tc < tc_count; tc++) {
+		/* Transmitted Packets */
+		stats->qptc[tc] += IXGBE_READ_REG(hw, IXGBE_QPTC(tc));
+		/* Transmitted Bytes */
+		stats->qbtc[tc] += IXGBE_READ_REG(hw, IXGBE_QBTC(tc));
+		/* Received Packets */
+		stats->qprc[tc] += IXGBE_READ_REG(hw, IXGBE_QPRC(tc));
+		/* Received Bytes */
+		stats->qbrc[tc] += IXGBE_READ_REG(hw, IXGBE_QBRC(tc));
+	}
+
+	return 0;
+}
+
+/**
+ * ixgbe_dcb_get_pfc_stats_82598 - Returns CBFC status data
+ * @hw: pointer to hardware structure
+ * @stats: pointer to statistics structure
+ * @tc_count: Number of traffic classes to report.
+ *
+ * This function returns the CBFC status data for each of the Traffic Classes.
+ */
+s32 ixgbe_dcb_get_pfc_stats_82598(struct ixgbe_hw *hw,
+                                  struct ixgbe_hw_stats *stats,
+                                  u8 tc_count)
+{
+	int tc;
+
+	if (tc_count > MAX_TRAFFIC_CLASS)
+		return DCB_ERR_PARAM;
+
+	for (tc = 0; tc < tc_count; tc++) {
+		/* Priority XOFF Transmitted */
+		stats->pxofftxc[tc] += IXGBE_READ_REG(hw, IXGBE_PXOFFTXC(tc));
+		/* Priority XOFF Received */
+		stats->pxoffrxc[tc] += IXGBE_READ_REG(hw, IXGBE_PXOFFRXC(tc));
+	}
+
+	return 0;
+}
+
+/**
+ * ixgbe_dcb_config_packet_buffers_82598 - Configure packet buffers
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure packet buffers for DCB mode.
+ */
+s32 ixgbe_dcb_config_packet_buffers_82598(struct ixgbe_hw *hw,
+                                          struct ixgbe_dcb_config *dcb_config)
+{
+	s32 ret_val = 0;
+	u32 value = IXGBE_RXPBSIZE_64KB;
+	u8  i = 0;
+
+	/* Setup Rx packet buffer sizes */
+	switch (dcb_config->rx_pba_cfg) {
+	case pba_80_48:
+		/* Setup the first four at 80KB */
+		value = IXGBE_RXPBSIZE_80KB;
+		for (; i < 4; i++)
+			IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(i), value);
+		/* Setup the last four at 48KB...don't re-init i */
+		value = IXGBE_RXPBSIZE_48KB;
+		/* Fall Through */
+	case pba_equal:
+	default:
+		for (; i < IXGBE_MAX_PACKET_BUFFERS; i++)
+			IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(i), value);
+
+		/* Setup Tx packet buffer sizes */
+		for (i = 0; i < IXGBE_MAX_PACKET_BUFFERS; i++) {
+			IXGBE_WRITE_REG(hw, IXGBE_TXPBSIZE(i),
+			                IXGBE_TXPBSIZE_40KB);
+		}
+		break;
+	}
+
+	return ret_val;
+}
+
+/**
+ * ixgbe_dcb_config_rx_arbiter_82598 - Config Rx data arbiter
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Rx Data Arbiter and credits for each traffic class.
+ */
+s32 ixgbe_dcb_config_rx_arbiter_82598(struct ixgbe_hw *hw,
+                                      struct ixgbe_dcb_config *dcb_config)
+{
+	struct tc_bw_alloc    *p;
+	u32    reg           = 0;
+	u32    credit_refill = 0;
+	u32    credit_max    = 0;
+	u8     i             = 0;
+
+	reg = IXGBE_READ_REG(hw, IXGBE_RUPPBMR) | IXGBE_RUPPBMR_MQA;
+	IXGBE_WRITE_REG(hw, IXGBE_RUPPBMR, reg);
+
+	reg = IXGBE_READ_REG(hw, IXGBE_RMCS);
+	/* Enable Arbiter */
+	reg &= ~IXGBE_RMCS_ARBDIS;
+	/* Enable Receive Recycle within the BWG */
+	reg |= IXGBE_RMCS_RRM;
+	/* Enable Deficit Fixed Priority arbitration*/
+	reg |= IXGBE_RMCS_DFP;
+
+	IXGBE_WRITE_REG(hw, IXGBE_RMCS, reg);
+
+	/* Configure traffic class credits and priority */
+	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
+		p = &dcb_config->tc_config[i].path[DCB_RX_CONFIG];
+		credit_refill = p->data_credits_refill;
+		credit_max    = p->data_credits_max;
+
+		reg = credit_refill | (credit_max << IXGBE_RT2CR_MCL_SHIFT);
+
+		if (p->prio_type == prio_link)
+			reg |= IXGBE_RT2CR_LSP;
+
+		IXGBE_WRITE_REG(hw, IXGBE_RT2CR(i), reg);
+	}
+
+	reg = IXGBE_READ_REG(hw, IXGBE_RDRXCTL);
+	reg |= IXGBE_RDRXCTL_RDMTS_1_2;
+	reg |= IXGBE_RDRXCTL_MPBEN;
+	reg |= IXGBE_RDRXCTL_MCEN;
+	IXGBE_WRITE_REG(hw, IXGBE_RDRXCTL, reg);
+
+	reg = IXGBE_READ_REG(hw, IXGBE_RXCTRL);
+	/* Make sure there are enough descriptors before arbitration */
+	reg &= ~IXGBE_RXCTRL_DMBYPS;
+	IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, reg);
+
+	return 0;
+}
+
+/**
+ * ixgbe_dcb_config_tx_desc_arbiter_82598 - Config Tx Desc. arbiter
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Tx Descriptor Arbiter and credits for each traffic class.
+ */
+s32 ixgbe_dcb_config_tx_desc_arbiter_82598(struct ixgbe_hw *hw,
+                                           struct ixgbe_dcb_config *dcb_config)
+{
+	struct tc_bw_alloc *p;
+	u32    reg, max_credits;
+	u8     i;
+
+	reg = IXGBE_READ_REG(hw, IXGBE_DPMCS);
+
+	/* Enable arbiter */
+	reg &= ~IXGBE_DPMCS_ARBDIS;
+	if (!(dcb_config->round_robin_enable)) {
+		/* Enable DFP and Recycle mode */
+		reg |= (IXGBE_DPMCS_TDPAC | IXGBE_DPMCS_TRM);
+	}
+	reg |= IXGBE_DPMCS_TSOEF;
+	/* Configure Max TSO packet size 34KB including payload and headers */
+	reg |= (0x4 << IXGBE_DPMCS_MTSOS_SHIFT);
+
+	IXGBE_WRITE_REG(hw, IXGBE_DPMCS, reg);
+
+	/* Configure traffic class credits and priority */
+	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
+		p = &dcb_config->tc_config[i].path[DCB_TX_CONFIG];
+		max_credits = dcb_config->tc_config[i].desc_credits_max;
+		reg = max_credits << IXGBE_TDTQ2TCCR_MCL_SHIFT;
+		reg |= p->data_credits_refill;
+		reg |= (u32)(p->bwg_id) << IXGBE_TDTQ2TCCR_BWG_SHIFT;
+
+		if (p->prio_type == prio_group)
+			reg |= IXGBE_TDTQ2TCCR_GSP;
+
+		if (p->prio_type == prio_link)
+			reg |= IXGBE_TDTQ2TCCR_LSP;
+
+		IXGBE_WRITE_REG(hw, IXGBE_TDTQ2TCCR(i), reg);
+	}
+
+	return 0;
+}
+
+/**
+ * ixgbe_dcb_config_tx_data_arbiter_82598 - Config Tx data arbiter
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Tx Data Arbiter and credits for each traffic class.
+ */
+s32 ixgbe_dcb_config_tx_data_arbiter_82598(struct ixgbe_hw *hw,
+                                           struct ixgbe_dcb_config *dcb_config)
+{
+	struct tc_bw_alloc *p;
+	u32 reg;
+	u8 i;
+
+	reg = IXGBE_READ_REG(hw, IXGBE_PDPMCS);
+	/* Enable Data Plane Arbiter */
+	reg &= ~IXGBE_PDPMCS_ARBDIS;
+	/* Enable DFP and Transmit Recycle Mode */
+	reg |= (IXGBE_PDPMCS_TPPAC | IXGBE_PDPMCS_TRM);
+
+	IXGBE_WRITE_REG(hw, IXGBE_PDPMCS, reg);
+
+	/* Configure traffic class credits and priority */
+	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
+		p = &dcb_config->tc_config[i].path[DCB_TX_CONFIG];
+		reg = p->data_credits_refill;
+		reg |= (u32)(p->data_credits_max) << IXGBE_TDPT2TCCR_MCL_SHIFT;
+		reg |= (u32)(p->bwg_id) << IXGBE_TDPT2TCCR_BWG_SHIFT;
+
+		if (p->prio_type == prio_group)
+			reg |= IXGBE_TDPT2TCCR_GSP;
+
+		if (p->prio_type == prio_link)
+			reg |= IXGBE_TDPT2TCCR_LSP;
+
+		IXGBE_WRITE_REG(hw, IXGBE_TDPT2TCCR(i), reg);
+	}
+
+	/* Enable Tx packet buffer division */
+	reg = IXGBE_READ_REG(hw, IXGBE_DTXCTL);
+	reg |= IXGBE_DTXCTL_ENDBUBD;
+	IXGBE_WRITE_REG(hw, IXGBE_DTXCTL, reg);
+
+	return 0;
+}
+
+/**
+ * ixgbe_dcb_config_pfc_82598 - Config priority flow control
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure Priority Flow Control for each traffic class.
+ */
+s32 ixgbe_dcb_config_pfc_82598(struct ixgbe_hw *hw,
+                               struct ixgbe_dcb_config *dcb_config)
+{
+	u32 reg, rx_pba_size;
+	u8  i;
+
+	/* Enable Transmit Priority Flow Control */
+	reg = IXGBE_READ_REG(hw, IXGBE_RMCS);
+	reg &= ~IXGBE_RMCS_TFCE_802_3X;
+	/* correct the reporting of our flow control status */
+	hw->fc.type = ixgbe_fc_none;
+	reg |= IXGBE_RMCS_TFCE_PRIORITY;
+	IXGBE_WRITE_REG(hw, IXGBE_RMCS, reg);
+
+	/* Enable Receive Priority Flow Control */
+	reg = IXGBE_READ_REG(hw, IXGBE_FCTRL);
+	reg &= ~IXGBE_FCTRL_RFCE;
+	reg |= IXGBE_FCTRL_RPFCE;
+	IXGBE_WRITE_REG(hw, IXGBE_FCTRL, reg);
+
+	/*
+	 * Configure flow control thresholds and enable priority flow control
+	 * for each traffic class.
+	 */
+	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
+		if (dcb_config->rx_pba_cfg == pba_equal) {
+			rx_pba_size = IXGBE_RXPBSIZE_64KB;
+		} else {
+			rx_pba_size = (i < 4) ? IXGBE_RXPBSIZE_80KB
+			                      : IXGBE_RXPBSIZE_48KB;
+		}
+
+		reg = ((rx_pba_size >> 5) & 0xFFF0);
+		if (dcb_config->tc_config[i].dcb_pfc == pfc_enabled_tx ||
+		    dcb_config->tc_config[i].dcb_pfc == pfc_enabled_full)
+			reg |= IXGBE_FCRTL_XONE;
+
+		IXGBE_WRITE_REG(hw, IXGBE_FCRTL(i), reg);
+
+		reg = ((rx_pba_size >> 2) & 0xFFF0);
+		if (dcb_config->tc_config[i].dcb_pfc == pfc_enabled_tx ||
+		    dcb_config->tc_config[i].dcb_pfc == pfc_enabled_full)
+			reg |= IXGBE_FCRTH_FCEN;
+
+		IXGBE_WRITE_REG(hw, IXGBE_FCRTH(i), reg);
+	}
+
+	/* Configure pause time */
+	for (i = 0; i < (MAX_TRAFFIC_CLASS >> 1); i++)
+		IXGBE_WRITE_REG(hw, IXGBE_FCTTV(i), 0x68006800);
+
+	/* Configure flow control refresh threshold value */
+	IXGBE_WRITE_REG(hw, IXGBE_FCRTV, 0x3400);
+
+	return 0;
+}
+
+/**
+ * ixgbe_dcb_config_tc_stats_82598 - Configure traffic class statistics
+ * @hw: pointer to hardware structure
+ *
+ * Configure queue statistics registers; all queues belonging to the same
+ * traffic class use a single set of queue statistics counters.
+ */
+s32 ixgbe_dcb_config_tc_stats_82598(struct ixgbe_hw *hw)
+{
+	u32 reg = 0;
+	u8  i   = 0;
+	u8  j   = 0;
+
+	/* Receive Queues stats setting - 8 queues per statistics reg */
+	for (i = 0, j = 0; i < 15 && j < 8; i = i + 2, j++) {
+		reg = IXGBE_READ_REG(hw, IXGBE_RQSMR(i));
+		reg |= ((0x1010101) * j);
+		IXGBE_WRITE_REG(hw, IXGBE_RQSMR(i), reg);
+		reg = IXGBE_READ_REG(hw, IXGBE_RQSMR(i + 1));
+		reg |= ((0x1010101) * j);
+		IXGBE_WRITE_REG(hw, IXGBE_RQSMR(i + 1), reg);
+	}
+	/* Transmit Queues stats setting - 4 queues per statistics reg */
+	for (i = 0; i < 8; i++) {
+		reg = IXGBE_READ_REG(hw, IXGBE_TQSMR(i));
+		reg |= ((0x1010101) * i);
+		IXGBE_WRITE_REG(hw, IXGBE_TQSMR(i), reg);
+	}
+
+	return 0;
+}
+
+/**
+ * ixgbe_dcb_hw_config_82598 - Config and enable DCB
+ * @hw: pointer to hardware structure
+ * @dcb_config: pointer to ixgbe_dcb_config structure
+ *
+ * Configure dcb settings and enable dcb mode.
+ */
+s32 ixgbe_dcb_hw_config_82598(struct ixgbe_hw *hw,
+                              struct ixgbe_dcb_config *dcb_config)
+{
+	ixgbe_dcb_config_packet_buffers_82598(hw, dcb_config);
+	ixgbe_dcb_config_rx_arbiter_82598(hw, dcb_config);
+	ixgbe_dcb_config_tx_desc_arbiter_82598(hw, dcb_config);
+	ixgbe_dcb_config_tx_data_arbiter_82598(hw, dcb_config);
+	ixgbe_dcb_config_pfc_82598(hw, dcb_config);
+	ixgbe_dcb_config_tc_stats_82598(hw);
+
+	return 0;
+}
diff --git a/drivers/net/ixgbe/ixgbe_dcb_82598.h b/drivers/net/ixgbe/ixgbe_dcb_82598.h
new file mode 100644
index 0000000..d550944
--- /dev/null
+++ b/drivers/net/ixgbe/ixgbe_dcb_82598.h
@@ -0,0 +1,98 @@
+/*******************************************************************************
+
+  Intel 10 Gigabit PCI Express Linux driver
+  Copyright(c) 1999 - 2007 Intel Corporation.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms and conditions of the GNU General Public License,
+  version 2, as published by the Free Software Foundation.
+
+  This program is distributed in the hope it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+
+  The full GNU General Public License is included in this distribution in
+  the file called "COPYING".
+
+  Contact Information:
+  Linux NICS <linux.nics@intel.com>
+  e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+
+*******************************************************************************/
+
+#ifndef _DCB_82598_CONFIG_H_
+#define _DCB_82598_CONFIG_H_
+
+/* DCB register definitions */
+
+#define IXGBE_DPMCS_MTSOS_SHIFT 16
+#define IXGBE_DPMCS_TDPAC       0x00000001 /* 0 Round Robin, 1 DFP - Deficit Fixed Priority */
+#define IXGBE_DPMCS_TRM         0x00000010 /* Transmit Recycle Mode */
+#define IXGBE_DPMCS_ARBDIS      0x00000040 /* DCB arbiter disable */
+#define IXGBE_DPMCS_TSOEF       0x00080000 /* TSO Expand Factor: 0=x4, 1=x2 */
+
+#define IXGBE_RUPPBMR_MQA       0x80000000 /* Enable UP to queue mapping */
+
+#define IXGBE_RT2CR_MCL_SHIFT   12 /* Offset to Max Credit Limit setting */
+#define IXGBE_RT2CR_LSP         0x80000000 /* LSP enable bit */
+
+#define IXGBE_RDRXCTL_MPBEN     0x00000010 /* DMA config for multiple packet buffers enable */
+#define IXGBE_RDRXCTL_MCEN      0x00000040 /* DMA config for multiple cores (RSS) enable */
+
+#define IXGBE_TDTQ2TCCR_MCL_SHIFT   12
+#define IXGBE_TDTQ2TCCR_BWG_SHIFT   9
+#define IXGBE_TDTQ2TCCR_GSP     0x40000000
+#define IXGBE_TDTQ2TCCR_LSP     0x80000000
+
+#define IXGBE_TDPT2TCCR_MCL_SHIFT   12
+#define IXGBE_TDPT2TCCR_BWG_SHIFT   9
+#define IXGBE_TDPT2TCCR_GSP     0x40000000
+#define IXGBE_TDPT2TCCR_LSP     0x80000000
+
+#define IXGBE_PDPMCS_TPPAC      0x00000020 /* 0 Round Robin, 1 DFP - Deficit Fixed Priority */
+#define IXGBE_PDPMCS_ARBDIS     0x00000040 /* Arbiter disable */
+#define IXGBE_PDPMCS_TRM        0x00000100 /* Transmit Recycle Mode enable */
+
+#define IXGBE_DTXCTL_ENDBUBD    0x00000004 /* Enable DBU buffer division */
+
+#define IXGBE_TXPBSIZE_40KB     0x0000A000 /* 40KB Packet Buffer */
+#define IXGBE_RXPBSIZE_48KB     0x0000C000 /* 48KB Packet Buffer */
+#define IXGBE_RXPBSIZE_64KB     0x00010000 /* 64KB Packet Buffer */
+#define IXGBE_RXPBSIZE_80KB     0x00014000 /* 80KB Packet Buffer */
+
+#define IXGBE_RDRXCTL_RDMTS_1_2 0x00000000
+
+/* DCB hardware-specific driver APIs */
+
+/* DCB PFC functions */
+s32 ixgbe_dcb_config_pfc_82598(struct ixgbe_hw *hw,
+                               struct ixgbe_dcb_config *dcb_config);
+s32 ixgbe_dcb_get_pfc_stats_82598(struct ixgbe_hw *hw,
+                                  struct ixgbe_hw_stats *stats,
+                                  u8 tc_count);
+
+/* DCB traffic class stats */
+s32 ixgbe_dcb_config_tc_stats_82598(struct ixgbe_hw *hw);
+s32 ixgbe_dcb_get_tc_stats_82598(struct ixgbe_hw *hw,
+                                 struct ixgbe_hw_stats *stats,
+                                 u8 tc_count);
+
+/* DCB config arbiters */
+s32 ixgbe_dcb_config_tx_desc_arbiter_82598(struct ixgbe_hw *hw,
+                                           struct ixgbe_dcb_config *dcb_config);
+s32 ixgbe_dcb_config_tx_data_arbiter_82598(struct ixgbe_hw *hw,
+                                           struct ixgbe_dcb_config *dcb_config);
+s32 ixgbe_dcb_config_rx_arbiter_82598(struct ixgbe_hw *hw,
+                                      struct ixgbe_dcb_config *dcb_config);
+
+/* DCB hw initialization */
+s32 ixgbe_dcb_hw_config_82598(struct ixgbe_hw *hw,
+                              struct ixgbe_dcb_config *dcb_config);
+
+#endif /* _DCB_82598_CONFIG_H_ */


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] ixgbe: Enable Data Center Bridging (DCB) support
  2008-05-02  0:42 [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe PJ Waskiewicz
  2008-05-02  0:43 ` [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes PJ Waskiewicz
  2008-05-02  0:43 ` [PATCH 2/3] ixgbe: Add DCB hardware initialization routines PJ Waskiewicz
@ 2008-05-02  0:43 ` PJ Waskiewicz
  2008-05-02 11:19 ` [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe Andi Kleen
  3 siblings, 0 replies; 13+ messages in thread
From: PJ Waskiewicz @ 2008-05-02  0:43 UTC (permalink / raw)
  To: jgarzik; +Cc: netdev

This patch enables DCB support for 82598.  DCB is a technology implementing
the IEEE 802.1Qaz and 802.1Qbb standards for priority grouping and priority
flow control.  The technology uses 802.1p VLAN priority tags to identify
traffic on the network and to establish prioritization of that traffic
throughout the environment.  802.1Qbb priority flow control provides
MAC-level flow control on each of these priorities, creating 8 virtual
links in the network, so certain types of traffic can be paused without
affecting other traffic types.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---

 drivers/net/ixgbe/Makefile          |    3 
 drivers/net/ixgbe/ixgbe.h           |   28 +++-
 drivers/net/ixgbe/ixgbe_dcb_82598.c |    1 
 drivers/net/ixgbe/ixgbe_ethtool.c   |   36 +++++
 drivers/net/ixgbe/ixgbe_main.c      |  267 ++++++++++++++++++++++++++++-------
 5 files changed, 275 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ixgbe/Makefile b/drivers/net/ixgbe/Makefile
index ccd83d9..2a45fa0 100644
--- a/drivers/net/ixgbe/Makefile
+++ b/drivers/net/ixgbe/Makefile
@@ -33,4 +33,5 @@
 obj-$(CONFIG_IXGBE) += ixgbe.o
 
 ixgbe-objs := ixgbe_main.o ixgbe_common.o ixgbe_ethtool.o \
-              ixgbe_82598.o ixgbe_phy.o
+              ixgbe_82598.o ixgbe_phy.o ixgbe_dcb.o ixgbe_dcb_82598.o \
+              ixgbe_dcb_nl.o
diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index d981134..5098b9d 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -35,6 +35,7 @@
 
 #include "ixgbe_type.h"
 #include "ixgbe_common.h"
+#include "ixgbe_dcb.h"
 
 #ifdef CONFIG_DCA
 #include <linux/dca.h>
@@ -98,6 +99,7 @@
 #define IXGBE_TX_FLAGS_TSO		(u32)(1 << 2)
 #define IXGBE_TX_FLAGS_IPV4		(u32)(1 << 3)
 #define IXGBE_TX_FLAGS_VLAN_MASK	0xffff0000
+#define IXGBE_TX_FLAGS_VLAN_PRIO_MASK	0x0000e000
 #define IXGBE_TX_FLAGS_VLAN_SHIFT	16
 
 /* wrapper around a pointer to a socket buffer,
@@ -144,7 +146,7 @@ struct ixgbe_ring {
 
 	u16 reg_idx; /* holds the special value that gets the hardware register
 		      * offset associated with this ring, which is different
-		      * for DCE and RSS modes */
+		      * for DCB and RSS modes */
 
 #ifdef CONFIG_DCA
 	/* cpu for tx queue */
@@ -162,8 +164,10 @@ struct ixgbe_ring {
 	u16 work_limit;                /* max work per interrupt */
 };
 
+#define RING_F_DCB  0
 #define RING_F_VMDQ 1
 #define RING_F_RSS  2
+#define IXGBE_MAX_DCB_INDICES   8
 #define IXGBE_MAX_RSS_INDICES  16
 #define IXGBE_MAX_VMDQ_INDICES 16
 struct ixgbe_ring_feature {
@@ -174,6 +178,10 @@ struct ixgbe_ring_feature {
 #define MAX_RX_QUEUES 64
 #define MAX_TX_QUEUES 32
 
+#define MAX_RX_PACKET_BUFFERS ((adapter->flags & IXGBE_FLAG_DCB_ENABLED) \
+                               ? 8 : 1)
+#define MAX_TX_PACKET_BUFFERS MAX_RX_PACKET_BUFFERS
+
 /* MAX_MSIX_Q_VECTORS of these are allocated,
  * but we only use one per queue-specific vector.
  */
@@ -226,6 +234,9 @@ struct ixgbe_adapter {
 	struct work_struct reset_task;
 	struct ixgbe_q_vector q_vector[MAX_MSIX_Q_VECTORS];
 	char name[MAX_MSIX_COUNT][IFNAMSIZ + 5];
+	struct ixgbe_dcb_config dcb_cfg;
+	struct ixgbe_dcb_config temp_dcb_cfg;
+	u8 dcb_set_bitmap;
 
 	/* Interrupt Throttle Rate */
 	u32 itr_setting;
@@ -234,6 +245,7 @@ struct ixgbe_adapter {
 
 	/* TX */
 	struct ixgbe_ring *tx_ring;	/* One per active queue */
+	int num_tx_queues;
 	u64 restart_queue;
 	u64 lsc_int;
 	u64 hw_tso_ctxt;
@@ -243,12 +255,11 @@ struct ixgbe_adapter {
 
 	/* RX */
 	struct ixgbe_ring *rx_ring;	/* One per active queue */
+	int num_rx_queues;
 	u64 hw_csum_tx_good;
 	u64 hw_csum_rx_error;
 	u64 hw_csum_rx_good;
 	u64 non_eop_descs;
-	int num_tx_queues;
-	int num_rx_queues;
 	int num_msix_vectors;
 	struct ixgbe_ring_feature ring_feature[3];
 	struct msix_entry *msix_entries;
@@ -270,6 +281,7 @@ struct ixgbe_adapter {
 #define IXGBE_FLAG_RSS_ENABLED                  (u32)(1 << 6)
 #define IXGBE_FLAG_VMDQ_ENABLED                 (u32)(1 << 7)
 #define IXGBE_FLAG_DCA_ENABLED                  (u32)(1 << 8)
+#define IXGBE_FLAG_DCB_ENABLED                  (u32)(1 << 9)
 
 	/* OS defined structs */
 	struct net_device *netdev;
@@ -314,5 +326,15 @@ extern int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 				    struct ixgbe_ring *rxdr);
 extern int ixgbe_setup_tx_resources(struct ixgbe_adapter *adapter,
 				    struct ixgbe_ring *txdr);
+/* needed by ixgbe_dcb_nl.c */
+extern void ixgbe_configure_dcb(struct ixgbe_adapter *adapter);
+extern int ixgbe_close(struct net_device *netdev);
+extern void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter);
+extern int ixgbe_open(struct net_device *netdev);
+extern int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter);
+extern bool ixgbe_is_ixgbe(struct pci_dev *pcidev);
+
+extern int ixgbe_dcb_netlink_register(void);
+extern int ixgbe_dcb_netlink_unregister(void);
 
 #endif /* _IXGBE_H_ */
diff --git a/drivers/net/ixgbe/ixgbe_dcb_82598.c b/drivers/net/ixgbe/ixgbe_dcb_82598.c
index 39b63ee..3c7f187 100644
--- a/drivers/net/ixgbe/ixgbe_dcb_82598.c
+++ b/drivers/net/ixgbe/ixgbe_dcb_82598.c
@@ -27,6 +27,7 @@
 *******************************************************************************/
 
 
+#include "ixgbe.h"
 #include "ixgbe_type.h"
 #include "ixgbe_dcb.h"
 #include "ixgbe_dcb_82598.h"
diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index 4e46377..944f669 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -97,7 +97,17 @@ static struct ixgbe_stats ixgbe_gstrings_stats[] = {
 		 ((struct ixgbe_adapter *)netdev->priv)->num_rx_queues) * \
 		 (sizeof(struct ixgbe_queue_stats) / sizeof(u64)))
 #define IXGBE_GLOBAL_STATS_LEN	ARRAY_SIZE(ixgbe_gstrings_stats)
-#define IXGBE_STATS_LEN (IXGBE_GLOBAL_STATS_LEN + IXGBE_QUEUE_STATS_LEN)
+#define IXGBE_PB_STATS_LEN ( \
+		(((struct ixgbe_adapter *)netdev->priv)->flags & \
+		 IXGBE_FLAG_DCB_ENABLED) ? \
+		 (sizeof(((struct ixgbe_adapter *)0)->stats.pxonrxc) + \
+		  sizeof(((struct ixgbe_adapter *)0)->stats.pxontxc) + \
+		  sizeof(((struct ixgbe_adapter *)0)->stats.pxoffrxc) + \
+		  sizeof(((struct ixgbe_adapter *)0)->stats.pxofftxc)) \
+		 / sizeof(u64) : 0)
+#define IXGBE_STATS_LEN (IXGBE_GLOBAL_STATS_LEN + \
+		IXGBE_PB_STATS_LEN + \
+		IXGBE_QUEUE_STATS_LEN)
 
 static int ixgbe_get_settings(struct net_device *netdev,
 			      struct ethtool_cmd *ecmd)
@@ -806,6 +816,16 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i + k] = queue_stat[k];
 		i += k;
 	}
+	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
+		for (j = 0; j < MAX_TX_PACKET_BUFFERS; j++) {
+			data[i++] = adapter->stats.pxontxc[j];
+			data[i++] = adapter->stats.pxofftxc[j];
+		}
+		for (j = 0; j < MAX_RX_PACKET_BUFFERS; j++) {
+			data[i++] = adapter->stats.pxonrxc[j];
+			data[i++] = adapter->stats.pxoffrxc[j];
+		}
+	}
 }
 
 static void ixgbe_get_strings(struct net_device *netdev, u32 stringset,
@@ -834,6 +854,20 @@ static void ixgbe_get_strings(struct net_device *netdev, u32 stringset,
 			sprintf(p, "rx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
 		}
+		if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
+			for (i = 0; i < MAX_TX_PACKET_BUFFERS; i++) {
+				sprintf(p, "tx_pb_%u_pxon", i);
+				p += ETH_GSTRING_LEN;
+				sprintf(p, "tx_pb_%u_pxoff", i);
+				p += ETH_GSTRING_LEN;
+			}
+			for (i = 0; i < MAX_RX_PACKET_BUFFERS; i++) {
+				sprintf(p, "rx_pb_%u_pxon", i);
+				p += ETH_GSTRING_LEN;
+				sprintf(p, "rx_pb_%u_pxoff", i);
+				p += ETH_GSTRING_LEN;
+			}
+		}
 /*		BUG_ON(p - data != IXGBE_STATS_LEN * ETH_GSTRING_LEN); */
 		break;
 	}
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 7b85922..82312d7 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1,7 +1,7 @@
 /*******************************************************************************
 
   Intel 10 Gigabit PCI Express Linux driver
-  Copyright(c) 1999 - 2007 Intel Corporation.
+  Copyright(c) 1999 - 2008 Intel Corporation.
 
   This program is free software; you can redistribute it and/or modify it
   under the terms and conditions of the GNU General Public License,
@@ -48,7 +48,7 @@ char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
 	"Intel(R) 10 Gigabit PCI Express Network Driver";
 
-#define DRV_VERSION "1.3.18-k2"
+#define DRV_VERSION "1.3.26-k2"
 const char ixgbe_driver_version[] = DRV_VERSION;
 static const char ixgbe_copyright[] =
 	 "Copyright (c) 1999-2007 Intel Corporation.";
@@ -397,13 +397,13 @@ static void ixgbe_receive_skb(struct ixgbe_adapter *adapter,
 			      u16 tag)
 {
 	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
-		if (adapter->vlgrp && is_vlan)
+		if (adapter->vlgrp && is_vlan && (tag != 0))
 			vlan_hwaccel_receive_skb(skb, adapter->vlgrp, tag);
 		else
 			netif_receive_skb(skb);
 	} else {
 
-		if (adapter->vlgrp && is_vlan)
+		if (adapter->vlgrp && is_vlan && (tag != 0))
 			vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
 		else
 			netif_rx(skb);
@@ -545,14 +545,13 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_adapter *adapter,
 	struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer;
 	struct sk_buff *skb;
 	unsigned int i;
-	u32 upper_len, len, staterr;
+	u32 len, staterr;
 	u16 hdr_info, vlan_tag;
 	bool is_vlan, cleaned = false;
 	int cleaned_count = 0;
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 
 	i = rx_ring->next_to_clean;
-	upper_len = 0;
 	rx_desc = IXGBE_RX_DESC_ADV(*rx_ring, i);
 	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	rx_buffer_info = &rx_ring->rx_buffer_info[i];
@@ -560,6 +559,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_adapter *adapter,
 	vlan_tag = le16_to_cpu(rx_desc->wb.upper.vlan);
 
 	while (staterr & IXGBE_RXD_STAT_DD) {
+		u32 upper_len = 0;
 		if (*work_done >= work_to_do)
 			break;
 		(*work_done)++;
@@ -1343,7 +1343,7 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 }
 
 /**
- * ixgbe_configure_tx - Configure 8254x Transmit Unit after Reset
+ * ixgbe_configure_tx - Configure 8259x Transmit Unit after Reset
  * @adapter: board private structure
  *
  * Configure the Tx unit of the MAC after a reset.
@@ -1371,9 +1371,9 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
 		/* Disable Tx Head Writeback RO bit, since this hoses
 		 * bookkeeping if things aren't delivered in order.
 		 */
-		txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
+		txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(j));
 		txctrl &= ~IXGBE_DCA_TXCTRL_TX_WB_RO_EN;
-		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), txctrl);
+		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(j), txctrl);
 	}
 }
 
@@ -1382,7 +1382,7 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
 
 #define IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT			2
 /**
- * ixgbe_configure_rx - Configure 8254x Receive Unit after Reset
+ * ixgbe_configure_rx - Configure 8259x Receive Unit after Reset
  * @adapter: board private structure
  *
  * Configure the Rx unit of the MAC after a reset.
@@ -1529,6 +1529,16 @@ static void ixgbe_vlan_rx_register(struct net_device *netdev,
 		ixgbe_irq_disable(adapter);
 	adapter->vlgrp = grp;
 
+	/*
+	 * For a DCB driver, always enable VLAN tag stripping so we can
+	 * still receive traffic from a DCB-enabled host even if we're
+	 * not in DCB mode.
+	 */
+	ctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_VLNCTRL);
+	ctrl |= IXGBE_VLNCTRL_VME;
+	ctrl &= ~IXGBE_VLNCTRL_CFIEN;
+	IXGBE_WRITE_REG(&adapter->hw, IXGBE_VLNCTRL, ctrl);
+
 	if (grp) {
 		/* enable VLAN tag insert/strip */
 		ctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_VLNCTRL);
@@ -1672,6 +1682,42 @@ static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter)
 	}
 }
 
+/**
+ * ixgbe_configure_dcb - Configure DCB hardware
+ * @adapter: ixgbe adapter struct
+ *
+ * This is called by the driver on open to configure the DCB hardware.
+ * This is also called by the gennetlink interface when reconfiguring
+ * the DCB state.
+ */
+void ixgbe_configure_dcb(struct ixgbe_adapter *adapter)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 txdctl, vlnctrl;
+	int i, j;
+
+	ixgbe_dcb_check_config(&adapter->dcb_cfg);
+	ixgbe_dcb_calculate_tc_credits(&adapter->dcb_cfg, DCB_TX_CONFIG);
+	ixgbe_dcb_calculate_tc_credits(&adapter->dcb_cfg, DCB_RX_CONFIG);
+
+	/* reconfigure the hardware */
+	ixgbe_dcb_hw_config(&adapter->hw, &adapter->dcb_cfg);
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		j = adapter->tx_ring[i].reg_idx;
+		txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(j));
+		/* PThresh workaround for Tx hang with DFP enabled. */
+		txdctl |= 32;
+		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(j), txdctl);
+	}
+	/* Enable VLAN tag insert/strip */
+	vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
+	vlnctrl |= IXGBE_VLNCTRL_VME | IXGBE_VLNCTRL_VFE;
+	vlnctrl &= ~IXGBE_VLNCTRL_CFIEN;
+	IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
+	ixgbe_set_vfta(hw, 0, 0, true);
+}
+
 static void ixgbe_configure(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
@@ -1680,6 +1726,12 @@ static void ixgbe_configure(struct ixgbe_adapter *adapter)
 	ixgbe_set_multi(netdev);
 
 	ixgbe_restore_vlan(adapter);
+	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
+		netif_set_gso_max_size(netdev, 32768);
+		ixgbe_configure_dcb(adapter);
+	} else {
+		netif_set_gso_max_size(netdev, 65536);
+	}
 
 	ixgbe_configure_tx(adapter);
 	ixgbe_configure_rx(adapter);
@@ -1699,6 +1751,11 @@ static int ixgbe_up_complete(struct ixgbe_adapter *adapter)
 
 	ixgbe_get_hw_control(adapter);
 
+#ifdef CONFIG_NETDEVICES_MULTIQUEUE
+	if (adapter->num_tx_queues > 1)
+		netdev->features |= NETIF_F_MULTI_QUEUE;
+#endif
+
 	if ((adapter->flags & IXGBE_FLAG_MSIX_ENABLED) ||
 	    (adapter->flags & IXGBE_FLAG_MSI_ENABLED)) {
 		if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
@@ -1943,36 +2000,44 @@ static void ixgbe_clean_all_tx_rings(struct ixgbe_adapter *adapter)
 void ixgbe_down(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
+	struct ixgbe_hw *hw = &adapter->hw;
 	u32 rxctrl;
+	u32 txdctl;
+	int i, j;
 
 	/* signal that we are down to the interrupt handler */
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
 	/* disable receives */
-	rxctrl = IXGBE_READ_REG(&adapter->hw, IXGBE_RXCTRL);
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_RXCTRL,
-			rxctrl & ~IXGBE_RXCTRL_RXEN);
-
-	netif_tx_disable(netdev);
-
-	/* disable transmits in the hardware */
+	rxctrl = IXGBE_READ_REG(hw, IXGBE_RXCTRL);
+	IXGBE_WRITE_REG(hw, IXGBE_RXCTRL, rxctrl & ~IXGBE_RXCTRL_RXEN);
 
-	/* flush both disables */
-	IXGBE_WRITE_FLUSH(&adapter->hw);
+	IXGBE_WRITE_FLUSH(hw);
 	msleep(10);
 
+	netif_stop_queue(netdev);
+	if (netif_is_multiqueue(netdev))
+		for (i = 0; i < adapter->num_tx_queues; i++)
+			netif_stop_subqueue(netdev, i);
+
 	ixgbe_irq_disable(adapter);
 
 	ixgbe_napi_disable_all(adapter);
 	del_timer_sync(&adapter->watchdog_timer);
 
+	/* disable transmits in the hardware now that interrupts are off */
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		j = adapter->tx_ring[i].reg_idx;
+		txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(j));
+		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(j),
+		                (txdctl & ~IXGBE_TXDCTL_ENABLE));
+	}
+
 	netif_carrier_off(netdev);
-	netif_stop_queue(netdev);
 
 	ixgbe_reset(adapter);
 	ixgbe_clean_all_tx_rings(adapter);
 	ixgbe_clean_all_rx_rings(adapter);
-
 }
 
 static int ixgbe_suspend(struct pci_dev *pdev, pm_message_t state)
@@ -2069,6 +2134,11 @@ static void ixgbe_reset_task(struct work_struct *work)
 	struct ixgbe_adapter *adapter;
 	adapter = container_of(work, struct ixgbe_adapter, reset_task);
 
+	/* If we're already down or resetting, just bail */
+	if (test_bit(__IXGBE_DOWN, &adapter->state) ||
+	    test_bit(__IXGBE_RESETTING, &adapter->state))
+		return;
+
 	adapter->tx_timeout_count++;
 
 	ixgbe_reinit_locked(adapter);
@@ -2112,6 +2182,7 @@ static void ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter,
 		adapter->flags &= ~IXGBE_FLAG_MSIX_ENABLED;
 		kfree(adapter->msix_entries);
 		adapter->msix_entries = NULL;
+		adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED;
 		adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED;
 		adapter->num_tx_queues = 1;
 		adapter->num_rx_queues = 1;
@@ -2121,19 +2192,39 @@ static void ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter,
 	}
 }
 
-static void __devinit ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
+static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 {
-	int nrq, ntq;
+	int nrq = 1, ntq = 1;
 	int feature_mask = 0, rss_i, rss_m;
+	int dcb_i, dcb_m;
 
 	/* Number of supported queues */
 	switch (adapter->hw.mac.type) {
 	case ixgbe_mac_82598EB:
+		dcb_i = adapter->ring_feature[RING_F_DCB].indices;
+		dcb_m = 0;
 		rss_i = adapter->ring_feature[RING_F_RSS].indices;
 		rss_m = 0;
+		feature_mask |= IXGBE_FLAG_DCB_ENABLED;
 		feature_mask |= IXGBE_FLAG_RSS_ENABLED;
 
 		switch (adapter->flags & feature_mask) {
+		case (IXGBE_FLAG_DCB_ENABLED):
+#ifdef CONFIG_NETDEVICES_MULTIQUEUE
+			dcb_m = 0x7 << 3;
+			nrq = dcb_i;
+			ntq = dcb_i;
+#else
+			printk(KERN_INFO "Kernel has no multiqueue "
+				"support, disabling DCB.\n");
+			/* Fall back onto RSS */
+			rss_m = 0xF;
+			nrq = rss_i;
+			ntq = 1;
+			dcb_m = 0;
+			dcb_i = 0;
+#endif
+			break;
 		case (IXGBE_FLAG_RSS_ENABLED):
 			rss_m = 0xF;
 			nrq = rss_i;
@@ -2145,6 +2236,8 @@ static void __devinit ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 			break;
 		case 0:
 		default:
+			dcb_i = 0;
+			dcb_m = 0;
 			rss_i = 0;
 			rss_m = 0;
 			nrq = 1;
@@ -2152,6 +2245,8 @@ static void __devinit ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 			break;
 		}
 
+		adapter->ring_feature[RING_F_DCB].indices = dcb_i;
+		adapter->ring_feature[RING_F_DCB].mask = dcb_m;
 		adapter->ring_feature[RING_F_RSS].indices = rss_i;
 		adapter->ring_feature[RING_F_RSS].mask = rss_m;
 		break;
@@ -2179,15 +2274,25 @@ static void __devinit ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
 	 */
 	int feature_mask = 0, rss_i;
 	int i, txr_idx, rxr_idx;
+	int dcb_i;
 
 	/* Number of supported queues */
 	switch (adapter->hw.mac.type) {
 	case ixgbe_mac_82598EB:
+		dcb_i = adapter->ring_feature[RING_F_DCB].indices;
 		rss_i = adapter->ring_feature[RING_F_RSS].indices;
 		txr_idx = 0;
 		rxr_idx = 0;
+		feature_mask |= IXGBE_FLAG_DCB_ENABLED;
 		feature_mask |= IXGBE_FLAG_RSS_ENABLED;
 		switch (adapter->flags & feature_mask) {
+		case (IXGBE_FLAG_DCB_ENABLED):
+			/* the number of queues is assumed to be symmetric */
+			for (i = 0; i < dcb_i; i++) {
+				adapter->rx_ring[i].reg_idx = i << 3;
+				adapter->tx_ring[i].reg_idx = i << 2;
+			}
+			break;
 		case (IXGBE_FLAG_RSS_ENABLED):
 			for (i = 0; i < adapter->num_rx_queues; i++)
 				adapter->rx_ring[i].reg_idx = i;
@@ -2212,7 +2317,7 @@ static void __devinit ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
  * number of queues at compile-time.  The polling_netdev array is
  * intended for Multiqueue, but should work fine with a single queue.
  **/
-static int __devinit ixgbe_alloc_queues(struct ixgbe_adapter *adapter)
+static int ixgbe_alloc_queues(struct ixgbe_adapter *adapter)
 {
 	int i;
 
@@ -2252,7 +2357,7 @@ err_tx_ring_allocation:
  * Attempt to configure the interrupts using the best available
  * capabilities of the hardware and the kernel.
  **/
-static int __devinit ixgbe_set_interrupt_capability(struct ixgbe_adapter
+static int ixgbe_set_interrupt_capability(struct ixgbe_adapter
 						    *adapter)
 {
 	int err = 0;
@@ -2281,6 +2386,7 @@ static int __devinit ixgbe_set_interrupt_capability(struct ixgbe_adapter
 	adapter->msix_entries = kcalloc(v_budget,
 					sizeof(struct msix_entry), GFP_KERNEL);
 	if (!adapter->msix_entries) {
+		adapter->flags &= ~IXGBE_FLAG_DCB_ENABLED;
 		adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED;
 		ixgbe_set_num_queues(adapter);
 		kfree(adapter->tx_ring);
@@ -2323,7 +2429,7 @@ out:
 	return err;
 }
 
-static void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter)
+void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter)
 {
 	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
 		adapter->flags &= ~IXGBE_FLAG_MSIX_ENABLED;
@@ -2347,7 +2453,7 @@ static void ixgbe_reset_interrupt_capability(struct ixgbe_adapter *adapter)
  * - Hardware queue count (num_*_queues)
  *   - defined by miscellaneous hardware support/features (RSS, etc.)
  **/
-static int __devinit ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
+int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
 {
 	int err;
 
@@ -2395,11 +2501,27 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
 	unsigned int rss;
+	int j;
+	struct tc_configuration *tc;
 
 	/* Set capability flags */
 	rss = min(IXGBE_MAX_RSS_INDICES, (int)num_online_cpus());
 	adapter->ring_feature[RING_F_RSS].indices = rss;
 	adapter->flags |= IXGBE_FLAG_RSS_ENABLED;
+	adapter->ring_feature[RING_F_DCB].indices = IXGBE_MAX_DCB_INDICES;
+	for (j = 0; j < MAX_TRAFFIC_CLASS; j++) {
+		tc = &adapter->dcb_cfg.tc_config[j];
+		tc->path[DCB_TX_CONFIG].bwg_id = 0;
+		tc->path[DCB_TX_CONFIG].bwg_percent = 12 + (j & 1);
+		tc->path[DCB_RX_CONFIG].bwg_id = 0;
+		tc->path[DCB_RX_CONFIG].bwg_percent = 12 + (j & 1);
+		tc->dcb_pfc = pfc_disabled;
+	}
+	adapter->dcb_cfg.bw_percentage[DCB_TX_CONFIG][0] = 100;
+	adapter->dcb_cfg.bw_percentage[DCB_RX_CONFIG][0] = 100;
+	adapter->dcb_cfg.rx_pba_cfg = pba_equal;
+	adapter->dcb_cfg.round_robin_enable = false;
+	adapter->dcb_set_bitmap = 0x00;
 
 	/* Enable Dynamic interrupt throttling by default */
 	adapter->rx_eitr = 1;
@@ -2681,7 +2803,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
  * handler is registered with the OS, the watchdog timer is started,
  * and the stack is notified that the interface is ready.
  **/
-static int ixgbe_open(struct net_device *netdev)
+int ixgbe_open(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	int err;
@@ -2736,7 +2858,7 @@ err_setup_tx:
  * needs to be disabled.  A global MAC reset is issued to stop the
  * hardware, and all transmit and receive resources are freed.
  **/
-static int ixgbe_close(struct net_device *netdev)
+int ixgbe_close(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
@@ -2769,6 +2891,18 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 		adapter->stats.mpc[i] += mpc;
 		total_mpc += adapter->stats.mpc[i];
 		adapter->stats.rnbc[i] += IXGBE_READ_REG(hw, IXGBE_RNBC(i));
+		adapter->stats.qptc[i] += IXGBE_READ_REG(hw, IXGBE_QPTC(i));
+		adapter->stats.qbtc[i] += IXGBE_READ_REG(hw, IXGBE_QBTC(i));
+		adapter->stats.qprc[i] += IXGBE_READ_REG(hw, IXGBE_QPRC(i));
+		adapter->stats.qbrc[i] += IXGBE_READ_REG(hw, IXGBE_QBRC(i));
+		adapter->stats.pxonrxc[i] += IXGBE_READ_REG(hw,
+		                                            IXGBE_PXONRXC(i));
+		adapter->stats.pxontxc[i] += IXGBE_READ_REG(hw,
+		                                            IXGBE_PXONTXC(i));
+		adapter->stats.pxoffrxc[i] += IXGBE_READ_REG(hw,
+		                                             IXGBE_PXOFFRXC(i));
+		adapter->stats.pxofftxc[i] += IXGBE_READ_REG(hw,
+		                                             IXGBE_PXOFFTXC(i));
 	}
 	adapter->stats.gprc += IXGBE_READ_REG(hw, IXGBE_GPRC);
 	/* work around hardware counting issue */
@@ -2865,10 +2999,9 @@ static void ixgbe_watchdog(unsigned long data)
 
 			netif_carrier_on(netdev);
 			netif_wake_queue(netdev);
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
-			for (i = 0; i < adapter->num_tx_queues; i++)
-				netif_wake_subqueue(netdev, i);
-#endif
+			if (netif_is_multiqueue(netdev))
+				for (i = 0; i < adapter->num_tx_queues; i++)
+					netif_wake_subqueue(netdev, i);
 		} else {
 			/* Force detection of hung controller */
 			adapter->detect_tx_hung = true;
@@ -2878,6 +3011,9 @@ static void ixgbe_watchdog(unsigned long data)
 			DPRINTK(LINK, INFO, "NIC Link is Down\n");
 			netif_carrier_off(netdev);
 			netif_stop_queue(netdev);
+			if (netif_is_multiqueue(netdev))
+				for (i = 0; i < adapter->num_tx_queues; i++)
+					netif_stop_subqueue(netdev, i);
 		}
 	}
 
@@ -2972,6 +3108,8 @@ static int ixgbe_tso(struct ixgbe_adapter *adapter,
 		mss_l4len_idx |=
 		    (skb_shinfo(skb)->gso_size << IXGBE_ADVTXD_MSS_SHIFT);
 		mss_l4len_idx |= (l4len << IXGBE_ADVTXD_L4LEN_SHIFT);
+		/* use index 1 for TSO */
+		mss_l4len_idx |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
 		context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
 
 		tx_buffer_info->time_stamp = jiffies;
@@ -3044,6 +3182,7 @@ static bool ixgbe_tx_csum(struct ixgbe_adapter *adapter,
 		}
 
 		context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd_mlhl);
+		/* use index zero for tx checksum offload */
 		context_desc->mss_l4len_idx = 0;
 
 		tx_buffer_info->time_stamp = jiffies;
@@ -3152,6 +3291,8 @@ static void ixgbe_tx_queue(struct ixgbe_adapter *adapter,
 		olinfo_status |= IXGBE_TXD_POPTS_TXSM <<
 						IXGBE_ADVTXD_POPTS_SHIFT;
 
+		/* use index 1 context for tso */
+		olinfo_status |= (1 << IXGBE_ADVTXD_IDX_SHIFT);
 		if (tx_flags & IXGBE_TX_FLAGS_IPV4)
 			olinfo_status |= IXGBE_TXD_POPTS_IXSM <<
 						IXGBE_ADVTXD_POPTS_SHIFT;
@@ -3195,11 +3336,11 @@ static int __ixgbe_maybe_stop_tx(struct net_device *netdev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
-	netif_stop_subqueue(netdev, tx_ring->queue_index);
-#else
-	netif_stop_queue(netdev);
-#endif
+	if (netif_is_multiqueue(netdev))
+		netif_stop_subqueue(netdev, tx_ring->queue_index);
+	else
+		netif_stop_queue(netdev);
+
 	/* Herbert's original patch had:
 	 *  smp_mb__after_netif_stop_queue();
 	 * but since that doesn't exist yet, just open code it. */
@@ -3211,11 +3352,10 @@ static int __ixgbe_maybe_stop_tx(struct net_device *netdev,
 		return -EBUSY;
 
 	/* A reprieve! - use start_queue because it doesn't call schedule */
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
-	netif_wake_subqueue(netdev, tx_ring->queue_index);
-#else
-	netif_wake_queue(netdev);
-#endif
+	if (netif_is_multiqueue(netdev))
+		netif_start_subqueue(netdev, tx_ring->queue_index);
+	else
+		netif_start_queue(netdev);
 	++adapter->restart_queue;
 	return 0;
 }
@@ -3253,6 +3393,7 @@ static int ixgbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 		dev_kfree_skb(skb);
 		return NETDEV_TX_OK;
 	}
+
 	mss = skb_shinfo(skb)->gso_size;
 
 	if (mss)
@@ -3269,8 +3410,21 @@ static int ixgbe_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 		return NETDEV_TX_BUSY;
 	}
 	if (adapter->vlgrp && vlan_tx_tag_present(skb)) {
+		tx_flags |= vlan_tx_tag_get(skb);
+#ifdef CONFIG_NETDEVICES_MULTIQUEUE
+		if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
+			tx_flags &= ~IXGBE_TX_FLAGS_VLAN_PRIO_MASK;
+			tx_flags |= (skb->queue_mapping << 13);
+		}
+#endif
+		tx_flags <<= IXGBE_TX_FLAGS_VLAN_SHIFT;
+		tx_flags |= IXGBE_TX_FLAGS_VLAN;
+#ifdef CONFIG_NETDEVICES_MULTIQUEUE
+	} else if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
+		tx_flags |= (skb->queue_mapping << 13);
+		tx_flags <<= IXGBE_TX_FLAGS_VLAN_SHIFT;
 		tx_flags |= IXGBE_TX_FLAGS_VLAN;
-		tx_flags |= (vlan_tx_tag_get(skb) << IXGBE_TX_FLAGS_VLAN_SHIFT);
+#endif
 	}
 
 	if (skb->protocol == htons(ETH_P_IP))
@@ -3520,13 +3674,12 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_TSO;
 
 	netdev->features |= NETIF_F_TSO6;
+	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
+		adapter->flags &= ~IXGBE_FLAG_RSS_ENABLED;
+
 	if (pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
-	netdev->features |= NETIF_F_MULTI_QUEUE;
-#endif
-
 	/* make sure the EEPROM is good */
 	if (ixgbe_validate_eeprom_checksum(hw, NULL) < 0) {
 		dev_err(&pdev->dev, "The EEPROM Checksum Is Not Valid\n");
@@ -3593,10 +3746,9 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 
 	netif_carrier_off(netdev);
 	netif_stop_queue(netdev);
-#ifdef CONFIG_NETDEVICES_MULTIQUEUE
-	for (i = 0; i < adapter->num_tx_queues; i++)
-		netif_stop_subqueue(netdev, i);
-#endif
+	if (netif_is_multiqueue(netdev))
+		for (i = 0; i < adapter->num_tx_queues; i++)
+			netif_stop_subqueue(netdev, i);
 
 	ixgbe_napi_add_all(adapter);
 
@@ -3774,6 +3926,11 @@ static struct pci_driver ixgbe_driver = {
 	.err_handler = &ixgbe_err_handler
 };
 
+bool ixgbe_is_ixgbe(struct pci_dev *pcidev)
+{
+	return (!(pci_dev_driver(pcidev) != &ixgbe_driver));
+}
+
 /**
  * ixgbe_init_module - Driver Registration Routine
  *
@@ -3782,18 +3939,17 @@ static struct pci_driver ixgbe_driver = {
  **/
 static int __init ixgbe_init_module(void)
 {
-	int ret;
 	printk(KERN_INFO "%s: %s - version %s\n", ixgbe_driver_name,
 	       ixgbe_driver_string, ixgbe_driver_version);
 
 	printk(KERN_INFO "%s: %s\n", ixgbe_driver_name, ixgbe_copyright);
 
+	ixgbe_dcb_netlink_register();
 #ifdef CONFIG_DCA
 	dca_register_notify(&dca_notifier);
 
 #endif
-	ret = pci_register_driver(&ixgbe_driver);
-	return ret;
+	return pci_register_driver(&ixgbe_driver);
 }
 module_init(ixgbe_init_module);
 
@@ -3808,6 +3964,7 @@ static void __exit ixgbe_exit_module(void)
 #ifdef CONFIG_DCA
 	dca_unregister_notify(&dca_notifier);
 #endif
+	ixgbe_dcb_netlink_unregister();
 	pci_unregister_driver(&ixgbe_driver);
 }
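For reference on the `skb->queue_mapping << 13` in ixgbe_xmit_frame above: when DCB is enabled the Tx queue index is written into the 802.1p priority (PCP) field, which occupies the top three bits of the 16-bit VLAN tag.  A small userspace sketch of that encoding (the mask names mirror the kernel's, the helper itself is illustrative):

```c
#include <assert.h>

/* Sketch of the 802.1Q tag (TCI) layout used by the DCB path above:
 * bits 15-13 carry the 802.1p priority (hence the << 13), bit 12 is
 * CFI, and bits 11-0 are the VLAN ID.  tci_set_prio() is mine. */

#define VLAN_PRIO_SHIFT 13
#define VLAN_PRIO_MASK  0xe000
#define VLAN_VID_MASK   0x0fff

/* Replace the priority bits of a tag, leaving CFI and VID untouched. */
static unsigned short tci_set_prio(unsigned short tci, unsigned int prio)
{
	return (unsigned short)((tci & ~VLAN_PRIO_MASK) |
				(prio << VLAN_PRIO_SHIFT));
}
```

So a frame tagged with VID 0x064 on queue 5 carries TCI 0xa064.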
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-02  0:43 ` [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes PJ Waskiewicz
@ 2008-05-02 11:03   ` Jeff Garzik
  2008-05-02 20:08     ` Waskiewicz Jr, Peter P
  2008-05-07 20:53     ` Waskiewicz Jr, Peter P
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff Garzik @ 2008-05-02 11:03 UTC (permalink / raw)
  To: PJ Waskiewicz; +Cc: netdev

PJ Waskiewicz wrote:
> This patch introduces a new generic netlink subsystem for Data Center
> Bridging, aka DCB.  The interface will allow userspace applications to
> configure the DCB parameters in the driver required to make the technology
> work.
> 
> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> ---
> 
>  drivers/net/ixgbe/ixgbe_dcb_nl.c | 1273 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 1273 insertions(+), 0 deletions(-)

Seems to me we don't want each driver supporting this technology to 
create their own netlink interface...

This seems more appropriate via ethtool, or ethtool-netlink (Thomas Graf 
posted an RFC)

	Jeff






* Re: [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe
  2008-05-02  0:42 [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe PJ Waskiewicz
                   ` (2 preceding siblings ...)
  2008-05-02  0:43 ` [PATCH 3/3] ixgbe: Enable Data Center Bridging (DCB) support PJ Waskiewicz
@ 2008-05-02 11:19 ` Andi Kleen
  2008-05-02 20:18   ` Waskiewicz Jr, Peter P
  3 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2008-05-02 11:19 UTC (permalink / raw)
  To: PJ Waskiewicz; +Cc: jgarzik, netdev

PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com> writes:
>
> The third patchset implements the netlink interface and hardware init code,
> and enables DCB support in the driver.

Probably the interface shouldn't be driver-specific.

I would suggest you post a higher level description of the goals/implementation 
etc. of the interface.

-Andi


* RE: [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-02 11:03   ` Jeff Garzik
@ 2008-05-02 20:08     ` Waskiewicz Jr, Peter P
  2008-05-07 20:53     ` Waskiewicz Jr, Peter P
  1 sibling, 0 replies; 13+ messages in thread
From: Waskiewicz Jr, Peter P @ 2008-05-02 20:08 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev

> Seems to me we don't want each driver supporting this 
> technology to create their own netlink interface...

Definitely reasonable.  However, even with an ethtool interface, there's
going to be a good chunk of development.  Is the issue just the
non-generic netlink family we're implementing, or is it something else?

> This seems more appropriate via ethtool, or ethtool-netlink 
> (Thomas Graf posted an RFC)

Oddly enough, we made the decision early on in development not to use
ethtool.  We would be adding a rather large number of ioctls for this
technology, and our initial thoughts were that the community would
reject that large of a change to the ethtool interface.

I'm very interested in the ethtool-netlink interface, which would
probably help us get rid of the ixgbe_is_ixgbe() call in the DCB driver.

Aside from the netlink issues, how does the rest of the code appear?
Thanks for the feedback Jeff.

Cheers,
-PJ Waskiewicz


* RE: [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe
  2008-05-02 11:19 ` [ANNOUNCE] ixgbe: Data Center Bridging (DCB) support for ixgbe Andi Kleen
@ 2008-05-02 20:18   ` Waskiewicz Jr, Peter P
  0 siblings, 0 replies; 13+ messages in thread
From: Waskiewicz Jr, Peter P @ 2008-05-02 20:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jgarzik, netdev

> Probably the interface shouldn't be driver specific.
> 
> I would suggest you post a higher level description of the 
> goals/implementation etc. of the interface.

Agreed, the netlink interface being driver-specific isn't very desirable.
I'm going to look more at the ethtool-netlink interface Jeff mentioned
in the meantime; but here is the basic need for the interface:

DCB allows configuration of both Tx and Rx parameters for bandwidth
control and priority flow control settings.  It also has the ability to
group traffic classes into bandwidth groups, which can then have other
features turned on to control the way bandwidth is arbitrated within the
bandwidth group itself, and across bandwidth groups.  Note that all of
these settings are in hardware, so we need an interface to the driver to
feed these configuration sets into the hardware.  Originally we thought
ethtool, but the number of ioctls we would need to add in order to
support the dataset was pretty huge.  So we chose to try using netlink,
and not pollute ethtool at this point.
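To make the two-level arbitration described above concrete, here is a hypothetical sketch (the helper name and the numbers are mine, not from the driver): a traffic class's effective share of the link is its percentage within its bandwidth group, scaled by the group's share of the link.

```c
#include <assert.h>

/* Hypothetical sketch of two-level DCB bandwidth arbitration: link
 * bandwidth is first divided across bandwidth groups, then each
 * group's share is divided across the traffic classes inside it. */

/* Effective link share (in percent) for one traffic class. */
static unsigned int tc_link_share(unsigned int bwg_pct_of_link,
				  unsigned int tc_pct_in_group)
{
	return bwg_pct_of_link * tc_pct_in_group / 100;
}
```

For example, a storage bandwidth group given 60% of the link, with a traffic class taking 50% of that group, ends up with 30% of the link.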

We have a set of userspace tools that will be posted to Sourceforge
shortly.  There is a daemon (dcbd) and a command-line tool (dcbtool).
The daemon is the code that implements the netlink interface into the
driver, and feeds the configuration sent from dcbtool or from its local
configuration file.  dcbd also implements the Data Center Bridging
Exchange protocol, which is an LLDP-based protocol that allows DCB
devices to negotiate settings between link partners.  The recently
announced Cisco Nexus switches run a DCB exchange service that
implements the protocol (I don't have the spec link for it, but it's a
joint spec from Intel, Cisco, and IBM).  So our userspace tool
implements the protocol, performs all the negotiation with the link
partners, and sends the configuration changes to the driver via netlink.

I hope that gives a better understanding of how the driver interface
works.  Please feel free to ask additional questions and give more
suggestions/feedback.

Cheers,
-PJ Waskiewicz


* RE: [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-02 11:03   ` Jeff Garzik
  2008-05-02 20:08     ` Waskiewicz Jr, Peter P
@ 2008-05-07 20:53     ` Waskiewicz Jr, Peter P
  2008-05-07 21:13       ` Jeff Garzik
  1 sibling, 1 reply; 13+ messages in thread
From: Waskiewicz Jr, Peter P @ 2008-05-07 20:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev

> Seems to me we don't want each driver supporting this 
> technology to create their own netlink interface...
> 
> This seems more appropriate via ethtool, or ethtool-netlink 
> (Thomas Graf posted an RFC)
> 
> 	Jeff
> 

Jeff,

I've given this much more thought, and have some additional feedback.
While I see your point that each driver supporting DCB shouldn't have
to create its own netlink interface, adding the ioctls to ethtool for
every other driver not supporting DCB isn't necessary either.  I
understand there are commands in ethtool that some drivers don't
implement, but the required commands for DCB would add a pretty
decent chunk of code to ethtool.  For other advanced features today,
many drivers implement sysfs interfaces to support tweaking values
outside of the ethtool umbrella.  Given that this is more than a
driver-only configuration tool (it configures the behavior of the
entire network), we need one userspace tool that can communicate with
all registered devices, and netlink lends itself well to that.

I also looked at Thomas' proposal, and it does look fine.  However, we'd
have the same issue of needing to implement all the DCB commands in
ethtool, which I'm still not totally convinced is the correct thing to
do, given how the DCB stack from userspace to the link partner works.

Our long-term goal is to implement the dcbd (userspace daemon) interface
in the kernel as a module, so the userspace commands interact with it
and only it directly.  Much like the mac80211 interface, the in-kernel
dcbd interface would push commands to registered drivers through a
common kernel interface, most likely through the netdev.  We're not
there yet, but we need to walk before we run, hence why the driver is
using netlink today.

Please let me know what you think.

Cheers,
-PJ Waskiewicz


* Re: [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-07 20:53     ` Waskiewicz Jr, Peter P
@ 2008-05-07 21:13       ` Jeff Garzik
  2008-05-16 22:45         ` Waskiewicz Jr, Peter P
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2008-05-07 21:13 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: netdev

Waskiewicz Jr, Peter P wrote:
> I've given this much more thought, and have some additional feedback.
> While I see your point about each driver wanting to support DCB
> shouldn't have to create their own netlink interface, having the ioctl's
> in ethtool for every other driver not supporting DCB isn't necessary
> either.  I understand there are commands in ethtool that some drivers
> don't implement, but the required commands for DCB would add a pretty
> decent chunk of code into ethtool.  But for other advanced features
> today, many drivers implement sysfs interfaces to support tweaking of
> values outside of the ethtool umbrella.  Given this is less than a
> driver-only configuration tool, but it's a tool that is configuring the
> behavior of the entire network, we need one userspace tool that can
> communicate to all registered devices, and netlink lends itself well to
> that.
> 
> I also looked at Thomas' proposal, and it does look fine.  However, we'd
> have the same issue of needing to implement all the DCB commands in
> ethtool, which I'm still not totally convinced is the correct thing to
> do, given how the DCB stack from userspace to the link partner works.
> 
> Our long-term goal is to implement the dcbd (userspace daemon) interface
> in the kernel as a module, so the userspace commands interact with it
> and only it directly.  Much like the mac80211 interface, the in-kernel
> dcbd interface would push commands to registered drivers through a
> common kernel interface, most likely through the netdev.  We're not
> there yet, but we need to walk before we run, hence why the driver is
> using netlink today.


If it's complex enough, or doesn't fit the ioctl model well, it doesn't 
necessarily have to be via the ethtool ioctl.

Two goals I have, though, are

* the userspace ethtool utility configures this stuff.  Note I did /not/ 
say "must use ethtool ioctl."  The core idea behind ethtool is to 
centralize NIC-specific knowledge -- thus that's the place where 
chip-specific register dumping code resides.  So within reason, it's OK 
to put DCB-specific commands into ethtool that do not use the ethtool ioctl.

But like everything else in life, one must weigh various costs.  Maybe 
it is complex enough to warrant a new tool.  We don't know until there's 
a design doc or review of the generic interface that will be used for DCB.

* the kernel portion can be used by other non-Intel drivers, i.e. a 
generic and separate piece.  We should not be embedding an entire 
netlink interface into each driver.

Regards,

	Jeff





* RE: [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-07 21:13       ` Jeff Garzik
@ 2008-05-16 22:45         ` Waskiewicz Jr, Peter P
  2008-05-16 23:20           ` Stephen Hemminger
  0 siblings, 1 reply; 13+ messages in thread
From: Waskiewicz Jr, Peter P @ 2008-05-16 22:45 UTC (permalink / raw)
  To: Jeff Garzik, David Miller; +Cc: netdev

> But like everything else in life, one must weigh various 
> costs.  Maybe it is complex enough to warrant a new tool.  We 
> don't know until there's a design doc or review of the 
> generic interface that will be used for DCB.

100% agree.  I have a design document added to the bottom of this email
in plain text for the in-kernel design.  I can send a PDF or OpenOffice
doc if desired.  Please review it, see if that's something worth
supporting and developing, and send feedback.  If things look ok, I'll
write the in-kernel layer and send those patches asap, along with a
modified ixgbe to use the new interface.

Thanks Jeff,

-PJ Waskiewicz
<peter.p.waskiewicz.jr@intel.com>

------------------------------------------------------

Linux Kernel Interface Design for Data Center Bridging
Peter P. Waskiewicz Jr, Intel Corp.
5/16/2008

Overview:
	Data Center Bridging is a new layer 2 networking technology
targeted for data centers looking to converge certain network flows.
The technology uses 802.1p priority tagging from 802.1q, and then adds
the new 802.1Qaz Priority Grouping and 802.1Qbb Priority Flow Control
technologies.  The purpose is to provide per-priority flow control to
individual network flows without impacting other flows.  It also
provides the mechanism to enforce bandwidth allocation per flow.
	A Wikipedia article describing the network-wide model this fits
into can be found here:
http://en.wikipedia.org/wiki/Data_Center_Ethernet.

Netdev changes:

	The generic netlink interface will be added with the below
commands and attributes.  It will include a structure of function
pointers.  This structure will be part of the netdevice structure, and
will be populated by the underlying base driver, if applicable.

Structure of function pointers:

struct dcb_genl_ops {
	int	(*getstate)(struct genl_info *info);
	int	(*setstate)(struct genl_info *info);
	int	(*setpgcfgtx)(struct genl_info *info);
	int	(*setpgcfgrx)(struct genl_info *info);
	int	(*getpgcfgtx)(struct genl_info *info);
	int	(*getpgcfgrx)(struct genl_info *info);
	int	(*setpfccfg)(struct genl_info *info);
	int	(*getpfccfg)(struct genl_info *info);
	int	(*setall)(struct genl_info *info);
};

/* member added to struct net_device: */
struct dcb_genl_ops *dcb_ops;
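A minimal userspace sketch (stand-in types, illustrative names) of how the generic dcb_nl layer would dispatch through such an ops table, falling back gracefully for devices whose drivers don't populate it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative model of the ops-table design above: a generic dcb_nl
 * module owns the netlink plumbing and dispatches into per-device ops
 * that the base driver fills in.  struct genl_info / net_device are
 * stand-ins for the real kernel types. */

struct genl_info;                       /* opaque for this sketch */

struct dcb_genl_ops {
	int (*getstate)(struct genl_info *info);
	int (*setstate)(struct genl_info *info);
};

struct net_device {
	struct dcb_genl_ops *dcb_ops;   /* NULL if the driver has no DCB */
};

/* Generic layer: devices without DCB support report -EOPNOTSUPP. */
static int dcb_nl_getstate(struct net_device *dev, struct genl_info *info)
{
	if (!dev->dcb_ops || !dev->dcb_ops->getstate)
		return -EOPNOTSUPP;
	return dev->dcb_ops->getstate(info);
}

/* Driver side: an ixgbe-like driver supplies its own handler. */
static int example_getstate(struct genl_info *info)
{
	(void)info;
	return 1;                       /* pretend DCB is enabled */
}

static struct dcb_genl_ops example_ops = { .getstate = example_getstate };
```

The point of the design is visible here: the netlink code is written once, and a driver opts in simply by setting dcb_ops.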

	The entire module for all DCB netlink operations can be called
dcb_nl, and be built as a module.  The Kconfig option CONFIG_DCB can be
added, and can be used to include or omit the struct dcb_genl_ops
*dcb_ops member from the netdevice structure.

	NOTE: the (*setall) command applies all changes sent to the
driver.  This way all changes to Rx and Tx can be made to the device at
one time, and not leave the device in an unknown state.  This implies
the underlying implementation should store the new values from the
netlink interface in a temporary structure, and then apply it all when
(*setall) is invoked.
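The store-then-commit behavior the NOTE describes can be sketched as follows (a hypothetical userspace model; the struct layout and field names are mine, not the driver's):

```c
#include <assert.h>
#include <string.h>

/* Sketch of the (*setall) semantics above: netlink SET commands only
 * stage values in a temporary config, and DCB_CMD_SET_ALL copies the
 * staged config into the active one, so the device never runs with a
 * half-applied state.  All names are illustrative. */

struct dcb_cfg {
	unsigned char pfc_en[8];        /* per-user-priority PFC on/off */
	unsigned char bwg_pct[8];       /* per-bandwidth-group percentage */
};

struct dcb_dev {
	struct dcb_cfg temp;            /* staged by netlink SET commands */
	struct dcb_cfg active;          /* what the hardware is running */
};

/* A SET command only stages the new value. */
static void dcb_set_pfc(struct dcb_dev *dev, int up, unsigned char enable)
{
	dev->temp.pfc_en[up] = enable;
}

/* DCB_CMD_SET_ALL commits everything at once. */
static void dcb_setall(struct dcb_dev *dev)
{
	memcpy(&dev->active, &dev->temp, sizeof(dev->active));
}
```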

Purpose of the netlink interface:

- Provide a common interface for Data Center Bridging devices to
register into.  This includes 802.1Qaz priority grouping and 802.1Qbb
priority flow control mechanisms.
- The common interface will accept and process input from userspace and
send through to registered drivers.
- The common interface will accept and process input from registered
drivers and send to userspace.  This piece is required for the Data
Center Bridging Exchange protocol between link partners.

Commands:

	@DCB_CMD_UNDEFINED: unspecified command to catch errors
	@DCB_CMD_GSTATE: request the state of DCB in the device
	@DCB_CMD_SSTATE: set the state of DCB in the device
	@DCB_CMD_PGTX_GCFG: request the priority group configuration for Tx
	@DCB_CMD_PGTX_SCFG: set the priority group configuration for Tx
	@DCB_CMD_PGRX_GCFG: request the priority group configuration for Rx
	@DCB_CMD_PGRX_SCFG: set the priority group configuration for Rx
	@DCB_CMD_PFC_GCFG: request the priority flow control configuration
	@DCB_CMD_PFC_SCFG: set the priority flow control configuration
	@DCB_CMD_SET_ALL: apply all changes to the underlying device
	@DCB_CMD_GPERM_HWADDR: get the permanent MAC address of the
underlying device.  Only useful when using bonding.

DCB configuration Attributes:

	@DCB_ATTR_UNDEFINED: unspecified attribute to catch errors
	@DCB_ATTR_IFNAME: interface name of the underlying device
(NLA_STRING)
	@DCB_ATTR_STATE: state of the DCB state machine in the device
(NLA_U8)
	@DCB_ATTR_PFC_CFG: priority flow control configuration
(NLA_NESTED)
	@DCB_ATTR_PG_CFG: priority group configuration (NLA_NESTED)
	@DCB_ATTR_SET_ALL: bool to commit changes to hardware or not
(NLA_U8)
	@DCB_ATTR_PERM_HWADDR: MAC address of the physical device
(NLA_NESTED)

Permanent Hardware attributes:

	@DCB_PERM_HW_ATTR_UNDEFINED: unspecified attribute to catch
errors
	@DCB_PERM_HW_ATTR_0: MAC address from receive address 0 (NLA_U8)
	@DCB_PERM_HW_ATTR_1: MAC address from receive address 1 (NLA_U8)
	@DCB_PERM_HW_ATTR_2: MAC address from receive address 2 (NLA_U8)
	@DCB_PERM_HW_ATTR_3: MAC address from receive address 3 (NLA_U8)
	@DCB_PERM_HW_ATTR_4: MAC address from receive address 4 (NLA_U8)
	@DCB_PERM_HW_ATTR_5: MAC address from receive address 5 (NLA_U8)
	@DCB_PERM_HW_ATTR_ALL: apply to all MAC addresses (NLA_FLAG)

Priority Flow Control attributes:

	@DCB_PFC_ATTR_UP_UNDEFINED: unspecified attribute to catch
errors
	@DCB_PFC_ATTR_UP_0: Priority Flow Control setting for User
Priority 0 (NLA_U8)
	@DCB_PFC_ATTR_UP_1: Priority Flow Control setting for User
Priority 1 (NLA_U8)
	@DCB_PFC_ATTR_UP_2: Priority Flow Control setting for User
Priority 2 (NLA_U8)
	@DCB_PFC_ATTR_UP_3: Priority Flow Control setting for User
Priority 3 (NLA_U8)
	@DCB_PFC_ATTR_UP_4: Priority Flow Control setting for User
Priority 4 (NLA_U8)
	@DCB_PFC_ATTR_UP_5: Priority Flow Control setting for User
Priority 5 (NLA_U8)
	@DCB_PFC_ATTR_UP_6: Priority Flow Control setting for User
Priority 6 (NLA_U8)
	@DCB_PFC_ATTR_UP_7: Priority Flow Control setting for User
Priority 7 (NLA_U8)
	@DCB_PFC_ATTR_UP_MAX: highest attribute number currently defined
	@DCB_PFC_ATTR_UP_ALL: apply to all priority flow control
attributes (NLA_FLAG)
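To make the NLA_NESTED relationship concrete, here is a byte-level sketch, simplified from real netlink (the attribute numbers are invented and the 4-byte payload padding is hard-coded), of a DCB_ATTR_PFC_CFG container carrying the eight per-priority NLA_U8 attributes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified netlink TLV layout: each attribute is a (len, type)
 * header followed by its payload, and a nested attribute's payload is
 * itself a sequence of attributes.  Attribute values are invented. */

enum { DCB_ATTR_PFC_CFG = 3 };          /* illustrative value */
enum { DCB_PFC_ATTR_UP_0 = 1 };         /* UP_1..UP_7 follow */

struct nla_hdr { unsigned short len, type; };

/* Append one u8 attribute; returns bytes consumed (header + padded
 * payload, since netlink pads each payload to 4 bytes). */
static size_t put_u8(unsigned char *buf, unsigned short type, unsigned char v)
{
	struct nla_hdr h = { sizeof(h) + 1, type };

	memcpy(buf, &h, sizeof(h));
	buf[sizeof(h)] = v;
	return sizeof(h) + 4;
}

/* Build a nested PFC_CFG attribute holding all eight user priorities. */
static size_t put_pfc_cfg(unsigned char *buf, const unsigned char pfc[8])
{
	size_t off = sizeof(struct nla_hdr);
	struct nla_hdr outer;
	int up;

	for (up = 0; up < 8; up++)
		off += put_u8(buf + off, DCB_PFC_ATTR_UP_0 + up, pfc[up]);
	outer.len = (unsigned short)off;
	outer.type = DCB_ATTR_PFC_CFG;
	memcpy(buf, &outer, sizeof(outer));
	return off;
}
```

In the real interface the kernel's nla_nest_start()/nla_put_u8() helpers would do this bookkeeping; the sketch only shows why a nested container is a natural fit for a per-priority table.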

Priority Group Traffic Class and Bandwidth Group attributes:

	@DCB_PG_ATTR_UNDEFINED: unspecified attribute to catch errors
	@DCB_PG_ATTR_TC_0: Priority Group Traffic Class 0 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_1: Priority Group Traffic Class 1 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_2: Priority Group Traffic Class 2 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_3: Priority Group Traffic Class 3 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_4: Priority Group Traffic Class 4 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_5: Priority Group Traffic Class 5 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_6: Priority Group Traffic Class 6 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_7: Priority Group Traffic Class 7 configuration
(NLA_NESTED)
	@DCB_PG_ATTR_TC_MAX: highest attribute number currently defined
	@DCB_PG_ATTR_TC_ALL: apply to all traffic classes (NLA_NESTED)
	@DCB_PG_ATTR_BWG_0: Bandwidth group 0 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_1: Bandwidth group 1 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_2: Bandwidth group 2 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_3: Bandwidth group 3 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_4: Bandwidth group 4 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_5: Bandwidth group 5 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_6: Bandwidth group 6 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_7: Bandwidth group 7 configuration (NLA_U8)
	@DCB_PG_ATTR_BWG_MAX: highest attribute number currently defined
	@DCB_PG_ATTR_BWG_ALL: apply to all bandwidth groups (NLA_FLAG)

Traffic Class configuration attributes:

	@DCB_TC_ATTR_PARAM_UNDEFINED: unspecified attribute to catch
errors
	@DCB_TC_ATTR_PARAM_STRICT_PRIO: Type of strict bandwidth
aggregation (link strict or group strict) (NLA_U8)
	@DCB_TC_ATTR_PARAM_BW_GROUP_ID: Bandwidth group this traffic
class belongs to (NLA_U8)
	@DCB_TC_ATTR_PARAM_BW_PCT_IN_GROUP: Percentage of bandwidth in
the bandwidth group this traffic class has (NLA_U8)
	@DCB_TC_ATTR_PARAM_UP_MAPPING: Traffic class to user priority
map (NLA_U8)
	@DCB_TC_ATTR_PARAM_ALL: apply to all traffic class parameters
(NLA_FLAG)


* Re: [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-16 22:45         ` Waskiewicz Jr, Peter P
@ 2008-05-16 23:20           ` Stephen Hemminger
  2008-05-17  9:14             ` Waskiewicz Jr, Peter P
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2008-05-16 23:20 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: Jeff Garzik, David Miller, netdev

On Fri, 16 May 2008 15:45:11 -0700
"Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@intel.com> wrote:

> > But like everything else in life, one must weigh various 
> > costs.  Maybe it is complex enough to warrant a new tool.  We 
> > don't know until there's a design doc or review of the 
> > generic interface that will be used for DCB.
> 
> 100% agree.  I have a design document added to the bottom of this email
> in plain text for the in-kernel design.  I can send a PDF or OpenOffice
> doc if desired.  Please review it, see if that's something worth
> supporting and developing, and send feedback.  If things look ok, I'll
> write the in-kernel layer and send those patches asap, along with a
> modified ixgbe to use the new interface.
> 
> Thanks Jeff,
> 
> -PJ Waskiewicz
> <peter.p.waskiewicz.jr@intel.com>
> 
> ------------------------------------------------------
> 
> Linux Kernel Interface Design for Data Center Bridging
> Peter P. Waskiewicz Jr, Intel Corp.
> 5/16/2008
> 
> Overview:
> 	Data Center Bridging is a new layer 2 networking technology
> targeted for data centers looking to converge certain network flows.
> The technology uses 802.1p priority tagging from 802.1q, and then adds
> the new 802.1Qaz Priority Grouping and 802.1Qbb Priority Flow Control
> technologies.  The purpose is to provide per-priority flow control to
> individual network flows without impacting other flows.  It also
> provides the mechanism to enforce bandwidth allocation per flow.
> 	A Wikipedia article describing the network-wide model this fits
> into can be found here:
> http://en.wikipedia.org/wiki/Data_Center_Ethernet.
> 


I wonder if doing this generically with existing bridge code would
be a better approach. It seems like the TOE problem all over again,
only this time it is Traffic Control offload. Sorry, just because
you can do it in hardware doesn't mean it is a good thing.



* RE: [PATCH 1/3] ixgbe: Add Data Center Bridging netlink listener for DCB runtime changes.
  2008-05-16 23:20           ` Stephen Hemminger
@ 2008-05-17  9:14             ` Waskiewicz Jr, Peter P
  0 siblings, 0 replies; 13+ messages in thread
From: Waskiewicz Jr, Peter P @ 2008-05-17  9:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jeff Garzik, David Miller, netdev

> I wonder if doing this generically with existing bridge code would
> be a better approach. It seems like the TOE problem all over again,
> only this time it is Traffic Control offload. Sorry, just because
> you can do it in hardware doesn't mean it is a good thing.

How is this anything like TOE?  We're not bypassing the stack, and we're
not inserting any exit points into the stack; it's just another way the
hardware can schedule traffic internally to itself.  But there needs to
be a mechanism to configure it.  We have 4 options:

1) Do nothing.  This is very unattractive for 10 GbE devices, especially
   since a number of 10 GbE devices have fairly complex and useful Tx
   and Rx advanced features.
2) Insert these commands into ethtool.  This doesn't make much sense,
   since most drivers around won't need these entry points to the
   driver.
3) Embed the interface in the driver.  Jeff G. made a good point that
   this isn't sustainable for the future, which I agree with.  This is
   a new standard in ethernet, so we should make the generic interface.
4) Make the generic interface captured in the design doc from the
   previous email from me in this thread.

All I'm proposing is 9 entry points into the drivers via netlink for
device configuration.  I'm not asking for a stack rewrite or insertion
into the stack to bypass the existing stack.

Cheers,
-PJ Waskiewicz

