Netdev List

* Re: [PATCH net] e1000e: Change wthresh to 1 to avoid possible Tx stalls.
From: Jeff Kirsher @ 2012-06-06  8:46 UTC
  To: Hiroaki SHIMODA; +Cc: davem, denys, eric.dumazet, therbert, netdev
In-Reply-To: <20120606174355.823e9aa7.shimoda.hiroaki@gmail.com>

On Wed, 2012-06-06 at 17:43 +0900, Hiroaki SHIMODA wrote:
> Denys Fedoryshchenko reported Tx stalls on e1000e with BQL enabled.
> 
> e1000e has WTHRESH which determines when Tx descriptors are written
> back and successive Tx interrupts are generated, and setting WTHRESH
> to 5 gives efficient bus utilization but this causes possible Tx stalls,
> especially on BQL enabled systems.
> 
> To avoid possible Tx stalls, change WTHRESH to 1.
> 
> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |    6 +++---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-) 

Thanks! I will add this to my queue of e1000e patches.
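
For readers following the thread, the stall described in the quoted
changelog can be pictured with a deliberately simplified model. This is
only a sketch under assumptions: the struct, the function names and the
single "in-flight cap" standing in for BQL are hypothetical, and real
hardware has additional flush mechanisms that this model ignores. The
only point is that completions stay invisible to the host until WTHRESH
of them accumulate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: completed Tx descriptors become visible to the
 * host only once at least `wthresh` of them have accumulated. */
struct tx_model {
	unsigned int wthresh;     /* write-back threshold (1 or 5 in the thread) */
	unsigned int done_hidden; /* completed by hardware, not yet written back */
	unsigned int in_flight;   /* queued, completion not yet seen by the host */
	unsigned int limit;       /* stand-in for a BQL-style in-flight cap */
};

/* Try to queue one packet; refuse when the in-flight cap is reached. */
static bool tx_queue(struct tx_model *m)
{
	if (m->in_flight >= m->limit)
		return false;
	m->in_flight++;
	return true;
}

/* Hardware finishes one packet; returns how many completions become visible. */
static unsigned int tx_hw_complete(struct tx_model *m)
{
	unsigned int visible = 0;

	m->done_hidden++;
	if (m->done_hidden >= m->wthresh) {
		visible = m->done_hidden;
		m->in_flight -= visible;
		m->done_hidden = 0;
	}
	return visible;
}

/* Fill the queue up to the cap, let the hardware send everything, then
 * report whether the host is still unable to queue another packet. */
static bool tx_stalls(unsigned int wthresh, unsigned int limit)
{
	struct tx_model m = { .wthresh = wthresh, .limit = limit };
	unsigned int queued, i;

	assert(wthresh > 0 && limit > 0);

	while (tx_queue(&m))
		;
	queued = m.in_flight;
	for (i = 0; i < queued; i++)
		tx_hw_complete(&m);

	return m.in_flight >= m.limit;
}
```

In this model tx_stalls(5, 3) is true, because three completions never
reach the threshold of five, while tx_stalls(1, 3) is false, because
every completion is reported immediately.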

* Re: [PATCH net] e1000e: Change wthresh to 1 to avoid possible Tx stalls.
From: Eric Dumazet @ 2012-06-06  8:50 UTC
  To: Hiroaki SHIMODA; +Cc: jeffrey.t.kirsher, davem, denys, therbert, netdev
In-Reply-To: <20120606174355.823e9aa7.shimoda.hiroaki@gmail.com>

On Wed, 2012-06-06 at 17:43 +0900, Hiroaki SHIMODA wrote:
> Denys Fedoryshchenko reported Tx stalls on e1000e with BQL enabled.
> 
> e1000e has WTHRESH which determines when Tx descriptors are written
> back and successive Tx interrupts are generated, and setting WTHRESH
> to 5 gives efficient bus utilization but this causes possible Tx stalls,
> especially on BQL enabled systems.
> 
> To avoid possible Tx stalls, change WTHRESH to 1.
> 
> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |    6 +++---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)

Thanks a lot !

Acked-by: Eric Dumazet <edumazet@google.com>

* [net-next PATCH v2 0/3] Energy Efficient Ethernet (eee) support
From: Yuval Mintz @ 2012-06-06  8:58 UTC
  To: davem, netdev; +Cc: eilong, bhutchings, peppe.cavallaro, Yuval Mintz

Hi Dave,

This patch series adds energy efficient ethernet support for the
bnx2x driver (for new chips with appropriate phys). 
It also extends the ethtool API to enable control of the eee feature.

Another patch series has been sent to Ben to allow the ethtool application
to use this new API.

Changes from Version 1:
	Patch 1/3:
		-Added documentation to ethtool_eee struct in header.
		-Clearing the ethtool_eee struct before passing to driver.
		-Checking the driver's return value of 'get_eee' call.
	Patches 2-3/3:
		-Corrected conversion of tx_lpi_timer speeds in bnx2x.

Please consider applying it to 'net-next'.

Thanks,
Yuval Mintz
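
The v2 changes listed for patch 1/3 (clearing the struct before the
driver sees it, and checking the driver's return value) can be sketched
outside the kernel as follows. This is a userspace mock, not the kernel
code: every demo_* name is hypothetical, the struct is a trimmed-down
stand-in for ethtool_eee, and a plain struct assignment stands in for
copy_to_user().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GEEE 0x44u  /* mirrors the ETHTOOL_GEEE value from patch 1/3 */

/* Hypothetical, trimmed-down stand-in for struct ethtool_eee. */
struct demo_eee {
	uint32_t cmd;
	uint32_t eee_enabled;
	uint32_t tx_lpi_timer;
};

/* Hypothetical driver ops table; either pointer may be NULL. */
struct demo_ops {
	int (*get_eee)(struct demo_eee *);
	int (*set_eee)(const struct demo_eee *);
};

/* Core-side "get": zero the struct, call the driver, copy out on success. */
static int demo_core_get_eee(const struct demo_ops *ops, struct demo_eee *user)
{
	struct demo_eee edata;
	int rc;

	if (!ops->get_eee)
		return -EOPNOTSUPP;

	memset(&edata, 0, sizeof(edata)); /* fields the driver skips read as 0 */
	edata.cmd = DEMO_GEEE;

	rc = ops->get_eee(&edata);
	if (rc)
		return rc;                /* caller's buffer stays untouched */

	*user = edata;                    /* stands in for copy_to_user() */
	return 0;
}

/* Core-side "set": guard on the op that is about to be called. */
static int demo_core_set_eee(const struct demo_ops *ops,
			     const struct demo_eee *user)
{
	if (!ops->set_eee)
		return -EOPNOTSUPP;
	return ops->set_eee(user);
}

/* Two toy drivers used to exercise the paths above. */
static int demo_get_ok(struct demo_eee *e)
{
	e->eee_enabled = 1;
	return 0;
}

static int demo_get_fail(struct demo_eee *e)
{
	e->tx_lpi_timer = 99;             /* partial write before failing */
	return -EIO;
}
```

With a driver that fails part-way through, the caller's buffer is left
as it was; with a driver that only implements get_eee, the set path
returns -EOPNOTSUPP instead of calling through a NULL pointer.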

* [net-next PATCH v2 1/3] Added kernel support in EEE Ethtool commands
From: Yuval Mintz @ 2012-06-06  8:58 UTC
  To: davem, netdev; +Cc: eilong, bhutchings, peppe.cavallaro, Yuval Mintz
In-Reply-To: <1338973098-16439-1-git-send-email-yuvalmin@broadcom.com>

This patch extends the kernel's ethtool interface by adding support
for 2 new EEE commands - get_eee and set_eee.

Thanks go to Giuseppe Cavallaro for his original patch adding this support.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 include/linux/ethtool.h |   32 ++++++++++++++++++++++++++++++++
 net/core/ethtool.c      |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index e17fa71..6250e1f 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -137,6 +137,32 @@ struct ethtool_eeprom {
 };
 
 /**
+ * struct ethtool_eee - Energy Efficient Ethernet information
+ * @cmd: ETHTOOL_{G,S}EEE
+ * @supported: Link speeds for which there is eee support.
+ * @advertised: Link speeds the interface advertises (AN) as eee capable.
+ * @lp_advertised: Link speeds the link partner advertised as eee capable.
+ * @eee_active: Result of the eee auto negotiation.
+ * @eee_enabled: EEE configured mode (enabled/disabled).
+ * @tx_lpi_enabled: Whether the interface should assert its tx lpi, given
+ *	that eee was negotiated.
+ * @tx_lpi_timer: Time in microseconds the interface delays prior to asserting
+ *	its tx lpi (after reaching 'idle' state). Effective only when eee
+ *	was negotiated and tx_lpi_enabled was set.
+ */
+struct ethtool_eee {
+	__u32	cmd;
+	__u32	supported;
+	__u32	advertised;
+	__u32	lp_advertised;
+	__u32	eee_active;
+	__u32	eee_enabled;
+	__u32	tx_lpi_enabled;
+	__u32	tx_lpi_timer;
+	__u32	reserved[2];
+};
+
+/**
  * struct ethtool_modinfo - plugin module eeprom information
  * @cmd: %ETHTOOL_GMODULEINFO
  * @type: Standard the module information conforms to %ETH_MODULE_SFF_xxxx
@@ -945,6 +971,8 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
  * @get_module_info: Get the size and type of the eeprom contained within
  *	a plug-in module.
  * @get_module_eeprom: Get the eeprom information from the plug-in module
+ * @get_eee: Get Energy-Efficient Ethernet (EEE) support and status.
+ * @set_eee: Set EEE status (enable/disable) as well as LPI timers.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -1011,6 +1039,8 @@ struct ethtool_ops {
 				   struct ethtool_modinfo *);
 	int     (*get_module_eeprom)(struct net_device *,
 				     struct ethtool_eeprom *, u8 *);
+	int	(*get_eee)(struct net_device *, struct ethtool_eee *);
+	int	(*set_eee)(struct net_device *, struct ethtool_eee *);
 
 
 };
@@ -1089,6 +1119,8 @@ struct ethtool_ops {
 #define ETHTOOL_GET_TS_INFO	0x00000041 /* Get time stamping and PHC info */
 #define ETHTOOL_GMODULEINFO	0x00000042 /* Get plug-in module information */
 #define ETHTOOL_GMODULEEEPROM	0x00000043 /* Get plug-in module eeprom */
+#define ETHTOOL_GEEE		0x00000044 /* Get EEE settings */
+#define ETHTOOL_SEEE		0x00000045 /* Set EEE settings */
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9c2afb4..5a582da 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -729,6 +729,40 @@ static int ethtool_set_wol(struct net_device *dev, char __user *useraddr)
 	return dev->ethtool_ops->set_wol(dev, &wol);
 }
 
+static int ethtool_get_eee(struct net_device *dev, char __user *useraddr)
+{
+	struct ethtool_eee edata;
+	int rc;
+
+	if (!dev->ethtool_ops->get_eee)
+		return -EOPNOTSUPP;
+
+	memset(&edata, 0, sizeof(struct ethtool_eee));
+	edata.cmd = ETHTOOL_GEEE;
+	rc = dev->ethtool_ops->get_eee(dev, &edata);
+
+	if (rc)
+		return rc;
+
+	if (copy_to_user(useraddr, &edata, sizeof(edata)))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int ethtool_set_eee(struct net_device *dev, char __user *useraddr)
+{
+	struct ethtool_eee edata;
+
+	if (!dev->ethtool_ops->set_eee)
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&edata, useraddr, sizeof(edata)))
+		return -EFAULT;
+
+	return dev->ethtool_ops->set_eee(dev, &edata);
+}
+
 static int ethtool_nway_reset(struct net_device *dev)
 {
 	if (!dev->ethtool_ops->nway_reset)
@@ -1471,6 +1505,12 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 		rc = ethtool_set_value_void(dev, useraddr,
 				       dev->ethtool_ops->set_msglevel);
 		break;
+	case ETHTOOL_GEEE:
+		rc = ethtool_get_eee(dev, useraddr);
+		break;
+	case ETHTOOL_SEEE:
+		rc = ethtool_set_eee(dev, useraddr);
+		break;
 	case ETHTOOL_NWAY_RST:
 		rc = ethtool_nway_reset(dev);
 		break;
-- 
1.7.9.rc2

* [net-next PATCH v2 2/3] bnx2x: Added EEE support
From: Yuval Mintz @ 2012-06-06  8:58 UTC
  To: davem, netdev; +Cc: eilong, bhutchings, peppe.cavallaro, Yuval Mintz
In-Reply-To: <1338973098-16439-1-git-send-email-yuvalmin@broadcom.com>

This patch adds Energy Efficient Ethernet support (802.3az) to bnx2x
boards with 84833 phys (and sufficiently new BC and external FW).

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h   |   61 ++++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c  |  323 ++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h  |   26 ++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |   23 ++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h   |  123 ++++++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c |    4 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h |    2 +
 7 files changed, 552 insertions(+), 10 deletions(-)
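
As a reading aid, the eee_status word that the bnx2x_hsi.h hunk below
adds to shmem2_region can be unpacked as in the following standalone
sketch. The masks and shifts are copied from that hunk (with a `u`
suffix added); the eee_status_view struct and the eee_status_decode
function are hypothetical names introduced only for this example, and
the timer conversion assumes the "16 microseconds per unit" reading that
the `eee_idle >>= 4` step in bnx2x_link.c suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks and shifts copied from the bnx2x_hsi.h hunk in this patch. */
#define SHMEM_EEE_TIMER_MASK          0x0000ffffu
#define SHMEM_EEE_SUPPORTED_MASK      0x000f0000u
#define SHMEM_EEE_SUPPORTED_SHIFT     16
#define SHMEM_EEE_ADV_STATUS_MASK     0x00f00000u
#define SHMEM_EEE_ADV_STATUS_SHIFT    20
#define SHMEM_EEE_LP_ADV_STATUS_MASK  0x0f000000u
#define SHMEM_EEE_LP_ADV_STATUS_SHIFT 24
#define SHMEM_EEE_REQUESTED_BIT       0x10000000u
#define SHMEM_EEE_LPI_REQUESTED_BIT   0x20000000u
#define SHMEM_EEE_ACTIVE_BIT          0x40000000u
#define SHMEM_EEE_TIME_OUTPUT_BIT     0x80000000u

/* Hypothetical decoded view of one eee_status[] entry. */
struct eee_status_view {
	uint32_t timer_raw;      /* bits 15:0 as stored */
	bool     timer_is_time;  /* bit 31: raw value is in 16 us units */
	uint32_t timer_us;       /* microseconds, 0 when raw is an NVRAM mode */
	uint32_t supported;      /* SHMEM_EEE_*_ADV bits, 19:16 */
	uint32_t advertised;     /* SHMEM_EEE_*_ADV bits, 23:20 */
	uint32_t lp_advertised;  /* SHMEM_EEE_*_ADV bits, 27:24 */
	bool     requested;      /* bit 28 */
	bool     lpi_requested;  /* bit 29 */
	bool     active;         /* bit 30 */
	bool     tx_lpi_asserted;/* bits 30 and 29 both set */
};

static struct eee_status_view eee_status_decode(uint32_t s)
{
	struct eee_status_view v;

	v.timer_raw     = s & SHMEM_EEE_TIMER_MASK;
	v.timer_is_time = (s & SHMEM_EEE_TIME_OUTPUT_BIT) != 0;
	v.timer_us      = v.timer_is_time ? v.timer_raw << 4 : 0;
	v.supported     = (s & SHMEM_EEE_SUPPORTED_MASK) >>
			  SHMEM_EEE_SUPPORTED_SHIFT;
	v.advertised    = (s & SHMEM_EEE_ADV_STATUS_MASK) >>
			  SHMEM_EEE_ADV_STATUS_SHIFT;
	v.lp_advertised = (s & SHMEM_EEE_LP_ADV_STATUS_MASK) >>
			  SHMEM_EEE_LP_ADV_STATUS_SHIFT;
	v.requested     = (s & SHMEM_EEE_REQUESTED_BIT) != 0;
	v.lpi_requested = (s & SHMEM_EEE_LPI_REQUESTED_BIT) != 0;
	v.active        = (s & SHMEM_EEE_ACTIVE_BIT) != 0;
	v.tx_lpi_asserted = v.active && v.lpi_requested;
	return v;
}
```

For example, a status word with the time-output bit set and 0xa0 in the
low 16 bits decodes to 0xa00 microseconds, which matches the
EEE_MODE_NVRAM_BALANCED_TIME value defined in bnx2x_link.h.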

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
index a440a8b..c61aa37 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_hsi.h
@@ -1067,8 +1067,18 @@ struct port_feat_cfg {		    /* port 0: 0x454  port 1: 0x4c8 */
 	   uses the same defines as link_config */
 	u32 mfw_wol_link_cfg2;				    /* 0x480 */
 
-	u32 Reserved2[17];				    /* 0x484 */
 
+	/*  EEE power saving mode */
+	u32 eee_power_mode;                                 /* 0x484 */
+	#define PORT_FEAT_CFG_EEE_POWER_MODE_MASK                     0x000000FF
+	#define PORT_FEAT_CFG_EEE_POWER_MODE_SHIFT                    0
+	#define PORT_FEAT_CFG_EEE_POWER_MODE_DISABLED                 0x00000000
+	#define PORT_FEAT_CFG_EEE_POWER_MODE_BALANCED                 0x00000001
+	#define PORT_FEAT_CFG_EEE_POWER_MODE_AGGRESSIVE               0x00000002
+	#define PORT_FEAT_CFG_EEE_POWER_MODE_LOW_LATENCY              0x00000003
+
+
+	u32 Reserved2[16];                                  /* 0x488 */
 };
 
 
@@ -1255,6 +1265,8 @@ struct drv_func_mb {
 	#define DRV_MSG_CODE_DRV_INFO_ACK               0xd8000000
 	#define DRV_MSG_CODE_DRV_INFO_NACK              0xd9000000
 
+	#define DRV_MSG_CODE_EEE_RESULTS_ACK            0xda000000
+
 	#define DRV_MSG_CODE_SET_MF_BW                  0xe0000000
 	#define REQ_BC_VER_4_SET_MF_BW                  0x00060202
 	#define DRV_MSG_CODE_SET_MF_BW_ACK              0xe1000000
@@ -1320,6 +1332,8 @@ struct drv_func_mb {
 	#define FW_MSG_CODE_DRV_INFO_ACK                0xd8100000
 	#define FW_MSG_CODE_DRV_INFO_NACK               0xd9100000
 
+	#define FW_MSG_CODE_EEE_RESULS_ACK              0xda100000
+
 	#define FW_MSG_CODE_SET_MF_BW_SENT              0xe0000000
 	#define FW_MSG_CODE_SET_MF_BW_DONE              0xe1000000
 
@@ -1383,6 +1397,8 @@ struct drv_func_mb {
 
 	#define DRV_STATUS_DRV_INFO_REQ                 0x04000000
 
+	#define DRV_STATUS_EEE_NEGOTIATION_RESULTS      0x08000000
+
 	u32 virt_mac_upper;
 	#define VIRT_MAC_SIGN_MASK                      0xffff0000
 	#define VIRT_MAC_SIGNATURE                      0x564d0000
@@ -1613,6 +1629,11 @@ struct fw_flr_mb {
 	struct fw_flr_ack ack;
 };
 
+struct eee_remote_vals {
+	u32         tx_tw;
+	u32         rx_tw;
+};
+
 /**** SUPPORT FOR SHMEM ARRRAYS ***
  * The SHMEM HSI is aligned on 32 bit boundaries which makes it difficult to
  * define arrays with storage types smaller then unsigned dwords.
@@ -2053,6 +2074,41 @@ struct shmem2_region {
 #define DRV_INFO_CONTROL_OP_CODE_MASK      0x0000ff00
 #define DRV_INFO_CONTROL_OP_CODE_SHIFT     8
 	u32 ibft_host_addr; /* initialized by option ROM */
+	struct eee_remote_vals eee_remote_vals[PORT_MAX];
+	u32 reserved[E2_FUNC_MAX];
+
+
+	/* the status of EEE auto-negotiation
+	 * bits 15:0 the configured tx-lpi entry timer value. Depends on bit 31.
+	 * bits 19:16 the supported modes for EEE.
+	 * bits 23:20 the speeds advertised for EEE.
+	 * bits 27:24 the speeds the Link partner advertised for EEE.
+	 * The supported/adv. modes in bits 27:16 originate from the
+	 * SHMEM_EEE_XXX_ADV definitions (where XXX is replaced by speed).
+	 * bit 28 when 1'b1 EEE was requested.
+	 * bit 29 when 1'b1 tx lpi was requested.
+	 * bit 30 when 1'b1 EEE was negotiated. Tx lpi will be asserted iff
+	 * 30:29 are 2'b11.
+	 * bit 31 when 1'b0 bits 15:0 contain a PORT_FEAT_CFG_EEE_ define as
+	 * value. When 1'b1 those bits contain a value in units of 16 microseconds.
+	 */
+	u32 eee_status[PORT_MAX];
+	#define SHMEM_EEE_TIMER_MASK		   0x0000ffff
+	#define SHMEM_EEE_SUPPORTED_MASK	   0x000f0000
+	#define SHMEM_EEE_SUPPORTED_SHIFT	   16
+	#define SHMEM_EEE_ADV_STATUS_MASK	   0x00f00000
+		#define SHMEM_EEE_100M_ADV	   (1<<0)
+		#define SHMEM_EEE_1G_ADV	   (1<<1)
+		#define SHMEM_EEE_10G_ADV	   (1<<2)
+	#define SHMEM_EEE_ADV_STATUS_SHIFT	   20
+	#define	SHMEM_EEE_LP_ADV_STATUS_MASK	   0x0f000000
+	#define SHMEM_EEE_LP_ADV_STATUS_SHIFT	   24
+	#define SHMEM_EEE_REQUESTED_BIT		   0x10000000
+	#define SHMEM_EEE_LPI_REQUESTED_BIT	   0x20000000
+	#define SHMEM_EEE_ACTIVE_BIT		   0x40000000
+	#define SHMEM_EEE_TIME_OUTPUT_BIT	   0x80000000
+
+	u32 sizeof_port_stats;
 };
 
 
@@ -2599,6 +2655,9 @@ struct host_port_stats {
 	u32            pfc_frames_tx_lo;
 	u32            pfc_frames_rx_hi;
 	u32            pfc_frames_rx_lo;
+
+	u32            eee_lpi_count_hi;
+	u32            eee_lpi_count_lo;
 };
 
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
index a3fb721..c7c814d 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
@@ -1305,6 +1305,94 @@ int bnx2x_ets_strict(const struct link_params *params, const u8 strict_cos)
 
 	return 0;
 }
+
+/******************************************************************/
+/*			EEE section				   */
+/******************************************************************/
+static u8 bnx2x_eee_has_cap(struct link_params *params)
+{
+	struct bnx2x *bp = params->bp;
+
+	if (REG_RD(bp, params->shmem2_base) <=
+		   offsetof(struct shmem2_region, eee_status[params->port]))
+		return 0;
+
+	return 1;
+}
+
+static int bnx2x_eee_nvram_to_time(u32 nvram_mode, u32 *idle_timer)
+{
+	switch (nvram_mode) {
+	case PORT_FEAT_CFG_EEE_POWER_MODE_BALANCED:
+		*idle_timer = EEE_MODE_NVRAM_BALANCED_TIME;
+		break;
+	case PORT_FEAT_CFG_EEE_POWER_MODE_AGGRESSIVE:
+		*idle_timer = EEE_MODE_NVRAM_AGGRESSIVE_TIME;
+		break;
+	case PORT_FEAT_CFG_EEE_POWER_MODE_LOW_LATENCY:
+		*idle_timer = EEE_MODE_NVRAM_LATENCY_TIME;
+		break;
+	default:
+		*idle_timer = 0;
+		break;
+	}
+
+	return 0;
+}
+
+static int bnx2x_eee_time_to_nvram(u32 idle_timer, u32 *nvram_mode)
+{
+	switch (idle_timer) {
+	case EEE_MODE_NVRAM_BALANCED_TIME:
+		*nvram_mode = PORT_FEAT_CFG_EEE_POWER_MODE_BALANCED;
+		break;
+	case EEE_MODE_NVRAM_AGGRESSIVE_TIME:
+		*nvram_mode = PORT_FEAT_CFG_EEE_POWER_MODE_AGGRESSIVE;
+		break;
+	case EEE_MODE_NVRAM_LATENCY_TIME:
+		*nvram_mode = PORT_FEAT_CFG_EEE_POWER_MODE_LOW_LATENCY;
+		break;
+	default:
+		*nvram_mode = PORT_FEAT_CFG_EEE_POWER_MODE_DISABLED;
+		break;
+	}
+
+	return 0;
+}
+
+static u32 bnx2x_eee_calc_timer(struct link_params *params)
+{
+	u32 eee_mode, eee_idle;
+	struct bnx2x *bp = params->bp;
+
+	if (params->eee_mode & EEE_MODE_OVERRIDE_NVRAM) {
+		if (params->eee_mode & EEE_MODE_OUTPUT_TIME) {
+			/* time value in eee_mode --> used directly*/
+			eee_idle = params->eee_mode & EEE_MODE_TIMER_MASK;
+		} else {
+			/* hsi value in eee_mode --> time */
+			if (bnx2x_eee_nvram_to_time(params->eee_mode &
+						    EEE_MODE_NVRAM_MASK,
+						    &eee_idle))
+				return 0;
+		}
+	} else {
+		/* hsi values in nvram --> time*/
+		eee_mode = ((REG_RD(bp, params->shmem_base +
+				    offsetof(struct shmem_region, dev_info.
+				    port_feature_config[params->port].
+				    eee_power_mode)) &
+			     PORT_FEAT_CFG_EEE_POWER_MODE_MASK) >>
+			    PORT_FEAT_CFG_EEE_POWER_MODE_SHIFT);
+
+		if (bnx2x_eee_nvram_to_time(eee_mode, &eee_idle))
+			return 0;
+	}
+
+	return eee_idle;
+}
+
+
 /******************************************************************/
 /*			PFC section				  */
 /******************************************************************/
@@ -1729,6 +1817,14 @@ static int bnx2x_xmac_enable(struct link_params *params,
 	/* update PFC */
 	bnx2x_update_pfc_xmac(params, vars, 0);
 
+	if (vars->eee_status & SHMEM_EEE_ADV_STATUS_MASK) {
+		DP(NETIF_MSG_LINK, "Setting XMAC for EEE\n");
+		REG_WR(bp, xmac_base + XMAC_REG_EEE_TIMERS_HI, 0x1380008);
+		REG_WR(bp, xmac_base + XMAC_REG_EEE_CTRL, 0x1);
+	} else {
+		REG_WR(bp, xmac_base + XMAC_REG_EEE_CTRL, 0x0);
+	}
+
 	/* Enable TX and RX */
 	val = XMAC_CTRL_REG_TX_EN | XMAC_CTRL_REG_RX_EN;
 
@@ -2439,6 +2535,16 @@ static void bnx2x_update_mng(struct link_params *params, u32 link_status)
 			port_mb[params->port].link_status), link_status);
 }
 
+static void bnx2x_update_mng_eee(struct link_params *params, u32 eee_status)
+{
+	struct bnx2x *bp = params->bp;
+
+	if (bnx2x_eee_has_cap(params))
+		REG_WR(bp, params->shmem2_base +
+		       offsetof(struct shmem2_region,
+				eee_status[params->port]), eee_status);
+}
+
 static void bnx2x_update_pfc_nig(struct link_params *params,
 		struct link_vars *vars,
 		struct bnx2x_nig_brb_pfc_port_params *nig_params)
@@ -3950,6 +4056,20 @@ static void bnx2x_warpcore_set_10G_XFI(struct bnx2x_phy *phy,
 	bnx2x_cl45_write(bp, phy, MDIO_WC_DEVAD,
 			 MDIO_WC_REG_DIGITAL4_MISC3, val | 0x8080);
 
+	/* Enable LPI pass through */
+	if ((params->eee_mode & EEE_MODE_ADV_LPI) &&
+	    (phy->flags & FLAGS_EEE_10GBT) &&
+	    (!(params->eee_mode & EEE_MODE_ENABLE_LPI) ||
+	      bnx2x_eee_calc_timer(params)) &&
+	    (params->req_duplex[bnx2x_phy_selection(params)] == DUPLEX_FULL)) {
+		DP(NETIF_MSG_LINK, "Configure WC for LPI pass through\n");
+		bnx2x_cl45_write(bp, phy, MDIO_WC_DEVAD,
+				 MDIO_WC_REG_EEE_COMBO_CONTROL0,
+				 0x7c);
+		bnx2x_cl45_read_or_write(bp, phy, MDIO_WC_DEVAD,
+					 MDIO_WC_REG_DIGITAL4_MISC5, 0xc000);
+	}
+
 	/* 10G XFI Full Duplex */
 	bnx2x_cl45_write(bp, phy, MDIO_WC_DEVAD,
 			 MDIO_WC_REG_IEEE0BLK_MIICNTL, 0x100);
@@ -6462,6 +6582,15 @@ static int bnx2x_update_link_down(struct link_params *params,
 	       (MISC_REGISTERS_RESET_REG_2_RST_BMAC0 << port));
 	}
 	if (CHIP_IS_E3(bp)) {
+		REG_WR(bp, MISC_REG_CPMU_LP_FW_ENABLE_P0 + (params->port << 2),
+		       0);
+		REG_WR(bp, MISC_REG_CPMU_LP_DR_ENABLE, 0);
+		REG_WR(bp, MISC_REG_CPMU_LP_MASK_ENT_P0 + (params->port << 2),
+		       0);
+		vars->eee_status &= ~(SHMEM_EEE_LP_ADV_STATUS_MASK |
+				      SHMEM_EEE_ACTIVE_BIT);
+
+		bnx2x_update_mng_eee(params, vars->eee_status);
 		bnx2x_xmac_disable(params);
 		bnx2x_umac_disable(params);
 	}
@@ -6501,6 +6630,16 @@ static int bnx2x_update_link_up(struct link_params *params,
 			bnx2x_umac_enable(params, vars, 0);
 		bnx2x_set_led(params, vars,
 			      LED_MODE_OPER, vars->line_speed);
+
+		if ((vars->eee_status & SHMEM_EEE_ACTIVE_BIT) &&
+		    (vars->eee_status & SHMEM_EEE_LPI_REQUESTED_BIT)) {
+			DP(NETIF_MSG_LINK, "Enabling LPI assertion\n");
+			REG_WR(bp, MISC_REG_CPMU_LP_FW_ENABLE_P0 +
+			       (params->port << 2), 1);
+			REG_WR(bp, MISC_REG_CPMU_LP_DR_ENABLE, 1);
+			REG_WR(bp, MISC_REG_CPMU_LP_MASK_ENT_P0 +
+			       (params->port << 2), 0xfc20);
+		}
 	}
 	if ((CHIP_IS_E1x(bp) ||
 	     CHIP_IS_E2(bp))) {
@@ -6538,7 +6677,7 @@ static int bnx2x_update_link_up(struct link_params *params,
 
 	/* update shared memory */
 	bnx2x_update_mng(params, vars->link_status);
-
+	bnx2x_update_mng_eee(params, vars->eee_status);
 	/* Check remote fault */
 	for (phy_idx = INT_PHY; phy_idx < MAX_PHYS; phy_idx++) {
 		if (params->phy[phy_idx].flags & FLAGS_TX_ERROR_CHECK) {
@@ -6582,6 +6721,8 @@ int bnx2x_link_update(struct link_params *params, struct link_vars *vars)
 		phy_vars[phy_index].phy_link_up = 0;
 		phy_vars[phy_index].link_up = 0;
 		phy_vars[phy_index].fault_detected = 0;
+		/* different consideration, since vars holds inner state */
+		phy_vars[phy_index].eee_status = vars->eee_status;
 	}
 
 	if (USES_WARPCORE(bp))
@@ -6711,6 +6852,9 @@ int bnx2x_link_update(struct link_params *params, struct link_vars *vars)
 			vars->link_status |= LINK_STATUS_SERDES_LINK;
 		else
 			vars->link_status &= ~LINK_STATUS_SERDES_LINK;
+
+		vars->eee_status = phy_vars[active_external_phy].eee_status;
+
 		DP(NETIF_MSG_LINK, "Active external phy selected: %x\n",
 			   active_external_phy);
 	}
@@ -9579,9 +9723,9 @@ static int bnx2x_8481_config_init(struct bnx2x_phy *phy,
 static int bnx2x_84833_cmd_hdlr(struct bnx2x_phy *phy,
 				   struct link_params *params,
 		   u16 fw_cmd,
-		   u16 cmd_args[])
+		   u16 cmd_args[], int argc)
 {
-	u32 idx;
+	int idx;
 	u16 val;
 	struct bnx2x *bp = params->bp;
 	/* Write CMD_OPEN_OVERRIDE to STATUS reg */
@@ -9601,7 +9745,7 @@ static int bnx2x_84833_cmd_hdlr(struct bnx2x_phy *phy,
 	}
 
 	/* Prepare argument(s) and issue command */
-	for (idx = 0; idx < PHY84833_CMDHDLR_MAX_ARGS; idx++) {
+	for (idx = 0; idx < argc; idx++) {
 		bnx2x_cl45_write(bp, phy, MDIO_CTL_DEVAD,
 				MDIO_84833_CMD_HDLR_DATA1 + idx,
 				cmd_args[idx]);
@@ -9622,7 +9766,7 @@ static int bnx2x_84833_cmd_hdlr(struct bnx2x_phy *phy,
 		return -EINVAL;
 	}
 	/* Gather returning data */
-	for (idx = 0; idx < PHY84833_CMDHDLR_MAX_ARGS; idx++) {
+	for (idx = 0; idx < argc; idx++) {
 		bnx2x_cl45_read(bp, phy, MDIO_CTL_DEVAD,
 				MDIO_84833_CMD_HDLR_DATA1 + idx,
 				&cmd_args[idx]);
@@ -9656,7 +9800,7 @@ static int bnx2x_84833_pair_swap_cfg(struct bnx2x_phy *phy,
 	data[1] = (u16)pair_swap;
 
 	status = bnx2x_84833_cmd_hdlr(phy, params,
-		PHY84833_CMD_SET_PAIR_SWAP, data);
+		PHY84833_CMD_SET_PAIR_SWAP, data, PHY84833_CMDHDLR_MAX_ARGS);
 	if (status == 0)
 		DP(NETIF_MSG_LINK, "Pairswap OK, val=0x%x\n", data[1]);
 
@@ -9734,6 +9878,95 @@ static int bnx2x_84833_hw_reset_phy(struct bnx2x_phy *phy,
 	return 0;
 }
 
+static int bnx2x_8483x_eee_timers(struct link_params *params,
+				   struct link_vars *vars)
+{
+	u32 eee_idle = 0, eee_mode;
+	struct bnx2x *bp = params->bp;
+
+	eee_idle = bnx2x_eee_calc_timer(params);
+
+	if (eee_idle) {
+		REG_WR(bp, MISC_REG_CPMU_LP_IDLE_THR_P0 + (params->port << 2),
+		       eee_idle);
+	} else if ((params->eee_mode & EEE_MODE_ENABLE_LPI) &&
+		   (params->eee_mode & EEE_MODE_OVERRIDE_NVRAM) &&
+		   (params->eee_mode & EEE_MODE_OUTPUT_TIME)) {
+		DP(NETIF_MSG_LINK, "Error: Tx LPI is enabled with timer 0\n");
+		return -EINVAL;
+	}
+
+	vars->eee_status &= ~(SHMEM_EEE_TIMER_MASK | SHMEM_EEE_TIME_OUTPUT_BIT);
+	if (params->eee_mode & EEE_MODE_OUTPUT_TIME) {
+		/* eee_idle in 1u --> eee_status in 16u */
+		eee_idle >>= 4;
+		vars->eee_status |= (eee_idle & SHMEM_EEE_TIMER_MASK) |
+				    SHMEM_EEE_TIME_OUTPUT_BIT;
+	} else {
+		if (bnx2x_eee_time_to_nvram(eee_idle, &eee_mode))
+			return -EINVAL;
+		vars->eee_status |= eee_mode;
+	}
+
+	return 0;
+}
+
+static int bnx2x_8483x_disable_eee(struct bnx2x_phy *phy,
+				   struct link_params *params,
+				   struct link_vars *vars)
+{
+	int rc;
+	struct bnx2x *bp = params->bp;
+	u16 cmd_args = 0;
+
+	DP(NETIF_MSG_LINK, "Don't Advertise 10GBase-T EEE\n");
+
+	/* Make Certain LPI is disabled */
+	REG_WR(bp, MISC_REG_CPMU_LP_FW_ENABLE_P0 + (params->port << 2), 0);
+	REG_WR(bp, MISC_REG_CPMU_LP_DR_ENABLE, 0);
+
+	/* Prevent Phy from working in EEE and advertising it */
+	rc = bnx2x_84833_cmd_hdlr(phy, params,
+		PHY84833_CMD_SET_EEE_MODE, &cmd_args, 1);
+	if (rc != 0) {
+		DP(NETIF_MSG_LINK, "EEE disable failed.\n");
+		return rc;
+	}
+
+	bnx2x_cl45_write(bp, phy, MDIO_AN_DEVAD, MDIO_AN_REG_EEE_ADV, 0);
+	vars->eee_status &= ~SHMEM_EEE_ADV_STATUS_MASK;
+
+	return 0;
+}
+
+static int bnx2x_8483x_enable_eee(struct bnx2x_phy *phy,
+				   struct link_params *params,
+				   struct link_vars *vars)
+{
+	int rc;
+	struct bnx2x *bp = params->bp;
+	u16 cmd_args = 1;
+
+	DP(NETIF_MSG_LINK, "Advertise 10GBase-T EEE\n");
+
+	rc = bnx2x_84833_cmd_hdlr(phy, params,
+		PHY84833_CMD_SET_EEE_MODE, &cmd_args, 1);
+	if (rc != 0) {
+		DP(NETIF_MSG_LINK, "EEE enable failed.\n");
+		return rc;
+	}
+
+	bnx2x_cl45_write(bp, phy, MDIO_AN_DEVAD, MDIO_AN_REG_EEE_ADV, 0x8);
+
+	/* Mask events preventing LPI generation */
+	REG_WR(bp, MISC_REG_CPMU_LP_MASK_EXT_P0 + (params->port << 2), 0xfc20);
+
+	vars->eee_status &= ~SHMEM_EEE_ADV_STATUS_MASK;
+	vars->eee_status |= (SHMEM_EEE_10G_ADV << SHMEM_EEE_ADV_STATUS_SHIFT);
+
+	return 0;
+}
+
 #define PHY84833_CONSTANT_LATENCY 1193
 static int bnx2x_848x3_config_init(struct bnx2x_phy *phy,
 				   struct link_params *params,
@@ -9833,7 +10066,8 @@ static int bnx2x_848x3_config_init(struct bnx2x_phy *phy,
 		cmd_args[2] = PHY84833_CONSTANT_LATENCY + 1;
 		cmd_args[3] = PHY84833_CONSTANT_LATENCY;
 		rc = bnx2x_84833_cmd_hdlr(phy, params,
-			PHY84833_CMD_SET_EEE_MODE, cmd_args);
+			PHY84833_CMD_SET_EEE_MODE, cmd_args,
+			PHY84833_CMDHDLR_MAX_ARGS);
 		if (rc != 0)
 			DP(NETIF_MSG_LINK, "Cfg AutogrEEEn failed.\n");
 	}
@@ -9858,6 +10092,48 @@ static int bnx2x_848x3_config_init(struct bnx2x_phy *phy,
 				 MDIO_CTL_REG_84823_USER_CTRL_REG, val);
 	}
 
+	bnx2x_cl45_read(bp, phy, MDIO_CTL_DEVAD,
+			MDIO_84833_TOP_CFG_FW_REV, &val);
+
+	/* Configure EEE support */
+	if ((val >= MDIO_84833_TOP_CFG_FW_EEE) && bnx2x_eee_has_cap(params)) {
+		phy->flags |= FLAGS_EEE_10GBT;
+		vars->eee_status |= SHMEM_EEE_10G_ADV <<
+				    SHMEM_EEE_SUPPORTED_SHIFT;
+		/* Propagate params' bits --> vars (for migration exposure) */
+		if (params->eee_mode & EEE_MODE_ENABLE_LPI)
+			vars->eee_status |= SHMEM_EEE_LPI_REQUESTED_BIT;
+		else
+			vars->eee_status &= ~SHMEM_EEE_LPI_REQUESTED_BIT;
+
+		if (params->eee_mode & EEE_MODE_ADV_LPI)
+			vars->eee_status |= SHMEM_EEE_REQUESTED_BIT;
+		else
+			vars->eee_status &= ~SHMEM_EEE_REQUESTED_BIT;
+
+		rc = bnx2x_8483x_eee_timers(params, vars);
+		if (rc != 0) {
+			DP(NETIF_MSG_LINK, "Failed to configure EEE timers\n");
+			bnx2x_8483x_disable_eee(phy, params, vars);
+			return rc;
+		}
+
+		if ((params->req_duplex[actual_phy_selection] == DUPLEX_FULL) &&
+		    (params->eee_mode & EEE_MODE_ADV_LPI) &&
+		    (bnx2x_eee_calc_timer(params) ||
+		     !(params->eee_mode & EEE_MODE_ENABLE_LPI)))
+			rc = bnx2x_8483x_enable_eee(phy, params, vars);
+		else
+			rc = bnx2x_8483x_disable_eee(phy, params, vars);
+		if (rc != 0) {
+			DP(NETIF_MSG_LINK, "Failed to set EEE advertisement\n");
+			return rc;
+		}
+	} else {
+		phy->flags &= ~FLAGS_EEE_10GBT;
+		vars->eee_status &= ~SHMEM_EEE_SUPPORTED_MASK;
+	}
+
 	if (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) {
 		/* Bring PHY out of super isolate mode as the final step. */
 		bnx2x_cl45_read(bp, phy,
@@ -9989,6 +10265,31 @@ static u8 bnx2x_848xx_read_status(struct bnx2x_phy *phy,
 		if (val & (1<<11))
 			vars->link_status |=
 				LINK_STATUS_LINK_PARTNER_10GXFD_CAPABLE;
+
+		/* Determine if EEE was negotiated */
+		if (phy->type == PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM84833) {
+			u32 eee_shmem = 0;
+
+			bnx2x_cl45_read(bp, phy, MDIO_AN_DEVAD,
+					MDIO_AN_REG_EEE_ADV, &val1);
+			bnx2x_cl45_read(bp, phy, MDIO_AN_DEVAD,
+					MDIO_AN_REG_LP_EEE_ADV, &val2);
+			if ((val1 & val2) & 0x8) {
+				DP(NETIF_MSG_LINK, "EEE negotiated\n");
+				vars->eee_status |= SHMEM_EEE_ACTIVE_BIT;
+			}
+
+			if (val2 & 0x12)
+				eee_shmem |= SHMEM_EEE_100M_ADV;
+			if (val2 & 0x4)
+				eee_shmem |= SHMEM_EEE_1G_ADV;
+			if (val2 & 0x68)
+				eee_shmem |= SHMEM_EEE_10G_ADV;
+
+			vars->eee_status &= ~SHMEM_EEE_LP_ADV_STATUS_MASK;
+			vars->eee_status |= (eee_shmem <<
+					     SHMEM_EEE_LP_ADV_STATUS_SHIFT);
+		}
 	}
 
 	return link_up;
@@ -11243,7 +11544,8 @@ static struct bnx2x_phy phy_84833 = {
 	.def_md_devad	= 0,
 	.flags		= (FLAGS_FAN_FAILURE_DET_REQ |
 			   FLAGS_REARM_LATCH_SIGNAL |
-			   FLAGS_TX_ERROR_CHECK),
+			   FLAGS_TX_ERROR_CHECK |
+			   FLAGS_EEE_10GBT),
 	.rx_preemphasis	= {0xffff, 0xffff, 0xffff, 0xffff},
 	.tx_preemphasis	= {0xffff, 0xffff, 0xffff, 0xffff},
 	.mdio_ctrl	= 0,
@@ -12011,6 +12313,8 @@ int bnx2x_phy_init(struct link_params *params, struct link_vars *vars)
 		break;
 	}
 	bnx2x_update_mng(params, vars->link_status);
+
+	bnx2x_update_mng_eee(params, vars->eee_status);
 	return 0;
 }
 
@@ -12023,6 +12327,9 @@ int bnx2x_link_reset(struct link_params *params, struct link_vars *vars,
 	/* disable attentions */
 	vars->link_status = 0;
 	bnx2x_update_mng(params, vars->link_status);
+	vars->eee_status &= ~(SHMEM_EEE_LP_ADV_STATUS_MASK |
+			      SHMEM_EEE_ACTIVE_BIT);
+	bnx2x_update_mng_eee(params, vars->eee_status);
 	bnx2x_bits_dis(bp, NIG_REG_MASK_INTERRUPT_PORT0 + port*4,
 		       (NIG_MASK_XGXS0_LINK_STATUS |
 			NIG_MASK_XGXS0_LINK10G |
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
index ea4371f..e920800 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.h
@@ -149,6 +149,7 @@ struct bnx2x_phy {
 #define FLAGS_DUMMY_READ		(1<<9)
 #define FLAGS_MDC_MDIO_WA_B0		(1<<10)
 #define FLAGS_TX_ERROR_CHECK		(1<<12)
+#define FLAGS_EEE_10GBT			(1<<13)
 
 	/* preemphasis values for the rx side */
 	u16 rx_preemphasis[4];
@@ -265,6 +266,30 @@ struct link_params {
 	u8 num_phys;
 
 	u8 rsrv;
+
+	/* Used to configure the EEE Tx LPI timer, has several modes of
+	 * operation, according to bits 29:28 -
+	 * 2'b00: Timer will be configured by nvram, output will be the value
+	 *        from nvram.
+	 * 2'b01: Timer will be configured by nvram, output will be in
+	 *        microseconds.
+	 * 2'b10: bits 1:0 contain an nvram value which will be used instead
+	 *        of the one located in the nvram. Output will be that value.
+	 * 2'b11: bits 19:0 contain the idle timer in microseconds; output
+	 *        will be in microseconds.
+	 * Bits 31:30 should be 2'b11 in order for EEE to be enabled.
+	 */
+	u32 eee_mode;
+#define EEE_MODE_NVRAM_BALANCED_TIME		(0xa00)
+#define EEE_MODE_NVRAM_AGGRESSIVE_TIME		(0x100)
+#define EEE_MODE_NVRAM_LATENCY_TIME		(0x6000)
+#define EEE_MODE_NVRAM_MASK		(0x3)
+#define EEE_MODE_TIMER_MASK		(0xfffff)
+#define EEE_MODE_OUTPUT_TIME		(1<<28)
+#define EEE_MODE_OVERRIDE_NVRAM		(1<<29)
+#define EEE_MODE_ENABLE_LPI		(1<<30)
+#define EEE_MODE_ADV_LPI			(1<<31)
+
 	u16 hw_led_mode; /* part of the hw_config read from the shmem */
 	u32 multi_phy_config;
 
@@ -301,6 +326,7 @@ struct link_vars {
 
 	/* The same definitions as the shmem parameter */
 	u32 link_status;
+	u32 eee_status;
 	u8 fault_detected;
 	u8 rsrv1;
 	u16 periodic_flags;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index f755a66..a622bb7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -3176,6 +3176,12 @@ static void bnx2x_set_mf_bw(struct bnx2x *bp)
 	bnx2x_fw_command(bp, DRV_MSG_CODE_SET_MF_BW_ACK, 0);
 }
 
+static void bnx2x_handle_eee_event(struct bnx2x *bp)
+{
+	DP(BNX2X_MSG_MCP, "EEE - LLDP event\n");
+	bnx2x_fw_command(bp, DRV_MSG_CODE_EEE_RESULTS_ACK, 0);
+}
+
 static void bnx2x_handle_drv_info_req(struct bnx2x *bp)
 {
 	enum drv_info_opcode op_code;
@@ -3742,6 +3748,8 @@ static void bnx2x_attn_int_deasserted3(struct bnx2x *bp, u32 attn)
 			if (val & DRV_STATUS_AFEX_EVENT_MASK)
 				bnx2x_handle_afex_cmd(bp,
 					val & DRV_STATUS_AFEX_EVENT_MASK);
+			if (val & DRV_STATUS_EEE_NEGOTIATION_RESULTS)
+				bnx2x_handle_eee_event(bp);
 			if (bp->link_vars.periodic_flags &
 			    PERIODIC_FLAGS_LINK_EVENT) {
 				/*  sync with link */
@@ -10082,7 +10090,7 @@ static void __devinit bnx2x_get_port_hwinfo(struct bnx2x *bp)
 {
 	int port = BP_PORT(bp);
 	u32 config;
-	u32 ext_phy_type, ext_phy_config;
+	u32 ext_phy_type, ext_phy_config, eee_mode;
 
 	bp->link_params.bp = bp;
 	bp->link_params.port = port;
@@ -10149,6 +10157,19 @@ static void __devinit bnx2x_get_port_hwinfo(struct bnx2x *bp)
 		bp->port.need_hw_lock = bnx2x_hw_lock_required(bp,
 							bp->common.shmem_base,
 							bp->common.shmem2_base);
+
+	/* Configure link feature according to nvram value */
+	eee_mode = (((SHMEM_RD(bp, dev_info.
+		      port_feature_config[port].eee_power_mode)) &
+		     PORT_FEAT_CFG_EEE_POWER_MODE_MASK) >>
+		    PORT_FEAT_CFG_EEE_POWER_MODE_SHIFT);
+	if (eee_mode != PORT_FEAT_CFG_EEE_POWER_MODE_DISABLED) {
+		bp->link_params.eee_mode = EEE_MODE_ADV_LPI |
+					   EEE_MODE_ENABLE_LPI |
+					   EEE_MODE_OUTPUT_TIME;
+	} else {
+		bp->link_params.eee_mode = 0;
+	}
 }
 
 void bnx2x_get_iscsi_info(struct bnx2x *bp)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
index bbd3874..bfef98f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
@@ -1488,6 +1488,121 @@
  * 2:1 - otp_misc_do[51:50]; 0 - otp_misc_do[1]. */
 #define MISC_REG_CHIP_TYPE					 0xac60
 #define MISC_REG_CHIP_TYPE_57811_MASK				 (1<<1)
+#define MISC_REG_CPMU_LP_DR_ENABLE				 0xa858
+/* [RW 1] FW EEE LPI Enable. When 1 indicates that EEE LPI mode is enabled
+ * by FW. When 0 indicates that the EEE LPI mode is disabled by FW. Clk
+ * 25MHz. Reset on hard reset. */
+#define MISC_REG_CPMU_LP_FW_ENABLE_P0				 0xa84c
+/* [RW 32] EEE LPI Idle Threshold. The threshold value for the idle EEE LPI
+ * counter. Timer tick is 1 us. Clock 25MHz. Reset on hard reset. */
+#define MISC_REG_CPMU_LP_IDLE_THR_P0				 0xa8a0
+/* [RW 18] LPI entry events mask. [0] - Vmain SM Mask. When 1 indicates that
+ * the Vmain SM end state is disabled. When 0 indicates that the Vmain SM
+ * end state is enabled. [1] - FW Queues Empty Mask. When 1 indicates that
+ * the FW command that all Queues are empty is disabled. When 0 indicates
+ * that the FW command that all Queues are empty is enabled. [2] - FW Early
+ * Exit Mask / Reserved (Entry mask). When 1 indicates that the FW Early
+ * Exit command is disabled. When 0 indicates that the FW Early Exit command
+ * is enabled. This bit applicable only in the EXIT Events Mask registers.
+ * [3] - PBF Request Mask. When 1 indicates that the PBF Request indication
+ * is disabled. When 0 indicates that the PBF Request indication is enabled.
+ * [4] - Tx Request Mask. When =1 indicates that the Tx other Than PBF
+ * Request indication is disabled. When 0 indicates that the Tx Other Than
+ * PBF Request indication is enabled. [5] - Rx EEE LPI Status Mask. When 1
+ * indicates that the RX EEE LPI Status indication is disabled. When 0
+ * indicates that the RX EEE LPI Status indication is enabled. In the EXIT
+ * Events Masks registers; this bit masks the falling edge detect of the LPI
+ * Status (Rx LPI is on - off). [6] - Tx Pause Mask. When 1 indicates that
+ * the Tx Pause indication is disabled. When 0 indicates that the Tx Pause
+ * indication is enabled. [7] - BRB1 Empty Mask. When 1 indicates that the
+ * BRB1 EMPTY indication is disabled. When 0 indicates that the BRB1 EMPTY
+ * indication is enabled. [8] - QM Idle Mask. When 1 indicates that the QM
+ * IDLE indication is disabled. When 0 indicates that the QM IDLE indication
+ * is enabled. (One bit for both VOQ0 and VOQ1). [9] - QM LB Idle Mask. When
+ * 1 indicates that the QM IDLE indication for LOOPBACK is disabled. When 0
+ * indicates that the QM IDLE indication for LOOPBACK is enabled. [10] - L1
+ * Status Mask. When 1 indicates that the L1 Status indication from the PCIE
+ * CORE is disabled. When 0 indicates that the RX EEE LPI Status indication
+ * from the PCIE CORE is enabled. In the EXIT Events Masks registers; this
+ * bit masks the falling edge detect of the L1 status (L1 is on - off). [11]
+ * - P0 E0 EEE EEE LPI REQ Mask. When =1 indicates that the P0 E0 EEE EEE
+ * LPI REQ indication is disabled. When =0 indicates that the P0 E0 EEE LPI
+ * REQ indication is enabled. [12] - P1 E0 EEE LPI REQ Mask. When =1
+ * indicates that the P0 EEE LPI REQ indication is disabled. When =0
+ * indicates that the P0 EEE LPI REQ indication is enabled. [13] - P0 E1 EEE
+ * LPI REQ Mask. When =1 indicates that the P0 EEE LPI REQ indication is
+ * disabled. When =0 indicates that the P0 EEE LPI REQ indication is
+ * enabled. [14] - P1 E1 EEE LPI REQ Mask. When =1 indicates that the P0 EEE
+ * LPI REQ indication is disabled. When =0 indicates that the P0 EEE LPI REQ
+ * indication is enabled. [15] - L1 REQ Mask. When =1 indicates that the L1
+ * REQ indication is disabled. When =0 indicates that the L1 indication is
+ * enabled. [16] - Rx EEE LPI Status Edge Detect Mask. When =1 indicates
+ * that the RX EEE LPI Status Falling Edge Detect indication is disabled (Rx
+ * EEE LPI is on - off). When =0 indicates that the RX EEE LPI Status
+ * Falling Edge Detect indication is enabled (Rx EEE LPI is on - off). This
+ * bit is applicable only in the EXIT Events Masks registers. [17] - L1
+ * Status Edge Detect Mask. When =1 indicates that the L1 Status Falling
+ * Edge Detect indication from the PCIE CORE is disabled (L1 is on - off).
+ * When =0 indicates that the L1 Status Falling Edge Detect indication from
+ * the PCIE CORE is enabled (L1 is on - off). This bit is applicable only in
+ * the EXIT Events Masks registers. Clock 25MHz. Reset on hard reset. */
+#define MISC_REG_CPMU_LP_MASK_ENT_P0				 0xa880
+/* [RW 18] EEE LPI exit events mask. [0] - Vmain SM Mask. When 1 indicates
+ * that the Vmain SM end state is disabled. When 0 indicates that the Vmain
+ * SM end state is enabled. [1] - FW Queues Empty Mask. When 1 indicates
+ * that the FW command that all Queues are empty is disabled. When 0
+ * indicates that the FW command that all Queues are empty is enabled. [2] -
+ * FW Early Exit Mask / Reserved (Entry mask). When 1 indicates that the FW
+ * Early Exit command is disabled. When 0 indicates that the FW Early Exit
+ * command is enabled. This bit applicable only in the EXIT Events Mask
+ * registers. [3] - PBF Request Mask. When 1 indicates that the PBF Request
+ * indication is disabled. When 0 indicates that the PBF Request indication
+ * is enabled. [4] - Tx Request Mask. When =1 indicates that the Tx other
+ * Than PBF Request indication is disabled. When 0 indicates that the Tx
+ * Other Than PBF Request indication is enabled. [5] - Rx EEE LPI Status
+ * Mask. When 1 indicates that the RX EEE LPI Status indication is disabled.
+ * When 0 indicates that the RX LPI Status indication is enabled. In the
+ * EXIT Events Masks registers; this bit masks the falling edge detect of
+ * the EEE LPI Status (Rx EEE LPI is on - off). [6] - Tx Pause Mask. When 1
+ * indicates that the Tx Pause indication is disabled. When 0 indicates that
+ * the Tx Pause indication is enabled. [7] - BRB1 Empty Mask. When 1
+ * indicates that the BRB1 EMPTY indication is disabled. When 0 indicates
+ * that the BRB1 EMPTY indication is enabled. [8] - QM Idle Mask. When 1
+ * indicates that the QM IDLE indication is disabled. When 0 indicates that
+ * the QM IDLE indication is enabled. (One bit for both VOQ0 and VOQ1). [9]
+ * - QM LB Idle Mask. When 1 indicates that the QM IDLE indication for
+ * LOOPBACK is disabled. When 0 indicates that the QM IDLE indication for
+ * LOOPBACK is enabled. [10] - L1 Status Mask. When 1 indicates that the L1
+ * Status indication from the PCIE CORE is disabled. When 0 indicates that
+ * the RX EEE LPI Status indication from the PCIE CORE is enabled. In the
+ * EXIT Events Masks registers; this bit masks the falling edge detect of
+ * the L1 status (L1 is on - off). [11] - P0 E0 EEE EEE LPI REQ Mask. When
+ * =1 indicates that the P0 E0 EEE EEE LPI REQ indication is disabled. When
+ * =0 indicates that the P0 E0 EEE LPI REQ indication is enabled. [12] - P1
+ * E0 EEE LPI REQ Mask. When =1 indicates that the P0 EEE LPI REQ indication
+ * is disabled. When =0 indicates that the P0 EEE LPI REQ indication is
+ * enabled. [13] - P0 E1 EEE LPI REQ Mask. When =1 indicates that the P0 EEE
+ * LPI REQ indication is disabled. When =0 indicates that the P0 EEE LPI REQ
+ * indication is enabled. [14] - P1 E1 EEE LPI REQ Mask. When =1 indicates
+ * that the P0 EEE LPI REQ indication is disabled. When =0 indicates that
+ * the P0 EEE LPI REQ indication is enabled. [15] - L1 REQ Mask. When =1
+ * indicates that the L1 REQ indication is disabled. When =0 indicates that
+ * the L1 indication is enabled. [16] - Rx EEE LPI Status Edge Detect Mask.
+ * When =1 indicates that the RX EEE LPI Status Falling Edge Detect
+ * indication is disabled (Rx EEE LPI is on - off). When =0 indicates that
+ * the RX EEE LPI Status Falling Edge Detect indication is enabled (Rx EEE
+ * LPI is on - off). This bit is applicable only in the EXIT Events Masks
+ * registers. [17] - L1 Status Edge Detect Mask. When =1 indicates that the
+ * L1 Status Falling Edge Detect indication from the PCIE CORE is disabled
+ * (L1 is on - off). When =0 indicates that the L1 Status Falling Edge
+ * Detect indication from the PCIE CORE is enabled (L1 is on - off). This
+ * bit is applicable only in the EXIT Events Masks registers. Clock 25MHz.
+ * Reset on hard reset. */
+#define MISC_REG_CPMU_LP_MASK_EXT_P0				 0xa888
+/* [RW 16] EEE LPI Entry Events Counter. A statistic counter with the number
+ * of counts that the SM entered the EEE LPI state. Clock 25MHz. Read only
+ * register. Reset on hard reset. */
+#define MISC_REG_CPMU_LP_SM_ENT_CNT_P0				 0xa8b8
 /* [RW 32] The following driver registers(1...16) represent 16 drivers and
    32 clients. Each client can be controlled by one driver only. One in each
    bit represent that this driver control the appropriate client (Ex: bit 5
@@ -5372,6 +5487,8 @@
 /* [RW 32] Lower 48 bits of ctrl_sa register. Used as the SA in PAUSE/PFC
  * packets transmitted by the MAC */
 #define XMAC_REG_CTRL_SA_LO					 0x28
+#define XMAC_REG_EEE_CTRL					 0xd8
+#define XMAC_REG_EEE_TIMERS_HI					 0xe4
 #define XMAC_REG_PAUSE_CTRL					 0x68
 #define XMAC_REG_PFC_CTRL					 0x70
 #define XMAC_REG_PFC_CTRL_HI					 0x74
@@ -6813,6 +6930,8 @@ Theotherbitsarereservedandshouldbezero*/
 #define MDIO_AN_REG_LP_AUTO_NEG		0x0013
 #define MDIO_AN_REG_LP_AUTO_NEG2	0x0014
 #define MDIO_AN_REG_MASTER_STATUS	0x0021
+#define MDIO_AN_REG_EEE_ADV		0x003c
+#define MDIO_AN_REG_LP_EEE_ADV		0x003d
 /*bcm*/
 #define MDIO_AN_REG_LINK_STATUS 	0x8304
 #define MDIO_AN_REG_CL37_CL73		0x8370
@@ -6866,6 +6985,8 @@ Theotherbitsarereservedandshouldbezero*/
 #define MDIO_PMA_REG_84823_LED3_STRETCH_EN			0x0080
 
 /* BCM84833 only */
+#define MDIO_84833_TOP_CFG_FW_REV			0x400f
+#define MDIO_84833_TOP_CFG_FW_EEE		0x10b1
 #define MDIO_84833_TOP_CFG_XGPHY_STRAP1			0x401a
 #define MDIO_84833_SUPER_ISOLATE		0x8000
 /* These are mailbox register set used by 84833. */
@@ -6993,11 +7114,13 @@ Theotherbitsarereservedandshouldbezero*/
 #define MDIO_WC_REG_DIGITAL3_UP1			0x8329
 #define MDIO_WC_REG_DIGITAL3_LP_UP1			 0x832c
 #define MDIO_WC_REG_DIGITAL4_MISC3			0x833c
+#define MDIO_WC_REG_DIGITAL4_MISC5			0x833e
 #define MDIO_WC_REG_DIGITAL5_MISC6			0x8345
 #define MDIO_WC_REG_DIGITAL5_MISC7			0x8349
 #define MDIO_WC_REG_DIGITAL5_ACTUAL_SPEED		0x834e
 #define MDIO_WC_REG_DIGITAL6_MP5_NEXTPAGECTRL		0x8350
 #define MDIO_WC_REG_CL49_USERB0_CTRL			0x8368
+#define MDIO_WC_REG_EEE_COMBO_CONTROL0			0x8390
 #define MDIO_WC_REG_TX66_CONTROL			0x83b0
 #define MDIO_WC_REG_RX66_CONTROL			0x83c0
 #define MDIO_WC_REG_RX66_SCW0				0x83c2
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
index 1e2785c..0e8bdcb 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c
@@ -785,6 +785,10 @@ static int bnx2x_hw_stats_update(struct bnx2x *bp)
 
 	pstats->host_port_stats_counter++;
 
+	if (CHIP_IS_E3(bp))
+		estats->eee_tx_lpi += REG_RD(bp,
+					     MISC_REG_CPMU_LP_SM_ENT_CNT_P0);
+
 	if (!BP_NOMCP(bp)) {
 		u32 nig_timer_max =
 			SHMEM_RD(bp, port_mb[BP_PORT(bp)].stat_nig_timer);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
index 93e689fd..24b8e50 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.h
@@ -203,6 +203,8 @@ struct bnx2x_eth_stats {
 	/* Recovery */
 	u32 recoverable_error;
 	u32 unrecoverable_error;
+	/* src: Clear-on-Read register; Will not survive PMF Migration */
+	u32 eee_tx_lpi;
 };
 
 
-- 
1.7.9.rc2


* [net-next PATCH v2 3/3] bnx2x: Added EEE Ethtool support.
From: Yuval Mintz @ 2012-06-06  8:58 UTC (permalink / raw)
  To: davem, netdev; +Cc: eilong, bhutchings, peppe.cavallaro, Yuval Mintz
In-Reply-To: <1338973098-16439-1-git-send-email-yuvalmin@broadcom.com>

This patch extends the bnx2x ethtool interface to allow control
of the EEE feature, and to report statistics about it.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |  134 ++++++++++++++++++++
 1 files changed, 134 insertions(+), 0 deletions(-)
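
For reviewers, a minimal userspace sketch of how the eee_mode word ends up
being composed by the "apply changes" part of bnx2x_set_eee() in the diff
that follows. The EEE_MODE_* values are copied from the defines added
earlier in this series; pack_eee_mode() is a hypothetical helper used only
for illustration and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the EEE_MODE_* defines added earlier in this series. */
#define EEE_MODE_TIMER_MASK     0xfffffu
#define EEE_MODE_OUTPUT_TIME    (1u << 28)
#define EEE_MODE_OVERRIDE_NVRAM (1u << 29)
#define EEE_MODE_ENABLE_LPI     (1u << 30)
#define EEE_MODE_ADV_LPI        (1u << 31)

/* Hypothetical helper mirroring the tail of bnx2x_set_eee(). */
static uint32_t pack_eee_mode(uint32_t old_mode, int eee_enabled,
			      int tx_lpi_enabled, uint32_t tx_lpi_timer_us)
{
	uint32_t mode = old_mode;

	/* The ethtool handler rejects larger timers with -EINVAL. */
	assert(tx_lpi_timer_us <= EEE_MODE_TIMER_MASK);

	if (eee_enabled)
		mode |= EEE_MODE_ADV_LPI;
	else
		mode &= ~EEE_MODE_ADV_LPI;

	if (tx_lpi_enabled)
		mode |= EEE_MODE_ENABLE_LPI;
	else
		mode &= ~EEE_MODE_ENABLE_LPI;

	/* Bits 19:0 carry the idle timer in microseconds. */
	mode &= ~EEE_MODE_TIMER_MASK;
	mode |= (tx_lpi_timer_us & EEE_MODE_TIMER_MASK) |
		EEE_MODE_OVERRIDE_NVRAM | EEE_MODE_OUTPUT_TIME;

	return mode;
}
```

With both features enabled and a 0x100 us timer, all four flag bits plus
the timer are set; disabling eee_enabled only clears the advertise bit.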

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index ddc18ee..bf30e28 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -177,6 +177,8 @@ static const struct {
 			4, STATS_FLAGS_FUNC, "recoverable_errors" },
 	{ STATS_OFFSET32(unrecoverable_error),
 			4, STATS_FLAGS_FUNC, "unrecoverable_errors" },
+	{ STATS_OFFSET32(eee_tx_lpi),
+			4, STATS_FLAGS_PORT, "Tx LPI entry count"}
 };
 
 #define BNX2X_NUM_STATS		ARRAY_SIZE(bnx2x_stats_arr)
@@ -1543,6 +1545,136 @@ static const struct {
 	{ "idle check (online)" }
 };
 
+static u32 bnx2x_eee_to_adv(u32 eee_adv)
+{
+	u32 modes = 0;
+
+	if (eee_adv & SHMEM_EEE_100M_ADV)
+		modes |= ADVERTISED_100baseT_Full;
+	if (eee_adv & SHMEM_EEE_1G_ADV)
+		modes |= ADVERTISED_1000baseT_Full;
+	if (eee_adv & SHMEM_EEE_10G_ADV)
+		modes |= ADVERTISED_10000baseT_Full;
+
+	return modes;
+}
+
+static u32 bnx2x_adv_to_eee(u32 modes, u32 shift)
+{
+	u32 eee_adv = 0;
+	if (modes & ADVERTISED_100baseT_Full)
+		eee_adv |= SHMEM_EEE_100M_ADV;
+	if (modes & ADVERTISED_1000baseT_Full)
+		eee_adv |= SHMEM_EEE_1G_ADV;
+	if (modes & ADVERTISED_10000baseT_Full)
+		eee_adv |= SHMEM_EEE_10G_ADV;
+
+	return eee_adv << shift;
+}
+
+static int bnx2x_get_eee(struct net_device *dev, struct ethtool_eee *edata)
+{
+	struct bnx2x *bp = netdev_priv(dev);
+	u32 eee_cfg;
+
+	if (!SHMEM2_HAS(bp, eee_status[BP_PORT(bp)])) {
+		DP(BNX2X_MSG_ETHTOOL, "BC Version does not support EEE\n");
+		return -EOPNOTSUPP;
+	}
+
+	eee_cfg = SHMEM2_RD(bp, eee_status[BP_PORT(bp)]);
+
+	edata->supported =
+		bnx2x_eee_to_adv((eee_cfg & SHMEM_EEE_SUPPORTED_MASK) >>
+				 SHMEM_EEE_SUPPORTED_SHIFT);
+
+	edata->advertised =
+		bnx2x_eee_to_adv((eee_cfg & SHMEM_EEE_ADV_STATUS_MASK) >>
+				 SHMEM_EEE_ADV_STATUS_SHIFT);
+	edata->lp_advertised =
+		bnx2x_eee_to_adv((eee_cfg & SHMEM_EEE_LP_ADV_STATUS_MASK) >>
+				 SHMEM_EEE_LP_ADV_STATUS_SHIFT);
+
+	/* SHMEM value is in 16u units --> Convert to 1u units. */
+	edata->tx_lpi_timer = (eee_cfg & SHMEM_EEE_TIMER_MASK) << 4;
+
+	edata->eee_enabled    = (eee_cfg & SHMEM_EEE_REQUESTED_BIT)	? 1 : 0;
+	edata->eee_active     = (eee_cfg & SHMEM_EEE_ACTIVE_BIT)	? 1 : 0;
+	edata->tx_lpi_enabled = (eee_cfg & SHMEM_EEE_LPI_REQUESTED_BIT) ? 1 : 0;
+
+	return 0;
+}
+
+static int bnx2x_set_eee(struct net_device *dev, struct ethtool_eee *edata)
+{
+	struct bnx2x *bp = netdev_priv(dev);
+	u32 eee_cfg;
+	u32 advertised;
+
+	if (IS_MF(bp))
+		return 0;
+
+	if (!SHMEM2_HAS(bp, eee_status[BP_PORT(bp)])) {
+		DP(BNX2X_MSG_ETHTOOL, "BC Version does not support EEE\n");
+		return -EOPNOTSUPP;
+	}
+
+	eee_cfg = SHMEM2_RD(bp, eee_status[BP_PORT(bp)]);
+
+	if (!(eee_cfg & SHMEM_EEE_SUPPORTED_MASK)) {
+		DP(BNX2X_MSG_ETHTOOL, "Board does not support EEE!\n");
+		return -EOPNOTSUPP;
+	}
+
+	advertised = bnx2x_adv_to_eee(edata->advertised,
+				      SHMEM_EEE_ADV_STATUS_SHIFT);
+	if ((advertised != (eee_cfg & SHMEM_EEE_ADV_STATUS_MASK))) {
+		DP(BNX2X_MSG_ETHTOOL,
+		   "Direct manipulation of EEE advertisement is not supported\n");
+		return -EINVAL;
+	}
+
+	if (edata->tx_lpi_timer > EEE_MODE_TIMER_MASK) {
+		DP(BNX2X_MSG_ETHTOOL,
+		   "Maximal Tx Lpi timer supported is %x(u)\n",
+		   EEE_MODE_TIMER_MASK);
+		return -EINVAL;
+	}
+	if (edata->tx_lpi_enabled &&
+	    (edata->tx_lpi_timer < EEE_MODE_NVRAM_AGGRESSIVE_TIME)) {
+		DP(BNX2X_MSG_ETHTOOL,
+		   "Minimal Tx Lpi timer supported is %d(u)\n",
+		   EEE_MODE_NVRAM_AGGRESSIVE_TIME);
+		return -EINVAL;
+	}
+
+	/* All is well; Apply changes*/
+	if (edata->eee_enabled)
+		bp->link_params.eee_mode |= EEE_MODE_ADV_LPI;
+	else
+		bp->link_params.eee_mode &= ~EEE_MODE_ADV_LPI;
+
+	if (edata->tx_lpi_enabled)
+		bp->link_params.eee_mode |= EEE_MODE_ENABLE_LPI;
+	else
+		bp->link_params.eee_mode &= ~EEE_MODE_ENABLE_LPI;
+
+	bp->link_params.eee_mode &= ~EEE_MODE_TIMER_MASK;
+	bp->link_params.eee_mode |= (edata->tx_lpi_timer &
+				    EEE_MODE_TIMER_MASK) |
+				    EEE_MODE_OVERRIDE_NVRAM |
+				    EEE_MODE_OUTPUT_TIME;
+
+	/* Restart link to propagate changes */
+	if (netif_running(dev)) {
+		bnx2x_stats_handle(bp, STATS_EVENT_STOP);
+		bnx2x_link_set(bp);
+	}
+
+	return 0;
+}
+
+
 enum {
 	BNX2X_CHIP_E1_OFST = 0,
 	BNX2X_CHIP_E1H_OFST,
@@ -2472,6 +2604,8 @@ static const struct ethtool_ops bnx2x_ethtool_ops = {
 	.get_rxfh_indir_size	= bnx2x_get_rxfh_indir_size,
 	.get_rxfh_indir		= bnx2x_get_rxfh_indir,
 	.set_rxfh_indir		= bnx2x_set_rxfh_indir,
+	.get_eee		= bnx2x_get_eee,
+	.set_eee		= bnx2x_set_eee,
 };
 
 void bnx2x_set_ethtool_ops(struct net_device *netdev)
-- 
1.7.9.rc2


* Re: [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Michael S. Tsirkin @ 2012-06-06  9:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120606075217.29081.30713.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, Jun 06, 2012 at 03:52:17PM +0800, Jason Wang wrote:
> Statistics counters are useful for debugging and performance optimization, so this
> patch lets the virtio_net driver collect the following and export them to userspace
> through "ethtool -S":
> 
> - number of packets sent/received
> - number of bytes sent/received
> - number of callbacks for tx/rx
> - number of kicks for tx/rx
> - number of bytes/packets queued for tx
> 
> As virtnet_stats are per-cpu, both per-cpu and global statistics are
> collected, like:
> 
> NIC statistics:
>      tx_bytes[0]: 1731209929
>      tx_packets[0]: 60685
>      tx_kicks[0]: 63
>      tx_callbacks[0]: 73
>      tx_queued_bytes[0]: 1935749360
>      tx_queued_packets[0]: 80652
>      rx_bytes[0]: 2695648
>      rx_packets[0]: 40767
>      rx_kicks[0]: 1
>      rx_callbacks[0]: 2077
>      tx_bytes[1]: 9105588697
>      tx_packets[1]: 344150
>      tx_kicks[1]: 162
>      tx_callbacks[1]: 905
>      tx_queued_bytes[1]: 8901049412
>      tx_queued_packets[1]: 324184
>      rx_bytes[1]: 23679828
>      rx_packets[1]: 358770
>      rx_kicks[1]: 6
>      rx_callbacks[1]: 17717
>      tx_bytes: 10836798626
>      tx_packets: 404835
>      tx_kicks: 225
>      tx_callbacks: 978
>      tx_queued_bytes: 10836798772
>      tx_queued_packets: 404836
>      rx_bytes: 26375476
>      rx_packets: 399537
>      rx_kicks: 7
>      rx_callbacks: 19794
> 
> TODO:
> 
> - more statistics
> - calculate the pending bytes/pkts
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> ---
> Changes from v1:
> 
> - style & typo fixes
> - convert the statistics fields to array
> - use unlikely()
> ---
>  drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 113 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6e4aa6f..909a0a7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
>  enum virtnet_stats_type {
>  	VIRTNET_TX_BYTES,
>  	VIRTNET_TX_PACKETS,
> +	VIRTNET_TX_KICKS,
> +	VIRTNET_TX_CBS,
> +	VIRTNET_TX_Q_BYTES,
> +	VIRTNET_TX_Q_PACKETS,

What about counting the time we spend with the queue
stopped and the number of times we stop the queue?
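
A rough userspace sketch of such accounting, under stated assumptions: the
struct and helper names below are made up for illustration and do not exist
in virtio_net, and the timestamp is passed in by the caller instead of being
read from a kernel clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue accounting; not part of the posted patch. */
struct txq_stop_stats {
	uint64_t stops;         /* number of times the queue was stopped */
	uint64_t stopped_ns;    /* total time spent in the stopped state */
	uint64_t stop_start_ns; /* timestamp of the current stop, if any */
	bool stopped;
};

/* Call where the driver would stop the tx queue. */
static void txq_account_stop(struct txq_stop_stats *s, uint64_t now_ns)
{
	if (s->stopped)
		return; /* already stopped, do not count twice */
	s->stopped = true;
	s->stop_start_ns = now_ns;
	s->stops++;
}

/* Call where the driver would wake the tx queue. */
static void txq_account_wake(struct txq_stop_stats *s, uint64_t now_ns)
{
	if (!s->stopped)
		return;
	s->stopped = false;
	s->stopped_ns += now_ns - s->stop_start_ns;
}
```

Both values could then be exported next to the other per-cpu counters.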

>  	VIRTNET_RX_BYTES,
>  	VIRTNET_RX_PACKETS,
> +	VIRTNET_RX_KICKS,
> +	VIRTNET_RX_CBS,

What about a counter for oom on rx?

>  	VIRTNET_NUM_STATS,
>  };
>  
> @@ -54,6 +60,21 @@ struct virtnet_stats {
>  	u64 data[VIRTNET_NUM_STATS];
>  };
>  
> +static struct {
> +	char string[ETH_GSTRING_LEN];
> +} virtnet_stats_str_attr[] = {
> +	{ "tx_bytes" },
> +	{ "tx_packets" },
> +	{ "tx_kicks" },
> +	{ "tx_callbacks" },
> +	{ "tx_queued_bytes" },
> +	{ "tx_queued_packets" },
> +	{ "rx_bytes" },
> +	{ "rx_packets" },
> +	{ "rx_kicks" },
> +	{ "rx_callbacks" },
> +};
> +
>  struct virtnet_info {
>  	struct virtio_device *vdev;
>  	struct virtqueue *rvq, *svq, *cvq;
> @@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>  static void skb_xmit_done(struct virtqueue *svq)
>  {
>  	struct virtnet_info *vi = svq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_TX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Suppress further interrupts. */
>  	virtqueue_disable_cb(svq);
> @@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
>  	bool oom;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  
>  	do {
>  		if (vi->mergeable_rx_bufs)
> @@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  	} while (err > 0);
>  	if (unlikely(vi->num > vi->max))
>  		vi->max = vi->num;
> -	virtqueue_kick(vi->rvq);
> +	if (virtqueue_kick_prepare(vi->rvq)) {
> +		virtqueue_notify(vi->rvq);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->data[VIRTNET_RX_KICKS]++;
> +		u64_stats_update_end(&stats->syncp);
> +	}
>  	return !oom;
>  }
>  
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_RX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
> +

This is the data path, so it is not entirely free.
I am guessing the overhead is not measurable, but
did you check?

An alternative is to count when napi callbacks
are invoked. If we also count when the weight was exceeded
we get almost the same result.
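
Roughly like this, as a userspace sketch with made-up names
(napi_poll_stats, napi_poll_account, napi_est_callbacks). Here received
and budget stand for the number of packets handled in one poll and the
NAPI weight. The estimate in napi_est_callbacks() rests on the assumption
that every poll is triggered either by an rx callback or by the previous
poll using up its whole budget.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters, updated from the poll routine only. */
struct napi_poll_stats {
	uint64_t polls;           /* times the poll routine ran */
	uint64_t weight_exceeded; /* polls that used the whole budget */
};

static void napi_poll_account(struct napi_poll_stats *s,
			      int received, int budget)
{
	assert(received >= 0 && received <= budget);

	s->polls++;
	if (received == budget)
		s->weight_exceeded++;
}

/* Assumption: polls not caused by an exhausted budget came from a callback. */
static uint64_t napi_est_callbacks(const struct napi_poll_stats *s)
{
	return s->polls - s->weight_exceeded;
}
```

This keeps the interrupt callback itself free of any stats update.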


>  	/* Schedule NAPI, Suppress further interrupts if successful. */
>  	if (napi_schedule_prep(&vi->napi)) {
>  		virtqueue_disable_cb(rvq);
> @@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  	int capacity;
> +	bool kick;
>  
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(vi);
> @@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		kfree_skb(skb);
>  		return NETDEV_TX_OK;
>  	}
> -	virtqueue_kick(vi->svq);
> +
> +	kick = virtqueue_kick_prepare(vi->svq);
> +	if (unlikely(kick))
> +		virtqueue_notify(vi->svq);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	if (unlikely(kick))
> +		stats->data[VIRTNET_TX_KICKS]++;
> +	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
> +	stats->data[VIRTNET_TX_Q_PACKETS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Don't wait up for transmitted skbs to be freed. */
>  	skb_orphan(skb);
> @@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>  
>  }
>  
> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
> +{
> +	int i, cpu;
> +	switch (stringset) {
> +	case ETH_SS_STATS:
> +		for_each_possible_cpu(cpu)
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				sprintf(buf, "%s[%u]",
> +					virtnet_stats_str_attr[i].string, cpu);
> +				buf += ETH_GSTRING_LEN;
> +			}
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			memcpy(buf, virtnet_stats_str_attr[i].string,
> +				ETH_GSTRING_LEN);
> +			buf += ETH_GSTRING_LEN;
> +		}
> +		break;
> +	}
> +}
> +
> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static void virtnet_get_ethtool_stats(struct net_device *dev,
> +				      struct ethtool_stats *stats, u64 *buf)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int cpu, i;
> +	unsigned int start;
> +	struct virtnet_stats sample, total;
> +
> +	memset(&total, 0, sizeof(total));
> +
> +	for_each_possible_cpu(cpu) {
> +		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
> +		do {
> +			start = u64_stats_fetch_begin(&s->syncp);
> +			memcpy(&sample.data, &s->data,
> +			       sizeof(u64) * VIRTNET_NUM_STATS);
> +		} while (u64_stats_fetch_retry(&s->syncp, start));
> +
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			*buf = sample.data[i];
> +			total.data[i] += sample.data[i];
> +			buf++;
> +		}
> +	}
> +
> +	memcpy(buf, &total.data, sizeof(u64) * VIRTNET_NUM_STATS);
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
> +	.get_strings = virtnet_get_strings,
> +	.get_sset_count = virtnet_get_sset_count,
>  };
>  
>  #define MIN_MTU 68


* Re: [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Jason Wang @ 2012-06-06  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120606082752.GA12767@redhat.com>

On 06/06/2012 04:27 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 03:52:17PM +0800, Jason Wang wrote:
>> Statistics counters are useful for debugging and performance optimization, so this
>> patch lets the virtio_net driver collect the following and export them to userspace
>> through "ethtool -S":
>>
>> - number of packets sent/received
>> - number of bytes sent/received
>> - number of callbacks for tx/rx
>> - number of kicks for tx/rx
>> - number of bytes/packets queued for tx
>>
>> As virtnet_stats are per-cpu, both per-cpu and global statistics are
>> collected, like:
>>
>> NIC statistics:
>>       tx_bytes[0]: 1731209929
>>       tx_packets[0]: 60685
>>       tx_kicks[0]: 63
>>       tx_callbacks[0]: 73
>>       tx_queued_bytes[0]: 1935749360
>>       tx_queued_packets[0]: 80652
>>       rx_bytes[0]: 2695648
>>       rx_packets[0]: 40767
>>       rx_kicks[0]: 1
>>       rx_callbacks[0]: 2077
>>       tx_bytes[1]: 9105588697
>>       tx_packets[1]: 344150
>>       tx_kicks[1]: 162
>>       tx_callbacks[1]: 905
>>       tx_queued_bytes[1]: 8901049412
>>       tx_queued_packets[1]: 324184
>>       rx_bytes[1]: 23679828
>>       rx_packets[1]: 358770
>>       rx_kicks[1]: 6
>>       rx_callbacks[1]: 17717
>>       tx_bytes: 10836798626
>>       tx_packets: 404835
>>       tx_kicks: 225
>>       tx_callbacks: 978
>>       tx_queued_bytes: 10836798772
>>       tx_queued_packets: 404836
>>       rx_bytes: 26375476
>>       rx_packets: 399537
>>       rx_kicks: 7
>>       rx_callbacks: 19794
>>
>> TODO:
>>
>> - more statistics
>> - calculate the pending bytes/pkts
>>
> Do we need that? pending is (queued - packets), no?
>   

No, not if we choose to calculate it with tools.
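
A minimal sketch of the tool-side calculation, assuming the counter names
from the sample output above; tx_pending() is a hypothetical helper, not
something ethtool provides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tool-side helper: pending = queued - completed. */
static uint64_t tx_pending(uint64_t queued, uint64_t completed)
{
	/*
	 * The two counters are not sampled atomically with respect to
	 * each other, so clamp to zero instead of wrapping around.
	 */
	return queued > completed ? queued - completed : 0;
}
```

With the global totals from the sample output this gives 146 pending bytes
(10836798772 - 10836798626) and 1 pending packet (404836 - 404835). The
subtraction is only meaningful on the totals: tx_bytes[1] is larger than
tx_queued_bytes[1] in the sample, presumably because a packet can be queued
on one CPU and accounted as sent on another.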
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>
>> ---
>> Changes from v1:
>>
>> - style & typo fixes
>> - convert the statistics fields to array
>> - use unlikely()
>> ---
>>   drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
>>   1 files changed, 113 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 6e4aa6f..909a0a7 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
>>   enum virtnet_stats_type {
>>   	VIRTNET_TX_BYTES,
>>   	VIRTNET_TX_PACKETS,
>> +	VIRTNET_TX_KICKS,
>> +	VIRTNET_TX_CBS,
>> +	VIRTNET_TX_Q_BYTES,
>> +	VIRTNET_TX_Q_PACKETS,
>>   	VIRTNET_RX_BYTES,
>>   	VIRTNET_RX_PACKETS,
>> +	VIRTNET_RX_KICKS,
>> +	VIRTNET_RX_CBS,
>>   	VIRTNET_NUM_STATS,
>>   };
>>
>> @@ -54,6 +60,21 @@ struct virtnet_stats {
>>   	u64 data[VIRTNET_NUM_STATS];
>>   };
>>
>> +static struct {
> static const?
>

Sorry, I forgot this.
>> +	char string[ETH_GSTRING_LEN];
>> +} virtnet_stats_str_attr[] = {
>> +	{ "tx_bytes" },
>> +	{ "tx_packets" },
>> +	{ "tx_kicks" },
>> +	{ "tx_callbacks" },
>> +	{ "tx_queued_bytes" },
>> +	{ "tx_queued_packets" },
>> +	{ "rx_bytes" },
>> +	{ "rx_packets" },
>> +	{ "rx_kicks" },
>> +	{ "rx_callbacks" },
>> +};
>> +
>>   struct virtnet_info {
>>   	struct virtio_device *vdev;
>>   	struct virtqueue *rvq, *svq, *cvq;
>> @@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>>   static void skb_xmit_done(struct virtqueue *svq)
>>   {
>>   	struct virtnet_info *vi = svq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->data[VIRTNET_TX_CBS]++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Suppress further interrupts. */
>>   	virtqueue_disable_cb(svq);
>> @@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   {
>>   	int err;
>>   	bool oom;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>
>>   	do {
>>   		if (vi->mergeable_rx_bufs)
>> @@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   	} while (err > 0);
>>   	if (unlikely(vi->num > vi->max))
>>   		vi->max = vi->num;
>> -	virtqueue_kick(vi->rvq);
>> +	if (virtqueue_kick_prepare(vi->rvq)) {
> if (unlikely())
> also move stats here where they are actually used?

Sure.
>> +		virtqueue_notify(vi->rvq);
>> +		u64_stats_update_begin(&stats->syncp);
>> +		stats->data[VIRTNET_RX_KICKS]++;
>> +		u64_stats_update_end(&stats->syncp);
>> +	}
>>   	return !oom;
>>   }
>>
>>   static void skb_recv_done(struct virtqueue *rvq)
>>   {
>>   	struct virtnet_info *vi = rvq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->data[VIRTNET_RX_CBS]++;
>> +	u64_stats_update_end(&stats->syncp);
>> +
>>   	/* Schedule NAPI, Suppress further interrupts if successful. */
>>   	if (napi_schedule_prep(&vi->napi)) {
>>   		virtqueue_disable_cb(rvq);
>> @@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>>   static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>   	struct virtnet_info *vi = netdev_priv(dev);
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>   	int capacity;
>> +	bool kick;
>>
>>   	/* Free up any pending old buffers before queueing new ones. */
>>   	free_old_xmit_skbs(vi);
>> @@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   		kfree_skb(skb);
>>   		return NETDEV_TX_OK;
>>   	}
>> -	virtqueue_kick(vi->svq);
>> +
>> +	kick = virtqueue_kick_prepare(vi->svq);
>> +	if (unlikely(kick))
>> +		virtqueue_notify(vi->svq);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	if (unlikely(kick))
>> +		stats->data[VIRTNET_TX_KICKS]++;
>> +	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
>> +	stats->data[VIRTNET_TX_Q_PACKETS]++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Don't wait up for transmitted skbs to be freed. */
>>   	skb_orphan(skb);
>> @@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>>
>>   }
>>
>> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
>> +{
>> +	int i, cpu;
>> +	switch (stringset) {
>> +	case ETH_SS_STATS:
>> +		for_each_possible_cpu(cpu)
>> +			for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +				sprintf(buf, "%s[%u]",
>> +					virtnet_stats_str_attr[i].string, cpu);
>> +				buf += ETH_GSTRING_LEN;
> I would do
> 	 ret = snprintf(buf, ETH_GSTRING_LEN, ...)
> 	 BUG_ON(ret>= ETH_GSTRING_LEN);
> here to make it more robust.

Ok.
>> +			}
>> +		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +			memcpy(buf, virtnet_stats_str_attr[i].string,
>> +				ETH_GSTRING_LEN);
>> +			buf += ETH_GSTRING_LEN;
>> +		}
> 		So why not just memcpy the whole array there?
> 		memcpy(buf, virtnet_stats_str_attr,
> 		       sizeof virtnet_stats_str_attr);
>
>> +		break;
>> +	}
>> +}
>> +
>> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
>> +{
>> +	switch (sset) {
>> +	case ETH_SS_STATS:
> also add
> 	BUILD_BUG_ON(VIRTNET_NUM_STATS != (sizeof virtnet_stats_str_attr) / ETH_GSTRING_LEN);
>

Ok.
>> +		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +}
>> +
>> +static void virtnet_get_ethtool_stats(struct net_device *dev,
>> +				      struct ethtool_stats *stats, u64 *buf)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(dev);
>> +	int cpu, i;
>> +	unsigned int start;
>> +	struct virtnet_stats sample, total;
>> +
>> +	memset(&total, 0, sizeof(total));
> sizeof total
> when operand is a variable,
> to distinguish from when it is a type.

Sure.
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
>> +		do {
>> +			start = u64_stats_fetch_begin(&s->syncp);
>> +			memcpy(&sample.data,&s->data,
>> +			       sizeof(u64) * VIRTNET_NUM_STATS);
>> +		} while (u64_stats_fetch_retry(&s->syncp, start));
>> +
>> +		for (i = 0; i<  VIRTNET_NUM_STATS; i++) {
>> +			*buf = sample.data[i];
>> +			total.data[i] += sample.data[i];
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	memcpy(buf,&total.data, sizeof(u64) * VIRTNET_NUM_STATS);
>> +}
>> +
>>   static const struct ethtool_ops virtnet_ethtool_ops = {
>>   	.get_drvinfo = virtnet_get_drvinfo,
>>   	.get_link = ethtool_op_get_link,
>>   	.get_ringparam = virtnet_get_ringparam,
>> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
>> +	.get_strings = virtnet_get_strings,
>> +	.get_sset_count = virtnet_get_sset_count,
>>   };
>>
>>   #define MIN_MTU 68
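
For reference, the bounded-write suggestion for the per-cpu stat names can be
sketched in userspace as below. This is only an illustration under stated
assumptions: GSTRING_LEN mirrors the kernel's ETH_GSTRING_LEN (32), the helper
name fill_stat_name() and the stat name "rx_kicks" are hypothetical, and an
error return stands in for the BUG_ON() proposed in the review.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GSTRING_LEN 32	/* same value as the kernel's ETH_GSTRING_LEN */

/*
 * Write "name[cpu]" into one fixed-width slot (hypothetical helper).
 * Returns 0 on success, -1 if the string would have been truncated.
 */
static int fill_stat_name(char *slot, const char *name, unsigned int cpu)
{
	int ret;

	/* Zero the slot so the unused tail holds no stale bytes. */
	memset(slot, 0, GSTRING_LEN);

	/* snprintf() returns the length it wanted to write, excluding NUL. */
	ret = snprintf(slot, GSTRING_LEN, "%s[%u]", name, cpu);
	if (ret < 0 || ret >= GSTRING_LEN)
		return -1;	/* the review suggests BUG_ON() at this point */
	return 0;
}
```

A caller would advance its buffer pointer by GSTRING_LEN after each
successful call, as the loop in the patch does with ETH_GSTRING_LEN.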

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Jason Wang @ 2012-06-06  9:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: mst, netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338972341.2760.3944.camel@edumazet-glaptop>

On 06/06/2012 04:45 PM, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet<edumazet@google.com>
>>
>> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
>> on 32bit arches.
>>
>> We must use separate syncp for rx and tx path as they can be run at the
>> same time on different cpus. Thus one sequence increment can be lost and
>> readers spin forever.
>>
>> Signed-off-by: Eric Dumazet<edumazet@google.com>
>> Cc: Stephen Hemminger<shemminger@vyatta.com>
>> Cc: Michael S. Tsirkin<mst@redhat.com>
>> Cc: Jason Wang<jasowang@redhat.com>
>> ---
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

Thanks for explaining. I think current virtio-net is safe, but the patch
is still needed as my patch would update the statistics in irq.
>
> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
>
>
>
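
The lost-increment scenario described above can be modelled in userspace as
below. This is a single-threaded sketch that replays one unlucky interleaving
by hand; struct seq_model and the function names are made up for the
illustration and are not the kernel's seqcount_t or u64_stats_sync API.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of a sequence counter; not the kernel's seqcount_t. */
struct seq_model {
	unsigned int sequence;
};

/* A reader can only accept its snapshot while the sequence is even. */
static bool reader_can_finish(const struct seq_model *s)
{
	return (s->sequence & 1) == 0;
}

/*
 * rx and tx share one counter and their non-atomic "begin" increments
 * (load, add, store) interleave, so one increment is lost.
 */
static unsigned int shared_counter_race(void)
{
	struct seq_model s = { 0 };
	unsigned int rx_tmp, tx_tmp;

	rx_tmp = s.sequence;		/* rx: load  (0) */
	tx_tmp = s.sequence;		/* tx: load  (0) */
	s.sequence = rx_tmp + 1;	/* rx: store (1) */
	s.sequence = tx_tmp + 1;	/* tx: store (1), rx increment lost */
	s.sequence++;			/* rx: end   (2) */
	s.sequence++;			/* tx: end   (3), odd from now on */
	return s.sequence;
}

/* Same interleaving, but each path owns its own counter. */
static void separate_counters(struct seq_model *rx, struct seq_model *tx)
{
	unsigned int rx_tmp = rx->sequence;	/* rx: load */
	unsigned int tx_tmp = tx->sequence;	/* tx: load */

	rx->sequence = rx_tmp + 1;	/* rx: begin */
	tx->sequence = tx_tmp + 1;	/* tx: begin */
	rx->sequence++;			/* rx: end */
	tx->sequence++;			/* tx: end */
}
```

With the shared counter the sequence ends up odd with no writer active, which
is the "readers spin forever" case; with one counter per path both end even.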

^ permalink raw reply

* Re: [PATCH 7/7] netfilter: add user-space connection tracking helper infrastructure
From: Ferenc Wagner @ 2012-06-06  9:39 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, wferi
In-Reply-To: <1338812485-4232-8-git-send-email-pablo@netfilter.org>

pablo@netfilter.org writes:

> * Security: Avoid complex string matching and mangling in kernel-space
>   running in unprivileged mode.

Or rather in privileged mode?

> 2) Add rules to enable the FTP user-space helper which is
>    used to track traffic going to TCP port 10000.

The examples use port 21 in the iptables commands and the expectations:

>  iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp
>  iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp
>
>     [NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
> [DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
-- 
Regards,
Feri.

^ permalink raw reply

* Deadlock, L2TP over IP are not working, 3.4.1
From: Denys Fedoryshchenko @ 2012-06-06  9:54 UTC (permalink / raw)
  To: davem, netdev, linux-kernel

It seems L2TP is not working, at least for me, due to some bug.

Script I use to reproduce:
SERVER=192.168.11.2
LOCALIP=`curl http://${SERVER}:8080/myip`
ID=`curl http://${SERVER}:8080/tunid` # It will generate some number, let's say 2
echo ID: ${ID}
modprobe l2tp_ip
modprobe l2tp_eth
ip l2tp add tunnel remote ${SERVER} local ${LOCALIP} tunnel_id ${ID} peer_tunnel_id ${ID} encap ip
ip l2tp add session name tun100 tunnel_id ${ID} session_id 1 peer_session_id 1
ip link set dev tun100 up
ip addr add dev tun100 10.0.6.${ID}/24

Here is the report for the latest stable kernel. I can reproduce it on
multiple PCs.
It is a new setup, so I am not sure whether it was working on older kernels
(regression or not).

[ 8683.927442] ======================================================
[ 8683.927555] [ INFO: possible circular locking dependency detected ]
[ 8683.927672] 3.4.1-build-0061 #14 Not tainted
[ 8683.927782] -------------------------------------------------------
[ 8683.927895] swapper/0/0 is trying to acquire lock:
[ 8683.928007]  (slock-AF_INET){+.-...}, at: [<e0fc73ec>] 
l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]
[ 8683.928121] but task is already holding lock:
[ 8683.928121]  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] 
sch_direct_xmit+0x36/0x119
[ 8683.928121]
[ 8683.928121] which lock already depends on the new lock.
[ 8683.928121]
[ 8683.928121]
[ 8683.928121] the existing dependency chain (in reverse order) is:
[ 8683.928121]
[ 8683.928121] -> #1 (_xmit_ETHER#2){+.-...}:
[ 8683.928121]        [<c015a561>] lock_acquire+0x71/0x85
[ 8683.928121]        [<c034da2d>] _raw_spin_lock+0x33/0x40
[ 8683.928121]        [<c0304e0c>] ip_send_reply+0xf2/0x1ce
[ 8683.928121]        [<c0317dbc>] tcp_v4_send_reset+0x153/0x16f
[ 8683.928121]        [<c0317f4a>] tcp_v4_do_rcv+0x172/0x194
[ 8683.928121]        [<c031929b>] tcp_v4_rcv+0x387/0x5a0
[ 8683.928121]        [<c03001d0>] ip_local_deliver_finish+0x13a/0x1e9
[ 8683.928121]        [<c0300645>] NF_HOOK.clone.11+0x46/0x4d
[ 8683.928121]        [<c030075b>] ip_local_deliver+0x41/0x45
[ 8683.928121]        [<c03005dd>] ip_rcv_finish+0x31a/0x33c
[ 8683.928121]        [<c0300645>] NF_HOOK.clone.11+0x46/0x4d
[ 8683.928121]        [<c0300960>] ip_rcv+0x201/0x23d
[ 8683.928121]        [<c02de91b>] __netif_receive_skb+0x329/0x378
[ 8683.928121]        [<c02deae8>] netif_receive_skb+0x4e/0x7d
[ 8683.928121]        [<e08d5ef3>] rtl8139_poll+0x243/0x33d [8139too]
[ 8683.928121]        [<c02df103>] net_rx_action+0x90/0x15d
[ 8683.928121]        [<c012b2b5>] __do_softirq+0x7b/0x118
[ 8683.928121]
[ 8683.928121] -> #0 (slock-AF_INET){+.-...}:
[ 8683.928121]        [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[ 8683.928121]        [<c015a561>] lock_acquire+0x71/0x85
[ 8683.928121]        [<c034da2d>] _raw_spin_lock+0x33/0x40
[ 8683.928121]        [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e 
[l2tp_core]
[ 8683.928121]        [<e0fe31fb>] l2tp_eth_dev_xmit+0x1a/0x2f 
[l2tp_eth]
[ 8683.928121]        [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[ 8683.928121]        [<c02f064c>] sch_direct_xmit+0x55/0x119
[ 8683.928121]        [<c02e0528>] dev_queue_xmit+0x282/0x418
[ 8683.928121]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]        [<c031f524>] arp_xmit+0x22/0x24
[ 8683.928121]        [<c031f567>] arp_send+0x41/0x48
[ 8683.928121]        [<c031fa7d>] arp_process+0x289/0x491
[ 8683.928121]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]        [<c031f7a0>] arp_rcv+0xb1/0xc3
[ 8683.928121]        [<c02de91b>] __netif_receive_skb+0x329/0x378
[ 8683.928121]        [<c02de9d3>] process_backlog+0x69/0x130
[ 8683.928121]        [<c02df103>] net_rx_action+0x90/0x15d
[ 8683.928121]        [<c012b2b5>] __do_softirq+0x7b/0x118
[ 8683.928121]
[ 8683.928121] other info that might help us debug this:
[ 8683.928121]
[ 8683.928121]  Possible unsafe locking scenario:
[ 8683.928121]
[ 8683.928121]        CPU0                    CPU1
[ 8683.928121]        ----                    ----
[ 8683.928121]   lock(_xmit_ETHER#2);
[ 8683.928121]                                lock(slock-AF_INET);
[ 8683.928121]                                lock(_xmit_ETHER#2);
[ 8683.928121]   lock(slock-AF_INET);
[ 8683.928121]
[ 8683.928121]  *** DEADLOCK ***
[ 8683.928121]
[ 8683.928121] 3 locks held by swapper/0/0:
[ 8683.928121]  #0:  (rcu_read_lock){.+.+..}, at: [<c02dbc10>] 
rcu_lock_acquire+0x0/0x30
[ 8683.928121]  #1:  (rcu_read_lock_bh){.+....}, at: [<c02dbc10>] 
rcu_lock_acquire+0x0/0x30
[ 8683.928121]  #2:  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] 
sch_direct_xmit+0x36/0x119
[ 8683.928121]
[ 8683.928121] stack backtrace:
[ 8683.928121] Pid: 0, comm: swapper/0 Not tainted 3.4.1-build-0061 #14
[ 8683.928121] Call Trace:
[ 8683.928121]  [<c034bdd2>] ? printk+0x18/0x1a
[ 8683.928121]  [<c0158904>] print_circular_bug+0x1ac/0x1b6
[ 8683.928121]  [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[ 8683.928121]  [<c015a561>] lock_acquire+0x71/0x85
[ 8683.928121]  [<e0fc73ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]  [<c034da2d>] _raw_spin_lock+0x33/0x40
[ 8683.928121]  [<e0fc73ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]  [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]  [<e0fe31fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[ 8683.928121]  [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[ 8683.928121]  [<c02f064c>] sch_direct_xmit+0x55/0x119
[ 8683.928121]  [<c02e0528>] dev_queue_xmit+0x282/0x418
[ 8683.928121]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[ 8683.928121]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]  [<c031f524>] arp_xmit+0x22/0x24
[ 8683.928121]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[ 8683.928121]  [<c031f567>] arp_send+0x41/0x48
[ 8683.928121]  [<c031fa7d>] arp_process+0x289/0x491
[ 8683.928121]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[ 8683.928121]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]  [<c031f7a0>] arp_rcv+0xb1/0xc3
[ 8683.928121]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[ 8683.928121]  [<c02de91b>] __netif_receive_skb+0x329/0x378
[ 8683.928121]  [<c02de9d3>] process_backlog+0x69/0x130
[ 8683.928121]  [<c02df103>] net_rx_action+0x90/0x15d
[ 8683.928121]  [<c012b2b5>] __do_softirq+0x7b/0x118
[ 8683.928121]  [<c012b23a>] ? local_bh_enable+0xd/0xd
[ 8683.928121]  <IRQ>  [<c012b4d0>] ? irq_exit+0x41/0x91
[ 8683.928121]  [<c0103c6f>] ? do_IRQ+0x79/0x8d
[ 8683.928121]  [<c0157ea1>] ? trace_hardirqs_off_caller+0x2e/0x86
[ 8683.928121]  [<c034ef6e>] ? common_interrupt+0x2e/0x34
[ 8683.928121]  [<c0108a33>] ? default_idle+0x23/0x38
[ 8683.928121]  [<c01091a8>] ? cpu_idle+0x55/0x6f
[ 8683.928121]  [<c033df25>] ? rest_init+0xa1/0xa7
[ 8683.928121]  [<c033de84>] ? __read_lock_failed+0x14/0x14
[ 8683.928121]  [<c0498745>] ? start_kernel+0x303/0x30a
[ 8683.928121]  [<c0498209>] ? repair_env_string+0x51/0x51
[ 8683.928121]  [<c04980a8>] ? i386_start_kernel+0xa8/0xaf


[158595.436934]
[158595.437018] ======================================================
[158595.437111] [ INFO: possible circular locking dependency detected ]
[158595.437198] 3.4.0-build-0061 #12 Tainted: G        W
[158595.437281] -------------------------------------------------------
[158595.437365] swapper/0/0 is trying to acquire lock:
[158595.437447]  (slock-AF_INET){+.-...}, at: [<f86453ec>] 
l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437613]
[158595.437613] but task is already holding lock:
[158595.437763]  (_xmit_ETHER#2){+.-...}, at: [<c02f09b9>] 
sch_direct_xmit+0x36/0x119
[158595.437837]
[158595.437837] which lock already depends on the new lock.
[158595.437837]
[158595.437837]
[158595.437837] the existing dependency chain (in reverse order) is:
[158595.437837]
[158595.437837] -> #1 (_xmit_ETHER#2){+.-...}:
[158595.437837]        [<c015a6d1>] lock_acquire+0x71/0x85
[158595.437837]        [<c034de94>] _raw_spin_lock_irqsave+0x40/0x50
[158595.437837]        [<c017c1f2>] get_page_from_freelist+0x227/0x398
[158595.437837]        [<c017c5a7>] __alloc_pages_nodemask+0xef/0x5f9
[158595.437837]        [<c019c34f>] alloc_slab_page+0x1d/0x21
[158595.437837]        [<c019c39f>] new_slab+0x4c/0x164
[158595.437837]        [<c019d259>] 
__slab_alloc.clone.59.clone.64+0x247/0x2de
[158595.437837]        [<c019dd21>] __kmalloc_track_caller+0x55/0xa4
[158595.437837]        [<c02d56fb>] __alloc_skb+0x51/0x100
[158595.437837]        [<c02d2cfa>] sock_alloc_send_pskb+0x9e/0x263
[158595.437837]        [<c02d2ed7>] sock_alloc_send_skb+0x18/0x1d
[158595.437837]        [<c0303e04>] 
__ip_append_data.clone.52+0x302/0x6dc
[158595.437837]        [<c030494c>] ip_append_data+0x80/0x88
[158595.437837]        [<c03209dd>] icmp_push_reply+0x5c/0x101
[158595.437837]        [<c0321555>] icmp_send+0x31d/0x342
[158595.437837]        [<f862b05c>] send_unreach+0x19/0x1b [ipt_REJECT]
[158595.437837]        [<f862b0f5>] reject_tg+0x53/0x2de [ipt_REJECT]
[158595.437837]        [<c033359a>] ipt_do_table+0x3ad/0x410
[158595.437837]        [<f856c0c4>] iptable_filter_hook+0x56/0x5e 
[iptable_filter]
[158595.437837]        [<c02f9941>] nf_iterate+0x36/0x5c
[158595.437837]        [<c02f99bf>] nf_hook_slow+0x58/0xf1
[158595.437837]        [<c0301f33>] ip_forward+0x295/0x2a2
[158595.437837]        [<c0300969>] ip_rcv_finish+0x31a/0x33c
[158595.437837]        [<c03009d1>] NF_HOOK.clone.11+0x46/0x4d
[158595.437837]        [<c0300cec>] ip_rcv+0x201/0x23d
[158595.437837]        [<c02deca7>] __netif_receive_skb+0x329/0x378
[158595.437837]        [<c02dee74>] netif_receive_skb+0x4e/0x7d
[158595.437837]        [<c02def60>] napi_skb_finish+0x1e/0x34
[158595.437837]        [<c02df389>] napi_gro_receive+0x20/0x24
[158595.437837]        [<f850e213>] rtl8169_poll+0x2e6/0x52c [r8169]
[158595.437837]        [<c02df48f>] net_rx_action+0x90/0x15d
[158595.437837]        [<c012b42d>] __do_softirq+0x7b/0x118
[158595.437837]
[158595.437837] -> #0 (slock-AF_INET){+.-...}:
[158595.437837]        [<c015a08b>] __lock_acquire+0x9a3/0xc27
[158595.437837]        [<c015a6d1>] lock_acquire+0x71/0x85
[158595.437837]        [<c034ddad>] _raw_spin_lock+0x33/0x40
[158595.437837]        [<f86453ec>] l2tp_xmit_skb+0x173/0x47e 
[l2tp_core]
[158595.437837]        [<f86591fb>] l2tp_eth_dev_xmit+0x1a/0x2f 
[l2tp_eth]
[158595.437837]        [<c02e0573>] dev_hard_start_xmit+0x333/0x3f2
[158595.437837]        [<c02f09d8>] sch_direct_xmit+0x55/0x119
[158595.437837]        [<c02e08b4>] dev_queue_xmit+0x282/0x418
[158595.437837]        [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]        [<c031f8b0>] arp_xmit+0x22/0x24
[158595.437837]        [<c031f8f3>] arp_send+0x41/0x48
[158595.437837]        [<c031fe09>] arp_process+0x289/0x491
[158595.437837]        [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]        [<c031fb2c>] arp_rcv+0xb1/0xc3
[158595.437837]        [<c02deca7>] __netif_receive_skb+0x329/0x378
[158595.437837]        [<c02ded5f>] process_backlog+0x69/0x130
[158595.437837]        [<c02df48f>] net_rx_action+0x90/0x15d
[158595.437837]        [<c012b42d>] __do_softirq+0x7b/0x118
[158595.437837]
[158595.437837] other info that might help us debug this:
[158595.437837]
[158595.437837]  Possible unsafe locking scenario:
[158595.437837]
[158595.437837]        CPU0                    CPU1
[158595.437837]        ----                    ----
[158595.437837]   lock(_xmit_ETHER#2);
[158595.437837]                                lock(slock-AF_INET);
[158595.437837]                                lock(_xmit_ETHER#2);
[158595.437837]   lock(slock-AF_INET);
[158595.437837]
[158595.437837]  *** DEADLOCK ***
[158595.437837]
[158595.437837] 3 locks held by swapper/0/0:
[158595.437837]  #0:  (rcu_read_lock){.+.+..}, at: [<c02dbf9c>] 
rcu_lock_acquire+0x0/0x30
[158595.437837]  #1:  (rcu_read_lock_bh){.+....}, at: [<c02dbf9c>] 
rcu_lock_acquire+0x0/0x30
[158595.437837]  #2:  (_xmit_ETHER#2){+.-...}, at: [<c02f09b9>] 
sch_direct_xmit+0x36/0x119
[158595.437837]
[158595.437837] stack backtrace:
[158595.437837] Pid: 0, comm: swapper/0 Tainted: G        W    
3.4.0-build-0061 #12
[158595.437837] Call Trace:
[158595.437837]  [<c034c156>] ? printk+0x18/0x1a
[158595.437837]  [<c0158a74>] print_circular_bug+0x1ac/0x1b6
[158595.437837]  [<c015a08b>] __lock_acquire+0x9a3/0xc27
[158595.437837]  [<c015a6d1>] lock_acquire+0x71/0x85
[158595.437837]  [<f86453ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437837]  [<c034ddad>] _raw_spin_lock+0x33/0x40
[158595.437837]  [<f86453ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437837]  [<f86453ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437837]  [<f86591fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[158595.437837]  [<c02e0573>] dev_hard_start_xmit+0x333/0x3f2
[158595.437837]  [<c02f09d8>] sch_direct_xmit+0x55/0x119
[158595.437837]  [<c02e08b4>] dev_queue_xmit+0x282/0x418
[158595.437837]  [<c02e0632>] ? dev_hard_start_xmit+0x3f2/0x3f2
[158595.437837]  [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]  [<c031f8b0>] arp_xmit+0x22/0x24
[158595.437837]  [<c02e0632>] ? dev_hard_start_xmit+0x3f2/0x3f2
[158595.437837]  [<c031f8f3>] arp_send+0x41/0x48
[158595.437837]  [<c031fe09>] arp_process+0x289/0x491
[158595.437837]  [<c031fb80>] ? __neigh_lookup.clone.20+0x42/0x42
[158595.437837]  [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]  [<c031fb2c>] arp_rcv+0xb1/0xc3
[158595.437837]  [<c031fb80>] ? __neigh_lookup.clone.20+0x42/0x42
[158595.437837]  [<c02deca7>] __netif_receive_skb+0x329/0x378
[158595.437837]  [<c02ded5f>] process_backlog+0x69/0x130
[158595.437837]  [<c02df48f>] net_rx_action+0x90/0x15d
[158595.437837]  [<c012b42d>] __do_softirq+0x7b/0x118
[158595.437837]  [<c013236e>] ? do_send_specific+0xb/0x8f
[158595.437837]  [<c012b3b2>] ? local_bh_enable+0xd/0xd
[158595.437837]  <IRQ>  [<c012b648>] ? irq_exit+0x41/0x91
[158595.437837]  [<c0103c73>] ? do_IRQ+0x79/0x8d
[158595.437837]  [<c0158011>] ? trace_hardirqs_off_caller+0x2e/0x86
[158595.437837]  [<c034f2ee>] ? common_interrupt+0x2e/0x34
[158595.437837]  [<c015007b>] ? ktime_get_ts+0x8f/0x9b
[158595.437837]  [<c0108a0a>] ? mwait_idle+0x50/0x5a
[158595.437837]  [<c01091ac>] ? cpu_idle+0x55/0x6f
[158595.437837]  [<c033e2b1>] ? rest_init+0xa1/0xa7
[158595.437837]  [<c033e210>] ? __read_lock_failed+0x14/0x14
[158595.437837]  [<c049874f>] ? start_kernel+0x30d/0x314
[158595.437837]  [<c0498209>] ? repair_env_string+0x51/0x51
[158595.437837]  [<c04980a8>] ? i386_start_kernel+0xa8/0xaf

[63546.808787]
[63546.809025] ======================================================
[63546.809259] [ INFO: possible circular locking dependency detected ]
[63546.809494] 3.4.1-build-0061 #14 Not tainted
[63546.809685] -------------------------------------------------------
[63546.809685] swapper/0/0 is trying to acquire lock:
[63546.809685]  (slock-AF_INET){+.-...}, at: [<f8c593ec>] 
l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]
[63546.809685] but task is already holding lock:
[63546.809685]  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] 
sch_direct_xmit+0x36/0x119
[63546.809685]
[63546.809685] which lock already depends on the new lock.
[63546.809685]
[63546.809685]
[63546.809685] the existing dependency chain (in reverse order) is:
[63546.809685]
[63546.809685] -> #1 (_xmit_ETHER#2){+.-...}:
[63546.809685]        [<c015a561>] lock_acquire+0x71/0x85
[63546.809685]        [<c034dc06>] _raw_spin_lock_bh+0x38/0x45
[63546.809685]        [<c02a4e8a>] ppp_push+0x59/0x4b3
[63546.809685]        [<c02a66b9>] ppp_xmit_process+0x41b/0x4be
[63546.809685]        [<c02a69b9>] ppp_write+0x90/0xa1
[63546.809685]        [<c01a2e8c>] vfs_write+0x7e/0xab
[63546.809685]        [<c01a2ffc>] sys_write+0x3d/0x5e
[63546.809685]        [<c034e191>] syscall_call+0x7/0xb
[63546.809685]
[63546.809685] -> #0 (slock-AF_INET){+.-...}:
[63546.809685]        [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[63546.809685]        [<c015a561>] lock_acquire+0x71/0x85
[63546.809685]        [<c034da2d>] _raw_spin_lock+0x33/0x40
[63546.809685]        [<f8c593ec>] l2tp_xmit_skb+0x173/0x47e 
[l2tp_core]
[63546.809685]        [<f8c751fb>] l2tp_eth_dev_xmit+0x1a/0x2f 
[l2tp_eth]
[63546.809685]        [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[63546.809685]        [<c02f064c>] sch_direct_xmit+0x55/0x119
[63546.809685]        [<c02e0528>] dev_queue_xmit+0x282/0x418
[63546.809685]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]        [<c031f524>] arp_xmit+0x22/0x24
[63546.809685]        [<c031f567>] arp_send+0x41/0x48
[63546.809685]        [<c031fa7d>] arp_process+0x289/0x491
[63546.809685]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]        [<c031f7a0>] arp_rcv+0xb1/0xc3
[63546.809685]        [<c02de91b>] __netif_receive_skb+0x329/0x378
[63546.809685]        [<c02de9d3>] process_backlog+0x69/0x130
[63546.809685]        [<c02df103>] net_rx_action+0x90/0x15d
[63546.809685]        [<c012b2b5>] __do_softirq+0x7b/0x118
[63546.809685]
[63546.809685] other info that might help us debug this:
[63546.809685]
[63546.809685]  Possible unsafe locking scenario:
[63546.809685]
[63546.809685]        CPU0                    CPU1
[63546.809685]        ----                    ----
[63546.809685]   lock(_xmit_ETHER#2);
[63546.809685]                                lock(slock-AF_INET);
[63546.809685]                                lock(_xmit_ETHER#2);
[63546.809685]   lock(slock-AF_INET);
[63546.809685]
[63546.809685]  *** DEADLOCK ***
[63546.809685]
[63546.809685] 3 locks held by swapper/0/0:
[63546.809685]  #0:  (rcu_read_lock){.+.+..}, at: [<c02dbc10>] 
rcu_lock_acquire+0x0/0x30
[63546.809685]  #1:  (rcu_read_lock_bh){.+....}, at: [<c02dbc10>] 
rcu_lock_acquire+0x0/0x30
[63546.809685]  #2:  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] 
sch_direct_xmit+0x36/0x119
[63546.809685]
[63546.809685] stack backtrace:
[63546.809685] Pid: 0, comm: swapper/0 Not tainted 3.4.1-build-0061 #14
[63546.809685] Call Trace:
[63546.809685]  [<c034bdd2>] ? printk+0x18/0x1a
[63546.809685]  [<c0158904>] print_circular_bug+0x1ac/0x1b6
[63546.809685]  [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[63546.809685]  [<c015a561>] lock_acquire+0x71/0x85
[63546.809685]  [<f8c593ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]  [<c034da2d>] _raw_spin_lock+0x33/0x40
[63546.809685]  [<f8c593ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]  [<f8c593ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]  [<f8c751fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[63546.809685]  [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[63546.809685]  [<c02f064c>] sch_direct_xmit+0x55/0x119
[63546.809685]  [<c02e0528>] dev_queue_xmit+0x282/0x418
[63546.809685]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[63546.809685]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]  [<c031f524>] arp_xmit+0x22/0x24
[63546.809685]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[63546.809685]  [<c031f567>] arp_send+0x41/0x48
[63546.809685]  [<c031fa7d>] arp_process+0x289/0x491
[63546.809685]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[63546.809685]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]  [<c031f7a0>] arp_rcv+0xb1/0xc3
[63546.809685]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[63546.809685]  [<c02de91b>] __netif_receive_skb+0x329/0x378
[63546.809685]  [<c02de9d3>] process_backlog+0x69/0x130
[63546.809685]  [<c02df103>] net_rx_action+0x90/0x15d
[63546.809685]  [<c012b2b5>] __do_softirq+0x7b/0x118
[63546.809685]  [<c012b23a>] ? local_bh_enable+0xd/0xd
[63546.809685]  [<c012b23a>] ? local_bh_enable+0xd/0xd
[63546.809685]  <IRQ>  [<c012b4d0>] ? irq_exit+0x41/0x91
[63546.809685]  [<c0103c6f>] ? do_IRQ+0x79/0x8d
[63546.809685]  [<c0157ea1>] ? trace_hardirqs_off_caller+0x2e/0x86
[63546.809685]  [<c034ef6e>] ? common_interrupt+0x2e/0x34
[63546.809685]  [<c015007b>] ? do_gettimeofday+0x20/0x29
[63546.809685]  [<c0108a06>] ? mwait_idle+0x50/0x5a
[63546.809685]  [<c01091a8>] ? cpu_idle+0x55/0x6f
[63546.809685]  [<c033df25>] ? rest_init+0xa1/0xa7
[63546.809685]  [<c033de84>] ? __read_lock_failed+0x14/0x14
[63546.809685]  [<c0498745>] ? start_kernel+0x303/0x30a
[63546.809685]  [<c0498209>] ? repair_env_string+0x51/0x51
[63546.809685]  [<c04980a8>] ? i386_start_kernel+0xa8/0xaf

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.
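
For readers unfamiliar with these reports: the "possible circular locking
dependency" message amounts to a cycle check on a lock-order graph. Below is a
minimal userspace sketch of that idea, assuming only the two lock classes named
in the logs above. The types and function names are invented for the
illustration and are not lockdep's implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_class { SLOCK_AF_INET, XMIT_ETHER, NUM_LOCK_CLASSES };

/* dep[a][b] is true when lock b has been acquired while a was held. */
struct lock_graph {
	bool dep[NUM_LOCK_CLASSES][NUM_LOCK_CLASSES];
};

/* Depth-first search: can "to" be reached from "from" via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	int next;

	if (from == to)
		return true;
	seen[from] = true;
	for (next = 0; next < NUM_LOCK_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/* Record "held -> wanted"; return false if that would close a cycle. */
static bool add_dependency(struct lock_graph *g, int held, int wanted)
{
	bool seen[NUM_LOCK_CLASSES] = { false };

	if (reachable(g, wanted, held, seen))
		return false;	/* wanted already leads back to held */
	g->dep[held][wanted] = true;
	return true;
}
```

In the logs, the existing chain is slock-AF_INET followed by _xmit_ETHER#2;
taking slock-AF_INET in l2tp_xmit_skb() while _xmit_ETHER#2 is held adds the
reverse edge, which is the case where add_dependency() returns false.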

^ permalink raw reply

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 10:04 UTC (permalink / raw)
  To: Sathya.Perla; +Cc: eric.dumazet, netdev
In-Reply-To: <CAL8zT=iP+5o11am67ZBVOd=QrOfcjFWDydiuM9RrrAKe_k7LZw@mail.gmail.com>

2012/5/30 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/5/30  <Sathya.Perla@emulex.com>:
>>>-----Original Message-----
>>>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>>>Behalf Of Jean-Michel Hautbois
>>>
>>>2012/5/30 Jean-Michel Hautbois <jhautbois@gmail.com>:
>>>
>>>I used vmstat in order to see the differences between the two kernels.
>>>The main difference is the number of interrupts per second.
>>>I have an average of 87500 on 3.2 and 7500 on 2.6, 10 times lower !
>>>I suspect the be2net driver to be the main cause, and I checked the
>>>/proc/interrupts file in order to be sure.
>>>
>>>I have for eth1-tx on 2.6.26 about 2200 interrupts per second and 23000 on 3.2.
>>>BTW, it is named eth1-q0 on 3.2 (and tx and rx are the same IRQ)
>>>whereas there is eth1-rx0 and eth1-tx on 2.6.26.
>>
>> Yes, there is an issue with be2net interrupt mitigation in the recent code with
>> RX and TX on the same Evt-Q (commit 10ef9ab4). The high interrupt rate happens when a TX blast is
>> done while RX is relatively silent on a queue pair. Interrupt rate due to TX completions is not being
>> mitigated.
>>
>> I have a fix and will send it out soon..
>>
>> thanks,
>> -Sathya
>
> Hi Sathya !
> Thanks for this information !
> I had the correct diagnostic :). I am waiting for your fix.
>

Well, well, well, after having tested several configurations, several
drivers, I have a big difference between an old 2.6.26 kernel and a
newer one (I tried 3.2 and 3.4).

Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
set to 4096. I am sending packets only, nothing on RX.
I send from 1Gbps up to 2.4Gbps and I see no drops in tc with 2.6.26
kernel, but a lot of drops with a newer kernel.
So, I don't know if I missed something in my kernel configuration, but
I have used the 2.6.26 one as a reference, in order to set the same
options (DMA related, etc).

I easily reproduce this problem and setting a bigger txqueuelen solves
it partially.
1Gbps requires a txqueuelen of 9000, 2.4Gbps requires more than 20000!

If you have any idea, I am interested, as this is a big issue for my use case.

JM
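
To put those txqueuelen values in perspective, here is a rough
back-of-the-envelope sketch. It assumes every packet carries exactly 4000
bytes and ignores UDP/IP/Ethernet header overhead; the helper names are
hypothetical.

```c
#include <assert.h>

/* Packets per second for a given bit rate and packet size (headers ignored). */
static double packets_per_second(double bits_per_second, double bytes_per_packet)
{
	return bits_per_second / (8.0 * bytes_per_packet);
}

/* Milliseconds of traffic that a queue of queue_len packets can hold. */
static double queue_depth_ms(double queue_len, double pps)
{
	return 1000.0 * queue_len / pps;
}
```

Under these assumptions 1Gbps is 31250 packets per second, so a txqueuelen of
9000 holds about 288 ms of traffic; 2.4Gbps is 75000 packets per second, so
20000 holds roughly 267 ms. In both cases the queue has to absorb over a
quarter of a second of backlog to avoid drops.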

^ permalink raw reply

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Eric Dumazet @ 2012-06-06 11:01 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=ggT7Y2on6qmsp3u9CLOCwd6nOr3VjQfEsGZzA+O6us0A@mail.gmail.com>

On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:

> Well, well, well, after having tested several configurations, several
> drivers, I have a big difference between an old 2.6.26 kernel and a
> newer one (I tried 3.2 and 3.4).
> 
> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
> set to 4096. I am sending packets only, nothing on RX.
> I send from 1Gbps up to 2.4Gbps and I see no drops in tc with 2.6.26
> kernel, but a lot of drops with a newer kernel.
> So, I don't know if I missed something in my kernel configuration, but
> I have used the 2.6.26 one as a reference, in order to set the same
> options (DMA related, etc).
> 
> I easily reproduce this problem and setting a bigger txqueuelen solves
> it partially.
> 1Gbps requires a txqueuelen of 9000, 2.4Gbps requires more than 20000!
> 
> If you have any idea, I am interested, as this is a big issue for my use case.
> 

Yep.

This driver wants to limit the number of tx completions; that's just wrong.

Quick and dirty patch:


diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index c5c4c0e..1e8f8a6 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
 #define MAX_TX_QS		8
 #define MAX_ROCE_EQS		5
 #define MAX_MSIX_VECTORS	(MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
-#define BE_TX_BUDGET		256
+#define BE_TX_BUDGET		65535
 #define BE_NAPI_WEIGHT		64
 #define MAX_RX_POST		BE_NAPI_WEIGHT /* Frags posted at a time */
 #define RX_FRAGS_REFILL_WM	(RX_Q_LEN - MAX_RX_POST)

^ permalink raw reply related

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 11:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338972341.2760.3944.camel@edumazet-glaptop>

On Wed, Jun 06, 2012 at 10:45:41AM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> > on 32bit arches.
> > 
> > We must use separate syncp for rx and tx path as they can be run at the
> > same time on different cpus. Thus one sequence increment can be lost and
> > readers spin forever.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Stephen Hemminger <shemminger@vyatta.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > ---
> 
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
> 
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
> 
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

We currently do all stats either on napi callback or from
start_xmit callback.
This makes them safe, yes?

> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
> 
> 

^ permalink raw reply

* Re: [PATCH] netdev: mv643xx_eth: Prevent build on PPC32
From: Josh Boyer @ 2012-06-06 11:21 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Ben Hutchings, Lennert Buytenhek, Olof Johansson, netdev
In-Reply-To: <20120606052910.GA674@lunn.ch>

On Wed, Jun 06, 2012 at 07:29:10AM +0200, Andrew Lunn wrote:
> > The proper fix, from my minimal looking, was one of:
> > 
> > 1) revert the change for ARM that introduced the clk stuff
> > 2) do a similar change as the original commit but with a bunch of
> > #ifdef-ery
> > 3) implement the clkdev API stuff for 32-bit ppc
> > 
> > Honestly, I'd go for either 1 or 2.  The commit that introduced it was
> > broken to begin with, but that isn't my call.
> 
> I broke it. Sorry.
> 
> At the time, there was a push to remove all the #ifdefs. The following
> patchset was doing this:
> 
> https://lkml.org/lkml/2012/4/21/94
> 
> it would provide dummy implementations for those systems without clk
> support. However, it seems that patch set never made it in, and i did
> not declare my dependency on it.
> 
> I'm happy to add #ifdef. However, i would first like to understand
> what was 'broken to begin with'.

Simply that a commit was introduced that did not build on all the
existing platforms the driver supports.  The world is not ARM, or x86,
or PPC32, etc.  I haven't looked to see if it would still function
correctly in the presence of a dummy clk implementation, but if not that
would also be bad.

josh

^ permalink raw reply

* Re: [PATCH] net: sierra_net: device IDs for Aircard 320U++
From: Greg KH @ 2012-06-06 12:16 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Dan Williams, linux-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8, Autif Khan,
	Tom Cassidy, stable-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87wr3kua2k.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

On Wed, Jun 06, 2012 at 10:19:15AM +0200, Bjørn Mork wrote:
> Greg KH <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org> writes:
> > On Wed, Jun 06, 2012 at 09:18:10AM +0200, Bjørn Mork wrote:
> >> Adding device IDs for Aircard 320U and two other devices
> >> found in the out-of-tree version of this driver.
> >> 
> >> Cc: linux-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8@public.gmane.org
> >> Cc: Autif Khan <autif.mlist-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> Cc: Tom Cassidy <tomas.cassidy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Signed-off-by: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>
> >> ---
> >>  drivers/net/usb/sierra_net.c |   14 ++++++++++----
> >>  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > Wait, Tom just sent me a patch adding these device ids to the sierra
> > serial driver, why would the same device work for both drivers?
> 
> Because it's a composite device.  Was this a trick question? :-)
> 
> >  Where should the device id go?
> 
> To both drivers.  The device is similar to the 1199:68a3 device already
> supported by both drivers.  It has a number of serial ports (depending
> on how many features like GPS etc. are enabled) supported by the "sierra"
> driver and one ethernet interface speaking Sierra's HIP protocol
> supported by the "sierra_net" driver.

Ok, thanks for clearing that up, I'll take the serial patch, and I'm
sure that David will take this one.

	Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 12:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <1338980484.2760.4219.camel@edumazet-glaptop>

2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>
>> Well, well, well, after having tested several configurations, several
>> drivers, I have a big difference between an old 2.6.26 kernel and a
>> newer one (I tried 3.2 and 3.4).
>>
>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>> set to 4096. I am sending packets only, nothing on RX.
>> I send from 1Gbps up to 2.4Gbps and I see no drops in tc with 2.6.26
>> kernel, but a lot of drops with a newer kernel.
>> So, I don't know if I missed something in my kernel configuration, but
>> I have used the 2.6.26 one as a reference, in order to set the same
>> options (DMA related, etc).
>>
>> I easily reproduce this problem and setting a bigger txqueuelen solves
>> it partially.
>> 1Gbps requires a txqueuelen of 9000, 2.4Gbps requires more than 20000!
>>
>> If you have any idea, I am interested, as this is a big issue for my use case.
>>
>
> Yep.
>
> This driver wants to limit the number of tx completions, that's just wrong.
>
> Quick and dirty patch:
>
>
> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
> index c5c4c0e..1e8f8a6 100644
> --- a/drivers/net/ethernet/emulex/benet/be.h
> +++ b/drivers/net/ethernet/emulex/benet/be.h
> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>  #define MAX_TX_QS              8
>  #define MAX_ROCE_EQS           5
>  #define MAX_MSIX_VECTORS       (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
> -#define BE_TX_BUDGET           256
> +#define BE_TX_BUDGET           65535
>  #define BE_NAPI_WEIGHT         64
>  #define MAX_RX_POST            BE_NAPI_WEIGHT /* Frags posted at a time */
>  #define RX_FRAGS_REFILL_WM     (RX_Q_LEN - MAX_RX_POST)
>

I will try that in a few minutes.
I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
a broadcom (bnx2x).

JM

^ permalink raw reply

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 13:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=hHDPnvRFcQ0w+D=AP+QK6ic4X=tva6Yw_XGwuTbAYjhQ@mail.gmail.com>

2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
>> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>>
>>> Well, well, well, after having tested several configurations, several
>>> drivers, I have a big difference between an old 2.6.26 kernel and a
>>> newer one (I tried 3.2 and 3.4).
>>>
>>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>>> set to 4096. I am sending packets only, nothing on RX.
>>> I send from 1Gbps up to 2.4Gbps and I see no drops in tc with 2.6.26
>>> kernel, but a lot of drops with a newer kernel.
>>> So, I don't know if I missed something in my kernel configuration, but
>>> I have used the 2.6.26 one as a reference, in order to set the same
>>> options (DMA related, etc).
>>>
>>> I easily reproduce this problem and setting a bigger txqueuelen solves
>>> it partially.
>>> 1Gbps requires a txqueuelen of 9000, 2.4Gbps requires more than 20000!
>>>
>>> If you have any idea, I am interested, as this is a big issue for my use case.
>>>
>>
>> Yep.
>>
>> This driver wants to limit the number of tx completions, that's just wrong.
>>
>> Quick and dirty patch:
>>
>>
>> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
>> index c5c4c0e..1e8f8a6 100644
>> --- a/drivers/net/ethernet/emulex/benet/be.h
>> +++ b/drivers/net/ethernet/emulex/benet/be.h
>> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>>  #define MAX_TX_QS              8
>>  #define MAX_ROCE_EQS           5
>>  #define MAX_MSIX_VECTORS       (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
>> -#define BE_TX_BUDGET           256
>> +#define BE_TX_BUDGET           65535
>>  #define BE_NAPI_WEIGHT         64
>>  #define MAX_RX_POST            BE_NAPI_WEIGHT /* Frags posted at a time */
>>  #define RX_FRAGS_REFILL_WM     (RX_Q_LEN - MAX_RX_POST)
>>
>
> I will try that in a few minutes.
> I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
> a broadcom (bnx2x).
>

And it is not really better: I still need a txqueuelen of about 18000 at
2.4Gbps in order to avoid drops.
I really think the cause is somewhere in the networking stack or in my
configuration (DMA? Something else?), as it doesn't seem to be driver
related, as I said.

JM
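
The txqueuelen figures quoted in this thread are easier to compare once they are converted into time. The following is a minimal sketch, assuming every packet is exactly 4000 bytes and ignoring UDP/IP/Ethernet header overhead; the function names are hypothetical and are not part of any driver or tool mentioned here.

```c
#include <assert.h>

/* Packets per second needed to sustain rate_bps with pkt_bytes-sized
 * packets. Header overhead is ignored (an assumption of this sketch). */
static double packets_per_second(double rate_bps, double pkt_bytes)
{
    assert(pkt_bytes > 0.0);
    return rate_bps / (pkt_bytes * 8.0);
}

/* Milliseconds of traffic that a queue of qlen packets holds at that rate. */
static double queue_depth_ms(double qlen, double rate_bps, double pkt_bytes)
{
    return 1000.0 * qlen / packets_per_second(rate_bps, pkt_bytes);
}
```

Under these assumptions, queue_depth_ms(9000, 1e9, 4000) is 288 ms, queue_depth_ms(18000, 2.4e9, 4000) is 240 ms, and queue_depth_ms(20000, 2.4e9, 4000) is about 267 ms. The required queue lengths therefore correspond to roughly a quarter of a second of traffic at both rates, which may hint at periodic stalls of similar duration rather than a steady throughput limit. This is only an observation from the arithmetic, not a diagnosis.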

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 13:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606111357.GA15070@redhat.com>

On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:

> We currently do all stats either on napi callback or from
> start_xmit callback.
> This makes them safe, yes?

Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
include/linux/u64_stats_sync.h section 6)

 * 6) If counter might be written by an interrupt, readers should block interrupts.
 *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
 *     read partial values)

Yes, it's tricky...

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..705aaa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 		u64 tpackets, tbytes, rpackets, rbytes;

 		do {
-			start = u64_stats_fetch_begin(&stats->syncp);
+			start = u64_stats_fetch_begin_bh(&stats->syncp);
 			tpackets = stats->tx_packets;
 			tbytes   = stats->tx_bytes;
 			rpackets = stats->rx_packets;
 			rbytes   = stats->rx_bytes;
-		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));

 		tot->rx_packets += rpackets;
 		tot->tx_packets += tpackets;
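
The fetch/retry loop in the diff above, and the lost-increment race described earlier in this thread, can be modelled in userspace. This is a minimal single-threaded sketch, not the kernel implementation: all toy_ names are hypothetical, there are no memory barriers, and unlike the kernel's seqcount reader the toy_fetch_begin() here does not wait for an even sequence but leaves that check to toy_fetch_retry().

```c
#include <assert.h>
#include <stdint.h>

struct toy_stats {
    unsigned int seq;   /* odd while a writer is inside its update */
    uint64_t packets;
    uint64_t bytes;
};

/* A writer may only enter when no other writer is inside. */
static void toy_write_begin(struct toy_stats *s)
{
    assert(!(s->seq & 1));
    s->seq++;
}

static void toy_write_end(struct toy_stats *s)
{
    s->seq++;
}

static unsigned int toy_fetch_begin(const struct toy_stats *s)
{
    return s->seq;
}

/* Nonzero means the snapshot must be discarded and read again. */
static int toy_fetch_retry(const struct toy_stats *s, unsigned int start)
{
    return (start & 1) || s->seq != start;
}

/* Simulate two writers sharing one sequence whose begin increments
 * collide: both load the same value and both store value + 1, so one
 * increment is lost. */
static void toy_racy_double_begin(struct toy_stats *s)
{
    unsigned int seen_by_a = s->seq;
    unsigned int seen_by_b = s->seq;

    s->seq = seen_by_a + 1;
    s->seq = seen_by_b + 1;
}
```

After toy_racy_double_begin() followed by two toy_write_end() calls, the sequence stays odd, so toy_fetch_retry() reports a retry for every later snapshot. That is the "readers spin forever" case from the original patch description. The _bh variants in the diff address a different case, where a writer interrupts a reader on the same CPU.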

^ permalink raw reply related

* [PATCH 1/1] block/nbd: micro-optimization in nbd request completion
From: Chetan Loke @ 2012-06-06 14:15 UTC (permalink / raw)
  To: Paul.Clements, axboe, linux-kernel; +Cc: netdev, Chetan Loke


Add in-flight cmds to the tail. That way, while searching (during request completion), we will always get a hit on the first element.


Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
 drivers/block/nbd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 061427a..8957b9f 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -481,7 +481,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req)
 		nbd_end_request(req);
 	} else {
 		spin_lock(&nbd->queue_lock);
-		list_add(&req->queuelist, &nbd->queue_head);
+		list_add_tail(&req->queuelist, &nbd->queue_head);
 		spin_unlock(&nbd->queue_lock);
 	}
 
-- 
1.7.5.2
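
The effect of the one-line change can be illustrated with a small userspace list. This is a sketch only: the toy_ names are hypothetical stand-ins for the kernel list helpers, and it assumes replies arrive in the same order in which requests were queued, which the patch description implies but the server is not shown to guarantee.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *prev, *next;
};

static void toy_list_init(struct toy_node *head)
{
    head->prev = head->next = head;
}

/* Link n between two neighbours that are currently adjacent. */
static void toy_link(struct toy_node *n, struct toy_node *prev,
                     struct toy_node *next)
{
    assert(prev->next == next);
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Insert right after the head (newest entry first). */
static void toy_list_add(struct toy_node *n, struct toy_node *head)
{
    toy_link(n, head, head->next);
}

/* Insert right before the head (oldest entry first). */
static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
    toy_link(n, head->prev, head);
}

/* 1-based position of target when walking from the front; 0 if absent. */
static int toy_search_depth(const struct toy_node *head,
                            const struct toy_node *target)
{
    const struct toy_node *pos;
    int depth = 1;

    for (pos = head->next; pos != head; pos = pos->next, depth++)
        if (pos == target)
            return depth;
    return 0;
}
```

With three requests queued in the order a, b, c, toy_list_add() leaves the oldest request a at depth 3, while toy_list_add_tail() leaves it at depth 1. If replies arrive out of order, the first-element hit is no longer guaranteed with either variant.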

^ permalink raw reply related

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 14:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=jGHo82mo-s8Tfs9LWzfu2GkrS4eZJoeOpHhpXHMr6csg@mail.gmail.com>

2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
>> 2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
>>> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>>>
>>>> Well, well, well, after having tested several configurations, several
>>>> drivers, I have a big difference between an old 2.6.26 kernel and a
>>>> newer one (I tried 3.2 and 3.4).
>>>>
>>>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>>>> set to 4096. I am sending packets only, nothing on RX.
>>>> I send from 1Gbps up to 2.4Gbps and I see no drops in tc with 2.6.26
>>>> kernel, but a lot of drops with a newer kernel.
>>>> So, I don't know if I missed something in my kernel configuration, but
>>>> I have used the 2.6.26 one as a reference, in order to set the same
>>>> options (DMA related, etc).
>>>>
>>>> I easily reproduce this problem and setting a bigger txqueuelen solves
>>>> it partially.
>>>> 1Gbps requires a txqueuelen of 9000, 2.4Gbps requires more than 20000!
>>>>
>>>> If you have any idea, I am interested, as this is a big issue for my use case.
>>>>
>>>
>>> Yep.
>>>
>>> This driver wants to limit the number of tx completions, that's just wrong.
>>>
>>> Quick and dirty patch:
>>>
>>>
>>> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
>>> index c5c4c0e..1e8f8a6 100644
>>> --- a/drivers/net/ethernet/emulex/benet/be.h
>>> +++ b/drivers/net/ethernet/emulex/benet/be.h
>>> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>>>  #define MAX_TX_QS              8
>>>  #define MAX_ROCE_EQS           5
>>>  #define MAX_MSIX_VECTORS       (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
>>> -#define BE_TX_BUDGET           256
>>> +#define BE_TX_BUDGET           65535
>>>  #define BE_NAPI_WEIGHT         64
>>>  #define MAX_RX_POST            BE_NAPI_WEIGHT /* Frags posted at a time */
>>>  #define RX_FRAGS_REFILL_WM     (RX_Q_LEN - MAX_RX_POST)
>>>
>>
>> I will try that in a few minutes.
>> I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
>> a broadcom (bnx2x).
>>
>
> And it is not really better: I still need a txqueuelen of about 18000 at
> 2.4Gbps in order to avoid drops.
> I really think the cause is somewhere in the networking stack or in my
> configuration (DMA? Something else?), as it doesn't seem to be driver
> related, as I said.
>

If it can help, on a 3.0 kernel a txqueuelen of 9000 is sufficient in
order to get this bandwidth on TX.

JM

^ permalink raw reply

* Re: [PATCH] ip.7: Improve explanation about calling listen or connect
From: Flavio Leitner @ 2012-06-06 14:44 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Peter Schiffer, linux-man-u79uwXL29TY76Z2rM5mHXA, netdev
In-Reply-To: <4FBF66D8.7060007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>


Hi,

Could someone tell me the current state of this patch?
It has been a month already with no feedback.
thanks,
fbl

On Fri, 25 May 2012 13:02:48 +0200
Peter Schiffer <pschiffe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Hi Michael,
> 
> do you have any comments for this update? Or do you need some supporting 
> info?
> 
> peter
> 
> On 05/09/2012 02:30 PM, Flavio Leitner wrote:
> > Signed-off-by: Flavio Leitner<fbl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >   man7/ip.7 |   15 +++++++++------
> >   1 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/man7/ip.7 b/man7/ip.7
> > index 9f560df..84fe32d 100644
> > --- a/man7/ip.7
> > +++ b/man7/ip.7
> > @@ -69,12 +69,11 @@ For
> >   you may specify a valid IANA IP protocol defined in
> >   RFC\ 1700 assigned numbers.
> >   .PP
> > -.\" FIXME ip current does an autobind in listen, but I'm not sure
> > -.\" if that should be documented.
> >   When a process wants to receive new incoming packets or connections, it
> >   should bind a socket to a local interface address using
> >   .BR bind (2).
> > -Only one IP socket may be bound to any given local (address, port) pair.
> > +In this case, only one IP socket may be bound to any given local
> > +(address, port) pair.
> >   When
> >   .B INADDR_ANY
> >   is specified in the bind call, the socket will be bound to
> > @@ -82,10 +81,14 @@ is specified in the bind call, the socket will be bound to
> >   local interfaces.
> >   When
> >   .BR listen (2)
> > -or
> > +is called on an unbound socket, the socket is automatically bound
> > +to a random free port with the local address set to
> > +.BR INADDR_ANY .
> > +When
> >   .BR connect (2)
> > -are called on an unbound socket, it is automatically bound to a
> > -random free port with the local address set to
> > +is called on an unbound socket, the socket is automatically bound
> > +to a random free port or an usable shared port with the local address
> > +set to
> >   .BR INADDR_ANY .
> >
> >   A TCP local socket address that has been bound is unavailable for

^ permalink raw reply

* Possible deadlock in ipv6?
From: Vladimir Davydov @ 2012-06-06 14:49 UTC (permalink / raw)
  To: netdev

I'm not familiar with the Linux net subsystem, so I would appreciate it if
someone could clarify whether the following call chain is possible:

addrconf_ifdown() calls neigh_ifdown(nd_tbl) which locks nd_tbl.lock for 
writing and calls

     pneigh_ifdown
     pndisc_destructor
     ipv6_dev_mc_dec
     __ipv6_dev_mc_dec
     igmp6_group_dropped
     igmp6_leave_group
     igmp6_send
     icmp6_dst_alloc
     ip6_neigh_lookup
     neigh_create

and neigh_create() locks nd_tbl.lock for writing again resulting in a 
deadlock.

Thank you.
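
The shape of the suspected problem, a non-recursive write lock taken a second time further down the same call chain, can be sketched in userspace. This is a toy model under stated assumptions: the toy_ names are hypothetical, the lock is a plain flag rather than a kernel rwlock, and the sketch says nothing about whether the call chain above is actually reachable in the kernel, which is the open question.

```c
#include <assert.h>

struct toy_rwlock {
    int write_held;
};

/* Returns 1 on success, 0 if the lock is already write-held.
 * A blocking write lock would spin here instead of returning. */
static int toy_write_trylock(struct toy_rwlock *l)
{
    if (l->write_held)
        return 0;
    l->write_held = 1;
    return 1;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    assert(l->write_held);
    l->write_held = 0;
}

/* Stands in for neigh_create(): needs the table lock for writing. */
static int toy_neigh_create(struct toy_rwlock *tbl_lock)
{
    if (!toy_write_trylock(tbl_lock))
        return -1;  /* a blocking lock would self-deadlock here */
    toy_write_unlock(tbl_lock);
    return 0;
}

/* Stands in for neigh_ifdown(): holds the table lock across the
 * whole callback chain, which ends in toy_neigh_create(). */
static int toy_neigh_ifdown(struct toy_rwlock *tbl_lock)
{
    int ret;

    if (!toy_write_trylock(tbl_lock))
        return -1;
    ret = toy_neigh_create(tbl_lock);
    toy_write_unlock(tbl_lock);
    return ret;
}
```

Called on its own, toy_neigh_create() succeeds and returns 0. Called through toy_neigh_ifdown(), it returns -1 because the lock is already held by the same caller, which is the point where a blocking write lock would hang.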

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 14:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338988210.2760.4485.camel@edumazet-glaptop>

On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> 
> > We currently do all stats either on napi callback or from
> > start_xmit callback.
> > This makes them safe, yes?
> 
> Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> include/linux/u64_stats_sync.h section 6)
> 
>  * 6) If counter might be written by an interrupt, readers should block interrupts.
>  *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
>  *     read partial values)
> 
> Yes, it's tricky...

Sounds good, but I have a question: this relies on counters
being atomic on 64 bit.
Would not it be better to always use a seqlock even on 64 bit?
This way counters would actually be correct and in sync.
As it is if we want e.g. average packet size,
we can not rely e.g. on it being bytes/packets.

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..705aaa7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
>  		u64 tpackets, tbytes, rpackets, rbytes;
>  
>  		do {
> -			start = u64_stats_fetch_begin(&stats->syncp);
> +			start = u64_stats_fetch_begin_bh(&stats->syncp);
>  			tpackets = stats->tx_packets;
>  			tbytes   = stats->tx_bytes;
>  			rpackets = stats->rx_packets;
>  			rbytes   = stats->rx_bytes;
> -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));
>  
>  		tot->rx_packets += rpackets;
>  		tot->tx_packets += tpackets;
> 

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Stephen Hemminger @ 2012-06-06 15:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Dumazet, Jason Wang, netdev, rusty, linux-kernel,
	virtualization
In-Reply-To: <20120606144941.GA17092@redhat.com>

On Wed, 6 Jun 2012 17:49:42 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Sounds good, but I have a question: this relies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

This has not been a requirement on real physical devices; therefore
the added overhead is not really justified.

Many network cards use counters in hardware to count packets/bytes
and there is no expectation of atomic access there.

^ permalink raw reply
