[net-next v2 10/15] e1000e: Fix Hardware Unit Hang

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
To: davem@davemloft.net
Cc: David Ertman <davidx.m.ertman@intel.com>,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com,
	Bruce Allan <bruce.w.allan@intel.com>,
	Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Subject: [net-next v2 10/15] e1000e: Fix Hardware Unit Hang
Date: Tue, 18 Mar 2014 20:42:08 -0700
Message-ID: <1395200533-16908-11-git-send-email-jeffrey.t.kirsher@intel.com>
In-Reply-To: <1395200533-16908-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: David Ertman <davidx.m.ertman@intel.com>

The check for pending Tx work on link loss was mistakenly moved so that it
runs only when the loss of link is first detected.  However, there is a
small window in which additional Tx work can still be queued shortly after
the link drops.

Move the check back to its previous place in the watchdog task.  Also add
debug output to the other reset paths, and add a final catch-all for false
hangs to the scheduled function that prints the hardware hang message.

Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 32 +++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)
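
Note for readers: the two comparisons this patch relies on can be tried
outside the driver.  The sketch below is a standalone illustration only, not
driver code.  Every toy_* name is hypothetical, the ring struct keeps only the
three fields the comparison needs, and toy_desc_unused() is assumed to behave
like the driver's e1000_desc_unused() (one slot kept empty so that a full ring
and an empty ring can be told apart).  Register reads are replaced by plain
integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Tx ring; only the fields used here. */
struct toy_ring {
	unsigned int count;         /* number of descriptors in the ring */
	unsigned int next_to_use;   /* slot software fills next */
	unsigned int next_to_clean; /* oldest slot not yet reclaimed */
};

/* Free descriptors, assuming one slot is always left empty. */
static inline unsigned int toy_desc_unused(const struct toy_ring *ring)
{
	if (ring->next_to_clean > ring->next_to_use)
		return ring->next_to_clean - ring->next_to_use - 1;
	return ring->count + ring->next_to_clean - ring->next_to_use - 1;
}

/* Same comparison as the patch: true when any descriptor is queued. */
static inline bool toy_tx_pending(const struct toy_ring *ring)
{
	return toy_desc_unused(ring) + 1 < ring->count;
}

/* Watchdog decision: reset only if link is down and Tx work is queued. */
static inline bool toy_needs_reset(bool carrier_ok,
				   const struct toy_ring *ring)
{
	return !carrier_ok && toy_tx_pending(ring);
}

/* Head equal to tail means nothing is outstanding: a false hang. */
static inline bool toy_false_hang(unsigned int tdh, unsigned int tdt)
{
	return tdh == tdt;
}

/* Exercises the cases discussed in the commit message. */
static inline void toy_self_check(void)
{
	struct toy_ring idle = { 256, 10, 10 };    /* nothing queued */
	struct toy_ring busy = { 256, 12, 10 };    /* two descriptors queued */
	struct toy_ring wrapped = { 256, 2, 250 }; /* queue wraps past the end */

	assert(!toy_tx_pending(&idle));
	assert(toy_tx_pending(&busy));
	assert(toy_tx_pending(&wrapped));

	/* Link down with queued work: reset.  Link up or idle: no reset. */
	assert(toy_needs_reset(false, &busy));
	assert(!toy_needs_reset(true, &busy));
	assert(!toy_needs_reset(false, &idle));

	assert(toy_false_hang(5, 5));
	assert(!toy_false_hang(5, 9));
}
```

Calling toy_self_check() from a small main() runs all of the cases.  Because
the watchdog re-evaluates the pending-work test on each run rather than once
at the moment of link loss, descriptors queued just after the link drops are
still noticed on a later pass.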

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index eafad41..d577723 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1090,8 +1090,14 @@ static void e1000_print_hw_hang(struct work_struct *work)
 		adapter->tx_hang_recheck = true;
 		return;
 	}
-	/* Real hang detected */
 	adapter->tx_hang_recheck = false;
+
+	if (er32(TDH(0)) == er32(TDT(0))) {
+		e_dbg("false hang detected, ignoring\n");
+		return;
+	}
+
+	/* Real hang detected */
 	netif_stop_queue(netdev);
 
 	e1e_rphy(hw, MII_BMSR, &phy_status);
@@ -1121,6 +1127,8 @@ static void e1000_print_hw_hang(struct work_struct *work)
 	      eop, jiffies, eop_desc->upper.fields.status, er32(STATUS),
 	      phy_status, phy_1000t_status, phy_ext_status, pci_status);
 
+	e1000e_dump(adapter);
+
 	/* Suggest workaround for known h/w issue */
 	if ((hw->mac.type == e1000_pchlan) && (er32(CTRL) & E1000_CTRL_TFCE))
 		e_err("Try turning off Tx pause (flow control) via ethtool\n");
@@ -4798,6 +4806,7 @@ static void e1000e_check_82574_phy_workaround(struct e1000_adapter *adapter)
 
 	if (adapter->phy_hang_count > 1) {
 		adapter->phy_hang_count = 0;
+		e_dbg("PHY appears hung - resetting\n");
 		schedule_work(&adapter->reset_task);
 	}
 }
@@ -4956,15 +4965,11 @@ static void e1000_watchdog_task(struct work_struct *work)
 				mod_timer(&adapter->phy_info_timer,
 					  round_jiffies(jiffies + 2 * HZ));
 
-			/* The link is lost so the controller stops DMA.
-			 * If there is queued Tx work that cannot be done
-			 * or if on an 8000ES2LAN which requires a Rx packet
-			 * buffer work-around on link down event, reset the
-			 * controller to flush the Tx/Rx packet buffers.
-			 * (Do the reset outside of interrupt context).
+			/* 8000ES2LAN requires a Rx packet buffer work-around
+			 * on link down event; reset the controller to flush
+			 * the Rx packet buffer.
 			 */
-			if ((adapter->flags & FLAG_RX_NEEDS_RESTART) ||
-			    (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
+			if (adapter->flags & FLAG_RX_NEEDS_RESTART)
 				adapter->flags |= FLAG_RESTART_NOW;
 			else
 				pm_schedule_suspend(netdev->dev.parent,
@@ -4987,6 +4992,15 @@ link_up:
 	adapter->gotc_old = adapter->stats.gotc;
 	spin_unlock(&adapter->stats64_lock);
 
+	/* If the link is lost the controller stops DMA, but
+	 * if there is queued Tx work it cannot be done.  So
+	 * reset the controller to flush the Tx packet buffers.
+	 */
+	if (!netif_carrier_ok(netdev) &&
+	    (e1000_desc_unused(tx_ring) + 1 < tx_ring->count))
+		adapter->flags |= FLAG_RESTART_NOW;
+
+	/* If reset is necessary, do it outside of interrupt context. */
 	if (adapter->flags & FLAG_RESTART_NOW) {
 		schedule_work(&adapter->reset_task);
 		/* return immediately since reset is imminent */
-- 
1.8.3.1
